Sample Code – R




A big part of quantitative and social networks research involves accepting one major thing as fact:
my code sucks, and that’s okay!

It’s not specific to my code: all code sucks. Approaching coding with the assumption that no code is perfect is an important step toward participating in the crowd-sourcing practices that are vital to user-driven platforms like R. It’s also useful for modeling. Just as no code is perfect, no model is perfect, but you have to practice both coding and modeling to obtain results, which means your results are not perfect either. Learning to embrace coding and modeling, including the failed code and models, as informative parts of the research process is useful when interpreting the science. For me, it is one of the ways I check my own biases as a social scientist, a constant reminder that “facts” are not objective or static.

The files available on this page demonstrate my process during the “learning” stage of a research project, when I am still learning the structure and components of the data. The files contain neither the final code executed for the projects nor the final results.

All code and analyses were run in RStudio Desktop for Mac (Intel, Ventura). Code and output files were composed with Quarto.

Please feel free to borrow and share for your own purposes!


Topic Correlation Network for a 30-Topic Structural Topic Model of Tweets about Vaccination

Examples from my own research

Natural Language Processing and Latent Dirichlet Allocation (LDA)

  1. Quantitative text analysis using text from tweets about masks posted to Twitter during March 2020
Graphs in this example
NRC Emotions Network with 10 Nodes representing trust, fear, disgust, surprise, negative, positive, anger, anticipation, joy, and sadness
NRC Emotions Network with 10 Nodes
Positive vs. Negative Sentiment Network with 4 Nodes, w/ Control for Neg-Pos and Neg-Neg
Positive vs. Negative Sentiment Network with 2 Nodes


Tutorials

R for Social Science Research – The Basics

  1. Combining and Manipulating Dataframes with “dplyr”
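
As a rough illustration of what combining and manipulating data frames with dplyr can look like, here is a minimal sketch. The two data frames and their column names are made up for this example and do not come from the tutorial file.

```r
library(dplyr)

# Hypothetical toy data (not from the tutorial): one row per respondent,
# and zero or more survey responses per respondent.
respondents <- data.frame(
  id  = c(1, 2, 3, 4),
  age = c(34, 51, 27, 45)
)
responses <- data.frame(
  id    = c(1, 2, 2, 4),
  score = c(3, 5, 4, 2)
)

# Combine: keep every respondent, attach matching responses (NA if none).
combined <- respondents %>%
  left_join(responses, by = "id")

# Manipulate: derive a grouping column, then summarise within each group.
summary_by_age <- combined %>%
  mutate(age_group = if_else(age >= 40, "40+", "under 40")) %>%
  group_by(age_group) %>%
  summarise(
    mean_score = mean(score, na.rm = TRUE),
    n_rows     = n(),
    .groups    = "drop"
  )

print(summary_by_age)
```

A left join is used here so that respondents without any responses stay in the data; whether that is the right join depends on the research question.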

Google Trends

  1. Search trends for disability and chronic illness terms from 2017 to 2022 using the “gtrendsR” package
Plots in this example

Time Series Plots

Disability Search Trends Jan 2017 – Dec 2022, basic plot with no extra formatting, shows trends over time for "chronic," "disability," "illness," "spoonie," and "zebra"
Disability Search Trends Jan 2017 – Dec 2022, basic plot with no extra formatting
A line plot showing the proportion of global Google search hits between January 1, 2017 and December 31, 2022 for the disability keywords disability, chronic, illness, spoonie, and zebra. The proportion compares hits for each keyword against the other four. Disability has the highest proportion of hits for most of the period, while spoonie stays below 1% throughout. Zebra and chronic account for similar proportions of hits across the period, each falling below disability and above illness and spoonie on the plot.
Google Search Trends for Terms Related to Disability and Chronic Illness, 2017-2022, Some formatting added
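
The proportions in the plot described above compare each keyword's hits against the hits for all keywords on the same date. A minimal sketch of that calculation with dplyr follows. The data frame is a made-up stand-in with `date`, `keyword`, and `hits` columns, and it assumes `hits` is already numeric; it is not the data or the code from the example file.

```r
library(dplyr)

# Hypothetical toy data: search hits per keyword per date.
trends <- data.frame(
  date    = rep(as.Date(c("2017-01-01", "2017-02-01")), each = 3),
  keyword = rep(c("disability", "chronic", "zebra"), times = 2),
  hits    = c(60, 30, 10, 50, 25, 25)
)

# Within each date, divide each keyword's hits by the total across keywords.
trend_props <- trends %>%
  group_by(date) %>%
  mutate(prop = hits / sum(hits)) %>%
  ungroup()

print(trend_props)
```

Because the proportions are computed within each date, they sum to 1 for every time point, which is what makes the keywords comparable on a single plot.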
Histograms

Proportion of Google search hits for "chronic," "disability," "illness," and "zebra" from 2017 to 2022, by country, faceted by keyword. Basic default plot with no formatting.
Proportion of Google search hits for "chronic," "disability," "illness," and "zebra" from 2017 to 2022, by country, faceted by keyword. Some formatting added.
Inappropriate Plots

Dot plot of disability key words by related query text, no formatting
Dot plot of disability key words by related query text, some formatting added

Twitter / Text Data

  1. Collecting data from Twitter and preparing tweets for analysis with the “twitteR” package and “tidyverse,” an example using a keyword search for terms related to COVID-19.
  2. Creating a text corpus, obtaining word frequencies, and basic data visualization for word/count with the “quanteda” and “ggplot2” packages and “tidyverse,” an example using tweets mentioning the keyword “covid”
Plots in this example
Word Clouds
Circular word cloud with navy Arial font. The words "covid" and "rt" are large in the center, surrounded by "vaccine," "covid19," "just," and "bailout," with about 45 additional words that are too small to be legible.
A circular word cloud showing the words extracted from tweets mentioning “covid” with words sized by frequency, dark blue Arial font.
Circular word cloud with Arial font. The words "covid" and "rt" are turquoise and large in the center, surrounded by "vaccine" in light red and "covid19," "just," and "bailout" in bold black, with about 45 additional words in regular black font that are too small to be legible.
A circular word cloud showing the 50 most frequent words in a set of tweets mentioning “covid,” palette set to “ggplot2”
Circular word cloud with Arial font. The words "covid" and "rt" are orange and large in the center, surrounded by "vaccine" in blue and "covid19," "just," and "bailout" in bold red, with about 45 additional words in regular light red font that are too small to be legible.
A circular word cloud showing the 50 most frequent words in a set of tweets mentioning “covid,” RColorBrewer palette “Set1”
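
The word clouds above size each word by how often it appears. A minimal sketch of getting those frequencies with quanteda follows. The three short texts are invented stand-ins for tweets, and the cleaning steps are assumptions rather than the exact pipeline in the example file.

```r
library(quanteda)

# Hypothetical stand-in texts (not real tweets from the example).
tweets <- c(
  "covid vaccine news today",
  "rt covid vaccine update",
  "covid bailout vote"
)

# Build a corpus, split it into word tokens, and drop punctuation.
tweet_corpus <- corpus(tweets)
toks <- tokens(tweet_corpus, remove_punct = TRUE)

# A document-feature matrix counts each word in each document.
word_dfm <- dfm(toks)

# Most frequent words across all documents, as a named numeric vector.
top_words <- topfeatures(word_dfm, n = 3)
print(top_words)
```

Tokens such as "rt" show up prominently in the plots on this page; whether to filter them out is a judgment call that depends on the analysis.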
Bar Plots, R Base Graphics
Bar plot of the 50 most frequent words in a set of tweets mentioning "covid," x axis only shows the words "rt," "just," "time," "can," "dure," "us," "1," "tri," "dr," "silicon," "blue," and "see"
Bar Plots, ggplot2
Bar plot with ggplot2, default aesthetics. X axis is labeled "word" but the word labels on the axis ticks are crowded and overlapping, making them illegible. The y axis is labeled "count" and ranges from 0 to 1000. Labels use Arial font, black. Plot background is light grey with white gridlines visible, bars are dark grey.
Bar plot with ggplot2, default aesthetics.
Bar plot showing words by frequency, ggplot2, with flipped coordinates so words are along the y axis and count is along the x axis, making the words legible but crowded. The y axis is labeled "word," and the x axis is labeled "count" and ranges from 0-1000. Plot area is very light grey with horizontal and vertical gridlines showing. The bars are dark grey. Font is Arial.
Bar plot with ggplot2, axes flipped to show [x = count] and [y = word].
Bar plot showing words by frequency, ggplot2, with flipped coordinates so words are along the y axis and count is along the x axis, making the words legible but crowded. The y axis is labeled "word," and the x axis is labeled "count" and ranges from 0-1000. Plot area is very light grey with horizontal and vertical gridlines showing. The bars are dark grey. Font is Arial. Bars have labels within the plot that are black but not legible against the dark grey bars.
Bar plot with ggplot2, annotations added.
Bar plot showing words by frequency, ggplot2, with flipped coordinates so words are along the y axis and count is along the x axis, making the words legible but crowded. The y axis is labeled "word," and the x axis is labeled "count" and ranges from 0-1000. Plot area is black with no gridlines showing. The bars are dark grey. Font is Arial. There are white labels on each bar showing the accompanying text on the y axis ticks (words).
Bar plot with ggplot2, dark background with light text
Bar plot showing words by frequency, ggplot2, with flipped coordinates so words are along the y axis and count is along the x axis, making the words legible but crowded. The y axis is labeled "word," and the x axis is labeled "count" and ranges from 0-1000. Plot title reads "number of tweets containing top words from a query of tweets with the keyword COVID". Plot area is black with no gridlines showing. The bars are dark grey. Font is Arial. There are white labels on each bar showing the accompanying text on the y axis ticks (words).
Bar plot with ggplot2, titled.
Bar plot showing words by frequency, ggplot2, with flipped coordinates so words are along the y axis and count is along the x axis, making the words legible but crowded. The y axis is labeled "word," and the x axis is labeled "count" and ranges from 0-1000. Plot title reads "number of tweets containing top words from a query of tweets with the keyword COVID". Plot area is black with no gridlines showing. The bars are dark grey. Font is Times New Roman. There are white labels on each bar showing the accompanying text on the y axis ticks (words).
Bar plot with ggplot2, Times New Roman font family
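
The sequence of bar plots above (flipped axes, in-bar labels, dark background, title, font change) can be approximated in a single ggplot2 call. This is a sketch under assumptions: the counts are invented for illustration, and the generic "serif" family stands in for Times New Roman, which may not be installed on every system.

```r
library(ggplot2)

# Hypothetical word counts (made up for illustration, not the real results).
word_counts <- data.frame(
  word  = c("covid", "rt", "vaccine", "covid19", "just"),
  count = c(1000, 800, 300, 250, 200)
)

p <- ggplot(word_counts, aes(x = reorder(word, count), y = count)) +
  geom_col(fill = "grey30") +
  # White labels inside the bars stay legible on a dark background.
  geom_text(aes(label = word), colour = "white", hjust = 1.1) +
  # Flip the axes so the words read horizontally along the y axis.
  coord_flip() +
  labs(
    title = "Number of tweets containing top words from a query of tweets with the keyword COVID",
    x = "word",
    y = "count"
  ) +
  theme(
    panel.background = element_rect(fill = "black"),
    panel.grid       = element_blank(),
    text             = element_text(family = "serif")
  )

print(p)
```

Flipping the coordinates is what fixes the overlapping tick labels seen in the default plot, since long category labels need horizontal room.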
