It’s not specific to my code: all code sucks. Approaching coding with the assumption that no code is perfect is an important step toward participating in the crowd-sourcing practices that are vital to user-driven platforms like R. The same assumption is useful for modeling. Just as no code is perfect, no model is perfect, but you have to practice both coding and modeling to obtain results, which means your results are not perfect either. Learning to embrace coding and modeling, including the failed code and models, as informative parts of the research process helps when interpreting the science. For me, it is one of the ways I check my own biases as a social scientist, a constant reminder that “facts” are neither objective nor static.

The files available on this page demonstrate my process during the “learning” stage of a research project, when I am still learning the structure and components of the data. They do not contain the final code executed for the projects, nor do they contain final results.

All code and analysis were executed in RStudio Desktop for Mac (Intel, macOS Ventura). Code and output files were composed with Quarto.

Please feel free to borrow and share for your own purposes!

Examples from my own research

Natural Language Processing and Latent Dirichlet Allocation (LDA)

Quantitative text analysis of tweets about masks posted to Twitter in March 2020

Graphs in this example

NRC Emotions Network with 10 Nodes, representing trust, fear, disgust, surprise, negative, positive, anger, anticipation, joy, and sadness

Positive vs. Negative Sentiment Network with 4 Nodes, with Controls for Neg-Pos and Neg-Neg

Positive vs. Negative Sentiment Network with 2 Nodes

View the example

Natural Language Processing (NLP) and Latent Dirichlet Allocation (LDA) of Tweets (Example)

Tutorials

R for Social Science Research – The Basics

Combining and Manipulating Dataframes with “dplyr”

View the example

Combining Dataframes and Manipulating Variables with dplyr in R
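As a rough taste of the kind of operations a dplyr tutorial like this covers, here is a minimal sketch. The data frames `people` and `scores` and their columns are hypothetical and not taken from the tutorial. The sketch joins two data frames on a shared ID, stacks rows with the same columns, and creates a new variable.

```r
library(dplyr)

# Hypothetical example data (not from the tutorial)
people <- tibble(id = c(1, 2, 3), state = c("OR", "WA", "CA"))
scores <- tibble(id = c(2, 3, 4), score = c(10, 20, 30))

# left_join() keeps every row of `people`; ids with no match in `scores` get NA
combined <- left_join(people, scores, by = "id")

# bind_rows() stacks data frames that share column names
more_people <- tibble(id = 5, state = "NV")
all_people <- bind_rows(people, more_people)

# mutate() creates a new variable; filter() drops rows with a missing score
combined <- combined %>%
  mutate(high_score = score >= 15) %>%
  filter(!is.na(score))

print(combined)
print(all_people)
```

A left join is used here so that no rows from the first data frame are silently lost. Missing matches show up as `NA` and can then be handled explicitly.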

Google Trends

Search trends for disability and chronic illness terms from 2017 to 2022 using the “gtrendsR” package

Plots in this example

Time Series Plots

Disability Search Trends, Jan 2017 to Dec 2022: basic plot with no extra formatting, showing trends over time for “chronic,” “disability,” “illness,” “spoonie,” and “zebra”

Google Search Trends for Terms Related to Disability and Chronic Illness, 2017-2022, some formatting added. The line plot shows the proportion of global Google search hits between January 1, 2017 and December 31, 2022 for five disability keywords: disability, chronic, illness, spoonie, and zebra. Each proportion compares hits for one keyword against the other four. For most of the period, disability has the highest proportion of hits, while spoonie stays below 1% throughout. Zebra and chronic account for similar proportions of hits over the whole period, each falling below disability and above illness and spoonie on the plot.

Histograms

Proportion of Google search hits for “chronic,” “disability,” “illness,” and “zebra” from 2017 to 2022, by country, faceted by keyword. Basic default plot with no formatting.

Proportion of Google search hits for “chronic,” “disability,” “illness,” and “zebra” from 2017 to 2022, by country, faceted by keyword. Some formatting added.

Inappropriate Plots

Dot plot of disability keywords by related query text, no formatting

Dot plot of disability keywords by related query text, some formatting added

View the example

Exploring Google Search Trends Over Time and by Country Using the gtrendsR Package [R, via RStudio]

Twitter / Text Data

Collecting data from Twitter and preparing tweets for analysis with the “twitteR” package and the “tidyverse”: an example using a keyword search for terms related to COVID-19.

View the example

Downloading and Preparing Tweets for Analysis with the twitteR package in R

Creating a text corpus, obtaining word frequencies, and visualizing word counts with the “quanteda” and “ggplot2” packages and the “tidyverse”: an example using tweets mentioning the keyword “covid”

Plots in this example

Word Clouds