An Example Using Keywords Related to Disability and Chronic Illness
title: “Exploring Google Search Trends Over Time and by Country Using the GtrendsR Package [R, via RStudio]”
author: “Heather Sue M. Rosen”
Packages Required for This Example:
I will use the “tidyverse” and “gtrendsR” packages in R. The plots in this example use “ggplot2,” which is loaded as part of the tidyverse.
Start by loading the packages into the workspace.
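A minimal sketch of this loading step, assuming both packages are already installed (for example with install.packages(c("tidyverse", "gtrendsR"))):

```r
# Attach the tidyverse (ggplot2, dplyr, tidyr, and the other core packages)
library(tidyverse)

# Attach gtrendsR, which provides the gtrends() function used for the query
library(gtrendsR)
```

Attaching the tidyverse prints a startup message like the following.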
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0 ✔ purrr 1.0.1
✔ tibble 3.1.8 ✔ dplyr 1.1.0
✔ tidyr 1.3.0 ✔ stringr 1.5.0
✔ readr 2.1.4 ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
Now that the packages are loaded, you are ready to start collecting data.
Step 1: Querying the Google Search Hits
The “gtrendsR” package allows you to query search hits for up to 5 keywords at a time. Let’s look at search hits for terms related to disability and chronic illness.
Now, say I want to make a decision about time frame. It is important to think about meaningful ways search trends may have changed using historical and cultural context. In this example, it is possible that search trends for terms related to disability and chronic illness were different prior to COVID-19 than after the pandemic’s onset. I will search “disability,” “chronic,” “illness,” and two terms used by groups in the disabled community to identify themselves as chronically ill, “spoonie” and “zebra.”
The pandemic is in the beginning of its 4th year, so maybe it makes sense to set the time frame to include the full pandemic years, 1-3 (2020-2022). To balance it out, I’ll also look at the three years prior to the declaration that COVID-19 constitutes an emergency of global concern (2017-2019).
The search is executed with the “gtrends” command. I want to store the results as an object in my R environment (use the “<-” left arrow, placing the new object on the left and the command on the right), and I need to give the object a name that will differentiate it from other objects. In this case, I call the object “goog_chron_1722” to tell me this is a search from Google, dealing with chronic illness, looking at 2017-2022.
With the gtrends command, first I want to identify my keywords, then I can identify any additional filtering parameters, like time frame. I can specify “keyword = c( )” or I can use the shorthand “c( )”. If you do not name the parameters, order is extremely important: arguments are matched by position, so you’ll want to make sure the keyword(s) come before all other parameters.
I can filter the results later, so to save myself from executing a query for the same keywords several times, I will set “geo” to the default (“all,” for worldwide search hits). If I wanted to limit my search to the U.S., I would specify “geo” as “US” [geo = c(“US”)]. Because I am using the default option, I do not need to specify the geo parameter.
I said I was interested in change between 2017-2022, so I want to specify the “time” parameter. There are several formats available for this, but I prefer the “Y-m-d Y-m-d” (start-finish) format. If you want to pull the trends since the beginning of time (in this case, 2004, when Google Trends became available), you would specify time = “all”. To ensure I capture the entire time period, I will set the start date to the first day of the year 2017, and the end date to the last day of 2022.
The “compared_breakdown” parameter is a logical (true or false) specifying whether to return additional results comparing search hits by city and subregion. I am using more than one keyword, so I can (and will) set the compared breakdown to “TRUE” (T). Even though I want to maximize my data, I also want to be efficient with data collection. I will set “low_search_volume” to “FALSE” (F) to exclude regions with negligible contributions to the search volume. Because I am not specifying country, I will have values of “NULL” for interest by region and related topics.
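The settings described above can be sketched as follows. This is an illustration under a few assumptions rather than the exact chunk used for this post: “run_disability_query” is a hypothetical helper name (not part of gtrendsR), and the call is wrapped in a function so that nothing is sent to Google until the function is actually run.

```r
# Sketch only: run_disability_query is a hypothetical helper, not a gtrendsR function.
# Wrapping the call in a function means no request is made until it is called.
run_disability_query <- function() {
  gtrendsR::gtrends(
    keyword = c("disability", "chronic", "illness", "spoonie", "zebra"),
    time = "2017-01-01 2022-12-31",  # "Y-m-d Y-m-d" start-finish format
    compared_breakdown = TRUE,       # written as "T" in the text above
    low_search_volume = FALSE        # written as "F" in the text above
    # geo is not specified, so the default (worldwide) is used
  )
}

# The query needs an internet connection, so only run it in an interactive session
if (interactive()) {
  goog_chron_1722 <- run_disability_query()
}
```

Naming every argument, as done here, avoids the ordering problem that comes with unnamed arguments.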
Step 2: Extract the Interest Over Time and Store in a New Object
The object returned is a “list” object with 7 dataframes embedded. I can extract these for analysis. To look over time, I will extract the “interest_over_time” without differentiating by country. Again, I want to store this information as an object in the environment. I will identify it using some of the information from the list object name followed by “_iot”.
Now I have a new dataframe object with 360 observations of 7 variables.
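Extraction from a list works with the “$” operator. The sketch below uses a small made-up list (toy_result, with invented values) standing in for the real query result, so it runs without an internet connection; only two of the seven elements are mocked up.

```r
# Toy stand-in for the list returned by gtrends(); all values are invented
toy_result <- list(
  interest_over_time = data.frame(
    date = as.POSIXct(c("2017-01-01", "2017-02-01"), tz = "UTC"),
    hits = c("45", "<1"),
    keyword = c("disability", "spoonie"),
    stringsAsFactors = FALSE
  ),
  interest_by_country = data.frame(
    location = "United States",
    hits = "49%",
    keyword = "disability",
    stringsAsFactors = FALSE
  )
)

# See which dataframes the list holds, then pull one out by name with `$`
names(toy_result)
toy_iot <- toy_result$interest_over_time

# str() shows the number of observations and the type of each variable
str(toy_iot)
```

With the real object, the equivalent line would be gchr1722_iot <- goog_chron_1722$interest_over_time.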
Step 3: Transform the Variables to Appropriate Format for Analysis/Plotting
The variables are not in an easily analyzable format, so I need to make them more interpretable. To do this, I want to change the “hits” variable to “numeric” from “chr” (character). This will ensure the hits are plotted as a continuous measure instead of a discrete one (we want hits to represent quantity, not category). I also want the “date” variable to be in “date” format instead of “POSIXct”.
There are many ways you can transform variables in R. I will use a tidy method and the “lubridate” package. The lubridate package is installed along with the tidyverse, but the startup message above shows that tidyverse 1.3.2 does not attach it, so I will call its functions explicitly with the “lubridate::” prefix. This also ensures R uses the lubridate function rather than a same-named function from another package. I want to preserve the original data in case the transformations do not produce the desired result, so I will use the “mutate” function (dplyr package) and the pipe operator (%>%) to add two new variables representing the numeric hits and the transformed date variables.
The character “hits” variable includes one value, “<1,” that will produce “NA” (missing) if it is converted to a number as-is. This means we need to recode that value within the old variable before we make the new variable. The “<1” value must be changed to something that can be interpreted as a number. To keep things simple, I will avoid a negative value and instead select an arbitrary decimal between 0 and 1 to represent all cases with fewer than 1 hit for a keyword. To change a single value within an existing variable without adding a new variable, we will use “mutate_at” instead of “mutate” before we add the new columns to the dataframe.
Here, instead of creating a new object, I want to store the information in my existing iot dataframe as two new columns. To accomplish this, I will specify the existing dataframe object on both the left and right of the “<-” assignment arrow.
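The transformation just described might look like the sketch below. It runs on a small invented dataframe (toy_iot) rather than the real data, and the replacement value 0.5 is an assumption standing in for the “arbitrary decimal between 0 and 1.”

```r
library(dplyr)

# Toy stand-in for the interest-over-time dataframe; values are invented
toy_iot <- data.frame(
  date = as.POSIXct(c("2017-01-01", "2017-02-01", "2017-03-01"), tz = "UTC"),
  hits = c("45", "<1", "12"),
  keyword = c("disability", "spoonie", "zebra"),
  stringsAsFactors = FALSE
)

toy_iot <- toy_iot %>%
  # Recode "<1" inside the existing hits column (0.5 is an assumed placeholder)
  mutate_at("hits", ~ ifelse(. == "<1", "0.5", .)) %>%
  # Add the numeric hits and the Date version of the date as new columns
  mutate(n_hits = as.numeric(hits),
         d_date = lubridate::as_date(date))

str(toy_iot)
```

The original “hits” and “date” columns stay in the dataframe, so the new columns can be checked against them.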
Now we can start to plot the information from the iot dataframe.
Step 4: Plotting Interest Over Time (iot)
A basic line plot
Starting with a basic line plot, I want to call the “ggplot” command and identify the important aesthetic attributes of the plot. I need to specify the variables constituting “X” and “Y”. I also need to be able to differentiate the lines by keyword, so I will set the “color” (“colour”) aesthetic to equal the “keyword” variable. This will tell ggplot2 to use the default palette to assign one color to each keyword. It will also produce a legend unless I specify to suppress it. I want the time to span the horizontal axis, so I set x to equal the new date variable “d_date”. I set y to equal the numeric hits “n_hits” variable.
If I do not add further commands, ggplot will not produce a visualization (it does not know what to do with the aesthetics…make a bar? line? pie? we need to tell it what to do). After we identify the base aesthetics for the plot, we need to add layers using the “+” symbol (instead of the %>% operator). I want the plot to include points, connected by lines, so I add two layers. First I add the points with “+” and “geom_point()”. I want this to be a layer that inherits the base aesthetics, so I do not want to include anything inside of the parentheses after “geom_point”. Next I want to connect the points, so I add a layer of lines using “+” followed by “geom_line()”, again leaving the parentheses empty to ensure the aesthetics are inherited.
I want to include one final layer specifying titles and labels for the axes using “+” followed by “labs”. With “labs,” I can specify the main title, subtitle, x axis label, and y axis label. Below, I specify that I want to title my plot “Disability Search Trends, Jan 2017 – Dec 2022,” with a subtitle identifying myself as the report’s author. The x axis represents the “date” variable, so I will label it as “Date.” The y axis represents the proportion of search hits (out of 100), so I label it “Proportion of Global Search Hits.”
I can also include information crediting the gtrendsR package author (Massicotte 2022). Here I do this by adding a caption with the citation. The caption default placement is right-justified just above the bottom outer margin of the plot.
It is good practice to include alt-text in any images. You can usually add this retroactively through a web platform when uploading to a post or in word processing programs. You can also ensure the alt-text is generated with the plot image by specifying “alt = paste(" ")” inside “labs”, which will “paste” the text specified in quotes as alt-text. This will help people using screen readers, who may not be able to see the image of the plot. Be sure to include the important information, and be specific! I try to think about what I would want to know about an image when thinking about alt-text. For this example, I give some basic info, stating that the image shows a line plot representing the proportion of Google search hits over time for keywords, then I provide the keywords and the timeframe examined. I finish the alt-text out with a quick note about how to interpret the proportion. You may want to go back and include additional information as alt-text after viewing the plot. For now, we haven’t looked at any results, so we can’t say much more without generating the plot.
gchr1722_iot %>%
  ggplot(aes(x = d_date,
             y = n_hits,
             color = keyword)) +
  geom_point() +
  geom_line() +
  labs(title = "Disability Search Trends, Jan 2017 - Dec 2022",
       subtitle = "Google Trends Report by Heather Sue M. Rosen (Generated 02/24/2022)",
       caption = "Search performed with the gtrendsR package (Massicotte 2022)",
       x = "Date",
       y = "Proportion of Global Search Hits",
       alt = paste("A line plot showing the proportion of Google search hits between January 1, 2017 and December 31, 2022, globally, for disability keywords: disability, chronic, illness, spoonie, and zebra. The proportion compares hits of each keyword versus the other four."))
Interpreting the plot
Before we mess around with fonts and colors for the plot, what can this plot tell us? Let’s take a look.
The first thing to notice is that the line/dots representing search hits for the term “spoonie,” shown by the blue line/dots at the bottom of the plot, account for an extremely low proportion of search hits in this set of keywords (less than 1% for the entire period). This does not mean people did not search for the term “spoonie,” but it does tell us that a vast majority of searches in this set of key terms do NOT refer to “spoonie.” Importantly, spoonie was one of two terms coined from within the disabled and chronically ill community, the other being “zebra.” The plot also shows that searches for “zebra” were much more common than searches for “spoonie,” which could be an indication that “zebra” is the more widely used of the two terms. It could also be an indication that people in the community were less familiar with the term “zebra” in 2017 than they were with the word “spoonie,” and so searched for it more often.
Next, we can see that there is a lot of month-to-month fluctuation in search hits for the other terms (chronic, illness, disability, and zebra). There appears to be a net increase in the proportion of search hits for “disability,” “chronic,” and “illness,” while search hits for “zebra” show a slight net decrease by the end of 2022. This gives merit to the latter explanation, that the difference between “spoonie” and “zebra” is about familiarity with, rather than preference for, one term or the other.
Disability accounts for the largest proportion of search hits for almost the entire period. It dips to be almost equal with the proportion of hits for “zebra” towards the end of 2018, though it does remain slightly higher than “zebra” at this point. This leads me to think there may have been some study released towards the end of 2018 about rare disease (“zebra” is usually used to refer to the rare disease community and/or the Ehlers-Danlos community), which would also explain the low proportion of hits for “spoonie” comparatively for the same period. It could also be mere coincidence, so this is usually something you would mention in a limitations section of a paper with a suggestion for future research to address the limitation. Here, if I were writing these results for publication, I might say something like:
“Future research on the chronic illness community should investigate the importance of the words ‘spoonie’ and ‘zebra,’ referring to historical record about rare disease research to see if there were notable findings released prior to this time period that may explain the low proportion of hits for ‘spoonie’ paired with a high proportion of hits for ‘zebra’ between 2017-2022.”
There is one really interesting and unexpected finding here that may tell us something about issues of chronic illness and disability related to the COVID-19 pandemic. Searches for “chronic” overtake searches for “disability” briefly in the beginning of 2022. This is about the same time it became apparent that people could get infected several times a year, even if they had been vaccinated against the disease. We also see brief peaks in search hits for both “disability” and “chronic” immediately following the brief heightened popularity of searches for “chronic” compared to “disability.” I would be interested to know if this is picking up on a growing concern about the long-term effects of COVID-19 on health beginning in year 3 of the pandemic. A task for another day.
Let’s look at some ways to play with formatting on the plot.
A fancier line plot
For the next plot, I start with the same identification of the base aesthetics used for the basic plot, then add a layer of “points” with “geom_point”.
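The “geom_line” command gives us a good idea of the month-to-month change, but what if I want each monthly jump to stand out more clearly? I can use “geom_step” for this. It draws the same monthly values as a staircase, holding each value flat until the next observation, so it emphasizes when the level changes. It cannot reveal fluctuation within a month, because the data contain one value per keyword per month.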
Next I add a new layer before the labels. Say I want to specify my own color scheme for the lines. I can do this with “scale_color_manual,” first identifying the values (the color names), then identifying the “breaks” associated with each color. In this case, the “breaks” are the keywords you want to link to each color. I change the colors to ones that will be easier to see against a dark background (which I will change in a later step). I select five colors, one for each keyword: “plum2,” “lightgoldenrod2,” “mediumspringgreen,” “cyan,” and “firebrick1.” If I want the lines and points for “disability” to be “plum2,” I list “disability” as the first keyword and “plum2” as the first color. The same goes for the other colors and keywords: you want the order to match.
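A possible alternative to matching two vectors by order, sketched here on invented toy data, is to give “scale_color_manual” a named vector. ggplot2 then matches each color to the keyword with the same name, so the order no longer matters. The object names keyword_colors, toy, and p are placeholders for this illustration.

```r
library(ggplot2)

# Each color is named with the keyword it belongs to
keyword_colors <- c(disability = "plum2",
                    chronic = "lightgoldenrod2",
                    illness = "mediumspringgreen",
                    spoonie = "cyan",
                    zebra = "firebrick1")

# Toy data with two of the five keywords; values are invented
toy <- data.frame(
  d_date = as.Date(c("2017-01-01", "2017-02-01", "2017-01-01", "2017-02-01")),
  n_hits = c(40, 42, 0.5, 0.5),
  keyword = c("disability", "disability", "spoonie", "spoonie")
)

p <- ggplot(toy, aes(x = d_date, y = n_hits, color = keyword)) +
  geom_point() +
  geom_step() +
  # Names in keyword_colors are matched to the values of the keyword variable
  scale_color_manual(values = keyword_colors)
```

Printing p draws the plot, with “disability” in plum2 and “spoonie” in cyan.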
Next I add my labels. I don’t need to change any of my code here from the previous plot. However, remember we didn’t have results when we wrote the alt-text. Now is a good time to add some info to the alt text reflecting the results depicted on the plot. Here I will add some comments about the terms accounting for the highest versus lowest proportions and whether/how they changed over time.
I will end with another layer changing the overall theme of the plot. Remember, I changed the colors of the lines to be more visible against a dark background–now I attach the “dark” theme as a new layer by specifying “+” followed by “theme_dark”. This will change the background of the plot area and the legend to be dark grey, keeping the margins white and the text black. The “theme_” functions also include options to alter the theme further by changing things like text size and font family.
Note, in base R graphics, “font” refers to whether the text should be italicized or bolded, while “family” refers to the shape of the letters. The “family” in R is what we refer to as “font” in word processing programs. So, say I want to change the letters to look like the “Times New Roman” ‘font’: in R, I would need to specify family = ____. If I want to bold or italicize something in a ggplot2 theme, I specify face = ____ inside “element_text( )”, which is the ggplot2 counterpart of the base R “font” setting. The other caveat to changing the font family in R deals with calling the family. If you do not attach special packages with additional fonts (ex: “extrafont” and “extrafontdb”), you request a Times-style typeface by identifying the family as “serif”; on most systems this maps to Times New Roman or a close equivalent. Here I change the family in the theme to “serif” because many publications will require submissions be formatted using Times New Roman. If your goal is accessibility/readability and not academic publishing, a “sans” family (the default, shown in our basic plot above) is a better option.
You may also want to consider a larger base size for the text than I specify here. The default is 11pt font, but I want my text to be about 12 pt font with the main title slightly larger and the caption slightly smaller than the rest of the text, so I specify “base_size” to equal 14. The last thing I do is set the line thickness using “base_line_size” and “base_rect_size.” I set both to 14/20.
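To make the family versus bold/italic distinction concrete, here is a small sketch on invented toy data. In ggplot2, the typeface is set with “base_family” (or “family” inside “element_text”), and bold or italic is set with the “face” argument of “element_text”. The bold title and italic caption are illustrative choices, not settings used in the plots in this post.

```r
library(ggplot2)

# Toy data; values are invented
toy <- data.frame(
  d_date = as.Date(c("2017-01-01", "2017-02-01", "2017-03-01")),
  n_hits = c(40, 42, 45)
)

p <- ggplot(toy, aes(x = d_date, y = n_hits)) +
  geom_line() +
  labs(title = "Toy title", caption = "Toy caption") +
  # family controls the typeface for all text in the theme
  theme_dark(base_size = 14, base_family = "serif") +
  # face controls bold/italic for individual text elements
  theme(plot.title = element_text(face = "bold"),
        plot.caption = element_text(face = "italic"))
```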
Now I am ready to execute the command to generate the plot.
gchr1722_iot %>%
  ggplot(aes(x = d_date,
             y = n_hits,
             color = keyword)) +
  geom_point() +
  geom_step() +
  scale_color_manual(values = c("plum2",
                                "lightgoldenrod2",
                                "mediumspringgreen",
                                "cyan",
                                "firebrick1"),
                     breaks = c("disability",
                                "chronic",
                                "illness",
                                "spoonie",
                                "zebra")) +
  labs(title = "Disability Search Trends, Jan 2017 - Dec 2022",
       subtitle = "Google Trends Report by Heather Sue M. Rosen (Generated 02/24/2022)",
       caption = "Search performed with the gtrendsR package (Massicotte 2022)",
       x = "Date",
       y = "Proportion of Global Search Hits",
       alt = paste("A line plot showing the proportion of Google search hits between January 1, 2017 and December 31, 2022, globally, for disability keywords: disability, chronic, illness, spoonie, and zebra. The proportion compares hits of each keyword versus the other four. The highest proportion of hits for most of the period are for disability, while the proportion of hits for spoonie remained below 1% for the entire period. Searches for Zebra and Chronic account for a similar proportion of the hits (compared to each other) for the entire period, with each falling below disability and above illness and spoonie on the plot.")) +
  theme_dark(
    base_size = 14,
    base_family = "serif",
    base_line_size = 14/20,
    base_rect_size = 14/20
  )
Now I am ready to move on to analyze and plot some of the other objects contained in the original list of 7. In addition to interest over time, we also have a dataframe for “interest by country” (ibc), “interest by region” (ibr), “interest by DMA” (dma), “interest by city” (ibcity), “related topics” (rt), and “related queries” (rq). I will only touch on the ibc and rq in this example.
Step 5: Extract the Interest by Country (ibc)
I need to extract the interest by country dataframe using the same process I used to extract the interest over time for the previous plots.
Now I have a new dataframe object containing 1250 observations of 5 variables.
The output for ibc is not as neat as the iot data. The “hits” variable is again stored as “chr” (character) instead of “num” (numeric), but unlike the iot hits, the ibc hits include the percent symbol (%) instead of simple digits (i.e., 49% instead of 49). This will make the transformation a little more complicated, but it is still possible. Let’s try it!
Step 6: Transform the Hits Variable to the Correct Format
Instead of having a “<1” designation for fewer than 1% of the total hits like the iot hits variable, the ibc hits variable leaves these instances blank (“ ”). I want to be sure to recode the blanks to something meaningful, so I set them to zero. I then use the “filter” function (dplyr) on my new variable to keep only the cases where hits are greater than 0, which drops the recoded blanks.
I will add a “hits2” and a “hits3” variable this time, instead of transforming within the original “hits” variable. This is because I will be transforming the hits variable to a factor instead of a number.
I’ll also give an example of how to re-level a factor variable. We don’t have a date variable in the ibc, so we will not create a line plot showing change over time.
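One way these steps could be written is sketched below on an invented dataframe (toy_ibc). The roles given to the two new variables are an assumption: here “hits2” is a numeric copy used for the recode-and-filter step, and “hits3” is the factor version of the original text values.

```r
library(dplyr)

# Toy stand-in for the interest-by-country dataframe; values are invented
toy_ibc <- data.frame(
  location = c("United States", "Canada", "India", "Kenya"),
  hits = c("49%", "", "7%", "10%"),
  keyword = "disability",
  stringsAsFactors = FALSE
)

toy_ibc <- toy_ibc %>%
  # hits2 (assumed role): strip the % sign and convert; blanks become NA
  mutate(hits2 = as.numeric(sub("%", "", hits)),
         # recode the blanks (now NA) to zero
         hits2 = coalesce(hits2, 0)) %>%
  # keep only the cases with hits greater than 0
  filter(hits2 > 0) %>%
  # hits3 (assumed role): the original text values stored as a factor
  mutate(hits3 = factor(hits))

# Levels are sorted as text, not as numbers
levels(toy_ibc$hits3)
```

In this toy example the levels come back as "10%", "49%", "7%", which is the text-sorting behavior addressed in the next steps.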
Once I have the new factor variable available, I need to look at the levels, how many and what they include. I use the levels function to find the levels first, then I use it to change the levels. I can look at the information in the dropdown for the object in the environment to find the number of levels–in this case, we have a factor with 46 levels. This tells us that we need to look at the levels and order them. We know we do not have 100 levels, so we need to see which numbers from 1-100 to include.
[1] "10%" "11%" "12%" "13%" "14%" "15%" "16%" "17%" "18%" "19%" "20%" "21%"
[13] "22%" "23%" "24%" "25%" "26%" "28%" "29%" "3%" "30%" "32%" "33%" "35%"
[25] "36%" "37%" "39%" "4%" "40%" "43%" "44%" "49%" "5%" "53%" "62%" "66%"
[37] "68%" "7%" "72%" "74%" "76%" "8%" "81%" "9%"
Notice that the percentages are not in order from lowest to highest. R sorts factor levels as text, so “10%” comes before “3%,” and we need to order them numerically before we plot. Here we are transforming from a nominal to an ordinal factor variable. I am going to write out the percentages in order from lowest to highest and pass them to the “levels” argument of the “factor” function, using the “<-” to store the result back in the hits3 variable. Assigning a new vector with “levels(x) <-” would only rename the existing levels position by position (the first level, “10%,” would be relabeled “3%”) without reordering anything, so I rebuild the factor with “factor( )” instead.
gchr1722_ibc$hits3 <- factor(gchr1722_ibc$hits3,
                             levels = c("3%", "4%", "5%", "7%", "8%", "9%",
                                        "10%", "11%", "12%", "13%", "14%", "15%",
                                        "16%", "17%", "18%", "19%", "20%", "21%",
                                        "22%", "23%", "24%", "25%", "26%", "28%",
                                        "29%", "30%", "32%", "33%", "35%", "36%",
                                        "37%", "39%", "40%", "41%", "43%", "44%",
                                        "49%", "53%", "62%", "66%", "68%", "72%",
                                        "73%", "74%", "76%", "81%"))
Now my hits3 levels should be ordered from 3-81. I should check, so I use the same levels command I used the first time, checking the changed variable.
[1] "3%" "4%" "5%" "7%" "8%" "9%" "10%" "11%" "12%" "13%" "14%" "15%"
[13] "16%" "17%" "18%" "19%" "20%" "21%" "22%" "23%" "24%" "25%" "26%" "28%"
[25] "29%" "30%" "32%" "33%" "35%" "36%" "37%" "39%" "40%" "41%" "43%" "44%"
[37] "49%" "53%" "62%" "66%" "68%" "72%" "73%" "74%" "76%" "81%"
It worked, so I can move on to plotting.
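Typing the levels by hand works, but it is easy to miss a value. As a hedged alternative, the ordering can be computed from the levels themselves. The sketch below uses a short invented vector (hits) rather than the real data.

```r
# Toy percent strings; values are invented
hits <- c("10%", "3%", "49%", "7%")
f <- factor(hits)

# Default levels are sorted as text: "10%" "3%" "49%" "7%"
levels(f)

# Strip the % sign, convert to numbers, and use them to order the level labels
ordered_levels <- levels(f)[order(as.numeric(sub("%", "", levels(f))))]

# Rebuild the factor with the reordered levels; the data values are unchanged
f2 <- factor(f, levels = ordered_levels)
levels(f2)
```

Because the level order is derived from the data, the same lines keep working if the query is rerun and different percentages come back.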
Step 7: Plot the Interest by Country
A Basic Bar Chart
I want to create a bar chart this time, setting “hits3” as the “Y” and “location” as the “X”. I will still set the color to change by keyword, but I need to use the “fill” command because I want it to “fill” the bars with color, not specify the outline color of the bars. I add a layer after the base layer with the “+” followed by “geom_col( )” . This time, before I add the labels and titles with “labs,” I want to separate the barchart out by keyword to display five separate barcharts instead of one barchart with several bars per country. I do this by adding “+” and a “facet_wrap” layer. In this layer, I specify the variable to base the faceting on (keyword), then I specify the number of rows and columns to have 3 rows and 2 columns using “nrow” and “ncol”. I also use the “shrink” logical set to “T” (True) to help with sizing. I expect that there will need to be further adjustments to the size parameters, so let’s see what the initial plot looks like:
gchr1722_ibc %>%
  ggplot(aes(x = location,
             y = hits3,
             fill = keyword)) +
  geom_col() +
  facet_wrap(vars(keyword),
             nrow = 3,
             ncol = 2,
             shrink = T) +
  labs(title = "Disability Search Trends, Jan 2017 - Dec 2022",
       subtitle = "Google Trends Report by Heather Sue M. Rosen (Generated 02/24/2022)",
       caption = "Search performed with the gtrendsR package (Massicotte 2022)",
       x = "Country",
       y = "Proportion of Google Search Hits",
       alt = paste("A barchart showing proportion of Google search hits for keywords relating to chronic illness and disability (disability, chronic, illness, spoonie, zebra), by country and separated by keyword, for the entire period between January 1, 2017 and December 31, 2022"))
Interpreting the Bar Chart
Not too bad! A few important things to notice immediately. First, we only have 4 facets. Remember, we had 5 keywords. Where is the last facet? Also remember we got rid of cases in the interest by country that accounted for 0% of the search hits in the data. The absence of the “spoonie” facet, paired with our prior knowledge that “spoonie” searches accounted for less than 1% of search hits for the entire time frame in our exploration of the interest over time, suggests that “spoonie” was part of the excluded cases. Now, only having 4 facets means this sizing is not horrible, but it isn’t interpretable enough yet. We need to effectively “zoom out” to give the axes more space so we can see their tick labels.
A Fancier Bar Chart
In the updated chart, I make changes to the previous plot’s code starting with the “facet_wrap” parameters. I only include the variable specification for keyword and the logical setting shrink to true. Only having four facets should alleviate the need for nrow and ncol. I will leave the labs alone, then I add the dark theme. This time I ONLY include base size and family inside the parentheses after “theme_dark( )”, setting base size smaller, to 10 instead of 14. Next, I add another layer with “+” followed by “theme( )” and include the remaining information about axis text size here using the “axis.text.x” and “axis.text.y” arguments, each set with “element_text( )”. In the axis text settings, I reduce the x axis text to 5 pt size and rotate the tick labels 45 degrees to help with the overlap, and I set the y axis text to an even smaller 3 pt size. I also set the caption text to 8 pt with “plot.caption”.
gchr1722_ibc %>%
  ggplot(aes(x = location,
             y = hits3,
             fill = keyword)) +
  geom_col() +
  facet_wrap(vars(keyword),
             shrink = T) +
  labs(title = "Disability Search Trends, Jan 2017 - Dec 2022",
       subtitle = "Google Trends Report by Heather Sue M. Rosen (Generated 02/24/2022)",
       caption = "Search performed with the gtrendsR package (Massicotte 2022)",
       x = "Country",
       y = "Proportion of Google Search Hits",
       alt = paste("A barchart showing proportion of Google search hits for keywords relating to chronic illness and disability (disability, chronic, illness, spoonie, zebra), by country and separated by keyword, for the entire period between January 1, 2017 and December 31, 2022")) +
  theme_dark(base_size = 10,
             base_family = "serif") +
  theme(axis.text.x = element_text(angle = 45,
                                   size = 5),
        axis.text.y = element_text(size = 3),
        plot.caption = element_text(size = 8))
Now we have a legible bar chart. We can definitely adjust the parameters further here, but for now, we will end with the example of how to adjust them and move to the last step of this analysis and look at the “related queries” returned with our data.
Step 8: Extract the Related Queries Object
Using the same process we used to extract the iot and ibc, we will extract the related queries (rq) from the original list object, storing the rq as a dataframe object.
Now we have a third dataframe consisting of 241 observations and 5 variables.
Step 9: Plot the Related Queries (RQ)
We can get a count plot using ggplot2 showing overlap between keywords and related queries using geom_count. The terms in the related queries are stored as the “value” variable.
gchr1722_rq %>%
  ggplot(aes(x = keyword, y = value, color = keyword)) +
  geom_count() +
  labs(title = "Disability Search Trends, Jan 2017 - Dec 2022",
       subtitle = "Google Trends Report by Heather Sue M. Rosen (Generated 02/24/2022)",
       caption = "Search performed with the gtrendsR package (Massicotte 2022)",
       x = "Keyword",
       y = "Related Query Text",
       alt = paste("A count plot showing queries executed between January 1, 2017 and December 31, 2022 that were related to the keywords about chronic illness and disability (disability, chronic, illness, spoonie, zebra)"))
We can adjust the sizing on this plot to minimize word overlap, just as we did previously. This version also swaps the axes so the related query text runs along the x axis.
gchr1722_rq %>%
  ggplot(aes(x = value, y = keyword, color = keyword)) +
  geom_count() +
  labs(title = "Disability Search Trends, Jan 2017 - Dec 2022",
       subtitle = "Google Trends Report by Heather Sue M. Rosen (Generated 02/24/2022)",
       caption = "Search performed with the gtrendsR package (Massicotte 2022)",
       x = "Related Query Text",
       y = "Keyword",
       alt = paste("A count plot showing queries executed between January 1, 2017 and December 31, 2022 that were related to the keywords about chronic illness and disability (disability, chronic, illness, spoonie, zebra)")) +
  theme_dark(base_size = 6,
             base_family = "serif") +
  theme(axis.text.x = element_text(angle = 90,
                                   size = 3),
        axis.text.y = element_text(size = 5),
        plot.caption = element_text(size = 4),
        plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm"),
        # linewidth replaces the size argument, deprecated in element_rect() as of ggplot2 3.4.0
        plot.background = element_rect(linewidth = 1))
Let’s try one more technique to make the plot easier to read.
Step 10: Split the Dataframe into Five Separate DF’s, One per Keyword
We can also split up the dataframe and plot separately for each key word using the “filter” command (dplyr). If we want to preserve space, we can limit the new object to the two variables used here, “value” and “keyword” with the “select” command (dplyr). With select, the resulting object variables are ordered to match the input (first entry = column 1, etc.).
disabilityDF <- gchr1722_rq %>%
  select(keyword, value) %>%
  filter(keyword == "disability")
chronicDF <- gchr1722_rq %>%
  select(keyword, value) %>%
  filter(keyword == "chronic")
illnessDF <- gchr1722_rq %>%
  select(keyword, value) %>%
  filter(keyword == "illness")
spoonieDF <- gchr1722_rq %>%
  select(keyword, value) %>%
  filter(keyword == "spoonie")
zebraDF <- gchr1722_rq %>%
  select(keyword, value) %>%
  filter(keyword == "zebra")
Now I have 5 dataframes with two variables each.
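If repeating the same filter five times feels tedious, base R’s “split” function offers a compact alternative. The sketch below is an illustration on an invented dataframe (toy_rq) with only two keywords; it returns one list holding a dataframe per keyword instead of five separate objects.

```r
# Toy stand-in for the related queries dataframe (only two keywords shown)
toy_rq <- data.frame(
  keyword = c("disability", "disability", "chronic"),
  value = c("social security", "va disability", "chronic pain"),
  stringsAsFactors = FALSE
)

# One dataframe per keyword, stored together in a named list
rq_by_keyword <- split(toy_rq[, c("keyword", "value")], toy_rq$keyword)

names(rq_by_keyword)
rq_by_keyword$disability
```

Each element can then be used like the separate dataframes above, for example rq_by_keyword$disability in place of disabilityDF.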
Step 11: Visualize the Related Queries for each Keyword (5 total)
Let’s list them one at a time to finish this example. First I will demonstrate what this would look like if we wanted to do a plot like the count ones above. We don’t need to change the colors for this plot since it only shows one keyword. I’ll use “disability” as the example dataframe for the plot.
disabilityDF %>%
  ggplot(aes(x = value, y = keyword)) +
  geom_count() +
  labs(title = "Queries Related to Disability, Jan 2017 - Dec 2022",
       subtitle = "Google Trends Report by Heather Sue M. Rosen (Generated 02/24/2022)",
       caption = "Search performed with the gtrendsR package (Massicotte 2022)",
       x = "Value",
       y = "Keyword",
       alt = paste("A count plot showing queries executed between January 1, 2017 and December 31, 2022 that were related to the term 'disability'")) +
  theme_dark(base_size = 6,
             base_family = "serif") +
  theme(axis.text.x = element_text(angle = 90,
                                   size = 3),
        axis.text.y = element_text(size = 5),
        plot.caption = element_text(size = 4),
        plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm"),
        # linewidth replaces the size argument, deprecated in element_rect() as of ggplot2 3.4.0
        plot.background = element_rect(linewidth = 1))
Notice this is much easier to read than our plot containing information for all five keywords. Is this the best way to visualize these data now that they are separated by keyword, though?
The type of visualization should be meaningful, and in this case, a list or a table is most appropriate, even if other visualizations like the count plot are technically possible. The easiest way to accomplish this is by printing the raw vector containing the related queries for each keyword.
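Printing the vector could be done as in the sketch below, which uses a three-row invented stand-in (toy_disabilityDF) for the real disabilityDF. “pull” (dplyr) extracts a single column as a plain vector, and printing a character vector produces a numbered listing. The listings shown in this post come from the full data, not from this toy example.

```r
library(dplyr)

# Toy stand-in for disabilityDF; the real dataframe has many more rows
toy_disabilityDF <- data.frame(
  keyword = "disability",
  value = c("social security", "va disability", "disability benefits"),
  stringsAsFactors = FALSE
)

# Extract the related query text as a character vector and print it
disability_queries <- toy_disabilityDF %>%
  pull(value)

print(disability_queries)
```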
[1] "social security"
[2] "social security disability"
[3] "va disability"
[4] "what is disability"
[5] "disability benefits"
[6] "disability insurance"
[7] "short term disability"
[8] "disability services"
[9] "learning disability"
[10] "state disability"
[11] "ssi disability"
[12] "ssi"
[13] "intellectual"
[14] "what is a disability"
[15] "intellectual disability"
[16] "apply for disability"
[17] "long term disability"
[18] "disability act"
[19] "disabilities"
[20] "disability allowance"
[21] "how to get disability"
[22] "disability meaning"
[23] "mental disability"
[24] "disability pension"
[25] "disability california"
[26] "2022 va disability rates"
[27] "va disability rates 2023"
[28] "unique disability id"
[29] "va disability pay chart 2022"
[30] "ssi disability stimulus check"
[31] "va disability rates 2021"
[32] "2023 va disability pay chart"
[33] "va disability rates 2019"
[34] "va disability rates 2018"
[35] "va disability rates 2020"
[36] "hidden disability lanyard"
[37] "disability pride month"
[38] "va disability pay chart 2019"
[39] "kadeena cox disability"
[40] "disability certificate download"
[41] "disability lawyers near me"
[42] "disability meaning in marathi"
[43] "disability lawyer near me"
[44] "disability attorney near me"
[45] "social security disability benefits pay chart"
[46] "disability meaning in hindi"
[47] "disability office near me"
[48] "intellectual disability meaning"
[49] "locomotor disability meaning in hindi"
[50] "disability confident"
Now I have a neat list of the queries related to “disability”. I can do the same for the other four keywords. In some instances, the output will print in two or three columns.
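The number of columns depends on how many entries fit within the console width, which R reads from `options(width = ...)`. The sketch below shows that effect on a made-up four-item vector, `illness_demo`, copied from the start of the “illness” output. The width of 20 is an arbitrary value chosen for illustration.

```r
# Hypothetical stand-in: the first four related queries for "illness".
illness_demo <- c("mental", "mental illness",
                  "what is illness", "critical illness")

# At the current console width, several entries may share a line.
print(illness_demo)

# A narrower width leaves room for fewer entries per line.
old_opts <- options(width = 20)
print(illness_demo)
options(old_opts)  # restore the previous setting
```

Short queries such as those for “illness” and “zebra” fit several to a line, while longer ones such as the “disability” queries print one per line.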
[1] "chronic disease"
[2] "chronic pain"
[3] "the chronic"
[4] "what is chronic"
[5] "chronic fatigue"
[6] "fatigue"
[7] "acute"
[8] "chronic kidney disease"
[9] "kidney disease"
[10] "chronic fatigue syndrome"
[11] "chronic meaning"
[12] "chronic illness"
[13] "chronic cough"
[14] "bronchitis"
[15] "chronic bronchitis"
[16] "chronic infection"
[17] "chronic back pain"
[18] "acute and chronic"
[19] "chronic definition"
[20] "chronic leukemia"
[21] "leukemia"
[22] "chronic diseases"
[23] "chronic stress"
[24] "chronic sinusitis"
[25] "chronic inflammation"
[26] "sci hub"
[27] "scihub"
[28] "chronic respiratory failure icd 10"
[29] "chronic ethanol abuse"
[30] "chronic pancreatitis icd 10"
[31] "chronic constipation icd 10"
[32] "chronic meaning in tamil"
[33] "chronic liver disease icd 10"
[34] "chronic spontaneous urticaria"
[35] "chronic back pain icd 10"
[36] "icd 10 chronic pain"
[37] "chronic diarrhea icd 10"
[38] "chronic meaning in hindi"
[39] "chronic kidney disease icd 10"
[40] "chronic disease meaning"
[41] "chronic sinusitis icd 10"
[42] "cad icd 10"
[43] "chronic lymphocytic leukemia icd 10"
[44] "chronic pain syndrome icd 10"
[45] "icd 10 code for chronic pain"
[46] "chronic bronchitis icd 10"
[47] "icd 10 code for chronic back pain"
[48] "icd 10 code for chronic kidney disease"
[49] "chronic illness meaning"
[50] "chronic renal failure icd 10"
[1] "mental" "mental illness"
[3] "what is illness" "critical illness"
[5] "mental health" "chronic illness"
[7] "illness meaning" "disease"
[9] "illness insurance" "what is mental illness"
[11] "illness definition" "foodborne illness"
[13] "critical illness insurance" "mental disorder"
[15] "depression" "anxiety"
[17] "terminal illness" "me illness"
[19] "common illness" "depression mental illness"
[21] "what is a mental illness" "illnesses"
[23] "ill" "illness benefit"
[25] "physical illness" "schools closing due to illness"
[27] "bruce willis" "bruce willis illness"
[29] "vaping illness" "chris kamara illness"
[31] "vaping illness symptoms" "gary rhodes illness"
[33] "enhanced illness benefit" "elijah cummings illness"
[35] "paul sinha illness" "mental illness adalah"
[37] "kris aquino illness" "canada neurological illness"
[39] "jolene blalock illness" "janice long illness"
[41] "apa itu mental illness" "jordan peterson illness"
[43] "south sudan mysterious illness" "pat sajak illness"
[45] "breast implant illness" "rush limbaugh illness"
[47] "selma blair illness" "peter kay illness"
[49] "mental illness artinya" "anthony rumble johnson illness"
[1] "what is spoonie" "spoonie meaning"
[3] "what is a spoonie" "spoonie gee"
[5] "spoonie society" "spoonie illness"
[7] "spoonie theory" "spoon"
[9] "spoonie chronic illness" "what does spoonie mean"
[11] "chronic illness" "spoonie love"
[13] "spoonie definition" "spoonie luv"
[15] "the spoonie society" "spoonie life"
[17] "spoonie duck" "spoony"
[19] "spooning" "spoon theory"
[21] "spoonies" "whats a spoonie"
[23] "spoonie define" "pots"
[25] "spoons" "spoonie society"
[27] "the spoonie society" "spoonie essentials box"
[29] "coffee spoonie" "spoonie witch"
[31] "ehlers danlos syndrome" "chiari malformation"
[33] "pots disease" "spoonie essential box"
[35] "spoonie meaning" "spoonie life"
[37] "spoonie illness" "whats a spoonie"
[39] "spoons" "spoonie chronic illness"
[41] "chronic illness"
[1] "the zebra" "yeezy"
[3] "zebra yeezy" "zebra printer"
[5] "zebra print" "pink zebra"
[7] "zebra crossing" "zebra animal"
[9] "zebra fish" "zebra perde"
[11] "zebra yeezy 350" "yeezy 350"
[13] "horse" "giraffe"
[15] "lion" "zebra giraffe"
[17] "green zebra" "red zebra"
[19] "zebra horse" "baby zebra"
[21] "zebra mussels" "zebra finch"
[23] "blue zebra" "la zebra"
[25] "tiger" "yeezy 350"
[27] "yeezy" "zebra yeezy"
[29] "yeezy 350 v2 zebra" "yeezy 350 v2"
[31] "zebra yeezy 350" "yeezy v2 zebra"
[33] "yeezy boost 350 v2" "yeezy boost 350 v2 zebra"
[35] "yeezy boost zebra" "yeezy boost 350"
[37] "adidas yeezy" "yeezy boost 350 zebra"
[39] "adidas yeezy zebra" "yeezys"
[41] "yeezys zebra" "zebra zd220"
[43] "adidas yeezy boost 350 v2" "adidas yeezy boost 350 v2 zebra"
[45] "zebra zd420" "zebra tc21"
[47] "zebra tc56" "zebra ds2278"
[49] "zebra bets" "zebra orange cameroun"
A quick note about the related queries for “zebra” and our other findings for search hits by keyword: most of the related queries for “zebra” concern things other than disability and chronic illness, and several contain the word “yeezy”. The peak in search hits for “zebra” observed in the time plots may have been related to the release of a new edition of the “yeezy” collection by Adidas.
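To put a rough number on that observation, the related queries can be checked for the word “yeezy”. This is a minimal sketch, assuming a stand-in vector `zebra_demo` that holds only the first ten related queries from the output above. In practice the `value` column of `zebraDF` would take its place.

```r
# Hypothetical stand-in: the first ten related queries listed for "zebra".
zebra_demo <- c("the zebra", "yeezy", "zebra yeezy", "zebra printer",
                "zebra print", "pink zebra", "zebra crossing",
                "zebra animal", "zebra fish", "zebra perde")

# TRUE for every related query that contains the literal text "yeezy".
has_yeezy <- grepl("yeezy", zebra_demo, fixed = TRUE)

sum(has_yeezy)         # how many related queries mention "yeezy"
mean(has_yeezy)        # the share of related queries that mention it
zebra_demo[has_yeezy]  # the matching queries themselves
```

A count like this only describes the related queries. It cannot confirm what caused the peak in search hits, so the link to the sneaker release remains a plausible explanation rather than a finding.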