Tracking our happiness throughout 2020, using Twitter
2020 was quite the year. In this post, I’ll be using data from U.S. Twitter Trends to look closer at its ups and downs.
Below are some of the key explorations of this analysis:
- What can Twitter tell us about our mental health in 2020?
- Were there months or days that were particularly positive or negative?
- Did certain events have a significant effect on Twitter sentiment?
- How was your happiness during 2020?
To start off, we can take a look at how popular the search terms “happiness” and “health” have been on Google throughout the year. As expected, in late March, aka peak quarantine, we do see a huge rise for health. Perhaps people wanted to take charge of their health with their newfound time, whether through working out or meditating, or perhaps the rise signals a decline in health, with people turning to Google for advice. With happiness, there is a significant decline in search hits around June and then some upticks in September and December. While this chart simply shows how popular the searches are, we can turn to Twitter trends to get a better sense of what was going on around these times.
Twitter Trend Sentiment Analysis
I scraped daily Twitter trends for 2020 and used an R package, Syuzhet, to calculate a sentiment value for each trend and then a total sentiment score for each day. I used the AFINN sentiment lexicon, which scores individual words on a scale of -5 to +5. According to the Syuzhet documentation, summing sentiment values can be “a measure of the overall emotional valence of the text.” Therefore, my daily sentiment score was calculated as the sum of the sentiment values for trends on that day.
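To make the scoring concrete, here is a minimal Python sketch of the same arithmetic. It is not the R/Syuzhet pipeline used for this post: the lexicon below is a toy with made-up values in the -5 to +5 range (not real AFINN entries), and the function names and sample trends are only for illustration.

```python
# Toy lexicon with invented scores in the -5..+5 range.
# These are NOT real AFINN values; they only illustrate the arithmetic.
TOY_LEXICON = {"happy": 3, "crisis": -3, "panic": -3}


def trend_score(trend, lexicon=TOY_LEXICON):
    """Sum the lexicon values of the words in one trend; unknown words count as 0."""
    return sum(lexicon.get(word, 0) for word in trend.lower().split())


def daily_scores(trends_by_day, lexicon=TOY_LEXICON):
    """Daily sentiment score = sum of the scores of every trend seen that day."""
    return {
        day: sum(trend_score(trend, lexicon) for trend in trends)
        for day, trends in trends_by_day.items()
    }


# Hypothetical sample input: a few trends per day.
sample = {
    "2020-03-15": ["Corona crisis", "odd panic buy items", "football"],
    "2020-12-25": ["Happy holidays", "Christmas"],
}

print(daily_scores(sample))
```

Words that are missing from the lexicon, such as names or sports terms, contribute 0, so a day's score is driven entirely by the trends that contain scored words.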
As you can see, there is quite a bit of day-to-day variation. I’ve highlighted some key events of 2020 that may have affected Twitter trends significantly, and thus the sentiment of that day and subsequent days. Immediately, we see a decline in sentiment score after Kobe Bryant’s death, around when shutdowns begin, and around the George Floyd protests, but for other events a change in sentiment is harder to discern visually. There also does appear to be a general pattern of Twitter sentiment decreasing around March, increasing until mid-April, dropping around June, and then increasing slightly for the rest of the year.
Presented above, we see the limitations of using Twitter trends as a proxy for how people feel. We see that “trump” is the single most common word in Twitter trends by far, signaling how important Twitter may be in the political realm, but otherwise, the most frequent trends tend to be neutral. Hashtags and trends are meant to be to the point, and as a result, people may not necessarily turn sentences that are more emotional or personal into a hashtag. Additionally, Twitter is a very popular platform for keeping up with sports. As seen above, football is the second most mentioned word in Twitter trends, and basketball and wrestling are also seen in the word cloud. The most frequent trends on Twitter do not necessarily convey much emotion. However, this does not rule out the usefulness of Twitter trends. The Syuzhet package treats categories such as sports, names, or locations as having a sentiment of 0. As a result, trends that may not be as frequent but do carry stronger sentiment play a larger role in this analysis.
Month by Month
To get more detail, I wanted to look at the Twitter sentiment on a monthly basis. Were there months that were far happier than others? Sadder?
Right off the bat, we see that 2020 started and ended on a good note: January, February, November, and December all had among the highest sentiment scores. On the other hand, March and May were considerably less enjoyable. March brought us the onset of the pandemic, which understandably is an extremely negative topic. In fact, some Twitter trends with the most negative sentiment values during March were “Trump lies, Americans die,” “Corona crisis,” and “odd panic buy items.”
However, May was even more negative than March, with a sentiment score close to 0. This means that the negative sentiment values of its trends nearly cancelled out the positive ones. By this measure, it was the most negative month of 2020, surpassing the onset of the pandemic by far. I thought it would be interesting to look further into May and see what may have caused this negativity, and juxtapose it with December, the happiest month, according to the above.
With May, we see that sentiment actually hovers around similar values as other months until a huge drop-off right around the last week of May. Unsurprisingly, this was also around the time when George Floyd was killed and Black Lives Matter protests began. While there is little mention of these trends in the May word cloud, some of the most negative trends were “What is manslaughter,” “Breanna Taylor killed by police,” “Joe Biden is a racist,” “protests 2020,” “philadelphia riots,” and “kawasaki disease.” These trends were not frequent enough to appear in the word cloud, but they were strongly negative enough to bring the sentiment score of May down to nearly zero, outweighing any positivity that may have come from Mother’s Day or Memorial Day. It is also important to note that the score may be downplaying how sad, angry, and frustrated those on Twitter may have felt. According to the AFINN sentiment lexicon, a name such as “George Floyd” would have registered as having a sentiment value of 0, and phrases such as “Justice for _______” actually received a positive sentiment value of 1, despite that not being the true emotion associated with the Twitter trend.
On the other hand, December had positive sentiment scores almost every day, building up to an extremely joyous Christmas and New Year’s Eve. Interestingly, we do see a drop-off on December 26th, perhaps from lingering sadness that Christmas is over, but the happiness does pick up in time for the New Year. Some trends with particularly high sentiment scores in December were “Happy birthday Taylor swift,” “The voice winner 2020,” “Happy Hanukkah,” “Happy holidays,” “Christmas,” and of course, “happy new year.” There were still trends with negative scores: “How bad is your Spotify,” “Trump treason,” and “nashville bomb suspect.” The first one was actually a fun A.I. project that looked at your Spotify listening history and judged your music taste with a score. It really was not negative, despite its assigned -4 sentiment value. Again, this highlights the limitations of using a predetermined sentiment lexicon that isn’t updated with recent terminology, events, or innovations.
Day by Day
Next, I wanted to get as granular as possible and look at Twitter sentiment on a day-by-day level. A few questions I had going in were: which days were the happiest of 2020? The saddest of 2020? Were there notable events after which sentiment in the following days was especially positive or negative?
February 27th was apparently the happiest day of the year, topping even New Year’s Eve. The cause of such happiness? A drop from popular lifestyle brand Supreme, a WWE showdown, and the release of the film, My Hero Academia: Heroes Rising. These trended multiple times on Twitter that day and while these may not be super noteworthy events, what made February 27th the happiest day wasn’t actually an excess of positivity, but rather a lack of negativity. There were only two trends with negative sentiment scores that day.
On the other hand, the saddest day of the year occurred on May 30th, a day when intense protests and riots erupted in over 30 cities across the U.S., calling for justice for the death of George Floyd. A significant number of the trends, already mentioned above, occurred on this day, in addition to “George Floyd protests,” “Portland protest,” “riots in Atlanta,” and “Bloody Sunday.”
I chose to home in on this event, opting to observe the sentiment in the two weeks before and after the death of George Floyd. As seen to the left, the average daily sentiment in the two weeks after Floyd’s death was -35, compared to an average of -8 in the two weeks prior. An interesting extension of this analysis would be to separate the daily sentiment values into two groups, those before and those after George Floyd’s death, and conduct a t-test between the groups to see whether or not their means of -8 and -35 were significantly different from one another.
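As a rough idea of what that extension could look like, here is a small Python sketch of Welch's two-sample t-test, a variant that does not assume the two groups have equal variance. This is a swapped-in illustration, not something I ran for this post. The daily values below are invented placeholders, not my data, chosen only so that their means echo the -8 and -35 above, and the function name `welch_t` is my own.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    # Squared standard error of each group mean (sample variance / n).
    se_a = variance(group_a) / len(group_a)
    se_b = variance(group_b) / len(group_b)
    t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(group_a) - 1) + se_b ** 2 / (len(group_b) - 1)
    )
    return t_stat, df


# Invented placeholder daily sentiment scores (NOT the real data).
before = [-5, -12, -3, -10, -8, -6, -12]
after = [-40, -30, -35, -28, -42, -33, -37]

t_stat, df = welch_t(before, after)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

The t statistic would then be compared against a t distribution with `df` degrees of freedom to get a p-value. One caveat: daily scores are probably correlated from one day to the next, so a plain t-test would likely overstate the significance.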
Were there any events in 2020 that were particularly interesting to you? I compiled the data and made it available via Shinyapps.io so that you can conduct your own event study and see how Twitter sentiment may have differed in the days leading up to and after your chosen date. The app also retrieves five random NYTimes headlines from your chosen date to provide a bit of context on what occurred around that time. Some ideas for you may be your birthday, when Parasite won Best Picture, the Beirut port explosion, or when we all thought Kim Jong Un had secretly died.
Person by Person
That was how 2020 went for the American Twitter base. But people obviously all have unique experiences and emotions, and to boil everyone down and equate their year to that of an average Twitter user would be ignoring that individuality. Therefore, I want to ask: how was your happiness during 2020? I created a Shiny application to put that into numbers for you. It retrieves the last 1000 tweets from any public user of your choice. It then creates a word cloud, plots a timeline for the daily sentiment of their tweets, and categorizes the general sentiments of the person’s tweets by emotion. For your reference: here’s a classic favorite, Rihanna.
As expected, most of her tweets tend to be around the releases from her makeup and skincare brand, Fenty Beauty. However, there is a huge downtick in sentiment when she speaks up about the Black Lives Matter movement in June 2020 and against AAPI hate crimes in March 2021. Generally, her tweets tend to be positive and fall under the emotion of anticipation, likely because they promote the launches of her new beauty products or fashion lines. Feel free to use this to do some deep introspective analysis or, if you’re Twitter-less like me, just analyze your favorite celebrity.
Overall, this data project reinforced some of my expectations, but also provided surprising new insights into how Twitter can be used as a tool to understand how people feel. While I foresaw the holiday season to be the most positive time of the year, I had also thought March or April would have been the most negative time of the year due to the financial downturn and quarantine. George Floyd’s death had an extremely strong and lasting effect on how people felt and it was disheartening to see all the frustration and anger materialize in the numbers. Overall, health is an all encompassing word for our emotional, physical and mental well-being, and although it seems like it’s constantly echoed, it really is important to take care of our own health and others’ during difficult times.
As addressed prior, Twitter and the AFINN sentiment dictionary have their drawbacks and using them as a proxy for measuring happiness may not be the most accurate or representative. As a result, if I were to extend this analysis, I would consider adding other platforms such as YouTube, Reddit, or Facebook, as well as averaging sentiment scores from multiple dictionaries. It would also be interesting to use statistics to see whether sentiment was significantly different before and after events, or between months. Lastly, it would be interesting to try to build a model to predict new tweets based on prior tweets.
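One wrinkle with averaging across dictionaries is that they do not share a scale: some score words from -5 to +5, others only mark words as positive or negative. Below is a minimal Python sketch of one possible approach, which rescales each dictionary's score by its maximum absolute value before averaging. The two lexicons are toys with made-up entries (not real dictionary values), and the names are hypothetical.

```python
# Toy lexicons on different scales, each paired with its maximum absolute score.
# The entries are invented for illustration and are NOT real dictionary values.
TOY_LEXICONS = {
    "five_point": ({"happy": 3, "crisis": -3, "riots": -2}, 5),  # -5..+5 scale
    "binary": ({"happy": 1, "crisis": -1}, 1),                   # -1..+1 scale
}


def averaged_score(text, lexicons=TOY_LEXICONS):
    """Score the text with each lexicon, rescale to a common range, then average."""
    words = text.lower().split()
    scaled = []
    for scores, max_abs in lexicons.values():
        raw = sum(scores.get(word, 0) for word in words)
        scaled.append(raw / max_abs)  # put every lexicon on a comparable scale
    return sum(scaled) / len(scaled)


print(averaged_score("Happy holidays"))
print(averaged_score("philadelphia riots"))
```

In the second example only one of the two toy lexicons knows the word "riots", so the average is pulled toward 0. Whether a missing word should count as neutral or be left out of the average is a design choice that would need some thought.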
Below are additional data visualizations that I created but, as I was writing, felt were not as central to the story I was trying to tell. I started off this project hoping to do a comparison between Twitter trends and Google trends. I felt that Twitter trends would give insight into people’s curated personas and how they want to present themselves to the public, while Google trends would provide information on what people are doing in private and what they are actually thinking when nobody is watching. Unfortunately, however, Google searches are not as emotionally strong as Twitter trends are. As seen below, the average sentiment score for each day hovered between -1 and 1. This makes sense as I think about what I typically search on Google: boring things such as people, restaurants, or even instructions. Additionally, the Google monthly analysis provided similar information as the Twitter monthly analysis did, and the word cloud included similar topics such as sports, stocks, and COVID. I felt the repetition would detract from my points. As a result, I decided to exclude the Google Trend sentiment analysis from my main project.
Other areas of interest included seeing how more “macro” factors such as the stock market, unemployment, COVID deaths, and alcohol consumption may correlate with happiness. For all four, we do see a significant movement with the onset of the pandemic. The S&P500 and unemployment claims have just about returned to pre-COVID levels by the end of 2020, despite daily new COVID deaths being much higher by the end of 2020 (likely due to holiday gatherings). The most interesting to me was looking at alcohol consumption. There is an apparent spike at the start of the pandemic, perhaps due to the death of loved ones as well as sudden change, but the real peak comes in the summertime. Sunshine and good weather likely encouraged people to sit back and enjoy a drink or two to relax. While these were not as central to the Twitter sentiment analysis as planned, they still provided interesting visualizations of how people may have felt. They support the conclusion that March, June, and December were months of strong emotion, as seen throughout this analysis.
Lastly, these were additional event studies that I had wanted to look at. The Twitter public seems to be left-leaning, despite “trump” being the most frequent trend of 2020, considering that sentiment in the two weeks after Biden’s victory was much higher than in the two weeks before. Additionally, as expected, the start of the COVID shutdowns was a huge downer for many. I excluded these from my main project because I implemented the event study Shiny application afterwards, where users can choose these dates themselves.