
A Look Into Reddit's Star Dish

An analysis of comments on Reddit.

CM Tech
Spring 2017
Cornell Tech

Frances Coronel

May 04, 2017

Transcript

  1. Section 1: Background. Human and social motivation, high-level research questions.

    Since the launch of Reddit in June 2005, the site has become the 7th most visited in the U.S., and its users have posted billions of comments. Those comments are filled with abbreviations, internet memes, and slang, much like the rest of the web, and collectively they form a trove of data about how people use language online.

    To note: as of the end of 2015, the site's visitors were mostly 35 or younger and about 80 percent male, according to Google AdWords.
  2. Section 2: Literature. The connective media theories associated with this data analysis.

    Some of the topics covered:
    ▪ Emotional Contagion
    ▪ Group Polarization
    ▪ Meforming versus Informing
  3. Group Polarization: individuals tend to endorse a more extreme position in the direction already favored by the group.
  4. Meformers vs. Informers: users who typically post messages about themselves or their own thoughts, versus users who post messages that are informative in nature.
  5. In other words, what components make up Reddit's secret dish of comments, and what allows them as a whole to succeed in a digital world where many platforms fail to regulate such discussion systems?
  6. Section 3: Hypotheses. High-level research questions.

    1. What kind of communication style in comments drives the highest reply rates: passive, assertive, aggressive, or sarcastic?
    2. What kind of information style drives the highest reply rates: meforming or informing?
  7. There will be a positive correlation between response rate and level of aggression.

    Research question: What kind of sentiment in comments drives the highest reply rates: passive, assertive, aggressive, or sarcastic?
  8. There will be a positive correlation between response rate and meforming.

    Research question: What kind of information style drives the highest reply rates: meforming or informing?
  9. 30 GB. Reddit recently released an enormous dataset containing all ~1.7 billion of its publicly available comments. The full dataset is over 1 terabyte uncompressed, so Kaggle shared only a small portion, the comments from May 2015, for folks like connective media students to tinker with (8 GB compressed, 30 GB uncompressed).
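Reply rates are the outcome measured throughout the deck, so here is a minimal sketch of how direct replies per comment could be counted from a comment dump. It assumes each comment record carries a `name` field (its own identifier) and a `parent_id` field (the identifier of what it replies to); those field names and the sample records are assumptions for illustration, not details taken from the deck.

```python
from collections import Counter


def count_direct_replies(comments):
    """Return {comment name: number of direct replies within the sample}."""
    # Count how often each identifier appears as the parent of a comment.
    parent_counts = Counter(c["parent_id"] for c in comments)
    # Comments that never appear as a parent get a count of 0.
    return {c["name"]: parent_counts.get(c["name"], 0) for c in comments}


# Hypothetical sample: one top-level comment with two replies,
# one of which has a reply of its own.
sample = [
    {"name": "t1_a", "parent_id": "t3_post"},
    {"name": "t1_b", "parent_id": "t1_a"},
    {"name": "t1_c", "parent_id": "t1_a"},
    {"name": "t1_d", "parent_id": "t1_b"},
]
reply_counts = count_direct_replies(sample)
```

Only direct replies are counted here; whether the original analysis also counted nested replies is not stated.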
  10. Section 5: Analysis. Describe how you addressed the questions with the data and talk about the results.

    ▪ Sentiment Analysis
      - 4 styles: Aggressive, Assertive, Passive, Sarcastic
      - Identified keywords that are representative of these communication styles
    ▪ Meforming versus Informing
      - Identified keywords which might denote meforming
      - All other comments are identified as informing
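The keyword approach described above can be sketched roughly as follows. The deck does not list the keywords that were actually used, so every word list below is a hypothetical placeholder, and the tokenizer is a deliberately simple assumption.

```python
import re

# Hypothetical keyword lists; the deck does not publish the real ones.
STYLE_KEYWORDS = {
    "aggressive": {"idiot", "stupid", "shut"},
    "assertive": {"believe", "should", "clearly"},
    "passive": {"maybe", "sorry", "guess"},
    "sarcastic": {"obviously", "surely", "wow"},
}
MEFORMING_KEYWORDS = {"i", "i'm", "i've", "me", "my", "mine", "myself"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def communication_styles(text):
    """Return the sorted list of styles whose keywords appear in the text."""
    tokens = set(tokenize(text))
    return sorted(s for s, words in STYLE_KEYWORDS.items() if tokens & words)


def information_style(text):
    """Label a comment meforming if it has a self-referential keyword,
    otherwise informing (mirroring the 'all other comments' rule)."""
    tokens = set(tokenize(text))
    return "meforming" if tokens & MEFORMING_KEYWORDS else "informing"


styles = communication_styles("Maybe you should read it")
label = information_style("I think my cat is great")
```

A comment can match several communication styles under this scheme; how the original analysis resolved such overlaps is not stated.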
  11. Communication Style Analysis

    Aggressive comments had the highest number of upvotes, with a ranking score of 6.45, which is ~11% higher than the second-best style, assertive. In turn, assertive comments had the highest reply rates, with nearly 90,000 comments, roughly twice (2x) as many as the next-best style, aggressive. Sarcasm, in contrast, rarely received high scores.

    Based on these results, our hypothesis of a positive correlation between aggression and reply rates is rejected. However, there is in fact a positive correlation between aggression and the number of upvotes.
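Percentage comparisons like the ones on this slide are easy to misread: a value that is twice another is 100% higher than it, while "200% higher" would mean three times as large. The short sketch below makes the arithmetic explicit; the function names and the numbers passed to them are illustrative only and are not figures from the analysis.

```python
def percent_higher(value, baseline):
    """Relative increase of value over baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def times_as_large(value, baseline):
    """Ratio of value to baseline (2.0 means 'twice as large')."""
    return value / baseline


# Illustrative numbers only.
doubled_pct = percent_higher(200, 100)    # twice as large
tripled_pct = percent_higher(300, 100)    # three times as large
doubled_ratio = times_as_large(200, 100)
```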
  12. Meforming versus Informing

    Meforming comments had the highest number of upvotes, with a ranking score of 5.68, which is only ~1% higher than informing. Informing comments had the highest reply rates, with over 1 million comments, a staggering ~500% higher than meforming.

    Based on these results, interestingly enough, our hypothesis of a positive correlation between meforming and reply rates is rejected. Meforming fared much worse on reply rates but, surprisingly, scored slightly higher on number of upvotes.
  13. Section 6: Conclusions. Describe how you addressed the questions with the data and talk about the results.

    ▪ A user on Reddit is more likely to have a higher reply rate for a comment that is assertive and informing.
    ▪ In turn, it can also be concluded that a user on Reddit is less likely to have a higher reply rate for a comment that is sarcastic and meforming.
  14. Credits. Special thanks to all the people who made and released these awesome resources for free.

    ▪ Presentation template by SlidesCarnival
    ▪ Dataset provided by Kaggle
    ▪ The brains of Sindhu Babu & Frances Coronel
    ▪ See our report for academic references