Do Angry People Have Poor Grammar? An Exploration of Language Processing and Statistics in Python

This talk is about two things: natural language processing (NLP) and statistical dependence. We will work through a data science workflow using various Python scientific computing tools to better understand the behavior of commenters on Reddit. To do this we'll go through an introduction to sentiment analysis in Python (mostly using NLTK) and a swift explanation of the statistics of variable dependence.

We'll couple these freshly learned methods with an excellent dataset for this domain: every public Reddit comment. We'll talk a bit about handling and preprocessing data of this size and character. Then we'll compile scores for both sentiment and spelling/grammar. In the end we may just discover whether angry comments are also grammatically poor comments. And the audience will walk away with a few more tools in their scientific computing toolbelt.

Deck as presented at PyData Amsterdam 2016

Ben Fields

March 12, 2016

Transcript

  1. @alsothings - Do Angry People Have Poor Grammar?
     1. Find pile of comments
     2. Measure style and sentiment
     3. ????
     4. Profit
  2. 1. Find pile of comments
     2. Measure style and sentiment
     3. Statistical dependence?
     4. Profit
  3. 1. Find pile of comments
     2. Measure style and sentiment
     3. Statistical dependence?
     4. More Twitter followers
  4. 59 million Reddit comments!
     https://mega.nz/#!ysBWXRqK!yPXLr25PgJi184pbJU3GtnqUY4wG7YvuPpxJjEmnb9A
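  5. from string import punctuation  # assumed source of `punctuation`; the slide elides it
     from nltk import tokenize
     …
     for comment in pile_of_comment:
         # Count word tokens, ignoring punctuation. A list is built because
         # len(filter(...)) raises TypeError on Python 3.
         tokens = tokenize.word_tokenize(comment['body'])
         num_tokens = len([t for t in tokens if t not in punctuation])
         if num_tokens < 5:
             continue  # skip comments with fewer than five words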
  6. Great for analysing complex and ambiguous sentence structure
     http://www.nltk.org/book/ch08.html
  7. “One morning, I shot an elephant in my pyjamas. How he got in my pyjamas, I don't know.”
     http://www.nltk.org/book/ch08.html
  8. “I shot an elephant in my pyjamas.”
     http://www.nltk.org/book/ch08.html
  9. S -> NP VP
     PP -> P NP
     NP -> Det N | Det N PP | 'I'
     VP -> V NP | VP PP
     Det -> 'an' | 'my'
     N -> 'elephant' | 'pyjamas'
     V -> 'shot'
     P -> 'in'
     http://www.nltk.org/book/ch08.html
  10. groucho_grammar = nltk.CFG.fromstring(from_last_slide)
      sent = ['I', 'shot', 'an', 'elephant', 'in', 'my', 'pyjamas']
      parser = nltk.ChartParser(groucho_grammar)
      for tree in parser.parse(sent):
          print(tree)
      http://www.nltk.org/book/ch08.html
  11. (S
        (NP I)
        (VP
          (VP (V shot) (NP (Det an) (N elephant)))
          (PP (P in) (NP (Det my) (N pyjamas)))))
      (S
        (NP I)
        (VP
          (V shot)
          (NP (Det an) (N elephant) (PP (P in) (NP (Det my) (N pyjamas))))))
      http://www.nltk.org/book/ch08.html
  12. [Figure: the two parse trees for "I shot an elephant in my pyjamas", drawn side by side]
      http://www.nltk.org/book/ch08.html
  13. S -> NP VP
      PP -> P NP
      NP -> Det N | Det N PP | 'I'
      VP -> V NP | VP PP
      Det -> 'an' | 'my'
      N -> 'elephant' | 'pyjamas'
      V -> 'shot'
      P -> 'in'
      http://www.nltk.org/book/ch08.html
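The ambiguity in the toy grammar above can be checked without NLTK. The following is a minimal standard-library sketch that counts parse trees by recursing over word spans. It is not NLTK's `ChartParser` algorithm, and the names `RULES` and `count_parses` are hypothetical, introduced only for this illustration.

```python
from functools import lru_cache

# The toy grammar from the slides as (left-hand side, right-hand side) rules.
# Quoted symbols are terminals (words); everything else is a nonterminal.
RULES = [
    ("S", ("NP", "VP")),
    ("PP", ("P", "NP")),
    ("NP", ("Det", "N")),
    ("NP", ("Det", "N", "PP")),
    ("NP", ("'I'",)),
    ("VP", ("V", "NP")),
    ("VP", ("VP", "PP")),
    ("Det", ("'an'",)),
    ("Det", ("'my'",)),
    ("N", ("'elephant'",)),
    ("N", ("'pyjamas'",)),
    ("V", ("'shot'",)),
    ("P", ("'in'",)),
]


def count_parses(words, start="S"):
    """Count the distinct parse trees of `words` rooted at `start`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        # Number of ways `symbol` can derive words[i:j].
        if symbol.startswith("'"):
            return int(j - i == 1 and words[i] == symbol.strip("'"))
        return sum(count_sequence(rhs, i, j) for lhs, rhs in RULES if lhs == symbol)

    @lru_cache(maxsize=None)
    def count_sequence(symbols, i, j):
        # Number of ways the sequence `symbols` can derive words[i:j],
        # with every symbol covering at least one word. Because each split
        # strictly shrinks the span, the left-recursive rule VP -> VP PP
        # cannot loop forever.
        if len(symbols) == 1:
            return count_symbol(symbols[0], i, j)
        total = 0
        for k in range(i + 1, j - len(symbols) + 2):
            left = count_symbol(symbols[0], i, k)
            if left:
                total += left * count_sequence(symbols[1:], k, j)
        return total

    return count_symbol(start, 0, len(words))


print(count_parses("I shot an elephant in my pyjamas".split()))  # → 2
```

The two trees correspond to attaching "in my pyjamas" either to the verb phrase or to "an elephant". Dropping the prepositional phrase ("I shot an elephant") leaves a single parse.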
  14. Extended ad nauseam: a static simplification of English
  15. “The main problem is that there is no common agreement on what are grammatically correct (English) sentences; nor has anyone yet been able to offer a grammar precise enough to propose as definitive.”
      http://dl.acm.org/citation.cfm?id=1882777
  16. https://github.com/amperser/proselint/
      {
          "max_errors": 1000,
          "checks": {
              "butterick.symbols" : true,
              "carlin.filth" : true,
              "consistency.spacing" : true,
              "consistency.spelling" : true,
              "garner.airlinese" : true,
              …
              "inc.corporate_speak" : true,
              "leonard.exclamation" : true,
              "leonard.hell" : true,
              …
              "write_good.weasel_words" : true,
              "wsj.athletes" : true
          }
      }
  17. https://github.com/amperser/proselint/
      (from checks/leonard/exclamation.py)

      @memoize
      def check_repeated_exclamations(text):
          """Check the text."""
          err = "leonard.exclamation.multiple"
          msg = u"Stop yelling. Keep your exclamation points under control."

          regex = r"[^A-Z]\b((\s[A-Z]+){3,})"

          return existence_check(
              text, [regex], err, msg, require_padding=False, ignore_case=False,
              max_errors=1, dotall=True)
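To see what the pattern on the slide above actually flags, it can be run directly with Python's `re` module. This is a sketch only: it bypasses proselint's `existence_check` helper entirely, and `SHOUTING` and `find_shouting` are hypothetical names. As written, the pattern matches runs of three or more consecutive all-caps words that follow a non-capital character.

```python
import re

# Pattern copied verbatim from the slide; DOTALL mirrors dotall=True there,
# and the absence of IGNORECASE mirrors ignore_case=False.
SHOUTING = re.compile(r"[^A-Z]\b((\s[A-Z]+){3,})", re.DOTALL)


def find_shouting(text):
    """Return the runs of three or more all-caps words found in `text`."""
    return [match.group(1).strip() for match in SHOUTING.finditer(text)]


print(find_shouting("why are you SO VERY ANGRY today"))  # → ['SO VERY ANGRY']
print(find_shouting("Calm and Polite comment."))         # → []
```

Keeping the match case-sensitive matters here: with case folding enabled, every run of three ordinary words would count as shouting.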
  18.               mean    std dev  min  25%  50%  75%     max
      lints         0.4298  1.0861   0    0    0    1       400
      normed lints  0.0216  0.0497   0    0    0    0.0178  1.2
  19. VADER (Valence Aware Dictionary for sEntiment Reasoning)
      http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf
  20. Test Condition               Example Text
      Baseline                     Yay. Another good phone interview.
      Punctuation1                 Yay! Another good phone interview!
      Punctuation1 + Degree Mod.   Yay! Another extremely good phone interview!
      Punctuation2                 Yay!! Another good phone interview!!
      Capitalization               YAY. Another GOOD phone interview.
      Punct1 + Cap.                YAY! Another GOOD phone interview!
      Punct2 + Cap.                YAY!! Another GOOD phone interview!!
      Punct3 + Cap.                YAY!!! Another GOOD phone interview!!!
      Punct3 + Cap. + Degree Mod.  YAY!!! Another EXTREMELY GOOD phone interview!!!

      Table 2: Example of baseline text with eight test conditions comprised of grammatical and syntactical variations.
      http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf
  21.           mean      std dev   min  25%       50%       75%       max
      positive  0.14381   0.15702   0    0         0.10800   0.22900   1
      neutral   0.776917  0.175849  0    0.667000  0.789000  0.915000  1
      negative  0.079243  0.079243  0    0         0         0.128000  1
  22. Pearson's
      negative to   correlation  p
      lints         0.0311       4.535E-249
      normed lints  0.0600       2.4228E-68
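For readers who want to see what a Pearson correlation measures, here is a minimal standard-library sketch. The deck does not show how its figures were computed, so this is an illustration under that caveat, not the talk's code. The function name `pearson_r` and the sample values are hypothetical. In practice `scipy.stats.pearsonr` returns both the coefficient and a p-value.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, NOT data from the talk.
negative_scores = [0.0, 0.1, 0.0, 0.3, 0.2]
lint_counts = [0, 1, 0, 2, 1]
print(pearson_r(negative_scores, lint_counts))
```

The coefficient ranges from -1 to 1, so values such as 0.03 and 0.06 indicate a very weak linear relationship. A tiny p-value on a very large sample says only that the correlation is unlikely to be exactly zero, not that it is strong.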
  23. but when they aren’t, the language isn’t any stylistically worse