Do Angry People Have Poor Grammar? An Exploration of Language Processing and Statistics in Python

Ben Fields

intro and motivation

@alsothings - Do Angry People Have Poor Grammar?

Have you ever noticed on social media that the ‘loudest’ seem to not construct the best sentences?

Me too.

But I am a skeptic.

http://dx.doi.org/10.1037/1089-2680.2.2.175

So let's do some analysis.

1. Find pile of comments
2. Measure style and sentiment
3. ???? (Statistical dependence?)
4. Profit (more Twitter followers)

1. Find pile of comments

All of the reddit comments!

https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment

1.7 billion reddit comments!

59 million reddit comments! https://mega.nz/#!ysBWXRqK!yPXLr25PgJi184pbJU3GtnqUY4wG7YvuPpxJjEmnb9A

(actually a ~1% sample of that: 390k comments)

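    from string import punctuation  # assumed source of `punctuation`
    from nltk import tokenize
    …
    for comment in pile_of_comment:
        # Count word tokens, ignoring tokens that are bare punctuation
        tokens = tokenize.word_tokenize(comment['body'])
        num_tokens = sum(1 for t in tokens if t not in punctuation)
        if num_tokens < 5:
            continue  # skip comments with fewer than five words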

319K comments from January 2015
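The sample-then-filter step can be sketched with the standard library alone. This is a minimal illustration, not the code used for the talk: it assumes the dump holds one JSON object per line with a `body` field, the helper name `sample_comments` and its parameters are hypothetical, and a crude whitespace split stands in for NLTK's `word_tokenize`.

```python
import json
import random
from string import punctuation


def sample_comments(lines, rate=0.01, min_tokens=5, seed=0):
    """Keep roughly `rate` of the comments, dropping very short ones."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    kept = []
    for line in lines:
        if rng.random() >= rate:
            continue  # not selected for the sample
        comment = json.loads(line)
        # Crude whitespace tokenizer standing in for nltk's word_tokenize;
        # tokens made only of punctuation are ignored
        tokens = [t for t in comment["body"].split() if t.strip(punctuation)]
        if len(tokens) < min_tokens:
            continue  # too short to say anything about style
        kept.append(comment)
    return kept


# Tiny in-memory stand-in for the real dump
demo_lines = [
    json.dumps({"body": "This is a perfectly reasonable comment about things."}),
    json.dumps({"body": "lol !!!"}),
]
demo_kept = sample_comments(demo_lines, rate=1.0)
print(len(demo_kept))
```

Passing an open file object as `lines` would stream the dump without loading it into memory.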

2. Measure style and sentiment

Grammar?

Context-Free Grammar! http://www.nltk.org/book/ch08.html

Great for analysing complex and ambiguous sentence structure http://www.nltk.org/book/ch08.html

http://www.imdb.com/title/tt0020640/

“One morning, I shot an elephant in my pyjamas. How he got in my pyjamas, I don't know.” http://www.nltk.org/book/ch08.html

“I shot an elephant in my pyjamas.” http://www.nltk.org/book/ch08.html

    import nltk

    # from_last_slide: the grammar string shown on the previous slide
    groucho_grammar = nltk.CFG.fromstring(from_last_slide)
    sent = ['I', 'shot', 'an', 'elephant', 'in', 'my', 'pyjamas']
    parser = nltk.ChartParser(groucho_grammar)
    for tree in parser.parse(sent):
        print(tree)

http://www.nltk.org/book/ch08.html

    (S
      (NP I)
      (VP
        (VP (V shot) (NP (Det an) (N elephant)))
        (PP (P in) (NP (Det my) (N pyjamas)))))
    (S
      (NP I)
      (VP
        (V shot)
        (NP (Det an) (N elephant) (PP (P in) (NP (Det my) (N pyjamas))))))

http://www.nltk.org/book/ch08.html

[Figure: the two parse trees for “I shot an elephant in my pyjamas”, one attaching the PP “in my pyjamas” to the VP, the other attaching it to the NP “an elephant”.] http://www.nltk.org/book/ch08.html
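To make the ambiguity concrete without NLTK, the sketch below counts how many parse trees a small grammar allows for a sentence. The grammar is the one given in the NLTK book chapter cited on the slides, with the spelling changed to 'pyjamas' to match them; that the slide used exactly this grammar is an assumption, and `count_parses` is a hypothetical helper, not an NLTK function.

```python
from functools import lru_cache

# Grammar from the NLTK book chapter cited on the slides (assumed to be
# what the slide showed); 'pyjamas' spelled as in the slides.
GRAMMAR = {
    "S": [("NP", "VP")],
    "PP": [("P", "NP")],
    "NP": [("Det", "N"), ("Det", "N", "PP"), ("I",)],
    "VP": [("V", "NP"), ("VP", "PP")],
    "Det": [("an",), ("my",)],
    "N": [("elephant",), ("pyjamas",)],
    "V": [("shot",)],
    "P": [("in",)],
}


def count_parses(tokens, start="S"):
    """Count distinct parse trees for `tokens` under GRAMMAR."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # Number of ways `symbol` can produce tokens[i:j]
        if symbol not in GRAMMAR:  # terminal: must match exactly one token
            return int(j - i == 1 and tokens[i] == symbol)
        return sum(split(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def split(rhs, i, j):
        # Number of ways the symbol sequence `rhs` can produce tokens[i:j]
        if len(rhs) == 1:
            return derive(rhs[0], i, j)
        # Give the first symbol a non-empty prefix and leave at least one
        # token for each remaining symbol
        return sum(
            derive(rhs[0], i, k) * split(rhs[1:], k, j)
            for k in range(i + 1, j - len(rhs) + 2)
        )

    return derive(start, 0, len(tokens))


print(count_parses("I shot an elephant in my pyjamas".split()))
```

The count for the Groucho sentence is 2, one tree per reading. The approach relies on the grammar having no empty productions and no unit cycles between nonterminals, which holds here.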

Extended ad nauseam: a definition of English

Extended ad nauseam: a static simplification of English

pCFG: probabilistic CFG
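A probabilistic CFG attaches a probability to each rule, and the probability of a parse tree is the product of the probabilities of the rules it uses. The sketch below scores the two readings of the Groucho sentence. The rule probabilities are made up for illustration and are not taken from the talk or from any trained model.

```python
from math import isclose, prod

# Hypothetical rule probabilities; rules sharing a left-hand side sum to 1.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("PP", ("P", "NP")): 1.0,
    ("NP", ("Det", "N")): 0.5,
    ("NP", ("Det", "N", "PP")): 0.2,
    ("NP", ("I",)): 0.3,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
}

# Rules used by each reading of "I shot an elephant in my pyjamas".
# Word-level rules (Det, N, V, P) are the same in both trees, so they
# are left out of the comparison.
VP_ATTACHMENT = [  # the shooting happened in my pyjamas
    ("S", ("NP", "VP")),
    ("NP", ("I",)),
    ("VP", ("VP", "PP")),
    ("VP", ("V", "NP")),
    ("NP", ("Det", "N")),
    ("PP", ("P", "NP")),
    ("NP", ("Det", "N")),
]
NP_ATTACHMENT = [  # the elephant was in my pyjamas
    ("S", ("NP", "VP")),
    ("NP", ("I",)),
    ("VP", ("V", "NP")),
    ("NP", ("Det", "N", "PP")),
    ("PP", ("P", "NP")),
    ("NP", ("Det", "N")),
]


def tree_probability(rules):
    """Product of the probabilities of the rules used in one tree."""
    return prod(RULE_PROB[rule] for rule in rules)


p_vp = tree_probability(VP_ATTACHMENT)
p_np = tree_probability(NP_ATTACHMENT)
print(p_vp, p_np)
```

Which reading wins depends entirely on the probabilities chosen, which is why a real pCFG estimates them from a parsed corpus.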

“The main problem is that there is no common agreement on what are grammatically correct (English) sentences; nor has anyone yet been able to offer a grammar precise enough to propose as definitive.” http://dl.acm.org/citation.cfm?id=1882777

Style checking!

lint for prose

proselint.com

proselint.com/write

https://github.com/amperser/proselint/

    {
        "max_errors": 1000,
        "checks": {
            "butterick.symbols": true,
            "carlin.filth": true,
            "consistency.spacing": true,
            "consistency.spelling": true,
            "garner.airlinese": true,
            …
            "inc.corporate_speak": true,
            "leonard.exclamation": true,
            "leonard.hell": true,
            …
            "write_good.weasel_words": true,
            "wsj.athletes": true
        }
    }

https://github.com/amperser/proselint/ (from checks/leonard/exclamation.py)

    @memoize
    def check_repeated_exclamations(text):
        """Check the text."""
        err = "leonard.exclamation.multiple"
        msg = u"Stop yelling. Keep your exclamation points under control."

        regex = r"[^A-Z]\b((\s[A-Z]+){3,})"

        return existence_check(
            text, [regex], err, msg, require_padding=False, ignore_case=False,
            max_errors=1, dotall=True)

                  mean    std dev  min  25%  50%  75%     max
    lints         0.4298  1.0861   0    0    0    1       400
    normed lints  0.0216  0.0497   0    0    0    0.0178  1.2
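The two rows above can be read as a raw lint count per comment and a length-normalised count. The slides do not say how the normalisation was done; the sketch below assumes lints per whitespace-separated token. The two regex checks are illustrative stand-ins written for this example, not proselint's own rules, and `count_lints` and `normed_lints` are hypothetical helpers.

```python
import re
import statistics

# Two stand-in checks in the spirit of a prose linter (not proselint's rules)
CHECKS = {
    "repeated_exclamation": re.compile(r"!\s*!+"),
    "shouting": re.compile(r"\b[A-Z]{2,}(?:\s+[A-Z]{2,}){2,}\b"),
}


def count_lints(text):
    """Total number of matches across all checks."""
    return sum(len(rx.findall(text)) for rx in CHECKS.values())


def normed_lints(text):
    """Lints per whitespace-separated token (assumed normalisation)."""
    tokens = text.split()
    return count_lints(text) / len(tokens) if tokens else 0.0


comments = [
    "I think this is a fair point, thanks for sharing.",
    "NO WAY THIS IS REAL!!! You have got to be kidding!!",
]
lints = [count_lints(c) for c in comments]
normed = [normed_lints(c) for c in comments]

# Summary statistics of the kind shown in the table
print(statistics.mean(lints), statistics.pstdev(lints))
print(statistics.mean(normed), statistics.pstdev(normed))
```

Normalising by length keeps long comments from looking worse simply because they offer more opportunities to trigger a check.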

sentiment analysis http://www.nltk.org/howto/sentiment.html

lul or luuuuuuuuuuuulz?

VADER (Valence Aware Dictionary for sEntiment Reasoning) http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf

    Test Condition               Example Text
    Baseline                     Yay. Another good phone interview.
    Punctuation1                 Yay! Another good phone interview!
    Punctuation1 + Degree Mod.   Yay! Another extremely good phone interview!
    Punctuation2                 Yay!! Another good phone interview!!
    Capitalization               YAY. Another GOOD phone interview.
    Punct1 + Cap.                YAY! Another GOOD phone interview!
    Punct2 + Cap.                YAY!! Another GOOD phone interview!!
    Punct3 + Cap.                YAY!!! Another GOOD phone interview!!!
    Punct3 + Cap. + Degree Mod.  YAY!!! Another EXTREMELY GOOD phone interview!!!

Table 2: Example of baseline text with eight test conditions comprised of grammatical and syntactical variations.

http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf

              mean      std dev   min  25%       50%       75%       max
    positive  0.14381   0.15702   0    0         0.10800   0.22900   1
    neutral   0.776917  0.175849  0    0.667000  0.789000  0.915000  1
    negative  0.079243  0.079243  0    0         0         0.128000  1

3. Statistical dependence?

y = f(x)

regression! http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf

$S = \sum_{i=1}^{n} r_i^2$

http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf
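The sum of squared residuals S is what a least-squares regression minimises. As a minimal sketch, assuming a straight-line model y = a + b*x (the slides only say y = f(x)), the closed-form fit and the resulting S look like this; `fit_line` is a hypothetical helper and the data are toy values.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, S)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope that minimises S
    a = mean_y - b * mean_x  # intercept
    # S is the sum of squared residuals r_i = y_i - (a + b*x_i)
    S = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, S


# Toy data: x could be a sentiment score and y a lint rate
a, b, S = fit_line([0.0, 1.0, 2.0], [0.0, 1.0, 1.0])
print(a, b, S)
```

A perfect linear relationship gives S = 0; the larger S is relative to the spread of y, the less the line explains.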

Correlation testing

Pearson's correlation, negative sentiment to:

                  correlation  p
    lints         0.0311       4.535E-249
    normed lints  0.0600       2.4228E-68
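Pearson's r can be computed directly from its definition. The sketch below is a plain-Python illustration with hypothetical helper names, not the code used for the talk (a library routine such as SciPy's `pearsonr` would normally be used and also returns the p-value). It includes the t statistic commonly used to test r against zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r against zero with n paired samples."""
    return r * math.sqrt((n - 2) / (1 - r * r))


r_up = pearson_r([1, 2, 3], [2, 4, 6])    # perfectly increasing
r_down = pearson_r([1, 2, 3], [3, 2, 1])  # perfectly decreasing
print(r_up, r_down)
```

Because the t statistic grows with the square root of n, a sample of several hundred thousand comments makes even a tiny r statistically significant. The effect size is what matters here: r = 0.06 corresponds to r² = 0.0036, well under 1% of the variance.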

Nope.

NOPE.

conclusions!

People on reddit are generally reasonable, but when they aren’t, the language isn’t stylistically any worse, at least last January.

Let’s have some questions! @alsothings
