Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Text Mining: Exploratory Data Analysis to Machi...
Search
Julia Silge
March 04, 2019
Technology
1
240
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Julia Silge
March 04, 2019
Tweet
Share
More Decks by Julia Silge
See All by Julia Silge
Introducing Positron
juliasilge
1
310
The right tool for the job
juliasilge
0
47
Good practices for applied machine learning
juliasilge
0
200
Applied machine learning with tidymodels
juliasilge
0
120
Maintaining an R Package
juliasilge
0
370
Publishing the Stack Overflow Developer Survey
juliasilge
2
72
Text Mining Using Tidy Data Principles
juliasilge
0
140
North American Developer Hiring Landscape
juliasilge
0
53
Understanding Principal Component Analysis Using Stack Overflow Data
juliasilge
13
4.5k
Other Decks in Technology
See All in Technology
AIエージェント就活入門 - MCPが履歴書になる未来
eltociear
0
430
Evolution on AI Agent and Beyond - AGI への道のりと、シンギュラリティの3つのシナリオ
masayamoriofficial
0
170
現場が抱える様々な問題は “組織設計上” の問題によって生じていることがある / Team-oriented Organization Design 20250827
mtx2s
3
790
会社にデータエンジニアがいることでできるようになること
10xinc
9
1.6k
Yahoo!広告ビジネス基盤におけるバックエンド開発
lycorptech_jp
PRO
1
270
マイクロモビリティシェアサービスを支える プラットフォームアーキテクチャ
grimoh
1
200
モダンフロントエンド 開発研修
recruitengineers
PRO
2
260
JOAI発表資料 @ 関東kaggler会
joai_committee
1
260
夢の印税生活 / Life on Royalties
tmtms
0
280
VPC Latticeのサービスエンドポイント機能を使用した複数VPCアクセス
duelist2020jp
0
180
見てわかるテスト駆動開発
recruitengineers
PRO
4
280
EKS Pod Identity における推移的な session tags
z63d
1
200
Featured
See All Featured
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
229
22k
Side Projects
sachag
455
43k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
50
5.5k
A designer walks into a library…
pauljervisheath
207
24k
Imperfection Machines: The Place of Print at Facebook
scottboms
268
13k
Typedesign – Prime Four
hannesfritz
42
2.8k
Stop Working from a Prison Cell
hatefulcrawdad
271
21k
Statistics for Hackers
jakevdp
799
220k
Navigating Team Friction
lara
189
15k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
48
9.6k
Product Roadmaps are Hard
iamctodd
PRO
54
11k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
31
2.2k
Transcript
T E X T M I N I N G
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO T I D Y T E X T Data
Scientist at Stack Overflow @juliasilge https://juliasilge.com/ I’m Julia Silge
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = T I D
Y T E X T
https://github.com/juliasilge/tidytext
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
T I D Y T E X T EXPLORATORY DATA
ANALYSIS N-GRAMS AND MORE WORDS MACHINE LEARNING
EXPLORATORY DATA ANALYSIS T I D Y T E X
T
from the Washington Post’s Wonkblog
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT? T I D Y T
E X T TERM FREQUENCY INVERSE DOCUMENT FREQUENCY
None
None
• As part of the NASA Datanauts program, I worked
on a project to understand NASA datasets • Metadata includes title, description, keywords, etc
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L N-GRAMS, NETWORKS, & NEGATION
None
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TOPIC MODELING
TOPIC MODELING T I D Y T E X T
•Each DOCUMENT = mixture of topics •Each TOPIC = mixture of words
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TEXT CLASSIFICATION
TRAIN A GLMNET MODEL T I D Y T E
X T
TEXT CLASSIFICATION T I D Y T E X T
> library(glmnet) > library(doMC) > registerDoMC(cores = 8) > > is_jane <- books_joined$title == "Pride and Prejudice" > > model <- cv.glmnet(sparse_words, is_jane, family = "binomial", + parallel = TRUE, keep = TRUE)
None
None
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com JULIA SILGE
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com Author portraits from Wikimedia Photos by Glen Noble and Kimberly Farmer on Unsplash JULIA SILGE