Julia Silge
March 04, 2019
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Transcript
TEXT MINING
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO
I’m Julia Silge
Data Scientist at Stack Overflow
@juliasilge
https://juliasilge.com/
TIDYTEXT
TEXT DATA IS INCREASINGLY IMPORTANT
NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = TIDYTEXT
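As a rough illustration of what "tidy data principles + count-based methods" can look like in practice, here is a minimal sketch that uses `unnest_tokens()` from tidytext and `count()` from dplyr. The toy corpus and the names `docs`, `tidy_words`, and `word_counts` are made up for this example and do not come from the talk.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy corpus: one row per document
docs <- tibble(
  doc = c("a", "b"),
  text = c("Tidy data makes text mining easier",
           "Text mining with tidy data principles")
)

# Tidy text format: one token (here, one lowercased word) per row
tidy_words <- docs %>%
  unnest_tokens(word, text)

# Count-based method: word frequencies across the whole corpus
word_counts <- tidy_words %>%
  count(word, sort = TRUE)

word_counts
```

Because the result is an ordinary data frame, the usual dplyr and ggplot2 verbs apply to it directly.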
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
• EXPLORATORY DATA ANALYSIS
• N-GRAMS AND MORE WORDS
• MACHINE LEARNING
EXPLORATORY DATA ANALYSIS
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT?
• TERM FREQUENCY
• INVERSE DOCUMENT FREQUENCY
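The slide names the two ingredients without defining them. One common formulation, which should also match what tidytext's `bind_tf_idf()` computes, is sketched below. The symbols are notation chosen for this sketch: n_{t,d} is the number of times term t occurs in document d, and N is the number of documents in the collection.

```latex
\[
\mathrm{tf}(t, d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}},
\qquad
\mathrm{idf}(t) = \ln\!\left(\frac{N}{\left|\{\, d : n_{t,d} > 0 \,\}\right|}\right),
\qquad
\mathrm{tf\mbox{-}idf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)
\]
```

A term that appears in every document has idf = ln(1) = 0, so its tf-idf is zero however often it occurs. That is why the statistic surfaces words that are distinctive for one document rather than common to all of them.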
• As part of the NASA Datanauts program, I worked on a project to understand NASA datasets
• Metadata includes title, description, keywords, etc.
TAKING TIDYTEXT TO THE NEXT LEVEL
N-GRAMS, NETWORKS, & NEGATION
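The slides for this section are images, so the code behind them is not in the transcript. As a hedged sketch of the n-gram and negation ideas, the example below tokenizes into bigrams with `unnest_tokens(token = "ngrams", n = 2)`, splits each bigram with `tidyr::separate()`, and looks at which words follow "not". The `reviews` data and all variable names are hypothetical.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical toy texts
reviews <- tibble(
  id = 1:3,
  text = c("I am not happy with this",
           "I am happy today",
           "not good at all")
)

# One bigram per row, then split into its two words
bigrams <- reviews %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  separate(bigram, into = c("word1", "word2"), sep = " ")

# Negation: which words are preceded by "not"?
negated <- bigrams %>%
  filter(word1 == "not") %>%
  count(word2, sort = TRUE)

negated
```

The same `word1`/`word2` table can serve as an edge list for a word network, with each bigram treated as a directed edge from `word1` to `word2`.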
TAKING TIDYTEXT TO THE NEXT LEVEL
TOPIC MODELING
TOPIC MODELING
• Each DOCUMENT = mixture of topics
• Each TOPIC = mixture of words
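The two bullets can be written as a single equation. The transcript does not name the specific algorithm, so the following is a generic mixed-membership formulation (the form used by models such as latent Dirichlet allocation) and not necessarily the exact model from the talk. The symbols are notation introduced here: θ_{d,k} is the share of topic k in document d, and β_{k,w} is the probability of word w within topic k.

```latex
\[
p(w \mid d) = \sum_{k=1}^{K} \theta_{d,k}\, \beta_{k,w},
\qquad
\sum_{k=1}^{K} \theta_{d,k} = 1,
\qquad
\sum_{w} \beta_{k,w} = 1
\]
```

Each row of θ describes one document as a mixture of topics, and each row of β describes one topic as a mixture of words. Fitting a topic model amounts to estimating both from the word counts.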
TAKING TIDYTEXT TO THE NEXT LEVEL
TEXT CLASSIFICATION
TRAIN A GLMNET MODEL
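TEXT CLASSIFICATION
library(glmnet)
library(doMC)
registerDoMC(cores = 8)  # parallel backend for cross-validation

# TRUE where the document's title is "Pride and Prejudice"
is_jane <- books_joined$title == "Pride and Prejudice"

# cross-validated regularized logistic regression on the sparse word matrix
model <- cv.glmnet(sparse_words, is_jane, family = "binomial",
                   parallel = TRUE, keep = TRUE)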
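The call above takes `sparse_words` and `books_joined` as given, and the transcript does not show how they were built. One plausible route, sketched here on made-up data, is to count words per document and cast the counts to a sparse matrix with tidytext's `cast_sparse()`. The names `toy_lines`, `sparse_toy`, `book_a`, `book_b`, and `is_book_a` are all hypothetical, and the toy data is far too small to fit a real model.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data: one row per document, with an id and a title
toy_lines <- tibble(
  document = 1:4,
  title = c("book_a", "book_a", "book_b", "book_b"),
  text = c("the ball was a great success",
           "a great ball and a great dance",
           "the martians landed in the pit",
           "the pit was a great danger")
)

# Documents in rows, words in columns, counts as values
sparse_toy <- toy_lines %>%
  unnest_tokens(word, text) %>%
  count(document, word) %>%
  cast_sparse(document, word, n)

dim(sparse_toy)  # 4 documents by 13 distinct words

# Build the response in the same row order as the matrix
doc_ids <- as.integer(rownames(sparse_toy))
is_book_a <- toy_lines$title[match(doc_ids, toy_lines$document)] == "book_a"
```

Aligning the response with `rownames()` matters because the matrix rows follow the order in which documents appear in the counted data, which need not match the original data frame.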
THANK YOU
JULIA SILGE
@juliasilge
https://juliasilge.com
Author portraits from Wikimedia
Photos by Glen Noble and Kimberly Farmer on Unsplash