Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Text Mining: Exploratory Data Analysis to Machi...
Search
Julia Silge
March 04, 2019
Technology
1
250
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Julia Silge
March 04, 2019
Tweet
Share
More Decks by Julia Silge
See All by Julia Silge
Introducing Positron
juliasilge
1
340
The right tool for the job
juliasilge
0
65
Good practices for applied machine learning
juliasilge
0
220
Applied machine learning with tidymodels
juliasilge
0
150
Maintaining an R Package
juliasilge
0
400
Publishing the Stack Overflow Developer Survey
juliasilge
2
82
Text Mining Using Tidy Data Principles
juliasilge
0
160
North American Developer Hiring Landscape
juliasilge
0
66
Understanding Principal Component Analysis Using Stack Overflow Data
juliasilge
13
4.5k
Other Decks in Technology
See All in Technology
株式会社ビザスク_AI__Engineering_Summit_Tokyo_2025_登壇資料.pdf
eikohashiba
1
120
モダンデータスタックの理想と現実の間で~1.3億人Vポイントデータ基盤の現在地とこれから~
taromatsui_cccmkhd
2
270
たまに起きる外部サービスの障害に備えたり備えなかったりする話
egmc
0
420
MariaDB Connector/C のcaching_sha2_passwordプラグインの仕様について
boro1234
0
1.1k
7,000万ユーザーの信頼を守る「TimeTree」のオブザーバビリティ実践 ( Datadog Live Tokyo )
bell033
1
100
なぜ あなたはそんなに re:Invent に行くのか?
miu_crescent
PRO
0
220
MySQLのSpatial(GIS)機能をもっと充実させたい ~ MyNA望年会2025LT
sakaik
0
130
AI との良い付き合い方を僕らは誰も知らない
asei
0
280
Amazon Bedrock Knowledge Bases × メタデータ活用で実現する検証可能な RAG 設計
tomoaki25
6
2.4k
Amazon Quick Suite で始める手軽な AI エージェント
shimy
2
1.9k
_第4回__AIxIoTビジネス共創ラボ紹介資料_20251203.pdf
iotcomjpadmin
0
140
ESXi のAIOps だ!2025冬
unnowataru
0
390
Featured
See All Featured
Facilitating Awesome Meetings
lara
57
6.7k
A better future with KSS
kneath
240
18k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
508
140k
How Fast Is Fast Enough? [PerfNow 2025]
tammyeverts
3
410
The Anti-SEO Checklist Checklist. Pubcon Cyber Week
ryanjones
0
29
Skip the Path - Find Your Career Trail
mkilby
0
27
ラッコキーワード サービス紹介資料
rakko
0
1.8M
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.5k
Optimising Largest Contentful Paint
csswizardry
37
3.5k
Music & Morning Musume
bryan
46
7k
VelocityConf: Rendering Performance Case Studies
addyosmani
333
24k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
9
1k
Transcript
T E X T M I N I N G
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO T I D Y T E X T Data
Scientist at Stack Overflow @juliasilge https://juliasilge.com/ I’m Julia Silge
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = T I D
Y T E X T
https://github.com/juliasilge/tidytext
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
T I D Y T E X T EXPLORATORY DATA
ANALYSIS N-GRAMS AND MORE WORDS MACHINE LEARNING
EXPLORATORY DATA ANALYSIS T I D Y T E X
T
from the Washington Post’s Wonkblog
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT? T I D Y T
E X T TERM FREQUENCY INVERSE DOCUMENT FREQUENCY
None
None
• As part of the NASA Datanauts program, I worked
on a project to understand NASA datasets • Metadata includes title, description, keywords, etc
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L N-GRAMS, NETWORKS, & NEGATION
None
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TOPIC MODELING
TOPIC MODELING T I D Y T E X T
•Each DOCUMENT = mixture of topics •Each TOPIC = mixture of words
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TEXT CLASSIFICATION
TRAIN A GLMNET MODEL T I D Y T E
X T
TEXT CLASSIFICATION T I D Y T E X T
> library(glmnet) > library(doMC) > registerDoMC(cores = 8) > > is_jane <- books_joined$title == "Pride and Prejudice" > > model <- cv.glmnet(sparse_words, is_jane, family = "binomial", + parallel = TRUE, keep = TRUE)
None
None
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com JULIA SILGE
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com Author portraits from Wikimedia Photos by Glen Noble and Kimberly Farmer on Unsplash JULIA SILGE