Julia Silge
March 04, 2019
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Transcript
TEXT MINING
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO
I'm Julia Silge, Data Scientist at Stack Overflow
@juliasilge https://juliasilge.com/
TEXT DATA IS INCREASINGLY IMPORTANT
NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = TIDYTEXT
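As a minimal sketch of what "tidy data principles + count-based methods" can look like in practice, the R snippet below tokenizes text into one word per row and counts words per document. The toy `docs` table and its column names are hypothetical and only for illustration; `unnest_tokens()` and the `stop_words` dataset come from tidytext, and `count()` and `anti_join()` come from dplyr.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy corpus; the talk's own datasets are not reproduced here
docs <- tibble(
  doc  = c("a", "b"),
  text = c("Text data is increasingly important",
           "Tidy data principles make text data easier to analyze")
)

# Tidy text format: one lowercase token per row
tokens <- docs %>%
  unnest_tokens(word, text)

# Count-based method: drop common stop words, then count words per document
word_counts <- tokens %>%
  anti_join(stop_words, by = "word") %>%
  count(doc, word, sort = TRUE)

word_counts
```

Because the result is an ordinary data frame, it can be piped straight into dplyr or ggplot2 for exploratory analysis.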
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
EXPLORATORY DATA ANALYSIS N-GRAMS AND MORE WORDS MACHINE LEARNING
EXPLORATORY DATA ANALYSIS
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT?
TERM FREQUENCY
INVERSE DOCUMENT FREQUENCY
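A small sketch of how term frequency and inverse document frequency combine, using tidytext's `bind_tf_idf()`. The counts below are made up for illustration. Here tf is a word's share of its document's total count, and idf is the natural log of (number of documents / number of documents containing the word), so a word that appears in every document scores zero.

```r
library(dplyr)
library(tidytext)

# Hypothetical word counts for two documents
counts <- tibble(
  doc  = c("a", "a", "b", "b"),
  word = c("nasa", "data", "ocean", "data"),
  n    = c(2, 2, 3, 1)
)

# Adds tf, idf, and tf_idf columns
tf_idf <- counts %>%
  bind_tf_idf(word, doc, n) %>%
  arrange(desc(tf_idf))

tf_idf
```

In this toy example "data" occurs in both documents, so its tf-idf is 0, while "nasa" and "ocean" are each distinctive for one document.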
• As part of the NASA Datanauts program, I worked on a project to understand NASA datasets
• Metadata includes title, description, keywords, etc.
TAKING TIDYTEXT TO THE NEXT LEVEL
N-GRAMS, NETWORKS, & NEGATION
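One way the n-gram and negation ideas could be sketched in R: tokenize into bigrams with `unnest_tokens(token = "ngrams", n = 2)`, split each bigram into its two words, and look at the words that follow "not". The sentences are invented for illustration, and the slides do not show the code actually used in the talk.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical example sentences
docs <- tibble(
  doc  = c(1, 2),
  text = c("I do not like green eggs",
           "She is not happy today")
)

# One bigram (pair of adjacent words) per row
bigrams <- docs %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

# Split each bigram so the first word can be inspected separately
negated <- bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(word1 == "not")

negated
```

The same `word1`/`word2` pairs could also serve as an edge list for a word network.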
TAKING TIDYTEXT TO THE NEXT LEVEL
TOPIC MODELING
TOPIC MODELING
• Each DOCUMENT = mixture of topics
• Each TOPIC = mixture of words
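The two bullets above can be illustrated with a rough sketch that fits a two-topic LDA model on made-up counts. This assumes the topicmodels package (with tm installed for `cast_dtm()`); the slides do not say which topic modeling package the talk used, so treat the choice as an assumption. The `beta` matrix gives per-topic word probabilities and the `gamma` matrix gives per-document topic proportions.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Hypothetical word counts for four tiny documents
counts <- tibble(
  document = rep(c("d1", "d2", "d3", "d4"), each = 3),
  term     = c("ocean", "sea", "wave",
               "ocean", "wave", "tide",
               "star", "planet", "orbit",
               "star", "orbit", "moon"),
  count    = c(3, 2, 1, 2, 2, 1, 3, 2, 1, 2, 1, 2)
)

# Cast the tidy counts to a document-term matrix and fit LDA with k = 2
dtm <- counts %>% cast_dtm(document, term, count)
lda <- LDA(dtm, k = 2, control = list(seed = 1234))

# Each TOPIC = mixture of words (beta sums to 1 within a topic)
topic_words <- tidy(lda, matrix = "beta")

# Each DOCUMENT = mixture of topics (gamma sums to 1 within a document)
doc_topics <- tidy(lda, matrix = "gamma")
```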
TAKING TIDYTEXT TO THE NEXT LEVEL
TEXT CLASSIFICATION
TRAIN A GLMNET MODEL
TEXT CLASSIFICATION
> library(glmnet)
> library(doMC)
> registerDoMC(cores = 8)
>
> is_jane <- books_joined$title == "Pride and Prejudice"
>
> model <- cv.glmnet(sparse_words, is_jane, family = "binomial",
+                    parallel = TRUE, keep = TRUE)
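The slide uses `sparse_words` and `books_joined` without showing how they were built. A plausible sketch, under the assumption that they come from tidy word counts, uses tidytext's `cast_sparse()` for the feature matrix and then aligns the labels to the matrix rows. All data and the `books` lookup table below are hypothetical.

```r
library(dplyr)
library(tidytext)

# Hypothetical tidy word counts, one row per document-word pair
word_counts <- tibble(
  document = c(1L, 1L, 2L, 2L, 3L),
  word     = c("darcy", "ball", "martians", "cylinder", "darcy"),
  n        = c(2L, 1L, 3L, 1L, 1L)
)

# Hypothetical lookup of which book each document comes from
books <- tibble(
  document = 1:3,
  title    = c("Pride and Prejudice", "The War of the Worlds",
               "Pride and Prejudice")
)

# Sparse document-by-word matrix suitable as glmnet input
sparse_words <- word_counts %>%
  cast_sparse(document, word, n)

# Align the labels with the row order of the sparse matrix
books_joined <- tibble(document = as.integer(rownames(sparse_words))) %>%
  left_join(books, by = "document")

is_jane <- books_joined$title == "Pride and Prejudice"
```

With a realistically sized corpus, `sparse_words` and `is_jane` would then be passed to `cv.glmnet()` as on the slide; three documents are far too few to fit a cross-validated model.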
THANK YOU
JULIA SILGE
@juliasilge https://juliasilge.com
Author portraits from Wikimedia
Photos by Glen Noble and Kimberly Farmer on Unsplash