Julia Silge
March 04, 2019
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Transcript
TEXT MINING
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO
I’m Julia Silge
Data Scientist at Stack Overflow
@juliasilge https://juliasilge.com/
TIDYTEXT
TEXT DATA IS INCREASINGLY IMPORTANT
NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = TIDYTEXT
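The equation on this slide can be illustrated with a minimal sketch, assuming the dplyr and tidytext packages are installed. The `text_df` data frame and its contents are hypothetical toy input, not data from the talk.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy input: one row per line of text
text_df <- tibble(
  line = 1:2,
  text = c("Text data is increasingly important",
           "Tidy data principles make text data easier to explore")
)

# Tidy data principle: reshape to one token (here, one word) per row.
# unnest_tokens() lowercases and strips punctuation by default.
tidy_words <- text_df %>%
  unnest_tokens(word, text)

# Count-based method: word frequencies, most common first
word_counts <- tidy_words %>%
  count(word, sort = TRUE)

print(word_counts)
```

Once the text is in a one-token-per-row table, ordinary data frame verbs such as `count()`, `filter()`, and joins apply to it directly.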
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
EXPLORATORY DATA ANALYSIS
N-GRAMS AND MORE WORDS
MACHINE LEARNING
EXPLORATORY DATA ANALYSIS
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT?
TERM FREQUENCY
INVERSE DOCUMENT FREQUENCY
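A minimal sketch of tf-idf on a toy corpus, assuming the dplyr and tidytext packages. The `docs` data frame is hypothetical and stands in for whatever collection of documents is being compared.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-document corpus
docs <- tibble(
  document = c("a", "b"),
  text = c("the rocket launch was a success",
           "the ocean survey was a success")
)

# Word counts per document
doc_words <- docs %>%
  unnest_tokens(word, text) %>%
  count(document, word)

# bind_tf_idf() adds tf, idf, and tf_idf columns.
# idf is ln(number of documents / number of documents containing the word),
# so words shared by every document get a tf-idf of zero.
doc_tf_idf <- doc_words %>%
  bind_tf_idf(word, document, n) %>%
  arrange(desc(tf_idf))

print(doc_tf_idf)
```

Words that appear in both toy documents score zero, while words unique to one document rise to the top, which is the sense in which tf-idf suggests what a document is about.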
• As part of the NASA Datanauts program, I worked on a project to understand NASA datasets
• Metadata includes title, description, keywords, etc.
TAKING TIDYTEXT TO THE NEXT LEVEL
N-GRAMS, NETWORKS, & NEGATION
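A minimal sketch of bigram tokenization and a simple negation check, assuming the dplyr, tidyr, and tidytext packages. The `reviews` data frame is a hypothetical example, and the slides for this section are images that the transcript does not capture.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical toy input
reviews <- tibble(
  id = 1:2,
  text = c("I am not happy with this", "I am happy with that")
)

# Tokenize into overlapping pairs of consecutive words (bigrams)
bigrams <- reviews %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

# Split each bigram into its two words, then keep pairs that start with "not"
negated <- bigrams %>%
  separate(bigram, into = c("word1", "word2"), sep = " ") %>%
  filter(word1 == "not")

print(negated)
```

The same `word1`/`word2` pairs, once counted, can serve as the edge list of a word network.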
TAKING TIDYTEXT TO THE NEXT LEVEL
TOPIC MODELING
TOPIC MODELING
• Each DOCUMENT = mixture of topics
• Each TOPIC = mixture of words
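The two mixtures above can be sketched with latent Dirichlet allocation, one common topic model. This is an illustration under assumptions: the transcript does not say which model or package the talk used, the example needs the dplyr, tidytext, topicmodels, and tm packages, and the `docs` corpus is a hypothetical toy far too small for meaningful topics.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Hypothetical toy corpus
docs <- tibble(
  document = c("d1", "d2", "d3", "d4"),
  text = c("rocket launch orbit rocket engine",
           "orbit engine launch rocket fuel",
           "ocean wave tide ocean current",
           "tide current wave ocean salt")
)

# Tidy word counts, cast to the document-term matrix that LDA() expects
dtm <- docs %>%
  unnest_tokens(word, text) %>%
  count(document, word) %>%
  cast_dtm(document, word, n)

# Fit a two-topic model; the seed makes the fit reproducible
lda <- LDA(dtm, k = 2, control = list(seed = 1234))

# Each TOPIC = mixture of words: per-topic word probabilities (beta)
topic_words <- tidy(lda, matrix = "beta")

# Each DOCUMENT = mixture of topics: per-document topic proportions (gamma)
doc_topics <- tidy(lda, matrix = "gamma")
```

Both results come back as tidy data frames, so the probabilities within each topic, and the proportions within each document, sum to one and can be plotted or filtered like any other table.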
TAKING TIDYTEXT TO THE NEXT LEVEL
TEXT CLASSIFICATION
TRAIN A GLMNET MODEL
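TEXT CLASSIFICATION
library(glmnet)
library(doMC)
registerDoMC(cores = 8)  # parallel backend with 8 cores for the cross-validation folds

# Response: TRUE for documents from "Pride and Prejudice", FALSE otherwise
is_jane <- books_joined$title == "Pride and Prejudice"

# Cross-validated, regularized logistic regression on the sparse word matrix
model <- cv.glmnet(sparse_words, is_jane, family = "binomial",
                   parallel = TRUE, keep = TRUE)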
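The slide uses `sparse_words` and `books_joined` without showing how they were built. One way such inputs might be constructed is sketched below, assuming the dplyr and tidytext packages. The `books` data frame, its text, and the title "Another Book" are hypothetical placeholders, not the talk's data.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy corpus: one row per document, labeled with its book title
books <- tibble(
  document = 1:4,
  title = c("Pride and Prejudice", "Pride and Prejudice",
            "Another Book", "Another Book"),
  text = c("a truth universally acknowledged",
           "a dance at the assembly rooms",
           "the machine landed on the common",
           "the machine opened slowly")
)

# Sparse document-by-word count matrix (rows = documents, columns = words)
sparse_words <- books %>%
  unnest_tokens(word, text) %>%
  count(document, word) %>%
  cast_sparse(document, word, n)

# Align the labels with the row order of the matrix before building the response
books_joined <- tibble(document = as.integer(rownames(sparse_words))) %>%
  left_join(select(books, document, title), by = "document")

is_jane <- books_joined$title == "Pride and Prejudice"
```

Joining on the matrix row names keeps the response vector in the same order as the rows of `sparse_words`. A corpus this small is only for showing the shapes involved; it is not enough data for `cv.glmnet()` to cross-validate on.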
THANK YOU
JULIA SILGE
@juliasilge https://juliasilge.com
Author portraits from Wikimedia
Photos by Glen Noble and Kimberly Farmer on Unsplash