Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Text Mining: Exploratory Data Analysis to Machi...
Search
Julia Silge
March 04, 2019
Technology
1
230
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Julia Silge
March 04, 2019
Tweet
Share
More Decks by Julia Silge
See All by Julia Silge
Introducing Positron
juliasilge
1
270
The right tool for the job
juliasilge
0
41
Good practices for applied machine learning
juliasilge
0
200
Applied machine learning with tidymodels
juliasilge
0
110
Maintaining an R Package
juliasilge
0
360
Publishing the Stack Overflow Developer Survey
juliasilge
2
67
Text Mining Using Tidy Data Principles
juliasilge
0
130
North American Developer Hiring Landscape
juliasilge
0
50
Understanding Principal Component Analysis Using Stack Overflow Data
juliasilge
13
4.5k
Other Decks in Technology
See All in Technology
生成AIでwebアプリケーションを作ってみた
tajimon
2
120
Agentic Workflowという選択肢を考える
tkikuchi1002
1
370
“社内”だけで完結していた私が、AWS Community Builder になるまで
nagisa53
1
160
知識を整理して未来を作る 〜SKDとAI協業への助走〜
yosh1995
0
140
Claude Code Actionを使ったコード品質改善の取り組み
potix2
PRO
3
1.4k
活きてなかったデータを活かしてみた話 / Shirokane Kougyou vol 19
sansan_randd
1
410
Абьюзим random_bytes(). Фёдор Кулаков, разработчик Lamoda Tech
lamodatech
0
270
Workflows から Agents へ ~ 生成 AI アプリの成長過程とアプローチ~
belongadmin
3
170
VISITS_AIIoTビジネス共創ラボ登壇資料.pdf
iotcomjpadmin
0
140
CIでのgolangci-lintの実行を約90%削減した話
kazukihayase
0
340
A2Aのクライアントを自作する
rynsuke
1
150
25分で解説する「最小権限の原則」を実現するための AWS「ポリシー」大全 / 20250625-aws-summit-aws-policy
opelab
6
700
Featured
See All Featured
Side Projects
sachag
455
42k
Navigating Team Friction
lara
187
15k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
PRO
20
1.3k
Building Applications with DynamoDB
mza
95
6.5k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
53
2.8k
Building a Scalable Design System with Sketch
lauravandoore
462
33k
Done Done
chrislema
184
16k
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
StorybookのUI Testing Handbookを読んだ
zakiyama
30
5.8k
Become a Pro
speakerdeck
PRO
28
5.4k
Making the Leap to Tech Lead
cromwellryan
134
9.3k
Build your cross-platform service in a week with App Engine
jlugia
231
18k
Transcript
T E X T M I N I N G
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO T I D Y T E X T Data
Scientist at Stack Overflow @juliasilge https://juliasilge.com/ I’m Julia Silge
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = T I D
Y T E X T
https://github.com/juliasilge/tidytext
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
T I D Y T E X T EXPLORATORY DATA
ANALYSIS N-GRAMS AND MORE WORDS MACHINE LEARNING
EXPLORATORY DATA ANALYSIS T I D Y T E X
T
from the Washington Post’s Wonkblog
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT? T I D Y T
E X T TERM FREQUENCY INVERSE DOCUMENT FREQUENCY
None
None
• As part of the NASA Datanauts program, I worked
on a project to understand NASA datasets • Metadata includes title, description, keywords, etc
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L N-GRAMS, NETWORKS, & NEGATION
None
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TOPIC MODELING
TOPIC MODELING T I D Y T E X T
•Each DOCUMENT = mixture of topics •Each TOPIC = mixture of words
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TEXT CLASSIFICATION
TRAIN A GLMNET MODEL T I D Y T E
X T
TEXT CLASSIFICATION T I D Y T E X T
> library(glmnet) > library(doMC) > registerDoMC(cores = 8) > > is_jane <- books_joined$title == "Pride and Prejudice" > > model <- cv.glmnet(sparse_words, is_jane, family = "binomial", + parallel = TRUE, keep = TRUE)
None
None
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com JULIA SILGE
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com Author portraits from Wikimedia Photos by Glen Noble and Kimberly Farmer on Unsplash JULIA SILGE