Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Text Mining: Exploratory Data Analysis to Machi...
Search
Julia Silge
March 04, 2019
Technology
1
220
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Julia Silge
March 04, 2019
Tweet
Share
More Decks by Julia Silge
See All by Julia Silge
Introducing Positron
juliasilge
1
240
The right tool for the job
juliasilge
0
31
Good practices for applied machine learning
juliasilge
0
190
Applied machine learning with tidymodels
juliasilge
0
100
Maintaining an R Package
juliasilge
0
350
Publishing the Stack Overflow Developer Survey
juliasilge
2
61
Text Mining Using Tidy Data Principles
juliasilge
0
120
North American Developer Hiring Landscape
juliasilge
0
41
Understanding Principal Component Analysis Using Stack Overflow Data
juliasilge
13
4.5k
Other Decks in Technology
See All in Technology
【Λ(らむだ)】最近のアプデ情報 / RPALT20250422
lambda
0
110
The Tale of Leo: Brave Lion and Curious Little Bug
canalun
1
130
今日からはじめるプラットフォームエンジニアリング
jacopen
4
240
品質文化を支える小さいクロスファンクショナルなチーム / Cross-functional teams fostering quality culture
toma_sm
0
120
ガバクラのAWS長期継続割引 ~次の4/1に慌てないために~
hamijay_cloud
1
270
Conquering PDFs: document understanding beyond plain text
inesmontani
PRO
0
110
エンジニアリングで組織のアウトカムを最速で最大化する!
ham0215
1
110
Cursor AgentによるパーソナルAIアシスタント育成入門―業務のプロンプト化・MCPの活用
os1ma
14
4.9k
技術者はかっこいいものだ!!~キルラキルから学んだエンジニアの生き方~
masakiokuda
2
270
システムとの会話から生まれる先手のDevOps
kakehashi
PRO
0
290
30代からでも遅くない! 内製開発の世界に飛び込み、最前線で戦うLLMアプリ開発エンジニアになろう
minorun365
PRO
11
3.3k
AIでめっちゃ便利になったけど、結局みんなで学ぶよねっていう話
kakehashi
PRO
0
180
Featured
See All Featured
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
30
2.3k
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.2k
Practical Orchestrator
shlominoach
186
11k
Why Our Code Smells
bkeepers
PRO
336
57k
Stop Working from a Prison Cell
hatefulcrawdad
268
20k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
656
60k
What’s in a name? Adding method to the madness
productmarketing
PRO
22
3.4k
The Straight Up "How To Draw Better" Workshop
denniskardys
232
140k
BBQ
matthewcrist
88
9.6k
Automating Front-end Workflow
addyosmani
1369
200k
Adopting Sorbet at Scale
ufuk
76
9.3k
The Cost Of JavaScript in 2023
addyosmani
49
7.7k
Transcript
T E X T M I N I N G
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO T I D Y T E X T Data
Scientist at Stack Overflow @juliasilge https://juliasilge.com/ I’m Julia Silge
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = T I D
Y T E X T
https://github.com/juliasilge/tidytext
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
T I D Y T E X T EXPLORATORY DATA
ANALYSIS N-GRAMS AND MORE WORDS MACHINE LEARNING
EXPLORATORY DATA ANALYSIS T I D Y T E X
T
from the Washington Post’s Wonkblog
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT? T I D Y T
E X T TERM FREQUENCY INVERSE DOCUMENT FREQUENCY
None
None
• As part of the NASA Datanauts program, I worked
on a project to understand NASA datasets • Metadata includes title, description, keywords, etc
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L N-GRAMS, NETWORKS, & NEGATION
None
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TOPIC MODELING
TOPIC MODELING T I D Y T E X T
•Each DOCUMENT = mixture of topics •Each TOPIC = mixture of words
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TEXT CLASSIFICATION
TRAIN A GLMNET MODEL T I D Y T E
X T
TEXT CLASSIFICATION T I D Y T E X T
> library(glmnet) > library(doMC) > registerDoMC(cores = 8) > > is_jane <- books_joined$title == "Pride and Prejudice" > > model <- cv.glmnet(sparse_words, is_jane, family = "binomial", + parallel = TRUE, keep = TRUE)
None
None
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com JULIA SILGE
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com Author portraits from Wikimedia Photos by Glen Noble and Kimberly Farmer on Unsplash JULIA SILGE