Julia Silge
March 04, 2019
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Transcript
TEXT MINING
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO
I’m Julia Silge
Data Scientist at Stack Overflow
@juliasilge
https://juliasilge.com/
TEXT DATA IS INCREASINGLY IMPORTANT
NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = TIDYTEXT
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
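The "tidy data principles + count-based methods" idea can be sketched as follows. This is a minimal illustration, not code from the talk: the two-line `text_df` input is made-up example data, and the object names are assumptions. It uses `unnest_tokens()` from tidytext to get one word per row, then `count()` from dplyr to tally words.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical toy input: one row per line of text
text_df <- tibble(
  line = 1:2,
  text = c("Text data is increasingly important",
           "Tidy text makes text analysis simple")
)

# Tidy text format: one token (here, one word) per row.
# unnest_tokens() lowercases and strips punctuation by default.
tidy_words <- text_df %>%
  unnest_tokens(word, text)

# Count-based method: how often does each word appear?
word_counts <- tidy_words %>%
  count(word, sort = TRUE)

print(word_counts)
```

Because the result is an ordinary data frame, the usual dplyr and ggplot2 verbs apply to it directly.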
EXPLORATORY DATA ANALYSIS
N-GRAMS AND MORE WORDS
MACHINE LEARNING
EXPLORATORY DATA ANALYSIS
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT?
TERM FREQUENCY
INVERSE DOCUMENT FREQUENCY
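A small sketch of how term frequency and inverse document frequency combine, using `bind_tf_idf()` from tidytext. The word counts below are invented for illustration and are not from the talk. A word that appears in every document gets an idf of ln(1) = 0, so its tf-idf is zero no matter how frequent it is.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical word counts for two tiny "documents"
word_counts <- tribble(
  ~document, ~word,   ~n,
  "physics", "the",   10,
  "physics", "quark",  3,
  "novel",   "the",   12,
  "novel",   "love",   4
)

# bind_tf_idf() adds tf, idf, and tf_idf columns:
#   tf     = n / (total words in that document)
#   idf    = ln(number of documents / documents containing the word)
#   tf_idf = tf * idf
doc_tf_idf <- word_counts %>%
  bind_tf_idf(word, document, n) %>%
  arrange(desc(tf_idf))

print(doc_tf_idf)
```

Here "the" scores zero in both documents, while "quark" and "love" are the words that distinguish one document from the other.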
• As part of the NASA Datanauts program, I worked on a project to understand NASA datasets
• Metadata includes title, description, keywords, etc.
TAKING TIDYTEXT TO THE NEXT LEVEL
N-GRAMS, NETWORKS, & NEGATION
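One way to look at n-grams and negation with tidytext, sketched on made-up sentences (the data and object names are assumptions, not taken from the slides). `unnest_tokens()` with `token = "ngrams"` produces overlapping bigrams; splitting each bigram with `tidyr::separate()` makes it possible to find the words that follow "not".

```r
library(dplyr)
library(tibble)
library(tidyr)
library(tidytext)

# Hypothetical toy sentences
text_df <- tibble(
  id = 1:2,
  text = c("I am not happy with this", "I am happy with this")
)

# One bigram (pair of consecutive words) per row
bigrams <- text_df %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

# Split each bigram into its two words and keep those preceded by "not"
negated <- bigrams %>%
  separate(bigram, into = c("word1", "word2"), sep = " ") %>%
  filter(word1 == "not")

print(negated)
```

The same `word1`/`word2` table can serve as an edge list for a word network, for example by counting pairs and passing them to a graph package.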
TAKING TIDYTEXT TO THE NEXT LEVEL
TOPIC MODELING
TOPIC MODELING
• Each DOCUMENT = mixture of topics
• Each TOPIC = mixture of words
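The two mixtures above can be made concrete with a small sketch. The slides do not say which topic model was used, so this assumes latent Dirichlet allocation via `topicmodels::LDA()`; the toy counts and the choice of `k = 2` are also assumptions. tidytext's `tidy()` method returns the per-topic word probabilities (`beta`) and the per-document topic proportions (`gamma`).

```r
library(dplyr)
library(tibble)
library(tidytext)
library(topicmodels)

# Hypothetical word counts for four tiny documents
word_counts <- tribble(
  ~document, ~term,    ~count,
  "space_1", "planet", 5,
  "space_1", "orbit",  3,
  "space_2", "planet", 4,
  "space_2", "rocket", 2,
  "cook_1",  "recipe", 5,
  "cook_1",  "oven",   3,
  "cook_2",  "recipe", 4,
  "cook_2",  "butter", 2
)

# Cast the tidy counts to a document-term matrix and fit a 2-topic LDA
dtm <- word_counts %>%
  cast_dtm(document, term, count)
lda <- LDA(dtm, k = 2, control = list(seed = 1234))

# Each TOPIC = mixture of words (beta sums to 1 within a topic)
topic_words <- tidy(lda, matrix = "beta")

# Each DOCUMENT = mixture of topics (gamma sums to 1 within a document)
doc_topics <- tidy(lda, matrix = "gamma")

print(topic_words)
print(doc_topics)
```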
TAKING TIDYTEXT TO THE NEXT LEVEL
TEXT CLASSIFICATION
TRAIN A GLMNET MODEL
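TEXT CLASSIFICATION

library(glmnet)
library(doMC)
registerDoMC(cores = 8)  # register 8 cores for parallel cross-validation

# TRUE for documents drawn from "Pride and Prejudice"
is_jane <- books_joined$title == "Pride and Prejudice"

# Cross-validated regularized logistic regression on the sparse word matrix
model <- cv.glmnet(sparse_words, is_jane, family = "binomial",
                   parallel = TRUE, keep = TRUE)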
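The slide assumes that `sparse_words` and `books_joined` already exist. A minimal sketch of one way such inputs could be built, using `cast_sparse()` from tidytext, is shown here. The toy counts, the document ids, the second book title, and the `doc_titles` lookup table are all hypothetical, and this is not necessarily how the talk prepared its data.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical tidy word counts: one row per document-word pair
tidy_counts <- tribble(
  ~document,  ~word,     ~n,
  "austen_1", "darcy",   2,
  "austen_1", "ball",    1,
  "wells_1",  "martian", 3,
  "wells_1",  "ball",    1,
  "austen_2", "darcy",   1
)

# Hypothetical lookup table: which book each document comes from
doc_titles <- tibble(
  document = c("austen_1", "austen_2", "wells_1"),
  title    = c("Pride and Prejudice", "Pride and Prejudice",
               "The War of the Worlds")
)

# Sparse matrix: one row per document, one column per word
sparse_words <- tidy_counts %>%
  cast_sparse(document, word, n)

# Align the titles with the matrix rows by joining on the row names
books_joined <- tibble(document = rownames(sparse_words)) %>%
  left_join(doc_titles, by = "document")

# Response vector in the same order as the rows of sparse_words
is_jane <- books_joined$title == "Pride and Prejudice"
```

Joining on `rownames(sparse_words)` keeps the response in the same order as the matrix rows, which matters because `cv.glmnet()` matches predictors and response by position. A real fit would need far more documents than this toy example.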
THANK YOU
JULIA SILGE
@juliasilge
https://juliasilge.com
Author portraits from Wikimedia
Photos by Glen Noble and Kimberly Farmer on Unsplash