Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Text Mining: Exploratory Data Analysis to Machi...
Search
Julia Silge
March 04, 2019
Technology
1
240
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Julia Silge
March 04, 2019
Tweet
Share
More Decks by Julia Silge
See All by Julia Silge
Introducing Positron
juliasilge
1
320
The right tool for the job
juliasilge
0
50
Good practices for applied machine learning
juliasilge
0
210
Applied machine learning with tidymodels
juliasilge
0
130
Maintaining an R Package
juliasilge
0
380
Publishing the Stack Overflow Developer Survey
juliasilge
2
76
Text Mining Using Tidy Data Principles
juliasilge
0
140
North American Developer Hiring Landscape
juliasilge
0
55
Understanding Principal Component Analysis Using Stack Overflow Data
juliasilge
13
4.5k
Other Decks in Technology
See All in Technology
AI ReadyなData PlatformとしてのAutonomous Databaseアップデート
oracle4engineer
PRO
0
190
Findy Team+のSOC2取得までの道のり
rvirus0817
0
360
Large Vision Language Modelを用いた 文書画像データ化作業自動化の検証、運用 / shibuya_AI
sansan_randd
0
110
社内報はAIにやらせよう / Let AI handle the company newsletter
saka2jp
3
290
KAGのLT会 #8 - 東京リージョンでGAしたAmazon Q in QuickSightを使って、報告用の資料を作ってみた
0air
0
210
KMP の Swift export
kokihirokawa
0
330
職種別ミートアップで社内から盛り上げる アウトプット文化の醸成と関係強化/ #DevRelKaigi
nishiuma
2
140
英語は話せません!それでも海外チームと信頼関係を作るため、対話を重ねた2ヶ月間のまなび
niioka_97
0
120
リーダーになったら未来を語れるようになろう/Speak the Future
sanogemaru
0
280
後進育成のしくじり〜任せるスキルとリーダーシップの両立〜
matsu0228
7
2.5k
フルカイテン株式会社 エンジニア向け採用資料
fullkaiten
0
9.1k
Oracle Base Database Service 技術詳細
oracle4engineer
PRO
11
77k
Featured
See All Featured
Become a Pro
speakerdeck
PRO
29
5.5k
Java REST API Framework Comparison - PWX 2021
mraible
33
8.8k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
32
2.2k
Six Lessons from altMBA
skipperchong
28
4k
Optimising Largest Contentful Paint
csswizardry
37
3.4k
Context Engineering - Making Every Token Count
addyosmani
5
190
GraphQLとの向き合い方2022年版
quramy
49
14k
Mobile First: as difficult as doing things right
swwweet
224
10k
Rebuilding a faster, lazier Slack
samanthasiow
84
9.2k
Intergalactic Javascript Robots from Outer Space
tanoku
273
27k
How to Think Like a Performance Engineer
csswizardry
27
2k
GitHub's CSS Performance
jonrohan
1032
460k
Transcript
T E X T M I N I N G
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO T I D Y T E X T Data
Scientist at Stack Overflow @juliasilge https://juliasilge.com/ I’m Julia Silge
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = T I D
Y T E X T
https://github.com/juliasilge/tidytext
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
T I D Y T E X T EXPLORATORY DATA
ANALYSIS N-GRAMS AND MORE WORDS MACHINE LEARNING
EXPLORATORY DATA ANALYSIS T I D Y T E X
T
from the Washington Post’s Wonkblog
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT? T I D Y T
E X T TERM FREQUENCY INVERSE DOCUMENT FREQUENCY
None
None
• As part of the NASA Datanauts program, I worked
on a project to understand NASA datasets • Metadata includes title, description, keywords, etc
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L N-GRAMS, NETWORKS, & NEGATION
None
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TOPIC MODELING
TOPIC MODELING T I D Y T E X T
•Each DOCUMENT = mixture of topics •Each TOPIC = mixture of words
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TEXT CLASSIFICATION
TRAIN A GLMNET MODEL T I D Y T E
X T
TEXT CLASSIFICATION T I D Y T E X T
> library(glmnet) > library(doMC) > registerDoMC(cores = 8) > > is_jane <- books_joined$title == "Pride and Prejudice" > > model <- cv.glmnet(sparse_words, is_jane, family = "binomial", + parallel = TRUE, keep = TRUE)
None
None
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com JULIA SILGE
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com Author portraits from Wikimedia Photos by Glen Noble and Kimberly Farmer on Unsplash JULIA SILGE