Julia Silge
March 04, 2019
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Transcript
TEXT MINING
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO
TIDYTEXT
I’m Julia Silge, Data Scientist at Stack Overflow
@juliasilge https://juliasilge.com/
TEXT DATA IS INCREASINGLY IMPORTANT
NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = TIDYTEXT
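To make the equation on this slide concrete, here is a minimal sketch in base R of the two ingredients: a tidy table with one token per row, and a count over it. The toy sentences and variable names are invented for illustration. In the tidytext package itself the tokenizing step is done by `unnest_tokens()`, usually followed by `dplyr::count()`; the base R calls below are stand-ins so the example runs without extra packages.

```r
# Two toy "documents" (made up for illustration).
docs <- data.frame(
  doc  = c("a", "b"),
  text = c("The cat sat on the mat", "The dog sat"),
  stringsAsFactors = FALSE
)

# Tokenize: lower-case, then split on anything that is not a letter or apostrophe.
tokens <- strsplit(tolower(docs$text), "[^a-z']+")

# Tidy format: one row per token, keeping track of the source document.
tidy_words <- data.frame(
  doc  = rep(docs$doc, lengths(tokens)),
  word = unlist(tokens),
  stringsAsFactors = FALSE
)

# Count-based method: how often does each word occur in each document?
word_counts <- as.data.frame(
  table(doc = tidy_words$doc, word = tidy_words$word),
  responseName = "n", stringsAsFactors = FALSE
)
word_counts <- word_counts[word_counts$n > 0, ]       # drop word/document pairs that never occur
word_counts <- word_counts[order(-word_counts$n), ]   # most frequent first

print(word_counts)
```

Because the result is an ordinary data frame, the usual filtering, joining, and plotting tools apply to it directly.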
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
• EXPLORATORY DATA ANALYSIS
• N-GRAMS AND MORE WORDS
• MACHINE LEARNING
EXPLORATORY DATA ANALYSIS
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT?
• TERM FREQUENCY
• INVERSE DOCUMENT FREQUENCY
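As a worked illustration of the two quantities named on this slide, the sketch below computes tf-idf by hand in base R on made-up counts. Term frequency is a word's share of its document; inverse document frequency is the natural log of the number of documents divided by the number of documents containing the word. tidytext's `bind_tf_idf()` computes the same quantities; the data and column names here are assumptions chosen for the example.

```r
# Toy word counts, one row per document-word pair (made-up numbers).
counts <- data.frame(
  doc  = c("a", "a", "a", "b", "b", "c", "c"),
  word = c("the", "cat", "mat", "the", "dog", "the", "cat"),
  n    = c(4, 2, 2, 3, 1, 5, 5),
  stringsAsFactors = FALSE
)

# Term frequency: each word's share of the total words in its document.
counts$tf <- counts$n / ave(counts$n, counts$doc, FUN = sum)

# Inverse document frequency: ln(number of documents / documents containing the word).
n_docs   <- length(unique(counts$doc))
doc_freq <- table(counts$word)   # each document-word pair appears once, so this counts documents
counts$idf <- log(n_docs / as.numeric(doc_freq[counts$word]))

# tf-idf is the product: high for words frequent in one document but rare elsewhere.
counts$tf_idf <- counts$tf * counts$idf

print(counts[order(-counts$tf_idf), ])
```

A word such as "the" that occurs in every document gets an idf of zero, so its tf-idf is zero no matter how often it appears; words concentrated in a single document rise to the top.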
• As part of the NASA Datanauts program, I worked on a project to understand NASA datasets
• Metadata includes title, description, keywords, etc.
TAKING TIDYTEXT TO THE NEXT LEVEL: N-GRAMS, NETWORKS, & NEGATION
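The three ideas in this slide title can be sketched together in a few lines of base R. The sentence is invented, and the approach is only an illustration: in tidytext, bigrams would typically come from `unnest_tokens()` with `token = "ngrams"` and `n = 2`, whereas here adjacent words are paired by hand.

```r
# A made-up sentence to tokenize.
text  <- "I do not like rain and I do not like fog but I like sunshine"
words <- strsplit(tolower(text), "[^a-z']+")[[1]]

# N-grams (here bigrams): pair every word with the word that follows it.
bigrams <- data.frame(
  word1 = head(words, -1),
  word2 = tail(words, -1),
  stringsAsFactors = FALSE
)

# Negation: words that directly follow "not". A word-by-word sentiment
# lookup would score "like" as positive and miss the reversal.
negated <- bigrams$word2[bigrams$word1 == "not"]

# Networks: each distinct bigram is a directed edge word1 -> word2,
# weighted by how many times it occurs.
edges <- aggregate(n ~ word1 + word2, data = cbind(bigrams, n = 1), FUN = sum)

print(negated)
print(edges[order(-edges$n), ])
```

The `edges` table is an edge list, which is the shape most graph-plotting packages expect as input.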
TAKING TIDYTEXT TO THE NEXT LEVEL: TOPIC MODELING
TOPIC MODELING
• Each DOCUMENT = mixture of topics
• Each TOPIC = mixture of words
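The two bullets above can be read as a matrix product, sketched here in base R with invented numbers rather than a fitted model. The names `beta` (per-topic word probabilities) and `gamma` (per-document topic proportions) are my choice, borrowed from the column names tidytext uses when tidying topic models; the words, topics, and values are hypothetical.

```r
# Hypothetical quantities for 2 topics, 3 words, 2 documents (made-up numbers).
# beta[k, w]: probability of word w under topic k. Each row sums to 1.
beta <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.3, 0.6),
               nrow = 2, byrow = TRUE,
               dimnames = list(topic = c("t1", "t2"),
                               word  = c("ship", "sea", "love")))

# gamma[d, k]: share of document d that comes from topic k. Each row sums to 1.
gamma <- matrix(c(0.9, 0.1,
                  0.2, 0.8),
                nrow = 2, byrow = TRUE,
                dimnames = list(document = c("d1", "d2"),
                                topic    = c("t1", "t2")))

# Mixing the topics in each document's proportions gives that
# document's own distribution over words.
doc_word <- gamma %*% beta

print(doc_word)
print(rowSums(doc_word))   # each document's word probabilities sum to 1
```

Fitting a topic model runs this in reverse: it starts from observed word counts per document and estimates both matrices.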
TAKING TIDYTEXT TO THE NEXT LEVEL: TEXT CLASSIFICATION
TRAIN A GLMNET MODEL
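TEXT CLASSIFICATION
library(glmnet)
library(doMC)
# Register 8 cores so cv.glmnet() can run its cross-validation folds in parallel
registerDoMC(cores = 8)

# books_joined and sparse_words are not defined in this transcript;
# they are assumed to come from earlier preparation steps
# Response: TRUE for rows whose title is "Pride and Prejudice"
is_jane <- books_joined$title == "Pride and Prejudice"

# Cross-validated logistic regression; keep = TRUE retains the out-of-fold fitted values
model <- cv.glmnet(sparse_words, is_jane, family = "binomial",
                   parallel = TRUE, keep = TRUE)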
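The transcript does not show how `sparse_words` and `is_jane` were built, so the following is only a sketch of the shape such inputs take: a matrix with one row per document and one column per word, plus a logical response aligned with the rows. The passages, labels, and variable names are invented. The matrix here is a dense one built with base R `table()`; in a tidytext workflow a sparse matrix would more likely come from `cast_sparse()`, which is an assumption about the original code.

```r
# Made-up passages with a known source (TRUE = first book, FALSE = second).
passages <- data.frame(
  id   = 1:6,
  text = c("the ball at the house", "a letter from her sister",
           "the sister and the ball", "the ship on the sea",
           "a storm at sea", "the letter on the ship"),
  is_first = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# One row per token, remembering which passage it came from.
tokens <- strsplit(passages$text, " ")
long <- data.frame(
  id   = rep(passages$id, lengths(tokens)),
  word = unlist(tokens),
  stringsAsFactors = FALSE
)

# Document-term matrix: rows are passages, columns are words, cells are counts.
dtm <- unclass(table(long$id, long$word))

# The response must have one entry per row of the matrix, in the same order.
response <- passages$is_first[match(rownames(dtm), passages$id)]

print(dim(dtm))
print(dtm[, c("ball", "sea", "the")])
```

A penalized model suits input of this shape because there are usually far more word columns than documents, and most cells are zero.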
THANK YOU
JULIA SILGE
@juliasilge https://juliasilge.com
Author portraits from Wikimedia
Photos by Glen Noble and Kimberly Farmer on Unsplash