Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
LDA and pLSA are probabilistic matrix factorization methods
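To make the matrix factorization view concrete, here is a minimal sketch of pLSA fitted with EM in plain NumPy. It factors a document-word count matrix into a document-topic matrix and a topic-word matrix whose rows are probability distributions. The function name `plsa`, its parameters, and the toy count matrix are illustrative assumptions, not the enstop implementation.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Toy pLSA via EM (illustrative sketch, not enstop's code).

    Returns doc_topic with rows P(z | d) and topic_word with rows P(w | z).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape

    # Random row-stochastic initialisation of both factors.
    doc_topic = rng.random((n_docs, n_topics))
    doc_topic /= doc_topic.sum(axis=1, keepdims=True)
    topic_word = rng.random((n_topics, n_words))
    topic_word /= topic_word.sum(axis=1, keepdims=True)

    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: P(z | d, w), shape (docs, topics, words).
        joint = doc_topic[:, :, None] * topic_word[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + eps)

        # M-step: re-estimate both factors from count-weighted responsibilities.
        weighted = counts[:, None, :] * resp
        topic_word = weighted.sum(axis=0)
        topic_word /= topic_word.sum(axis=1, keepdims=True) + eps
        doc_topic = weighted.sum(axis=2)
        doc_topic /= doc_topic.sum(axis=1, keepdims=True) + eps

    return doc_topic, topic_word


# Hypothetical corpus: 4 documents over a 6-word vocabulary, two word blocks.
counts = np.array(
    [
        [5, 3, 4, 0, 0, 0],
        [4, 4, 2, 0, 1, 0],
        [0, 0, 1, 3, 5, 4],
        [0, 1, 0, 4, 4, 5],
    ],
    dtype=float,
)

doc_topic, topic_word = plsa(counts, n_topics=2)

# The product approximates each document's word distribution P(w | d).
reconstruction = doc_topic @ topic_word
```

The dense three-way array in the E-step keeps the sketch short but only suits tiny inputs; a practical implementation would work on the sparse nonzero entries instead.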
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics?
Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
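The idea of pooling topics from many runs and keeping the clusters that recur can be sketched as follows. This is a simplified illustration under stated assumptions: the topic vectors are synthetic stand-ins for the output of repeated pLSA runs, and the clustering is a basic single-linkage grouping on Hellinger distance with a hand-picked threshold. The helper names `hellinger` and `stable_topics`, the threshold, and the minimum cluster size are all hypothetical, and the clustering method used by enstop may differ.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def stable_topics(topics, threshold=0.25, min_size=3):
    """Group topic vectors by single linkage; keep clusters seen often enough.

    Illustrative sketch only: threshold and min_size are hand-picked here.
    """
    n = len(topics)
    labels = -np.ones(n, dtype=int)
    next_label = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        # Flood fill: collect everything reachable through close neighbours.
        labels[i] = next_label
        stack = [i]
        while stack:
            j = stack.pop()
            for k in range(n):
                if labels[k] < 0 and hellinger(topics[j], topics[k]) < threshold:
                    labels[k] = next_label
                    stack.append(k)
        next_label += 1

    # A cluster with enough members counts as a stable topic; average it.
    stable = []
    for label in range(next_label):
        members = topics[labels == label]
        if len(members) >= min_size:
            mean = members.mean(axis=0)
            stable.append(mean / mean.sum())
    return np.array(stable)


# Synthetic stand-in for topics gathered from 5 independent runs.
rng = np.random.default_rng(0)
base_a = np.array([0.4, 0.3, 0.3, 0.0, 0.0, 0.0])
base_b = np.array([0.0, 0.0, 0.0, 0.3, 0.3, 0.4])


def noisy(base):
    """Perturb a topic slightly, mimicking run-to-run variation."""
    vec = base + rng.uniform(0.0, 0.02, size=base.size)
    return vec / vec.sum()


runs = []
for _ in range(5):
    runs.extend([noisy(base_a), noisy(base_b)])
runs.append(np.full(6, 1 / 6))  # a one-off topic that appears in a single run

all_topics = np.array(runs)
stable = stable_topics(all_topics)  # two recurring topics survive
```

Because each run is independent, the runs that feed `all_topics` can be computed in parallel, and the number of stable topics falls out of the clustering instead of being fixed in advance.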
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop