Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
LDA and pLSA are probabilistic matrix factorization methods
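For reference, the standard pLSA model can be written as a factorization. The symbols here (d for a document, w for a word, z for a topic, X for the row-normalised document-word matrix) are notation chosen for this note and do not come from the slides:

```latex
P(w \mid d) = \sum_{z} P(w \mid z)\, P(z \mid d)
\qquad\Longleftrightarrow\qquad
X \approx W H, \quad W_{dz} = P(z \mid d), \quad H_{zw} = P(w \mid z)
```

Both factors are row-stochastic: each row of W is a distribution over topics, and each row of H is a distribution over words.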
(Ensembles of) pLSA
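As a rough illustration of a single pLSA fit, the following is a minimal NumPy sketch of the textbook EM updates on a dense count matrix. It is not the enstop implementation; the function name `plsa`, its parameters, and the toy corpus are all assumptions made for this example.

```python
import numpy as np

def plsa(X, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a dense document-by-word count matrix X (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    # Random row-stochastic initialisation of P(z|d) and P(w|z).
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w), shape (docs, topics, words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate both factors from the expected counts.
        expected = X[:, None, :] * resp
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# Hypothetical toy corpus: 4 documents over a 4-word vocabulary.
X = np.array([
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [0, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

doc_topics, topic_words = plsa(X, n_topics=2)
# Each row of the product is the model's word distribution for one document.
reconstruction = doc_topics @ topic_words
```

Changing `seed` changes the starting point of the optimization, so separate runs need not converge to the same topics. The dense three-way array in the E-step is only practical for tiny examples.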
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics?
Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
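The idea of pooling topics from many runs and clustering them can be sketched as follows. This is a simplified stand-in, not the clustering enstop uses: the runs are simulated rather than fitted, and the greedy Hellinger-distance clustering, the `threshold` value, and the "appears in at least half the runs" rule are all assumptions made for this example.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def cluster_topics(topics, threshold=0.3):
    """Greedy clustering: join the first cluster whose centre is close enough."""
    clusters = []  # each cluster is a list of topic vectors
    for topic in topics:
        for members in clusters:
            if hellinger(topic, np.mean(members, axis=0)) < threshold:
                members.append(topic)
                break
        else:
            clusters.append([topic])
    return clusters

rng = np.random.default_rng(0)
n_runs, n_words = 10, 12
# Three hypothetical "true" topics, each concentrated on its own four words.
true_topics = np.full((3, n_words), 0.01)
for k in range(3):
    true_topics[k, 4 * k: 4 * k + 4] = 1.0
true_topics /= true_topics.sum(axis=1, keepdims=True)

all_topics = []
for _ in range(n_runs):
    # Each simulated run recovers the true topics with noise, in random order...
    noisy = true_topics * rng.uniform(0.8, 1.2, size=true_topics.shape)
    noisy /= noisy.sum(axis=1, keepdims=True)
    all_topics.extend(noisy[rng.permutation(3)])
    # ...plus one unstable topic that differs on every run.
    all_topics.append(rng.dirichlet(np.full(n_words, 0.2)))

clusters = cluster_topics(all_topics)
# Treat a topic as "stable" if its cluster has members from at least half the runs.
stable = [np.mean(c, axis=0) for c in clusters if len(c) >= n_runs // 2]
```

Here the number of stable topics falls out of the clustering instead of being fixed in advance, and the unstable per-run topics are left behind in small clusters.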
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop