Ensemble Topic Modelling
Leland McInnes
July 12, 2019
Research
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
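The matrix factorization view can be written down compactly. The symbols here (X, W, H, D, V, k) are illustrative assumptions and do not come from the slides: X is the document-by-word count matrix, W holds each document's weights over topics, and H holds each topic's weights over words.

```latex
X \approx W H, \qquad
X \in \mathbb{R}_{\ge 0}^{D \times V}, \quad
W \in \mathbb{R}_{\ge 0}^{D \times k}, \quad
H \in \mathbb{R}_{\ge 0}^{k \times V}
```

Here D is the number of documents, V the vocabulary size, and k the number of topics, with k much smaller than D and V.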
LDA and pLSA are probabilistic matrix factorization methods
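As a rough illustration of pLSA as a probabilistic factorization, the sketch below fits P(word | doc) = sum over topics of P(topic | doc) * P(word | topic) with plain EM on a toy count matrix. It is a minimal pure-Python sketch and not the enstop implementation; the function name `plsa`, its parameters, and the toy data are all assumptions made for this example.

```python
import random


def normalize(row):
    """Scale a non-negative row so it sums to 1 (uniform if it is all zeros)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA with EM (hypothetical helper, written for illustration).

    counts: list of documents, each a list of word counts.
    Returns (doc_topic, topic_word) where rows of doc_topic are P(topic | doc)
    and rows of topic_word are P(word | topic).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    doc_topic = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
                 for _ in range(n_docs)]
    topic_word = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
                  for _ in range(n_topics)]

    for _ in range(n_iter):
        new_doc_topic = [[0.0] * n_topics for _ in range(n_docs)]
        new_topic_word = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: responsibility of each topic for this (doc, word) pair.
                joint = [doc_topic[d][z] * topic_word[z][w]
                         for z in range(n_topics)]
                total = sum(joint)
                if total == 0:
                    continue
                # M-step accumulation: expected counts per topic.
                for z in range(n_topics):
                    expected = counts[d][w] * joint[z] / total
                    new_doc_topic[d][z] += expected
                    new_topic_word[z][w] += expected
        doc_topic = [normalize(row) for row in new_doc_topic]
        topic_word = [normalize(row) for row in new_topic_word]
    return doc_topic, topic_word


# Toy corpus: 4 documents over a 4-word vocabulary.
counts = [
    [5, 4, 0, 0],
    [4, 6, 0, 0],
    [0, 0, 5, 5],
    [0, 0, 6, 4],
]
doc_topic, topic_word = plsa(counts, n_topics=2)

# The product of the two factors is the model's P(word | doc) for each document.
reconstruction = [
    [sum(doc_topic[d][z] * topic_word[z][w] for z in range(2)) for w in range(4)]
    for d in range(4)
]
```

Both factors are row-stochastic, so every row of `reconstruction` is itself a probability distribution over words. That is what makes the factorization "probabilistic" rather than a generic low-rank approximation.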
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics?
Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
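One way to picture the idea: pool the topics from several independent runs, group the ones that are close to each other, and keep only groups that recur across runs. The sketch below uses Hellinger distance with a simple threshold-based single-linkage grouping as a stand-in. The transcript does not say which distance or clustering method is used, so `hellinger`, `stable_topics`, `threshold`, `min_support`, and the toy runs are all assumptions for illustration.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (range 0 to 1)."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * squared)


def stable_topics(runs, threshold=0.2, min_support=2):
    """Group topics pooled from several runs and average each recurring group.

    runs: list of runs, each a list of topic-word distributions.
    threshold and min_support are hypothetical tuning knobs for this sketch.
    """
    topics = [topic for run in runs for topic in run]
    parent = list(range(len(topics)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of topics closer than the threshold (single linkage).
    for i in range(len(topics)):
        for j in range(i + 1, len(topics)):
            if hellinger(topics[i], topics[j]) <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, topic in enumerate(topics):
        clusters.setdefault(find(i), []).append(topic)

    # A cluster with enough members is treated as one stable topic.
    stable = []
    for members in clusters.values():
        if len(members) >= min_support:
            stable.append([sum(col) / len(members) for col in zip(*members)])
    return stable


# Three toy runs over a 4-word vocabulary; topic order differs between runs,
# and the third run contains one extra topic that no other run reproduces.
runs = [
    [[0.70, 0.20, 0.05, 0.05], [0.05, 0.05, 0.50, 0.40]],
    [[0.05, 0.05, 0.45, 0.45], [0.65, 0.25, 0.05, 0.05]],
    [[0.68, 0.22, 0.05, 0.05], [0.05, 0.05, 0.40, 0.50],
     [0.25, 0.25, 0.25, 0.25]],
]
stable = stable_topics(runs)
print(len(stable), "stable topics found")
```

The number of stable topics falls out of the grouping rather than being fixed in advance, and the one-off topic from the third run is discarded because no other run supports it.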
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop