Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
LDA and pLSA are probabilistic matrix factorization methods
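To make the factorization concrete, here is the standard pLSA decomposition. The symbols are assumptions introduced for illustration, not notation from the slides: d is a document, w a word, z a latent topic, and K the number of topics.

```latex
% pLSA: each document's word distribution is a mixture over K topics
P(w \mid d) \;=\; \sum_{z=1}^{K} P(w \mid z)\, P(z \mid d)

% Equivalent matrix form, with X the row-normalised document-word matrix:
%   X \approx \Theta \, \Phi,
%   \Theta_{dz} = P(z \mid d)   (documents x topics, rows sum to 1)
%   \Phi_{zw}   = P(w \mid z)   (topics x words,     rows sum to 1)
```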
(Ensembles of) pLSA
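A single pLSA fit can be sketched with plain expectation-maximisation. This is a minimal dense NumPy illustration, not the enstop implementation; the function name `plsa` and its parameters are hypothetical, and it assumes every document contains at least one word.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA by EM on a dense (n_docs, n_words) count matrix.

    Returns (doc_topic, topic_word): P(z|d) and P(w|z), rows summing to 1.
    Illustrative only: the dense E-step needs O(docs * topics * words) memory.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape

    # Random initialisation, normalised so each row is a distribution.
    doc_topic = rng.random((n_docs, n_topics))
    doc_topic /= doc_topic.sum(axis=1, keepdims=True)
    topic_word = rng.random((n_topics, n_words))
    topic_word /= topic_word.sum(axis=1, keepdims=True)

    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: responsibilities P(z | d, w), shape (docs, topics, words).
        joint = doc_topic[:, :, None] * topic_word[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + eps)

        # M-step: re-estimate both factors from count-weighted responsibilities.
        weighted = counts[:, None, :] * resp
        topic_word = weighted.sum(axis=0)
        topic_word /= topic_word.sum(axis=1, keepdims=True) + eps
        doc_topic = weighted.sum(axis=2)
        doc_topic /= doc_topic.sum(axis=1, keepdims=True) + eps

    return doc_topic, topic_word
```

Changing `seed` changes the random initialisation, which is one way to produce the independent runs that an ensemble combines.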
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics? Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
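The idea of pooling topics from many runs and keeping only the clusters that recur can be sketched as follows. This is a simplified stand-in, not the clustering enstop uses: it assumes a greedy threshold clustering under Hellinger distance, and the names `stable_topics`, `max_dist`, and `min_support` are hypothetical.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 to 1)."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def stable_topics(runs, max_dist=0.3, min_support=None):
    """Pool topics from several runs and return the recurring ones.

    runs: list of (n_topics, n_words) arrays whose rows are word distributions.
    A cluster counts as stable if it has at least `min_support` members
    (default: more than half the number of runs).
    """
    if min_support is None:
        min_support = len(runs) // 2 + 1

    pooled = np.vstack(runs)
    clusters = []  # each cluster is a list of row indices into `pooled`

    for i, topic in enumerate(pooled):
        for members in clusters:
            centroid = pooled[members].mean(axis=0)
            if hellinger(topic, centroid) <= max_dist:
                members.append(i)
                break
        else:
            clusters.append([i])  # no close cluster found: start a new one

    # Average each well-supported cluster; the count of stable topics
    # falls out of the clustering rather than being fixed in advance.
    stable = [pooled[m].mean(axis=0) for m in clusters if len(m) >= min_support]
    return np.array(stable)
```

Topics that appear in only one or two runs never reach `min_support` and are dropped, which is the sense in which the surviving clusters are stable.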
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop
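A usage sketch, assuming the package exposes an `EnsembleTopics` estimator with sklearn-style `fit`, `components_`, and `embedding_`. These names are assumptions here and should be checked against the repository above. The helper `fit_ensemble_topics` is hypothetical, and scikit-learn is assumed to be installed for the vectoriser.

```python
def fit_ensemble_topics(documents, n_topics=20):
    """Vectorise raw text and fit an ensemble topic model (sketch).

    Returns (topic_word, doc_topic, vocabulary).
    """
    # Local imports so the helper can be defined without the packages present.
    from sklearn.feature_extraction.text import CountVectorizer
    from enstop import EnsembleTopics  # assumed class name; pip install enstop

    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(documents)  # sparse document-word counts

    model = EnsembleTopics(n_components=n_topics).fit(counts)

    # Assumed attribute names, following sklearn decomposition conventions.
    return model.components_, model.embedding_, vectorizer.get_feature_names_out()
```

Call it with a list of strings, one per document. The corpus needs to be reasonably large for the fitted topics to be meaningful.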