Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
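As a rough illustration of the matrix factorization view (the symbols D, V, K, X, W and H are notation chosen here, not taken from the slides): a document-by-word matrix X with D documents and V vocabulary words is approximated by the product of a document-by-topic matrix W and a topic-by-word matrix H with K topics. In the probabilistic reading used by pLSA, the same factorization is a mixture over a latent topic z.

```latex
X \approx W H,
\qquad X \in \mathbb{R}^{D \times V},\;
W \in \mathbb{R}^{D \times K},\;
H \in \mathbb{R}^{K \times V}

P(w \mid d) = \sum_{z=1}^{K} P(z \mid d)\, P(w \mid z)
```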
LDA and pLSA are probabilistic matrix factorization methods
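To make the factorization concrete, here is a minimal sketch of pLSA fitted with the standard EM updates in plain NumPy. It is not the enstop implementation; the function name `plsa`, its parameters and the toy count matrix are all assumptions made for illustration.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a (n_docs, n_words) count matrix (illustrative sketch).

    Returns P(topic | doc) with shape (n_docs, n_topics) and
    P(word | topic) with shape (n_topics, n_words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape

    # Random initial factors, each row normalised to a probability distribution.
    doc_topic = rng.random((n_docs, n_topics))
    doc_topic /= doc_topic.sum(axis=1, keepdims=True)
    topic_word = rng.random((n_topics, n_words))
    topic_word /= topic_word.sum(axis=1, keepdims=True)

    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
        resp = doc_topic[:, :, None] * topic_word[None, :, :]  # (docs, topics, words)
        resp /= resp.sum(axis=1, keepdims=True) + eps

        # M-step: re-estimate both factors from the expected counts.
        weighted = counts[:, None, :] * resp
        topic_word = weighted.sum(axis=0)
        topic_word /= topic_word.sum(axis=1, keepdims=True) + eps
        doc_topic = weighted.sum(axis=2)
        doc_topic /= doc_topic.sum(axis=1, keepdims=True) + eps

    return doc_topic, topic_word


# Toy corpus (made up): two documents use words 0-1, two use words 2-3.
counts = np.array(
    [[5, 4, 0, 0],
     [4, 5, 0, 0],
     [0, 0, 5, 4],
     [0, 0, 4, 5]],
    dtype=float,
)
doc_topic, topic_word = plsa(counts, n_topics=2)
reconstruction = doc_topic @ topic_word  # approximates P(word | doc)
```

The dense (docs, topics, words) array keeps the sketch short but would not scale to a real corpus; a practical implementation would work on the sparse nonzero entries only.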
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics? Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
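The idea of pooling topics from many runs and keeping the clusters that recur can be sketched as follows. This is only an illustration under stated assumptions: the greedy threshold clustering, the Hellinger distance, the names `hellinger` and `stable_topics`, the parameters `max_dist` and `min_support`, and the synthetic runs are all chosen here for simplicity and are not the clustering method used in the talk or in enstop.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def stable_topics(runs, max_dist=0.3, min_support=0.5):
    """Pool topics from several runs and keep clusters that recur (sketch).

    runs: list of (n_topics, n_words) arrays whose rows are P(word | topic).
    A cluster is kept if it contains topics from at least
    min_support * len(runs) distinct runs.
    """
    clusters = []  # each cluster: {"members": [topic vectors], "runs": {run ids}}
    for run_id, topics in enumerate(runs):
        for topic in topics:
            # Find the nearest cluster centre within max_dist, if any.
            best, best_dist = None, max_dist
            for cluster in clusters:
                centre = np.mean(cluster["members"], axis=0)
                dist = hellinger(topic, centre)
                if dist <= best_dist:
                    best, best_dist = cluster, dist
            if best is None:
                best = {"members": [], "runs": set()}
                clusters.append(best)
            best["members"].append(topic)
            best["runs"].add(run_id)

    needed = min_support * len(runs)
    kept = [np.mean(c["members"], axis=0) for c in clusters if len(c["runs"]) >= needed]
    return np.array(kept)


# Synthetic stand-in for ten model runs: noisy copies of two "true" topics,
# with one spurious uniform topic appearing in the first run only.
rng = np.random.default_rng(0)
true_topics = np.array(
    [[0.4, 0.4, 0.2, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.2, 0.4, 0.4]]
)
runs = []
for i in range(10):
    topics = true_topics + rng.uniform(0.0, 0.01, true_topics.shape)
    if i == 0:
        topics = np.vstack([topics, np.full(6, 1.0 / 6.0)])
    topics /= topics.sum(axis=1, keepdims=True)
    runs.append(topics)

stable = stable_topics(runs)
```

Because the number of kept clusters falls out of the data rather than being fixed in advance, a scheme like this yields the topic count as a by-product, and the individual runs are independent of one another.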
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop