Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
LDA and pLSA are probabilistic matrix factorization methods
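The factorization view can be written out for pLSA as follows. The notation is an assumption made for illustration and is not taken from the slides: $X$ is the document-by-word matrix with each row normalized to sum to 1, $D$ is the number of documents, $V$ the vocabulary size, and $K$ the number of topics.

$$
P(w \mid d) \;=\; \sum_{z=1}^{K} P(z \mid d)\, P(w \mid z)
\qquad\Longleftrightarrow\qquad
X \;\approx\; W H,
\quad W_{dz} = P(z \mid d),\; H_{zw} = P(w \mid z)
$$

Here $W$ is $D \times K$ and $H$ is $K \times V$, and every row of $W$ and of $H$ is a probability distribution. The rows of $H$ are the topics.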
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics?
Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
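The idea of running pLSA several times and keeping the topics that recur can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the enstop implementation: the function names (`plsa`, `stable_topics`, `hellinger`), the toy count matrix, the Hellinger threshold, and the greedy "join the first close cluster" step are all hypothetical choices made here, and the greedy step stands in for whatever clustering method the package actually uses.

```python
import random


def normalize(row):
    """Scale non-negative numbers so they sum to 1."""
    total = sum(row)
    return [x / total for x in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA with plain EM; returns topics as rows of P(word | topic)."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random starting point: this is why topics differ from run to run.
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    for _ in range(n_iter):
        new_w_z = [[1e-12] * n_words for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: responsibility of each topic for this (doc, word) pair.
                resp = normalize([p_z_d[d][z] * p_w_z[z][w] + 1e-12
                                  for z in range(n_topics)])
                # M-step accumulation, weighted by the observed count.
                for z in range(n_topics):
                    new_w_z[z][w] += counts[d][w] * resp[z]
                    new_z_d[d][z] += counts[d][w] * resp[z]
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z


def hellinger(p, q):
    """Hellinger distance between two distributions (0 identical, 1 disjoint)."""
    return (0.5 * sum((a ** 0.5 - b ** 0.5) ** 2 for a, b in zip(p, q))) ** 0.5


def stable_topics(counts, n_topics, n_runs=8, threshold=0.3, min_runs=5):
    """Pool topics from several runs and keep clusters seen in many runs."""
    clusters = []  # each cluster is a list of (run index, topic) pairs
    for run in range(n_runs):
        for topic in plsa(counts, n_topics, seed=run):
            # Greedy assignment: join the first cluster whose first member is close.
            for cluster in clusters:
                if hellinger(topic, cluster[0][1]) < threshold:
                    cluster.append((run, topic))
                    break
            else:
                clusters.append([(run, topic)])
    stable = []
    for cluster in clusters:
        # A cluster counts as stable if enough distinct runs contributed to it.
        if len({run for run, _ in cluster}) >= min_runs:
            members = [topic for _, topic in cluster]
            stable.append(normalize([sum(col) for col in zip(*members)]))
    return stable


# Toy corpus (made up): six documents per vocabulary block, six words in total.
counts = [[5, 4, 6, 0, 0, 0], [3, 6, 4, 0, 0, 0], [6, 5, 5, 0, 0, 0],
          [4, 4, 7, 0, 0, 0], [5, 6, 3, 0, 0, 0], [7, 3, 5, 0, 0, 0],
          [0, 0, 0, 5, 4, 6], [0, 0, 0, 3, 6, 4], [0, 0, 0, 6, 5, 5],
          [0, 0, 0, 4, 4, 7], [0, 0, 0, 5, 6, 3], [0, 0, 0, 7, 3, 5]]

stable = stable_topics(counts, n_topics=3)
for topic in stable:
    print([round(p, 2) for p in topic])
```

Each run depends only on its seed, so the loop over runs could be distributed across workers without any coordination. The number of topics returned is whatever survives the `min_runs` filter, not the `n_topics` passed to each individual run.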
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop