Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
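The factorization view can be written out as follows. The symbols are assumptions chosen for illustration (the slides do not name them): X is the document-by-word count matrix, W holds one topic mixture per document, and H holds one word distribution per topic.

```latex
X \approx W H, \qquad
X \in \mathbb{R}_{\ge 0}^{n \times m}, \quad
W \in \mathbb{R}_{\ge 0}^{n \times k}, \quad
H \in \mathbb{R}_{\ge 0}^{k \times m}
```

Here n is the number of documents, m the vocabulary size, and k the number of topics, with k much smaller than n and m.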
LDA and pLSA are probabilistic matrix factorization methods
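In pLSA the factorization is probabilistic: P(w | d) = sum over z of P(z | d) P(w | z), so both factors are row-stochastic matrices. The following is a minimal NumPy sketch of the standard EM updates for pLSA on a dense toy matrix. It is an illustration only, not the enstop implementation; the function name `fit_plsa`, its parameters, and the toy counts are all hypothetical.

```python
import numpy as np

def fit_plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA by EM on a dense (n_docs, n_words) count matrix.

    Returns (doc_topic, topic_word); each row of either matrix sums to 1.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random row-stochastic starting point; different seeds give different fits.
    doc_topic = rng.random((n_docs, n_topics))
    doc_topic /= doc_topic.sum(axis=1, keepdims=True)
    topic_word = rng.random((n_topics, n_words))
    topic_word /= topic_word.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z | d, w), shape (n_docs, n_topics, n_words).
        joint = doc_topic[:, :, None] * topic_word[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate both factors from expected counts.
        weighted = counts[:, None, :] * resp
        topic_word = weighted.sum(axis=0)
        topic_word /= topic_word.sum(axis=1, keepdims=True) + 1e-12
        doc_topic = weighted.sum(axis=2)
        doc_topic /= doc_topic.sum(axis=1, keepdims=True) + 1e-12
    return doc_topic, topic_word

# Toy corpus: two documents use words 0-1, two use words 2-3.
counts = np.array([[5, 4, 0, 0],
                   [4, 5, 0, 0],
                   [0, 0, 5, 4],
                   [0, 0, 4, 5]], dtype=float)
doc_topic, topic_word = fit_plsa(counts, n_topics=2)
recon = doc_topic @ topic_word  # model estimate of P(w | d) for each document
print(recon.round(2))
```

The dense three-dimensional responsibility array keeps the sketch short but would not scale to a real corpus; a practical implementation would iterate over the nonzero entries of a sparse matrix instead.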
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics? Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
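One way to picture the idea: pool the topics from many runs, group the ones that are close to each other, and keep only groups that appear in most runs. The sketch below is a simplified stand-in and makes several assumptions that are not from the slides: it uses a greedy threshold grouping on Hellinger distance rather than whatever clustering enstop actually performs, the names `hellinger` and `stable_topics` and the parameters `threshold` and `min_support` are hypothetical, and the "runs" are synthetic noisy copies of two known topics plus one spurious topic.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical)."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def stable_topics(runs, threshold=0.3, min_support=0.5):
    """Group similar topics across runs and average the well-supported groups.

    runs: list of (n_topics_i, n_words) arrays whose rows are word distributions.
    A group is kept if it contains topics from at least min_support of the runs.
    """
    clusters = []  # each cluster is a list of (run_index, topic_vector)
    for run_index, topics in enumerate(runs):
        for topic in topics:
            for cluster in clusters:
                centre = np.mean([t for _, t in cluster], axis=0)
                if hellinger(topic, centre) < threshold:
                    cluster.append((run_index, topic))
                    break
            else:
                # No existing cluster is close enough: start a new one.
                clusters.append([(run_index, topic)])
    stable = []
    for cluster in clusters:
        support = len({r for r, _ in cluster}) / len(runs)
        if support >= min_support:
            stable.append(np.mean([t for _, t in cluster], axis=0))
    return np.array(stable)

# Synthetic ensemble: every run recovers topics A and B up to noise,
# and one run also produces a spurious third topic.
rng = np.random.default_rng(0)
true_a = np.array([0.5, 0.5, 0.0, 0.0, 0.0, 0.0])
true_b = np.array([0.0, 0.0, 0.5, 0.5, 0.0, 0.0])
spurious = np.array([0.0, 0.0, 0.0, 0.0, 0.5, 0.5])

def noisy(topic):
    v = topic + 0.02 * rng.random(topic.size)
    return v / v.sum()

runs = [np.array([noisy(true_a), noisy(true_b)]) for _ in range(5)]
runs[2] = np.vstack([runs[2], noisy(spurious)])

stable = stable_topics(runs)
print(stable.round(2))
```

Because the number of surviving groups is decided by the data rather than fixed in advance, the number of stable topics falls out of the procedure instead of being a required input.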
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
Implementation
sklearn API
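For readers unfamiliar with the convention, an sklearn-style topic model is constructed with `n_components`, fitted with `fit` or `fit_transform` on a document-by-word count matrix, and exposes the topic-by-word matrix as `components_`. The sketch below shows that pattern with scikit-learn's own `LatentDirichletAllocation` as a stand-in, so it runs without enstop installed. That an enstop estimator can be swapped in by changing only the import and class name is an assumption here, not something verified by this example; the toy documents are invented.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Invented toy corpus, only to have something to fit.
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks fell as markets closed",
    "investors sold stocks and bonds",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)   # sparse (n_docs, n_words) counts

model = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = model.fit_transform(counts)  # (n_docs, n_topics), rows sum to 1
topic_words = model.components_           # (n_topics, n_words), unnormalized

# Recover the vocabulary in column order and show the top words per topic.
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
for k, row in enumerate(topic_words):
    top = [vocab[i] for i in row.argsort()[::-1][:3]]
    print(f"topic {k}: {top}")
```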
https://github.com/lmcinnes/enstop
pip install enstop