Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
LDA and pLSA are probabilistic matrix factorization methods
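As a brief illustration of the factorization view, the standard pLSA model can be written as follows. The symbols are notation chosen here, not taken from the slides: $d$ is a document, $w$ a word, $z$ a topic, $n(d, w)$ the count of word $w$ in document $d$, and $k$ the number of topics.

```latex
% pLSA: each document's word distribution is a mixture of topic distributions
P(w \mid d) = \sum_{z=1}^{k} P(w \mid z)\, P(z \mid d)

% Matrix form: X is the row-normalized document-by-word matrix,
% W has rows P(z | d) (documents by topics), H has rows P(w | z) (topics by words)
X \approx W H, \qquad W \in \mathbb{R}^{D \times k}, \quad H \in \mathbb{R}^{k \times V}

% pLSA fits W and H by maximizing the log-likelihood of the observed counts
\mathcal{L} = \sum_{d} \sum_{w} n(d, w) \log \sum_{z=1}^{k} P(w \mid z)\, P(z \mid d)
```

LDA uses the same factorization but places Dirichlet priors on the rows of $W$ and $H$.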
(Ensembles of) pLSA
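A minimal sketch of a single pLSA fit by expectation-maximization, in pure Python on a small dense count matrix. This is an illustration of the textbook algorithm only, not the enstop implementation; the function name `plsa` and its parameters are hypothetical.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a dense document-by-word count matrix (list of lists).

    Returns (doc_topic, topic_word): rows of doc_topic are P(z|d),
    rows of topic_word are P(w|z). Illustrative sketch, not enstop's code.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        if total == 0:  # empty row: fall back to a uniform distribution
            return [1.0 / len(row)] * len(row)
        return [v / total for v in row]

    # Random initialization; a different seed can converge to different topics.
    doc_topic = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
                 for _ in range(n_docs)]
    topic_word = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
                  for _ in range(n_topics)]

    for _ in range(n_iter):
        new_dt = [[0.0] * n_topics for _ in range(n_docs)]
        new_tw = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
                joint = [doc_topic[d][z] * topic_word[z][w]
                         for z in range(n_topics)]
                norm = sum(joint)
                if norm == 0.0:
                    continue
                for z in range(n_topics):
                    resp = n_dw * joint[z] / norm
                    new_dt[d][z] += resp
                    new_tw[z][w] += resp
        # M-step: renormalize the expected counts into distributions
        doc_topic = [normalize(row) for row in new_dt]
        topic_word = [normalize(row) for row in new_tw]
    return doc_topic, topic_word


# Toy corpus: two documents use words 0-1, two use words 2-3.
toy_counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]]
doc_topic, topic_word = plsa(toy_counts, n_topics=2)
```

Because the starting point is random, repeated calls with different `seed` values are independent of one another, which is what makes an ensemble of such runs straightforward to parallelize.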
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics? Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
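The idea of treating clusters of topics as stable topics can be sketched roughly as below. This is a simplified stand-in and not how enstop itself does it: the greedy threshold clustering, the use of Hellinger distance, and the names `stable_topics`, `threshold`, and `min_fraction` are all assumptions made for illustration. Topics from several runs are pooled, grouped by similarity, and a group is kept only if it recurs in enough runs.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))


def centroid(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def stable_topics(runs, threshold=0.3, min_fraction=0.5):
    """Pool topics from several runs and keep the clusters that recur.

    runs: list of runs, each a list of topic vectors P(w|z).
    A topic joins the first cluster whose centroid lies within `threshold`;
    otherwise it starts a new cluster. Clusters containing topics from at
    least `min_fraction` of the runs are returned as averaged topics.
    """
    clusters = []
    for run_id, topics in enumerate(runs):
        for topic in topics:
            for cluster in clusters:
                if hellinger(topic, centroid(cluster["members"])) <= threshold:
                    cluster["members"].append(topic)
                    cluster["runs"].add(run_id)
                    break
            else:
                clusters.append({"members": [topic], "runs": {run_id}})
    needed = min_fraction * len(runs)
    return [centroid(c["members"]) for c in clusters if len(c["runs"]) >= needed]


# Three toy runs over a 4-word vocabulary; the last topic of run 3 is a one-off.
toy_runs = [
    [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]],
    [[0.0, 0.0, 0.45, 0.55], [0.55, 0.45, 0.0, 0.0]],
    [[0.5, 0.4, 0.1, 0.0], [0.25, 0.25, 0.25, 0.25]],
]
stable = stable_topics(toy_runs)
```

In this sketch the number of stable topics is not specified up front; it is whatever number of clusters survives the recurrence filter.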
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop