Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Ensemble Topic Modelling
Search
Leland McInnes
July 12, 2019
Research
1
440
Ensemble Topic Modelling
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Leland McInnes
July 12, 2019
Tweet
Share
More Decks by Leland McInnes
See All by Leland McInnes
PyNNDescent: Fast Approximate Nearest Neighbors with Numba
lmcinnes
0
990
Word and Document Embeddings
lmcinnes
0
130
Topological Data Analysis
lmcinnes
1
310
Learning Topology: topological methods for unsupervised learning
lmcinnes
2
3.5k
A Guide to Dimension Reduction
lmcinnes
3
1.3k
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
lmcinnes
2
2.5k
Other Decks in Research
See All in Research
単施設でできる臨床研究の考え方
shuntaros
0
2.6k
AWSで実現した大規模日本語VLM学習用データセット "MOMIJI" 構築パイプライン/buiding-momiji
studio_graph
2
410
「どう育てるか」より「どう働きたいか」〜スクラムマスターの最初の一歩〜
hirakawa51
0
850
一人称視点映像解析の最先端(MIRU2025 チュートリアル)
takumayagi
6
3.4k
【輪講資料】Moshi: a speech-text foundation model for real-time dialogue
hpprc
3
630
AIグラフィックデザインの進化:断片から統合(One Piece)へ / From Fragment to One Piece: A Survey on AI-Driven Graphic Design
shunk031
0
410
大規模な2値整数計画問題に対する 効率的な重み付き局所探索法
mickey_kubo
1
350
時系列データに対する解釈可能な 決定木クラスタリング
mickey_kubo
2
920
IMC の細かすぎる話 2025
smly
2
600
When Submarine Cables Go Dark: Examining the Web Services Resilience Amid Global Internet Disruptions
irvin
0
300
AIによる画像認識技術の進化 -25年の技術変遷を振り返る-
hf149
7
3.9k
2025年度人工知能学会全国大会チュートリアル講演「深層基盤モデルの数理」
taiji_suzuki
25
18k
Featured
See All Featured
Building Better People: How to give real-time feedback that sticks.
wjessup
368
19k
Code Reviewing Like a Champion
maltzj
525
40k
Product Roadmaps are Hard
iamctodd
PRO
54
11k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
PRO
23
1.4k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
31
2.2k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
11
1.1k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3k
jQuery: Nuts, Bolts and Bling
dougneiner
64
7.9k
Gamification - CAS2011
davidbonilla
81
5.4k
A designer walks into a library…
pauljervisheath
207
24k
What's in a price? How to price your products and services
michaelherold
246
12k
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.4k
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
None
None
None
None
LDA and pLSA are probabilistic matrix factorization methods
(Ensembles of) pLSA
Performance?
None
Quality?
None
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics? Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
None
Each cluster represents a stable topic
None
• Greater stability • Determines number of topics automatically •
Embarrassingly parallel computation
Implementation
sklearn API
None
https://github.com/lmcinnes/enstop
pip install enstop