Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
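The factorization view can be written out explicitly. The notation below is an assumption made for illustration (the slides do not fix symbols): X is the document-by-word count matrix for n documents and m vocabulary words, and k is the number of topics.

```latex
X \approx W H,
\qquad
X \in \mathbb{R}_{\ge 0}^{n \times m},\quad
W \in \mathbb{R}_{\ge 0}^{n \times k},\quad
H \in \mathbb{R}_{\ge 0}^{k \times m}
```

Under this reading, each row of W describes one document as a mixture of topics, and each row of H describes one topic as a weighting over words.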
LDA and pLSA are probabilistic matrix factorization methods
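For pLSA the two factors are probability tables: P(w|d) is modelled as the sum over topics z of P(z|d) P(w|z). A minimal sketch of fitting this by EM follows. It is a textbook dense NumPy version on a made-up toy corpus, not the enstop implementation; the function name `plsa` and all parameter names are hypothetical.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA by EM on a dense (n_docs, n_words) count matrix.

    Returns P(z|d) with shape (n_docs, n_topics) and
    P(w|z) with shape (n_topics, n_words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialisation, normalised so that each row is a distribution.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        # Shape: (n_docs, n_topics, n_words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        resp = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-12)
        # M-step: re-estimate both factors from the expected counts.
        expected = counts[:, None, :] * resp
        p_w_z = expected.sum(axis=0)
        p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), 1e-12)
        p_z_d = expected.sum(axis=2)
        p_z_d /= np.maximum(p_z_d.sum(axis=1, keepdims=True), 1e-12)
    return p_z_d, p_w_z

# Toy corpus (invented): two documents use words 0-1, two use words 2-3.
counts = np.array([
    [4, 3, 0, 0],
    [5, 2, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 2, 5],
], dtype=float)

p_z_d, p_w_z = plsa(counts, n_topics=2)
# The product of the two factors approximates P(w|d) for every document.
reconstruction = p_z_d @ p_w_z
```

The dense three-dimensional array in the E-step keeps the sketch short but only suits tiny inputs; a practical implementation would work on the sparse non-zero counts.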
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics? Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
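The idea of pooling topics from many runs and clustering them can be sketched as follows. The slides do not say which distance or clustering method is used, so this example swaps in assumptions of its own: Hellinger distance, a simple threshold-linkage grouping, and synthetic stand-ins for the per-run topics. The names `stable_topics`, `threshold` and `min_size` are hypothetical.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def stable_topics(topics, threshold=0.3, min_size=3):
    """Group topic vectors pooled from many runs; keep groups that recur.

    topics: (n_topics_total, n_words) array, each row a word distribution.
    Returns one averaged topic per cluster with at least min_size members.
    """
    n = len(topics)
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Flood fill: link every topic closer than the threshold.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and hellinger(topics[i], topics[j]) < threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    result = []
    for c in range(current):
        members = topics[labels == c]
        if len(members) >= min_size:  # small clusters are treated as unstable
            mean = members.mean(axis=0)
            result.append(mean / mean.sum())
    return np.array(result)

# Synthetic stand-in for five model runs: each run recovers three underlying
# topics with some noise, plus one spurious topic that differs in every run.
rng = np.random.default_rng(0)
true_topics = np.array([
    [0.45, 0.45, 0.025, 0.025, 0.025, 0.025],
    [0.025, 0.025, 0.45, 0.45, 0.025, 0.025],
    [0.025, 0.025, 0.025, 0.025, 0.45, 0.45],
])
runs = []
for r in range(5):
    noisy = true_topics + rng.uniform(0, 0.02, size=true_topics.shape)
    noisy /= noisy.sum(axis=1, keepdims=True)
    spurious = np.full(6, 0.02)
    spurious[r] = 0.9  # peaked on a different word in each run
    runs.append(np.vstack([noisy, spurious]))

all_topics = np.vstack(runs)       # 20 candidate topics in total
stable = stable_topics(all_topics)  # only the recurring ones survive
```

Nothing in the call fixes how many topics come back; the count is whatever number of clusters reaches `min_size`.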
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop