Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
LDA and pLSA are probabilistic matrix factorization methods
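For reference, the standard pLSA decomposition can be read as a matrix factorization. The symbols here (X, U, V, n, m, k) are notation chosen for this note and do not come from the slides: X is the row-normalized document-by-word count matrix, the rows of U hold P(z | d), and the rows of V hold P(w | z).

```latex
P(w \mid d) \;=\; \sum_{z=1}^{k} P(z \mid d)\, P(w \mid z)
\qquad\Longleftrightarrow\qquad
X \approx U V,
\quad U \in \mathbb{R}^{n \times k},\;
V \in \mathbb{R}^{k \times m}
```

Each row of U and each row of V is a probability distribution, so both factors are non-negative and their rows sum to one.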
(Ensembles of) pLSA
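A minimal sketch of a single pLSA fit by expectation-maximization, written in dependency-free Python for readability rather than speed. The function name `plsa`, its parameters, and the toy count matrix are illustrative assumptions and are not taken from the talk or from enstop.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a dense document-by-word count matrix.

    counts is a list of rows (one per document). Returns (doc_topic,
    topic_word), where doc_topic[d][z] approximates P(z | d) and
    topic_word[z][w] approximates P(w | z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        if total == 0:  # empty document or dead topic: fall back to uniform
            return [1.0 / len(row)] * len(row)
        return [v / total for v in row]

    # Random positive initialization; different seeds give different fits.
    doc_topic = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
                 for _ in range(n_docs)]
    topic_word = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
                  for _ in range(n_topics)]

    for _ in range(n_iter):
        new_dt = [[0.0] * n_topics for _ in range(n_docs)]
        new_tw = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                post = [doc_topic[d][z] * topic_word[z][w]
                        for z in range(n_topics)]
                norm = sum(post)
                if norm == 0:
                    continue
                # M-step accumulation: expected counts per topic.
                for z in range(n_topics):
                    r = counts[d][w] * post[z] / norm
                    new_dt[d][z] += r
                    new_tw[z][w] += r
        doc_topic = [normalize(row) for row in new_dt]
        topic_word = [normalize(row) for row in new_tw]
    return doc_topic, topic_word


# Toy corpus (hypothetical): two documents per word block.
counts = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
doc_topic, topic_word = plsa(counts, n_topics=2)
```

Calling `plsa` several times with different `seed` values produces the kind of independent runs that an ensemble would combine.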
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics?
Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
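One way to picture "each cluster represents a stable topic" is to pool the topics from several runs, group similar ones, and keep only the groups that recur across runs. The sketch below uses a simple greedy grouping by Hellinger distance as a stand-in. It is not claimed to be the clustering method used in the talk or in enstop, and the function names, the thresholds, and the toy topic vectors are all illustrative assumptions.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return math.sqrt(
        0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    )


def centroid(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def stable_topics(runs, max_dist=0.3, min_runs=None):
    """Group topics pooled from several runs; keep groups that recur.

    runs is a list of runs, each a list of topic vectors (rows of P(w | z)).
    A group is kept when it has members from at least min_runs distinct runs
    (a strict majority by default). Returns one averaged vector per group.
    """
    if min_runs is None:
        min_runs = len(runs) // 2 + 1
    clusters = []
    for run_id, topics in enumerate(runs):
        for topic in topics:
            # Greedy assignment: join the first cluster that is close enough.
            for cluster in clusters:
                if hellinger(topic, centroid(cluster["members"])) <= max_dist:
                    cluster["members"].append(topic)
                    cluster["runs"].add(run_id)
                    break
            else:
                clusters.append({"members": [topic], "runs": {run_id}})
    return [centroid(c["members"]) for c in clusters
            if len(c["runs"]) >= min_runs]


# Toy topics over a 4-word vocabulary (hypothetical): two topics recur
# across runs, while the uniform topic appears in only one run.
runs = [
    [[0.48, 0.50, 0.01, 0.01], [0.01, 0.01, 0.50, 0.48]],
    [[0.02, 0.00, 0.50, 0.48], [0.50, 0.46, 0.02, 0.02]],
    [[0.50, 0.48, 0.01, 0.01], [0.25, 0.25, 0.25, 0.25]],
]
stable = stable_topics(runs)
```

The number of surviving groups is decided by the data rather than fixed in advance, and a topic that shows up in only one run is discarded as unstable.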
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop
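A hedged usage sketch of the scikit-learn style interface. The class name `EnsembleTopics`, the `n_components` parameter, and the `components_` and `embedding_` attributes are written from memory of the project README and should be treated as assumptions to check against the repository. The wrapper function `fit_ensemble_topics` is hypothetical.

```python
def fit_ensemble_topics(documents, n_topics=20):
    """Vectorize raw text and fit an ensemble topic model (sketch).

    Imports are kept inside the function so this sketch can be loaded
    without scikit-learn or enstop installed.
    """
    from sklearn.feature_extraction.text import CountVectorizer
    from enstop import EnsembleTopics  # assumed class name, see note above

    # Bag-of-words counts: one row per document, one column per word.
    counts = CountVectorizer().fit_transform(documents)

    # Assumed to follow the usual scikit-learn fit() convention.
    model = EnsembleTopics(n_components=n_topics).fit(counts)

    # Assumed attributes: topic-by-word and document-by-topic matrices.
    return model.components_, model.embedding_
```

Called with a list of strings, this should return the topic-word matrix and the per-document topic vectors, assuming the names above match the installed version.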