Ensemble Topic Modelling
Leland McInnes
July 12, 2019
Research
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
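As a rough illustration of the factorization view (the slides show it only as images), a documents-by-words matrix of word probabilities can be written as the product of a documents-by-topics matrix and a topics-by-words matrix. The sizes and variable names below are made up for the sketch, and the matrices are random rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_topics, n_words = 6, 2, 10  # illustrative sizes only

# Each row of doc_topic is a distribution P(topic | document).
doc_topic = rng.dirichlet(np.ones(n_topics), size=n_docs)
# Each row of topic_word is a distribution P(word | topic).
topic_word = rng.dirichlet(np.ones(n_words), size=n_topics)

# The product is P(word | document): one word distribution per document.
doc_word = doc_topic @ topic_word

print(doc_word.shape)  # (6, 10)
```

Topic modelling runs this in reverse: given an observed documents-by-words matrix, find the two low-rank factors.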
LDA and pLSA are probabilistic matrix factorization methods
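For pLSA specifically, the factorization is P(word | doc) = sum over topics of P(topic | doc) P(word | topic), fitted by expectation-maximization. The following is a minimal dense NumPy sketch of that textbook EM loop, not the enstop implementation; the function name `plsa`, its parameters, and the toy count matrix are assumptions made for illustration.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a dense (documents x words) count matrix."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initial P(topic | doc) and P(word | topic), rows normalized.
    doc_topic = rng.random((n_docs, n_topics))
    doc_topic /= doc_topic.sum(axis=1, keepdims=True)
    topic_word = rng.random((n_topics, n_words))
    topic_word /= topic_word.sum(axis=1, keepdims=True)
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: P(topic | doc, word), shape (docs, topics, words).
        joint = doc_topic[:, :, None] * topic_word[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: re-estimate both factors from count-weighted responsibilities.
        weighted = counts[:, None, :] * resp
        topic_word = weighted.sum(axis=0)
        topic_word /= topic_word.sum(axis=1, keepdims=True) + eps
        doc_topic = weighted.sum(axis=2)
        doc_topic /= doc_topic.sum(axis=1, keepdims=True) + eps
    return doc_topic, topic_word

# Toy corpus: 4 documents over a 4-word vocabulary (made-up counts).
counts = np.array([[5, 4, 0, 0],
                   [3, 6, 0, 1],
                   [0, 0, 7, 2],
                   [0, 1, 3, 5]], dtype=float)
doc_topic, topic_word = plsa(counts, n_topics=2)
print(doc_topic.round(2))
```

The dense three-way array keeps the sketch short but would not scale to a real corpus, where sparse counts are the norm.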
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics? Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
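The ensemble idea can be sketched as follows: pool the topics from several independent runs, cluster them, and keep only clusters that recur across runs. The transcript does not say which clustering method is used, so this sketch substitutes a simple greedy grouping by Hellinger distance. The runs are simulated (noisy copies of three hypothetical topics plus one spurious topic per run) rather than fitted, and `threshold` and `min_size` are arbitrary illustrative values.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def stable_topics(topics, threshold=0.3, min_size=3):
    """Greedily group topics; keep the mean of each sufficiently large group."""
    clusters = []
    for topic in topics:
        for members in clusters:
            # Join the first cluster whose first member is close enough.
            if hellinger(topic, members[0]) < threshold:
                members.append(topic)
                break
        else:
            clusters.append([topic])
    return [np.mean(m, axis=0) for m in clusters if len(m) >= min_size]

rng = np.random.default_rng(0)
n_words, n_runs = 9, 5

# Three hypothetical underlying topics, each concentrated on its own 3 words.
true_topics = np.full((3, n_words), 0.1 / 6)
for k in range(3):
    true_topics[k, 3 * k : 3 * k + 3] = 0.3

pooled = []
for run in range(n_runs):
    # Simulated run: noisy, renormalized copies of the underlying topics...
    noisy = true_topics * np.exp(0.1 * rng.standard_normal(true_topics.shape))
    noisy /= noisy.sum(axis=1, keepdims=True)
    # ...plus one spurious topic that is different in every run.
    spurious = np.full(n_words, 0.0125)
    spurious[run] = 0.9
    pooled.extend(rng.permutation(np.vstack([noisy, spurious])))

stable = stable_topics(pooled)
print(len(stable))  # 3
```

The number of stable topics falls out of the clustering rather than being fixed in advance, and the spurious one-off topics are discarded because they never form a large enough cluster.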
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop