Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Ensemble Topic Modelling
Search
Leland McInnes
July 12, 2019
Research
1
450
Ensemble Topic Modelling
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Leland McInnes
July 12, 2019
Tweet
Share
More Decks by Leland McInnes
See All by Leland McInnes
PyNNDescent: Fast Approximate Nearest Neighbors with Numba
lmcinnes
0
1k
Word and Document Embeddings
lmcinnes
0
140
Topological Data Analysis
lmcinnes
1
320
Learning Topology: topological methods for unsupervised learning
lmcinnes
2
3.5k
A Guide to Dimension Reduction
lmcinnes
3
1.3k
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
lmcinnes
2
2.5k
Other Decks in Research
See All in Research
EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues
satai
3
250
SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images
satai
3
330
EcoWikiRS: Learning Ecological Representation of Satellite Images from Weak Supervision with Species Observation and Wikipedia
satai
3
280
なめらかなシステムと運用維持の終わらぬ未来 / dicomo2025_coherently_fittable_system
monochromegane
0
4.1k
論文紹介: ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
hisaokatsumi
0
110
VectorLLM: Human-like Extraction of Structured Building Contours via Multimodal LLMs
satai
4
350
引力・斥力を制御可能なランダム部分集合の確率分布
wasyro
0
260
EOGS: Gaussian Splatting for Efficient Satellite Image Photogrammetry
satai
4
690
財務諸表監査のための逐次検定
masakat0
0
150
Sat2City:3D City Generation from A Single Satellite Image with Cascaded Latent Diffusion
satai
3
160
Mechanistic Interpretability:解釈可能性研究の新たな潮流
koshiro_aoki
1
490
Panopticon: Advancing Any-Sensor Foundation Models for Earth Observation
satai
3
250
Featured
See All Featured
4 Signs Your Business is Dying
shpigford
185
22k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
30
2.9k
What's in a price? How to price your products and services
michaelherold
246
12k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
667
130k
Java REST API Framework Comparison - PWX 2021
mraible
34
8.9k
Agile that works and the tools we love
rasmusluckow
331
21k
Testing 201, or: Great Expectations
jmmastey
45
7.7k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
46
7.7k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4k
YesSQL, Process and Tooling at Scale
rocio
173
15k
Leading Effective Engineering Teams in the AI Era
addyosmani
7
580
Visualization
eitanlees
149
16k
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
None
None
None
None
LDA and pLSA are probabilistic matrix factorization methods
(Ensembles of) pLSA
Performance?
None
Quality?
None
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics? Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
None
Each cluster represents a stable topic
None
• Greater stability • Determines number of topics automatically •
Embarrassingly parallel computation
Implementation
sklearn API
None
https://github.com/lmcinnes/enstop
pip install enstop