Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Faceting analyzed fields with some sprinkles of...
Search
Boaz Leskes
June 04, 2013
Technology
0
66
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Tweet
Share
More Decks by Boaz Leskes
See All by Boaz Leskes
Every Shard Deserves a Home - Shard Allocation in Elasticsearch
bleskes
0
330
Life of a Document in Elasticsearch
bleskes
3
3.3k
Resiliency in Elasticsearch & Lucene
bleskes
0
520
Resiliency in Elasticsearch & Lucene
bleskes
0
240
Designing Concurrent Distributed Sequence Numbers for Elasticsearch
bleskes
2
720
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
1
370
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
6
680
The ELK Stack: For Real-Time Enlightenment
bleskes
1
1.7k
Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch
bleskes
1
320
Other Decks in Technology
See All in Technology
機械学習を「社会実装」するということ 2025年冬版 / Social Implementation of Machine Learning November 2025 Version
moepy_stats
4
520
Introduction to Sansan for Engineers / エンジニア向け会社紹介
sansan33
PRO
5
45k
ローカルLLM基礎知識 / local LLM basics 2025
kishida
23
9k
SRE視点で振り返るメルカリのアーキテクチャ変遷と普遍的な考え
foostan
2
2.2k
『ソフトウェア』で『リアル』を動かす:クレーンゲームからデータ基盤までの統一アーキテクチャ / アーキテクチャConference 2025
genda
0
1.2k
Android Studio Otter の最新 Gemini 機能 / Latest Gemini features in Android Studio Otter
yanzm
0
450
The Complete Android UI Testing Landscape: From Journey to Traditional Approaches
alexzhukovich
1
120
Building AI Applications with Java, LLMs, and Spring AI
thomasvitale
1
250
リアーキテクティングのその先へ 〜品質と開発生産性の壁を越えるプラットフォーム戦略〜 / architecture-con2025
visional_engineering_and_design
0
7.8k
DDD x Microservice Architecture : Findy Architecture Conf 2025
syobochim
13
5.7k
.NET 10のASP. NET Core注目の新機能
tomokusaba
0
140
改竄して学ぶコンテナサプライチェーンセキュリティ ~コンテナイメージの完全性を目指して~/tampering-container-supplychain-security
mochizuki875
1
400
Featured
See All Featured
jQuery: Nuts, Bolts and Bling
dougneiner
65
8k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
9
1k
How to train your dragon (web standard)
notwaldorf
97
6.4k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
249
1.3M
Practical Orchestrator
shlominoach
190
11k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
666
130k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4.1k
Fireside Chat
paigeccino
41
3.7k
GraphQLとの向き合い方2022年版
quramy
49
14k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
55
3.1k
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.5k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
118
20k
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures
trending topic analysis and other interesting insights Boaz Leskes Elasticsearch @bleskes work done for Buzzcapture
Trending?
© Buzzcapture
© Buzzcapture
reference reference topic © Buzzcapture
topic reference ≠
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
None
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
P(w|T) = kDt |w 2 Dt k kDt k
brown dog fox quick 2 5 10 12 5 6
12 13 2 5 6 10 12 13 brown dog fox quick
In our index. • Terms = 12GB • “Arrows” =
41GB
{ tweet: { type: "string", analyzer: "whitespace" fielddata: { filter:
{ regex: "^#.*", frequency: { min: 10 } } } } } Drop terms which occur too little
Drop docs with too many terms
reference reference topic © Buzzcapture
iculture 10,122 floor 8,998 cover 6,874 toy 4,402 ground 3,841
4.0 7,878 4.1 4,292 rtacties 4,078 jelly 2,905 bean 2,857