Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Faceting analyzed fields with some sprinkles of...
Search
Boaz Leskes
June 04, 2013
Technology
0
64
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Tweet
Share
More Decks by Boaz Leskes
See All by Boaz Leskes
Every Shard Deserves a Home - Shard Allocation in Elasticsearch
bleskes
0
320
Life of a Document in Elasticsearch
bleskes
3
3.3k
Resiliency in Elasticsearch & Lucene
bleskes
0
510
Resiliency in Elasticsearch & Lucene
bleskes
0
230
Designing Concurrent Distributed Sequence Numbers for Elasticsearch
bleskes
2
720
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
1
370
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
6
680
The ELK Stack: For Real-Time Enlightenment
bleskes
1
1.7k
Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch
bleskes
1
310
Other Decks in Technology
See All in Technology
AI時代の開発を加速する組織づくり - ブログでは書けなかったリアル
hiro8ma
1
310
Implementing and Evaluating a High-Level Language with WasmGC and the Wasm Component Model: Scala’s Case
tanishiking
0
180
OSSで50の競合と戦うためにやったこと
yamadashy
3
970
AI-Readyを目指した非構造化データのメダリオンアーキテクチャ
r_miura
1
310
OpenTelemetry が拡げる Gemini CLI の可観測性
phaya72
2
2.2k
ソースを読むプロセスの例
sat
PRO
15
9.9k
Introduction to Sansan for Engineers / エンジニア向け会社紹介
sansan33
PRO
5
44k
Oracle Base Database Service 技術詳細
oracle4engineer
PRO
13
82k
[2025年10月版] Databricks Data + AI Boot Camp
databricksjapan
1
260
【SORACOM UG Explorer 2025】さらなる10年へ ~ SORACOM MVC 発表
soracom
PRO
0
130
FinOps について (ちょっと) 本気出して考えてみた
skmkzyk
0
210
SCONE - 動画配信の帯域を最適化する新プロトコル
kazuho
1
370
Featured
See All Featured
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
55
3k
Done Done
chrislema
185
16k
Bootstrapping a Software Product
garrettdimon
PRO
307
110k
Rebuilding a faster, lazier Slack
samanthasiow
84
9.2k
Product Roadmaps are Hard
iamctodd
PRO
55
11k
Why Our Code Smells
bkeepers
PRO
340
57k
How To Stay Up To Date on Web Technology
chriscoyier
791
250k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
16k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
116
20k
Imperfection Machines: The Place of Print at Facebook
scottboms
269
13k
Docker and Python
trallard
46
3.6k
Art, The Web, and Tiny UX
lynnandtonic
303
21k
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures
trending topic analysis and other interesting insights Boaz Leskes Elasticsearch @bleskes work done for Buzzcapture
Trending?
© Buzzcapture
© Buzzcapture
reference reference topic © Buzzcapture
topic reference ≠
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
None
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
P(w|T) = kDt |w 2 Dt k kDt k
brown dog fox quick 2 5 10 12 5 6
12 13 2 5 6 10 12 13 brown dog fox quick
In our index. • Terms = 12GB • “Arrows” =
41GB
{ tweet: { type: "string", analyzer: "whitespace" fielddata: { filter:
{ regex: "^#.*", frequency: { min: 10 } } } } } Drop terms which occur too little
Drop docs with too many terms
reference reference topic © Buzzcapture
iculture 10,122 floor 8,998 cover 6,874 toy 4,402 ground 3,841
4.0 7,878 4.1 4,292 rtacties 4,078 jelly 2,905 bean 2,857