Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Faceting analyzed fields with some sprinkles of...
Search
Boaz Leskes
June 04, 2013
Technology
0
63
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Tweet
Share
More Decks by Boaz Leskes
See All by Boaz Leskes
Every Shard Deserves a Home - Shard Allocation in Elasticsearch
bleskes
0
320
Life of a Document in Elasticsearch
bleskes
3
3.3k
Resiliency in Elasticsearch & Lucene
bleskes
0
510
Resiliency in Elasticsearch & Lucene
bleskes
0
230
Designing Concurrent Distributed Sequence Numbers for Elasticsearch
bleskes
2
710
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
1
370
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
6
670
The ELK Stack: For Real-Time Enlightenment
bleskes
1
1.7k
Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch
bleskes
1
310
Other Decks in Technology
See All in Technology
DBのスキルで生き残る技術 - AI時代におけるテーブル設計の勘所
soudai
PRO
50
20k
Delta airlines Customer®️ USA Contact Numbers: Complete 2025 Support Guide
deltahelp
0
730
面倒な作業はAIにおまかせ。Flutter開発をスマートに効率化
ruideengineer
0
260
Zero Data Loss Autonomous Recovery Service サービス概要
oracle4engineer
PRO
2
7.8k
品質と速度の両立:生成AI時代の品質保証アプローチ
odasho
1
370
マネジメントって難しい、けどおもしろい / Management is tough, but fun! #em_findy
ar_tama
7
1.1k
開発生産性を測る前にやるべきこと - 組織改善の実践 / Before Measuring Dev Productivity
kaonavi
12
5.3k
Lufthansa ®️ USA Contact Numbers: Complete 2025 Support Guide
lufthanahelpsupport
0
200
Enhancing SaaS Product Reliability and Release Velocity through Optimized Testing Approach
ropqa
1
230
整頓のジレンマとの戦い〜Tidy First?で振り返る事業とキャリアの歩み〜/Fighting the tidiness dilemma〜Business and Career Milestones Reflected on in Tidy First?〜
bitkey
3
17k
KubeCon + CloudNativeCon Japan 2025 Recap by CA
ponkio_o
PRO
0
300
united airlines ™®️ USA Contact Numbers: Complete 2025 Support Guide
flyunitedhelp
1
360
Featured
See All Featured
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
48
2.9k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
31
1.3k
Testing 201, or: Great Expectations
jmmastey
43
7.6k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
53
2.9k
Facilitating Awesome Meetings
lara
54
6.4k
Faster Mobile Websites
deanohume
307
31k
Git: the NoSQL Database
bkeepers
PRO
430
65k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
10
960
Java REST API Framework Comparison - PWX 2021
mraible
31
8.7k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
181
54k
Unsuck your backbone
ammeep
671
58k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
138
34k
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures
trending topic analysis and other interesting insights Boaz Leskes Elasticsearch @bleskes work done for Buzzcapture
Trending?
© Buzzcapture
© Buzzcapture
reference reference topic © Buzzcapture
topic reference ≠
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
None
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
P(w|T) = kDt |w 2 Dt k kDt k
brown dog fox quick 2 5 10 12 5 6
12 13 2 5 6 10 12 13 brown dog fox quick
In our index. • Terms = 12GB • “Arrows” =
41GB
{ tweet: { type: "string", analyzer: "whitespace" fielddata: { filter:
{ regex: "^#.*", frequency: { min: 10 } } } } } Drop terms which occur too little
Drop docs with too many terms
reference reference topic © Buzzcapture
iculture 10,122 floor 8,998 cover 6,874 toy 4,402 ground 3,841
4.0 7,878 4.1 4,292 rtacties 4,078 jelly 2,905 bean 2,857