Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Faceting analyzed fields with some sprinkles of...
Search
Boaz Leskes
June 04, 2013
Technology
0
52
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Tweet
Share
More Decks by Boaz Leskes
See All by Boaz Leskes
Every Shard Deserves a Home - Shard Allocation in Elasticsearch
bleskes
0
310
Life of a Document in Elasticsearch
bleskes
3
3.2k
Resiliency in Elasticsearch & Lucene
bleskes
0
500
Resiliency in Elasticsearch & Lucene
bleskes
0
220
Designing Concurrent Distributed Sequence Numbers for Elasticsearch
bleskes
2
700
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
1
360
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
6
650
The ELK Stack: For Real-Time Enlightenment
bleskes
1
1.7k
Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch
bleskes
1
310
Other Decks in Technology
See All in Technology
リーダブルテストコード 〜メンテナンスしやすい テストコードを作成する方法を考える〜 #DevSumi #DevSumiB / Readable test code
nihonbuson
11
7.4k
30分でわかる『アジャイルデータモデリング』
hanon52_
9
2.7k
PHPカンファレンス名古屋-テックリードの経験から学んだ設計の教訓
hayatokudou
2
420
生成 AI プロダクトを育てる技術 〜データ品質向上による継続的な価値創出の実践〜
icoxfog417
PRO
2
210
SA Night #2 FinatextのSA思想/SA Night #2 Finatext session
satoshiimai
1
140
ハッキングの世界に迫る~攻撃者の思考で考えるセキュリティ~
nomizone
13
5.2k
Classmethod AI Talks(CATs) #17 司会進行スライド(2025.02.19) / classmethod-ai-talks-aka-cats_moderator-slides_vol17_2025-02-19
shinyaa31
0
130
なぜ私は自分が使わないサービスを作るのか? / Why would I create a service that I would not use?
aiandrox
0
770
Cloud Spanner 導入で実現した快適な開発と運用について
colopl
1
740
オブザーバビリティの観点でみるAWS / AWS from observability perspective
ymotongpoo
8
1.5k
開発スピードは上がっている…品質はどうする? スピードと品質を両立させるためのプロダクト開発の進め方とは #DevSumi #DevSumiB / Agile And Quality
nihonbuson
2
3.1k
Culture Deck
optfit
0
430
Featured
See All Featured
Why You Should Never Use an ORM
jnunemaker
PRO
55
9.2k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
160
15k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
193
16k
Reflections from 52 weeks, 52 projects
jeffersonlam
348
20k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
The Art of Programming - Codeland 2020
erikaheidi
53
13k
Thoughts on Productivity
jonyablonski
69
4.5k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
100
18k
BBQ
matthewcrist
87
9.5k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
Fireside Chat
paigeccino
34
3.2k
GraphQLとの向き合い方2022年版
quramy
44
13k
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures
trending topic analysis and other interesting insights Boaz Leskes Elasticsearch @bleskes work done for Buzzcapture
Trending?
© Buzzcapture
© Buzzcapture
reference reference topic © Buzzcapture
topic reference ≠
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
None
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
P(w|T) = kDt |w 2 Dt k kDt k
brown dog fox quick 2 5 10 12 5 6
12 13 2 5 6 10 12 13 brown dog fox quick
In our index. • Terms = 12GB • “Arrows” =
41GB
{ tweet: { type: "string", analyzer: "whitespace" fielddata: { filter:
{ regex: "^#.*", frequency: { min: 10 } } } } } Drop terms which occur too little
Drop docs with too many terms
reference reference topic © Buzzcapture
iculture 10,122 floor 8,998 cover 6,874 toy 4,402 ground 3,841
4.0 7,878 4.1 4,292 rtacties 4,078 jelly 2,905 bean 2,857