Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Faceting analyzed fields with some sprinkles of...
Search
Boaz Leskes
June 04, 2013
Technology
0
49
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Tweet
Share
More Decks by Boaz Leskes
See All by Boaz Leskes
Every Shard Deserves a Home - Shard Allocation in Elasticsearch
bleskes
0
310
Life of a Document in Elasticsearch
bleskes
3
3.2k
Resiliency in Elasticsearch & Lucene
bleskes
0
500
Resiliency in Elasticsearch & Lucene
bleskes
0
220
Designing Concurrent Distributed Sequence Numbers for Elasticsearch
bleskes
2
690
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
1
360
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
6
650
The ELK Stack: For Real-Time Enlightenment
bleskes
1
1.7k
Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch
bleskes
1
310
Other Decks in Technology
See All in Technology
PHP ユーザのための OpenTelemetry 入門 / phpcon2024-opentelemetry
shin1x1
3
1.3k
バクラクのドキュメント解析技術と実データにおける課題 / layerx-ccc-winter-2024
shimacos
2
1.1k
継続的にアウトカムを生み出し ビジネスにつなげる、 戦略と運営に対するタイミーのQUEST(探求)
zigorou
0
670
AWS re:Invent 2024 ふりかえり勉強会
yhana
0
270
型情報を用いたLintでコード品質を向上させる
sansantech
PRO
2
110
スタートアップで取り組んでいるAzureとMicrosoft 365のセキュリティ対策/How to Improve Azure and Microsoft 365 Security at Startup
yuj1osm
0
230
pg_bigmをRustで実装する(第50回PostgreSQLアンカンファレンス@オンライン 発表資料)
shinyakato_
0
110
マイクロサービスにおける容易なトランザクション管理に向けて
scalar
0
150
LINEスキマニにおけるフロントエンド開発
lycorptech_jp
PRO
0
330
AWS re:Invent 2024で発表された コードを書く開発者向け機能について
maruto
0
210
生成AIをより賢く エンジニアのための RAG入門 - Oracle AI Jam Session #20
kutsushitaneko
4
270
PHPerのための計算量入門/Complexity101 for PHPer
hanhan1978
5
260
Featured
See All Featured
Typedesign – Prime Four
hannesfritz
40
2.4k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
29
900
What's in a price? How to price your products and services
michaelherold
243
12k
For a Future-Friendly Web
brad_frost
175
9.4k
StorybookのUI Testing Handbookを読んだ
zakiyama
27
5.3k
Statistics for Hackers
jakevdp
796
220k
Fantastic passwords and where to find them - at NoRuKo
philnash
50
2.9k
The Illustrated Children's Guide to Kubernetes
chrisshort
48
48k
Building Applications with DynamoDB
mza
91
6.1k
4 Signs Your Business is Dying
shpigford
182
21k
Build your cross-platform service in a week with App Engine
jlugia
229
18k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
232
17k
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures
trending topic analysis and other interesting insights Boaz Leskes Elasticsearch @bleskes work done for Buzzcapture
Trending?
© Buzzcapture
© Buzzcapture
reference reference topic © Buzzcapture
topic reference ≠
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
None
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
P(w|T) = kDt |w 2 Dt k kDt k
brown dog fox quick 2 5 10 12 5 6
12 13 2 5 6 10 12 13 brown dog fox quick
In our index. • Terms = 12GB • “Arrows” =
41GB
{ tweet: { type: "string", analyzer: "whitespace" fielddata: { filter:
{ regex: "^#.*", frequency: { min: 10 } } } } } Drop terms which occur too little
Drop docs with too many terms
reference reference topic © Buzzcapture
iculture 10,122 floor 8,998 cover 6,874 toy 4,402 ground 3,841
4.0 7,878 4.1 4,292 rtacties 4,078 jelly 2,905 bean 2,857