Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures trending topic analysis and other interesting insights
Boaz Leskes, Elasticsearch, @bleskes
Work done for Buzzcapture
Trending?
© Buzzcapture
[Figure labels: reference, reference, topic] © Buzzcapture
topic ≠ reference
$$P(w \mid T) = \frac{\lVert D_t \mid w \in D_t \rVert}{\lVert D_t \rVert}$$
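One way to read the estimate above is as the fraction of documents in a set that contain the word w. The following is a minimal Python sketch of that reading, not the code from the talk: the helper name `term_probabilities`, the whitespace tokenization, and the sample documents are all assumptions made for illustration. It computes the estimate for a topic set and a reference set so the two can be compared.

```python
from collections import Counter


def term_probabilities(docs):
    """Return {term: fraction of docs containing the term}.

    Hypothetical helper: each document is a string, split on whitespace.
    """
    total = len(docs)
    if total == 0:
        return {}
    doc_freq = Counter()
    for doc in docs:
        # set() so a term repeated inside one document is counted once
        doc_freq.update(set(doc.split()))
    return {term: count / total for term, count in doc_freq.items()}


# Invented sample data, only to exercise the function
topic_docs = ["quick brown fox", "brown dog", "quick quick fox"]
reference_docs = ["brown dog", "lazy dog", "brown fox", "lazy cat"]

p_topic = term_probabilities(topic_docs)
p_reference = term_probabilities(reference_docs)
```

In this toy data "quick" appears in two of the three topic documents and in none of the reference documents, which is the kind of difference a trending-topic analysis would look for.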
[Figure: the terms brown, dog, fox, quick linked to document IDs 2, 5, 6, 10, 12, 13, shown in both directions (terms to documents and documents to terms)]
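The two directions in this slide appear to correspond to an inverted index (term to documents) and its uninverted form (document to terms), which is the direction faceting needs. A small Python sketch of both mappings follows. It is an illustration only: the function names and the document contents are invented, and the real structures in Lucene and Elasticsearch are far more compact.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def uninvert(index):
    """Turn term -> doc IDs back into doc ID -> sorted terms."""
    per_doc = defaultdict(set)
    for term, ids in index.items():
        for doc_id in ids:
            per_doc[doc_id].add(term)
    return {doc_id: sorted(terms) for doc_id, terms in per_doc.items()}


# Invented documents; the assignment of terms to IDs is not from the slide
docs = {2: "quick brown fox", 5: "brown dog", 6: "dog"}

index = build_inverted_index(docs)
field_data = uninvert(index)
```

Every (term, document) pair is one "arrow" in the picture, so the number of arrows grows with the number of distinct terms per document rather than with the vocabulary size alone.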
In our index:
• Terms = 12GB
• “Arrows” = 41GB
{
  tweet: {
    type: "string",
    analyzer: "whitespace",
    fielddata: {
      filter: {
        regex: "^#.*",
        frequency: { min: 10 }
      }
    }
  }
}
Drop terms that occur too rarely
Drop docs with too many terms
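The effect of the regex and frequency filters can be imitated outside Elasticsearch. The Python sketch below is an assumption-laden illustration, not Elasticsearch's implementation: `filter_terms` is a hypothetical helper, `min_doc_freq` is treated as an absolute document count, and the sample tweets are invented. It keeps only terms that match the pattern and occur in at least `min_doc_freq` documents.

```python
import re
from collections import Counter


def filter_terms(docs, pattern, min_doc_freq):
    """Return {term: document frequency} for terms passing both filters."""
    doc_freq = Counter()
    for doc in docs:
        # Whitespace split mirrors the "whitespace" analyzer on the slide
        doc_freq.update(set(doc.split()))
    regex = re.compile(pattern)
    return {
        term: count
        for term, count in doc_freq.items()
        if regex.match(term) and count >= min_doc_freq
    }


# Invented sample tweets; a threshold of 2 stands in for the slide's 10
tweets = ["#fox is quick", "#fox and #dog", "the #fox", "a dog"]
kept = filter_terms(tweets, r"^#.*", 2)
```

Here only "#fox" survives: "#dog" matches the pattern but occurs in a single tweet, and the remaining terms are not hashtags. Fewer surviving terms means fewer entries to hold in memory.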
iculture   10,122
floor       8,998
cover       6,874
toy         4,402
ground      3,841

4.0         7,878
4.1         4,292
rtacties    4,078
jelly       2,905
bean        2,857