Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin Buzzwords 2013
Boaz Leskes
June 04, 2013
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures trending topic analysis and other interesting insights
Boaz Leskes, Elasticsearch (@bleskes)
Work done for Buzzcapture
Trending?
© Buzzcapture
reference reference topic
topic ≠ reference
topic reference
$$P(w \mid T) = \frac{\lVert \{ D_t \mid w \in D_t \} \rVert}{\lVert D_t \rVert}$$
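The formula estimates P(w|T) as the fraction of topic documents D_t that contain the word w, which is roughly the per-term document count a terms facet returns over the topic's documents, divided by the number of those documents. The same estimate over the reference set gives P(w|R), and the "topic ≠ reference" slide suggests comparing the two. Below is a minimal Python sketch assuming lowercased whitespace tokenization and, as a hypothetical scoring choice, the ratio P(w|T)/P(w|R) with add-one smoothing on the reference side; the talk's actual scoring is not shown in this transcript, and the function names are illustrative.

```python
from collections import Counter


def doc_frequencies(docs: list[str]) -> tuple[Counter, int]:
    """Per term, count how many documents contain it (lowercased whitespace tokens)."""
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))  # set(): each term counts once per document
    return df, len(docs)


def p_w_given_set(docs: list[str]) -> dict[str, float]:
    """P(w|S) = ||{D in S | w in D}|| / ||S||, the fraction of documents containing w."""
    df, n = doc_frequencies(docs)
    return {w: c / n for w, c in df.items()}


def rank_terms(topic: list[str], reference: list[str], top: int = 5) -> list[tuple[str, float]]:
    """Rank topic terms by P(w|T) / P(w|R), highest first (ties broken alphabetically).

    Hypothetical scoring: the ratio and the add-one smoothing of P(w|R)
    are illustrative assumptions, not taken from the talk.
    """
    p_topic = p_w_given_set(topic)
    df_ref, n_ref = doc_frequencies(reference)
    scores = {
        # smoothed P(w|R) = (df + 1) / (n + 1) is never zero, so no division by zero
        w: p / ((df_ref[w] + 1) / (n_ref + 1))
        for w, p in p_topic.items()
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
```

Smoothing matters here because a term that never appears in the reference set would otherwise get an infinite score; with it, such terms simply rank highest.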
[Figure: the terms brown, dog, fox and quick connected by arrows to document IDs 2, 5, 6, 10, 12 and 13, drawn once from the terms' side and once from the documents' side]
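Search walks the inverted index from a term to its documents, but faceting needs the opposite direction: for each matching document, which terms does it contain? A minimal Python sketch of that "un-inverting" step and a term count over the hits, using hypothetical documents built from the figure's terms (the document texts and function names are assumptions, not the talk's data):

```python
from collections import Counter, defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """term -> sorted doc IDs, like the postings used for search (whitespace tokens)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(index.items())}


def uninvert(index: dict[str, list[int]]) -> tuple[list[str], dict[int, list[int]]]:
    """Flip term -> docs into doc -> term ordinals, the direction faceting reads."""
    terms = sorted(index)  # a term's ordinal is its position in this sorted list
    doc_to_ords = defaultdict(list)
    for ordinal, term in enumerate(terms):
        for doc_id in index[term]:
            doc_to_ords[doc_id].append(ordinal)  # one "arrow" per (doc, term) pair
    return terms, dict(doc_to_ords)


def term_facet(hits: list[int], terms: list[str], doc_to_ords: dict[int, list[int]]) -> Counter:
    """For the matching docs only, count how many of them contain each term."""
    counts = Counter()
    for doc_id in hits:
        for ordinal in doc_to_ords.get(doc_id, []):
            counts[terms[ordinal]] += 1
    return counts
```

Both structures must sit in memory for fast faceting: the sorted term list and the per-document lists of ordinals, whose total length is the number of arrows.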
In our index:
• Terms = 12GB
• "Arrows" = 41GB
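A quick check on these two figures, assuming they are the two components of the loaded field data, shows which part is worth shrinking:

$$12\,\text{GB} + 41\,\text{GB} = 53\,\text{GB}, \qquad \frac{41\,\text{GB}}{53\,\text{GB}} \approx 0.77$$

Roughly three quarters of that memory goes to the document-to-term links rather than to the terms themselves.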
{
  "tweet": {
    "type": "string",
    "analyzer": "whitespace",
    "fielddata": {
      "filter": {
        "regex": { "pattern": "^#.*" },
        "frequency": { "min": 10 }
      }
    }
  }
}
Drop terms that occur too rarely
Drop docs with too many terms
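The mapping keeps only hashtag terms (matching `^#.*`) that occur in at least 10 documents, so rare terms never get loaded into field data. The Python sketch below only simulates the effect of those two filters on a list of whitespace-tokenized texts; it is not Elasticsearch code, the function name is hypothetical, and it uses one global document count, whereas Elasticsearch evaluates these filters per segment as field data is loaded.

```python
import re
from collections import Counter


def filter_fielddata_terms(docs: list[str], pattern: str = r"^#.*", min_df: int = 10) -> dict[str, int]:
    """Keep terms matching `pattern` that appear in at least `min_df` documents."""
    regex = re.compile(pattern)
    df = Counter()
    for text in docs:
        df.update(set(text.split()))  # whitespace analyzer; a term counts once per document
    # regex filter first, then the minimum document-frequency filter
    return {term: count for term, count in df.items() if regex.match(term) and count >= min_df}
```

Every dropped term also drops all of its arrows, which is where most of the memory sits.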
iculture    10,122
floor        8,998
cover        6,874
toy          4,402
ground       3,841

4.0          7,878
4.1          4,292
rtacties     4,078
jelly        2,905
bean         2,857