Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Faceting analyzed fields with some sprinkles of...
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
Boaz Leskes
June 04, 2013
Technology
0
72
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Tweet
Share
More Decks by Boaz Leskes
See All by Boaz Leskes
Every Shard Deserves a Home - Shard Allocation in Elasticsearch
bleskes
0
330
Life of a Document in Elasticsearch
bleskes
3
3.3k
Resiliency in Elasticsearch & Lucene
bleskes
0
540
Resiliency in Elasticsearch & Lucene
bleskes
0
250
Designing Concurrent Distributed Sequence Numbers for Elasticsearch
bleskes
2
730
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
1
380
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
6
690
The ELK Stack: For Real-Time Enlightenment
bleskes
1
1.7k
Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch
bleskes
1
340
Other Decks in Technology
See All in Technology
EMからVPoEを経てCTOへ:マネジメントキャリアパスにおける葛藤と成長
kakehashi
PRO
9
1.4k
LINE Messengerの次世代ストレージ選定
lycorptech_jp
PRO
19
7.7k
開発組織の課題解決を加速するための権限委譲 -する側、される側としての向き合い方-
daitasu
5
340
AWS SES VDMで 将来の配信事故を防げた話
moyashi
0
250
JAWS DAYS 2026 楽しく学ぼう!ストレージ 入門
yoshiki0705
2
130
管理者向けGitHub Enterpriseの運用Tips紹介: 人にもAIにも優しいプラットフォームづくり
yuriemori
0
180
Exadata Database Service on Dedicated Infrastructure(ExaDB-D) UI スクリーン・キャプチャ集
oracle4engineer
PRO
8
7.1k
OCI技術資料 : コンピュート・サービス 概要
ocise
4
54k
作りっぱなしで終わらせない! 価値を出し続ける AI エージェントのための「信頼性」設計 / Designing Reliability for AI Agents that Deliver Continuous Value
aoto
PRO
2
250
新職業『オーケストレーター』誕生 — エージェント10体を同時に回すAgentOps
gunta
4
1.7k
Yahoo!ショッピングのレコメンデーション・システムにおけるML実践の一例
lycorptech_jp
PRO
1
180
Abuse report だけじゃない。AWS から緊急連絡が来る状況とは?昨今の攻撃や被害の事例の紹介と備えておきたい考え方について
kazzpapa3
1
320
Featured
See All Featured
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
199
73k
Leveraging LLMs for student feedback in introductory data science courses - posit::conf(2025)
minecr
1
190
Discover your Explorer Soul
emna__ayadi
2
1.1k
Crafting Experiences
bethany
1
81
Exploring anti-patterns in Rails
aemeredith
2
280
Groundhog Day: Seeking Process in Gaming for Health
codingconduct
0
120
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
10
1.1k
State of Search Keynote: SEO is Dead Long Live SEO
ryanjones
0
150
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.6k
The AI Search Optimization Roadmap by Aleyda Solis
aleyda
1
5.4k
Primal Persuasion: How to Engage the Brain for Learning That Lasts
tmiket
0
290
Build your cross-platform service in a week with App Engine
jlugia
234
18k
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures
trending topic analysis and other interesting insights Boaz Leskes Elasticsearch @bleskes work done for Buzzcapture
Trending?
© Buzzcapture
© Buzzcapture
reference reference topic © Buzzcapture
topic reference ≠
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
None
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
P(w|T) = kDt |w 2 Dt k kDt k
brown dog fox quick 2 5 10 12 5 6
12 13 2 5 6 10 12 13 brown dog fox quick
In our index. • Terms = 12GB • “Arrows” =
41GB
{ tweet: { type: "string", analyzer: "whitespace" fielddata: { filter:
{ regex: "^#.*", frequency: { min: 10 } } } } } Drop terms which occur too little
Drop docs with too many terms
reference reference topic © Buzzcapture
iculture 10,122 floor 8,998 cover 6,874 toy 4,402 ground 3,841
4.0 7,878 4.1 4,292 rtacties 4,078 jelly 2,905 bean 2,857