Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Faceting analyzed fields with some sprinkles of...
Search
Boaz Leskes
June 04, 2013
Technology
0
49
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Tweet
Share
More Decks by Boaz Leskes
See All by Boaz Leskes
Every Shard Deserves a Home - Shard Allocation in Elasticsearch
bleskes
0
310
Life of a Document in Elasticsearch
bleskes
3
3.1k
Resiliency in Elasticsearch & Lucene
bleskes
0
500
Resiliency in Elasticsearch & Lucene
bleskes
0
220
Designing Concurrent Distributed Sequence Numbers for Elasticsearch
bleskes
2
690
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
1
360
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
6
650
The ELK Stack: For Real-Time Enlightenment
bleskes
1
1.7k
Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch
bleskes
1
300
Other Decks in Technology
See All in Technology
プロダクトチームへのSystem Risk Records導入・運用事例の紹介/Introduction and Case Studies on Implementing and Operating System Risk Records for Product Teams
taddy_919
1
170
10分でわかるfreee エンジニア向け会社説明資料
freee
18
520k
来年もre:Invent2024 に行きたいあなたへ - “集中”と“つながり”で楽しむ -
ny7760
0
470
[AWS JAPAN 生成AIハッカソン] Dialog の紹介
yoshimi0227
0
150
新卒1年目が挑む!生成AI × マルチエージェントで実現する次世代オンボーディング / operation-ai-onboarding
cyberagentdevelopers
PRO
1
170
Amazon FSx for NetApp ONTAPを利用するにあたっての要件整理と設計のポイント
non97
1
160
Aurora_BlueGreenDeploymentsやってみた
tsukasa_ishimaru
1
130
AIを駆使したゲーム開発戦略: 新設AI組織の取り組み / sge-ai-strategy
cyberagentdevelopers
PRO
1
130
コンテンツを支える 若手ゲームクリエイターの アートディレクションの事例紹介 / cagamefi-game
cyberagentdevelopers
PRO
1
130
大規模データ基盤チームのオンプレTiDB運用への挑戦 / dpu-tidb
cyberagentdevelopers
PRO
1
110
チームを主語にしてみる / Making "Team" the Subject
ar_tama
4
310
AWSコンテナ本出版から3年経った今、もし改めて執筆し直すなら / If I revise our container book
iselegant
15
4k
Featured
See All Featured
Designing Dashboards & Data Visualisations in Web Apps
destraynor
228
52k
Producing Creativity
orderedlist
PRO
341
39k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
250
21k
Scaling GitHub
holman
458
140k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
167
49k
Happy Clients
brianwarren
97
6.7k
4 Signs Your Business is Dying
shpigford
180
21k
Thoughts on Productivity
jonyablonski
67
4.3k
How to Ace a Technical Interview
jacobian
275
23k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
231
17k
Raft: Consensus for Rubyists
vanstee
136
6.6k
A Philosophy of Restraint
colly
203
16k
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures
trending topic analysis and other interesting insights Boaz Leskes Elasticsearch @bleskes work done for Buzzcapture
Trending?
© Buzzcapture
© Buzzcapture
reference reference topic © Buzzcapture
topic reference ≠
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
None
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
P(w|T) = kDt |w 2 Dt k kDt k
brown dog fox quick 2 5 10 12 5 6
12 13 2 5 6 10 12 13 brown dog fox quick
In our index. • Terms = 12GB • “Arrows” =
41GB
{ tweet: { type: "string", analyzer: "whitespace" fielddata: { filter:
{ regex: "^#.*", frequency: { min: 10 } } } } } Drop terms which occur too little
Drop docs with too many terms
reference reference topic © Buzzcapture
iculture 10,122 floor 8,998 cover 6,874 toy 4,402 ground 3,841
4.0 7,878 4.1 4,292 rtacties 4,078 jelly 2,905 bean 2,857