Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Faceting analyzed fields with some sprinkles of...
Search
Boaz Leskes
June 04, 2013
Technology
0
57
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Tweet
Share
More Decks by Boaz Leskes
See All by Boaz Leskes
Every Shard Deserves a Home - Shard Allocation in Elasticsearch
bleskes
0
310
Life of a Document in Elasticsearch
bleskes
3
3.2k
Resiliency in Elasticsearch & Lucene
bleskes
0
510
Resiliency in Elasticsearch & Lucene
bleskes
0
220
Designing Concurrent Distributed Sequence Numbers for Elasticsearch
bleskes
2
710
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
1
360
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
6
660
The ELK Stack: For Real-Time Enlightenment
bleskes
1
1.7k
Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch
bleskes
1
310
Other Decks in Technology
See All in Technology
Houtou.pm #1
papix
0
600
データプレーンプログラミングとは? DPU&スイッチASICの開発経験から語る
ebiken
PRO
0
150
初めてのGoogle Cloud by AWS出身者
harukasakihara
1
730
Azure Developer CLI と Azure Deployment Environment / Azure Developer CLI and Azure Deployment Environment
nnstt1
1
110
SmartHRの複数のチームにおけるMCPサーバーの活用事例と課題
yukisnow1823
2
1.1k
ソフトウェアは捨てやすく作ろう/Let's make software easy to discard
sanogemaru
10
5.6k
令和トラベルQAのAI活用
seigaitakahiro
0
480
iOS/Androidで無限循環Carousel表現を考えてみる
fumiyasac0921
0
120
ソフトウェアテストのAI活用_ver1.10
fumisuke
0
220
会社紹介資料 / Sansan Company Profile
sansan33
PRO
6
360k
S3 Tables を図解でやさしくおさらい~基本から QuickSight 連携まで/s3-tables-illustrated-basics-quicksight
emiki
1
310
mnt_data_とは?ChatGPTコード実行環境を深堀りしてみた
icck
0
180
Featured
See All Featured
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
The Cult of Friendly URLs
andyhume
78
6.4k
The Power of CSS Pseudo Elements
geoffreycrofte
76
5.8k
Reflections from 52 weeks, 52 projects
jeffersonlam
349
20k
What's in a price? How to price your products and services
michaelherold
245
12k
GraphQLとの向き合い方2022年版
quramy
46
14k
Automating Front-end Workflow
addyosmani
1370
200k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
48
5.4k
A Tale of Four Properties
chriscoyier
159
23k
Scaling GitHub
holman
459
140k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
331
21k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures
trending topic analysis and other interesting insights Boaz Leskes Elasticsearch @bleskes work done for Buzzcapture
Trending?
© Buzzcapture
© Buzzcapture
reference reference topic © Buzzcapture
topic reference ≠
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
None
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
P(w|T) = kDt |w 2 Dt k kDt k
brown dog fox quick 2 5 10 12 5 6
12 13 2 5 6 10 12 13 brown dog fox quick
In our index. • Terms = 12GB • “Arrows” =
41GB
{ tweet: { type: "string", analyzer: "whitespace" fielddata: { filter:
{ regex: "^#.*", frequency: { min: 10 } } } } } Drop terms which occur too little
Drop docs with too many terms
reference reference topic © Buzzcapture
iculture 10,122 floor 8,998 cover 6,874 toy 4,402 ground 3,841
4.0 7,878 4.1 4,292 rtacties 4,078 jelly 2,905 bean 2,857