Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Faceting analyzed fields with some sprinkles of...
Search
Boaz Leskes
June 04, 2013
Technology
0
63
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Tweet
Share
More Decks by Boaz Leskes
See All by Boaz Leskes
Every Shard Deserves a Home - Shard Allocation in Elasticsearch
bleskes
0
320
Life of a Document in Elasticsearch
bleskes
3
3.3k
Resiliency in Elasticsearch & Lucene
bleskes
0
510
Resiliency in Elasticsearch & Lucene
bleskes
0
230
Designing Concurrent Distributed Sequence Numbers for Elasticsearch
bleskes
2
720
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
1
370
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
6
680
The ELK Stack: For Real-Time Enlightenment
bleskes
1
1.7k
Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch
bleskes
1
310
Other Decks in Technology
See All in Technology
複数サービスを支えるマルチテナント型Batch MLプラットフォーム
lycorptech_jp
PRO
1
980
Unlocking the Power of AI Agents with LINE Bot MCP Server
linedevth
0
120
COVESA VSSによる車両データモデルの標準化とAWS IoT FleetWiseの活用
osawa
1
400
AIエージェント開発用SDKとローカルLLMをLINE Botと組み合わせてみた / LINEを使ったLT大会 #14
you
PRO
0
130
テストを軸にした生き残り術
kworkdev
PRO
0
220
プラットフォーム転換期におけるGitHub Copilot活用〜Coding agentがそれを加速するか〜 / Leveraging GitHub Copilot During Platform Transition Periods
aeonpeople
1
240
バイブスに「型」を!Kent Beckに学ぶ、AI時代のテスト駆動開発
amixedcolor
3
590
[ JAWS-UG 東京 CommunityBuilders Night #2 ]SlackとAmazon Q Developerで 運用効率化を模索する
sh_fk2
3
460
Automating Web Accessibility Testing with AI Agents
maminami373
0
1.3k
Snowflake Intelligenceにはこうやって立ち向かう!クラシルが考えるAI Readyなデータ基盤と活用のためのDataOps
gappy50
0
280
AI時代を生き抜くエンジニアキャリアの築き方 (AI-Native 時代、エンジニアという道は 「最大の挑戦の場」となる) / Building an Engineering Career to Thrive in the Age of AI (In the AI-Native Era, the Path of Engineering Becomes the Ultimate Arena of Challenge)
jeongjaesoon
0
260
「Linux」という言葉が指すもの
sat
PRO
4
140
Featured
See All Featured
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.4k
KATA
mclloyd
32
14k
VelocityConf: Rendering Performance Case Studies
addyosmani
332
24k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
36
2.5k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
8
930
Building Applications with DynamoDB
mza
96
6.6k
The Power of CSS Pseudo Elements
geoffreycrofte
77
6k
Optimizing for Happiness
mojombo
379
70k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
30
9.7k
Build The Right Thing And Hit Your Dates
maggiecrowley
37
2.9k
Optimising Largest Contentful Paint
csswizardry
37
3.4k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.5k
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures
trending topic analysis and other interesting insights Boaz Leskes Elasticsearch @bleskes work done for Buzzcapture
Trending?
© Buzzcapture
© Buzzcapture
reference reference topic © Buzzcapture
topic reference ≠
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
None
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
P(w|T) = kDt |w 2 Dt k kDt k
brown dog fox quick 2 5 10 12 5 6
12 13 2 5 6 10 12 13 brown dog fox quick
In our index. • Terms = 12GB • “Arrows” =
41GB
{ tweet: { type: "string", analyzer: "whitespace" fielddata: { filter:
{ regex: "^#.*", frequency: { min: 10 } } } } } Drop terms which occur too little
Drop docs with too many terms
reference reference topic © Buzzcapture
iculture 10,122 floor 8,998 cover 6,874 toy 4,402 ground 3,841
4.0 7,878 4.1 4,292 rtacties 4,078 jelly 2,905 bean 2,857