Yokozuna, Distributed Search You Don't Think About
Ryan Zezeski
May 14, 2013
Discussion and demos of Yokozuna at RICON|East 2013.
Transcript
1. Yokozuna: Distributed Search You Don’t Think About - Ryan Zezeski, May 14th 2013 (slide footer: Tuesday, May 21, 13)
2. Live Demo
3. Live Demos
4. PROBLEM?
5. PROBLEM? SOLUTION!
6. Solution Pre-made
7. Piece At A Time
8. Goals
• Don’t screw up
• Show how Yokozuna doesn’t make you think (too hard)
• Teach you about Search
• Neat things you can do with Yokozuna
9. PROBLEM: SEARCH FOR COMMITS ABOUT SPECIFIC FEATURE/BUG - MAKE IT GOOGLE-LIKE
10. SOLUTION: INDEX COMMITS IN YOKOZUNA - “COMMIT LOG SEARCHER” (CLS)
11. Anatomy of a Commit Msg
12. Primary Key
13. Any Node Will Do
14. Term Query
15. Query Any Node
16. Boolean (1): repo:riak_kv repo:riak_core
17. Boolean (2): repo:riak_kv AND author:"Ryan Zezeski"
18. Boolean (3): commit_author:"Ryan Zezeski" OR commit_author:"Joseph Blomstedt" NOT commit_repo:riak_kv
19. Range (1): commit_repo:riak_core AND commit_dt:[NOW-1YEAR TO NOW]
20. Range (2): commit_repo:riak_core AND commit_dt:[NOW-1YEAR TO NOW] - I RAN THIS ON 2013-05-10 - sort=dt asc
21. Wildcard (1): *:* - GET TOTAL COUNT FIRST
22. Wildcard (2): commit_repo:riak_* - NOTICE COUNT IS LOWER
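The query slides show only the query strings. As a rough sketch of how such a query could be sent over HTTP, the snippet below URL-encodes one of the range queries together with the standard Solr parameters q, sort, fl, wt and rows. The base URL, the index name commit_log and the sort field commit_dt are assumptions made for illustration; the slides do not show the actual endpoint.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_search_url(base_url, query, sort=None, fields=None, rows=10):
    """Return base_url with standard Solr query parameters URL-encoded."""
    params = {"q": query, "wt": "json", "rows": rows}
    if sort:
        params["sort"] = sort
    if fields:
        params["fl"] = ",".join(fields)
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and index name, not taken from the talk.
BASE_URL = "http://localhost:8098/search/commit_log"

range_query = "commit_repo:riak_core AND commit_dt:[NOW-1YEAR TO NOW]"
url = build_search_url(BASE_URL, range_query, sort="commit_dt asc")

# Decoding the URL again recovers the original query string unchanged.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["q"][0])
```

Encoding matters here because the query contains spaces, colons and brackets, none of which can be pasted into a URL verbatim.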
23. WHAT ABOUT SEARCHING SUMMARY AND BODY?
24. THE INVERTED INDEX
25. AN INDEX - BUT INVERTED
26. EVERYONE KNOWS WHAT IT IS
27. EVEN NON-TECH PEOPLE
28. YES...EVEN YOUR PARENTS
29. What’s In A Book?
30. What's in a book (list)
• WORDS
• PARAGRAPHS
• SECTIONS
• CHAPTERS
• ETC.
31. AND PAGE NUMBERS
32. (no text on slide)
33. PAGE NUMBERS ARE AN IMPLICIT INDEX
34. PAGE NUMBER TO WORDS - WORD TO PAGE NUMBERS - INVERTED
35. STOLEN FROM BLOG OF RICKY HO: http://horicky.blogspot.com/2013/02/text-processing-part-2-inverted-index.html
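The book analogy can be sketched in a few lines: a book maps page numbers to words, and inverting that mapping gives word to page numbers. The sample pages and the whitespace-only tokenization are assumptions for illustration; a real analyzer does far more work.

```python
from collections import defaultdict


def invert(pages):
    """Turn {page_number: text} into {word: sorted list of page numbers}."""
    index = defaultdict(set)
    for page, text in pages.items():
        for word in text.lower().split():
            index[word].add(page)
    return {word: sorted(nums) for word, nums in index.items()}


# Made-up pages, standing in for documents.
pages = {
    1: "hinted handoff moves data",
    2: "handoff occurs",
    3: "data is indexed",
}

index = invert(pages)
print(index["handoff"])
print(index["data"])
```

Looking up a word is now a single dictionary access instead of a scan over every page.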
36. HOW DO YOU GET THE WORDS IN THE FIRST PLACE?
37. Analysis - The Iceberg That Sank The Titanic
38. Phrase (1): subject:hinted OR subject:handoff OR body:hinted OR body:handoff
39. Phrase (2): subject:"hinted handoff" OR body:"hinted handoff"
40. Phrase (3): subject:"partition vnode" OR body:"partition vnode"
41. Phrase (4): subject:"partition vnode"~4 OR body:"partition vnode"~4
42. Exact Term: subject:behavior OR body:behavior
43. Fuzzy Term: subject:behavior~1 OR body:behavior~1
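The ~1 in the fuzzy query allows terms within one edit of the query term. A minimal sketch of that idea follows, using plain Levenshtein distance as a stand-in; the vocabulary is made up, and the exact distance measure and matching strategy inside Solr may differ from this illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def fuzzy_matches(term, vocabulary, max_edits=1):
    """Return the indexed terms within max_edits of the query term."""
    return [w for w in vocabulary if edit_distance(term, w) <= max_edits]


# Made-up set of indexed terms.
vocabulary = ["behavior", "behaviour", "behaviors", "behave"]
matches = fuzzy_matches("behavior", vocabulary)
print(matches)
```

With one edit allowed, the British spelling and the plural match, while "behave" (three edits away) does not.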
44. Ranking: ADD SCORE TO FL - SCORE ADDED TO EVERY RESULT
45. RECALL, PRECISION, AND RELEVANCY, OH MY!
46. RELEVANCY - FOR A GIVEN QUERY & DOC SET THERE IS AN IDEAL ANSWER OF ONLY RELEVANT DOCS
47. RECALL = WHAT % OF IDEAL ANSWER SET WAS RETRIEVED
48. PRECISION = WHAT % OF ANSWER IS RELEVANT
49. RECALL vs. PRECISION - AS YOU INCREASE RECALL YOU DEGRADE PRECISION
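The two definitions above translate directly into set arithmetic. The sketch below computes both for a narrow and a broad answer set; the document ids are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against the ideal answer."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up ideal answer: four relevant commits.
relevant = {"c1", "c2", "c3", "c4"}

# A narrow query returns only relevant docs but misses half of them.
narrow = precision_recall({"c1", "c2"}, relevant)

# A broad query finds every relevant doc but drags in four irrelevant ones.
broad = precision_recall({"c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"}, relevant)

print(narrow, broad)
```

The broad query reaches full recall at the cost of precision, which is the trade-off the last slide describes.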
50. SOLR DETERMINES RELEVANCY VIA THE NOTION OF SIMILARITY
51. SOLR USES TF-IDF: TERM FREQUENCY, INVERSE DOCUMENT FREQUENCY
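As a rough illustration of TF-IDF, the sketch below scores documents for a single term with the textbook formula tf * log(N / df). This is an assumption chosen for clarity, not the formula Solr actually evaluates, which applies additional normalization factors. The sample commit messages are made up.

```python
import math
from collections import Counter


def tf_idf_scores(term, docs):
    """Score each document for one term using textbook tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    df = sum(1 for tokens in tokenized if term in tokens)
    if df == 0:
        return [0.0] * len(docs)
    idf = math.log(len(docs) / df)
    return [Counter(tokens)[term] * idf for tokens in tokenized]


# Made-up commit messages.
docs = [
    "fix handoff bug in handoff sender",
    "fix typo",
    "add handoff test",
    "update docs",
]

scores = tf_idf_scores("handoff", docs)
print(scores)
```

The first message mentions the term twice and so outranks the third; a term that appeared in every document would get an idf of zero and contribute nothing.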
52. Dismax + Facets + Highlighting (labels: FACETS, HIGHLIGHTING, DISMAX)
53. FACET - A TAXONOMY OF YOUR QUERY BASED ON A FIELD’S VALUES
54. FACETS ALLOW “DRILL DOWN” - THEY GUIDE THE USER
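Conceptually, a facet is a count of result documents grouped by the values of one field, and drilling down means filtering on one of those values. The sketch below shows the idea on an in-memory result list; the documents and author names are invented, and Solr computes facets from its index rather than this way.

```python
from collections import Counter


def facet_counts(results, field):
    """Count result documents per value of the given field."""
    return Counter(doc[field] for doc in results if field in doc)


def drill_down(results, field, value):
    """Keep only the results whose field equals the chosen facet value."""
    return [doc for doc in results if doc.get(field) == value]


# Made-up search results.
results = [
    {"commit_repo": "riak_kv", "commit_author": "alice"},
    {"commit_repo": "riak_core", "commit_author": "bob"},
    {"commit_repo": "riak_kv", "commit_author": "bob"},
]

repo_facet = facet_counts(results, "commit_repo")
kv_only = drill_down(results, "commit_repo", "riak_kv")
print(repo_facet, len(kv_only))
```

The counts tell the user how many results sit behind each value before they commit to a narrower search.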
55. HIGHLIGHTING GIVES YOUR RESULTS CONTEXT - ALLOWS QUICKER DETERMINATION OF RELEVANCY
56. DISMAX - DISjunction MAX - A QUERY HANDLER MEANT FOR DIRECT USER INPUT
57. All Nodes Up
58. All Nodes Up - Query
59. Node 4 Down
60. Node 4 Down - Query
61. Node 3 & 4 Down
62. Node 3 & 4 Down - Query
63. REPLICATION PROVIDES HIGH AVAILABILITY - START WITH 4 NODES (diagram: nodes 1, 2, 3, 4)
64. Write 3 Replicas (diagram: nodes 1, 2, 3, 4)
65. Take 2 Nodes Down - 1 REPLICA STILL AVAILABLE (diagram: nodes 1, 2, 3, 4)
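The arithmetic behind these three slides can be checked by brute force. The sketch below assumes each of the 3 replicas lands on a distinct node of a 4-node cluster (a simplification of real replica placement) and enumerates every combination of failed nodes.

```python
from itertools import combinations


def surviving_replicas(replica_nodes, down_nodes):
    """Return the replica holders that are still reachable."""
    return set(replica_nodes) - set(down_nodes)


def worst_case_survivors(nodes, n_replicas, n_down):
    """Fewest surviving replicas over all placements and all failure sets."""
    return min(
        len(surviving_replicas(placement, down))
        for placement in combinations(nodes, n_replicas)
        for down in combinations(nodes, n_down)
    )


nodes = [1, 2, 3, 4]
two_down = worst_case_survivors(nodes, n_replicas=3, n_down=2)
three_down = worst_case_survivors(nodes, n_replicas=3, n_down=3)
print(two_down, three_down)
```

Under that assumption, any two failures leave at least one replica, while three failures can leave none.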
66. WHAT IF DATA IS WRITTEN WHILE NODES ARE DOWN?
67. YZ Not Stored Yet
68. Store YZ Log
69. Query YZ - Node 1 & 2
70. Set Xfer Limit To 0
71. Start Nodes 3 & 4
72. Query Solr Direct - WHEN MAKING THIS DEMO I WAS EXPECTING THIS TO BE 0 BUT I FORGOT ABOUT AAE WHICH STARTED KICKING IN BEFORE HANDOFF - SELF HEALING FTW!
73. Set Xfer Limit To 64
74. Handoff Occurs
75. 0 Pending Xfers
76. Solr Direct (Again) - NOTICE IT’S NOW 301, UP FROM 54, MORE PROOF THAT HANDOFF OCCURRED - NOTE THIS QUERY IS GOING DIRECT TO ONLY 1 SHARD
77. Query Node 4 YZ - NOW HIT YOKOZUNA ON NODE4 (NOTICE CHANGE IN PORT #) - THIS WILL RUN A DIST SEARCH AND THUS RETURN CORRECT COUNT
78. Data Ownership (diagram labels: A VNODE, THE RING)
79. Node Down (diagram: ring partitions marked X)
80. Write Fallback (diagram: ring partitions marked X)
81. Node Up - HINTED HANDOFF WILL MOVE REPLICA TO PRIMARY
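The four ring slides can be modeled with a toy preference list. In the sketch below a ring is just a list of owning nodes per partition; a write to a down primary goes to the next reachable node along the ring together with a hint naming the intended owner, and handoff later returns it. The function names, the round-robin ring layout and the fallback selection rule are all assumptions for illustration, not Riak's actual algorithm.

```python
def preference_list(ring, start, n_val, down):
    """Return (partition, holder, hint) entries for n_val replicas.

    hint is None for a primary, or the intended owner when a fallback holds it.
    """
    size = len(ring)
    entries = []
    cursor = start + n_val  # fallbacks are searched after the primaries
    for i in range(n_val):
        partition = (start + i) % size
        owner = ring[partition]
        if owner not in down:
            entries.append((partition, owner, None))
            continue
        for _ in range(size):
            candidate = ring[cursor % size]
            cursor += 1
            if candidate not in down:
                entries.append((partition, candidate, owner))
                break
    return entries


def hinted_handoff(entries, down):
    """Move each fallback replica back to its primary once that node is up."""
    moved = []
    for partition, holder, hint in entries:
        if hint is not None and hint not in down:
            moved.append((partition, hint, None))
        else:
            moved.append((partition, holder, hint))
    return moved


ring = ["n1", "n2", "n3", "n4"] * 2  # 8 partitions, round-robin ownership
during_outage = preference_list(ring, start=0, n_val=3, down={"n3"})
after_recovery = hinted_handoff(during_outage, down=set())
print(during_outage)
print(after_recovery)
```

While n3 is down its replica is held by n4 with a hint; once n3 returns, the replica moves back and the hint is cleared.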
82. WHAT IF YOU RM -RF THE INDEX DIR?
83. Kill The Data - RM -RF THE INDEX DIRECTORY - KILL THE SOLR PROC
84. Auto Restart - YOKOZUNA NOTICES SOLR DIED AND AUTOMATICALLY RESTARTS IT
85. Node 4 - 0 Results
86. AAE Notices Missing Data
87. Node 4 - 13 Results - DATA IS RE-INDEXED OVER TIME
88. More AAE Repair
89. Node 4 - 128 Results - MORE INDEXES ARE REPAIRED, THIS CONTINUES UNTIL AAE REPAIRS ALL INDEXES
90. WHAT EVEN IS ACTIVE ANTI-ENTROPY?
91. Mo Systems Mo Failure
• index update could get lost
• files can become truncated/corrupted
• accidental `rm -rf`
• segfault at right time
• etc...
92. MYRIAD OF FAILURE SCENARIOS - FROM OBVIOUS TO NEARLY INVISIBLE
93. ENTROPY IS DAMAGE - AAE IS SELF HEALING - STRIKER!!!! AHEM, I MEAN, ENTROPY!!!!
94. REPAIR EFFICIENTLY - NOT STUPIDLY
95. Learn You Some Merkle For A Great Good - BIG UPS TO @jtuple FOR THE AAE DIAGRAMS
96. Segments - EACH SEGMENT IS A LIST OF KEY-HASH PAIRS
97. Segment Hashes - HASH OF HASHES IN SEGMENT
98. Hash O’ Hashes
99. WHAT HAPPENS DURING EXCHANGE?
100. Start With 2 Trees
101. Compare Top Hashes - TOP HASHES DON’T MATCH - SOMETHING IS DIFFERENT
102. Compare Child Hashes - NARROW DOWN THE DIVERGENT SEGMENT
103. Recur - NARROW DOWN THE DIVERGENT SEGMENT CONT...
104. Iter Key-Hash Pairs - ITER FINAL LIST OF HASHES TO FIND DIVERGENT KEYS
105. Repair Divergent Keys - REPAIR (RE-INDEX) KEYS THAT ARE DIVERGENT (RED)
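The exchange walked through on these slides can be approximated with a two-level hash tree: key-hash pairs grouped into segments, one hash per segment, and one top hash over the segment hashes. The sketch below is a simplified model with invented data and a fixed depth, whereas the slides describe recursing through more levels; it is not the hashtree code used by AAE.

```python
import hashlib


def h(data):
    """Hex SHA-1 digest of a string."""
    return hashlib.sha1(data.encode()).hexdigest()


def build_tree(objects, num_segments=4):
    """Build segments of key-hash pairs, per-segment hashes and a top hash."""
    segments = [dict() for _ in range(num_segments)]
    for key, value in objects.items():
        segments[int(h(key), 16) % num_segments][key] = h(value)
    segment_hashes = [
        h("".join(f"{k}:{v}" for k, v in sorted(seg.items())))
        for seg in segments
    ]
    return {
        "top": h("".join(segment_hashes)),
        "segment_hashes": segment_hashes,
        "segments": segments,
    }


def divergent_keys(a, b):
    """Compare two trees top-down and return the keys that differ."""
    if a["top"] == b["top"]:
        return set()  # top hashes match, nothing to repair
    keys = set()
    pairs = zip(a["segment_hashes"], b["segment_hashes"])
    for i, (hash_a, hash_b) in enumerate(pairs):
        if hash_a == hash_b:
            continue  # skip segments that agree
        seg_a, seg_b = a["segments"][i], b["segments"][i]
        for key in seg_a.keys() | seg_b.keys():
            if seg_a.get(key) != seg_b.get(key):
                keys.add(key)
    return keys


# Made-up data: the index lost one entry and holds a stale copy of another.
kv_data = {f"commit{i}": f"msg{i}" for i in range(20)}
index_data = dict(kv_data)
del index_data["commit3"]
index_data["commit7"] = "stale"

to_repair = divergent_keys(build_tree(kv_data), build_tree(index_data))
print(sorted(to_repair))
```

Only the segments whose hashes differ are scanned key by key, which is what keeps the repair efficient.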
106. CODE FOR DETECTION AND REPAIR - NOT PREVENTION
107. WHAT HAPPENS IF 3 NODES GO DOWN?
108. Stop 3 Nodes
109. Query
110. CONSISTENCY vs. AVAILABILITY
111. Uptime - Story of 9s: UPTIME = (MTBF - MTTR) / MTBF
112. Uptime is Flawed - IF THE SYSTEM IS DOWN, BUT NO ONE MAKES A REQUEST, IS IT REALLY DOWN?
113. Yield - Uptime of the People: YIELD = QUERIES COMPLETED / QUERIES OFFERED
114. Harvest vs. Yield: HARVEST = DATA AVAIL / COMPLETE DATA - IN FACE OF FAILURE YOU CAN’T HAVE BOTH FOR A SINGLE REQUEST
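The three ratios defined above are simple to compute. The sketch below evaluates each with made-up numbers; the function names are chosen here for illustration.

```python
def uptime(mtbf, mttr):
    """UPTIME = (MTBF - MTTR) / MTBF, both in the same time unit."""
    return (mtbf - mttr) / mtbf


def query_yield(completed, offered):
    """YIELD = queries completed / queries offered."""
    return completed / offered


def harvest(data_available, complete_data):
    """HARVEST = data available / complete data."""
    return data_available / complete_data


# Made-up numbers: one hour of repair per 1000 hours between failures,
# 990 of 1000 queries answered, and 3 of 4 shards reachable for a query.
print(uptime(mtbf=1000, mttr=1))
print(query_yield(completed=990, offered=1000))
print(harvest(data_available=3, complete_data=4))
```

Uptime counts clock time whether or not anyone is asking, while yield and harvest are measured per request, which is why the talk treats them as the more honest numbers.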
115. IN TIMES OF TROUBLE - YOKOZUNA CHOOSES HARVEST FOR QUERIES
116. TECHNICALLY - YOKOZUNA IS ALWAYS < 100% HARVEST IN A NON-QUIESCENT CLUSTER
117. YOKOZUNA FAVORS YIELD FOR WRITES
118. ONCE RIAK 1.4 SHIPS - YOKOZUNA LANDS IN MASTER
119. THANK YOU - HTTP://GITHUB.COM/BASHO/YOKOZUNA