Mercari Item Search: 
Behind The Scenes (20min)

kazeburo
October 28, 2019

22nd Lucene/Solr Study Meetup (第22回 Lucene/Solr勉強会), 2019.10.28


Transcript

  1. Mercari Item Search: Behind The Scenes (20min)
    Masahiro Nagano (kazeburo)
    22nd Lucene/Solr Study Meetup (第22回 Lucene/Solr勉強会), 2019.10.28
  2. Software for Search
    • 2013.7 (?) ~
      • Solr on BareMetal Servers
      • Nginx as LB
    • 2019.7 ~ New Architecture
      • Elasticsearch on GKE
  3. K-Pg boundary
    • Ever more items, ever-growing DAU
    • Tuning JVM/GC
      • Tried CMS, Parallel GC, and G1GC. Parallel GC was better for this era
    • Tuning Query
      • Use filter query correctly
      • Split Index and Fallback in Nginx
  4. Paleogene period
    [Architecture diagram] Labels: PHP (update, select); OpenResty; Recent Master, Recent Slave ×3 (Replication pollInterval 30s); All Master, All Slave ×3 (Replication pollInterval 1min)
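  5. • Index split, with automatic fallback from Recent to All by OpenResty
    • OpenResty
      • A software distribution that bundles Nginx with ngx_lua and various other third-party modules written in C
    • On receiving a search request, the search first runs against the Recent index. The returned JSON is processed by Lua inside Nginx, and if the result count falls short of the requested number (rows), the query is reissued against the All index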
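The fallback described on this slide can be sketched as follows. This is a minimal Python illustration of the control flow only, not the OpenResty/Lua code from the talk; `search_with_fallback`, `fake_recent`, and `fake_all` are hypothetical names, and the two search callables stand in for HTTP requests to the Recent and All Solr indexes.

```python
from typing import Callable, Dict, List

# A search function takes (query, rows) and returns a list of documents.
SearchFn = Callable[[str, int], List[Dict]]


def search_with_fallback(query: str, rows: int,
                         search_recent: SearchFn,
                         search_all: SearchFn) -> List[Dict]:
    """Query the Recent index first; reissue against All if results fall short."""
    docs = search_recent(query, rows)
    if len(docs) >= rows:
        return docs
    # Not enough hits in the small Recent index: ask the full index instead.
    return search_all(query, rows)


# In-memory stand-ins for the two indexes (assumption, for illustration only).
recent_index = [{"id": 3}, {"id": 2}]
all_index = [{"id": 3}, {"id": 2}, {"id": 1}]


def fake_recent(query: str, rows: int) -> List[Dict]:
    return recent_index[:rows]


def fake_all(query: str, rows: int) -> List[Dict]:
    return all_index[:rows]
```

With these stand-ins, a request for two rows is served by the Recent index alone, while a request for three rows triggers the second query against All.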
  6. Paleogene period (same diagram as slide 4)
  7. Paleogene period (same diagram as slide 4)
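  8. • Creating dedicated slaves
      • Identify search requests caused by Bot and Scraper access
      • Separate the search processing for new-item notification emails
      • Separate the heavy price and category suggestion queries where the description is the keyword
    • Query Rewriting by Lua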
  9. Neogene period
    [Architecture diagram] Labels: PHP (update, select); OpenResty; Recent Master, Recent Slave ×2, Condition Slave (Replication pollInterval 30s); All Master, All Slave ×2 (Replication pollInterval 1min); Condition Slave, Suggest Slave ×2 (Replication pollInterval "bit long"); slaves grouped as General and Specific purpose
  10. [Lua code excerpt]
    local create_t = string.match(token, "^fq=created_time_t%%3A%%5B(%d%d%d%d%-%d%d%-%d%dT%d%d%%3A%d%d%%3A%d)%dZ%+TO%+%%2A%%5D$")
    if create_t then
      -- To raise the filter query hit rate for conditions, force the last digit of the seconds in created_time_t to 9
      create_t_filter = "created_time_t%3A%5B" .. create_t .. "9Z+TO+%2A%5D"
    end
    -- Item condition: every value is specified, so drop the filter
    args = string.gsub(ngx.var.args, "&fq=item_condition_id%%3A%%281%+OR%+2%+OR%+3%+OR%+4%+OR%+5%+OR%+6%%29", "")
    -- The minimum price is 300 yen, so this filter can be dropped
    args = string.gsub(args, "&fq=price_t%%3A%%5B300%%2BTO%%2B%*%%5D", "")
    -- Item price: rewrite to frange
    args = string.gsub(args, "&fq=price_t%%3A%%5B%%2A%+TO%+(%d+)%%5D", "&fq=%%7B%%21frange%+cache%%3Dfalse%+cost%%3D150%+u%%3D%1%%7Dprice_t")
    -- Countermeasure for input like "えええええええええええ": a character repeated 10 or more times in a row is cut down to 10
    args, _, _ = ngx.re.gsub(args, "(%[0-9a-f][0-9a-f]%[0-9a-f][0-9a-f]%[0-9a-f][0-9a-f])\\1{9,}", "$1$1$1$1$1$1$1$1$1$1", "i")
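The rewrites on this slide are easier to follow on decoded parameters. The sketch below restates them in Python under some assumptions: it decodes the query string with `urllib.parse` instead of matching percent-encoded text as the Lua excerpt does, it applies the repeated-character cut to every parameter value, and `rewrite_query` is a hypothetical name. Forcing the last digit of the seconds to 9 means the `created_time_t` filter string changes only once every ten seconds, so identical filter queries repeat more often.

```python
import re
from urllib.parse import parse_qsl, urlencode

# created_time_t:[<timestamp> TO *]; the capture stops before the last digit of the seconds.
CREATED_RE = re.compile(r"^created_time_t:\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d)\dZ TO \*\]$")
# price_t:[* TO <max>]
PRICE_MAX_RE = re.compile(r"^price_t:\[\* TO (\d+)\]$")
# Filters that match every item and can therefore be dropped.
REDUNDANT_FILTERS = {
    "item_condition_id:(1 OR 2 OR 3 OR 4 OR 5 OR 6)",  # all conditions selected
    "price_t:[300 TO *]",                              # 300 yen is the minimum price
}
# One character followed by nine or more copies of itself (ten or more in total).
REPEAT_RE = re.compile(r"(.)\1{9,}")


def rewrite_query(query_string: str) -> str:
    """Return a rewritten query string; the parameter order is preserved."""
    out = []
    for key, value in parse_qsl(query_string, keep_blank_values=True):
        if key == "fq":
            if value in REDUNDANT_FILTERS:
                continue
            created = CREATED_RE.match(value)
            if created:
                # Force the last digit of the seconds to 9.
                value = f"created_time_t:[{created.group(1)}9Z TO *]"
            price = PRICE_MAX_RE.match(value)
            if price:
                # Rewrite the upper-bound price filter to an frange filter.
                value = "{!frange cache=false cost=150 u=%s}price_t" % price.group(1)
        # Cut any run of ten or more identical characters down to ten.
        value = REPEAT_RE.sub(lambda m: m.group(1) * 10, value)
        out.append((key, value))
    return urlencode(out)
```

For example, a filter of `created_time_t:[2019-10-28T12:34:56Z TO *]` comes back as `created_time_t:[2019-10-28T12:34:59Z TO *]`, and `price_t:[* TO 5000]` comes back as `{!frange cache=false cost=150 u=5000}price_t`.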
  11. Neogene period (same diagram as slide 9)
  12. Neogene period (same diagram as slide 9)
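  13. • Achieve smooth Solr version upgrades (4 → 6 → 8)
    • Want to change the index (switch to bigram)
      • Keeping performance with bigram also requires scaling up the BareMetal servers
    • Want to move to microservices and re-architect
    • Prepare for Master failures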
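  14. • Intermediate queue service (solr-queue-app)
      • Stores the JSON of each Solr update request in a queue first, then re-POSTs it to the specified servers
      • Lets Masters be added or changed without changing the PHP code
    • Solr-DB (MySQL)
      • Update data is split by item ID and stored in MySQL
      • A full index can be built from the MySQL data in a few hours
      • Until now this took several days to several weeks
    • Sending to Cloud PubSub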
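The queue-then-repost idea can be sketched as below. This is an in-memory Python illustration under stated assumptions, not solr-queue-app itself: `UpdateQueue`, its methods, and the injected `post` callable are hypothetical names, and a real service would use a durable queue and HTTP POSTs. The point it shows is that the producer only enqueues, so the list of target Masters can change without touching the producer.

```python
import json
from collections import deque
from typing import Callable, Deque, List


class UpdateQueue:
    """Buffers update requests as JSON and re-posts each one to every target."""

    def __init__(self, targets: List[str], post: Callable[[str, str], None]):
        self.targets = list(targets)   # update URLs of the Solr Masters (assumed)
        self.post = post               # post(url, json_body), injected for testing
        self._queue: Deque[str] = deque()

    def enqueue(self, update: dict) -> None:
        # Called by the producer; it never needs to know the targets.
        self._queue.append(json.dumps(update))

    def drain(self) -> int:
        """Send every queued update to every target; return the number of POSTs."""
        sent = 0
        while self._queue:
            body = self._queue.popleft()
            for url in self.targets:
                self.post(url, body)
                sent += 1
        return sent
```

Adding a new Master in this sketch means appending one URL to `targets`; the code that calls `enqueue` stays the same.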
  15. Quaternary period
    [Architecture diagram] Labels: PHP; solr-queue-app (API); Q4M, API, Worker ×2; Solr6 Master, Solr8 Master, Solr6’ Master; MySQL
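  16. • Regular new-arrivals and category timelines
      • New arrivals are fetched from MySQL
      • Updates are nearly real time
    • Recommended timeline
      • Solr queries are built from browsing history and similar data, and Solr returns different items for each user, shown newest first
      • Updates are nearly real time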
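  17. • Real-time index updates
      • No replication; every server is a Master
      • soft commit set to 1 second or less
    • Fast responses
      • A distributed index reduces the number of documents per server
      • Hold only as many documents as new arrivals need, and keep the schema simple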
  18. [Architecture diagram] Labels: PHP ×2, API, MySQL (blackhole); four identical nodes, each labeled black hole, trigger, Q4M, dequeue, worker, Update, Solr (master)
    • Use MySQL replication as PubSub
    • Update item selected by consistent hashing
    • soft commit several times per second
  19. [Diagram labels] black hole, trigger, Q4M, dequeue, worker, consul, Solr (master)
    # Get server list from Consul
    my $res = $ua->get('http://localhost/v1/health/service/'.$SRV.'?passing');
    my $ref = JSON::XS::decode_json($res->content);
    my @list = sort { $a cmp $b } map { $_->{Node}{Address} } @$ref;
    # Make Consistent Hash
    my $ketama = Algorithm::ConsistentHash::Ketama->new();
    $ketama->add_bucket($_ . '_' . $timestamp, 1) for @list;
    # Drawing by consistent-hashing
    my $s1 = $ketama->hash($item_id);
    return $s1 eq $my_ip;
    Update Solr
    • If the consistent-hashing result is true, update
    • If the consistent-hashing result is false, delete
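The update-or-delete decision can be sketched in Python as follows. This uses a plain MD5 hash ring with virtual nodes, not the Ketama algorithm from the Perl excerpt, so item placement will differ from the real system; `HashRing`, `action_for`, and the `replicas` parameter are hypothetical names. It also assumes the server list is already available rather than fetching it from Consul.

```python
import bisect
import hashlib
from typing import List


def _point(key: str) -> int:
    """Map a string to a position on the ring (first 4 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """A simple consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: List[str], replicas: int = 100):
        # Each node is placed on the ring `replicas` times to even out the load.
        self._ring = sorted(
            (_point(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise ValueError("ring has no nodes")
        # First ring position at or after the key, wrapping around at the end.
        idx = bisect.bisect_left(self._points, _point(key)) % len(self._ring)
        return self._ring[idx][1]


def action_for(item_id: str, servers: List[str], my_ip: str) -> str:
    """Return "update" if this server owns the item, otherwise "delete"."""
    ring = HashRing(sorted(servers))
    return "update" if ring.node_for(item_id) == my_ip else "delete"
```

Every worker sees every change, so each one runs this check independently: exactly one server indexes the item and the others delete it, which also moves items when the server list changes.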
  20. [Architecture diagram] Labels: PHP ×2, API/Go, select; four identical nodes, each labeled consul, black hole, trigger, Q4M, dequeue, worker, Solr (master)
    • Distribute select request to all Solr servers and merge their response
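The distribute-and-merge step can be sketched as below. This is a Python illustration, not the Go API from the talk, and it makes several assumptions: each server returns its hits already sorted newest first, the sort field is a hypothetical `created` value, each item lives on exactly one server so no de-duplication is needed, and the servers are queried one after another where a real API would query them concurrently.

```python
import heapq
from itertools import islice
from typing import Callable, Dict, Iterable, List

# A shard takes (query, rows) and returns its own top hits, newest first.
Shard = Callable[[str, int], List[Dict]]


def distributed_select(shards: Iterable[Shard], query: str, rows: int) -> List[Dict]:
    """Send the query to every shard and merge the sorted results."""
    # Each shard must return up to `rows` hits, since any one of them
    # might hold all of the newest items.
    per_shard = [shard(query, rows) for shard in shards]
    # Merge the already-sorted lists, keeping newest-first order.
    merged = heapq.merge(*per_shard, key=lambda doc: doc["created"], reverse=True)
    return list(islice(merged, rows))
```

Because the per-server lists are already ordered, the merge only has to compare the heads of the lists, so the cost grows with `rows` rather than with the index size.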