
kazeburo
December 12, 2016

Site::Reliability::Engineering - YAPC::Hokkaido 2016 Sapporo

Transcript

  1. Me • Masahiro Nagano • @kazeburo • Mercari, Inc. Principal Engineer, Site Reliability Engineering (SRE) Team • BASE, Inc. Technical Advisor
  2. CPAN/Perl • Gazelle • Cookie::Baker(::XS) • WWW::Form::UrlEncoded(::XS) • HTTP::Entity::Parser • Apache::LogFormat::Compiler • Plack::Middleware::ServerStatus::Lite • GrowthForecast
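  3. Google SRE • Everyone is a Software Engineer • Hiring makes no distinction between development teams and SRE • Engineers can also take a six-month SRE training rotation, and some become SREs that way

    • SLA and Error Budget • Sharing the SLA and error budget between the development team and SRE resolves the conflict between adding new features and keeping the service stable • The SLA is set per service • If the error budget looks likely to run out, the team is expected to concentrate on development that improves reliability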
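The error-budget idea on this slide comes down to simple arithmetic. A minimal Perl sketch, assuming a time-based availability SLA measured over a fixed window; the 99.9% target, the 30-day window, and the 30 minutes of downtime are hypothetical numbers, not figures from the talk:

```perl
use strict;
use warnings;

# Error budget for a time-based availability SLA: the part of the window
# the SLA allows to be "bad", expressed in minutes.
sub error_budget_minutes {
    my ($sla, $window_days) = @_;
    return (1 - $sla) * $window_days * 24 * 60;
}

# Budget left after the downtime already observed in the same window.
sub remaining_budget_minutes {
    my ($sla, $window_days, $downtime_minutes) = @_;
    return error_budget_minutes($sla, $window_days) - $downtime_minutes;
}

# Hypothetical numbers: 99.9% over 30 days, 30 minutes already lost.
printf "budget: %.1f min\n", error_budget_minutes(0.999, 30);            # budget: 43.2 min
printf "left:   %.1f min\n", remaining_budget_minutes(0.999, 30, 30);    # left:   13.2 min
```

When the remaining figure approaches zero, that is the signal the slide describes for shifting effort from new features to reliability work.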
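  4. Google SRE • On-call (rotation duty) • Everyone on the SRE team takes on-call shifts regularly • Time spent on operations is capped at 50% • The remaining time goes to software development that improves reliability • This creates fertile ground for automation and load-balancing software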
  5. Google SRE • "what happens when a software engineer is tasked with what used to be called operations" https://cloudplatform.googleblog.com/2016/07/adventures-in-SRE-land-welcome-to-Google-Mission-Control.html • Google Vice President of Engineering Ben Treynor Sloss, who coined the term SRE: "our work is like being a part of the world's most intense pit crew. We change the tires of a race car as it's going 100mph" https://students.googleblog.com/2012/06/site-reliability-engineers-worlds-most.html
  6. SRE in JP • "Infrastructure engineer / DevOps" was the prevailing label • Attention grew after SRE was introduced on the Mercari Tech Blog in 2015/11 • "Our infrastructure team has been renamed the Site Reliability Engineering (SRE) team" http://tech.mercari.com/entry/2015/11/18/153421 • 2016/06 SRE Tech Talk#1 • https://connpass.com/event/34825/
  7. SRE in JP • Cybozu: "We are establishing an SRE team" • http://blog.cybozu.io/entry/2016/09/01/080000 • Cookpad: "Hi, I'm Sugawara from the SRE team, which has lately been battling Elastic Beanstalk and ECS" • http://techlife.cookpad.com/entry/2016/10/06/000000 • Retty: "Not infrastructure! What is 'SRE', the engineer who raises reliability?" • https://www.wantedly.com/companies/retty/posts/17568
  8. Architecture (diagram, JP): DNS-RR, nginx ×3, App ×6, MySQL ×2, memcached ×2, util and cloud servers • Built on dedicated servers
  9. Architecture (diagram, JP and US): the US side runs the same roles (DNS-RR, nginx, App, MySQL, memcached, util) on EC2
  10. Architecture (diagram, JP and US): the same configuration in the cloud as well
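  11. Architecture • One shared maintainability and scalability strategy • > Operated by a small team • > Withstood reaching No. 3 in the US App Store rankings • Choose the best infrastructure when launching the service in each region • > The UK runs on GCP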
  12. OSS by SRE • Gaurun - General push notification server in Go • https://github.com/mercari/gaurun • WideBullet - an API gateway with JSON-RPC • https://github.com/mercari/widebullet • Mackerel Plugins • https://github.com/kazeburo/custom-mackerel-plugins
  13. Nginx as an Internal LB (diagram): App servers reach four Solr servers through nginx (DNS-RR, clustering with consul), all inside a private network
  14. Nginx as a Useful L7-LB (diagram): the same App → nginx → Solr layout inside the private network, annotated "Cache!! Here"
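  15. Cache is NOT a silver bullet • When no cached entry exists, responses are slow and the user experience suffers • Cache thundering herd problem • e.g. mitigated with proxy_cache_lock • => Can we read ahead and warm the cache instead? • There are too many query patterns to build the cache in a batch job • We want to refresh it frequently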
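proxy_cache_lock handles the thundering-herd case by letting only one request populate a missing cache entry while the others wait. The same idea outside nginx can be sketched in Perl with flock(2): five forked workers miss the same key at once, but only the one holding the lock runs the expensive build. The file-based cache, the temporary directory, and the `get_or_build` helper are hypothetical stand-ins, not nginx internals.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);
use POSIX ();

# Hypothetical file-based cache in a throwaway directory.
my $dir = tempdir(CLEANUP => 1);

sub slurp { my ($path) = @_; open my $fh, '<', $path or die "$path: $!"; local $/; return scalar <$fh> }

# Return the cached value for $key, building it with $build on a miss.
# Only the worker holding the exclusive lock builds; the others block on
# flock and then find the freshly written entry (re-check after locking).
sub get_or_build {
    my ($key, $build) = @_;
    my $cache = "$dir/$key.cache";
    return slurp($cache) if -e $cache;                 # hit: no lock needed
    open my $lock, '>', "$dir/$key.lock" or die "lock: $!";
    flock($lock, LOCK_EX) or die "flock: $!";
    if (-e $cache) {                                   # built while we waited
        close $lock;
        return slurp($cache);
    }
    my $value = $build->();
    open my $out, '>', "$cache.$$" or die "tmp: $!";
    print {$out} $value;
    close $out or die "close: $!";
    rename "$cache.$$", $cache or die "rename: $!";    # atomic publish
    close $lock;                                       # releases the lock
    return $value;
}

# Five workers miss the same key at the same moment.
my $builds_log = "$dir/builds.log";
my @pids;
for (1 .. 5) {
    my $pid = fork // die "fork: $!";
    if ($pid == 0) {
        get_or_build('search-query', sub {
            open my $log, '>>', $builds_log or die "log: $!";
            print {$log} "built\n";                    # record each real build
            close $log;
            sleep 1;                                   # pretend to be a heavy Solr query
            return "result";
        });
        POSIX::_exit(0);                               # skip END blocks in children
    }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;
```

In nginx the equivalent is `proxy_cache_lock on;` in a cached location. A lock only serializes the rebuild: the request that wins still waits for the slow query, which is the gap the read-ahead approach in the following slides targets.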
  16. Make a search pre-cacher (diagram): App → nginx → Solr; a log daemon tails nginx's access.log and sends HTTP requests back through nginx
  17. Search pre-cacher (Perl):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Request;

    while (1) {
        # Read the most recent 5,000 access-log lines
        open(my $fh, "-|", "tail", "-5000", "/var/log/nginx/access.log") or die $!;
        my %path_query;
        while (<$fh>) {
            next if m!pre-cacher!;                                   # skip our own requests
            next if $_ !~ m!(cachemode=readahead|group.limit)!;      # only opted-in queries
            if (m!\suri:(.+?)\s!) {
                $path_query{"$1"} = 1;                               # de-duplicate by path + query
            }
        }
        close($fh);
        my $ua = LWP::UserAgent->new(agent => "pre-cacher");
        $ua->timeout(20);
        for my $path_query (keys %path_query) {
            # Re-issue the query through the local nginx so its cache is refreshed
            my $req = HTTP::Request->new(GET => 'http://localhost' . $path_query);
            $req->header('Host' => 'lb-search');
            $ua->request($req);
        }
        sleep 5;
    }
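  18. Search pre-cacher • Written in 3 minutes by a certain Perl Monger SRE (me) • When sending a heavy Solr query, developers only need to add readahead (cachemode=readahead) to the query_string, and the query is pre-cached automatically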
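The selection step of the script on slide 17 (skip its own traffic, keep only opted-in queries, de-duplicate) can be pulled out into a pure function and tested without nginx or Solr. A sketch, assuming the same log layout the script relies on: a `uri:` field surrounded by whitespace, as in tab-separated LTSV, with the pre-cacher's own requests identifiable by its `pre-cacher` User-Agent. `paths_to_warm` and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Given access-log lines, return the distinct path+query strings worth
# re-requesting, using the same filters as the pre-cacher loop.
sub paths_to_warm {
    my (@lines) = @_;
    my %seen;
    for my $line (@lines) {
        next if $line =~ /pre-cacher/;                            # our own refresh traffic
        next unless $line =~ /cachemode=readahead|group\.limit/;  # opted-in queries only
        $seen{$1} = 1 if $line =~ /\suri:(.+?)\s/;
    }
    return sort keys %seen;
}

my @sample = (
    "host:10.0.0.1\turi:/select?q=shoes&cachemode=readahead\tua:app\n",
    "host:10.0.0.2\turi:/select?q=bags\tua:app\n",
    "host:10.0.0.1\turi:/select?q=shoes&cachemode=readahead\tua:pre-cacher\n",
    "host:10.0.0.3\turi:/select?q=shoes&cachemode=readahead\tua:app\n",
);
print "$_\n" for paths_to_warm(@sample);   # /select?q=shoes&cachemode=readahead
```

Only the first and last lines survive the filters, and they carry the same URI, so one refresh request is issued per cycle.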
  19. JobWorker in Mercari (diagram): the App enqueues jobs into Q4M (×2, clustering with consul); a parent process built on php-parallel-prefork fork(2)s child workers, which dequeue jobs
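The JobWorker layout above (a parent that preforks a pool of children, each repeatedly dequeuing a job) can be sketched in core Perl. This is not Mercari's PHP worker: a pipe stands in for Q4M, and `run_pool` and the fixed 16-byte job records are hypothetical choices that let the example run without a database.

```perl
use strict;
use warnings;
use POSIX ();

# Jobs are fixed 16-byte records. Pipe writes of at most PIPE_BUF bytes are
# atomic, so concurrent sysread() calls never split a record between workers.
use constant REC => 16;

sub run_pool {
    my ($workers, @jobs) = @_;
    pipe(my $job_r, my $job_w) or die "pipe: $!";
    pipe(my $res_r, my $res_w) or die "pipe: $!";
    my @pids;
    for my $id (1 .. $workers) {                 # prefork the pool up front
        my $pid = fork // die "fork: $!";
        if ($pid == 0) {
            close $job_w;
            close $res_r;
            # Dequeue one record at a time until the parent closes the queue.
            while ((sysread($job_r, my $rec, REC) // 0) == REC) {
                (my $job = $rec) =~ s/\s+\z//;
                syswrite($res_w, "$id:$job\n");  # "process" = report who ran it
            }
            POSIX::_exit(0);
        }
        push @pids, $pid;
    }
    close $job_r;
    close $res_w;
    # Enqueue (names must fit in REC - 1 bytes). Fine for a demo; with very
    # many jobs a full result pipe could stall workers while we still write.
    syswrite($job_w, sprintf("%-*s\n", REC - 1, $_)) for @jobs;
    close $job_w;                                # EOF: workers drain and exit
    my @done = <$res_r>;                         # read until every worker exits
    waitpid($_, 0) for @pids;
    chomp @done;
    return @done;
}

print "$_\n" for run_pool(3, map { "job$_" } 1 .. 10);
```

Which worker picks up which job varies from run to run; the invariant is that every job is processed exactly once.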
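  20. Copy on Write (CoW) • After fork(2), memory is not copied from the parent process to the child; a page is copied only when it is first modified • Sharing memory with the parent process saves memory • Sharing precomputed results lowers the load • Any mod_perl monger will tell you about it with tears in their eyes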
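The "share precomputed results" point can be shown with a short Perl sketch: the parent builds a table once, and a forked child answers a lookup from the inherited table without rebuilding it. The table contents and `lookup_in_child` are hypothetical; the program shows the sharing pattern, not the memory savings, which would need OS-level measurement.

```perl
use strict;
use warnings;
use POSIX ();

# Precompute once in the parent. After fork(2) the child sees this table
# through pages shared with the parent; the kernel copies a page only when
# one side writes to it. Caveat: Perl may write to a value even on a read
# (e.g. caching its string form), so some pages still get copied.
my %square = map { $_ => $_ * $_ } 1 .. 100_000;   # hypothetical "expensive" result

# Fork one worker that answers from the inherited table and reports the
# answer back through a pipe.
sub lookup_in_child {
    my ($key) = @_;
    pipe(my $r, my $w) or die "pipe: $!";
    my $pid = fork // die "fork: $!";
    if ($pid == 0) {
        close $r;
        print {$w} $square{$key}, "\n";   # no recomputation in the child
        close $w;
        POSIX::_exit(0);
    }
    close $w;
    chomp(my $answer = <$r>);
    close $r;
    waitpid($pid, 0);
    return $answer;
}

print lookup_in_child(300), "\n";   # 90000
```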
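  21. CoW in PHP • No effect under mod_php: there is no option for doing work when the parent process starts • Usable from the CLI • PHP culture is to autoload a class the first time it is used • > So CoW sharing rarely happens naturally • Prove it with a benchmark and get the change adopted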
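  22. Preloading classes in the parent before pcntl_fork (PHP):

    // $params holds the mail fields (not shown on the slide)
    function t_sendmail($params) {
        $smtp = new SimpleMailWithSwift();
        $smtp->send($params);
    }

    // Send one mail in the parent process so the classes are loaded (preloaded) before forking
    t_sendmail($params);
    for ($i = 0; $i < 300; $i++) {
        $pid = pcntl_fork();
        if ($pid == -1) {
            die('could not fork');
        } else if ($pid) {
            pcntl_wait($status);    // parent: wait for this child before forking the next
        } else {
            // in the child process
            t_sendmail($params);
            exit(0);
        }
    }

    Result: 28.639s => 22.048s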
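  23. In-house URL shortener • We want to use it for large-scale email delivery • > Millisecond-level performance, so a URL can be generated for every message • Usable with low latency from anywhere in JP/US/UK • The host part of the URI is the same in every region • https://example.ly/abcd1234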
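  24. RTT between regions (diagram; approximate values) • Ishikari, AWS, Tokyo, GCP: 18ms, 110ms, 140ms; GCP: 6ms • The system has to be distributed across regions • Lookups are consolidated in one place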
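The need to distribute the system follows from simple arithmetic. A rough estimate, assuming (hypothetically) a 100,000-message mailing in which each URL is generated with one blocking API call, calls run one after another, and server time is ignored; the RTT values are the approximate ones on the slide:

```perl
use strict;
use warnings;

# Total wall-clock time if every call costs exactly one round trip and
# calls are made sequentially (no parallelism, zero server time).
sub sequential_seconds {
    my ($calls, $rtt_ms) = @_;
    return $calls * $rtt_ms / 1000;
}

my $mails = 100_000;   # hypothetical mailing size
printf "%3d ms RTT: %6.0f s\n", $_, sequential_seconds($mails, $_) for 6, 18, 140;
```

At 140ms per call the run takes 14,000 s (about 3.9 hours), versus 1,800 s at 18ms and 600 s at 6ms, which is consistent with serving URL generation from an API close to the mailer.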
  25. In-house URL shortener • Regional API • Go + MySQL • Deployed inside the private network • Global API • GAE/Go, to cut operational effort • Deployed in the US region
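The talk does not say how short codes are generated. One hypothetical scheme that fits this split (independent Regional APIs backed by their own MySQL, one shared host) is to prefix each code with a region letter and append the base62 form of that region's auto-increment id, so codes never collide across regions and the resolver can tell which region issued a code. The prefixes and helper names below are assumptions for illustration:

```perl
use strict;
use warnings;

# Hypothetical code format: <region letter><base62 id>.
my @ALPHABET = ('0' .. '9', 'a' .. 'z', 'A' .. 'Z');            # 62 symbols
my %INDEX    = map { ($ALPHABET[$_] => $_) } 0 .. $#ALPHABET;
my %PREFIX   = (jp => 'j', us => 'u', uk => 'k');              # hypothetical

sub base62_encode {
    my ($n) = @_;
    return '0' if $n == 0;
    my $s = '';
    while ($n > 0) {
        $s = $ALPHABET[$n % 62] . $s;   # prepend the least significant digit
        $n = int($n / 62);
    }
    return $s;
}

sub base62_decode {
    my ($s) = @_;
    my $n = 0;
    $n = $n * 62 + $INDEX{$_} for split //, $s;
    return $n;
}

sub make_code {
    my ($region, $id) = @_;
    my $p = $PREFIX{$region} // die "unknown region: $region";
    return $p . base62_encode($id);
}

# Split a code back into (region, id), e.g. for the resolving side.
sub parse_code {
    my ($code) = @_;
    my ($p, $rest) = $code =~ /\A(.)(.+)\z/ or die "bad code: $code";
    my %region_of = reverse %PREFIX;
    return ($region_of{$p}, base62_decode($rest));
}

print "https://example.ly/", make_code('jp', 125), "\n";
```

With these helpers, `make_code('jp', 125)` yields `j21` (125 = 2 × 62 + 1), giving https://example.ly/j21, and `parse_code('j21')` returns `('jp', 125)`.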
  26. • Laziness • > Automating infrastructure and operations • Impatience • > Pursuing high performance • Hubris • > Handling many users, large-scale traffic, and diverse data • > Pride in our work
  27. DSL for System Calls • C-friendly • Maps 1:1 onto system calls • > fork, waitpid, syswrite, sysread, etc. • Socket handling is straightforward too • Fewer quirks than Ruby or PHP • A good language for learning about the OS and networking (I think)
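To illustrate the 1:1 mapping between Perl built-ins and system calls, here is a small sketch that sends a message to a forked child over a socketpair and reads the echo back. `echo_via_child` is a hypothetical helper; the built-ins it uses (socketpair, fork, sysread, syswrite, waitpid) are core Perl.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use POSIX ();

# Each call is a thin wrapper over one system call:
# socketpair(2), fork(2), read(2) via sysread, write(2) via syswrite, waitpid(2).
sub echo_via_child {
    my ($msg) = @_;
    socketpair(my $to_child, my $to_parent, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
        or die "socketpair: $!";
    my $pid = fork // die "fork: $!";
    if ($pid == 0) {                                   # child
        close $to_child;
        defined(sysread($to_parent, my $buf, 1024)) or die "read: $!";
        defined(syswrite($to_parent, "echo: $buf")) or die "write: $!";
        POSIX::_exit(0);
    }
    close $to_parent;                                  # parent
    defined(syswrite($to_child, $msg)) or die "write: $!";
    defined(sysread($to_child, my $reply, 1024)) or die "read: $!";
    waitpid($pid, 0) == $pid or die "waitpid: $!";
    return ($reply, $? >> 8);                          # reply and child exit code
}

my ($reply, $status) = echo_via_child("hello");
print "$reply (child exited with $status)\n";   # echo: hello (child exited with 0)
```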