Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
メルカリのシステム・サービス監視について/Monitoring Mercari service...
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
kazeburo
November 29, 2017
Technology
7.4k
5
Share
メルカリのシステム・サービス監視について/Monitoring Mercari service and servers
メルカリのシステム・サービス監視について
Monitoring seminar in Mercari
kazeburo
November 29, 2017
More Decks by kazeburo
See All by kazeburo
さくらのクラウド開発ふりかえり2025
kazeburo
2
2.9k
国産クラウドを支える設計とチームの変遷 “技術・組織・ミッション”
kazeburo
7
17k
クラウド開発の舞台裏とSRE文化の醸成 / SRE NEXT 2025 Lunch Session
kazeburo
1
2.5k
さくらのクラウド 開発の挑戦とその舞台裏
kazeburo
1
1.5k
[SRE kaigi 2025] ガバメントクラウドに向けた開発と変化するSRE組織のあり方 / Development for Government Cloud and the Evolving Role of SRE Teams
kazeburo
4
4.3k
[さくらのTech Day] ガバメントクラウド開発と変化と成長する組織 / sakura techday, Develop govcloud and the team
kazeburo
0
8.9k
ガバメントクラウド開発と変化と成長する組織 / Organizational change and growth in developing a government cloud
kazeburo
4
3.8k
DNS水責め攻撃と監視 / DNS water torture attack Monitoring and SLO
kazeburo
5
4.6k
DBやめてみた / DNS water torture attack and countermeasures
kazeburo
13
14k
Other Decks in Technology
See All in Technology
脅威をエンジニアリングの糧にして:恐怖を乗り越えた先にあったもの / Turn threats into fuel for engineering: what lay beyond overcoming fear
nrslib
1
360
『家族アルバム みてね』における インシデント対応との向き合い方 / Approach incident response in Family Album
kohbis
2
280
海外カンファレンス「JavaOne」参加レポート ユーザー系IT企業における目的・成果/JavaOne Report Purpose and Results in the User IT Company
muit
0
120
TypeScript Compiler APIとPHP-Parserを活用し、TypeScriptとPHPで型を共有する
shuta13
0
270
大学生が本気でDatabricksを活用してDiscordサークルをデータ駆動させてみた
phantomjuju
1
300
AI時代の私の技術インプットとアウトプット術
tonkotsuboy_com
15
8k
関西に縁あるMicrosoft MVPsが語るCopilotの未来
kasada
0
730
個人AIからチームAIへ:開発における品質と生産性の再設計
moongift
PRO
0
320
APIテストとは?
nagix
0
160
もりもり新機能を一挙紹介! AgentCoreに入門して、AWS上にAIエージェントを構築しよう
minorun365
PRO
6
350
20260528_生成AIを専属DSに_Howの次にすべきことを考える
doradora09
PRO
0
270
AI Adaptable なテストを整える工夫 / Ways to Make Your Tests AI-Adaptable
bitkey
PRO
2
190
Featured
See All Featured
SEO Brein meetup: CTRL+C is not how to scale international SEO
lindahogenes
1
2.7k
Music & Morning Musume
bryan
47
7.2k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
250
1.3M
技術選定の審美眼(2025年版) / Understanding the Spiral of Technologies 2025 edition
twada
PRO
118
120k
Beyond borders and beyond the search box: How to win the global "messy middle" with AI-driven SEO
davidcarrasco
3
150
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
How People are Using Generative and Agentic AI to Supercharge Their Products, Projects, Services and Value Streams Today
helenjbeal
1
200
Optimising Largest Contentful Paint
csswizardry
37
3.7k
Stewardship and Sustainability of Urban and Community Forests
pwiseman
0
220
AI Search: Where Are We & What Can We Do About It?
aleyda
0
7.5k
Leveraging Curiosity to Care for An Aging Population
cassininazir
1
260
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
12
1.2k
Transcript
ϝϧΧϦͷγεςϜɾαʔϏε ࢹʹ͍ͭͯ Monitoring Seminar in Mercari 2017/Good/Meat @kazeburo
Me • Masahiro Nagano / խ • id:kazeburo • Mercari,
Inc Principal Engineer Site Reliability Engineering (SRE) Team
Agenda • Mercariͷ͜Ε·Ͱͱࢹπʔϧ • MackerelͰͷαʔόࢹ
~ 2014/9 JP ੴङ ΠϯϑϥνʔϜ!5PLZP
~2014/9 • ͘͞ΒΠϯλʔωοτੴङDCͷઐ༻αʔόͱΫϥυΛར༻ • ઐ༻αʔόʹͯZabbixαʔόΛߏங • ʮτϦΨʔʯΛ׆༻ͯ͠͞·͟·ͳࢹΛߦ͏ • ͍݅ࣜͩ͠Ͱෳࡶͳࢹ͕࣮ݱ •
ݱࡏͷࢹ߲ͷϕʔε͕Ͱ͖Δ
2014/9~ US JP ੴङ ΠϯϑϥνʔϜ!5PLZP
2014/9~ • USαʔϏε͕ AWS Oregon region ʹͯ։࢝ • ઐ༻αʔόͱΫϥυͷϚϧνˍϋΠϒϦουߏ •
USʹZabbix ServerΛߏஙͯ͠ɺ౦ژ͔Βࢹ
ଟRegion Zabbixͷ՝ • Zabbixͷઃఆ͕ͣΕ͍ͯ͘ • ӡ༻͍ͯ͠ΔZabbixͷόʔδϣϯ͕ҟͳΔ • ઐ༻αʔόͱAWSͰएׯҧ͏ࢹ߲ • JPͰ࡞ΓࠐΜͩࢹ͕USͰ࠶ݱͰ͖͍ͯͳ͍
• USͰ͚ͩى͖Δࢹ࿙ΕʹΑΔࣄނ • Zabbix ProxyΛར༻͠ɺ1ݸͷZabbix ServerूͳͲͷҊ
Zabbixͷ՝ • Zabbix ࣗମͷӡ༻ • όʔδϣϯΞοϓͷෛ୲ • MySQL ͷෛՙ͕େ͖͘ɺࢹԆͳͲൃੜ •
ෳࡶͳτϦΨʔͷཧ • ϚεΫϦοΫओମͷઃఆ • όʔδϣϯཧͳͲΛߦ͍͍ͨ • ࢹͷ௨Λվળ͍ͨ͠
2016/1~ US JP ੴङ 43&!5PLZP
mackerel ಋೖ • Service Metrics͔Βಋೖ • ؆୯ʹάϥϑ͕ඳ͚ɺࢹᮢͷઃఆ͕Ͱ͖Δ • fluentdɺNorikraͱͷΈ߹Θͤ •
ZabbixͷτϦΨʔͷҠ২ • τϦΨʔΛPluginͱ࣮ͯ͠ • Plugin GitͰཧ͠ɺAnsibleͰ
mackerel ಋೖ • ࢹπʔϧɺ࣌ܥྻDBͷӡ༻ͷΦϑϩʔυ • Կ͠ͳͯ͘ຖिόʔδϣϯΞοϓ • JP/USͰͷࢹ߲Λ߹ΘͤΔ • ҟͳΔͱ͜Ζ
Ansible templateͳͲͰٵऩ
2017/3~ US JP ੴङ UK 43&!5PLZP
2017/3~ • UK ͰͷαʔϏε։࢝ • UK Λ։࢝͢Δʹ͋ͨͬͯɺ͞Βʹ͏ҰͭͷΫϥυΛ࠾༻ • ࢹ͕ΫϥυԽ͞Ε͓ͯΓɺ৽ͨͳࢹαʔόͷՃඞཁͳ͠ •
JP/US ͷࢹ߲͕ͦͷ··ద༻Ͱ͖ɺΠϯϑϥετϥΫνϟͷߏஙྃͱͱ ʹࢹͷઃఆ͕ྃ
ݱࡏ US JP ੴङ UK 43&&OHJOFFST!+1646, Stackdriver Prometheus
ݱࡏ • ϚΠΫϩαʔϏεԽ • GKE ্ͷίϯςφɾαʔϏεͷࢹͷͨΊʹ Stack DriverɺPrometheusɺ DataDog ͷ׆༻
• αʔόαΠυΤϯδχΞࢹπʔϧΛར༻
ͦͷଞͷࢹ New Relic Kurado άϥϑը૾ΛҰؾʹݟΕΔͷͰศར جຊతͳϝτϦΫεͪ͜ΒͰݟΔ PHPͷ෦ͷτϨʔε ΞϓϦέʔγϣϯͷνϡʔχϯάͷࢀߟ
MackerelͰͷαʔόࢹ
https://speakerdeck.com/kazeburo/mackerel-day
ࢹʹ·ͭΘΔࣈ • ࢹϧʔϧ: 278 • Hostຖͷࢹϧʔϧ • MySQL: 34 •
Application: 39 • Search: 37 • Custom Plugin: 50+ (check + metrics + utils)
MySQLͷࢹ߲(1/4) • Connectivity • FileSystem % >85% >88% • Swap
% >50% >70% • ssh-alive • sshdͷϓϩηεࢹ • global-ip-and-iptables • global ipͷ༗ແͱiptablesͷঢ়ଶ • unbound-resolv • localͷunboudͰ໊લղܾ͕Ͱ͖Δ͔ • unbound-process • unboundͷϓϩηεࢹ • crond-process • crondͷϓϩηεࢹ • uptime • ࠶ىಈࢹ
MySQLͷࢹ߲(2/4) • inode-usage • inode༻ >80% >90% • uname-change •
unameίϚϯυͷ݁Ռͷdiffࢹ • passwd-change • passwdϑΝΠϧͷdiffࢹ • hostname-changed • hostnameίϚϯυͷ݁Ռͷdiffࢹ • custom.ntpq.synced.remote <0.1 <0.1 • custom.ntpq.offset.seconds >300 >300 (msec) • ntpͷಉظαʔόͱ࣌ࠁͷζϨ • custom.linux-lite.memory.avail <50MB <20MB • ۭ͖ϝϞϦ • custom.linux-lite.cpu-usage.cpu-steal >20% >20% • custom.linux-lite.cpu-usage.cpu-iowait >30% >50% • custom.linux-lite-cpu-usage.cpu-system >8% >8% • ͦΕͧΕͷCPU༻(100%্͕ݶ)
MySQLͷࢹ߲(3/4) • cutom.linux-lite.loadavg.per-cpu >3 >3 • ίΞͰׂͬͨϩʔυΞϕϨʔδ • postfix-smtp-alive •
SMTPϙʔτͷ֬ೝ • postfix-master-process • postfix masterϓϩηεࢹ • custom.postfix.mailq.queue >100 >5k • postfix mail Ωϡʔཹ • custom.linux-lite.process.all >2k >2k • custom.linux-lite.process.running >60 >100 • શϓϩηεͱ࣮ߦதͷϓϩηε • mysql-uptime • mysqlͷuptime • custom.mysql-lite.replication-threads.io <0.2 <0.2 • custom.mysql-lite.replication-threads.sql <0.2 <0.2 • ϨϓϦέʔγϣϯͷ֤threadͷঢ়ଶ
MySQLͷࢹ߲(4/4) • custom.mysql-lite.replication-behind- master.second >5 >5 • mysqlͷϨϓϦέʔγϣϯԆ • custom.mysql-lite.connections.utilization
>90 >90 • max_connectionsʹର͢ΔίωΫγϣϯ • custom.mysql-lite.threads.running >1k >2k • mysql্Ͱ࣮ߦதͷεϨου • mysql-slave-sql-error • replicationΤϥʔͷࢹ • machine-exceptions • αʔόͷϝϞϦΤϥʔࢹ • raid-disks • αʔόͷRAID/Diskঢ়ଶͷࢹ
ࢹͷҭ͔ͯͨ • ʮࢹܧଓతͳςετʯby kazuho • ʮςετϑΝʔετʯͰγεςϜΛߏங͠ͳ͕ΒࢹΛ࡞Δ • ϓϩηεࢹϙʔτࢹ • ʮϙετϞʔςϜʯͷରԠࡦͱͯ͠ࢹΛҭͯΔ
• ᮢͷௐɺL7Ϩϕϧͷࢹ
end