浜田 恭平 (Kyohei Hamada)
August 24, 2017
SREによるモンスト改善事例 (Monster Strike improvement case studies by SRE) / improvement-example-of-monster-strike-by-sre
hbstudy#76
76th session: SRE大全 (SRE Compendium): XFLAG Studio edition
https://hbstudy.connpass.com/event/62338/
Transcript
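Monster Strike improvement case studies by SRE
2017/08/24 hbstudy, 76th session: SRE Compendium: XFLAG Studio edition
XFLAG Business Division, Game Development Office, SRE Group
浜田 恭平 (Kyohei Hamada) @haman29
XFLAG STUDIO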
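About me
• 浜田 恭平 (Kyohei Hamada) @haman29
• https://twitter.com/haman29
• Server-side engineer at my previous job, with front-end and infrastructure work on the side.
• Mainly responsible for web services (a flea-market service, a coupon service, and so on).
• Joined in 2016/07, at the same time the SRE group was founded, and became a member of SRE.
• Mainly responsible for operating and improving the Japanese version of Monster Strike.
• Hobby: bouldering
• https://codeiq.jp/magazine/2016/03/38660/ 3人 (from when I was at my previous job)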
Agenda
Case 1: Load mitigation using memcached
Case 2: Speeding up Resque worker startup
Case 3: Automating DB server provisioning
Summary
Case 1. Load mitigation using memcached
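3rd anniversary events
• 3rd Anniversary Bakuzetsu Kansha Gacha (3周年爆絶感謝ガチャ)
  • A gacha that each user can pull only once
  • Five ☆6 characters appear and the user picks one of them
  • The item needed to pull the gacha, the "Niji-dama" (ニジ玉), is distributed to all users
  • It is also distributed to new users
• Yume-dama Quest (ユメ玉クエスト)
  • Users loop through event quests, collect Yume-dama, and spend them to pull the gacha
  • They repeat this until they have collected 99 of every character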
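3rd Anniversary Bakuzetsu Kansha Gacha
• Around 7:30, the DB for user items became congested
  • select/insert/update increased
  • os_waits spiked
• Response
  • Tuned innodb_spin_wait_delay
  • Added slaves to spread out the selects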
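Load mitigation: options considered
• We wanted to reduce selects ahead of the Yume-dama Quest one week later
• The SQL was already optimized (only simple queries remained)
• Use monsterstrike/second_level_cache
  • A fork of hooopo/second_level_cache with extensions
  • Queries issued through ActiveRecord are cached in memcached without extra work
  • It already had a track record in Monster Strike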
SecondLevelCache `.create`
app:
    class UserItem < ActiveRecord::Base
      acts_as_cached(version: 1, expires_in: 1.day)
    end
    UserItem.create(user_id: 101, item_id: 1) # => id: 10001
memcached:
    { "user_item/10001" => "[10001, 101, 1, …]", … }
MySQL:
    insert into user_items(user_id, item_id, cnt) values (101, 1, 2);
(Diagram path from app: "no cached")
SecondLevelCache `.find`
app:
    class UserItem < ActiveRecord::Base
      acts_as_cached(version: 1, expires_in: 1.day)
    end
    UserItem.find(10001)
memcached:
    { "user_item/10001" => "[10001, 101, 1, …]" }
MySQL:
    select * from user_items where id = 10001;
(Diagram paths from app: "cached" and "no cached")
SecondLevelCache `.fetch_by_uniq_keys`
app:
    class UserItem < ActiveRecord::Base
      acts_as_cached(version: 1, expires_in: 1.day)
    end
    UserItem.fetch_by_uniq_keys(user_id: 101, item_id: 1)
memcached:
    { "user_item/fbu/user_id_101_item_id_1" => 10001,   # basically never expires
      "user_item/10001" => "[10001, 101, 1, …]" }       # expires frequently
MySQL:
    select * from user_items where user_id = 101 and item_id = 1;
(Diagram paths from app: "cached" and "no cached")
Note: by paying attention to the difference in cache lifecycles, memcached can be used efficiently and the number of MySQL queries can also be reduced.
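The two-level key layout on this slide can be sketched in plain Ruby. This is only an illustration of the idea, not the gem's implementation: the class name `UniqKeyCacheSketch`, the in-memory Hash standing in for memcached, and the Array standing in for MySQL are all assumptions made for the example. Only the key formats are taken from the slide.

```ruby
# Sketch only: a Hash stands in for memcached and an Array for the MySQL table.
class UniqKeyCacheSketch
  attr_reader :db_queries

  def initialize(rows)
    @rows = rows        # stand-in for the user_items table
    @cache = {}         # stand-in for memcached
    @db_queries = []    # every simulated MySQL query is recorded here
  end

  # Read-through lookup by primary key ("user_item/<id>").
  def find(id)
    key = "user_item/#{id}"
    return @cache[key] if @cache.key?(key)
    @db_queries << "select * from user_items where id = #{id}"
    row = @rows.find { |r| r[:id] == id }
    @cache[key] = row if row
    row
  end

  # Lookup by unique key pair: first resolve the long-lived key -> id entry,
  # then reuse the per-record cache entry.
  def fetch_by_uniq_keys(user_id:, item_id:)
    uniq_key = "user_item/fbu/user_id_#{user_id}_item_id_#{item_id}"
    id = @cache[uniq_key]
    return find(id) if id
    @db_queries << "select * from user_items where user_id = #{user_id} and item_id = #{item_id}"
    row = @rows.find { |r| r[:user_id] == user_id && r[:item_id] == item_id }
    return nil unless row
    @cache[uniq_key] = row[:id]
    @cache["user_item/#{row[:id]}"] = row
    row
  end

  # Simulates the frequently expiring record entry being dropped (e.g. on update).
  def expire_record(id)
    @cache.delete("user_item/#{id}")
  end
end
```

In this sketch, expiring a record leaves the key -> id entry in place, so the next lookup falls back to a primary-key select rather than the two-column select.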
SecondLevelCache `.fetch_by_index` (extension)
app:
    class UserItem < ActiveRecord::Base
      acts_as_cached(version: 1, expires_in: 1.day)
      acts_as_cached_by_index(:user_id)   # only columns that have an index can be specified
    end
    UserItem.fetch_by_index(user_id: 101)
memcached:
    { "user_item/fbi/user_id/101" => [10001, 10002, 10003],   # basically never expires
      "user_item/10001" => "[10001, 101, 1, …]",              # expires frequently
      "user_item/10002" => "[10002, 101, 2, …]",
      "user_item/10003" => "[10003, 101, 3, …]" }
MySQL:
    select id from user_items where user_id = 101;
    select * from user_items where id in (10001, 10002, 10003);
(Diagram paths from app: "cached" and "no cached")
Note: MySQL is queried only for the difference that is not cached.
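The "query only the uncached difference" behaviour can be sketched as follows. Again this is a minimal illustration under stated assumptions: `IndexCacheSketch`, the Hash used as memcached, and the Array used as MySQL are hypothetical stand-ins, and the real extension may differ in detail. The key formats and the two SQL shapes follow the slide.

```ruby
# Sketch only: a Hash stands in for memcached and an Array for the MySQL table.
class IndexCacheSketch
  attr_reader :db_queries

  def initialize(rows)
    @rows = rows
    @cache = {}
    @db_queries = []
  end

  def fetch_by_index(user_id:)
    index_key = "user_item/fbi/user_id/#{user_id}"

    # Step 1: resolve the id list, from cache if possible.
    unless @cache.key?(index_key)
      @db_queries << "select id from user_items where user_id = #{user_id}"
      @cache[index_key] = @rows.select { |r| r[:user_id] == user_id }.map { |r| r[:id] }
    end
    ids = @cache[index_key]

    # Step 2: ask the database only for records that are not cached.
    missing = ids.reject { |id| @cache.key?("user_item/#{id}") }
    unless missing.empty?
      @db_queries << "select * from user_items where id in (#{missing.join(', ')})"
      @rows.select { |r| missing.include?(r[:id]) }
           .each { |r| @cache["user_item/#{r[:id]}"] = r }
    end

    ids.map { |id| @cache["user_item/#{id}"] }
  end

  # Simulates one record entry expiring while the id list stays cached.
  def expire_record(id)
    @cache.delete("user_item/#{id}")
  end
end
```

With three cached records and one expired, the next call in this sketch issues a single `id in (...)` query for just the missing id.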
After introducing SecondLevelCache
The selects did not decrease as much as expected…
Query investigation: pt-query-digest ( Percona Toolkit )

# Take a tcpdump
# Narrow the result down to selects only
$ pt-query-digest --type=tcpdump --limit=100 --filter '$event->{arg} =~ m/^(select)/i' dumpfile > result_select

# Profile
# Rank Query ID           Response time Calls R/Call V/M   Item
# ==== ================== ============= ===== ====== ===== ===============
#    1 0x365FBDCB443D99A3 5.1165 82.3%   2523 0.0020  0.20 SELECT user_items
#    2 0x5B79B47AB9093007 1.0197 16.4%    178 0.0057  0.79 SELECT user_items
#    3 0xA6FF35DF18E85C6C 0.0655  1.1%    101 0.0006  0.00 SELECT user_items
#    4 0x10259F2E34E9D7F1 0.0136  0.2%     32 0.0004  0.00 SELECT user_items
#    5 0x28DA30E044AAF5E8 0.0022  0.0%      8 0.0003  0.00 SELECT user_items
#    6 0x022C53131F50003E 0.0016  0.0%      9 0.0002  0.00 SELECT user_items

# Query 1: 910.54 QPS, 1.85x concurrency, ID 0x365FBDCB443D99A3 at byte 5687530
# Scores: V/M = 0.20
# Time range: 2016-10-13 23:06:37.637168 to 23:06:40.408062
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         88    2523
# Exec time     82      5s    96us   973ms     2ms     8ms    20ms   185us
# Rows affecte 100     154       0       1    0.06    0.99    0.24       0
# Query size    84 302.79k     120     124  122.89  118.34    0.00  118.34
# Warning coun   0       0       0       0       0       0       0       0
# String:
# Hosts        10.53.6.53 (30/1%), 192.168.117.176 (20/0%)... 398 more
# Query_time distribution
#   1us
#  10us  #
# 100us  ################################################################
#   1ms  #######
#  10ms  ##
# 100ms  #
#    1s
#  10s+
# Tables
#    SHOW TABLE STATUS LIKE 'user_items'\G
#    SHOW CREATE TABLE `user_items`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT `user_items`.* FROM `user_items` WHERE `user_items`.`user_id` = 12345 AND `user_items`.`item_id` = 101 LIMIT 1\G

`.fetch_by_uniq_keys` looks usable for this query.
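Result
Reduced selects by 70%. The high load during the events was handled without problems.
(Graph annotations: Yume-dama Quest; 3rd Anniversary Bakuzetsu Kansha Gacha; the load-mitigation changes were deployed in stages)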
Case 2. Speeding up Resque worker startup
Resque architecture
Processing that can be delayed runs in the background.
(Diagram: app servers behind an LB enqueue jobs to Redis 1 and Redis 2, each with ports 1 to 4; workers on Batch 1, Batch 2, … dequeue from their Redis port, labelled from (Redis 1 port 1) to (Redis 2 port 4).)
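resque:restart
• A task (capistrano) that restarts all Resque worker processes
• Basically has to be run every time changes are rolled out
• 3,500 workers in total
  • Redis servers: 6 × 4 ports
  • Batch servers: 50 × 1 to 4 workers / port
• Problems
  • Rolling out to all servers takes about 20 minutes
  • The number of workers and the queue names are not parameterized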
Investigation: how resque:restart behaves
(Diagram sequence: workers on Batch 1, Batch 2, …, grouped from (Redis 1 port 1) to (Redis 2 port 4).)
• The workers of the first group are stopped (STOP)
• A parent worker process is started for that group
• The parent forks the worker processes
• The next group is then stopped (STOP), its parent is started, and it forks in the same way
• From there on, the remaining groups are processed serially
Investigation: how resque:restart behaves
Pattern where multiple queues share the same Redis port
(Diagram: Redis 1 port 1 serving queue 1 and queue 2, each with its own set of workers.)
• queue 2 is processed serially after queue 1
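Implementation: resque:restart
• Run with 4-way parallelism using grosser/parallel
• Also run in parallel when multiple queues share a Redis port
• Parameterize the number of workers and the queue names
  • Reduces the cost of making changes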
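A bounded-parallel restart along these lines might look like the sketch below. It is not the deck's capistrano task: to stay self-contained it swaps the grosser/parallel gem for a small thread pool built on Ruby's standard `Queue`, and `build_targets`, `each_in_parallel`, `restart_all`, and the host and queue names are hypothetical. The restart step is simulated by recording a label.

```ruby
# Parameterized targets: one entry per (Redis host, port, queue).
def build_targets(redis:, ports:, queues:, workers:)
  redis.product(ports, queues).map do |host, port, queue|
    { redis: host, port: port, queue: queue, workers: workers }
  end
end

# Runs the block for every item using at most `concurrency` threads.
# Items must not be nil or false, since nil marks the end of the queue.
def each_in_parallel(items, concurrency: 4)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # pop returns nil once the closed queue is empty
  threads = Array.new(concurrency) do
    Thread.new do
      while (item = queue.pop)
        yield item
      end
    end
  end
  threads.each(&:join)
end

# Restarts every target, four at a time by default, and returns sorted labels.
def restart_all(targets, concurrency: 4)
  done = Queue.new
  each_in_parallel(targets, concurrency: concurrency) do |t|
    # A real task would stop the old workers and start new ones here.
    done << "#{t[:redis]}:#{t[:port]}/#{t[:queue]} x#{t[:workers]}"
  end
  Array.new(done.size) { done.pop }.sort
end
```

Because every (Redis, port, queue) combination is its own target, queues that share a port are restarted in parallel too, and changing the worker count or queue list only means changing the arguments to `build_targets`.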
Result: resque:restart
20 minutes → 3 minutes (85% reduction)
We can now deliver value to users faster.
Case 3. Automating DB server provisioning
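DB server architecture
• MariaDB
• All on-premises
• DC: DC1 and DC2
• Just under 150 servers per DC (including backup) → 300 in total (as of 2017/08)
• Vertical partitioning and horizontal partitioning
(Diagram: master, slave, and backup servers across DC1 and DC2, connected by replication)
• DB servers have to be built frequently, because of configuration changes such as scale-up and because of hardware failures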
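From DB server provisioning to service-in
• Secure machines and change the configuration (in coordination with members stationed at the DC)
• Install the OS (PXEboot or Cobbler + koan)
• Initial setup (Ansible)
• Create the data area ( mkfs.xfs, mount )
• Set up MariaDB (Chef)
• Backup and restore with Percona XtraBackup
  • The data source is the backup DB server
• Start replication
• Add nagios monitoring
• (For a master) switch over with MHA (mysql-master-ha) during maintenance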
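From DB server provisioning to service-in: before the improvement
• Secure machines and change the configuration (in coordination with members stationed at the DC)
• Install the OS (PXEboot or Cobbler + koan)
• Initial setup (Ansible)
• Create the data area ( mkfs.xfs, mount )
• Set up MariaDB (Chef)
• Backup and restore with Percona XtraBackup
  • The data source is the backup DB server
• Start replication
• Add nagios monitoring
• (For a master) switch over with MHA (mysql-master-ha) during maintenance
(Slide annotations: "manual work" on three of the steps)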
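Automating data area creation ( mkfs.xfs, mount )
• Add it to the existing Chef recipe
• Requirements
  • Support combinations of SSD, ioDrive, and ioMemory
  • Infer the name of the device to mount
  • Bundle multiple disks together using LVM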
Inferring the target disk for the data area
• 1 SSD, 1 ioMemory
  • /dev/fioa is the data volume
• 1 SSD
  • /dev/sda7 or similar is the data volume
• 2 SSDs
  • /dev/sdb is the data volume
  • ( /dev/sda is the root volume )
• 3 SSDs
  • /dev/sdb + /dev/sdc (LVM) is the data volume
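The rules above can be written as a small decision function. This is a hedged sketch, not the Chef recipe from the talk: the function name `estimate_data_volume`, its arguments, the returned Hash shape, the fixed partition number 7 (the slide only says "/dev/sda7 or similar"), and the handling of more than three SSDs are assumptions for illustration.

```ruby
# Returns which device(s) should back the data volume and whether LVM is needed.
#   ssds:           SSD device names in order; the first holds the root volume
#   iomemory:       ioMemory/ioDrive device names, if any
#   root_partition: partition number used when the only SSD is shared with root
#                   (assumption; the slide gives /dev/sda7 only as an example)
def estimate_data_volume(ssds:, iomemory: [], root_partition: 7)
  # A flash card, when present, is the data volume.
  return { devices: [iomemory.first], lvm: false } unless iomemory.empty?

  case ssds.size
  when 0
    raise ArgumentError, "no candidate disk found"
  when 1
    # Single SSD: the data area is a partition on the root disk.
    { devices: ["#{ssds.first}#{root_partition}"], lvm: false }
  when 2
    # Two SSDs: the first is root, the second is the data volume.
    { devices: [ssds[1]], lvm: false }
  else
    # Three or more SSDs: bundle everything except the root disk with LVM.
    # (The slide only covers the three-SSD case.)
    { devices: ssds.drop(1), lvm: true }
  end
end
```

For example, `estimate_data_volume(ssds: ["/dev/sda", "/dev/sdb", "/dev/sdc"])` yields `/dev/sdb` and `/dev/sdc` with `lvm: true`, matching the last case on the slide.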
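From DB server provisioning to service-in: after the improvement
• Secure machines and change the configuration (in coordination with members stationed at the DC)
• Install the OS (PXEboot or Cobbler + koan)
• Initial setup (Ansible)
• Create the data area ( mkfs.xfs, mount )
• Set up MariaDB (Chef)
• Backup and restore with Percona XtraBackup
  • The data source is the backup DB server
• Start replication
• Add nagios monitoring
• (For a master) switch over with MHA (mysql-master-ha) during maintenance
(Slide annotations: "made it runnable interactively with a single command"; "turned it into a shell script"; "included it in the Chef run")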
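Summary
• To concentrate on essential work:
  • (Case 3) Use automation to remove dependence on individuals, reduce operational mistakes, and make operations easy to scale
↓
• To deliver higher value to users:
  • (Case 1) Mitigate load ahead of time so that the service does not go down
  • (Case 2) Improve the deploy flow to shorten the time a release takes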
Thank you very much.