Yuuki Tsubouchi (yuuk1)
July 09, 2016
Performance Improvement of TSDB in Mackerel
Pepabo / Hatena Tech Conference: Infrastructure Technology Platform @ Fukuoka
Transcript
Performance Improvement of the Time-Series Database in Mackerel
Pepabo / Hatena Tech Conference: Infrastructure Technology Platform @ Fukuoka
Hatena id:y_uuki
id:y_uuki yuuki
Web operations engineer @ Hatena
About 3 years since joining the company
07/02 @ Kyoto: https://speakerdeck.com/yuukit/linux-network-performance-improvement-at-hatena
Agenda
1. Mackerel and time-series data
2. Graphite architecture and performance status
3. Disk thrashing and its resolution
4. Summary
1. Mackerel and time-series data
https://mackerel.io
Visualizing server metrics
Mackerel architecture
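Characteristics of Mackerel's time-series data
• Agents post metrics from users' hosts every minute
• As of 2016/01, there are 10,000+ active agents
• Each agent sends up to 200 metrics
• Assuming an average of 100 metrics/agent, the total is 1,000,000+ metrics/min
• We need a database that can withstand a large volume of metric writes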
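The estimate above is simple arithmetic, sketched here to make the write rate per second explicit. The constants come from the slide; the 100 metrics/agent average is the slide's own assumption, and the function name is only illustrative.

```python
# Figures from the slide; AVG_METRICS_PER_AGENT is the slide's assumed average.
ACTIVE_AGENTS = 10_000
AVG_METRICS_PER_AGENT = 100
POST_INTERVAL_SECONDS = 60  # agents post once per minute


def write_rate(agents, metrics_per_agent, interval_seconds):
    """Return (datapoints per interval, datapoints per second)."""
    per_interval = agents * metrics_per_agent
    return per_interval, per_interval / interval_seconds


per_minute, per_second = write_rate(
    ACTIVE_AGENTS, AVG_METRICS_PER_AGENT, POST_INTERVAL_SECONDS
)
print(f"{per_minute:,} datapoints/min, about {per_second:,.0f} writes/s")
```

Spread evenly over the minute, this works out to roughly 17k small writes per second as a lower bound.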
Graphite
2. Graphite architecture and performance status
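What is Graphite?
• Time-series database middleware written in Python
• HTTP interface (writes use its own protocol)
• Output is either a graph image or JSON
[Diagram: (timestamp, name, value) goes into Graphite; a graph request returns an image or JSON]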
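As a rough illustration of the write side, here is a minimal sketch of sending one (timestamp, name, value) datapoint using carbon's plaintext line protocol, where each line is `<name> <value> <timestamp>`. The host, the metric name, and the helper names are hypothetical. Port 2003 is carbon's usual plaintext default, but treat it as an assumption for any given deployment.

```python
import socket


def format_datapoint(name, value, timestamp):
    """Format one datapoint as a carbon plaintext line: '<name> <value> <timestamp>'."""
    return f"{name} {value} {int(timestamp)}\n"


def send_datapoints(lines, host="127.0.0.1", port=2003):
    """Send preformatted lines to a carbon listener (host and port are assumptions)."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Hypothetical metric name and value; the timestamp is Unix epoch seconds.
line = format_datapoint("host1.loadavg5", 0.42, 1468022400)
print(line, end="")
```

`send_datapoints([line])` would deliver it to a running carbon process. It is not called here so that the sketch runs without a server.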
Graphite architecture
[Diagram: (timestamp, name, value) is written by carbon through whisper to the filesystem; a graph request is served by graphite-web, which reads through whisper and returns an image or JSON]
• graphite-web: a web application that accepts read requests
• carbon: a daemon that accepts write requests
• whisper: a library for creating and updating time-series DB files. One file is created per metric
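Whisper data structure
• Storing every datapoint makes disk usage balloon
• At 12 bytes/datapoint (timestamp: 4 bytes, value: 8 bytes), one metric takes about 6 MB per year
• Older data is rolled up by averaging or taking the maximum over a fixed interval, which saves disk space
• e.g. 1-minute resolution data only needs to be kept for 1 day, while 5-minute resolution data is kept for 1 week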
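The sizing argument can be checked with a short calculation. This sketch assumes the slide's example means 1 day of 1-minute data plus 1 week of 5-minute data, and it ignores whisper's file header and per-archive metadata, so the real files are slightly larger. The function name is illustrative.

```python
BYTES_PER_DATAPOINT = 4 + 8  # 4-byte timestamp + 8-byte value, as on the slide
MINUTE, DAY = 60, 86_400     # seconds


def archive_bytes(resolution_seconds, retention_seconds):
    """Bytes needed to keep one metric at a given resolution for a given period."""
    points = retention_seconds // resolution_seconds
    return points * BYTES_PER_DATAPOINT


# Keeping every 1-minute datapoint for a year.
raw_one_year = archive_bytes(MINUTE, 365 * DAY)

# Assumed rollup policy: 1-minute data for 1 day, 5-minute data for 1 week.
rolled_up = archive_bytes(MINUTE, DAY) + archive_bytes(5 * MINUTE, 7 * DAY)

print(f"raw: {raw_one_year / 1e6:.1f} MB/metric, rolled up: {rolled_up / 1e3:.1f} kB/metric")
```

Under these assumptions the rolled-up layout is about 150 times smaller per metric than keeping a full year at 1-minute resolution.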
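Graphite write performance characteristics (CPU usage)
• carbon runs two cooperating threads
  • A network I/O thread that receives data
  • An I/O thread that writes to files
• The network server uses an event-driven model
• Datapoints are passed between the threads through a buffer
• Each thread is limited to a single core
• Run multiple carbon processes to spread the load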
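Graphite write performance characteristics (disk I/O)
• A small amount of data (12 bytes) is written to a large number of files within one minute
• Writes cannot be batched into neighboring blocks on the filesystem, so I/O efficiency is poor (writes scatter in every direction)
• On the other hand, multiple threads never write to the same file at the same time, so I/O parallelism is easy to raise
• Performance does not change even without a filesystem that excels at parallel I/O such as XFS (e.g. ext4)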
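Hardware configuration and resource usage
• CPU: Xeon E5-2697 v3 @ 2.60GHz, 2 sockets, 28 cores
• Memory: 126GB
• Disk: Fusion ioMemory ioDrive2 6.4TB
  • So-called flash storage. The vendor's nominal figure is 300k write IOPS
• Effective I/O performance: 50k ~ 100k write IOPS
  • An ordinary SSD does well to reach 1/10 of that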
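Graphite tuning
• CPU runs out before the ioDrive's IOPS are used up, so the approach is to save CPU and push the work onto I/O
• The disk is fast and strong at random writes, so as a rule we do not let carbon or the I/O scheduler apply extra optimizations
  • carbon has parameters for sorting writes to improve I/O efficiency and for limiting how much I/O it uses
  • echo noop > /sys/block/fioa/queue/scheduler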
Graphite cluster configuration
[Diagram: (timestamp, name, value) writes and graphite-web reads pass through load balancers (LB) to several groups of carbon processes]
See the blog for details: http://blog.yuuk.io/entry/high-performance-graphite
3. Disk thrashing and its resolution
[Graphs: write IOPS, read IOPS]
A sudden increase in read load
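What happened?
• read IOPS kept rising while write IOPS kept falling
• No swap was in use due to memory shortage. OS memory usage was about 1/3
• There was no sudden spike in service traffic
• sar -B showed that page-ins and page-outs per interval had grown abnormally
• We will call this phenomenon disk thrashing
• We inferred the cause from how the Linux page cache works and from Graphite's I/O pattern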
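The Linux page cache
• Memory breakdown = used + buffers/caches + free
• When data is read from or written to the filesystem, the OS keeps that on-disk data in memory, page by page, so that later reads are fast
  • This is called the page cache
• The page cache follows an LRU policy: recently referenced cache data is kept, and old cache data that is no longer referenced is dropped
• The page cache is normally not counted as memory usage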
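Graphite's I/O pattern
• Every active whisper file is written within one minute, so writes land across a wide range of the disk
• A whisper metric write issues not only write(2) but also read(2), to load metadata and compute offsets
• The page cache helps writes as well as reads (except with Direct I/O)
• A Graphite host holds a large amount of page cache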
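Cause of the read IOPS increase
• Frequent page-ins and page-outs mean that LRU is evicting older cache entries
• The reads issued during whisper writes stopped hitting the page cache, so read IOPS went up
[Diagram: memory split into "used" and "page cache", with page in / page out between the page cache and disk]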
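The eviction effect described above can be reproduced with a toy model. This is a simplified sketch, not how the kernel implements its page cache: it assumes a strict LRU cache, one page per whisper file, and a write loop that touches every file once per round. All names and sizes are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Strict LRU cache of page ids; a miss stands in for a disk read (page-in)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.misses = 0

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)    # mark as most recently used
            return
        self.misses += 1                    # not cached: would hit the disk
        self.pages[page] = True
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page


def simulate(capacity, files, rounds):
    """Touch one page per file, every round, and count the cache misses."""
    cache = LRUCache(capacity)
    for _ in range(rounds):
        for file_id in range(files):
            cache.access(file_id)
    return cache.misses


fits = simulate(capacity=100, files=100, rounds=10)       # working set fits
thrashes = simulate(capacity=100, files=101, rounds=10)   # one page too many
print(fits, thrashes)
```

Once the cyclic working set exceeds the cache by even one page, strict LRU evicts each page just before it is needed again, so every access misses. That matches the pattern of read IOPS climbing without any change in traffic.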
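Saving page cache
• Adding RAM would solve it for now, but the host already has 126GB, so we would rather cut wasteful page cache
• Written data is not necessarily read back soon, so keep written data out of the cache => Direct I/O
• However, Direct I/O requires buffers aligned to the block size => very tedious to do in Python (malloc => posix_memalign)
• Solved with posix_fadvise(2)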
posix_fadvise(2)
• A process tells the kernel how it will access a file's data
• The kernel optimizes for the declared access pattern to improve I/O performance
• Access patterns
  • POSIX_FADV_SEQUENTIAL: doubles readahead
  • POSIX_FADV_RANDOM: disables readahead
  • POSIX_FADV_DONTNEED: frees cached pages
  • etc.
int posix_fadvise(int fd, off_t offset, off_t len, int advice);
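For reference, Python 3.3+ exposes the same call as `os.posix_fadvise` on Unix. The sketch below applies POSIX_FADV_RANDOM to a scratch file. The helper name is hypothetical, and a length of 0 means "to the end of the file", as in the C API.

```python
import os
import tempfile


def advise_random(fileobj):
    """Hint that fileobj will be accessed randomly; return True if the hint was applied."""
    if not hasattr(os, "posix_fadvise"):
        return False  # platform without posix_fadvise (e.g. not Unix)
    try:
        # offset=0, len=0 covers the whole file.
        os.posix_fadvise(fileobj.fileno(), 0, 0, os.POSIX_FADV_RANDOM)
    except OSError:
        return False  # the filesystem or sandbox rejected the advice
    return True


with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"\0" * 4096)
    tmp.flush()
    applied = advise_random(tmp)

print("advice applied:", applied)
```

The advice is only a hint: the call returns immediately and changes how the kernel reads ahead on that file descriptor, not the data itself.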
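Applying posix_fadvise(2) to Graphite
• At first we looked at the option that drops page cache
• whisper's write logic is fairly complex, so dropping only the page cache created by writes is difficult
• Instead we used POSIX_FADV_RANDOM so that readahead is skipped and only the pages actually needed are cached
  • Nothing in whisper's write path scans a file sequentially
• The wasteful page cache created by readahead went down
/proc/meminfo before & after (in the order shown on the slide):
  Active(file): 5387160 kB, Inactive(file): 37566804 kB
  Active(file): 32252136 kB, Inactive(file): 7231020 kB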
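To compare the two /proc/meminfo snapshots, a small parser is enough. This sketch uses the four figures from the slide and assumes the first pair is "before" and the second is "after", following the order on the slide. The function names are illustrative.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key: <number> kB' lines into a dict of ints (kB)."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def file_cache_kb(fields):
    """File-backed page cache on the active and inactive LRU lists."""
    return fields["Active(file)"] + fields["Inactive(file)"]


# Figures from the slide; the before/after assignment is an assumption.
BEFORE = parse_meminfo("Active(file): 5387160 kB\nInactive(file): 37566804 kB\n")
AFTER = parse_meminfo("Active(file): 32252136 kB\nInactive(file): 7231020 kB\n")

inactive_drop_kb = BEFORE["Inactive(file)"] - AFTER["Inactive(file)"]
print(file_cache_kb(BEFORE), file_cache_kb(AFTER), inactive_drop_kb)
```

On that reading, the inactive file list shrinks by about 30 GB while the active list grows, which fits the claim that fewer cached pages sit unused.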
The pull request to Graphite
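Pull request contents
• The change itself is not very difficult
• It uses the fadvise module
• If posix_fadvise shows up in strace, it is working
• It is unclear whether calling fadvise is always a good idea, so it can be switched on and off in the configuration file (off by default)
• It took about a month to get merged

    with open(path, 'r+b') as fh:
        if CAN_FADVISE and FADVISE_RANDOM:
            posix_fadvise(fh.fileno(), 0, 0, POSIX_FADV_RANDOM)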
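Verification with a test script
https://gist.github.com/yuuki/8d5d386115b0f01b5371
• Uses whisper's write function to check whether the amount of page cache really goes down
• The script writes 100 datapoints to each of 100 whisper files
• It looks at read_bytes (the bytes actually read from disk) in /proc/<pid>/io
• With POSIX_FADV_RANDOM enabled, the amount of page cache was halved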
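The read_bytes counter mentioned above can be read with a few lines of Python. This is a minimal sketch of the measurement only, not the author's gist. The sample text and function names are made up for illustration, and /proc/<pid>/io exists only on Linux.

```python
def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def read_bytes(pid="self"):
    """Bytes this process actually caused to be fetched from storage (Linux only)."""
    with open(f"/proc/{pid}/io") as fh:
        return parse_proc_io(fh.read())["read_bytes"]


# Made-up sample in the format of /proc/<pid>/io, so the sketch runs anywhere.
SAMPLE = (
    "rchar: 4096\nwchar: 2048\nsyscr: 10\nsyscw: 5\n"
    "read_bytes: 8192\nwrite_bytes: 4096\ncancelled_write_bytes: 0\n"
)
sample_counters = parse_proc_io(SAMPLE)
print(sample_counters["read_bytes"])
```

To measure a workload, call `read_bytes()` before and after it and compare the difference with and without the fadvise hint.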
4. Summary
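Summary
• Mackerel has to handle 1,000,000+ metrics/min of metric writes
• Graphite was chosen as the time-series database
• Tuning assumes ioDrive and leaves I/O handling to the OS
• Disk thrashing was resolved with a patch that uses posix_fadvise (POSIX_FADV_RANDOM) to stop readahead and cut wasteful page cache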
• Metrics at sub-minute granularity
• Long-term retention without losing granularity
• Real-time anomaly detection
We want to replace this with a next-generation time-series database
http://hatenacorp.jp/recruit/fresh/operation-engineer
For people who love technology
This deck uses shoya140's Zebra (http://shoya.io/blog/zebra/) as its Keynote template.