Research Trends in Cloud Telemetry Systems, 2025

2025/03/13 SAKURA internet Research Center Tech Talk, Spring 2025
https://sakura-tokyo.connpass.com/event/343441/

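As cloud systems grow more complex, engineers need telemetry technology that can collect finer-grained data in order to understand how a system is used and how it behaves. At the same time, the growth of telemetry workloads calls for scaling techniques such as improving the utilization efficiency of computing resources and reducing data volume. Based on the presenter's doctoral thesis, whose main theme is "telemetry workload scaling", this talk introduces the latest research trends that the thesis could not cover in detail.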

Yuuki Tsubouchi (yuuk1)

March 14, 2025

Transcript

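  1. Yuuki Tsubouchi / yuuk1

    Senior Researcher, SAKURA internet Research Center. Ph.D. in Informatics, Kyoto University (to appear: the thesis review and procedures are complete and final confirmation is pending). https://yuuk.io/ Main research areas: AIOps, eBPF instrumentation, time-series DBs. A researcher of SRE. Lives in Kyoto City.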
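  2. Agenda

    Doctoral thesis: 1. Introduction to the thesis theme (introducing the "telemetry workload scaling" problem). 2. The three contributions of the thesis (dynamic instrumentation of the Linux kernel and a time-series DB; for time reasons, only two are presented). Latest research trends: 3. Latest research trends along the thesis theme (the world of telemetry workload scaling, including related work). 4. Summary.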
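  3. Monitoring and analysis with telemetry

    Telemetry [63]. Dictionary definition: the process of recording the readings of an instrument and transmitting them. Definition in this research: automatically collecting data on performance and usage from systems, applications, and services and transmitting it to a remote location for monitoring and analysis. Diagram: users reach the application system over the Internet, and the telemetry system transfers data to engineers (instrument, transmission, remote location, analysis).

    [63] Frank Carden, Russell P. Jedlicka, and Robert Henry. Telemetry Systems Engineering.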
  4. Typical structure of a telemetry system

    Divided into a three-layer structure. Measurement: OpenTelemetry, Prometheus exporter, Fluentd, eBPF, … Storage: Prometheus, InfluxDB, VictoriaMetrics, VictoriaLogs, Loki/Tempo/Mimir, ClickHouse, Apache Iceberg, … Analysis: Grafana, Perses, AIOps.
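  5. Growth of telemetry workloads

    Computing-resource consumption and data-processing volume both increase. Measurement: higher application CPU/memory consumption; higher application processing latency. Storage: higher DB ingestion load; more data stored in the DB. Analysis: higher processing latency and lower accuracy in machine learning.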
  6. Goal: telemetry workload scaling

    Scale the measurement, storage, and analysis layers efficiently (workload versus overhead) in response to growing telemetry workloads.
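  7. Contributions: technically solving the remaining issues in each layer

    Contributions ①, ②, and ③ address growing computing-resource consumption and growing data-processing volume across the measurement, storage, and analysis layers. Data types labeled in the diagram: network, traces, metrics.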
  8. Contribution ①: Measurement

    Addresses growing computing-resource consumption. To obtain network call graphs, focuses on a low-overhead instrumentation method based on eBPF. Y. Tsubouchi, M. Furukawa, R. Matsumoto, Low Overhead TCP/UDP Socket-based Tracing for Discovering Network Services Dependencies, JIP, 2022.
  9. Network call graph: how do we measure it?

    Components: Cloud Load Balancers, Web app servers, Database Clusters, Message queues. We want to know the call relationships between components: the impact range of a change, and per-link metrics.
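  10. Instrumentation approach

    Place a measurement point somewhere on the network communication path: App, Proxy, Network Stack (Kernel), NIC, or Switch. This work focuses on instrumentation at the upper layer of the kernel (sockets). Versus App: no application modification is needed. Versus Proxy: no relay overhead. Versus Switch: measurement load can be distributed across end hosts.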
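  11. Instrumentation methods at the socket layer

    Streaming method ([96,97]): ✗ the number of measurements transferred to user space grows with the number of messages. Flow aggregation method ([27,98]): ✔ only measurements aggregated per flow are stored, which reduces the amount of data transferred; ✗ when short-lived flows increase, the amount of data transferred also increases. Flow bundling method (proposed): bundles flows that share the same destination; ✔ reduces the amount of data transferred even with many short-lived flows; ✔ low measurement overhead. Note: a flow is a unit of communication in which the address and port pairs at both ends are identical. Note: arrows in the diagram show data flowing from the kernel measurement point to the user-space agent.

    References: Jinjin Lin, et al. "Microscope: Pinpoint Performance Issues with Causal Graphs in Micro-Service Environments", ICSOC. Weave Scope, https://github.com/weaveworks/scope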
  12. Instrumentation methods at the socket layer (same comparison as slide 11)

    References: Francisco Neves, et al. "Black-box Inter-application Traffic Monitoring for Adaptive Container Placement", SAC. Datadog Network Performance Monitoring, https://docs.datadoghq.com/network_monitoring/performance
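The three socket-layer methods differ in how many records cross from the kernel to user space. The following minimal Python sketch is not the author's eBPF implementation; the event tuples and function names are hypothetical, and the bundle key (source address, destination address, destination port) is an assumption about what "same destination" means. It counts the records each method would transfer for the same set of messages.

```python
from collections import Counter

# Each event is one message seen at a socket:
# (src_addr, src_port, dst_addr, dst_port). Values are made up.
events = [
    ("10.0.0.1", 40001, "10.0.0.9", 5432),
    ("10.0.0.1", 40001, "10.0.0.9", 5432),
    ("10.0.0.1", 40002, "10.0.0.9", 5432),
    ("10.0.0.1", 40003, "10.0.0.9", 5432),
    ("10.0.0.1", 40004, "10.0.0.8", 6379),
]

def streaming(events):
    """Streaming: one record per message."""
    return list(events)

def flow_aggregation(events):
    """Flow aggregation: one record per flow (identical addresses and ports)."""
    return Counter(events)

def flow_bundling(events):
    """Flow bundling: one record per bundle.

    Assumed key: the short-lived source port is dropped, so many
    short-lived flows to one destination collapse into one record.
    """
    return Counter((src, dst, dport) for src, _sport, dst, dport in events)

print(len(streaming(events)), len(flow_aggregation(events)), len(flow_bundling(events)))
```

With many short-lived client connections, the number of flows approaches the number of messages, while the number of bundles stays bounded by the number of distinct destinations.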
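  14. Contribution ②: Storage

    Addresses growing computing-resource consumption. Achieves both higher ingestion efficiency for time-series data (metrics) and lower long-term data storage cost. Y. Tsubouchi, A. Wakisaka, K. Hamada, M. Matsuki, T. Kobayashi, H. Abe, R. Matsumoto, HeteroTSDB: A High-Performance Time-Series Database with Automatic Tiering across Heterogeneous Distributed KVSs (in Japanese), IPSJ Journal, 2021.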
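  15. Workload of metrics storage

    The ingestion workload for metrics is proportional to two dimensions: ① resolution (typically in the range of 1 to 60 seconds; a finer resolution means more datapoints per unit time) and ② the number of metrics. Example series over time: cpu_seconds{instance=host1,…}, memory_total_bytes{instance=host1,…}, http_requests_count{instance=host1,…}, http_requests_count{instance=host99,…}. The number of metrics grows through multi-dimensionalization: cpu_seconds{instance=host1,…} is decomposed into cpu_seconds{instance=host1,mode=user,core_no=1,…}, cpu_seconds{instance=host1,mode=system,core_no=1,…}, cpu_seconds{instance=host1,mode=user,core_no=2,…}, and so on.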
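Because the ingestion workload scales with both dimensions, a rough ingestion rate can be estimated as the number of metric series divided by the collection interval. The sketch below is illustrative only: the function name and the host, core, and mode counts are assumptions, not figures from the talk.

```python
def datapoints_per_second(num_series: int, interval_seconds: float) -> float:
    """Ingestion rate when every series reports one datapoint per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return num_series / interval_seconds

# Hypothetical fleet: cpu_seconds split by host, core, and mode labels.
hosts, cores, modes = 99, 2, 2
cpu_series = hosts * cores * modes  # label combinations multiply the series count

coarse = datapoints_per_second(cpu_series, 60)  # 60-second resolution
fine = datapoints_per_second(cpu_series, 1)     # 1-second resolution
print(cpu_series, coarse, fine)
```

Adding one label with n values multiplies the series count by n, and moving from 60-second to 1-second resolution multiplies the rate by 60, so the two effects compound.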
  16. Scalability requirements for metrics storage

    Ingestion throughput: Slack 12M datapoints/sec, Meta 700M datapoints/min, LYCorp 12.5M datapoints/min [19] [32] [112]. Common solutions: ingestion across multiple horizontally partitioned nodes; efficient writes to in-memory data structures. Data storage volume: Slack 12 TB/day, ByteDance 10 TB/day, LYCorp 2.7 TB/day, Mackerel 460 days [19] [35] [69] [112]. Common solutions: data compression and long-term retention on cold storage.

    References: Suman Karumuri, et al. "Towards Observability Data Management at Scale", SIGMOD Record. Tuomas Pelkonen, et al. "Gorilla: A Fast, Scalable, In-Memory Time Series Database", VLDB. Xuanhua Shi, et al. "ByteSeries: An In-Memory Time Series Database for Large-Scale Monitoring Systems", SoCC. Hiroki Sakamoto, Scaling Time Series Data to Infinity: A Kubernetes-Powered Solution with Envoy, https://speakerdeck.com/lycorptech_jp/scaling_tsdb_inifinitely_with_oss Mackerel, https://mackerel.io
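  17. Ingestion efficiency of KVSs

    As the number of metrics grows, the number of KVS keys grows. ↳ The cost of managing internal objects increases, for example the efficiency of index lookups when appending data. Memory-based KVS (Redis, Valkey, Dragonfly, …): adopts hash tables because memory excels at random access; writes are O(k); keeps no data on disk (except commit logs and snapshots). Disk-based KVS (HBase, Cassandra, …): writes go in O(log n) to sorted in-memory structures such as balanced trees and skip lists and are then flushed to disk files; because the data is sorted, disk access is efficient.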
  18. Ingestion efficiency of KVSs

    Drawbacks of each type. Memory-based KVS: ✘ memory costs more per unit of capacity, so it is unsuitable for long-term retention. Disk-based KVS: ✘ write efficiency drops when the number of keys is large. ↳ The cost of managing internal objects increases, for example the efficiency of index lookups when appending data.
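  19. Proposed method: HeteroTSDB

    The client (App) writes to a memory-based KVS (Redis): a memory buffer that stores data with recent timestamps and provides fast ingestion based on hash tables. A Flusher migrates data to a disk-based KVS (Cassandra): disk storage that stores data with old timestamps, where saving on SSD/HDD lowers the cost of long-term retention. Fast ingestion and low retention cost are thus achieved together.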
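The tiering idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not HeteroTSDB itself: a `dict` stands in for the memory-based KVS, sorted lists stand in for the disk-based KVS, and the class name, method names, and age threshold are all hypothetical.

```python
import bisect

class TieredStore:
    """Toy two-tier store: a hash map for recent points, sorted lists for old ones."""

    def __init__(self, max_age_seconds: float):
        self.max_age = max_age_seconds
        self.memory = {}  # series -> list of (timestamp, value); memory tier stand-in
        self.disk = {}    # series -> sorted list of (timestamp, value); disk tier stand-in

    def write(self, series: str, timestamp: float, value: float) -> None:
        # Hash-table insert into the memory tier.
        self.memory.setdefault(series, []).append((timestamp, value))

    def flush(self, now: float) -> int:
        """Move points older than max_age to the disk tier; return how many moved."""
        moved = 0
        for series, points in list(self.memory.items()):
            old = [p for p in points if now - p[0] > self.max_age]
            for p in old:
                bisect.insort(self.disk.setdefault(series, []), p)
            self.memory[series] = [p for p in points if now - p[0] <= self.max_age]
            moved += len(old)
        return moved

    def read(self, series: str, start: float, end: float):
        """Read across both tiers, oldest first."""
        points = self.disk.get(series, []) + self.memory.get(series, [])
        return sorted(p for p in points if start <= p[0] <= end)

store = TieredStore(max_age_seconds=60)
store.write("cpu", 0, 1.0)
store.write("cpu", 100, 2.0)
moved = store.flush(now=120)  # only the point at t=0 is older than 60 s
```

Readers query both tiers, so the migration stays invisible to clients while only the recent window occupies memory.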
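  20. Experiment: comparison of ingestion efficiency

    Ingestion throughput versus number of hosts (1 to 8), with the number of metrics fixed at 1M. Blue: KairosDB; orange: proposed method. The proposed method (HeteroTSDB) reaches 420k datapoints/s, 3.98 times the baseline. Translated to Slack's 12 M/s workload, the calculation gives 229 hosts for the proposed method and 915 hosts for KairosDB.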
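The host-count extrapolation can be checked with a short calculation. The sketch assumes that 420k datapoints/s was measured at 8 hosts and that throughput scales linearly with host count; neither assumption is stated explicitly on the slide, and the names below are hypothetical.

```python
import math

TARGET = 12_000_000         # Slack-scale workload in datapoints/s (from the slide)
PROPOSED_8_HOSTS = 420_000  # proposed method throughput, assumed to be at 8 hosts
SPEEDUP = 3.98              # proposed / baseline ratio (from the slide)
HOSTS = 8

def hosts_needed(target: float, throughput: float, hosts: int) -> int:
    """Hosts required if throughput scales linearly with host count."""
    per_host = throughput / hosts
    return math.ceil(target / per_host)

proposed = hosts_needed(TARGET, PROPOSED_8_HOSTS, HOSTS)
baseline = hosts_needed(TARGET, PROPOSED_8_HOSTS / SPEEDUP, HOSTS)
print(proposed, baseline)
```

Under these assumptions the proposed method comes out at 229 hosts, matching the slide. The baseline comes out at 910 rather than the slide's 915, presumably because the slide used the unrounded KairosDB measurement instead of the rounded 3.98 ratio.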
  21. Contribution ③: Analysis (skipped in this talk)

    Addresses growing computing-resource load. Proposes a preprocessing method that uses unsupervised machine learning to automatically remove metrics unrelated to a failure. Y. Tsubouchi and H. Tsuruta, MetricSifter: Feature Reduction of Multivariate Time Series Data for Efficient Fault Localization in Cloud Applications, IEEE Access, 2024.
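  22. Summary: design guidelines for telemetry systems

    Basic principle: apply data reduction such as sampling, aggregation, and feature reduction where context is rich (instrumentation and mining). Measurement (application context: processes, sockets, transactions, and so on): contribution ① aggregates based on sockets. Storage: no data reduction; aims to improve the utilization efficiency of computing resources. Analysis (operational context: failures, alerts, and so on): contribution ③ reduces features based on the occurrence of a failure.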
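  23. The world of telemetry workload scaling

    Measurement: • the largest number of papers; • proposals for smart sampling methods for tracing are especially numerous. Storage: • many products but few papers; • proposals include compression of metadata (multi-dimensional labels) and query pushdown mechanisms. Analysis: • many AIOps papers are published, but few concern scaling; • filtering such as noise removal.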
  24. The world of telemetry workload scaling

    Today's focus is the measurement layer, which has the largest number of papers and especially many proposals for smart sampling methods for tracing. The storage and analysis layers are as described on the previous slide.
  25. Motivation for data reduction at the measurement layer

    Reducing data volume at the measurement layer also reduces the load on the subsequent layers. "99.99% of traces are trash" [186]. At WeChat, a single log template accounted for 95.7% of all storage [71].

    [186] Paige Cruz, "99.99% of Your Traces Are (Probably) Trash", SREcon24. [71] Guangba Yu, et al. "LogReducer: Identify and Reduce Log Hotspots in Kernel on the Fly". ICSE. 2023.
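  26. Research trends at the measurement layer: overview

    Metrics (skipped today; PMF (CLOUD, 2024)): dynamically determine important metrics and raise their measurement frequency; send a value only when it deviates greatly from the previous value.
    Traces: tail sampling with machine-learning models (select only a small number of valuable traces, considering trace structure, execution time, diversity, and runtime state such as system metrics); retroactive sampling (after a fault is detected, go back in time and collect all traces); compression that retains approximate information for all traces (decompose traces into common and variable parts and deduplicate).
    Logs: automatic hotspot discovery (find the log templates that dominate storage consumption); reactive collection when a failure occurs (compute the blast radius of the fault and collect data only from nodes within that range).

    PMF: Chakraborty, Aishwariya, et al. "Enabling Programmable Metric Flows." CLOUD, 2024.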
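The "send only on large deviation" idea for metrics can be sketched as a deadband filter. This is a generic illustration, not PMF's mechanism; the function name and threshold are hypothetical, and comparing against the last value that was sent (rather than the last value observed) is a design choice made here so that slow drift is still reported eventually.

```python
def deadband_filter(values, threshold):
    """Yield (index, value) only when the value has moved more than
    `threshold` away from the last value that was sent."""
    last_sent = None
    for i, v in enumerate(values):
        if last_sent is None or abs(v - last_sent) > threshold:
            last_sent = v
            yield i, v

samples = [10.0, 10.1, 10.2, 15.0, 15.1, 9.0]
sent = list(deadband_filter(samples, threshold=1.0))
print(sent)
```

Here three of six samples are transmitted; the receiver can treat the gaps as "unchanged within the threshold".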
  27. Research trends at the measurement layer: traces

    Tail sampling with machine-learning models: select only a small number of valuable traces, considering trace structure, execution time, diversity, and runtime state (system metrics). Sifter (SoCC, 2019): builds a model of normal behavior from trace data and prioritizes anomalous and outlier traces. Sieve (ICWS, 2021): prioritizes traces that are rare structurally and temporally, not only anomalous ones. STEAM (FSE, 2023) (Microsoft): maintains diversity for each attribute such as API, structure, latency, and status code. TraStrainer (FSE, 2024): prioritizes traces that are strongly related to system state changes (changes in metrics).

    Sifter: Las-Casas, Pedro, et al. "Sifter: Scalable sampling for distributed traces, without feature engineering." SoCC, 2019. Sieve: Huang, Zicheng, et al. "Sieve: Attention-based sampling of end-to-end trace data in distributed microservice systems." ICWS, 2021. STEAM: He, Shilin, et al. "STEAM: Observability-preserving trace sampling." ESEC/FSE, 2023. TraStrainer: Huang, Haiyu, et al. "TraStrainer: Adaptive sampling for distributed traces with system runtime state." ESEC/FSE, 2024.
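To make "keep a small number of valuable traces while maintaining diversity" concrete, here is a deliberately simple, rule-based sketch. It is not any of the cited systems, which use learned models; the trace fields and the `per_group` budget are hypothetical. It keeps at most a fixed number of traces for each combination of API and status code, so rare combinations survive while common ones are thinned out.

```python
from collections import defaultdict

def diversity_sample(traces, per_group):
    """Keep at most `per_group` traces for each (api, status) combination."""
    kept, seen = [], defaultdict(int)
    for trace in traces:
        key = (trace["api"], trace["status"])
        if seen[key] < per_group:
            seen[key] += 1
            kept.append(trace)
    return kept

# 100 ordinary successes and a single failure (made-up data).
traces = [{"id": i, "api": "GET /items", "status": 200} for i in range(100)]
traces.append({"id": 100, "api": "GET /items", "status": 500})

kept = diversity_sample(traces, per_group=2)
print([t["id"] for t in kept])
```

Uniform head sampling at the same budget would most likely drop the single failing trace; grouping by attributes keeps it.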
  28. Research trends at the measurement layer: traces

    Mint (ASPLOS, 2025) (Alibaba): compression that retains approximate information for all traces by decomposing them into common and variable parts and deduplicating. Limit of the "1 or 0" approach: with 5% of traces retained, the miss rate of analysis queries was 27.17%. In Mint, all traces (100%) are split into commonality (patterns), all of which are stored, and variability (parameters), of which only the important ones are stored. Results: • storage usage reduced to 2.7% of the original; • network usage reduced to 4.2% of the original.

    Huang, Haiyu, et al. "Mint: Cost-Efficient Tracing with All Requests Collection via Commonality and Variability Analysis." arXiv preprint arXiv:2411.04605 (2024).
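The split into a shared pattern and per-request parameters can be illustrated with a small sketch. It is a simplification and not Mint's algorithm: spans are plain strings, digit runs are assumed to be the only variable parts, every parameter is kept, and all names are hypothetical.

```python
import re

NUMBER = re.compile(r"\d+")

def split_span(text):
    """Split a span description into a common pattern and its variable parameters."""
    params = NUMBER.findall(text)
    pattern = NUMBER.sub("<*>", text)
    return pattern, params

def compress(spans):
    """Store each pattern once and keep only (pattern id, parameters) per span."""
    patterns = {}  # pattern -> pattern id
    records = []   # one (pattern id, params) entry per span
    for text in spans:
        pattern, params = split_span(text)
        pid = patterns.setdefault(pattern, len(patterns))
        records.append((pid, params))
    return patterns, records

spans = [
    "GET /users/42 took 13 ms",
    "GET /users/7 took 250 ms",
    "GET /health took 1 ms",
]
patterns, records = compress(spans)
print(patterns, records)
```

Every request remains represented through its pattern id, which is what distinguishes this from "1 or 0" sampling; a further step, omitted here, would keep parameters only for the requests judged important.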
  29. Research trends at the measurement layer: traces

    Hindsight (NSDI, 2023): retroactive sampling that goes back in time and fetches traces only after a fault is detected. Application: generates traces and saves them in local memory. Hindsight Agent: manages metadata and maintains trace metrics. Coordinator: follows breadcrumbs (pointers for reconstructing the path afterwards) and coordinates trace consistency across machines. The diagram also shows a trigger and the storage layer. https://gitlab.mpi-sws.org/cld/tracing/hindsight

    Zhang, Lei, et al. "The benefit of hindsight: Tracing Edge-Cases in distributed systems." NSDI, 2023.
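Retroactive sampling depends on keeping recent trace data in a bounded local buffer so that it can still be fetched after a trigger fires. The sketch below shows only that core idea on a single machine; it is not Hindsight's implementation (it has no agents, breadcrumbs, or coordinator), and the class and method names are hypothetical.

```python
from collections import deque

class RetroactiveBuffer:
    """Bounded local buffer of recent traces, persisted only on a trigger."""

    def __init__(self, capacity):
        self.recent = deque(maxlen=capacity)  # oldest entries are evicted first
        self.collected = []

    def record(self, trace_id, data):
        # Always record cheaply in local memory.
        self.recent.append((trace_id, data))

    def trigger(self, trace_ids):
        """After a fault is detected, persist buffered traces with the given ids."""
        wanted = set(trace_ids)
        hits = [item for item in self.recent if item[0] in wanted]
        self.collected.extend(hits)
        return hits

buf = RetroactiveBuffer(capacity=3)
for i in range(1, 5):
    buf.record(f"t{i}", {"latency_ms": i * 10})
hits = buf.trigger(["t1", "t4"])  # t1 was already evicted by the time of the trigger
```

The buffer capacity bounds how far back in time a trigger can reach, which is the main tuning knob in this kind of design.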
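  30. Research trends at the measurement layer: logs

    SALO (CLOUD, 2024) (IBM Research): reactive collection when a failure occurs. It computes the blast radius of the fault and collects data only from nodes within that range. Existing approaches collect a large volume of logs; SALO collects a reduced set of logs inside the blast radius. Results: • up to 95% reduction in log volume; • up to 20% performance improvement in downstream AIOps tasks.

    Pathak, Divya, et al. "Self Adjusting Log Observability for Cloud Native Applications." CLOUD, 2024.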
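One simple way to picture "collect only from nodes within the blast radius" is a bounded breadth-first search over a service dependency graph. The slide does not say how SALO computes the radius, so the hop-count definition, the undirected graph, and all names below are assumptions for illustration.

```python
from collections import deque

def blast_radius(graph, faulty, max_hops):
    """Return nodes within `max_hops` of the faulty node.

    `graph` maps each node to its neighbors (undirected adjacency lists).
    """
    distance = {faulty: 0}
    queue = deque([faulty])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the radius
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)

# Hypothetical topology.
graph = {
    "lb": ["web"],
    "web": ["lb", "db", "queue"],
    "db": ["web"],
    "queue": ["web", "worker"],
    "worker": ["queue"],
}
near = blast_radius(graph, "db", max_hops=1)
wider = blast_radius(graph, "db", max_hops=2)
```

Verbose logging would then be enabled only on the returned nodes, leaving the rest of the fleet at its normal log level.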
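  31. Research trends at the measurement layer: logs

    LogReducer (ICSE, 2023) (Tencent): automatically discovers hotspots, that is, the log templates that dominate storage consumption. Problem (WeChat: 19.7 PB and 100 trillion lines per day): a single log template accounted for 95.7% of all storage. Approach: hooks the write() system call with eBPF and drops log output that is judged to be a hotspot. Results: reduction from 19.7 PB to 12.0 PB per day (about 39% less); fix time shortened from 9 days to 10 minutes.

    Yu, Guangba, et al. "LogReducer: Identify and reduce log hotspots in kernel on the fly." ICSE, 2023.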
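Hotspot discovery can be pictured as measuring each log template's share of the total bytes. The sketch below is an offline, user-space illustration only: LogReducer itself works in the kernel via eBPF, and the template extraction (digit runs replaced by a placeholder), the function names, and the threshold are all assumptions.

```python
import re
from collections import Counter

def template_of(line):
    """Crude template extraction: replace digit runs with a placeholder."""
    return re.sub(r"\d+", "<*>", line)

def hotspots(lines, share_threshold):
    """Return {template: share} for templates above `share_threshold` of total bytes."""
    bytes_per_template = Counter()
    for line in lines:
        bytes_per_template[template_of(line)] += len(line.encode("utf-8"))
    total = sum(bytes_per_template.values())
    return {
        template: size / total
        for template, size in bytes_per_template.items()
        if size / total > share_threshold
    }

# Made-up log stream dominated by one noisy statement.
lines = ["retry 1 for job 7"] * 95 + ["user 3 logged in"] * 5
result = hotspots(lines, share_threshold=0.9)
print(result)
```

Once a hotspot template is known, the matching log statement can be dropped or rate-limited at the source, which is where the storage saving comes from.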