Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
OSC-Hokkaido-2018-hayabusa
Search
Hiroshi
July 07, 2018
Research
0
700
OSC-Hokkaido-2018-hayabusa
This is the presentation material for OSC Hokkaido 2018
Hiroshi
July 07, 2018
Tweet
Share
More Decks by Hiroshi
See All by Hiroshi
pepacon night : log research working group report
hirolovesbeer
0
1.4k
イベントネットワークにおけるsyslog分析でのElasticsearchの利用
hirolovesbeer
1
1.2k
Other Decks in Research
See All in Research
LLM-Assisted Semantic Guidance for Sparsely Annotated Remote Sensing Object Detection
satai
3
200
論文紹介: ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
hisaokatsumi
0
150
さまざまなAgent FrameworkとAIエージェントの評価
ymd65536
1
370
[Devfest Incheon 2025] 모두를 위한 친절한 언어모델(LLM) 학습 가이드
beomi
2
1.3k
大規模言語モデルにおけるData-Centric AIと合成データの活用 / Data-Centric AI and Synthetic Data in Large Language Models
tsurubee
1
460
論文紹介:Safety Alignment Should be Made More Than Just a Few Tokens Deep
kazutoshishinoda
0
150
Unsupervised Domain Adaptation Architecture Search with Self-Training for Land Cover Mapping
satai
3
430
Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification
satai
3
340
AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data
satai
3
580
空間音響処理における物理法則に基づく機械学習
skoyamalab
0
140
Thirty Years of Progress in Speech Synthesis: A Personal Perspective on the Past, Present, and Future
ktokuda
0
140
An Open and Reproducible Deep Research Agent for Long-Form Question Answering
ikuyamada
0
140
Featured
See All Featured
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
25
1.7k
Designing Powerful Visuals for Engaging Learning
tmiket
0
190
Highjacked: Video Game Concept Design
rkendrick25
PRO
0
250
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
10
750
Abbi's Birthday
coloredviolet
0
3.7k
The AI Search Optimization Roadmap by Aleyda Solis
aleyda
1
5k
Claude Code のすすめ
schroneko
65
200k
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
132
19k
Digital Ethics as a Driver of Design Innovation
axbom
PRO
0
130
So, you think you're a good person
axbom
PRO
0
1.8k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
17k
Git: the NoSQL Database
bkeepers
PRO
432
66k
Transcript
Hayabusa ߴʹશจݕࡧՄೳͳ OSSϩάݕࡧΤϯδϯͷ͝հ Ѩ෦ തɿגࣜձࣾϨϐμϜ ݚڀһ OSC 2018 Hokkaido 2018/07/08
ࣗݾհ • ໊લɿѨ෦ ത • ॴଐɿגࣜձࣾϨϐμϜʢݚڀһʣɺίίϯגࣜձࣾʢࣾิࠤ/ٕज़ݚڀ ॴ ݚڀһʣɺใ௨৴ݚڀػߏʢڠྗݚڀһʣɺઌՊֶٕज़େֶ Ӄେֶʢത࢜ޙظ՝ఔʣ •
ͦͷଞɿInterop Tokyo ShowNet NOCϝϯόʔ
࣍ • എܠͱత • Hayabusaʹ͍ͭͯ • ࢄHayabusaͷఏҊʢઃܭͱ࣮ʣ • ධՁ •
ߟ • ·ͱΊͱࠓޙͷ՝ !3
എܠͱత !4
Interop Tokyo ShowNet 2018 • 900Λ͑ΔཧɾԾػث܈ • ΄΅શͯͷػث͕syslogΛૹ৴ • ߏஙظؒʹड৴͢Δsyslogྔ
• 2ສ݅/ඵʢ20k/secʣ • 1ԯ̓ઍສ݅/ !5
ShowNetʹ͓͚Δϩάͷӡ༻ • େྔͷϩάΛੵ͢Δ • େྔͷϩά͔Βݕࡧ͢Δ • ΠϯγσϯτରԠͷͨΊʹϩάΛݕࡧ͢Δ • τϥϒϧγϡʔτͷͨΊʹϩάΛݕࡧ͢Δ •
ϩά͔Β౷ܭใΛऔಘ͢Δ • ߜΓࠐΜͩݕࡧใΛ౷ܭใͱͯ͠දࣔ͢Δ !6
طଘͷղܾࡦ • HadoopΤίγεςϜʢSpark, Impala, Hive, …ʣ • OSSʢElasticsearch + Kibana,
fluentd, …ʣ • ༻ϓϩμΫτʢSplunk, VMware Loginsight, …ʣ • ΫϥυαʔϏεʢGoogle BigQuery, Treasure Data, …ʣ !7
େ͖ͳ • ϩάͷߏԽ͕Ͱ͖ͳ͍ • ػࡐʹ౷Ұੑ͕ͳ͍ɾ࠷৽ͷϑΝʔϜ͗ͯ͢ใ͕ͳ͍ • ετϦʔϛϯάॲཧ͕͍͠ྲྀྔ • ϩάͷྲྀྔ͕ଟ͗ͯ͢ॲཧ͕͍͔ͭͳ͍ •
όονॲཧ͕͍͔ͭͳ͍ • όονॲཧ͕ࢦఆ࣌ؒʹऴΘΒͳ͍ • ࢄॲཧγεςϜ͕ෳࡶ͗͢Δ • ཧίετ͕ലେ !8
త • ܰྔʹߏஙɾӡ༻͕ߦ͑ΔγεςϜͷ࣮ݱ • γϯϓϧͰεέʔϧΞοϓՄೳͳγεςϜͷ࣮ݱ • ݕࡧੑೳ͕CPUʢίΞʣੑೳʹൺྫͯ͠ૣ͘ͳΔ • ෳࡶͳཧػߏΛඋ͑ͳ͍ !9
)BZBCVTBʹ͍ͭͯ !10
Hayabusaͱʁ • େྔͷϩάΛߴʹݕࡧ͢Δʢ17ԯϨίʔυͷશจݕࡧ͕5ඵʣ • ελϯυΞϩϯαʔόͰಈ࡞͢Δ • ϚϧνίΞΛ༗ޮʹ͍ɺߴͳฒྻݕࡧॲཧΛ࣮ݱ͢Δ
StoreEngine • σΟεΫʹॻ͖ࠐ·ΕͨϩάΛߴʹಡΈࠐΉ • ಡΈࠐΜͩϩάΛSQLite3ͷϑΝΠϧͱมʢ1ߦ1Ϩίʔυʣ • SQLite3ͷશจݕࡧʹಛԽͨ͠FTS(Full Text Search)ܗࣜͰinsert •
࣌ؒσΟϨΫτϦߏʹରԠ : /targetdir/yyyy/mm/dd/hh/min.db StoreEngine
SearchEngine • GNU ParallelΛ༻͍ͯSQLite3ϑΝΠϧฒྻݕࡧΛ͔͚Δ $ parallel sqlite3 ::: target files
::: “select count(*) from xxx where logs match ‘keyword’;” • ݕࡧ݁ՌΛUNIXύΠϓϥΠϯΛ༻͍ͯɺawkcountίϚϯυͰूܭ $ parallel sqlite3 ::: target files ::: “select count(*) from xxx where logs match ‘keyword’;” | awk ‘{m+=$1} END{print m;}’ SeachEngine !13
શจݕࡧੑೳ • Apache SparkͱͷൺֱʢελϯυΞϩϯڥʣ • Apache SparkͱͷൺֱʢSpark x 3 +
HDFS vs Hayabusa x 1ʣ Hayabusa͕ ̐ഒߴ Hayabusa͕ 27ഒߴ
OSSͱͯ͠ެ։ • GitHubʹͯެ։ • https://github.com/hirolovesbeer/hayabusa !15
Hayabusaͷ • ελϯυΞϩϯڥ • ੑೳΛ্͛ΔʹεέʔϧΞοϓ͔͠ͳ͍ • εέʔϧΞοϓίετ • ࢄॲཧγεςϜͱͷࠩ •
ن͕େ͖͘ͳΕࢄॲཧγεςϜͷॲཧ͘ͳΔ • Hayabusa͍͔ͭੑೳ͕ൈ͔ΕΔ !16
ࢄ)BZBCVTBͷఏҊʢઃܭͱ࣮ʣ !17
త • HayabusaΛࢄॲཧγεςϜͱਐԽͤ͞ॲཧΛεέʔϧΞτͤ͞Δ • ελϯυΞϩϯͷੑೳੜ͔͠ଓ͚Δ • ࢄॲཧγεςϜͰ͋Δ͕γϯϓϧͳઃܭΛࢤ͢ • σʔλΛෳ͢Δ͜ͱͰোੑΛߴΊΔ !18
GNU ParallelͷϦϞʔτ࣮ߦ • ཧ : GNU ParallelͷϦϞʔτ࣮ߦΛར༻͢Εࢄ࣮ߦՄೳ $ time parallel
—controlmaster -S host1,host2,host3 sqlite3 ::: … • ݱ࣮ : sshͷΦʔόϔου͕͔͔Γॲཧ͕Ԇ ϗετ͕૿͑Δͱॲཧ͕࣌ؒ૿͑Δ
ఏҊख๏ • ࢄݕࡧ • ࣮ߦ͢ΔݕࡧॲཧΛRPCͱͯ͠HayabusaૹΓࠐΉ • ݁ՌΛRPCͷϨεϙϯεͱͯ͠ड͚औΓूܭ͢Δ • ฒྻੵ •
શͯͷϗετಉҰͷϦΫΤετ͕ಧ͍ͯಉ݁͡ՌΛฦ͢Α͏ʹ͢Δ • ࣄલʹશॲཧϗετͱϩάσʔλΛෳ͢Δ !20
ࢄHayabusaΞʔΩςΫνϟશ༰
ฒྻੵ • syslogΛෳϗετͱෳ͢Δ • શϗετͰಉҰͷsyslogΛड৴ • UDP SamplicatorʢOSSʣͷར༻ • syslogύέοτͷෳͱసૹ
• ෳॲཧͷίΞεέʔϧԽ • UDP SmaplicatorͷϚϧνϓϩηεԽ !22 syslogͷෳ
UDP SamplicatorͷϚϧνϓϩηεԽ • ϘτϧωοΫʹͳΓ͕ͪͳϓϩηεΛίΞεέʔϧ • SO_REUSEPORTΛར༻ͨ͠ϚϧνϓϩηεԽ • ͜ΕʹΑΓUDP 514ϙʔτ͕ෳϓϩηεͰγΣΞ͞ΕΔ socketΦϓγϣϯͷՃ
ۉʹsyslogసૹͷෛՙ͕ όϥϯε͞ΕΔ !23
ࢄݕࡧ • RPC • Producer / ConsumerϞσϧͷ࠾༻ • ࣮ •
ZeroMQͷPush / Pullύλʔϯ • ϦΫΤετͷϩʔυόϥϯε • Push / PullύλʔϯۉҰʹϦΫΤετΛϗετ͢Δ ZeroMQͷPush / Pullύλʔϯ !24
ࢄݕࡧ • ZeroMQΫϥΠΞϯτ • VentilatorͱSinkͷׂ • ZeroMQϫʔΧ • ड͚औͬͨॲཧϦΫΤετ Λ࣮ߦͯ݁͠ՌΛฦ͢
!25
ॲཧϦΫΤετ • ϦΫΤετ $ parallel sqlite3 ::: target files :::
“select count(*) from xxx where logs match ‘keyword’;” | awk ‘{m+=$1} END{print m;}’ ੨ࣈ : GNU ParallelͷίϚϯυΛ֤ॲཧϗετૹΓࠐΉ ࣈ : ΫϥΠΞϯτϗετͰ·ͱΊ͋͛Δ !26
΄΅ຊͳٙࣅίʔυ • ΫϥΠΞϯτ • Worker ࣮ߦίϚϯυ ίϚϯυΛ ϫʔΧૹ৴ ίϚϯυΛ࣮ߦ ݁ՌΛΫϥΠΞϯτૹ৴
݁ՌΛड͚λʔϛφϧදࣔ !27
ධՁ !28
࣮ݧڥ • Amazon Web Service (AWS) • EC2Πϯελϯε : c4.4xlarge
• vCPU : Xeon E5-2666 v3 @ 2.90GHz x 16 cores • ϝϞϦ : 30GB • σΟεΫʢEBSʣ : SSD 8GB (OS) + SSD 50GB (Data) • OS : Ubuntu 16.04.3 LTS (Xenial Xerus) !29
ࢄݕࡧ • ݕࡧͷ݅ • 1ͷσʔλʹରͯ͠100ճϦΫΤετΛ࣮ߦ͢Δ • 1ͷσʔλϑΝΠϧ60ʢ60ϑΝΠϧʣ x 24࣌ؒ =
1,440ϑΝΠϧ • 1ϑΝΠϧ͋ͨΓͷϨίʔυ10ສ݅ʢ1,440 x 10ສʹ1ԯ4400ສϨίʔυʣ • 100ճͷϦΫΤετͰ144ԯϨίʔυ͕ରͱͳΔ • ࣮ߦ͢ΔSQLจҎԼͰશจݕࡧͱΧϯτ • select count(*) from syslog where logs match ‘keyword’; !30
ࢄݕࡧʢϗετεέʔϧΞτʣ • ϗετΛ1͔Β10૿Ճͤ͞Δ • 1Ͱ249ඵ͔Β10Ͱ39ඵ·Ͱॖʢ10ճࢼߦฏۉʣ
ࢄݕࡧʢϗετεέʔϧΞτʣ • ϗετΛ1͔Β10૿Ճͤ͞Δ • 1Ͱ249ඵ͔Β10Ͱ39ඵ·Ͱॖʢ10ճࢼߦฏۉʣ Ϋϥυڥෆ҆ఆ ʢϕετΤϑΥʔτʣ
ࢄݕࡧʢWorkerεέʔϧΞτʣ • ϗετ10ɺ͔ͭ1͋ͨΓͷϫʔΧΛ1͔Β16·Ͱ૿Ճͤ͞Δ • 1ϗετ1 worker 249ඵ͔Β10ϗετ10 workerͰ6.8ඵ·Ͱॖ ͜ͷลΓ͕࠷ *0ڝ߹͕ى͖Δ͔Β͔
͔ΘΒͣ
݁Ռͷ·ͱΊ • ॲཧੑೳ • ϗετ10ͷ߹ : ϗετ1ͷ10ഒૣ͘ͳΔʢ249ඵ -> 39ඵʣ •
ϗετ10ͰϫʔΧΛ૿Ճ : ૯ϫʔΧ10ʙ160Ͱ 249ඵ -> 6.8ඵ • ϨίʔυΛϑϧεΩϟϯˍશจݕࡧͨ݁͠Ռ • 144ԯϨίʔυ͔ΒඞཁͳσʔλΛൈ͖ग़͢ͷʹ6.8ඵ·ͰߴԽ • 10ͷϗετͰ36ഒͷߴԽΛ࣮ݱ !34
Amazon Elastic MapReduceͱͷൺֱ • Amazon EMR : ΠϯελϯεHayabusaͱಉ͡c4.4xlarge • ߏ1Ϛελʔϊʔυ
+ 10 ίΞϊʔυ • σʔλͷΞΫηε • EMR͔ΒAmazon S3μΠϨΫτʹ ΞΫηε • શจݕࡧͷํ๏ • ϚελʔϊʔυͷPySpark͔Βߦ͏ JNQPSUUJNF GSPNQZTQBSLTRMJNQPSU42-$POUFYU TRM$POUFYU42-$POUFYU TD MJOFTTDUFYU'JMF TBCFXPSLTTECFODINBSLMPH pMFTLL MPH MJOFTDBDIF GPSJJOSBOHF TUBSUUJNFUJNF <MJOFTpMUFS MBNCEBTOPDJO T DPVOU GPSJJOSBOHF >FMBQTFE@UJNFUJNFUJNF TUBSUQSJOUFMBQTFE@UJNF 1Z4QBSLͰ࣮ߦ͢Δίʔυ
Amazon Elastic Mapreduceͱͷൺֱ • ࣮ߦ݁Ռ • 10ͷߏͰ17ഒHayabusaͷํ͕ߴʹಈ࡞
ߟ !37
ݕࡧͷεέʔϧΞτ • 144ԯ͔ΒඞཁͳσʔλΛൈ͖ग़͢ͷʹ6.8ඵ·ͰߴԽ • 2લͷBigQueryͷϑϧεΩϟϯ͕120ԯϨίʔυͰ5ඵ • 10ͷϗετͰ36ഒͷߴԽΛ࣮ݱ • BigQueryԿඦɺԿઍͷϗετ͕ಉ࣌ʹಈ͍͍ͯΔ͔ෆ໌ •
Amazon Elastic MapReduceͱͷൺֱ • 10ͷߏͰ17ഒHayabusaͷํ͕ߴʹશจݕࡧՄೳ • γεςϜͷίετΛߟ͑ͨ߹ • ϦʔζφϒϧͰߴੑೳͳࢄݕࡧॲཧ͕࣮ݱͰ͖ͨ !38
ੵͷฒྻԽ • syslogͷෳͷ • େྔͷσʔλʢύέοτʣͷෳͰଳҬΛѹഭ͢Δ • ຊདྷͰ͋ΕHDFSͷΑ͏ʹࢄϑΝΠϧγεςϜΛ͏͖ • ϝλσʔλػߏΛܦ༝ͯ͠σʔλʹΞΫηε͢ΔͨΊຊ࣭తʹ͘ͳΔ •
ࢄϑΝΠϧγεςϜͱ͍ͯ͠ʢҰͭͷݚڀʣ • γϯϓϧ͞ͷٻͷ݁Ռ • อ࣋σʔλ͕ػثͷނোͰফࣦͨ͠ͱͯ͠ෳ͕ΔɾނোػΛ֎͚ͩ͢ • ࢄϑΝΠϧγεςϜͷΑ͏ʹ࠶ஔॲཧ͕ෆཁ !39
γϯϓϧͳઃܭʹΑΔӡ༻ͷ؆ུԽ • ࢄݕࡧ • Procedure / ConsumerϞσϧͰ࣮ݱ • ϓϩηε࣮ߦεέδϡʔϥGNU Parallelʹґଘ
• ෳࡶͳࢄγεςϜΛΘͳ͍ར • τϥϒϧѲͷߴԽ • γεςϜӡ༻ෛՙͷܰݮ !40
ߴԽͷ؊ • ׂΓΓઃܭ • ϦτϥΠॲཧ/Τϥʔॲཧະ࣮ • εέδϡʔϥ • ZeroMQͱGnu Parallelʹ͓ͤ
• ετϨʔδ • ࢄอଘͤͣ͞ʹෳΛอ࣋
ϋʔυΣΞʹґଘ͢Δ • CPU Core • ૣ͚Εૣ͍΄Ͳྑ͍ • CoreͷΑΓΫϩοΫ͕ͦͦ͜͜ૣ͍ํ͕͕ग़Δ͜ͱ͋Δ • σΟεΫ
• SSDNVMeʢͦΓΌૣ͍ʹܾ·͍ͬͯΔʣ • I/OੑೳΛҾ͖ग़͢
ଞͷγεςϜͱͷൺֱ • શจݕࡧͰApache Sparkͱൺֱͨ͠ • Elasticsearchͱͷൺֱʁ • Ͳ͏ͬͯൺΔʁ • ΤϯδϯͷʁʢElasticsearchͱͯૣ͍ʣ
• ݺͼग़͠APIͷՃຯ͢ΔʁʢREST APIݺͼग़͠ͱ͍ͯʣ • Write & Read • ॻ͖ͳ͕ΒಡΈࠐΜͩ߹ʁ
·ͱΊͱࠓޙͷ՝ !44
·ͱΊ • HayabusaͷࢄγεςϜԽͷઃܭͱ࣮ • 144ԯϨίʔυͷsyslogϑϧεΩϟϯˍશจݕࡧΛ6.8ඵͰ࣮ݱ • ϚϧνϕϯμػثΛରͱͨ͠ɺେྔͷෆἧ͍ͳϩάΛߴʹݕࡧՄೳ • τϥϒϧγϡʔτɾΠϯγσϯτϨεϙϯεΛஶ͘͠ॖ͢ΔՄೳੑ •
γϯϓϧͳࢄॲཧߏʹΑΔཧͷ༰қੑ !45
ࠓޙͷ՝ • ଞͷιϑτΣΞͱͷൺֱʢBigQuery, ElasticSearch, Splunkʣ • HayabusaͱଞͷΞϓϦέʔγϣϯͱͷ༥߹ʢΞϊϚϦݕͳͲʣ • Hayabusaͱ౷ܭॲཧϥΠϒϥϦػցֶशϥΠϒϥϦͱͷ݁߹ •
ࢄϑΝΠϧγεςϜɾࢄετϨʔδͷ࣮ !46
ँࣙ • ຊݚڀͷҰ෦ɺࠃཱݚڀ։ൃ๏ਓՊֶٕज़ৼڵػߏʢJSTʣͷݚڀՌ ൃలࣄۀʮઓུతݚڀਪਐࣄۀʢCRESTʣJPMJCR1783ʯͷࢧԉʹ ΑͬͯߦΘΕͨ
None