OSC-Hokkaido-2018-hayabusa
Hiroshi
July 07, 2018
This is the presentation material for OSC Hokkaido 2018
Transcript
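Hayabusa: Introducing an OSS Log Search Engine Capable of Fast Full-Text Search
Hiroshi Abe, Researcher, Lepidum Co. Ltd.
OSC 2018 Hokkaido, 2018/07/08

Self-introduction
• Name: Hiroshi Abe
• Affiliations: Lepidum Co. Ltd. (researcher); Cocon Inc. (assistant to the president / researcher, Technology Research Lab); National Institute of Information and Communications Technology (collaborating researcher); […] Advanced Institute of Science and Technology (doctoral program)
• Other: Interop Tokyo ShowNet NOC member

Agenda
• Background and goals
• About Hayabusa
• Proposal for a distributed Hayabusa (design and implementation)
• Evaluation
• Discussion
• Summary and future work

Background and goals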
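
Interop Tokyo ShowNet 2018
• More than 900 physical and virtual devices
• Almost every device sends syslog
• syslog volume received during the build period
  • 20,000 messages per second (20k/sec)
  • 170 million messages per day

Log operations at ShowNet
• Accumulate a large volume of logs
• Search across a large volume of logs
  • Search logs for incident response
  • Search logs for troubleshooting
• Obtain statistics from logs
  • Display narrowed-down search results as statistics

Existing solutions
• Hadoop ecosystem (Spark, Impala, Hive, …)
• OSS (Elasticsearch + Kibana, fluentd, …)
• Commercial products (Splunk, VMware Loginsight, …)
• Cloud services (Google BigQuery, Treasure Data, …)

Major problems
• Logs cannot be structured
  • The equipment is not uniform, and the firmware is too new for documentation to exist
• The flow rate makes stream processing difficult
  • The log volume is too high for processing to keep up
• Batch processing cannot keep up
  • Batch jobs do not finish within the allotted time
• Distributed processing systems are too complex
  • Management costs are enormous

Goals
• A system that is lightweight to build and operate
• A simple system that can scale up
  • Search gets faster in proportion to CPU (core) performance
• No complex management mechanism

About Hayabusa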
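
What is Hayabusa?
• Searches large volumes of logs at high speed (full-text search over 1.7 billion records in 5 seconds)
• Runs on a standalone server
• Uses multiple cores effectively to achieve fast parallel search

StoreEngine
• Reads logs written to disk at high speed
• Converts the logs it reads into SQLite3 files (one line per record)
• Inserts them in the FTS (Full Text Search) format, which is specialized for SQLite3 full-text search
• Supports a time-based directory layout: /targetdir/yyyy/mm/dd/hh/min.db
[Figure: StoreEngine]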
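
The StoreEngine steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Hayabusa's actual code: it assumes an FTS4 virtual table named `syslog` with a single `logs` column (the names that appear in the SQL on the slides), and the helper names `db_path_for` and `store_lines` are hypothetical.

```python
import os
import sqlite3
import tempfile
from datetime import datetime


def db_path_for(target_dir: str, ts: datetime) -> str:
    """Map a timestamp to target_dir/yyyy/mm/dd/hh/min.db (hypothetical helper)."""
    return os.path.join(
        target_dir, f"{ts:%Y}", f"{ts:%m}", f"{ts:%d}", f"{ts:%H}", f"{ts:%M}.db"
    )


def store_lines(target_dir: str, ts: datetime, lines: list[str]) -> str:
    """Insert one record per log line into the FTS table for that minute."""
    path = db_path_for(target_dir, ts)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    conn = sqlite3.connect(path)
    try:
        with conn:  # commit on success, roll back on error
            # Assumption: FTS4 with one text column; the slides only say "FTS".
            conn.execute(
                "CREATE VIRTUAL TABLE IF NOT EXISTS syslog USING fts4(logs)"
            )
            conn.executemany(
                "INSERT INTO syslog(logs) VALUES (?)", ((line,) for line in lines)
            )
    finally:
        conn.close()
    return path


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    demo_path = store_lines(
        demo_dir,
        datetime(2018, 7, 8, 9, 5),
        ["sw1 link down", "sw1 link up", "rt2 login ok"],
    )
    print(demo_path)
```

Keeping one database file per minute means a search over a time window only has to open the files whose paths fall inside that window.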

SearchEngine
• Runs a parallel search over the SQLite3 files with GNU Parallel
  $ parallel sqlite3 ::: target files ::: "select count(*) from xxx where logs match 'keyword';"
• Aggregates the search results through a UNIX pipeline with commands such as awk or count
  $ parallel sqlite3 ::: target files ::: "select count(*) from xxx where logs match 'keyword';" | awk '{m+=$1} END{print m;}'
[Figure: SearchEngine]
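
The same fan-out and sum can be sketched in Python. This is an illustration only: a thread pool stands in for GNU Parallel, the `syslog` table and `logs` column follow the SQL on the slides, and `make_demo_db`, `count_in_file`, and `parallel_count` are hypothetical names.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

QUERY = "SELECT count(*) FROM syslog WHERE logs MATCH ?"


def make_demo_db(path: str, lines: list[str]) -> str:
    """Create a small FTS4 database so the example is self-contained."""
    conn = sqlite3.connect(path)
    try:
        with conn:
            conn.execute("CREATE VIRTUAL TABLE syslog USING fts4(logs)")
            conn.executemany(
                "INSERT INTO syslog(logs) VALUES (?)", ((line,) for line in lines)
            )
    finally:
        conn.close()
    return path


def count_in_file(path: str, keyword: str) -> int:
    """Run the full-text count against one database file."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute(QUERY, (keyword,)).fetchone()[0]
    finally:
        conn.close()


def parallel_count(paths: list[str], keyword: str, workers: int = 4) -> int:
    """Query every file concurrently, then add up the per-file counts
    (the role played by awk '{m+=$1} END{print m;}' in the pipeline)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda p: count_in_file(p, keyword), paths))


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    files = [
        make_demo_db(os.path.join(demo_dir, f"{i:02d}.db"), ["host noc up", "host other"])
        for i in range(3)
    ]
    print(parallel_count(files, "noc"))
```

Each file is queried through its own connection, so the per-file searches are independent and only the final sum needs to be serialized.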
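
Full-text search performance
• Comparison with Apache Spark (standalone environment)
• Comparison with Apache Spark (Spark x 3 + HDFS vs Hayabusa x 1)
[Figure annotations: Hayabusa is 4 times faster; Hayabusa is 27 times faster]

Released as OSS
• Published on GitHub
  • https://github.com/hirolovesbeer/hayabusa

Limits of Hayabusa
• Standalone environment
  • Scaling up is the only way to raise performance
  • Scaling up is costly
• Gap with distributed processing systems
  • As scale grows, distributed processing systems get faster
  • Hayabusa will eventually be outperformed

Proposal for a distributed Hayabusa (design and implementation)

Goals
• Evolve Hayabusa into a distributed processing system so that processing scales out
• Keep making use of its standalone performance
• Aim for a simple design even though it is a distributed system
• Increase fault tolerance by replicating data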
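
Remote execution with GNU Parallel
• Ideal: GNU Parallel's remote execution makes distributed execution possible
  $ time parallel --controlmaster -S host1,host2,host3 sqlite3 ::: …
• Reality: ssh overhead delays processing
  • Processing time grows as the number of hosts grows

Proposed method
• Distributed search
  • Send the search to run to Hayabusa as an RPC
  • Receive the results as the RPC response and aggregate them
• Parallel accumulation
  • Make every host return the same result for the same request
  • Replicate the log data to all processing hosts in advance

[Figure: Overall architecture of distributed Hayabusa]

Parallel accumulation
• Replicate syslog to multiple hosts
  • Every host receives identical syslog
• Use UDP Samplicator (OSS)
  • Replicates and forwards syslog packets
• Scale the replication across cores
  • Run UDP Samplicator as multiple processes
[Figure: syslog replication]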
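
Running UDP Samplicator as multiple processes
• Scale the process that tends to become the bottleneck across cores
• Multi-process operation using SO_REUSEPORT
• This lets multiple processes share UDP port 514
[Figure annotations: socket option added; the syslog forwarding load is balanced evenly]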
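
How SO_REUSEPORT lets several receivers share one UDP port can be shown with a short Python sketch. It is not the UDP Samplicator patch itself. It assumes a platform that exposes `socket.SO_REUSEPORT` (Linux, for example), binds two sockets in one process instead of spawning several, and uses an ephemeral loopback port rather than 514, which normally needs elevated privileges. The name `open_shared_udp` is hypothetical.

```python
import socket


def open_shared_udp(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Bind a UDP socket with SO_REUSEPORT so other sockets may share the port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # The option must be set before bind(), on every socket that shares the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    return sock


if __name__ == "__main__":
    first = open_shared_udp(0)          # port 0: let the kernel pick a free port
    port = first.getsockname()[1]
    second = open_shared_udp(port)      # fails with "address in use" without the option
    print(port, second.getsockname()[1])
    first.close()
    second.close()
```

In a multi-process deployment each worker process would open its own socket this way, and on Linux the kernel spreads incoming datagrams across the sockets bound to the port.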
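
Distributed search
• RPC
  • Adopts the Producer / Consumer model
• Implementation
  • ZeroMQ Push / Pull pattern
• Load balancing of requests
  • The Push / Pull pattern distributes requests evenly across hosts
[Figure: ZeroMQ Push / Pull pattern]

Distributed search
• ZeroMQ client
  • Plays the roles of Ventilator and Sink
• ZeroMQ worker
  • Executes the processing request it receives and returns the result

Processing request
• Request
  $ parallel sqlite3 ::: target files ::: "select count(*) from xxx where logs match 'keyword';" | awk '{m+=$1} END{print m;}'
• The GNU Parallel command is sent to each processing host
• The results are combined on the client host

Pseudo-code in near-plain language
• Client
  • Command to execute
  • Send the command to the workers
  • Receive the results and display them on the terminal
• Worker
  • Execute the command
  • Send the result to the client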
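
The client and worker roles above can be sketched with the Python standard library. This is only an in-process stand-in: two `queue.Queue` objects replace the ZeroMQ PUSH/PULL sockets, threads replace the worker hosts, and `run` is a hypothetical callable standing in for "execute the command and return its count". One difference worth noting: a ZeroMQ PUSH socket hands messages to connected workers in round-robin order, whereas a shared queue lets whichever worker is free take the next task.

```python
import queue
import threading
from typing import Callable, Sequence

_STOP = None  # sentinel telling a worker to exit


def worker(tasks: queue.Queue, results: queue.Queue, run: Callable[[str], int]) -> None:
    """Worker role: take a command, execute it, send the result back."""
    while True:
        command = tasks.get()
        if command is _STOP:
            return
        results.put(run(command))


def scatter_gather(commands: Sequence[str], run: Callable[[str], int], n_workers: int = 3) -> int:
    """Client role: push every command (ventilator), then collect and sum (sink)."""
    tasks: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, run))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for command in commands:          # ventilator: fan the requests out
        tasks.put(command)
    for _ in threads:                 # one stop marker per worker
        tasks.put(_STOP)
    total = sum(results.get() for _ in range(len(commands)))  # sink: aggregate
    for thread in threads:
        thread.join()
    return total


if __name__ == "__main__":
    # Hypothetical runner: pretend each command returns the length of its text.
    print(scatter_gather(["a", "bb", "ccc"], len))
```

Like the design described in the slides, this sketch has no retry or error handling: a worker that raises would leave the sink waiting.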
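
Evaluation

Experimental environment
• Amazon Web Service (AWS)
• EC2 instance: c4.4xlarge
  • vCPU: Xeon E5-2666 v3 @ 2.90GHz x 16 cores
  • Memory: 30GB
  • Disk (EBS): SSD 8GB (OS) + SSD 50GB (Data)
  • OS: Ubuntu 16.04.3 LTS (Xenial Xerus)

Distributed search
• Search conditions
  • Run 100 requests against one day of data
  • One day of data is 60 files per hour (one file per minute) x 24 hours = 1,440 files
  • Each file holds 100,000 records (1,440 x 100,000 = 144 million records)
  • 100 requests cover 14.4 billion records
• The SQL statement below performs a full-text search and a count
  • select count(*) from syslog where logs match 'keyword';

Distributed search (scaling out hosts)
• Increase the number of hosts from 1 to 10
• Time drops from 249 seconds with 1 host to 39 seconds with 10 hosts (average of 10 trials)
[Figure annotation: the cloud environment is unstable (best effort)]

Distributed search (scaling out workers)
• With 10 hosts, increase the number of workers per host from 1 to 16
• Time drops from 249 seconds with 1 host and 1 worker to 6.8 seconds with 10 hosts and 10 workers
[Figure annotations: fastest around here; perhaps because I/O contention occurs; no change beyond this point]

Summary of results
• Processing performance
  • 10 hosts: 249 s -> 39 s, roughly 6.4 times faster than 1 host
  • 10 hosts with more workers: 249 s -> 6.8 s with 10 to 160 workers in total
• Result of a full scan and full-text search over the records
  • Extracting the needed data from 14.4 billion records was sped up to 6.8 seconds
  • A 36-fold speedup with 10 hosts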
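
The record counts and speedups quoted in the evaluation follow from a few lines of arithmetic, reproduced here in Python from the figures on the slides. The variable names are illustrative, and the throughput line is a derived figure that does not appear in the deck.

```python
# Data set size, from the search conditions.
files_per_day = 60 * 24                      # one file per minute
records_per_file = 100_000
records_per_day = files_per_day * records_per_file
requests = 100
records_scanned = records_per_day * requests

# Measured wall-clock times in seconds, as reported on the slides.
t_one_host = 249.0
t_ten_hosts = 39.0
t_ten_hosts_tuned_workers = 6.8

speedup_hosts = t_one_host / t_ten_hosts
speedup_total = t_one_host / t_ten_hosts_tuned_workers
records_per_second = records_scanned / t_ten_hosts_tuned_workers  # derived, not from the deck

print(files_per_day, records_per_day, records_scanned)
print(round(speedup_hosts, 1), round(speedup_total, 1))
print(f"{records_per_second:.2e}")
```

The host-only speedup falls short of 10 while the combined speedup exceeds it, which is consistent with the extra gain coming from running several workers per 16-core host.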
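
Comparison with Amazon Elastic MapReduce
• Amazon EMR: instances are c4.4xlarge, the same as for Hayabusa
  • Configuration: 1 master node + 10 core nodes
• Data access
  • EMR accesses Amazon S3 directly
• Full-text search method
  • Run from PySpark on the master node

Code run in PySpark (the S3 path and the loop counts are elided as …):
  import time
  from pyspark.sql import SQLContext

  sqlContext = SQLContext(sc)
  lines = sc.textFile("…")
  lines.cache()
  for i in range(…):
      start = time.time()
      [lines.filter(lambda s: 'noc' in s).count() for i in range(…)]
      elapsed_time = time.time() - start
      print(elapsed_time)

Comparison with Amazon Elastic MapReduce
• Results
  • With a 10-node configuration, Hayabusa runs 17 times faster

Discussion

Scaling out search
• Extracting the needed data from 14.4 billion records was sped up to 6.8 seconds
  • Two years ago, a BigQuery full scan took 5 seconds for 12 billion records
  • A 36-fold speedup with 10 hosts
  • It is unknown whether BigQuery runs hundreds or thousands of hosts at once
• Comparison with Amazon Elastic MapReduce
  • With a 10-node configuration, Hayabusa performs full-text search 17 times faster
• Considering system cost
  • A reasonably priced, high-performance distributed search was achieved

Parallelizing accumulation
• Problems with replicating syslog
  • Replicating a large volume of data (packets) puts pressure on bandwidth
  • Ideally a distributed file system such as HDFS should be used
    • Access goes through a metadata mechanism, so it is inherently slower
  • Distributed file systems are a research topic in their own right
• Result of pursuing simplicity
  • If stored data is lost to a device failure, replicas remain; the failed machine is simply removed
  • No relocation step is needed, unlike in a distributed file system

Simpler operations through simple design
• Distributed search
  • Implemented with the Producer / Consumer model
  • Process scheduling relies on GNU Parallel
• Benefits of not using a complex distributed system
  • Faster diagnosis of trouble
  • Lower system operation load

Keys to the speedup
• Design trade-offs
  • Retry and error handling are not implemented
• Scheduler
  • Left to ZeroMQ and GNU Parallel
• Storage
  • Keeps replicas instead of distributing storage

Dependence on hardware
• CPU cores
  • The faster the better
  • A reasonably high clock can give better performance than a higher core count
• Disk
  • SSD or NVMe (of course they are fast)
  • Draw out I/O performance

Comparison with other systems
• Full-text search was compared with Apache Spark
• Comparison with Elasticsearch?
  • How should they be compared?
  • Engine only? (Elasticsearch is very fast)
  • Include API call time? (REST API calls are very slow)
• Write & Read
  • What about reading while writing?

Summary and future work

Summary
• Designed and implemented Hayabusa as a distributed system
• Achieved a full scan and full-text search over 14.4 billion syslog records in 6.8 seconds
• Fast search over large volumes of non-uniform logs from multi-vendor equipment
• Potential to shorten troubleshooting and incident response significantly
• Easy management thanks to a simple distributed processing structure

Future work
• Comparison with other software (BigQuery, ElasticSearch, Splunk)
• Integrating Hayabusa with other applications (anomaly detection and so on)
• Combining Hayabusa with statistics and machine learning libraries
• Implementing a distributed file system or distributed storage

Acknowledgments
• Part of this research was supported by the Japan Science and Technology Agency (JST) under the Strategic Basic Research Programs (CREST), grant JPMJCR1783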