OSC-Hokkaido-2018-hayabusa
Hiroshi
July 07, 2018
This is the presentation material for OSC Hokkaido 2018
Transcript
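Hayabusa: Introducing an OSS log search engine capable of high-speed full-text search
Hiroshi Abe, Researcher, Lepidum Co. Ltd.
OSC 2018 Hokkaido, 2018/07/08

Self-introduction
• Name: Hiroshi Abe
• Affiliations: Lepidum Co. Ltd. (researcher); Cocon Inc. (assistant to the president / researcher, Technology Research Institute); National Institute of Information and Communications Technology (collaborating researcher); Japan Advanced Institute of Science and Technology (doctoral program)
• Other: Interop Tokyo ShowNet NOC member

Agenda
• Background and objectives
• About Hayabusa
• Proposal for distributed Hayabusa (design and implementation)
• Evaluation
• Discussion
• Summary and future work

Background and objectives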
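
Interop Tokyo ShowNet 2018
• More than 900 physical and virtual devices
• Almost every device sends syslog
• Syslog volume received during the build-out period
  • 20,000 messages/sec (20k/sec)
  • 170 million messages/day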
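
Log operations at ShowNet
• Store a large volume of logs
• Search across a large volume of logs
  • Search logs for incident response
  • Search logs for troubleshooting
• Obtain statistics from the logs
  • Display the narrowed-down search results as statistics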

Existing solutions
• Hadoop ecosystem (Spark, Impala, Hive, …)
• OSS (Elasticsearch + Kibana, fluentd, …)
• Commercial products (Splunk, VMware Loginsight, …)
• Cloud services (Google BigQuery, Treasure Data, …)
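
Major problems
• Logs cannot be structured
  • The equipment is not uniform, and the firmware is too new for documentation to exist
• The flow rate makes stream processing difficult
  • The log volume is too high for processing to keep up
• Batch processing cannot keep up
  • Batch jobs do not finish within the allotted time
• Distributed processing systems are too complex
  • The management cost is enormous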
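
Objectives
• A system that is lightweight to build and operate
• A simple system that can scale up
  • Search performance improves in proportion to CPU (core) performance
  • No complex management mechanism

About Hayabusa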
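
What is Hayabusa?
• Searches a large volume of logs at high speed (full-text search over 1.7 billion records in 5 seconds)
• Runs on a standalone server
• Makes effective use of multiple cores to achieve fast parallel search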
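
StoreEngine
• Reads logs written to disk at high speed
• Converts the logs it reads into SQLite3 files (one line per record)
• Inserts them in FTS (Full Text Search) format, which is specialized for full-text search in SQLite3
• Uses a time-based directory layout: /targetdir/yyyy/mm/dd/hh/min.db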
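
The StoreEngine steps above can be sketched in Python as follows. This is a minimal illustration and not Hayabusa's own code: the function names `db_path` and `store_lines` are hypothetical, the table name `syslog` and column `logs` are borrowed from the query shown in the evaluation slides, and FTS5 is assumed because the slides only say "FTS".

```python
import sqlite3
from datetime import datetime
from pathlib import Path


def db_path(target_dir, ts):
    """Map a timestamp to <target_dir>/yyyy/mm/dd/hh/min.db (one file per minute)."""
    return Path(target_dir) / ts.strftime("%Y/%m/%d/%H") / (ts.strftime("%M") + ".db")


def store_lines(target_dir, ts, lines):
    """Insert each log line as one row of a full-text-search table."""
    path = db_path(target_dir, ts)
    path.parent.mkdir(parents=True, exist_ok=True)
    con = sqlite3.connect(path)
    try:
        # Assumption: FTS5 with a single column named `logs`.
        con.execute("CREATE VIRTUAL TABLE IF NOT EXISTS syslog USING fts5(logs)")
        con.executemany("INSERT INTO syslog(logs) VALUES (?)",
                        [(line,) for line in lines])
        con.commit()
    finally:
        con.close()
    return path
```

Keeping one small database per minute means a time-range query only has to open the files whose paths fall inside that range.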

SearchEngine
• Uses GNU Parallel to run searches against the SQLite3 files in parallel
$ parallel sqlite3 ::: target files ::: "select count(*) from xxx where logs match 'keyword';"
• Aggregates the search results through a UNIX pipeline with commands such as awk or count
$ parallel sqlite3 ::: target files ::: "select count(*) from xxx where logs match 'keyword';" | awk '{m+=$1} END{print m;}'
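
The same fan-out and sum can be sketched in Python. Note the substitution: the slides use GNU Parallel processes and awk, while this sketch uses a thread pool and `sum` purely for illustration. The names `count_in_file` and `parallel_count` are hypothetical, and the table name `syslog` is an assumption (the slide writes `xxx`).

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

QUERY = "SELECT count(*) FROM syslog WHERE logs MATCH ?"


def count_in_file(path, keyword):
    """Run the full-text count against one per-minute SQLite file."""
    con = sqlite3.connect(path)
    try:
        return con.execute(QUERY, (keyword,)).fetchone()[0]
    finally:
        con.close()


def parallel_count(paths, keyword, workers=4):
    """Query every file concurrently and add up the per-file counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda p: count_in_file(p, keyword), paths))
```

Each file is queried independently, so the only shared step is the final addition, which is what the awk stage does in the shell pipeline.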
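
Full-text search performance
• Comparison with Apache Spark (standalone environment)
• Comparison with Apache Spark (Spark x 3 + HDFS vs Hayabusa x 1)
Callouts on the slide: Hayabusa is 4x faster; Hayabusa is 27x faster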

Released as OSS
• Published on GitHub
• https://github.com/hirolovesbeer/hayabusa
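
Limitations of Hayabusa
• Standalone environment
  • The only way to raise performance is to scale up
  • Scaling up is costly
• Gap with distributed processing systems
  • As the scale grows, distributed processing systems become faster
  • Hayabusa will eventually be overtaken in performance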
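
Proposal for distributed Hayabusa (design and implementation)

Objectives
• Evolve Hayabusa into a distributed processing system so that processing scales out
• Keep making use of its standalone performance
• Aim for a simple design even though it is a distributed processing system
• Improve fault tolerance by replicating the data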
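
Remote execution with GNU Parallel
• Ideal: GNU Parallel's remote execution would allow distributed execution
$ time parallel --controlmaster -S host1,host2,host3 sqlite3 ::: …
• Reality: ssh overhead delays processing, and processing time grows as hosts are added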
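
Proposed method
• Distributed search
  • Send the search to be executed to Hayabusa as an RPC
  • Receive the results as RPC responses and aggregate them
• Parallel storage
  • Every host returns the same result when it receives the same request
  • Log data is replicated to all processing hosts in advance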

Distributed Hayabusa: overall architecture
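
Parallel storage
• Replicate syslog to multiple hosts
  • All hosts receive the same syslog
• Use of UDP Samplicator (OSS)
  • Replicates and forwards syslog packets
• Scaling the replication process across cores
  • UDP Samplicator made multi-process
Figure: syslog replication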
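
Making UDP Samplicator multi-process
• Scale the process that tends to become the bottleneck across cores
• Multi-process operation using SO_REUSEPORT
  • UDP port 514 is then shared by multiple processes
Figure annotations: a socket option is added; the syslog forwarding load is balanced evenly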
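
As a small illustration of the socket option mentioned above, the following Python sketch binds two UDP sockets to the same port with SO_REUSEPORT. It is not the Samplicator patch itself: `open_shared_udp` is a hypothetical helper, the sketch uses an ephemeral port on localhost rather than port 514, and it assumes a platform that defines SO_REUSEPORT (Linux does; some others do not).

```python
import socket


def open_shared_udp(port, host="127.0.0.1"):
    """Open a UDP socket that other sockets may share the port with."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # SO_REUSEPORT must be set before bind() on every socket sharing the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    return sock


if hasattr(socket, "SO_REUSEPORT"):
    first = open_shared_udp(0)                  # port 0: let the OS pick one
    shared_port = first.getsockname()[1]
    second = open_shared_udp(shared_port)       # second bind succeeds
    same_port = second.getsockname()[1] == shared_port
    first.close()
    second.close()
```

In a multi-process receiver each process would open its own socket this way, and the kernel would spread incoming datagrams across them.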

Distributed search
• RPC
  • Adopts the Producer / Consumer model
• Implementation
  • ZeroMQ Push / Pull pattern
• Load balancing of requests
  • The Push / Pull pattern distributes requests evenly across hosts
Figure: ZeroMQ Push / Pull pattern

Distributed search
• ZeroMQ client
  • Plays the roles of Ventilator and Sink
• ZeroMQ worker
  • Executes the processing request it receives and returns the result
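
Processing request
• Request
$ parallel sqlite3 ::: target files ::: "select count(*) from xxx where logs match 'keyword';" | awk '{m+=$1} END{print m;}'
• The GNU Parallel command (shown in blue on the slide) is sent to each processing host
• The aggregation step (the other highlighted part) is performed on the client host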
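
Pseudocode (mostly in Japanese)
• Client
  • Takes the command to execute
  • Sends the command to the workers
  • Receives the results and displays them on the terminal
• Worker
  • Executes the command
  • Sends the result to the client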
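
The client and worker roles in the pseudocode can be sketched as below. To keep the example runnable with the standard library only, `queue.Queue` stands in for the ZeroMQ Push / Pull sockets and threads stand in for worker hosts, so this shows the shape of the pattern rather than the actual implementation. `worker`, `scatter_gather` and the `run` callback are hypothetical names.

```python
import queue
import threading


def worker(tasks, results, run):
    """Worker role: pull a command, execute it, send the result back."""
    while True:
        cmd = tasks.get()
        if cmd is None:          # sentinel: no more work
            break
        results.put(run(cmd))


def scatter_gather(commands, run, n_workers=3):
    """Client role: push commands (ventilator), then sum results (sink)."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results, run))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for cmd in commands:         # ventilator: hand out requests
        tasks.put(cmd)
    for _ in threads:            # one sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()
    return sum(results.get() for _ in commands)   # sink: aggregate
```

For example, `scatter_gather(["a", "bb", "ccc"], len)` hands the three strings to the workers and adds up their lengths. Whichever worker is idle takes the next command, which is the load-balancing property the slides attribute to Push / Pull.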

Evaluation

Experimental environment
• Amazon Web Service (AWS)
• EC2 instance: c4.4xlarge
  • vCPU: Xeon E5-2666 v3 @ 2.90GHz x 16 cores
  • Memory: 30GB
  • Disk (EBS): SSD 8GB (OS) + SSD 50GB (Data)
  • OS: Ubuntu 16.04.3 LTS (Xenial Xerus)
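
Distributed search
• Search conditions
  • 100 requests are executed against one day of data
  • One day of data is 60 minutes (60 files) x 24 hours = 1,440 files
  • Each file holds 100,000 records (1,440 x 100,000 = 144 million records)
  • The 100 requests therefore cover 14.4 billion records
• The SQL statement executed is the following full-text search and count
  • select count(*) from syslog where logs match 'keyword';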
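
The workload size follows directly from the figures on this slide; a short check in Python (the variable names are only illustrative):

```python
# Figures taken from the slide: one file per minute, 100,000 records per file.
files_per_day = 60 * 24                                # 1,440 files
records_per_file = 100_000
requests = 100

records_per_day = files_per_day * records_per_file    # 144 million
records_scanned = records_per_day * requests          # 14.4 billion
```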

Distributed search (host scale-out)
• The number of hosts is increased from 1 to 10
• Time falls from 249 s with 1 host to 39 s with 10 hosts (mean of 10 trials)
Plot annotation: the cloud environment is unstable (best effort)
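
Distributed search (worker scale-out)
• With 10 hosts, the number of workers per host is increased from 1 to 16
• Time falls from 249 s with 1 host and 1 worker to 6.8 s with 10 hosts and 10 workers
Plot annotations: fastest around this point; no change beyond it, possibly because I/O contention occurs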
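
Summary of results
• Processing performance
  • With 10 hosts: stated as 10x faster than with 1 host (249 s -> 39 s, which works out to roughly 6.4x)
  • With 10 hosts and more workers: 249 s -> 6.8 s for 10 to 160 workers in total
• Result of a full scan and full-text search over the records
  • Extracting the required data from 14.4 billion records was sped up to 6.8 s
  • A 36x speedup was achieved with 10 hosts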
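
The speedup factors can be recomputed from the reported timings. This short Python check uses only the numbers on the slides; `speedup` is a hypothetical helper name.

```python
def speedup(baseline_s, improved_s):
    """Ratio of baseline time to improved time."""
    return baseline_s / improved_s


host_scale_out = speedup(249, 39)      # 1 host -> 10 hosts, about 6.4x
worker_scale_out = speedup(249, 6.8)   # 1 worker -> best case, about 36.6x
```

The second ratio agrees with the 36x figure on the slide, while the first shows that host scale-out alone gave about 6.4x rather than 10x.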

Comparison with Amazon Elastic MapReduce
• Amazon EMR: same instance type as Hayabusa, c4.4xlarge
  • Configuration: 1 master node + 10 core nodes
• Data access
  • EMR accesses Amazon S3 directly
• Full-text search method
  • Run from PySpark on the master node
Code run in PySpark (digits, punctuation and the S3 path were lost in extraction and appear as …):
    import time
    from pyspark.sql import SQLContext
    sqlContext = SQLContext(sc)
    lines = sc.textFile(…)
    lines.cache()
    for i in range(…):
        start = time.time()
        [lines.filter(lambda s: 'noc' in s).count() for i in range(…)]
        elapsed_time = time.time() - start
        print(elapsed_time)

Comparison with Amazon Elastic MapReduce
• Results
  • With a 10-node configuration, Hayabusa ran 17x faster

Discussion
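
Scaling out search
• Extracting the required data from 14.4 billion records was sped up to 6.8 s
  • Two years earlier, a BigQuery full scan took 5 s for 12 billion records
  • A 36x speedup was achieved with 10 hosts
  • It is unknown whether BigQuery has hundreds or thousands of hosts running at once
• Comparison with Amazon Elastic MapReduce
  • With a 10-node configuration, Hayabusa performed full-text search 17x faster
• Considering system cost
  • Reasonably priced, high-performance distributed search was achieved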
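
Parallelizing storage
• Problems with syslog replication
  • Replicating a large volume of data (packets) puts pressure on bandwidth
• Strictly speaking, a distributed file system such as HDFS should be used
  • Access goes through a metadata mechanism, so it is inherently slower
  • Which distributed file system to use? (a research field in its own right)
• The result of pursuing simplicity
  • Even if stored data is lost to a hardware failure, replicas exist; the failed machine is simply removed
  • No rebalancing is needed, unlike with a distributed file system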
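
Simpler operations through a simple design
• Distributed search
  • Realized with the Producer / Consumer model
  • Process scheduling relies on GNU Parallel
• Advantages of not using a complex distributed system
  • Faster understanding of problems when they occur
  • Reduced system operation load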
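
Keys to the speedup
• A design built on deliberate trade-offs
  • Retry and error handling are not implemented
• Scheduler
  • Left to ZeroMQ and GNU Parallel
• Storage
  • Replicas are kept instead of distributing the stored data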
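
Dependence on hardware
• CPU Core
  • The faster the better
  • A reasonably high clock can sometimes yield more speed than a higher core count
• Disk
  • SSD or NVMe (fast, as one would expect)
  • Draw out the I/O performance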
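
Comparison with other systems
• Full-text search was compared with Apache Spark
• What about a comparison with Elasticsearch?
  • How should they be compared?
  • Engine speed? (Elasticsearch is very fast)
  • Should the speed of the calling API be included? (REST API calls are very slow)
• Write & Read
  • What happens when reading while writing?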
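
Summary and future work

Summary
• Design and implementation of Hayabusa as a distributed system
• Full scan and full-text search of 14.4 billion syslog records in 6.8 s
• Fast search over large volumes of non-uniform logs from multi-vendor equipment
• Potential to shorten troubleshooting and incident response significantly
• Easy management thanks to a simple distributed processing configuration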
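
Future work
• Comparison with other software (BigQuery, ElasticSearch, Splunk)
• Integrating Hayabusa with other applications (anomaly detection, etc.)
• Combining Hayabusa with statistical processing and machine learning libraries
• Implementing a distributed file system / distributed storage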

Acknowledgments
• Part of this research was supported by the Japan Science and Technology Agency (JST) under CREST, grant JPMJCR1783