Introduction to Data Science for PHP Users
Sotaro Karasawa
September 14, 2013
PHP Conference 2013: "Introduction to Data Science for PHPers" #phpcon2013
Transcript
Crocos, Inc. Sotaro Karasawa @sotarok http://facebook.com/sotarok Introduction to Data Science for PHPers #phpcon2013 PHP Conference 2013
About me: Sotaro Karasawa (@sotarok), d.hatena.ne.jp/sotarok, Crocos Inc. PHP / Git / TD / Red Bull
Perfect PHP (Gijutsu-Hyoron). Surely everyone already owns a copy, right!?
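Data science
For the details, see: Data Scientist Yosei Dokuhon (Gijutsu-Hyoron) http://www.amazon.co.jp/dp/…
The data science process: business understanding, data understanding, data extraction, data processing, modeling, effect verification, service implementation. (Source: Data Scientist Yosei Dokuhon, chapter "The Process of Data Science")
Data science: analyze and model the accumulated data to obtain the metrics that matter for running the business, then repeat. There is a lot you have to do, and the knowledge required spans a wide range of fields.
So start from the bare minimum, from whatever is easy to begin with, and take the first step.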
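For PHPers and web applications, what is "data"? Databases and logs.
Today's talk is about logs.
How do we collect large volumes of application logs, and how do we aggregate them?
With that in mind, today's agenda: how log collection and analysis work; log collection in PHP applications; analysis.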
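How log collection and analysis work
The never-ending worries of log collection and analysis. With large volumes of data: how do you collect it, where do you store it, how do you retrieve it, how do you aggregate it? The constraints, in the same order: network bandwidth, disk capacity, a big-data processing system, processing time.
http://www.treasure-data.com/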
[Diagram: Treasure Data (TD) architecture. Labels: Web Server, Web Server, fluentd, TD, S3, Hadoop, Hive, Client, Result, MySQL etc.] The data accumulates on TD's side; when you submit a query, Hadoop runs over there and returns the result.
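Moving forward with log analysis: the troublesome parts (collecting, storing, and processing the data) → TD does them for you. That lets you commit to the essential work: designing and implementing what data to collect and how to aggregate it!
How Crocos uses logs. Application logs: analysis based on Facebook attribute information; execution and execution time of key actions; transactions, broken down by attribute and by referral path. Event logs: shares to social networks; opening and closing of modals, etc. And various others.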
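Log collection in PHP applications
What kind of application logs? Basic log design.
What logs are we collecting?
Web server logs
When people say "logs", they usually mean web server logs. Even the Treasure Data tutorial uses Apache logs. http://docs.treasure-data.com/articles/quickstart
But what we really want to know is:
Which users? In what environment, and from where? When did they do what? Which button did they click or tap?
Application logs
Which users? → user registration information. In what environment, and from where? → UA, GEO. When did they do what? → URI, action.
How to collect application logs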
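Before that, a quick word about schemaless logs
What is a schemaless log? A log with no schema.
Log schemas until now → for example, TSV
Columns, in order: time, status, uri, user_id, hoge. This is the schema.
    foreach (file('app.log') as $line) {
        // Split one TSV line into positional columns
        $column = explode("\t", trim($line));
        $time = $column[0];
        $status = $column[1];
        ...
    }
Note: in practice nobody would bother doing this in PHP; you would use sed/awk.
The problems: fields are hard to understand; schema changes are difficult; accidents happen when the analyst and the collector understand the data differently.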
TD logs (or rather, fluentd logs) are JSON:
    {
        "time": 1373876885,
        "status": 200,
        "uri": "/52495/facebook",
        "session_id": "kn6avn2fuh21r25a65mgm3rjh3",
        "fb_id": "7c40c5dd2e55cde37a8c40ed80e1",
        ...
    }
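To make the contrast with TSV concrete, here is a minimal sketch, not taken from the slides. The sample lines and the `user_id` and `device` fields are made up for illustration. It reads the same status value from a positional TSV line and from a keyed JSON line, then reads an optional field with a default.

```php
<?php
// Hypothetical sample lines; the field values are made up for illustration.
$tsvLine  = "1373876885\t200\t/52495/facebook\tu123";
$jsonLine = '{"time":1373876885,"status":200,"uri":"/52495/facebook","user_id":"u123","device":"pc"}';

// TSV: the meaning depends on column position, so the reader must know the schema.
$column = explode("\t", trim($tsvLine));
$tsvStatus = (int) $column[1];

// JSON: each value carries its own key, so extra fields do not shift anything.
$record = json_decode($jsonLine, true);
$jsonStatus = $record['status'];

// A field that older records may lack can be read with a default.
$device = $record['device'] ?? 'unknown';

echo $tsvStatus, ' ', $jsonStatus, ' ', $device, PHP_EOL;
```

Adding a new key to the JSON record leaves existing readers untouched, whereas inserting a TSV column in the middle would silently change what `$column[1]` means.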
POSTing logs
The fluent PHP logger:
    use Fluent\Logger\FluentLogger;

    // Connect to the local fluentd and post one record under a tag
    $logger = new FluentLogger("localhost", "24224");
    $logger->post("debug.test", array("hello" => "world"));
https://github.com/fluent/fluent-logger-php
Basic log design
Record so that one access produces one record
Hook into the response. Most frameworks have a hook point for the response event, right? In Symfony it is onKernelResponse.
    tags:
        - { name: kernel.event_listener, event: kernel.response }

    public function onKernelResponse(FilterResponseEvent $event)
    {
        $request = $event->getRequest();
        $response = $event->getResponse();
        // build an array of fields for this access
        $data = $this->onAccess($request, $response);
        // log data
        $this->logger->post("access", $data);
    }
Note: the real implementation lets you register multiple Listeners/Loggers.
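Decide on a basic schema
Even though the log is schemaless, you need to know what kind of log you are handling. It is pointless if the meaning differs from record to record.
Decide on a basic schema: time, status, uri, ua, referrer. Matching LTSV-style names may make them easier to understand.
Not just what the web server already logs: app, route, controller, process_time, device. For example the framework's routing name and controller name (even when the uri is noisy, you can still aggregate by routing name).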
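The basic schema above can be sketched as a small helper that turns one request into one record. This is an illustration under assumptions, not the Crocos implementation: `buildAccessRecord` and all of its parameters are hypothetical names, and the request data comes from a plain `$_SERVER`-style array rather than from any framework object.

```php
<?php
/**
 * Build one access record per request.
 * Hypothetical helper: the function and parameter names are assumptions.
 */
function buildAccessRecord(
    array $server,
    int $status,
    string $route,
    string $controller,
    float $startedAt,
    float $finishedAt
): array {
    return [
        // Fields a web server log would also have
        'time'     => (int) $finishedAt,
        'status'   => $status,
        'uri'      => $server['REQUEST_URI'] ?? '',
        'ua'       => $server['HTTP_USER_AGENT'] ?? '',
        'referrer' => $server['HTTP_REFERER'] ?? '',
        // Fields only the application knows
        'route'        => $route,
        'controller'   => $controller,
        'process_time' => round($finishedAt - $startedAt, 4),
    ];
}

$record = buildAccessRecord(
    ['REQUEST_URI' => '/52495/facebook?utm=x', 'HTTP_USER_AGENT' => 'ExampleBrowser/1.0'],
    200,
    'crocos_index',
    'IndexController',
    100.0,
    100.25
);
echo json_encode($record), PHP_EOL;
```

Because `route` is stored next to the raw `uri`, query-string noise in the URI does not get in the way of aggregating by route.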
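Denormalize the attributes the application can know and include them in the record
A denormalized record: session_id, user_id, gender, age, device
Why denormalize? So that records can go straight into aggregate functions without a JOIN. Hadoop can do JOINs too, but this way there are fewer steps: fast and simple.
By the way, it is a good idea to hash user_id, session_id, and similar values. Note: this is a privacy precaution in case something goes wrong.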
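The talk only says to hash these identifiers; it does not say how. One possible sketch, assuming a keyed SHA-256 digest (HMAC) is acceptable, uses PHP's built-in `hash_hmac`. The `hashForLog` helper and the `LOG_HASH_KEY` constant are hypothetical names.

```php
<?php
// Hypothetical secret; in practice it would come from configuration.
const LOG_HASH_KEY = 'replace-with-a-secret-key';

/** Replace an identifier with a keyed SHA-256 digest before logging it. */
function hashForLog(string $value): string
{
    return hash_hmac('sha256', $value, LOG_HASH_KEY);
}

$record = [
    'user_id'    => hashForLog('12345'),
    'session_id' => hashForLog('kn6avn2fuh21r25a65mgm3rjh3'),
    'gender'     => 'female',
];
```

The same input always maps to the same digest, so grouping and counting by `user_id` still works on the hashed values.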
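To summarize: record so that one access produces one record; decide on a basic schema; denormalize the attributes the application can know and include them in the record.
Once you have come this far, analysis is already possible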
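Analysis example:
    SELECT AVG(v['process_time'])
    FROM access
    WHERE v['route'] = 'crocos_index'
Analysis example:
    SELECT v['gender'], COUNT(*)
    FROM access
    GROUP BY v['gender']
Good thing we denormalized!
Analysis example, also useful for investigating errors:
    SELECT v['route'], v['status'], v['ua']
    FROM access
    WHERE v['user_id'] = 'xxx'
Note: date handling is omitted to keep these short. In practice you would GROUP BY day or narrow the range in the WHERE clause.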
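To show what the first two queries compute, here is a minimal in-memory sketch over a few made-up records. It is not how Treasure Data executes the queries, only the same arithmetic in plain PHP.

```php
<?php
// Made-up sample records in the denormalized shape described in the talk.
$access = [
    ['route' => 'crocos_index', 'process_time' => 0.10, 'gender' => 'male'],
    ['route' => 'crocos_index', 'process_time' => 0.30, 'gender' => 'female'],
    ['route' => 'shop_buy',     'process_time' => 0.50, 'gender' => 'female'],
];

// SELECT AVG(process_time) ... WHERE route = 'crocos_index'
$times = [];
foreach ($access as $v) {
    if ($v['route'] === 'crocos_index') {
        $times[] = $v['process_time'];
    }
}
$avg = $times ? array_sum($times) / count($times) : null;

// SELECT gender, COUNT(*) ... GROUP BY gender
$byGender = array_count_values(array_column($access, 'gender'));
```

Because `gender` already sits on every access record, the count needs no lookup against a separate user table.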
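A use case for schemaless logs: transactions
So, logs with a basic schema have started to accumulate
Now we want to record things like the success of actions that carry special meaning
Transactions. uri / route tell you that a request arrived. But whether it really succeeded is something only the application knows.
This is where schemaless comes in
Basic schema: time, status, uri, ua, referrer. Additional schema: this, that, and whatever else you need.
You can give specific records a special meaning, and without affecting any other record.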
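Transactions: key_action, key_attr_*
Transactions: key_action, e.g. shop:buy:completed. The format is app:action:status. Note: this example means "purchase completed".
Transactions: key_attr_*. Put any additional information related to the transaction here. The schema differs for each key_action.
Transaction example:
    key_action = shop:buy:completed
    key_attr_item_id = xxxxx
    key_attr_ref = fb_share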
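As a sketch of how such fields might be attached to a basic access record, the hypothetical helper below builds `key_action` from its three parts and prefixes every extra attribute with `key_attr_`. The function name `withTransaction` and the sample record are assumptions for illustration.

```php
<?php
/**
 * Add transaction fields to a basic access record (hypothetical helper).
 * Each key in $attrs is stored with the key_attr_ prefix.
 */
function withTransaction(
    array $record,
    string $app,
    string $action,
    string $status,
    array $attrs = []
): array {
    $record['key_action'] = implode(':', [$app, $action, $status]);
    foreach ($attrs as $name => $value) {
        $record['key_attr_' . $name] = $value;
    }
    return $record;
}

$basic = ['time' => 1373876885, 'status' => 200, 'uri' => '/shop/buy'];
$purchase = withTransaction(
    $basic,
    'shop',
    'buy',
    'completed',
    ['item_id' => 'xxxxx', 'ref' => 'fb_share']
);
```

Only the purchase record gains the extra keys; ordinary access records keep the basic schema unchanged.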
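Transaction analysis example:
    SELECT item_id, ref, COUNT(*)
    FROM access
    WHERE key_action = 'shop:buy:completed'
    GROUP BY item_id, ref
Note: v[''] is omitted here to save characters.
Transaction analysis use case: record the access source for each campaign, then find the most effective campaign from the number of successful transactions.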
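The same grouping can be sketched in plain PHP over a handful of made-up records, assuming the `key_attr_item_id` and `key_attr_ref` field names from the transaction example. Records without a `key_action` are skipped, which mirrors the WHERE clause.

```php
<?php
// Made-up records; only some carry transaction fields.
$access = [
    ['uri' => '/shop/buy', 'key_action' => 'shop:buy:completed', 'key_attr_item_id' => 'a1', 'key_attr_ref' => 'fb_share'],
    ['uri' => '/shop/buy', 'key_action' => 'shop:buy:completed', 'key_attr_item_id' => 'a1', 'key_attr_ref' => 'fb_share'],
    ['uri' => '/shop/buy', 'key_action' => 'shop:buy:completed', 'key_attr_item_id' => 'a1', 'key_attr_ref' => 'mail'],
    ['uri' => '/'], // plain access, no transaction
];

$counts = [];
foreach ($access as $v) {
    // WHERE key_action = 'shop:buy:completed'
    if (($v['key_action'] ?? null) !== 'shop:buy:completed') {
        continue;
    }
    // GROUP BY item_id, ref
    $group = $v['key_attr_item_id'] . '|' . $v['key_attr_ref'];
    $counts[$group] = ($counts[$group] ?? 0) + 1;
}
arsort($counts); // highest count first
```

After sorting, the first key is the item and referral source with the most completed purchases, which is the comparison the use case describes.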
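NEXT STEP
Starting from the aggregated results: statistical analysis methods and modeling. Compute the metrics that are critical to the business and establish an improvement process.
Summary
Collecting and analyzing logs is hard → use Fluentd and Hadoop → use Treasure Data
What logs should you collect? → denormalized logs, one record per access → design the log format itself → make use of schemaless
Finally: we are hiring! Only at Crocos can you work alongside authors of Perfect PHP, former PHP Conference committee members, formerly unpopular guys, and a former spoiled daughter.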