Small-Start Data Utilization with GCP (GCPではじめるスモールスタートなデータ活用)
Takashi Nishibayashi
September 06, 2016
Technology
Slides presented at bq_sushi #4.
Transcript
1 Small-Start Data Utilization with GCP (#bq_sushi ver.). bq_sushi #4, 2016-09-06, Takashi Nishibayashi
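2 Takashi Nishibayashi, Software Engineer, Zucks AdNetwork, Zucks Inc., Data analysis team. Currently working on optimizing ad delivery efficiency: developing the automatic bid price adjustment logic and the ad selection logic of the delivery server. @hagino3000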
3 What is this? A condensed version of the talk given in a case-study session at GCP NEXT TOKYO on the same day.
4 The evolution of data utilization at Zucks AdNetwork
5 The ideal and the reality at the start of the project
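6 The goal (tentative)
• On the ad delivery server, predict conversions and clicks for every impression with machine learning models to improve delivery efficiency
The reality
• Large volumes of log files sit in AWS S3 in a variety of formats
• Master data is stored in MySQL
• Elastic Search holds only the most recent 2 weeks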
7
8 You cannot get there in one step
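9 Phase 1: First, make the data usable by data scientists
✓ Machine learning is popular in the online advertising industry, but we wanted to verify whether it is feasible with our own service's data
✓ People should be able to use the data easily for experiments and hypothesis testing
✓ It is enough if a limited group of people can run queries and aggregations
✓ Response times in the hundreds of milliseconds are not required
✓ What matters is that the data store takes little effort to administer
✓ Data volume is about 600 GByte/day and looks set to keep growing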
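10 Phase 1: First, make the data usable by data scientists
• Loaded the ad delivery logs into BigQuery
• Synced the MySQL master data to BigQuery as well
• Used through the Web UI, Pandas, and BigQuery Python
• Subsample in BigQuery, then train on a local machine
• Retired AWS EMR
• Retired Elastic Search as well
• Jumped on the Cloud Datalab beta and crashed and burned (January 2016)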
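The subsampling step on this slide could look roughly like the sketch below. It only builds a query string. The table name is hypothetical, and row-level sampling with `RAND()` is an assumption made for illustration, not necessarily the approach used in the talk.

```python
def build_subsample_query(table, rate, columns=("*",)):
    """Build a BigQuery standard SQL query that keeps roughly `rate` of the rows.

    `table` is a hypothetical fully qualified table name. The sampling method
    (RAND() filter) is an assumption, not taken from the slides.
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    cols = ", ".join(columns)
    # RAND() returns a value in [0, 1), so each row is kept with probability `rate`.
    return f"SELECT {cols} FROM `{table}` WHERE RAND() < {rate:.6f}"


# Hypothetical usage: keep about 1% of the delivery log for local training.
query = build_subsample_query("my-project.ads.delivery_log", 0.01)
```

The resulting query can then be run through whichever client is in use (Web UI, Pandas, or a Python client) and the much smaller result set loaded on a local machine.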
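11 Phase 2: Make the data usable from batch jobs
✓ We want to run recurring experiments and prediction batches from cron
✓ We want to use it not only for analysis tasks but also for batch jobs on the delivery system side
✓ We also want to track usage (query cost) per feature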
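12 Phase 2: Make the data usable from batch jobs
• Export the BigQuery audit logs to BigQuery through the Cloud Logging settings
• Issue a service account per feature to track usage
• Send a notification when cost spikes
• The automatic bid price adjustment batch and the fraudulent click detection batch are in operation
• Rule-based and anomaly-detection-based classification tasks can be written in SQL
• Experiment results are saved to Cloud Storage / BigQuery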
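As an illustration of a rule-based classification task written in SQL, here is a minimal sketch. The rule (flag an IP address with at least N clicks in the same minute), the `click_log` schema, and the use of in-memory SQLite as a stand-in for BigQuery are all assumptions; the slides do not describe the actual detection rules.

```python
import sqlite3

# Hypothetical rule: an IP with `threshold` or more clicks within one minute is suspicious.
RULE_SQL = """
SELECT ip,
       strftime('%Y-%m-%d %H:%M', clicked_at) AS minute,
       COUNT(*) AS clicks
FROM click_log
GROUP BY ip, strftime('%Y-%m-%d %H:%M', clicked_at)
HAVING COUNT(*) >= :threshold
ORDER BY ip, minute
"""


def find_suspicious_ips(rows, threshold=3):
    """Load (ip, clicked_at) rows into in-memory SQLite and apply the rule."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE click_log (ip TEXT, clicked_at TEXT)")
        conn.executemany("INSERT INTO click_log VALUES (?, ?)", rows)
        return conn.execute(RULE_SQL, {"threshold": threshold}).fetchall()
    finally:
        conn.close()


# Made-up sample data using documentation-reserved IP addresses.
sample_clicks = [
    ("203.0.113.7", "2016-09-06 10:00:01"),
    ("203.0.113.7", "2016-09-06 10:00:15"),
    ("203.0.113.7", "2016-09-06 10:00:40"),
    ("198.51.100.2", "2016-09-06 10:00:05"),
    ("198.51.100.2", "2016-09-06 10:03:30"),
]
suspicious = find_suspicious_ips(sample_clicks)
```

Because the whole rule is a single GROUP BY / HAVING query, it can be scheduled as a batch job without any model training step.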
13
14
15 Uses of the Audit Log
• Query cost per feature
• Query cost per day
• Finding out who created test tables
• Finding tables that are no longer used
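A per-feature cost breakdown from audit log records might be computed along these lines. This is a sketch under assumptions: the record fields `principal_email` and `billed_bytes` are simplified, hypothetical names rather than the real audit log schema, and the price per TiB is a parameter the caller must supply.

```python
from collections import defaultdict

TIB = 2 ** 40  # bytes per TiB; the billing unit is an assumption


def cost_by_account(records, usd_per_tib):
    """Sum billed bytes per service account and convert to an estimated cost.

    Each record is a dict with hypothetical keys:
      principal_email: the service account that ran the query
      billed_bytes:    bytes billed for that query
    """
    totals = defaultdict(int)
    for record in records:
        totals[record["principal_email"]] += record["billed_bytes"]
    return {account: billed / TIB * usd_per_tib for account, billed in totals.items()}


# Made-up records: one service account per feature, as described on the slides.
sample_records = [
    {"principal_email": "bid-batch@example.iam.gserviceaccount.com", "billed_bytes": TIB},
    {"principal_email": "bid-batch@example.iam.gserviceaccount.com", "billed_bytes": TIB},
    {"principal_email": "report@example.iam.gserviceaccount.com", "billed_bytes": TIB // 2},
]
costs = cost_by_account(sample_records, usd_per_tib=5.0)  # 5.0 is a placeholder price
```

With one service account per feature, grouping by account is enough to attribute cost to a feature; the same totals can feed a threshold check for spike notifications.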
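16 Phase 3: Make the data usable by members in every role
✓ Engineers do not want to be saddled with routine investigation tasks
✓ We want to add users without letting costs explode
✓ Things should get better as more people are able to write SQL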
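17 Phase 3: Make the data usable by members in every role
• Made the data queryable from re:dash
• Engineers write template queries based on requests
• These also serve as prototypes for report screens
• A per-query cost limit (a re:dash feature) prevents expensive queries from running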
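18 The required level of data quality changes too
✓ As use cases grow, data quality becomes an issue
✓ "I want to run my job after the 23:00 log import has finished, but...?"
• Stream Insert, Batch Insert, and queries all require a retry mechanism
• About once a month, BigQuery has a bad day
• Run batches that check for missed imports and duplicate imports
• A mechanism that lets import status be checked from outside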
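A retry mechanism of the kind called for on this slide can be sketched as a small wrapper with exponential backoff and jitter. The function name, defaults, and backoff policy are assumptions for illustration; the slides do not say how retries were implemented.

```python
import random
import time


def with_retry(operation, max_attempts=5, base_delay=1.0, max_delay=60.0,
               retry_on=(Exception,), sleep=time.sleep):
    """Call `operation()` and retry on failure with exponential backoff.

    `operation` is any zero-argument callable, for example one that wraps a
    streaming insert, a load job, or a query. `sleep` is injectable for testing.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads out retries from jobs that failed at the same time.
            sleep(delay * random.uniform(0.5, 1.0))
```

In practice `retry_on` should be narrowed to errors that are known to be transient, so that permanent failures such as a malformed query are not retried.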
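19 Side benefits
• Engineers can investigate the delivery logs at any time
• Decisions can be based on data too large for MySQL to handle
• A variety of batch jobs can use the data
• Reports can be created freely just by writing SQL
• Every member of the project can access the data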
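20 Other notes
• BigQuery is not suited to processing that looks up data online
• It is better to make the data retrievable by key-value lookup and use Bigtable
• There are also cases of putting a cache layer in front of BigQuery
• Cloud Dataproc or Cloud Dataflow……
• Spotify reportedly found Spark too complex to use and uses Dataflow from Scala
• https://github.com/spotify/scio
• Cloud Datalab has apparently been renewed, so expectations are high
• A cloud version of Jupyter Notebook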
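21 Summary
• Aiming straight at the hard problem means a long wait before results appear, so we are advancing data utilization while laying the groundwork step by step
• Rule-based and anomaly-detection-based processing that can be written in SQL produces results sooner than machine learning
• Integration with Cloud Storage, Cloud Logging, and Cloud Dataproc has been strengthened, and the use cases for BigQuery have grown
• If you do not need response times in the hundreds of msec, many concurrent queries, or stability, BigQuery can be used at a reasonable cost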
22 Appendix: Notes on queries for computing statistics in BigQuery http://qiita.com/hagino3000/items/e9ed62638ebe54391188
23 Thank You