Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
BigQuery Schema Migration #bq_sushi
Search
Naotoshi Seo
April 08, 2016
Technology
6.2k
2
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
BigQuery Schema Migration #bq_sushi
Naotoshi Seo
April 08, 2016
More Decks by Naotoshi Seo
See All by Naotoshi Seo
ZOZOTOWNリプレイス2020
sonots
5
39k
Red Chainer and Cumo: Practical Deep Learning in Ruby at RubyKaigi 2019
sonots
1
4.8k
Introduction of Cumo, and Integration to Red Chainer
sonots
1
1.2k
Implementation of Cumo, a CUDA-aware version of Ruby/Numo
sonots
1
2.1k
Fast Numerical Computing and Deep Learning in Ruby with Cumo
sonots
0
10k
CuPy improvments around memory
sonots
3
1.8k
DeNA AIシステム部におけるクラウドを活用した機械学習基盤の構築
sonots
4
6.4k
Triglav - Data Driven Workflow Tool
sonots
1
4.3k
DeNA流データエンジニアリングの極意
sonots
17
13k
Other Decks in Technology
See All in Technology
スキルと MCP ツール、責務をどう分けるか? AI が迷わないインターフェース設計の戦略
cdataj
1
1.1k
なぜ Platform Engineering の土台に Kubernetes を選ぶのか
r4ynode
2
650
AIの性能が向上しても未解決な組織の重大問題は何か?/An Unsolved Organizational Problem in the Age of AI
moriyuya
4
680
小さく始める AI 活用推進 ― 日経電子版 Web チームの事例/nikkei-tech-talk47
nikkei_engineer_recruiting
0
270
フィジカル版Github Onshapeの紹介
shiba_8ro
0
260
2026 TECHFRESH 畢業分享會 - AI-Native 重塑軟體工程與虛擬講師
line_developers_tw
PRO
0
1.1k
On-behalf-of Token exchange with AgentCore Identity
hironobuiga
2
220
気軽に使える"情報のハブ"としてのNotion活用 〜フロー情報の集積点 と、 Claude Code × Notion AI〜
syucream
1
140
2026TECHFRESH畢業分享會 - Lightning Talk - E起 See See : 電商推薦讀心術? 數據說了算
line_developers_tw
PRO
0
1.1k
2026TECHFRESH畢業分享會 - 葬送的通靈師:化系統與用戶雜訊成行動訊號
line_developers_tw
PRO
0
1.1k
やさしいA2A入門
minorun365
PRO
12
1.9k
Agent Skills設計で柔軟性と硬さのバランスが難しい話
nassy20
0
130
Featured
See All Featured
The AI Search Optimization Roadmap by Aleyda Solis
aleyda
1
5.9k
SEOcharity - Dark patterns in SEO and UX: How to avoid them and build a more ethical web
sarafernandez
0
200
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
52
6k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
16
2k
Are puppies a ranking factor?
jonoalderson
1
3.6k
The browser strikes back
jonoalderson
0
1.2k
Reality Check: Gamification 10 Years Later
codingconduct
0
2.2k
The Straight Up "How To Draw Better" Workshop
denniskardys
239
140k
Building Applications with DynamoDB
mza
96
7.1k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
287
14k
What the history of the web can teach us about the future of AI
inesmontani
PRO
1
610
Tell your own story through comics
letsgokoyo
1
950
Transcript
#JH2VFSZͷςʔϒϧΛ .JHSBUF ΧϥϜՃɺআɺ ܕมߋ ͢Δ 2016/04/08 @sonots #bq_sushi 3
ࣗݾհ • ඌར @sonots • DeNA ੳج൫ • Fluentd ίϛολ
• Ruby ίϛολ • ࠷ۙ embulk ۀ • embulk-output-bigquery • embulk-filter-column, etc
• 4݄23ൃചʂ • σʔλऩूಛू • Fluentd / Embulk • DeNA
/ Cookpad ͷࣄྫ
ΞδΣϯμ • ฐࣾͰͷ BigQuery ར༻ • εΩʔϚมߋͷඞཁੑ • BigQuery ʹ͓͚ΔεΩʔϚมߋͷࠔ͞
• εΩʔϚมߋͷઓུ
ฐࣾͰͷ BigQuery ར༻ • West (US) Ͱ̍Ҏ্લ͔Βར༻ • JP ͰϘνϘν͍࢝Ί͍ͯΔ
• σʔλҠߦπʔϧ࡞ͬͯΔ • hdfs2bigquery • vertica2bigquery • bigquery2hdfs • bigquery2vertica
ฐࣾͰͷੳۀ • σʔλҠߦ Hadoop/Vertica ӡ༻ج൫νʔϜ • ੳۀΞφϦετ͕ߦ͏ • BigQuery ʹΫΤϦΛ͛ΔͷΞφϦετ
εΩʔϚมߋͷඞཁੑ(1) • ϩάʹΧϥϜ͕Ճ͞Εͨ • ͬͺΓΧϥϜ͕ফ͞Εͨ • ΧϥϜͷܕΛؒҧ͑ͨ • INTEGER ͬΆ͍ͱࢥͬͯͨΒ
11,12 Έ͍ͨͳ ͕ೖͬͯΔߦ͕͋ͬͯ STRING ͡Όͳ͍ͱμϝ ͩͬͨͱ͔͋Δ͋Δ
εΩʔϚมߋͷඞཁੑ(2) • BigQuery ςʔϒϧ໊ϕετϓϥΫςΟε • ςʔϒϧ໊લஔࢺ_ˋY%m%d • ຖ৽͘͠ςʔϒϧΛ࡞Δ • ຖεΩʔϚ࠶ఆٛͷνϟϯε͕͋Δ
• εΩʔϚมߋ͠ͳͯ͘ྑ͍͡ΌΜʁ
εΩʔϚมߋͷඞཁੑ(3) • ̎ͭͷςʔϒϧͰΧϥϜͷܕ͕ҧ͏ͱΤϥʔʂ SELECT name FROM TABLE_DATE_RANGE(data.people_, TIMESTAMP('2014-03-26'), TIMESTAMP('2014-03-27')) WHERE
age >= 35 • Ωϟετ͢Δͱ͍͏ख͋Δ͕ɺੳ࣌ଞͷ͜ ͱʹ಄Λ͍͍ͨͷͰආ͚͍ͨ
εΩʔϚมߋπʔϧͷఏڙ • ͋Δ͖࢟(εΩʔϚ)Λఏࣔ͢Δͱɺ • ΧϥϜͷՃ • ΧϥϜͷআ • ΧϥϜͷܕมߋ •
Λࣗಈผͯ͠ɺεΩʔϚมߋͰ͖ΔΑ͏ʹ ͍ͯ͋͛ͨ͠
BigQuery ʹ͓͚Δ εΩʔϚมߋͷࠔ͞
εΩʔϚมߋͷࠔ • BigQuery ʹ ALTER TABLE ͕ͳ͍ • ΧϥϜՃͷAPI͋Δ •
ΧϥϜআɺܕมߋͷ API ͕ͳ͍ Ͳ͏͢Δ͔ʁͱ͍͏
ΧϥϜՃ • patch_table (or update_table) API ͰͰ͖Δ client.patch_table(project_id, dataset_id, table_id,
{ schema: { fields: [ {name:"time", type:"TIMESTAMP", mode:"NULLABLE"}, {name:"id", type:"INTEGER", mode:"NULLABLE"}, ] } }) google-api-ruby-client
ΧϥϜՃ(ҙ) • ͢Ͱʹ͋ΔεΩʔϚ + Ճ͢ΔΧϥϜ • get_table ͰεΩʔϚΛऔಘͯ͠Ϛʔδͯ͛͠Δ response =
get_table(project_id, dataset_id, table_id) columns = response.schema.fields.map {|col| col.to_h } columns << {name:"id", type:"INTEGER", mode:"NULLABLE"} google-api-ruby-client
patch tables ͷ੍ • Ճͨ͠ΧϥϜඌʹՃ͞ΕΔ • ՃͰ͖Δͷ NULLABLE ·ͨ REPEATED
͚ͩ • mode: REQUIRED ͳΧϥϜ͕ՃͰ͖ͳ͍ • มߋ REQUIRED => NULLABLE ͚ͩ • NULLABLE Λ REQUIRED ʹͰ͖ͳ͍ • REPEATED ʹͰ͖ͳ͍
ΧϥϜআɺܕมߋ
ΧϥϜআɺܕมߋ • ̎ͭͷઓུ • (1) export & filter & load
• (2) select & copy
(1) export & filter & load • gcs ʹ export
• embulk ΒͳʹΒͰ download ͭͭ͠Λม ͢Δ filtering ॲཧΛߦ͏ • BQ ʹ load ͠ͳ͓͢
(1) export & filter & load • ར • ՝ۚ͞Εͳ͍
• ܽ • ҰϩʔΧϧʹμϯϩʔυ্ͯ͛͢͠ ͜ͱʹͳΔͷͰඇৗʹ͍ɻɻɻ
(2) select & copy • insert_job API ʹ query ͱ
destination_table Λࢦఆ insert_job(project_id, { configuration: { query: { query:"SELECT ... FROM [...]", destination_table: { dataset_id: dataset_id, table_id: table_id }, } } }, {})
(2) select & copy ͷྫ SELECT STRING(business_id) AS business_id, STRING(full_address)
AS full_address, schools, BOOLEAN(open) AS open, FROM [dataset_id.table_id] • ΩϟετͰܕมߋ • ੍: ܕมߋ͢Δͱ mode: NULLABLE ʹͳΔ • ࢦఆ͠ͳ͔ͬͨΧϥϜআ͞ΕΔ
(2) select & copy • ར • ͍ • ܽ
• ՝ۚ͞ΕΔ
(2) select & copy Λ࠾༻ (1) export & filter &
load ΔͳΒ HDFS͔ΒσʔλLoadΓͯ࣌ؒ͋͠·ΓมΘΒͳ͍ ...
Further dive into select & copy
RECORD ܕͷΩϟετํ๏ SELECT INTEGER(votes.funny) AS votes.funny, INTEGER(votes.useful) AS votes.useful, INTEGER(votes.cool)
AS votes.cool, FROM [dataset_id.table_id] • υοτ۠ΓͰࢦఆ͢Δ
select & copy ͰͷΧϥϜՃ SELECT column1, INTEGER(NULL) AS column2, INTEGER(NULL)
AS (record.column3), FROM [dataset_id.table_id] • INTEGER ܕͷ column2 ΛՃ • RECORD ܕͷ record ΧϥϜͰͳ͘ record_column3 ͱ͍͏໊લͷΧϥϜ͕Ͱ͖Δɻɻɻ • patch table API ͬͨ΄͏͕ྑͦ͞͏ɻɻɻ
mode: REPEATED ΧϥϜͷࢦఆ • SELECT ͰࢦఆͰ͖Δ REPEATED ΧϥϜ̍ͭͩ ͚ɺͳͲͷ੍͕͋ͬͨΓ •
SELECT ͢Δͱߦ͕૿͑ΔͷͰɺREPEATED Χϥ Ϝͷͳ͍ߦ͕૿͑ͨςʔϒϧΛ࡞Δ͜ͱʹͳΔ • Ͳ͏ݫͦ͠͏
Atomic ͳςʔϒϧͷஔ • ௨ৗͷઓུ • มલͷςʔϒϧ => มޙ • atomic
ʹ swap • BigQuery ʹ rename ͕ͳ͍ʂແཧʂʁ • copy ͷ destination_table Λࣗࣗʹࢦఆ • atomic ʹ swap ͞ΕΔʂʂ
·ͱΊ
·ͱΊΔͱ • ΧϥϜͷՃ͕ඞཁͳ߹ɺ·ͣ patch table • ΧϥϜͷআɺ·ͨܕมߋ͕ඞཁͳ߹ɺ ͔ͦ͜Β͞Βʹ select &
copy • copyઌࣗࣗΛࢦఆ͢Δ͜ͱͰ atomic ʹ swap Ͱ͖Δ
੍ • mode: REPEATED ΧϥϜΛѻ͑ͳ͍ • mode: NULLABLE ΧϥϜͷΈՃՄೳ •
ܕมߋ͢Δͱ mode: NULLABLE ʹͳΔ
https://github.com/sonots/bigquery_migration/
͍ํ require 'bigquery_migration' config = { json_keyfile: '/path/to/your-project-000.json' dataset: 'your_dataset_name'
table: 'your_table_name' } columns = [ { name: 'string', type: 'STRING' }, { name: 'record', type: 'RECORD', fields: [ { name: 'integer', type: 'INTEGER' }, { name: 'timestamp', type: 'TIMESTAMP' }, ] } ] migrator = BigqueryMigration.new(config) migrator.migrate_table(columns: columns)
• 4݄23ൃചʂ • σʔλऩूಛू • Fluentd / Embulk • DeNA
/ Cookpad ͷࣄྫ