Upgrade to Pro — share decks privately, control downloads, hide ads and more …

BigQuery速習会 / Quick Learning about BigQuery

Altech
September 21, 2016

BigQuery速習会 / Quick Learning about BigQuery

BigQuery速習会@Wantedly
http://wantedly.connpass.com/event/40480

Altech

September 21, 2016
Tweet

More Decks by Altech

Other Decks in Technology

Transcript

  1. ࣗݾ঺հ • ஛໺ ૑ฏ • Wantedly • 2015೥2݄- Πϯλʔϯ •

    2016೥4݄- ৽ଔೖࣾ • Ϣʔβʔάϩʔεɺ΢Σϒ • @Altech_2015
  2. Contents • Overview • Data Stores @Wantedly • How to

    Import • Query Language • Query Cost • Using Result
  3. Q. େྔͷσʔλʹର͢ΔղੳΫΤϦΛ
 ΠϯλϥΫςΟϒʹฦ͢ʹ͸Ͳ͏͢Ε͹͍͍ʁ - - - What if a director

    suddenly asks, “Hey, can you give me yesterday’s number of impressions for AdWords display ads – but only in the Tokyo region?” https://cloud.google.com/files/BigQueryTechnicalWP.pdf
  4. OLAP - Online Analytical Processing • ैདྷͷιϦϡʔγϣϯ • ΠϯσοΫεΛͨ͘͞ΜషͬͨΓɺதؒσʔλΛू ܭ͓ͯ͘͠

    • ໰୊఺ɿΫΤϦͷ੾Γޱͷ໢ཏ͕େมɺଟ਺ͷΠϯ σοΫε͕ݩσʔλΑΓ΋େ͖͘ͳΔ͜ͱɺதؒσʔ λ͕εΩʔϚมߋͰյΕΔɺͳͲ… • BigQueryͷιϦϡʔγϣϯ • ࣄલܭࢉͤͣͦͷ৔ͰϑϧεΩϟϯ͢Δ
  5. ྻࢦ޲σʔλϕʔε • Relational Database : ߦ͝ͱʹσʔλΛ֨ೲ • BigQuery : ྻ͝ͱʹσʔλΛ֨ೲ

    • Pros: • ѹॖޮ཰ • εΩϟϯޮ཰ • Cons: • ߋ৽͠ʹ͍͘ SELECT * ☓
  6. BigQuery ಋೖޙ • σʔλϕʔε : Postgres • σʔλ΢ΣΞϋ΢εɿTreasureData … batch

    query • σʔλ΢ΣΞϋ΢εɿBigQuery … ad-hoc query • σʔλՄࢹԽɿDOMO
 
 
 
 … batch query ΋ BigQuery ʹҠߦ͢Δ༨஍͸͋Δɻ
  7. TreasureData to BigQuery Huntr::TreasureDataToBigQuery.export_table!(“log”, day: date) Huntr::TreasureDataToBigQuery::AUTO_EXPORT_TABLES SELECT time, controller,

    action, ... -- TD ͷεΩʔϚ৘ใ͔ΒΫΤϦΛੜ੒ FROM log WHERE TD_TIME_RANGE(time, '2016-09-20', '2016-09-21', 'JST') ORDER BY time Command:
  8. Partitioning • σʔλΛ෼͚͓͖ͯɺεΩϟϯ͢ΔൣғΛͬ͘͟Γߜ ΕΔΑ͏ʹ͢Δ • From TreasureData: daily partitioning •

    td.log20160921, td.log20160922, 
 td.log20160923, … • From Postgres: no partitioning • rdb.projects, rdb.companies, …
  9. StandardSQL, LegacySQL • 2016೥9݄ݱࡏɺೋͭͷΫΤϦݴޠ͕࢖͑Δ • Legacy SQL : FROM۟ʹؔ਺͕ॻ͚ΔͳͲಠࣗͷSQL. •

    Standard SQL : 6݄ʹβϦϦʔε. SQL2011ʹ४ڌ. • ࣾ಺ਪ঑: StandardSQL • PostgresSQL ΋ඪ४ʹൺֱత४ڌ͍ͯ͠Δ • ؔ਺ͷΠϯλʔϑΣʔε͕ྑ͍ɺUDF͕ѻ͍΍͍͢
  10. Data Types • INT64 (INTEGER) • FLOAT64 (FLOAT) • BOOL

    (BOOLEAN) • STRING • BYTES • DATE • TIMESTAMP • Array • Struct CAST("123" AS INT64) Type Casting:
  11. Syntax, Functions, Operators • Syntax: WITH, PARTITION BY etc. •

    https://cloud.google.com/bigquery/sql-reference/ query-syntax • Functions: • https://cloud.google.com/bigquery/sql-reference/ functions-and-operators
  12. Example: ਖ਼نදݱͰ໊લ෦෼Λऔಘ WITH emails AS ( SELECT "[email protected]" AS email

    UNION ALL SELECT "[email protected]" AS email ) SELECT 
 REGEXP_EXTRACT(email, r"^[a-zA-Z0-9_.+-]+") 
 AS user_name FROM emails; https://cloud.google.com/bigquery/sql-reference/functions-and-operators#regexp_extract
  13. User Define Function • JSON_EXTRACT Λࣗલ࣮૷ͯ͠࢖͏ྫɿ • υΩϡϝϯτʹಋઢ͕ͳ͍ͷͰ࢓༷͸গ͠มΘΔ͔΋ CREATE TEMPORARY

    FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING) RETURNS STRING LANGUAGE js AS """ try { var parsed = JSON.parse(json); } catch (e) { return null } return eval(json_path.replace("$", "parsed")); “""; SELECT CUSTOM_JSON_EXTRACT(“{a: 1}”, “$.a”);
  14. with Ruby • Ruby gems • google-api-client • Ұ൪ϓϦϛςΟϒͰɺgenerate ͞Εͨ΋ͷɻAPIͰ

    Ͱ͖Δ͜ͱ͸શͯͰ͖Δ͕ɺҰ෦൚༻త͗ͯ͢࢖ ͍ʹ͍͘ͱ͜Ζ΋ɻ • google-cloud • google-api-client ΛਓखͰϥοϓͨ͠΋ͷɻͰ͖ ͳ͍͜ͱ΋͋Δ͕ɺΠϯλʔϑΣʔε͕ྑ͍ɻ
  15. End. • Overview • Data Stores @Wantedly • How to

    Import • Query Language • Query Cost • Using Result
  16. ࢀߟࢿྉ • An Inside Look at Google BigQuery • https://cloud.google.com/files/

    BigQueryTechnicalWP.pdf • SQL Reference | BigQuery Documentation • https://cloud.google.com/bigquery/sql-reference/
  17. Exercise: ϝʔϧ͔ΒԠԉͨ͠ϢʔβʔΛݟ͚ͭΔ • mail_log ςʔϒϧΛ༻͍ͯɺઌिҰिؒͰԠԉΛ͓ئ ͍͢Δϝʔϧܦ༝ͰߦΘΕͨԠԉΛݟ͚ͭΔ • mailer_name: "unsubscribable_user_mailer" •

    action_name: "suggest_support_mail" • mail_action: "clicked" • ϝʔϧܦ༝ͰͷԠԉʹΫϦοΫͨ͠ืूΛ10෼Ҏ಺ʹ Ԡԉ • mail_log ͱ project_supports ςʔϒϧΛར༻