Upgrade to Pro — share decks privately, control downloads, hide ads and more …

AWSでIcebergを使ってデータウェアハウスを構築してみる / 20240810-jaws...

AWSでIcebergを使ってデータウェアハウスを構築してみる / 20240810-jawsug-akita

2024/08/10 (土) [東北][秋田][JAWS-UG 秋田]勉強会 にて発表した資料

https://jaws-tohoku.connpass.com/event/321228/

kasacchiful

August 10, 2024
Tweet

More Decks by kasacchiful

Other Decks in Programming

Transcript

  1. ໰୊఺  $47 +40/ wਓ͕ؒૢ࡞͠΍͍͢ɺΞϓϦέʔγϣϯؒ࿈ܞ͠΍͍͢ "730 ߦࢦ޲ϑΥʔϚοτ  wσʔλͷߴ଎ॲཧ͕Մೳɺ0-"1ॲཧʹར༻Մೳ 1BSRVFU

    ྻࢦ޲ϑΥʔϚοτ  ˕ਪ঑  w෼ੳ༻్ʹ࢖͍΍͍͢ ಘҙ େྔσʔλͷςʔϒϧ୯ҐͷಡΈॻ͖ ύʔςΟγϣϯϨϕϧͷಡΈॻ͖ ෆಘҙ ϨίʔυϨϕϧͷಡΈॻ͖
  2. $3&"5&5"#-&  CREATE TABLE `sample_partition_100_db`.`partition100_date_iceberg` ( id string, uuid string,

    target_date date ) PARTITIONED BY (day(target_date)) LOCATION 's3://sample-partition-100-db/partition100_date_iceberg' TBLPROPERTIES ( 'table_type'='ICEBERG' ); `UBCMF@UZQF``*$&#&3(` Λࢦఆ
  3. (MVFδϣϒͰ.&3(&  ## ..<省略>.. ## Read Csv from Catalog df

    = glueContext.create_dynamic_frame.from_catalog( database="sample_partition_100_db", table_name="partition100_date_csv") ## Transform Type df = ApplyMapping.apply( frame=df, mappings=[ ("id", "string", "id", "string"), ("uuid", "string", "uuid", "string"), ("target_date", "string", "target_date", "date") ]).toDF() ## Store tmp table df.createOrReplaceTempView("tmp_partition100_date_csv") ## Merge spark.sql(''' MERGE INTO glue_catalog.sample_partition_100_db.partition100_date_i ceberg t USING tmp_partition100_date_csv s ON (t.id = s.id) WHEN MATCHED THEN UPDATE SET t.uuid = s.uuid, t.target_date = s.target_date WHEN NOT MATCHED THEN INSERT * ''' ) job.commit()