Comparison of Storage Abstraction in Open Table Formats

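Open Table Formats (OTFs) support a wide variety of distributed file systems and object stores. Each of these storage systems comes with its own API or SDK. This talk explains how Apache Iceberg, Apache Hudi, Delta Lake, and Apache Paimon each abstract the storage layer so that they can use every storage in the way that suits it best.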

By LINEヤフーTech (LY Corporation Tech)

Transcript

  1. © LY Corporation
    Today's Agenda: How?
    (Diagram: Hadoop HDFS, Apache Ozone, Amazon S3, Google Cloud Storage, and Azure Data Lake Storage Gen2; reference: https://iceberg.apache.org/spec/#overview)
  2. Before OTF era… Hadoop FileSystem API is the abstraction layer
    (Diagram: Hadoop MapReduce, Apache Tez, and Apache Spark reach storage through the Hadoop FileSystem API via the schemes hdfs:// and webhdfs:// (Hadoop HDFS), ofs:// (Apache Ozone), s3a:// (Amazon S3), gs:// (Google Cloud Storage), abfs:// (Azure Data Lake Storage Gen2), and others)
  3. Before OTF era… Hadoop FileSystem API is the abstraction layer
    (Diagram: Hadoop MapReduce, Apache Tez, and Apache Spark on top of the Hadoop FileSystem API, with the implementations Distributed FileSystem and WebHDFS FileSystem (Hadoop HDFS), Ozone FileSystem (Apache Ozone), S3A FileSystem (Amazon S3), GoogleHadoop FileSystem (Google Cloud Storage), AzureBlob FileSystem (Azure Data Lake Storage Gen2), and others)
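The dispatch shown in the two diagrams above comes down to a registry keyed by URI scheme. In Hadoop, that mapping is configured through `fs.<scheme>.impl` keys (or discovered via service loading), and `FileSystem.get` caches the resulting instances. The Python sketch below only models this idea. Every class name in it is a hypothetical stand-in, not a Hadoop class.

```python
from urllib.parse import urlparse


class FileSystem:
    """Hypothetical stand-in for Hadoop's abstract FileSystem class."""

    def __init__(self, authority: str) -> None:
        self.authority = authority


# Hypothetical stand-ins for the per-storage implementation classes.
class DistributedFileSystemModel(FileSystem): ...
class WebHdfsFileSystemModel(FileSystem): ...
class OzoneFileSystemModel(FileSystem): ...
class S3AFileSystemModel(FileSystem): ...
class GoogleHadoopFileSystemModel(FileSystem): ...
class AzureBlobFileSystemModel(FileSystem): ...


# Plays the role of the fs.<scheme>.impl configuration keys.
SCHEME_TO_IMPL = {
    "hdfs": DistributedFileSystemModel,
    "webhdfs": WebHdfsFileSystemModel,
    "ofs": OzoneFileSystemModel,
    "s3a": S3AFileSystemModel,
    "gs": GoogleHadoopFileSystemModel,
    "abfs": AzureBlobFileSystemModel,
}

# One instance per (scheme, authority), similar in spirit to Hadoop's FileSystem cache.
_cache: dict[tuple[str, str], FileSystem] = {}


def get_filesystem(uri: str) -> FileSystem:
    """Resolve a URI to a (cached) FileSystem implementation by its scheme."""
    parsed = urlparse(uri)
    key = (parsed.scheme, parsed.netloc)
    if key not in _cache:
        impl = SCHEME_TO_IMPL.get(parsed.scheme)
        if impl is None:
            raise ValueError(f"No FileSystem for scheme: {parsed.scheme}")
        _cache[key] = impl(parsed.netloc)
    return _cache[key]
```

Engines never name a storage class directly. They pass a URI such as `s3a://bucket/warehouse/t1` and receive whichever implementation is registered for `s3a`.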
  4. Abstraction in OTF: Apache Hudi, Delta Lake, and Apache Paimon leverage the Hadoop FileSystem API
    (Diagram: Apache Hudi, Delta Lake (through Delta LogStore), and Apache Paimon (through Paimon FileIO) on top of the Hadoop FileSystem API, which reaches Hadoop HDFS, Apache Ozone, Amazon S3, Google Cloud Storage, Azure Data Lake Storage Gen2, and others)
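Delta Lake adds its LogStore layer on top of the file system because committing a log entry depends on guarantees that a bare file system API does not always provide. Two of these are mutual exclusion (only one writer can create a given version file) and atomic visibility (a commit file never appears half-written). The sketch below assumes a local POSIX file system, where creating a hard link fails if the target already exists. `LocalLogStoreSketch` is a hypothetical name, not Delta's `LogStore` API.

```python
import os
import tempfile


class LocalLogStoreSketch:
    """Hypothetical LogStore-like wrapper over a local POSIX file system."""

    def __init__(self, log_dir: str) -> None:
        self.log_dir = log_dir
        os.makedirs(log_dir, exist_ok=True)

    def write(self, version: int, actions: list[str]) -> bool:
        """Commit `version`; return False if another writer already committed it."""
        final_path = os.path.join(self.log_dir, f"{version:020d}.json")
        # Write the full content to a hidden temp file first,
        # so readers never observe a partially written commit.
        fd, tmp_path = tempfile.mkstemp(dir=self.log_dir, prefix=".tmp-")
        try:
            with os.fdopen(fd, "w") as f:
                f.write("\n".join(actions) + "\n")
            # os.link raises FileExistsError if final_path exists: put-if-absent.
            os.link(tmp_path, final_path)
            return True
        except FileExistsError:
            return False
        finally:
            # Drop the temp name; the committed file (if any) keeps its own link.
            os.unlink(tmp_path)

    def list_versions(self) -> list[int]:
        """Return committed versions in ascending order."""
        return sorted(
            int(name.split(".")[0])
            for name in os.listdir(self.log_dir)
            if name.endswith(".json")
        )
```

The second writer that tries to commit the same version gets `False` back. In this sketch, that is the point where a real client would re-read the log and retry.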
  5. Hudi storage schemes (name, isWriteTransactional, supportAtomicCreation, storageLockClass)
    • isWriteTransactional
      ◦ If true, the corrupt block check is skipped
    • Atomic writes
      ◦ If supportAtomicCreation is true, FileSystemLockProvider can be used
      ◦ If storageLockClass is set, StorageBasedLockProvider can be used
      ◦ Otherwise, a coordinator (such as ZooKeeper or DynamoDB) is required
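The rules on this slide amount to a small decision function. In this sketch, `StorageScheme` mirrors the fields listed on the slide, and the order of the checks simply follows the slide. Both the function names and the example entries in the test are hypothetical. They are not rows of Hudi's actual scheme table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageScheme:
    """Mirrors the slide's fields; concrete entries are hypothetical."""
    name: str
    is_write_transactional: bool
    support_atomic_creation: Optional[bool]
    storage_lock_class: Optional[str]


def choose_lock_provider(scheme: StorageScheme) -> str:
    """Pick a lock mechanism following the order given on the slide."""
    if scheme.support_atomic_creation:
        return "FileSystemLockProvider"
    if scheme.storage_lock_class is not None:
        return "StorageBasedLockProvider"
    # Neither atomic creation nor a storage lock class is available.
    return "external coordinator (e.g. ZooKeeper, DynamoDB)"


def needs_corrupt_block_check(scheme: StorageScheme) -> bool:
    # Per the slide: transactional writes allow skipping the corrupt block check.
    return not scheme.is_write_transactional
```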
  6. FileSystem-specific implementations in Hudi
    • Custom seek handling for GCS
    • Custom EOF handling for Tencent CHDFS (Cloud HDFS)
    https://github.com/apache/hudi/blob/63275b32fdead4da9fdd4235fd540d80f46ea7ea/hudi-hadoop-common/src/main/java/org/apache/hudi/hadoop/fs/HadoopFSUtils.java#L230
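Custom EOF handling usually means wrapping the storage's input stream so that reads and seeks at or beyond the end of the file behave the way the reader expects, no matter how the backend reports it. The wrapper below is a hypothetical illustration that enforces EOF from a known file length. It is not the code linked in `HadoopFSUtils.java`.

```python
import io


class BoundedInputStream:
    """Hypothetical wrapper that enforces EOF semantics from a known file length."""

    def __init__(self, raw: io.BufferedIOBase, file_length: int) -> None:
        self._raw = raw
        self._length = file_length

    def seek(self, pos: int) -> None:
        # Reject seeks outside [0, length] instead of trusting the backend.
        if pos < 0 or pos > self._length:
            raise EOFError(f"Cannot seek to {pos}; file length is {self._length}")
        self._raw.seek(pos)

    def read(self, n: int = -1) -> bytes:
        remaining = self._length - self._raw.tell()
        if remaining <= 0:
            return b""  # At EOF: return empty bytes regardless of the backend.
        if n < 0 or n > remaining:
            n = remaining
        return self._raw.read(n)
```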
  7. HDFS-specific implementation in Delta
    • Call the msync API for HDFS
      ◦ Makes HDFS with an Observer NameNode setup read-after-write consistent
    https://github.com/delta-io/delta/pull/769
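In an Observer NameNode setup, reads may be served by a NameNode that lags behind the active one. A reader can therefore miss a commit that another process just wrote. `msync` lets the client learn the active NameNode's latest state, so a subsequent read is answered only once the observer has caught up to that state. The model below is a hypothetical simulation of this protocol with made-up class names. It does not use the HDFS client.

```python
class ActiveNameNode:
    def __init__(self) -> None:
        self.txid = 0
        self.files: set[str] = set()

    def create(self, path: str) -> int:
        self.txid += 1
        self.files.add(path)
        return self.txid


class ObserverNameNode:
    """Replays the active NameNode's state lazily, so it can be stale."""

    def __init__(self, active: ActiveNameNode) -> None:
        self.active = active
        self.applied_txid = 0
        self.files: set[str] = set()

    def catch_up(self) -> None:
        self.files = set(self.active.files)
        self.applied_txid = self.active.txid

    def list(self, min_txid: int) -> set[str]:
        # A real observer waits until it has applied min_txid;
        # this model simply catches up immediately.
        if self.applied_txid < min_txid:
            self.catch_up()
        return set(self.files)


class Client:
    def __init__(self, active: ActiveNameNode, observer: ObserverNameNode) -> None:
        self.active = active
        self.observer = observer
        self.last_seen_txid = 0

    def msync(self) -> None:
        # Learn the active NameNode's latest state before reading.
        self.last_seen_txid = self.active.txid

    def list(self) -> set[str]:
        return self.observer.list(self.last_seen_txid)
```

Without `msync`, the reader below sees an empty log even though another writer has already committed version 0.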
  8. S3-specific implementation in Delta
    • Use the S3AFileSystem internal API if available
      ◦ Use the startAfter parameter of the S3 ListObjectsV2 API
    https://github.com/delta-io/delta/pull/1210
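Delta commit files are named with 20-digit zero-padded version numbers, so their keys sort lexicographically in version order. This is what makes `startAfter` useful: S3 can begin the listing right after a known key instead of returning the whole `_delta_log/` prefix. The sketch below simulates that server-side behavior over an in-memory sorted key list. `list_objects_v2` here is a hypothetical model, not the AWS SDK call.

```python
import bisect


def list_objects_v2(sorted_keys: list[str], prefix: str,
                    start_after: str = "", max_keys: int = 1000) -> list[str]:
    """Hypothetical in-memory model of ListObjectsV2: keys under `prefix`,
    strictly after `start_after`, in lexicographic order."""
    begin = max(bisect.bisect_left(sorted_keys, prefix),
                bisect.bisect_right(sorted_keys, start_after))
    result: list[str] = []
    for key in sorted_keys[begin:]:
        if not key.startswith(prefix) or len(result) == max_keys:
            break
        result.append(key)
    return result


def delta_log_key(table: str, version: int) -> str:
    # Commit files use a 20-digit zero-padded version number.
    return f"{table}/_delta_log/{version:020d}.json"


def commits_after(sorted_keys: list[str], table: str, version: int) -> list[str]:
    """List only the commits newer than `version`, skipping older keys server-side."""
    return list_objects_v2(sorted_keys, f"{table}/_delta_log/",
                           start_after=delta_log_key(table, version))
```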
  9. GCS-specific implementation in Delta
    • Use a separate thread to write to GCS
      ◦ Avoid incomplete files caused by thread interruption
    https://github.com/delta-io/delta/pull/782
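The idea behind this change is that the thread performing the upload should not be the thread that can be interrupted. If the caller is interrupted, the dedicated writer thread still finishes the object instead of leaving a truncated file. Python has no `Thread.interrupt()`, so the sketch below models interruption as the caller giving up after a timeout. The function names and the `start` event are hypothetical, and the event exists only to make the example deterministic.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# A dedicated writer thread: stopping the caller does not touch this thread.
_writer_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="storage-writer")


def _write_fully(path: str, chunks: list[bytes], start: threading.Event) -> None:
    start.wait()  # Lets the example delay the write deterministically.
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)


def write_via_writer_thread(path: str, chunks: list[bytes], start: threading.Event,
                            caller_timeout: float) -> tuple[Future, bool]:
    """Return (future, finished_before_caller_gave_up)."""
    future = _writer_pool.submit(_write_fully, path, chunks, start)
    try:
        future.result(timeout=caller_timeout)
        return future, True
    except FutureTimeout:
        # Models the caller being interrupted: it stops waiting,
        # but the writer thread still writes the whole file.
        return future, False
```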
  10. Apache Paimon
    • Formerly Apache Flink Table Store
      ◦ https://cwiki.apache.org/confluence/display/FLINK/FLIP-188%3A+Introduce+Built-in+Dynamic+Table+Storage
    • File Layouts
      ◦ LSM-tree (Log-Structured Merge-tree) for faster updates/deletes
      ◦ https://paimon.apache.org/docs/1.1/concepts/basic-concepts/
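An LSM-tree keeps updates and deletes cheap by writing new sorted runs instead of rewriting existing files, and then resolves the latest value per key when reading or compacting. The sketch below assumes each run is a list of `(key, sequence_number, value)` tuples sorted by key and sequence number, with `None` marking a delete. It illustrates only the merge idea, not Paimon's file format or merge engines.

```python
import heapq
from typing import Optional

# (key, sequence_number, value); value None marks a delete.
Record = tuple[str, int, Optional[str]]


def merge_runs(runs: list[list[Record]]) -> list[tuple[str, str]]:
    """Merge sorted runs, keeping the highest sequence number per key and dropping deletes."""
    merged: list[tuple[str, str]] = []
    current_key: Optional[str] = None
    best: Optional[Record] = None
    for record in heapq.merge(*runs, key=lambda r: (r[0], r[1])):
        if record[0] != current_key:
            # New key: emit the winner of the previous key unless it was deleted.
            if best is not None and best[2] is not None:
                merged.append((best[0], best[2]))
            current_key, best = record[0], record
        else:
            best = record  # Same key with a higher sequence number: newer wins.
    if best is not None and best[2] is not None:
        merged.append((best[0], best[2]))
    return merged
```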
  11. Paimon FileIO interface
    • isObjectStore()
      ◦ If true, rename is not atomic and an external lock is required to commit snapshots
    • newTwoPhaseOutputStream(Path, boolean)
      ◦ Default: create a temp file and rename it
      ◦ S3, Alibaba Cloud Object Storage Service (OSS): multipart upload and commit
    https://github.com/apache/paimon/pull/6287
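The default strategy on this slide (write to a temporary file, then rename it on commit) can be sketched as a small two-phase writer on a local file system. `TwoPhaseOutputStreamSketch` and its `commit` and `discard` methods are hypothetical names, not Paimon's Java API. On S3 or OSS, the slide notes that the same two phases map to a multipart upload and its final commit.

```python
import os
import uuid


class TwoPhaseOutputStreamSketch:
    """Hypothetical two-phase writer: bytes go to a hidden temp file,
    and commit() publishes them at the target path with one rename."""

    def __init__(self, target: str) -> None:
        self.target = target
        directory, name = os.path.split(target)
        self.tmp = os.path.join(directory, f".{name}.{uuid.uuid4().hex}.tmp")
        self._f = open(self.tmp, "wb")

    def write(self, data: bytes) -> None:
        # Phase 1: data is durable but not yet visible at the target path.
        self._f.write(data)

    def commit(self) -> None:
        # Phase 2: make the file visible; os.replace is atomic on POSIX file systems.
        self._f.close()
        os.replace(self.tmp, self.target)

    def discard(self) -> None:
        # Abort: nothing was ever visible at the target path.
        self._f.close()
        os.remove(self.tmp)
```

On an object store, the rename in `commit` would be neither cheap nor atomic. This is why the slide ties `isObjectStore()` to needing an external lock for snapshot commits.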
  12. Abstraction in Iceberg: Apache Iceberg has its own abstraction layer
    (Diagram: Apache Iceberg on top of Iceberg FileIO, with the implementations S3FileIO (Amazon S3), GCSFileIO (Google Cloud Storage), ADLSFileIO (Azure Data Lake Storage Gen2), HadoopFileIO (Hadoop FileSystem API), and others)
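What makes FileIO a separate abstraction is how narrow it is: the table format only needs to read a file, create a file, and delete a file. Each storage-specific implementation can then use the native SDK underneath. The sketch below shows that shape with an in-memory implementation. `FileIOSketch`, `InMemoryFileIO`, `commit_metadata`, and the metadata path layout are hypothetical, and the sketch simplifies Iceberg's Java interface, whose `newInputFile`/`newOutputFile` return file handles rather than bytes.

```python
from abc import ABC, abstractmethod


class FileIOSketch(ABC):
    """Hypothetical, simplified shape of a FileIO-style interface."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def delete_file(self, path: str) -> None: ...


class InMemoryFileIO(FileIOSketch):
    """Stands in for a storage-specific implementation backed by a native SDK."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self.objects[path]

    def write(self, path: str, data: bytes) -> None:
        self.objects[path] = data

    def delete_file(self, path: str) -> None:
        self.objects.pop(path, None)


def commit_metadata(io: FileIOSketch, location: str, version: int, payload: bytes) -> str:
    # Table-level code talks only to the FileIO interface, never to a storage SDK.
    # The path layout here is illustrative.
    path = f"{location}/metadata/v{version}.metadata.json"
    io.write(path, payload)
    return path
```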
  13. Iceberg FileIO history
    • 2018: File I/O submodule for TableOperations
      ◦ FileIO interface
        ▪ newInputFile
        ▪ newOutputFile
        ▪ deleteFile
    • 2022: Add interface for FileIO prefix operations and implementations
      ◦ SupportsPrefixOperations interface
        ▪ listPrefix
        ▪ deletePrefix
    • 2022: Add SupportsBulkOperations interface
      ◦ SupportsBulkOperations interface
        ▪ deleteFiles
      ◦ Used in expire snapshots and remove orphan files
      ◦ Two years before BulkDelete was added to the Hadoop FileSystem API
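These later interfaces are optional capabilities that a FileIO implementation may add. Callers check for them at runtime and fall back to deleting files one by one when they are absent. The sketch below shows that pattern. All class and function names are hypothetical simplifications of `SupportsPrefixOperations` and `SupportsBulkOperations`, not Iceberg's API.

```python
from abc import ABC, abstractmethod
from typing import Iterable


class FileIO(ABC):
    @abstractmethod
    def delete_file(self, path: str) -> None: ...


class SupportsBulkDelete(ABC):
    """Hypothetical capability mix-in, in the spirit of SupportsBulkOperations."""

    @abstractmethod
    def delete_files(self, paths: Iterable[str]) -> None: ...


class SupportsPrefixListing(ABC):
    """Hypothetical capability mix-in, in the spirit of SupportsPrefixOperations."""

    @abstractmethod
    def list_prefix(self, prefix: str) -> list[str]: ...


class BasicIO(FileIO):
    def __init__(self, paths: Iterable[str]) -> None:
        self.paths = set(paths)
        self.calls = 0  # Number of storage requests issued.

    def delete_file(self, path: str) -> None:
        self.calls += 1
        self.paths.discard(path)


class BulkIO(BasicIO, SupportsBulkDelete, SupportsPrefixListing):
    def delete_files(self, paths: Iterable[str]) -> None:
        self.calls += 1  # One request for the whole set of paths.
        self.paths.difference_update(paths)

    def list_prefix(self, prefix: str) -> list[str]:
        return sorted(p for p in self.paths if p.startswith(prefix))


def remove_files(io: FileIO, paths: list[str]) -> None:
    """Cleanup helper of the kind snapshot expiration or orphan removal would use."""
    if isinstance(io, SupportsBulkDelete):
        io.delete_files(paths)
    else:
        for path in paths:
            io.delete_file(path)
```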
  14. Optimize multi-object deletion in S3
    • Before
      ◦ Call the DeleteObject API for each file
    • After
      ◦ Call the DeleteObjects API for each batch
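S3's DeleteObjects API accepts at most 1,000 keys per request. Moving from one DeleteObject call per file to batched DeleteObjects calls therefore cuts the number of requests by up to 1,000 times. The sketch below shows the batching. `send_delete_objects` is a hypothetical callback standing in for the real SDK call. A real implementation would also inspect the per-key errors that DeleteObjects reports.

```python
from typing import Callable, Iterable, Iterator

MAX_KEYS_PER_DELETE_OBJECTS = 1000  # S3 limit on keys per DeleteObjects request


def batched(keys: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive lists of at most `size` keys."""
    batch: list[str] = []
    for key in keys:
        batch.append(key)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def delete_all(keys: Iterable[str],
               send_delete_objects: Callable[[list[str]], None],
               batch_size: int = MAX_KEYS_PER_DELETE_OBJECTS) -> int:
    """Issue one DeleteObjects-style request per batch; return the request count."""
    requests = 0
    for batch in batched(keys, batch_size):
        send_delete_objects(batch)
        requests += 1
    return requests
```

For example, deleting 2,500 files takes three requests with this sketch instead of 2,500.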
  15. Summary
    • Introduce the storage abstraction layers and their differences among OTFs
    • Dive deep into storage-specific implementations and optimizations
      ◦ This presentation does not cover all of them
      ◦ You can go further by reading the source code or by letting AI do it
        ▪ Example: “Read the source code and teach me about FileIO in Paimon and its implementations, especially any implementation specific to a particular storage.”
    • Can improvements in one OTF be ported to other OTFs?