for (pluggable) indexing records for fast updates and deletes
• Supports two table types [1]
  ◦ Copy on Write (COW): data stored in purely columnar format
  ◦ Merge on Read (MOR): data stored using a combination of columnar and row formats
• In a MOR table, updates are added to delta (Avro) files, which are later compacted with the columnar files, synchronously or asynchronously
• Compaction can be tuned to one's requirements
  ◦ We compact on every write, since Amazon Athena only serves the Read Optimized View [2]
• And Amazon EMR supports Hudi [3, 4]

Hudi Data Lake at PayPay
We use the MOR table type.
Amazon EMR: we use Apache Hudi 0.7.0 with EMR 6.0.0.

1. https://hudi.apache.org/docs/overview.html#table-types
2. https://docs.aws.amazon.com/athena/latest/ug/querying-hudi.html
3. https://aws.amazon.com/emr/features/hudi/
4. https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-6x.html (support is for Hudi 0.5.0)

Support for Fast Writes using Apache Hudi
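As a rough illustration of the setup described above (a MOR table that is compacted on every write), the Hudi datasource options might look like the sketch below. The helper name, table name, and field names are hypothetical and not taken from the slides; the option keys are standard Hudi write configs, but the exact values used at PayPay are an assumption.

```python
def hudi_mor_write_options(table_name, record_key, precombine_field, partition_field):
    """Build Hudi write options for a MOR table with inline compaction.

    Hypothetical helper: the arguments are placeholders for illustration.
    """
    return {
        "hoodie.table.name": table_name,
        # Merge on Read: updates land in delta (Avro) log files first.
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # Compact synchronously as part of the write...
        "hoodie.compact.inline": "true",
        # ...after every single delta commit (assumed reading of
        # "we compact on every write").
        "hoodie.compact.inline.max.delta.commits": "1",
    }


# Example values are made up for illustration.
options = hudi_mor_write_options(
    table_name="payments",
    record_key="payment_id",
    precombine_field="updated_at",
    partition_field="created_date",
)
```

With a Spark session that has the Hudi bundle on its classpath, these options would typically be passed as `df.write.format("hudi").options(**options).mode("append").save(base_path)`, where `df` and `base_path` are likewise placeholders. Compacting on every write trades write latency for keeping the Read Optimized View current.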