| Write Mode | Overwrite | Upsert |
| --- | --- | --- |
| Output File Number | Controlled by Spark configuration | Massive number of small files |
| Calculation | Full calculation | Merge on Read, incremental calculation |
| Effectiveness | Hourly | Minute-level |

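| Partition | Size | File number |
| --- | --- | --- |
| … | … M | 3 (checkpoint intervals) * 10 (writers) * 3 files (data file / equality delete / position delete) |
| ts=2022-10-01-22 | xx M | 90 |
| … | … | … |
| ts=2022-09-27-00 | x K | 90 |

Flink checkpoint interval is 20 mins, 10 writers.
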
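
The 90 in the table can be reproduced with a small calculation. This is a minimal sketch, assuming the leading 3 is the number of 20-minute checkpoints in one hour and that every writer emits all three file kinds at every checkpoint; the function name and parameters are illustrative and not part of Flink or Iceberg.

```python
def files_per_partition_per_hour(checkpoint_interval_min: int,
                                 writers: int,
                                 file_kinds: int = 3) -> int:
    """Upper bound on files committed to one partition in one hour.

    file_kinds = 3 stands for data file, equality delete, position delete.
    Assumes each writer flushes every file kind at every checkpoint.
    """
    checkpoints_per_hour = 60 // checkpoint_interval_min
    return checkpoints_per_hour * writers * file_kinds


# 20-minute checkpoints, 10 writers, 3 file kinds -> 3 * 10 * 3
print(files_per_partition_per_hour(20, 10))  # 90
```

The count scales linearly with the number of writers that touch a partition, so even a partition holding only a few kilobytes is split into the same number of files as a large one.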
| Partition | Size | File number |
| --- | --- | --- |
| ts=2022-10-01-23 | xxx M | 3 * 3 (records with the same partition are shuffled to the same writer) |
| ts=2022-10-01-22 | xx M | 9 |
| ts=2022-10-01-21 | x M | 9 |
| … | … | … |
| ts=2022-09-27-00 | x K | 9 |

Flink checkpoint interval is 20 mins, 10 writers.

Side effect: BackPressure.

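
The drop from 90 to 9 follows from routing every record of a partition to a single writer. The sketch below simulates that shuffle in plain Python under stated assumptions: `writer_for_partition` is a hypothetical stand-in for a key-by on the partition value, CRC32 is used only to get a reproducible hash, and the record counts are made up for illustration.

```python
import zlib
from collections import defaultdict

NUM_WRITERS = 10
CHECKPOINTS_PER_HOUR = 3   # 60 min / 20 min checkpoint interval
FILE_KINDS = 3             # data file, equality delete, position delete


def writer_for_partition(partition: str, num_writers: int = NUM_WRITERS) -> int:
    """Pick a writer from the partition value only (hypothetical routing)."""
    return zlib.crc32(partition.encode("utf-8")) % num_writers


# Illustrative input: a large recent partition and a smaller older one.
records = ([("ts=2022-10-01-23", i) for i in range(1000)]
           + [("ts=2022-10-01-22", i) for i in range(100)])

writers_seen = defaultdict(set)
for partition, _record_id in records:
    writers_seen[partition].add(writer_for_partition(partition))

files_per_hour = {
    partition: CHECKPOINTS_PER_HOUR * len(writers) * FILE_KINDS
    for partition, writers in writers_seen.items()
}
print(files_per_hour)  # every partition maps to one writer, so 3 * 1 * 3 = 9
```

The same property explains the back pressure: the 1000-record partition is handled by exactly one of the ten writers, no matter how much data it receives.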
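| Partition | Size | Key selector |
| --- | --- | --- |
| … | … M | EqualityFieldKeySelector |
| ts=2022-10-01-22 | xx M | EqualityFieldKeySelector |
| ts=2022-10-01-21 | x M | PartitionKeySelector |
| … | … | … |
| ts=2022-09-27-00 | x K | PartitionKeySelector |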
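
One way to read the table is that larger partitions are keyed by the equality fields, which spreads them over many writers, while smaller partitions are keyed by partition, which keeps them on one writer. The sketch below models that idea in plain Python. The selector names come from the table; the size threshold, the `id` equality field, and the functions `choose_selector` and `route` are assumptions made for illustration, and the source does not state the actual switching rule.

```python
import zlib

NUM_WRITERS = 10
LARGE_PARTITION_BYTES = 10 * 1024 * 1024  # hypothetical threshold, not from the source


def stable_hash(value: str) -> int:
    """Reproducible hash (Python's built-in hash() is salted per process)."""
    return zlib.crc32(value.encode("utf-8"))


def choose_selector(partition_size_bytes: int) -> str:
    """Assumed rule: large partitions spread by key, small ones stay together."""
    if partition_size_bytes >= LARGE_PARTITION_BYTES:
        return "EqualityFieldKeySelector"
    return "PartitionKeySelector"


def route(record: dict, selector: str, num_writers: int = NUM_WRITERS) -> int:
    """Return the writer index for a record under the given selector."""
    if selector == "EqualityFieldKeySelector":
        key = str(record["id"])        # "id" is a hypothetical equality field
    else:
        key = record["partition"]
    return stable_hash(key) % num_writers


records = [{"partition": "ts=2022-10-01-22", "id": i} for i in range(1000)]

by_partition = {route(r, "PartitionKeySelector") for r in records}
by_key = {route(r, "EqualityFieldKeySelector") for r in records}

print(len(by_partition))  # 1: the whole partition lands on a single writer
print(len(by_key))        # number of writers sharing the partition's load
```

Keying by the equality fields still sends all versions of one row to the same writer, which upsert needs in order to emit matching delete records, while keying by partition trades parallelism for fewer files.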