Spark streaming (merge into) iceberg table concurrent write with compaction job #12187

Open · 2MD opened this issue Feb 6, 2025 · 0 comments
Labels: question (Further information is requested)

2MD (Contributor) commented Feb 6, 2025
Query engine

Iceberg version: 1.7.1
Spark version: 3.3.2

Question

We have a Spark Structured Streaming application, A1, which executes the following in every microbatch:

```scala
s"""
   |MERGE INTO table AS t
   |USING (SELECT * FROM $tempViewName) AS s
   |ON $joinCondition
   |WHEN MATCHED AND s.$versionColumnName > t.$versionColumnName THEN UPDATE SET *
   |WHEN NOT MATCHED THEN INSERT *
   |""".stripMargin
```
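For context, a minimal sketch of how such a merge is typically wired into the streaming query via `foreachBatch`. The names `inputStream`, `tempViewName`, `joinCondition`, and `versionColumnName` are placeholders, not taken from the issue, and this assumes a running Spark session:

```scala
// Hypothetical wiring of the per-microbatch MERGE; identifiers are illustrative.
inputStream.writeStream
  .foreachBatch { (batch: org.apache.spark.sql.DataFrame, batchId: Long) =>
    // Register the microbatch as a temp view so the MERGE can read from it.
    val tempViewName = s"updates_$batchId"
    batch.createOrReplaceTempView(tempViewName)
    batch.sparkSession.sql(
      s"""
         |MERGE INTO table AS t
         |USING (SELECT * FROM $tempViewName) AS s
         |ON $joinCondition
         |WHEN MATCHED AND s.$versionColumnName > t.$versionColumnName THEN UPDATE SET *
         |WHEN NOT MATCHED THEN INSERT *
         |""".stripMargin)
  }
  .start()
```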

A microbatch may update rows in any data file, and the table is unpartitioned.

We also have a Spark batch application, A2, which runs maintenance on the same table: expiring old snapshots, rewriting manifests, compaction (binpack, and sometimes z-order), rewriting position delete files, and deleting orphan files.
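As a point of reference, the maintenance steps listed above map onto Iceberg's built-in Spark procedures roughly as follows; the catalog name, table name, and thresholds here are illustrative only:

```sql
-- Illustrative A2 maintenance run; 'catalog' and 'db.table' are placeholders.
CALL catalog.system.expire_snapshots(table => 'db.table',
  older_than => TIMESTAMP '2025-02-01 00:00:00');
CALL catalog.system.rewrite_manifests(table => 'db.table');
-- binpack compaction:
CALL catalog.system.rewrite_data_files(table => 'db.table', strategy => 'binpack');
-- or z-order compaction (column list is illustrative):
CALL catalog.system.rewrite_data_files(table => 'db.table',
  strategy => 'sort', sort_order => 'zorder(col1, col2)');
CALL catalog.system.rewrite_position_delete_files(table => 'db.table');
CALL catalog.system.remove_orphan_files(table => 'db.table');
```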

How can we avoid concurrency conflicts between the two applications?

(We are still considering launching A2 inside an A1 microbatch, but that is not the best solution.)
