We have a Spark streaming application, A1, which runs the following in every micro-batch:
s"""
|MERGE INTO table as t
|USING (select * from $tempViewName) as s
|ON $joinCondition
|WHEN MATCHED AND s.$versionColumnName > t.$versionColumnName THEN UPDATE SET *
|WHEN NOT MATCHED THEN INSERT *
|"""
The merge can update any data file. The table "table" is not partitioned.
We also have a Spark batch application, A2, which runs the following maintenance on the same table:
expiring old snapshots, rewriting manifests, compaction (binpack and sometimes z-order), rewriting position delete files, and deleting orphan files.
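For illustration, the A2 steps could be expressed as Iceberg Spark procedure calls. This is only a sketch, assuming the procedures are available in the Iceberg runtime in use. The object name `MaintenanceSketch`, the catalog name, the table identifier, the cutoff timestamp and the z-order columns are all placeholders, not values from our setup.

```scala
// Sketch only: catalog, table, cutoff and z-order columns are placeholders.
object MaintenanceSketch {
  // Builds one CALL statement per A2 step, in the order listed above.
  def statements(
      catalog: String,
      table: String,
      olderThan: String,
      zOrderCols: Seq[String] = Nil): Seq[String] = {
    // Binpack by default; z-order sort when columns are given.
    val rewriteData =
      if (zOrderCols.isEmpty)
        s"CALL $catalog.system.rewrite_data_files(table => '$table', strategy => 'binpack')"
      else
        s"CALL $catalog.system.rewrite_data_files(table => '$table', strategy => 'sort', " +
          s"sort_order => 'zorder(${zOrderCols.mkString(",")})')"

    Seq(
      s"CALL $catalog.system.expire_snapshots(table => '$table', older_than => TIMESTAMP '$olderThan')",
      s"CALL $catalog.system.rewrite_manifests(table => '$table')",
      rewriteData,
      s"CALL $catalog.system.rewrite_position_delete_files(table => '$table')",
      s"CALL $catalog.system.remove_orphan_files(table => '$table')"
    )
  }
}
```

Each returned string would then be passed to `spark.sql(...)` by the batch job.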
How can we avoid concurrency conflicts between the two applications?
(We are still considering launching A2 inside the A1 micro-batch, but that is not the best solution.)
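The "A2 inside the A1 micro-batch" idea could look roughly like the sketch below. The class name `BatchWithMaintenance` and the parameters `mergeBatch`, `runMaintenance` and `everyNBatches` are hypothetical stand-ins for the real merge and maintenance logic, and the interval is an assumption.

```scala
// Sketch only: mergeBatch and runMaintenance stand in for the real A1 and A2 logic.
final class BatchWithMaintenance(
    mergeBatch: Long => Unit,
    runMaintenance: () => Unit,
    everyNBatches: Long) {
  require(everyNBatches > 0, "everyNBatches must be positive")

  // Intended to be called from foreachBatch with the micro-batch id.
  def process(batchId: Long): Unit = {
    mergeBatch(batchId)
    // Maintenance runs on the same thread after the merge, so the two never overlap.
    if ((batchId + 1) % everyNBatches == 0) runMaintenance()
  }
}
```

The drawback is that the stream makes no progress while maintenance runs, which is why this does not look like the best solution.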
2MD changed the title from "Spark streaming concurrent write with compact job" to "Spark streaming (merge into) iceberg table concurrent write with compaction job" on Feb 6, 2025
Query engine
Iceberg version 1.7.1
Spark version 3.3.2