-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for concurrent write on Iceberg transformed column #24160
Add support for concurrent write on Iceberg transformed column #24160
Conversation
0f00cdd
to
ff872d7
Compare
@@ -2740,7 +2740,8 @@ private void finishWrite(ConnectorSession session, IcebergTableHandle table, Col | |||
|
|||
RowDelta rowDelta = transaction.newRowDelta(); | |||
table.getSnapshotId().map(icebergTable::snapshot).ifPresent(s -> rowDelta.validateFromSnapshot(s.snapshotId())); | |||
TupleDomain<IcebergColumnHandle> dataColumnPredicate = table.getEnforcedPredicate().filter((column, domain) -> !isMetadataColumnId(column.getId())); | |||
TupleDomain<IcebergColumnHandle> dataColumnPredicate = table.getEnforcedPredicate().intersect(table.getUnenforcedPredicate()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we intersect enforced with the unenforced predicate?
They are rather unrelated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that both carriers information about predicate.
The difference is that taken from @raunaqmorarka explanation:
enforced is the part of the predicate which is guaranteed to be satisfied by the connector, so the engine will not apply it on it's side.
unenforced is the part of the predicate which connector cannot guarantee even if it is able to use it to reduce output, so the engine will apply it on the connector output
@raunaqmorarka Correct me if I'm wrong here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{2:part:date=[ SortedRangeSet[type=date, ranges=1, {[2024-01-01]}] ]}
I'm checking io.trino.plugin.iceberg.TestIcebergLocalConcurrentWritesTest#testConcurrentUpdateWithOverlappingPartitionTransformation
(BTW really cool that we have this battery of new concurrency tests)
It's unclear to me why the partition predicate is not an "enforced predicate" while debugging your code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding such predicates for transformed columns as enforced, would mean pushdowns of those values and connector would need to filter files and rows also during reading from such table as well. It's not supported right now and would require more work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are rather unrelated.
We already join those in IcebergSplitSource
trino/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplitSource.java
Line 259 in 61a79dd
TupleDomain<IcebergColumnHandle> effectivePredicate = TupleDomain.intersect( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are able to prune files in IcebergSplitSource based on unenforced filter, but not rows (hence "unenforced"). Filters on identity partitioned columns are enforced at the row level due to the data being partitioned by value and PartitionConstraintMatcher
in IcebergSplitSource.
I think it's valid to intersect unenforced filter here because org.apache.iceberg.MergingSnapshotProducer
is internally checking whether any files matching given filter have been added to the table.
8da17cc
to
c9670a6
Compare
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
@@ -2740,7 +2740,8 @@ private void finishWrite(ConnectorSession session, IcebergTableHandle table, Col | |||
|
|||
RowDelta rowDelta = transaction.newRowDelta(); | |||
table.getSnapshotId().map(icebergTable::snapshot).ifPresent(s -> rowDelta.validateFromSnapshot(s.snapshotId())); | |||
TupleDomain<IcebergColumnHandle> dataColumnPredicate = table.getEnforcedPredicate().filter((column, domain) -> !isMetadataColumnId(column.getId())); | |||
TupleDomain<IcebergColumnHandle> dataColumnPredicate = table.getEnforcedPredicate().intersect(table.getUnenforcedPredicate()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are able to prune files in IcebergSplitSource based on unenforced filter, but not rows (hence "unenforced"). Filters on identity partitioned columns are enforced at the row level due to the data being partitioned by value and PartitionConstraintMatcher
in IcebergSplitSource.
I think it's valid to intersect unenforced filter here because org.apache.iceberg.MergingSnapshotProducer
is internally checking whether any files matching given filter have been added to the table.
c9670a6
to
7623816
Compare
...rino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergLocalConcurrentWritesTest.java
Outdated
Show resolved
Hide resolved
...rino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergLocalConcurrentWritesTest.java
Outdated
Show resolved
Hide resolved
...rino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergLocalConcurrentWritesTest.java
Outdated
Show resolved
Hide resolved
...rino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergLocalConcurrentWritesTest.java
Outdated
Show resolved
Hide resolved
7623816
to
5df5282
Compare
5df5282
to
bb147d6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
But I would like a second opinion on it
@alexjo2144 @findepi @losipiuk @hashhar @electrum are any of you able to weigh in here ?
...in/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergLocalConcurrentWrites.java
Outdated
Show resolved
Hide resolved
...in/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergLocalConcurrentWrites.java
Outdated
Show resolved
Hide resolved
bb147d6
to
8daa87b
Compare
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
8daa87b
to
36c8476
Compare
TupleDomain<IcebergColumnHandle> convertibleUnenforcedPredicate = table.getUnenforcedPredicate().filter((_, domain) -> isConvertableToIcebergExpression(domain)); | ||
TupleDomain<IcebergColumnHandle> effectivePredicate = dataColumnPredicate.intersect(convertibleUnenforcedPredicate); | ||
if (!effectivePredicate.isAll()) { | ||
rowDelta.conflictDetectionFilter(toIcebergExpression(effectivePredicate)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty cool @pajaks 🎉
Outstanding test coverage.
@ebyhr @raunaqmorarka Can this be merged? |
Description
Add concurrent write to scenarios like:
Release notes