-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: Adding same view version as part of a concurrent update shouldn't fail #14997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| // (indicating that the schema ID should be automatically assigned) | ||
| // this scenario can happen with concurrent updates in REST cases where the same update is | ||
| // applied twice. The view version gets a new ID assigned because | ||
| // ViewMetadata#sameViewVersion(current, updated) isn't true, because the current's schemaId was |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably deduplicate this properly. I'm working on a fix for this
| private int addSchemaInternal(Schema schema) { | ||
| int newSchemaId = reuseOrCreateNewSchemaId(schema); | ||
| if (schemasById.containsKey(newSchemaId)) { | ||
| if (null == lastAddedSchemaId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just FYI that this is most likely currently wrong, because the implication of setting lastAddedSchemaId here is that the metadata update will contain a -1 as the schema ID in addVersionInternal().
The internal state tracking is quite delicate here, so I'm currently exploring a few other options on how to achieve a concurrent replace operation to not fail due to internal state tracking in ViewMetadata
c7c9c51 to
fd9bbc7
Compare
| && (one.schemaId() == two.schemaId() | ||
| || (two.schemaId() == LAST_ADDED | ||
| && Objects.equals(lastSeenExistingSchemaId, one.schemaId()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit ugly - there is a difference between one and two
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the method name suggests that the comparison is symmetrical, but it is not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I agree that this should be made symmetrical. I was mostly exploring this part to do proper deduplication for the edge case that we're testing for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated this to be symmetrical
fd9bbc7 to
718e098
Compare
Applying a concurrent update of the same view version currently fails with
```
org.apache.iceberg.exceptions.ValidationException: Cannot set last added schema: no schema has been added
at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:49)
at org.apache.iceberg.view.ViewMetadata$Builder.addVersionInternal(ViewMetadata.java:297)
at org.apache.iceberg.view.ViewMetadata$Builder.addVersion(ViewMetadata.java:277)
at org.apache.iceberg.MetadataUpdate$AddViewVersion.applyTo(MetadataUpdate.java:508)
at org.apache.iceberg.rest.CatalogHandlers.lambda$commit$11(CatalogHandlers.java:624)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at org.apache.iceberg.rest.CatalogHandlers.lambda$commit$12(CatalogHandlers.java:624)
```
This is due to our internal state tracking of `lastAddedSchemaId`, which is then assumed to be set when adding the view version and checking
```
if (version.schemaId() == LAST_ADDED) {
ValidationException.check(lastAddedSchemaId != null, "Cannot set last added schema: no schema has been added");
version = ImmutableViewVersion.builder().from(version).schemaId(lastAddedSchemaId).build();
}
```
I added a reproducible test to `TestViewMetadata` where the schema ID is set to `-1`, indicating that the schema ID can be re-assigned.
718e098 to
38d3376
Compare
This fixes an issue that @haizhou-zhao brought up in #14334.
Basically the test added in #14334 performs a concurrent update of the same view version, but fails with
This is due to our internal state tracking of
lastAddedSchemaId, which is then assumed to be set when adding the view version and checkingI added a reproducible test to
TestViewMetadatawhere the schema ID is set to-1, indicating that the schema ID can be re-assigned.Once we get this change in, we should also get #14334 in, as that reproduces the issue and has a good test for it.
@huaxingao, @singhpk234, @amogh-jahagirdar since you guys reviewed #14434 already, could you please review this one as well?