Core: Schema for a branch should return table schema #9131
.containsExactly(
    new GenericRowWithSchema(new Object[] {1}, null),
    new GenericRowWithSchema(new Object[] {2}, null),
    new GenericRowWithSchema(new Object[] {3}, null));
Can you use SimpleRecord like the rest of the tests do instead?
Unfortunately that doesn't work, because SimpleRecord expects the data field to be populated. The specific error is: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name data cannot be resolved. Did you mean one of the following? [id].
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestSnapshotSelection.java
  Preconditions.checkArgument(
-     sparkTable.snapshotId() == null,
+     sparkTable.snapshotId() == null && sparkTable.branch() == null,
I'm not sure whether we want to fix this in this PR or in a separate one, but in the Iceberg sync we briefly discussed that SELECT * FROM ns.table.branch_x VERSION AS OF ... shouldn't be supported and should throw an error, which is what this check does.
Probably a separate PR.
I've applied this and moved this to #9219
}

public String branch() {
  return branch;
This wasn't introduced by this commit, but branch should be final, right?
It is effectively final, as it's only set once. However, it isn't marked final because of the way the different constructors in SparkTable are called.
.containsExactly(
    new SimpleRecord(1, null), new SimpleRecord(2, null), new SimpleRecord(3, null));

// writing new records into the branch should work with the re-introduced column
I don't think this is an appropriate place for the write test. It should be a new test case because this case tests the schema that is used when reading.
In addition, the test case should test writing when the current snapshot for a branch has a different schema than the table schema. With the column added back, the schemas are the same.
I've moved this to a separate test and also used a different schema
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java
When retrieving the schema for a branch, we should always return the table schema instead of the snapshot schema, because the table schema is the schema that will be used when the branch is created. We should return the snapshot's schema only for a tag.
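The rule above can be illustrated with a minimal, self-contained sketch. It is not the Iceberg implementation: `TableMeta`, `Ref`, `RefType`, and `schemaFor` are hypothetical stand-ins, and a schema is modeled as a plain list of column names. The scenario assumes a table written with columns `id` and `data`, a branch and a tag both created at that snapshot, and the `data` column dropped afterwards. It needs Java 16 or later because it uses records.

```java
import java.util.List;
import java.util.Map;

public class BranchSchemaSketch {
  // Hypothetical ref kinds; only the branch/tag distinction matters here.
  enum RefType { BRANCH, TAG }

  // Hypothetical named reference pointing at a snapshot.
  record Ref(RefType type, long snapshotId) {}

  // Hypothetical, simplified table metadata: a schema is just a list of column names.
  record TableMeta(
      List<String> currentSchema,
      Map<Long, List<String>> schemaBySnapshotId,
      Map<String, Ref> refs) {}

  // Branch -> current table schema; tag -> schema of the snapshot it points at.
  static List<String> schemaFor(TableMeta table, String refName) {
    Ref ref = table.refs().get(refName);
    if (ref == null) {
      throw new IllegalArgumentException("Unknown ref: " + refName);
    }
    if (ref.type() == RefType.BRANCH) {
      return table.currentSchema();
    }
    return table.schemaBySnapshotId().get(ref.snapshotId());
  }

  // Snapshot 1 was written with (id, data); the data column was dropped afterwards.
  static TableMeta example() {
    return new TableMeta(
        List.of("id"),
        Map.of(1L, List.of("id", "data")),
        Map.of(
            "branch_x", new Ref(RefType.BRANCH, 1L),
            "tag_x", new Ref(RefType.TAG, 1L)));
  }

  public static void main(String[] args) {
    TableMeta table = example();
    System.out.println(schemaFor(table, "branch_x")); // [id]
    System.out.println(schemaFor(table, "tag_x"));    // [id, data]
  }
}
```

In this sketch the branch resolves to the current table schema even though its head snapshot was written with the older one, while the tag keeps the schema of the snapshot it references.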
Below is an example that shows the weird schema behavior when describing a table.