-
Notifications
You must be signed in to change notification settings - Fork 3k
Add content types to DataFile and ManifestFile #1030
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This also updates AllEntriesTable to return the correct sequence number.
aokolnychyi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The v1 metadata seems to stay unchanged.
| /** | ||
| * @return the content stored in the file; one of DATA, POSITION_DELETES, or EQUALITY_DELETES | ||
| */ | ||
| default FileContent content() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will DeleteFile extend this or will it be totally separate?
|
|
||
| return CloseableIterable.transform(manifests, manifest -> new BaseFileScanTask( | ||
| DataFiles.fromManifest(manifest), schemaString, specString, ResidualEvaluator.unpartitioned(rowFilter))); | ||
| return CloseableIterable.transform(manifests, manifest -> new ManifestEntriesTable.ManifestReadTask( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, this one wouldn't probably work correctly before if snapshot id inheritance was enabled.
| return wrapped.snapshotId(); | ||
| case 2: | ||
| DataFile file = wrapped.file(); | ||
| if (file == null || file instanceof GenericDataFile) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change needed because we added FileContent in GenericDataFile?
…pdate GenericFlinkManifestFile due to PR apache#1030
…pdate GenericFlinkManifestFile due to PR apache#1030
…pdate GenericFlinkManifestFile due to PR apache#1030
This adds
FileContentandManifestContentto encode the content type in aDataFileorManifestFile. Readers and writers are updated to handle the new metadata fields and values from v1 metadata default toDATA.DataFilealways usesFileContent.DATA. Although the schema is the same in a manifest,DataFilewill be used in the API only for data files, andDeleteFilewill be added to handle delete deltas.This also adds documentation comments to fields, and introduces an
IndexedDataFilefor writing v2 metadata that no longer writes block size and makes content required.