@rdblue (Contributor) commented May 8, 2020

This adds FileContent and ManifestContent to encode the content type of a DataFile or ManifestFile. Readers and writers are updated to handle the new metadata fields, and values read from v1 metadata default to DATA.

DataFile always uses FileContent.DATA. Although the manifest schema is the same for data and delete files, the API will use DataFile only for data files, and DeleteFile will be added to handle delete deltas.
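
To make the v1 default concrete, here is a minimal Java sketch of the two content enums, assuming the content type is stored as an integer id and that a field missing from v1 metadata reads as null. Only FileContent and its values (DATA, POSITION_DELETES, EQUALITY_DELETES) come from this PR; the ManifestContent values (DATA, DELETES), the numeric ids, the fromId helpers, and the DataFileLike interface are hypothetical illustrations, not Iceberg's API.

```java
// Hypothetical sketch, not Iceberg's implementation. The numeric ids, the
// ManifestContent values, the fromId helpers, and DataFileLike are assumptions.
public class ContentTypes {
  enum FileContent {
    DATA(0), POSITION_DELETES(1), EQUALITY_DELETES(2);

    private final int id;

    FileContent(int id) {
      this.id = id;
    }

    int id() {
      return id;
    }
  }

  enum ManifestContent {
    DATA(0), DELETES(1);

    private final int id;

    ManifestContent(int id) {
      this.id = id;
    }

    int id() {
      return id;
    }
  }

  // v1 metadata has no content field, so a missing (null) id is read as DATA.
  static FileContent fileContentFromId(Integer id) {
    if (id == null) {
      return FileContent.DATA;
    }
    for (FileContent content : FileContent.values()) {
      if (content.id() == id) {
        return content;
      }
    }
    throw new IllegalArgumentException("Unknown file content id: " + id);
  }

  // Same rule for manifests: a v1 manifest list entry is treated as DATA.
  static ManifestContent manifestContentFromId(Integer id) {
    if (id == null) {
      return ManifestContent.DATA;
    }
    for (ManifestContent content : ManifestContent.values()) {
      if (content.id() == id) {
        return content;
      }
    }
    throw new IllegalArgumentException("Unknown manifest content id: " + id);
  }

  // A data-file view in the API always reports DATA; per the PR, delete
  // deltas get their own DeleteFile type instead of reusing DataFile.
  interface DataFileLike {
    String path();

    default FileContent content() {
      return FileContent.DATA;
    }
  }
}
```

Defaulting on read (rather than rewriting old manifests) keeps v1 metadata unchanged while letting v2-aware readers treat every entry uniformly.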

This also adds documentation comments to the fields and introduces IndexedDataFile for writing v2 metadata; when writing v2, block size is no longer written and content is required.
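
The v2 write path can be pictured as a positional view over the file metadata that simply has no slot for block size and fails if content is missing. The Java sketch below is a hypothetical illustration under that assumption: V2FileWriteView, FileMetadata, and the three-field layout are invented for the example and do not reflect IndexedDataFile's actual fields or Iceberg's v2 schema.

```java
import java.util.Objects;

// Hypothetical sketch of a v2 write view. The class names, field positions,
// and field set are assumptions, not Iceberg's actual v2 manifest schema.
public class V2FileWriteView {
  enum FileContent { DATA, POSITION_DELETES, EQUALITY_DELETES }

  // Minimal stand-in for in-memory file metadata. blockSizeInBytes is kept
  // only to show that the v2 view never exposes it to the writer.
  static final class FileMetadata {
    final FileContent content;
    final String path;
    final long recordCount;
    final long blockSizeInBytes;

    FileMetadata(FileContent content, String path, long recordCount, long blockSizeInBytes) {
      this.content = content;
      this.path = path;
      this.recordCount = recordCount;
      this.blockSizeInBytes = blockSizeInBytes;
    }
  }

  // Fields written per file in this sketch: content, path, record count.
  static final int FIELD_COUNT = 3;

  private final FileMetadata wrapped;

  V2FileWriteView(FileMetadata wrapped) {
    this.wrapped = Objects.requireNonNull(wrapped, "file metadata");
  }

  // Positional accessor in the style of an indexed record consumed by a writer.
  Object get(int pos) {
    switch (pos) {
      case 0:
        // content is required in v2, so a missing value fails instead of defaulting
        return Objects.requireNonNull(wrapped.content, "content is required in v2 metadata");
      case 1:
        return wrapped.path;
      case 2:
        return wrapped.recordCount;
      default:
        throw new IndexOutOfBoundsException("Invalid field position: " + pos);
    }
  }
}
```

Keeping the v2 layout in a separate view leaves the v1 writer untouched, which matches the review note that v1 metadata stays unchanged.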

@rdblue requested a review from @aokolnychyi May 9, 2020 00:00
This also updates AllEntriesTable to return the correct sequence number.
@rdblue added this to the Row-level Delete milestone May 12, 2020
@aokolnychyi (Contributor) left a comment

LGTM. The v1 metadata seems to stay unchanged.

/**
* @return the content stored in the file; one of DATA, POSITION_DELETES, or EQUALITY_DELETES
*/
default FileContent content() {

Will DeleteFile extend this or will it be totally separate?


return CloseableIterable.transform(manifests, manifest -> new BaseFileScanTask(
DataFiles.fromManifest(manifest), schemaString, specString, ResidualEvaluator.unpartitioned(rowFilter)));
return CloseableIterable.transform(manifests, manifest -> new ManifestEntriesTable.ManifestReadTask(

Ugh, this probably wouldn't have worked correctly before if snapshot id inheritance was enabled.

return wrapped.snapshotId();
case 2:
DataFile file = wrapped.file();
if (file == null || file instanceof GenericDataFile) {

Is this change needed because we added FileContent in GenericDataFile?

@aokolnychyi merged commit 01d1462 into apache:master May 23, 2020
waterlx added a commit to waterlx/incubator-iceberg that referenced this pull request May 27, 2020
waterlx added a commit to waterlx/incubator-iceberg that referenced this pull request May 28, 2020
waterlx added a commit to waterlx/incubator-iceberg that referenced this pull request Jun 3, 2020
