Skip to content

Conversation

@rdblue
Copy link
Contributor

@rdblue rdblue commented Apr 17, 2020

This updates all of the tests that extend TableTestBase to run with a formatVersion parameter. This checks that operations work in both v1 and v2. This doesn't add any tests that run operations in v1 and upgrade the table to v2. Those will be added in another PR.

@rdblue rdblue marked this pull request as draft April 17, 2020 21:41
manifestEntry(Status.EXISTING, null, newFile(10)),
manifestEntry(Status.EXISTING, null, newFile(1)),
manifestEntry(Status.DELETED, null, newFile(5)),
manifestEntry(Status.DELETED, null, newFile(2)));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needed to change because it isn't possible to write entries to a manifest that inherits a sequence number but not snapshot ID, unless the snapshot ID matches the commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever we are rewriting manifests in the rewrite manifests action, we create a manifest with snapshot id = null but each entry is assigned a valid snapshot id. Will that continue to work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The problem here was that entries had a null snapshot ID, but they shouldn't for rewrites.

private ManifestEntry wrapped = null;

IndexedManifestEntry(long commitSnapshotId, Types.StructType partitionType) {
IndexedManifestEntry(Long commitSnapshotId, Types.StructType partitionType) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied manifests will use a null snapshot ID.

@rdblue rdblue changed the title Update TableTestBase tests to run with formats v1 and v2 (WIP) Update TableTestBase tests to run with formats v1 and v2 Apr 20, 2020
@rdblue rdblue marked this pull request as ready for review April 20, 2020 21:23
@aokolnychyi aokolnychyi mentioned this pull request Apr 20, 2020
@aokolnychyi
Copy link
Contributor

Let me go through this now.

public static ManifestWriter write(PartitionSpec spec, OutputFile outputFile) {
// always use a v1 writer for appended manifests because sequence number must be inherited
return write(1, spec, outputFile, null);
// always use a v2 writer to preserve sequence numbers, but use null for sequence number so appends inherit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean manifests will be written with the v2 schema (i.e. with sequence numbers) even though TableMetadata is v1 and the manifest list is written with v1? And this should work because we do a projection on read and sequence number is optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the constructor used to create an append or rewritten manifest, so this makes the manifests that are passed into FastAppend, MergeAppend, and RewriteManifests support sequence numbers. It is needed for the last case: when rewriting a manifest for a v2 table, we need to preserve sequence numbers.

This works for v1 because v1 tables will ignore the new fields.

@aokolnychyi
Copy link
Contributor

+1, ran the tests locally.

@aokolnychyi aokolnychyi merged commit a81d11e into apache:master Apr 21, 2020
@rdblue rdblue added this to the Row-level Delete milestone May 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants