Skip to content

Conversation

@rdblue
Copy link
Contributor

@rdblue rdblue commented Aug 10, 2020

Utilities that compare partitions need to also track the partition spec that a partition tuple belongs to because the same set of partition values can be valid for multiple specs, but identify different partitions. Many classes track the partitions of data and delete files, and the easiest way to update those utilities is to pass the spec ID along with the DataFile instance. Otherwise, getting the correct spec ID would require updating several public APIs to add a spec ID argument.

This PR adds spec ID to DataFile and DeleteFile, and adds it to metadata that is inherited from ManifestFile, where the spec ID of a manifest is tracked.

This also cleans up unnecessary factory methods in DataFile that were used only in tests and were missing spec ID. Now, all data file creation uses the builder.

@rdblue rdblue mentioned this pull request Aug 10, 2020
@rdblue rdblue force-pushed the add-data-file-spec-id branch from 5b1c45b to da78f03 Compare August 10, 2020 23:37
@rdblue rdblue force-pushed the add-data-file-spec-id branch from da78f03 to 56742bf Compare August 10, 2020 23:37
Copy link
Contributor

@rdsr rdsr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Though I do see a couple core tests failing on Travis

@shardulm94
Copy link
Contributor

#1307 (comment) mentions that a workaround in StructLikeWrapper will be removed after adding specId to DataFile. Is that supposed to happen as part of this PR?

@rdblue
Copy link
Contributor Author

rdblue commented Aug 11, 2020

@shardulm94, that is done in #1308. That PR includes the addition of specId to DataFile, but I think it is cleaner to put that change in a separate PR, which is why I opened this one.

@rdblue rdblue merged commit 700fc9e into apache:master Aug 11, 2020
@rdblue
Copy link
Contributor Author

rdblue commented Aug 11, 2020

Thanks for reviewing @rdsr and @shardulm94!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants