Add manifest listing files #21
Merged
This adds a separate file, a manifest list, to track the manifests for a snapshot. The manifest list is an Avro file with a row for each manifest. The columns in each row summarize the manifest, so readers can skip manifests without opening them to look for data files.
Columns include:
- manifest_path: path of the manifest file
- partition_spec_id: ID of the partition spec used to write the manifest (depends on #3, "Store multiple partition specs in table metadata")
- added_snapshot_id: snapshot ID when the manifest was added to the table
- added_data_files_count, existing_data_files_count, deleted_data_files_count to track operations
- partitions: a summary (min, max, and containsNull for each field) of the partitions in the manifest file

Manifest lists are written when the table property write.manifest-lists.enabled is set to true.
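As a rough illustration of how these columns can save reads, the sketch below models one manifest list row and filters manifests by a partition value using only the per-field summary. It is not the code in this PR: the class names, the `min_value`/`max_value`/`contains_null` attribute names, the integer partition values, and the `prune_manifests` helper are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartitionFieldSummary:
    # Hypothetical in-memory form of one entry in the "partitions" column.
    contains_null: bool
    min_value: Optional[int]
    max_value: Optional[int]


@dataclass
class ManifestListEntry:
    # One row of the manifest list, using the column names from the description.
    manifest_path: str
    partition_spec_id: int
    added_snapshot_id: int
    added_data_files_count: int
    existing_data_files_count: int
    deleted_data_files_count: int
    partitions: List[PartitionFieldSummary]


def may_contain(entry: ManifestListEntry, field_pos: int, value: Optional[int]) -> bool:
    """Return False only when the summary proves the value cannot be present."""
    summary = entry.partitions[field_pos]
    if value is None:
        return summary.contains_null
    if summary.min_value is None or summary.max_value is None:
        # Bounds are unknown in this sketch, so keep the manifest to be safe.
        return True
    return summary.min_value <= value <= summary.max_value


def prune_manifests(entries: List[ManifestListEntry], field_pos: int,
                    value: Optional[int]) -> List[str]:
    """Paths of the manifests that still need to be read for this value."""
    return [e.manifest_path for e in entries if may_contain(e, field_pos, value)]
```

With a filter on the first partition field, only manifests whose min/max range covers the value would be opened; the rest are skipped based on the manifest list alone.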
When a manifest list is written, the metadata file references it in place of a list of manifest locations: the snapshot object includes a "manifest-list" key instead of the "manifests" key.
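A reader that has to handle both forms of the snapshot object could branch on which key is present. The following is a minimal sketch under that assumption: `manifest_locations` and the `read_manifest_list` callback are hypothetical names, and the callback stands in for whatever actually reads the Avro file and returns its manifest_path values.

```python
from typing import Callable, Dict, List


def manifest_locations(snapshot: Dict,
                       read_manifest_list: Callable[[str], List[str]]) -> List[str]:
    """Resolve the manifest paths for one snapshot object from the metadata file."""
    if "manifest-list" in snapshot:
        # New form: the snapshot points at a single manifest list file.
        return read_manifest_list(snapshot["manifest-list"])
    # Old form: manifest locations are embedded directly in the snapshot.
    return list(snapshot["manifests"])
```

Passing the file reader in as a callback keeps the sketch independent of any particular Avro library.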