@ajantha-bhat ajantha-bhat commented Mar 1, 2023

Changes:

  • Schema changes:
    • Move spec_id ahead of the counters without changing its field ID, because spec_id would look odd sitting between counters once new ones are added.
    • Add column docs for record_count and file_count to clarify that they cover data files only.
  • Rename the internal fields recordCount to dataRecordCount and fileCount to dataFileCount for clarity.

This PR is a prerequisite for #6661
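
As a rough illustration of the rename, here is a minimal sketch of a per-partition accumulator whose counters are explicitly data-only. The class `PartitionCounters` and the helpers `addDataFile` and `summarize` are hypothetical names made up for this example, not the actual code in PartitionsTable; only the field names dataRecordCount and dataFileCount come from this PR.

```java
public class PartitionCountersSketch {
  // Hypothetical per-partition accumulator (not the real Iceberg class).
  // The two field names follow the rename proposed in this PR.
  static final class PartitionCounters {
    long dataRecordCount; // records in data files only
    int dataFileCount;    // number of data files only

    // Fold one data file's record count into the partition totals.
    void addDataFile(long recordsInFile) {
      dataRecordCount += recordsInFile;
      dataFileCount += 1;
    }
  }

  // Hypothetical helper: build the counters from per-file record counts.
  static PartitionCounters summarize(long... recordsPerDataFile) {
    PartitionCounters counters = new PartitionCounters();
    for (long records : recordsPerDataFile) {
      counters.addDataFile(records);
    }
    return counters;
  }

  public static void main(String[] args) {
    PartitionCounters counters = summarize(100L, 250L, 50L);
    // prints "400 records in 3 data files"
    System.out.println(
        counters.dataRecordCount + " records in " + counters.dataFileCount + " data files");
  }
}
```

With the "data" prefix in place, a counter for another kind of file could later sit next to these fields without the unprefixed names becoming ambiguous.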

@jackye1995 jackye1995 left a comment

Apart from the column order, I am fine with the refactoring, given what we want to add next. I wonder what others think.

-Types.NestedField.required(2, "record_count", Types.LongType.get()),
-Types.NestedField.required(3, "file_count", Types.IntegerType.get()),
-Types.NestedField.required(4, "spec_id", Types.IntegerType.get()));
+Types.NestedField.required(4, "spec_id", Types.IntegerType.get()),

I am not sure about the column order. It feels odd to me to have the IDs in 1, 4, 2, 3 order, but that's just personal preference.

@ajantha-bhat ajantha-bhat Mar 2, 2023

We discussed this here: #6661 (comment)

I think @RussellSpitzer and @szehon-ho agree that column reordering has no impact.
Having spec_id in the middle looks odd once we add other counters, and I don't want to change the field ID and break compatibility.
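
The compatibility argument rests on columns being identified by field ID rather than by position. A minimal sketch of that idea in plain Java follows. The `Field` class and the `findById` helper are hypothetical stand-ins written for this example, not the Iceberg `Types.NestedField` API, and the name and type shown for field 1 are assumptions.

```java
import java.util.List;
import java.util.Optional;

public class FieldOrderSketch {
  // Hypothetical stand-in for a schema column; not Iceberg's Types.NestedField.
  static final class Field {
    final int id;
    final String name;
    final String type;

    Field(int id, String name, String type) {
      this.id = id;
      this.name = name;
      this.type = type;
    }
  }

  // Resolve a column by its field ID, ignoring its position in the list.
  static Optional<Field> findById(List<Field> schema, int id) {
    return schema.stream().filter(f -> f.id == id).findFirst();
  }

  // Column order before the change: spec_id comes last.
  static List<Field> oldOrder() {
    return List.of(
        new Field(1, "partition", "struct"), // name and type of field 1 are assumed
        new Field(2, "record_count", "long"),
        new Field(3, "file_count", "int"),
        new Field(4, "spec_id", "int"));
  }

  // Column order after the change: spec_id moves ahead of the counters, IDs untouched.
  static List<Field> newOrder() {
    return List.of(
        new Field(1, "partition", "struct"),
        new Field(4, "spec_id", "int"),
        new Field(2, "record_count", "long"),
        new Field(3, "file_count", "int"));
  }

  public static void main(String[] args) {
    for (int id = 1; id <= 4; id++) {
      Field before = findById(oldOrder(), id).orElseThrow(IllegalStateException::new);
      Field after = findById(newOrder(), id).orElseThrow(IllegalStateException::new);
      // Each ID resolves to the same name and type in both orderings.
      System.out.println(id + ": " + before.name + " == " + after.name);
    }
  }
}
```

In this sketch, changing a field ID would break the lookup for existing consumers, while moving a column only changes its position.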

Types.NestedField.required(
2, "record_count", Types.LongType.get(), "count of records in data files"),
Types.NestedField.required(
3, "file_count", Types.IntegerType.get(), "count of data files"));

The docs should use sentence case.

Table partitionsTable = new PartitionsTable(table);
Types.StructType expected =
-new Schema(required(3, "file_count", Types.IntegerType.get())).asStruct();
+new Schema(required(3, "file_count", Types.IntegerType.get(), "count of data files"))

Docs should use sentence case here as well.

scan.schema(),
partitions,
-root -> StaticDataTask.Row.of(root.recordCount, root.fileCount));
+root -> StaticDataTask.Row.of(root.dataRecordCount, root.dataFileCount));

I don't see much value in these renames, but it's okay with me.

@ajantha-bhat

@rdblue: Thanks for the review. I have fixed the case now.

@ajantha-bhat

Hi, can this PR be merged if there are no more comments and no new reviewers?

@jackye1995

Sorry, I overlooked this PR. It looks like all comments are addressed, so I will go ahead and merge it.

@jackye1995 jackye1995 merged commit a31f941 into apache:master Mar 21, 2023