Skip to content

Remove RichColumnDescriptor from parquet module#13757

Merged
sopel39 merged 1 commit intotrinodb:masterfrom
charlesjmorgan:cm/pq-prim-field
Aug 26, 2022
Merged

Remove RichColumnDescriptor from parquet module#13757
sopel39 merged 1 commit intotrinodb:masterfrom
charlesjmorgan:cm/pq-prim-field

Conversation

@charlesjmorgan
Copy link
Copy Markdown
Member

@charlesjmorgan charlesjmorgan commented Aug 19, 2022

Description

The RichColumnDescriptor class in the parquet module seems a bit redundant. This PR removes that class from the PrimitiveField and replaces the columns field in the ParquetReader with a list of PrimitiveField's

Is this change a fix, improvement, new feature, refactoring, or other?

refactor

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Changes in parquet library and the update usages in hive, and iceberg

How would you describe this change to a non-technical end user or system administrator?

Remove a redundant class

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section

@cla-bot cla-bot bot added the cla-signed label Aug 19, 2022
@charlesjmorgan charlesjmorgan force-pushed the cm/pq-prim-field branch 2 times, most recently from 1634195 to 048ac1a Compare August 22, 2022 22:11
Copy link
Copy Markdown
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't remove io.trino.parquet.RichColumnDescriptor class. Is it still used?

Could you improve commit description? Why we don't need io.trino.parquet.RichColumnDescriptor anymore

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh I thought I had removed it, it has been removed now. Commit message has been updated as well.

RichColumnDescriptor seems to be a redundant class that is a wrapper
on ColumnDescriptor. It has an extra field "required", but that
field already exists in the Field/PrimitiveField classes. A simple solution to this
is to remove RichColumnDescriptor and use the "required" variable that already
exists in Field/PrimitiveField.
@sopel39 sopel39 merged commit a8f75d3 into trinodb:master Aug 26, 2022
@github-actions github-actions bot added this to the 394 milestone Aug 26, 2022
@charlesjmorgan charlesjmorgan deleted the cm/pq-prim-field branch August 26, 2022 16:23
private int nextBatchSize = INITIAL_BATCH_SIZE;
private final PrimitiveColumnReader[] columnReaders;
private final long[] maxBytesPerCell;
private final Map<Integer, PrimitiveColumnReader> columnReaders;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this cause performance regression? Can it be primitive list?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants