Skip to content

Simplify ORC reader APIs#1629

Merged
dain merged 8 commits intotrinodb:masterfrom
dain:orc-read-api-cleanup
Oct 18, 2019
Merged

Simplify ORC reader APIs#1629
dain merged 8 commits intotrinodb:masterfrom
dain:orc-read-api-cleanup

Conversation

@dain
Copy link
Copy Markdown
Member

@dain dain commented Oct 1, 2019

  • Change OrcReader to produce Page
  • Add type safe wrappers for ORC identifiers
  • Rename ORC stream classes to column

@cla-bot cla-bot bot added the cla-signed label Oct 1, 2019
@dain dain force-pushed the orc-read-api-cleanup branch from a386bce to 922cca1 Compare October 1, 2019 06:09
@dain dain requested a review from electrum October 1, 2019 06:17
@dain dain force-pushed the orc-read-api-cleanup branch from 922cca1 to afa513b Compare October 10, 2019 06:19
Copy link
Copy Markdown
Member

@phd3 phd3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for simplifying this! :)

some nit comments, still reading a few classes.

@dain dain force-pushed the orc-read-api-cleanup branch from afa513b to 156dfab Compare October 13, 2019 03:02
Copy link
Copy Markdown
Member

@phd3 phd3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that OrcRecordReader#blockLoaded can incur double/multiple counting for nested block sizes. For a column C with type row(a, b, c), getLoadedBlock invocation for column C will result in ORCRecordReader#blockLoaded 4 times, right? getSizeInBytes for a C includes sizes of fieldBlocks as well. so the internal block sizes would be counted twice.

@dain dain closed this Oct 16, 2019
@dain dain deleted the orc-read-api-cleanup branch October 16, 2019 01:09
@dain dain restored the orc-read-api-cleanup branch October 16, 2019 02:53
@dain dain reopened this Oct 16, 2019
@dain dain force-pushed the orc-read-api-cleanup branch from 156dfab to 8e91dea Compare October 18, 2019 00:03
@dain dain merged commit 774766e into trinodb:master Oct 18, 2019
@dain dain deleted the orc-read-api-cleanup branch October 18, 2019 07:21
zhenxiao pushed a commit to prestodb/presto that referenced this pull request Jul 15, 2021
Cherry-pick of trinodb/trino#1067, trinodb/trino#2042, trinodb/trino#4055, trinodb/trino#1629, trinodb/trino#3483

Co-authored-by: Parth Brahmbhatt <pbrahmbhatt@netflix.com>
Co-authored-by: David Phillips <david@acz.org>
Co-authored-by: Xingyuan Lin <linxingyuan1102@gmail.com>
Co-authored-by: Dain Sundstrom <dain@iq80.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants