Fix parquet data page offset calculation in parquet writer #10722

Merged
martint merged 1 commit into trinodb:master from sauliusvl:fix-parquet-data-page-offset Jan 24, 2022


@sauliusvl
Contributor

The ParquetWriter::flush method obtains the data page offset, which is a long, by calling OutputStreamSliceOutput::size(), which returns an int. As a result, flushing fails with an integer overflow exception when writing files larger than ~2 GB.
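To illustrate the failure mode, here is a minimal Java sketch. `CountingOutput`, its `size()` and `longSize()` methods, and `DataPageOffsetDemo` are hypothetical names, not the actual airlift or Trino classes. The sketch assumes that the int-returning accessor narrows an internal long byte count with `Math.toIntExact`, which throws an `ArithmeticException` once the count exceeds `Integer.MAX_VALUE` (2,147,483,647 bytes, just under 2 GiB). It also assumes that the fix amounts to reading the offset as a long end to end.

```java
// Hypothetical stand-in for a byte-counting output; NOT the airlift/Trino API.
final class CountingOutput {
    private long bytesWritten;

    // Simulate writing n bytes without allocating any memory.
    void skip(long n) {
        bytesWritten += n;
    }

    // Assumed narrowing accessor: Math.toIntExact throws ArithmeticException
    // once the byte count no longer fits in an int (> 2,147,483,647).
    int size() {
        return Math.toIntExact(bytesWritten);
    }

    // Non-narrowing accessor: the count stays a long.
    long longSize() {
        return bytesWritten;
    }
}

public class DataPageOffsetDemo {
    // Buggy pattern: the long offset is routed through the int accessor.
    static long dataPageOffsetBuggy(CountingOutput out) {
        return out.size();
    }

    // Fixed pattern: the offset is read as a long and never narrowed.
    static long dataPageOffsetFixed(CountingOutput out) {
        return out.longSize();
    }

    public static void main(String[] args) {
        CountingOutput out = new CountingOutput();
        out.skip(3L * 1024 * 1024 * 1024); // pretend 3 GiB were already written

        System.out.println("fixed offset: " + dataPageOffsetFixed(out));
        try {
            dataPageOffsetBuggy(out);
        } catch (ArithmeticException e) {
            System.out.println("buggy offset failed: " + e.getMessage());
        }
    }
}
```

Running `main` should print `fixed offset: 3221225472` followed by a line reporting the `ArithmeticException`. Below ~2 GiB both accessors return the same value, which is why the problem only surfaces when writing large files.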

sauliusvl changed the title from "Fix parquet data page offset type in parquet writer" to "Fix parquet data page offset calculation in parquet writer" Jan 21, 2022
@findepi
Member

findepi commented Jan 21, 2022

@cla-bot check

@cla-bot

cla-bot bot commented Jan 21, 2022

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.

@cla-bot

cla-bot bot commented Jan 21, 2022

The cla-bot has been summoned, and re-checked this pull request!

@martint
Member

martint commented Jan 24, 2022

@cla-bot check

cla-bot bot added the cla-signed label Jan 24, 2022
@cla-bot

cla-bot bot commented Jan 24, 2022

The cla-bot has been summoned, and re-checked this pull request!

martint merged commit 828f4da into trinodb:master Jan 24, 2022
github-actions bot added this to the 369 milestone Jan 24, 2022
mosabua mentioned this pull request Jan 24, 2022
v-jizhang added a commit to v-jizhang/presto that referenced this pull request Feb 14, 2022
Cherry-pick of trinodb/trino#10722

The ParquetWriter::flush method obtains the data page offset, which is a
long, by calling OutputStreamSliceOutput::size(), which returns an int.
As a result, flushing fails with an integer overflow exception when
writing files larger than ~2 GB.

Co-authored-by: Saulius Valatka <saulius.vl@gmail.com>
pettyjamesm pushed a commit to prestodb/presto that referenced this pull request Feb 18, 2022
sauliusvl deleted the fix-parquet-data-page-offset branch June 12, 2022 19:20