[presto-orc] Fix row index position when compression is disabled#23760

Merged
NikhilCollooru merged 1 commit into prestodb:master from sdruzkin:export-D63733934
Oct 2, 2024

@sdruzkin sdruzkin commented Oct 2, 2024

Description

Fix a bug in position accounting in OrcOutputBuffer when doing batched writeBytes.

The bug produces an incorrect row index when all of the following hold: the file is
not compressed, one or more writeBytes calls receive an input larger than the
compression buffer, and the stripe has two or more row groups.

An incorrect row index results in incorrect reads when using the presto-orc
reader.
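
To illustrate the class of bug described above, here is a minimal, hypothetical sketch. It is not the actual OrcOutputBuffer code: the class `SketchBuffer`, its fields, and the `countDirectWrites` switch are assumptions made up for this example. It models an uncompressed buffer whose large writes bypass the internal buffer, and shows how a checkpoint (the stream offset a row index would record) drifts if those bypassed bytes are not counted.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical, simplified model of an uncompressed output buffer.
// This is NOT the presto-orc OrcOutputBuffer; all names are illustrative.
class PositionAccountingSketch {
    static final class SketchBuffer {
        private final byte[] buffer;
        private final boolean countDirectWrites; // false reproduces the accounting gap
        private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        private int bufferPosition; // bytes currently held in the buffer
        private long flushedBytes;  // bytes the buffer believes it has sent to the sink

        SketchBuffer(int bufferSize, boolean countDirectWrites) {
            this.buffer = new byte[bufferSize];
            this.countDirectWrites = countDirectWrites;
        }

        void writeBytes(byte[] data) {
            if (data.length > buffer.length) {
                // Input larger than the buffer: flush pending bytes, then write directly.
                flush();
                sink.write(data, 0, data.length);
                if (countDirectWrites) {
                    flushedBytes += data.length; // the accounting step that must not be skipped
                }
                return;
            }
            if (bufferPosition + data.length > buffer.length) {
                flush();
            }
            System.arraycopy(data, 0, buffer, bufferPosition, data.length);
            bufferPosition += data.length;
        }

        private void flush() {
            sink.write(buffer, 0, bufferPosition);
            flushedBytes += bufferPosition;
            bufferPosition = 0;
        }

        // Offset that a row index entry would record for the next row group.
        long checkpoint() {
            return flushedBytes + bufferPosition;
        }

        // Number of bytes actually written so far (sink plus pending buffer).
        long actualSize() {
            return sink.size() + bufferPosition;
        }
    }

    // Writes a small value and then a value larger than the buffer,
    // and returns the checkpoint recorded at the end of the first row group.
    static long checkpointAfterFirstRowGroup(boolean countDirectWrites) {
        SketchBuffer out = new SketchBuffer(8, countDirectWrites);
        out.writeBytes(new byte[3]);
        out.writeBytes(new byte[20]); // larger than the 8-byte buffer
        return out.checkpoint();
    }

    public static void main(String[] args) {
        System.out.println("without accounting: " + checkpointAfterFirstRowGroup(false));
        System.out.println("with accounting:    " + checkpointAfterFirstRowGroup(true));
    }
}
```

In this sketch the first row group really ends at offset 23, but the variant that skips the accounting step reports 3. A reader seeking to the second row group with that offset would start decoding in the middle of the first row group's data, which matches the kind of incorrect read the description mentions. With only one row group per stripe no such checkpoint is consulted, which is consistent with the "two or more row groups" condition.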

Motivation and Context

Impact

Fix data corruption when an ORC/DWRF file is not compressed and a value in a string or binary
column is larger than the compression buffer size.

Test Plan

Updated unit tests.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

ORC/DWRF Changes
* Fix data corruption in uncompressed ORC/DWRF files with large values in string/binary columns :pr:`23760`

Summary:
Fix a bug in position accounting in the OrcOutputBuffer when doing
batched writeBytes.

Differential Revision: D63733934
@sdruzkin sdruzkin requested a review from a team as a code owner October 2, 2024 00:27
@sdruzkin sdruzkin requested a review from presto-oss October 2, 2024 00:27
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D63733934

@sdruzkin sdruzkin requested review from NikhilCollooru and ajaygeorge and removed request for ajaygeorge October 2, 2024 00:35
@NikhilCollooru NikhilCollooru merged commit 699a808 into prestodb:master Oct 2, 2024
@jaystarshot jaystarshot mentioned this pull request Nov 1, 2024
25 tasks
@tdcmeehan tdcmeehan added the from:Meta label Dec 13, 2024