Member

@pitrou pitrou commented Oct 7, 2025

Rationale for this change

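Fix issues found by OSS-Fuzz when invalid Parquet data is fed to the Parquet reader:
* https://issues.oss-fuzz.com/issues/447262173
* https://issues.oss-fuzz.com/issues/447480433
* https://issues.oss-fuzz.com/issues/447490896
* https://issues.oss-fuzz.com/issues/447693724
* https://issues.oss-fuzz.com/issues/447693728
* https://issues.oss-fuzz.com/issues/449498800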

Are these changes tested?

Yes, using the updated fuzz regression files from apache/arrow-testing#115

Are there any user-facing changes?

No.

This PR contains a "Critical Fix". (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)

@pitrou pitrou requested a review from wgtmac as a code owner October 7, 2025 13:57
Member Author

pitrou commented Oct 7, 2025

@github-actions crossbow submit -g cpp

Member Author

pitrou commented Oct 7, 2025

@AntoinePrv Would you like to take a look?

@github-actions github-actions bot added the "awaiting review" label Oct 7, 2025
@pitrou pitrou requested review from adamreeve and mapleFU October 7, 2025 13:58
// There may be remaining null if they are not greedily filled by either decoder calls
check_and_handle_fully_null_remaining();

ARROW_DCHECK(batch.is_done() || exhausted());
Member Author

This check could trigger if the RLE-bit-packed data is invalid (for example a run of invalid size). @AntoinePrv
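
To illustrate the point above, here is a minimal sketch, not the actual Arrow decoder: a debug assertion is only appropriate for invariants the code itself guarantees, while conditions that depend on file contents need a runtime check that returns an error. `Run` and `ExpandRuns` are hypothetical names invented for this example, and the run layout is heavily simplified compared to the real RLE-bit-packed encoding.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified run: `length` copies of `value`.
// Both fields come from the (untrusted) file.
struct Run {
  int32_t length;
  int32_t value;
};

// Expands `runs` into exactly `num_values` values.
// Returns false on malformed input instead of asserting, because a
// debug-only check disappears in release builds.
bool ExpandRuns(const std::vector<Run>& runs, int32_t num_values,
                std::vector<int32_t>* out) {
  assert(out != nullptr);  // API contract, not data-dependent
  if (num_values < 0) return false;
  out->clear();
  for (const Run& run : runs) {
    const int32_t remaining = num_values - static_cast<int32_t>(out->size());
    if (remaining == 0) break;
    if (run.length <= 0) return false;  // run of invalid size
    const int32_t n = std::min(run.length, remaining);
    assert(n > 0);  // internal invariant, guaranteed by the checks above
    out->insert(out->end(), static_cast<size_t>(n), run.value);
  }
  // The data may be exhausted before the batch is filled: report it.
  return static_cast<int32_t>(out->size()) == num_values;
}
```

With this shape, a truncated or corrupt stream makes the caller see `false` rather than hitting an assertion in debug builds or undefined behavior in release builds.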

@pitrou pitrou changed the title from "GH-47740: [C++][Parquet] Fix dangerous behavior when reading invalid Parquet data" to "GH-47740: [C++][Parquet] Fix undefined behavior when reading invalid Parquet data" Oct 7, 2025

github-actions bot commented Oct 7, 2025

Revision: d620685

Submitted crossbow builds: ursacomputing/crossbow @ actions-0059c16459

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-42-cpp GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@github-actions github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label Oct 7, 2025
Member Author

pitrou commented Oct 7, 2025

Valgrind failure is unrelated and will be fixed by #47743

Contributor

@WillAyd WillAyd left a comment

lgtm - nice work!

@pitrou pitrou merged commit 33d1f32 into apache:main Oct 8, 2025
45 checks passed
@pitrou pitrou removed the "awaiting committer review" label Oct 8, 2025
@pitrou pitrou deleted the gh47740-parquet-fuzz branch October 8, 2025 06:43
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 2 benchmarking runs that have been run so far on merge-commit 33d1f32.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 7 possible false positives for unstable benchmarks that are known to sometimes produce them.

raulcd pushed a commit that referenced this pull request Oct 8, 2025
…Parquet data (#47741)

### Rationale for this change

Fix issues found by OSS-Fuzz when invalid Parquet data is fed to the Parquet reader:
* https://issues.oss-fuzz.com/issues/447262173
* https://issues.oss-fuzz.com/issues/447480433
* https://issues.oss-fuzz.com/issues/447490896
* https://issues.oss-fuzz.com/issues/447693724
* https://issues.oss-fuzz.com/issues/447693728
* https://issues.oss-fuzz.com/issues/449498800

### Are these changes tested?

Yes, using the updated fuzz regression files from apache/arrow-testing#115

### Are there any user-facing changes?

No.

**This PR contains a "Critical Fix".** (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)

* GitHub Issue: #47740

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
typename EncodingTraits<DType>::Accumulator* out,
int* out_num_values) {
- std::vector<ByteArray> values(num_values);
+ std::vector<ByteArray> values(num_values - null_count);
Member

Aha...

Member Author

Well, this was not a problem in itself, it was just allocating too much memory :)
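
As a rough illustration of the sizing discussed here (a sketch under assumptions, not the code from this PR): only non-null slots carry an encoded value, so the scratch buffer needs `num_values - null_count` entries rather than `num_values`. `ByteSpan` and `ResizeScratch` are hypothetical names, and the consistency check on the two counts is an addition for the example, not something the diff above shows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a decoded byte-array value (pointer + length).
struct ByteSpan {
  const unsigned char* ptr = nullptr;
  size_t len = 0;
};

// Sizes `scratch` for the values that are physically encoded.
// Null slots have no encoded value, so they need no scratch entry.
// Returns false, leaving `scratch` untouched, if the counts are inconsistent.
bool ResizeScratch(int num_values, int null_count,
                   std::vector<ByteSpan>* scratch) {
  assert(scratch != nullptr);  // API contract
  if (num_values < 0 || null_count < 0 || null_count > num_values) {
    return false;
  }
  scratch->resize(static_cast<size_t>(num_values - null_count));
  return true;
}
```

For a batch of 10 slots with 4 nulls this allocates 6 entries; sizing it to 10 would still work, it would just waste memory, which matches the remark above.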

@github-actions github-actions bot added the "awaiting committer review" label Oct 10, 2025
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Oct 15, 2025
…valid Parquet data (apache#47741)
