Conversation

@raunaqmorarka (Member) commented on Jun 10, 2025

Description

Reading unusually large Parquet footers can lead to workers going into full GC and crashing while decoding the footer in `org.apache.parquet.format.Util#readFileMetaData`. This is usually caused by misconfigured Parquet writers producing too many row groups per file. This change adds a guard rail that fails reads of such files gracefully.
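For context, a Parquet file ends with a 4-byte little-endian metadata length followed by the `PAR1` magic, so a reader knows the footer size from these last 8 bytes before decoding anything. The sketch below shows where such a guard rail fits; it is a standalone illustration, not Trino's `MetadataReader` code, and it throws `IllegalArgumentException` where Trino throws `ParquetCorruptionException`, to stay self-contained:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class ParquetTailGuard
{
    static final int TAIL_LENGTH = 8; // 4-byte footer length + 4-byte "PAR1" magic

    // Returns the declared footer (metadata) size from the last 8 bytes of a
    // Parquet file, rejecting it if it exceeds the configured cap.
    static int footerSize(byte[] tail, long maxFooterReadSizeBytes)
    {
        ByteBuffer buffer = ByteBuffer.wrap(tail).order(ByteOrder.LITTLE_ENDIAN);
        int metadataLength = buffer.getInt(0);
        String magic = new String(tail, 4, 4, StandardCharsets.US_ASCII);
        if (!"PAR1".equals(magic)) {
            throw new IllegalArgumentException("Not a Parquet file tail: " + magic);
        }
        if (metadataLength > maxFooterReadSizeBytes) {
            // Fail fast here, before allocating a buffer or deserializing the
            // Thrift metadata, so a pathological footer cannot drive the
            // worker into full GC
            throw new IllegalArgumentException(
                    "Parquet footer size " + metadataLength + " exceeds limit " + maxFooterReadSizeBytes);
        }
        return metadataLength;
    }

    public static void main(String[] args)
    {
        byte[] tail = ByteBuffer.allocate(TAIL_LENGTH)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(4096) // declared footer length
                .put("PAR1".getBytes(StandardCharsets.US_ASCII))
                .array();
        System.out.println(footerSize(tail, 15L * 1024 * 1024)); // prints 4096
        try {
            footerSize(tail, 1024); // cap below the declared size
        }
        catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of checking the declared length against the cap before reading is that the crash happens during decoding, so the limit must be enforced on the declared size, not the decoded result.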

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Hive, Delta Lake, Iceberg, Hudi
* Prevent workers from going into full GC or crashing when decoding unusually large parquet footers. ({issue}`25973`)

@cla-bot cla-bot bot added the cla-signed label Jun 10, 2025
@github-actions github-actions bot added the docs, hudi (Hudi connector), iceberg (Iceberg connector), delta-lake (Delta Lake connector), hive (Hive connector), and redshift (Redshift connector) labels Jun 10, 2025
Copilot AI left a comment

Pull Request Overview

Adds a safeguard against excessively large Parquet footers by introducing a configurable max-footer-read-size limit.

  • Introduce maxFooterReadSize in ParquetReaderOptions and expose it via ParquetReaderConfig.
  • Implement a guard in MetadataReader to throw a ParquetCorruptionException when the footer exceeds the configured size.
  • Propagate the new option to all Parquet reader entry points (Iceberg, Hudi, Hive, Delta Lake) and update tests and documentation accordingly.
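The option plumbing in the bullets above can be pictured with a stripped-down options class. This is a simplified stand-in for `ParquetReaderOptions`, not its real shape: it uses plain bytes instead of Airlift's `DataSize`, and only the field name `maxFooterReadSize` and the 15 MB default come from the PR.

```java
// Simplified stand-in for ParquetReaderOptions: an immutable options object
// built through a builder, carrying the new footer-size cap.
public final class ReaderOptionsSketch
{
    private static final long DEFAULT_MAX_FOOTER_READ_SIZE = 15L * 1024 * 1024; // 15 MB default, per the PR

    private final long maxFooterReadSizeBytes;

    private ReaderOptionsSketch(long maxFooterReadSizeBytes)
    {
        this.maxFooterReadSizeBytes = maxFooterReadSizeBytes;
    }

    public long getMaxFooterReadSize()
    {
        return maxFooterReadSizeBytes;
    }

    public static Builder builder()
    {
        return new Builder();
    }

    public static final class Builder
    {
        // field named maxFooterReadSize, consistent with the getter
        private long maxFooterReadSize = DEFAULT_MAX_FOOTER_READ_SIZE;

        public Builder withMaxFooterReadSize(long bytes)
        {
            this.maxFooterReadSize = bytes;
            return this;
        }

        public ReaderOptionsSketch build()
        {
            return new ReaderOptionsSketch(maxFooterReadSize);
        }
    }
}
```

Each page source provider would then read the cap from the options object it already receives and hand it to the footer reader, which is why the change touches all four connectors but adds no new wiring.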

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated no comments.

File — Description
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetReaderConfig.java — Add setter/getter for `parquet.max-footer-read-size` and remove legacy mapping
lib/trino-parquet/src/main/java/io/trino/parquet/reader/MetadataReader.java — Add overloads and a guard in `readFooter` to enforce the footer size limit
lib/trino-parquet/src/main/java/io/trino/parquet/ParquetReaderOptions.java — Extend options and builder with `maxFooterReadSize`, defaulting to 15 MB
plugin/**/IcebergPageSourceProvider.java, HudiPageSourceProvider.java, DeltaLakePageSourceProvider.java, ParquetPageSourceFactory.java — Pass `options.getMaxFooterReadSize()` into `MetadataReader.readFooter`
docs/src/main/sphinx/object-storage/file-formats.md — Document the new `parquet.max-footer-read-size` configuration property
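As a configuration property, the cap would be set per catalog. A hypothetical `hive.properties` entry, raising the limit above the 15 MB default (the property name comes from the PR; the value is only an example):

```properties
parquet.max-footer-read-size=30MB
```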
Comments suppressed due to low confidence (3)

lib/trino-parquet/src/main/java/io/trino/parquet/reader/MetadataReader.java:96

  • There’s no test covering the new exception path when a footer exceeds the configured limit. Consider adding a unit test that simulates a large footer and asserts that ParquetCorruptionException is thrown.
if (maxFooterReadSize.isPresent() && completeFooterSize > maxFooterReadSize.get().toBytes()) {

lib/trino-parquet/src/main/java/io/trino/parquet/ParquetReaderOptions.java:157

  • [nitpick] The builder field maxFooterSize is inconsistent with the rest of the API which uses maxFooterReadSize. Rename it to maxFooterReadSize for clarity and consistency.
private DataSize maxFooterSize;

docs/src/main/sphinx/object-storage/file-formats.md:104

  • [nitpick] The list item formatting in the Sphinx document is incorrect. Change * - to - (or align with the surrounding list style) so the new property renders properly.
* - `parquet.max-footer-read-size`

@Praveen2112 (Member) left a comment

LGTM - Two minor questions

Member

Do we need something for ORC footer as well ?

@raunaqmorarka (Member, Author)

I haven't seen this situation with ORC yet; even with Parquet you need a really bad configuration to reach it.

@raunaqmorarka raunaqmorarka merged commit 33f5659 into master Jun 11, 2025
70 of 73 checks passed
@raunaqmorarka raunaqmorarka deleted the raunaq/parq-footer branch June 11, 2025 07:56
@github-actions github-actions bot added this to the 477 milestone Jun 11, 2025

Labels

cla-signed, delta-lake (Delta Lake connector), docs, hive (Hive connector), hudi (Hudi connector), iceberg (Iceberg connector), redshift (Redshift connector)

3 participants