Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MINOR]: Fail fast when footer size larger than Int.MaxValue #3136

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ConeyLiu
Copy link
Contributor

Rationale for this change

The footer size is limited to unsigned int for some other language implementations, such as Rust. However, it is limited to Int.MaxValue on the Java side due to some limitations(eg, lack of unsigned int supports, the biggest byte array/buffer is limited to Int.MaxValue). So this PR reads the footer size as long and fails fast if it is larger than Int.MaxValue.

What changes are included in this PR?

Are these changes tested?

Existed UTs.

Are there any user-facing changes?

No

@ConeyLiu
Copy link
Contributor Author

@dylanburati I have tried to support the footer with an unsigned int size in Java implementation. However, due to some limitations, it can not be done. We could write a footer(without encrypted) size larger than Int.MaxValue bytes, while we can not read it due to some limitations in thrift.

java.io.IOException: can not read class org.apache.parquet.format.FileMetaData: MaxMessageSize reached

	at org.apache.parquet.format.Util.read(Util.java:393)
	at org.apache.parquet.format.Util.readFileMetaData(Util.java:156)
	at org.apache.parquet.format.converter.ParquetMetadataConverter$3.visit(ParquetMetadataConverter.java:1545)
	at org.apache.parquet.format.converter.ParquetMetadataConverter$3.visit(ParquetMetadataConverter.java:1542)
	at org.apache.parquet.format.converter.ParquetMetadataConverter$NoFilter.accept(ParquetMetadataConverter.java:1242)
	at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1542)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:637)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:571)
Caused by: org.apache.thrift.transport.TTransportException: MaxMessageSize reached
	at org.apache.thrift.transport.TEndpointTransport.checkReadBytesAvailable(TEndpointTransport.java:94)
	at org.apache.thrift.protocol.TCompactProtocol.checkStringReadLength(TCompactProtocol.java:726)
	at org.apache.thrift.protocol.TCompactProtocol.readString(TCompactProtocol.java:675)
	at org.apache.parquet.format.InterningProtocol.readString(InterningProtocol.java:269)
	at org.apache.parquet.format.KeyValue$KeyValueStandardScheme.read(KeyValue.java:415)
	at org.apache.parquet.format.KeyValue$KeyValueStandardScheme.read(KeyValue.java:392)
	at org.apache.parquet.format.KeyValue.read(KeyValue.java:327)
	at org.apache.parquet.format.FileMetaData$FileMetaDataStandardScheme.read(FileMetaData.java:1338)

Since it is a rare condition. Here we just fail fast with an exception.

cc @wgtmac

@@ -589,7 +589,12 @@ private static final ParquetMetadata readFooter(
long fileMetadataLengthIndex = fileLen - magic.length - FOOTER_LENGTH_SIZE;
LOG.debug("reading footer index at {}", fileMetadataLengthIndex);
f.seek(fileMetadataLengthIndex);
int fileMetadataLength = readIntLittleEndian(f);
long readFileMetadataLength = readIntLittleEndian(f) & 0xFFFFFFFFL;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read as a unsigned int

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant