PARQUET-2094: Handle negative values in page headers #933
TravisCI runs contain multiple |
@plygrnd, yes, this is expected. The test class TestCorruptThriftRecords verifies that the correct exceptions are thrown. It uses MR to execute the tests, and it seems MR logs the exceptions before they are thrown to the caller.
Okay, LGTM then.
LGTM
 * A specific IOException thrown when invalid values are found in the Parquet file metadata (including the footer,
 * page header etc.).
 */
public static class InvalidParquetMetadataException extends IOException {
I don't think that this is an IOException. What is the value of making this a checked exception? Why not just make this a RuntimeException? Or use some existing one like IllegalStateException or ParquetDecodingException?
The module parquet-format-structures is the one that all the other modules depend on. Parquet exceptions are implemented in another module, so I cannot use them here. Since we already throw IOExceptions, I felt that extending it would be a good idea. But you might be right. I am happy to extend RuntimeException instead of IOException.
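To make the trade-off in this thread concrete, here is a minimal sketch of the two options. The class and method names are hypothetical and are not the ones from the PR; the only point is how each choice affects callers.

```java
import java.io.IOException;

// Hypothetical names for illustration only; these are not the classes from the PR.
class CheckedMetadataException extends IOException {
  CheckedMetadataException(String message) {
    super(message);
  }
}

class UncheckedMetadataException extends RuntimeException {
  UncheckedMetadataException(String message) {
    super(message);
  }
}

class ExceptionStyles {
  // Checked variant: the compiler forces every caller to declare or catch it.
  static int requireNonNegativeChecked(int size) throws CheckedMetadataException {
    if (size < 0) {
      throw new CheckedMetadataException("negative size: " + size);
    }
    return size;
  }

  // Unchecked variant: no throws clause is needed; callers may still catch it.
  static int requireNonNegativeUnchecked(int size) {
    if (size < 0) {
      throw new UncheckedMetadataException("negative size: " + size);
    }
    return size;
  }
}
```

A method that already declares `throws IOException` can throw the checked variant without a signature change, which seems to be the motivation described above. The unchecked variant keeps signatures clean everywhere else.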
  return pageHeader;
}

private static <T> void validateValue(Predicate<? super T> validator, T value, String metaName)
Why accept a predicate? Most check methods like this use a boolean. I would expect it to work like a Precondition:
if (!isValid) {
  throw new ParquetDecodingException(...);
}
int size = pageHeader.getCompressed_page_size();
validateValue(size >= 0, String.format("Compressed page size must be positive, but was: %s", size));
I am not sure why I implemented it this way. I'm fine with rewriting it to use a simple boolean.
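For comparison, a minimal sketch of the two validation styles discussed in this thread. The method names and the use of IllegalStateException are assumptions made for illustration; the PR's actual exception type and helper names may differ.

```java
import java.util.function.Predicate;

// Hypothetical helper class; not the PR's MetadataValidator.
class ValidationStyles {
  // Predicate-based variant, similar in shape to the signature under review.
  static <T> void validateWithPredicate(Predicate<? super T> validator, T value, String metaName) {
    if (!validator.test(value)) {
      throw new IllegalStateException(metaName + " has an invalid value: " + value);
    }
  }

  // Boolean-based variant, in the style of a precondition check.
  static void validate(boolean isValid, String message) {
    if (!isValid) {
      throw new IllegalStateException(message);
    }
  }
}
```

With the boolean form, the condition is evaluated at the call site, for example `ValidationStyles.validate(size >= 0, "size must not be negative, but was: " + size)`. The predicate form would read `ValidationStyles.validateWithPredicate(v -> v >= 0, size, "size")`, which adds a lambda and a type parameter without changing the behavior.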
  public static PageHeader readPageHeader(InputStream from,
      BlockCipher.Decryptor decryptor, byte[] AAD) throws IOException {
-   return read(from, new PageHeader(), decryptor, AAD);
+   return validate(read(from, new PageHeader(), decryptor, AAD));
I think it would be clearer if you called MetadataValidator.validate rather than just validate.
  fail("Expected exception but did not thrown");
} catch (InvalidParquetMetadataException e) {
  assertTrue("Exception message does not contain the expected parts",
      e.getMessage().contains("pageHeader.compressed_page_size"));
Isn't there an assertion helper so you don't need to catch the exception? Something like assertThrows in the codebase?
Yes, something like that is already implemented, but in another module, so I cannot use it here.
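When a shared helper is out of reach, a small local one is one possible workaround. The sketch below is hypothetical (the class, interface, and method names are invented here and are not from the Parquet codebase); it only illustrates the assertThrows pattern the reviewer is asking about.

```java
// Hypothetical test utility; not part of the Parquet codebase.
class ThrowingAssertions {
  @FunctionalInterface
  interface ThrowingRunnable {
    void run() throws Exception;
  }

  // Runs the body and returns the thrown exception if it has the expected type.
  // Fails with an AssertionError if nothing is thrown or the type does not match.
  static <E extends Throwable> E assertThrows(Class<E> expected, ThrowingRunnable body) {
    try {
      body.run();
    } catch (Throwable t) {
      if (expected.isInstance(t)) {
        return expected.cast(t);
      }
      throw new AssertionError("Expected " + expected.getName() + " but got " + t, t);
    }
    throw new AssertionError("Expected " + expected.getName() + " but nothing was thrown");
  }
}
```

The returned exception can then be inspected directly, for example by checking that its message contains `pageHeader.compressed_page_size`, without a try/catch block in the test body. JUnit 4.13 and later also ship `org.junit.Assert.assertThrows`; whether this module's JUnit version includes it is not stated in the thread.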
 * A specific RuntimeException thrown when invalid values are found in the Parquet file metadata (including the
 * footer, page header etc.).
 */
public static class InvalidParquetMetadataException extends RuntimeException {
Minor: I'd prefer it if the exception weren't an inner class since that makes it harder to reference. But this isn't a blocker.
It is a fair point anyway. I'll move it before merging. Thanks a lot for the review.
 * page header etc.).
 */
public class InvalidParquetMetadataException extends RuntimeException {
  <T> InvalidParquetMetadataException(String message) {
What is the type parameter for?
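For context on this question: Java allows a constructor to declare its own type parameter, but if the parameter is never used in the signature or body it has no effect. A minimal sketch with hypothetical class names (not the PR's class):

```java
// Hypothetical classes for illustration only.
class WithTypeParam extends RuntimeException {
  // Legal Java, but T is never used, so it adds nothing.
  <T> WithTypeParam(String message) {
    super(message);
  }
}

class WithoutTypeParam extends RuntimeException {
  WithoutTypeParam(String message) {
    super(message);
  }
}
```

Both constructors are called the same way, `new WithTypeParam("msg")` and `new WithoutTypeParam("msg")`, so dropping the unused `<T>` should not affect callers.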
import org.apache.parquet.format.Util.DefaultFileMetaDataConsumer;
import org.junit.Test;

import org.apache.parquet.format.Util.DefaultFileMetaDataConsumer;
Nit: unnecessary change.
(cherry picked from commit 1695d92)
@gszadovszky, thanks for getting this done.
Make sure you have checked all steps below.
Jira
Tests
Commits
Documentation