PARQUET-1761: Lower Logging Level in ParquetOutputFormat #745
Conversation
+1 (non-binding), this seems sensible.
gszadovszky left a comment
It is always a tough question to decide the correct log level. If we use DEBUG, it may make investigating a problem really hard when the issue is not practically reproducible. Meanwhile, I am not sure these particular lines would be crucial in such cases. For file size issues, the blockSize and maxPaddingSize can be interesting.
What do you think?
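A minimal sketch of the trade-off under discussion, assuming slf4j (which parquet-mr uses); the helper and its parameters are hypothetical, not the merged patch:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class WriterSettingsLogging {
  private static final Logger LOG = LoggerFactory.getLogger(WriterSettingsLogging.class);

  // Hypothetical helper: keep the values that matter for file-size
  // investigations (blockSize, maxPaddingSize) visible at INFO, and demote
  // the rest to DEBUG, accepting they are lost when a one-off problem
  // occurs without DEBUG enabled.
  static void logSettings(long blockSize, int maxPaddingSize, boolean validating) {
    LOG.info("Parquet block size to {}", blockSize);
    LOG.info("Maximum row group padding size is {} bytes", maxPaddingSize);
    LOG.debug("Validation is {}", validating ? "on" : "off");
  }
}
```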
@gszadovszky For me, since the properties are passed in, it is on the caller to log them in the client code before passing them to Parquet; the DEBUG logging merely allows the user to validate. Anyway, I have proposed simplifying the logging into a single statement (to reduce spam in the logs) and logging at DEBUG only the large properties block, which is unwieldy and introduces newlines that can make parsing the logs tricky (it breaks the normal pattern of one log message per line).
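Sketching what that consolidation could look like (names and message text are illustrative, not the actual diff): one single-line INFO summary, with the multi-line properties dump demoted to DEBUG:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class ConsolidatedLogging {
  private static final Logger LOG = LoggerFactory.getLogger(ConsolidatedLogging.class);

  // Illustrative only: collapse several per-property INFO lines into one
  // parameterized message so one-event-per-line log parsers stay happy.
  static void logRecordWriter(long blockSize, int maxPaddingSize, boolean validating,
                              String multiLineProperties) {
    LOG.info("ParquetRecordWriter [block size: {} B, row group padding: {} B, validating: {}]",
        blockSize, maxPaddingSize, validating);
    // The properties block spans multiple lines, so it goes out only at DEBUG.
    LOG.debug("Writer properties:\n{}", multiLineProperties);
  }
}
```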
gszadovszky left a comment
I've written that some data might be worth logging at INFO level because the block size might be set automatically based on the file system instead of the configuration. I've checked the code and, unfortunately, at this point it is the value from the config. So, I am fine either way: all these values logged at DEBUG level, or the current solution. (Sorry for the confusion.)
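For illustration, a hedged sketch of the point above: at this code path the value is read straight from the job configuration rather than derived from the file system's default block size ("parquet.block.size" is ParquetOutputFormat's real key; the 128 MB fallback shown here is an assumption for the sketch):

```java
import org.apache.hadoop.conf.Configuration;

class BlockSizeSource {
  // The row group size comes from the config the caller already set, so
  // logging it at INFO surfaces nothing the caller did not choose.
  static long resolveBlockSize(Configuration conf) {
    return conf.getLong("parquet.block.size", 128L * 1024 * 1024);
  }
}
```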
@gszadovszky Let us start with the current change, and we can revisit later if this is still too verbose.
Merged 'master' of https://github.com/apache/parquet-mr (222 commits), including:

- PARQUET-2052: Integer overflow when writing huge binary using dictionary encoding (apache#910)
- PARQUET-2041: Add zstd to `parquet.compression` description of ParquetOutputFormat Javadoc (apache#899)
- PARQUET-2050: Expose repetition & definition level from ColumnIO (apache#908)
- PARQUET-1761: Lower Logging Level in ParquetOutputFormat (apache#745)
- PARQUET-2046: Upgrade Apache POM to 23 (apache#904)
- PARQUET-2048: Deprecate BaseRecordReader (apache#906)
- PARQUET-1922: Deprecate IOExceptionUtils (apache#825)
- PARQUET-2037: Write INT96 with parquet-avro (apache#901)
- PARQUET-2044: Enable ZSTD buffer pool by default (apache#903)
- PARQUET-2038: Upgrade Jackson version used in parquet encryption (apache#898)
- Revert "[WIP] Refactor GroupReadSupport to unuse deprecated api (apache#894)"
- PARQUET-2027: Fix calculating directory offset for merge (apache#896)
- [WIP] Refactor GroupReadSupport to unuse deprecated api (apache#894)
- PARQUET-2030: Expose page size row check configurations to ParquetWriter.Builder (apache#895)
- PARQUET-2031: Upgrade to parquet-format 2.9.0 (apache#897)
- PARQUET-1448: Review of ParquetFileReader (apache#892)
- PARQUET-2020: Remove deprecated modules (apache#888)
- PARQUET-2025: Update Snappy version to 1.1.8.3 (apache#893)
- PARQUET-2022: ZstdDecompressorStream should close `zstdInputStream` (apache#889)
- PARQUET-1982: Random access to row groups in ParquetFileReader (apache#871)
- ...

Conflicts:
- parquet-column/src/main/java/org/apache/parquet/example/data/simple/SimpleGroup.java
- parquet-hadoop/pom.xml
- parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
- parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
Make sure you have checked all steps below:
- Jira
- Tests
- Commits
- Documentation