PARQUET-1761: Lower Logging Level in ParquetOutputFormat #745

belugabehr · 2020-01-12T16:10:20Z

Make sure you have checked all steps below.

Jira

My PR addresses the following PARQUET-1761 issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR"
- https://issues.apache.org/jira/browse/PARQUET-1761
- In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Tests

My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters (not including Jira issue reference)
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not "adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"

Documentation

In case of new functionality, my PR adds documentation that describes how to use it.
- All the public functions and classes in the PR contain Javadoc that explains what they do

dossett · 2021-03-31T16:00:43Z

+1 (non-binding) this seems sensible

gszadovszky

Choosing the correct log level is always a tough call. If we use DEBUG, investigating a problem can become really hard when the issue is not practically reproducible. On the other hand, I am not sure these lines would be crucial in such cases. For file size issues, blockSize and maxPaddingSize can be interesting.
What do you think?

belugabehr · 2021-05-14T15:06:00Z

@gszadovszky For me, since the properties are passed in, it is on the caller to log them in the client code before passing to Parquet. The DEBUG logging allows the user to validate.
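As a rough illustration of the caller-side logging described above, the client can log its own settings once, on one line, before handing them to the writer. This is only a sketch: `ClientSideLoggingSketch`, `describe`, and the `client.job` logger name are hypothetical, `java.util.Properties` stands in for the Hadoop configuration object, and `java.util.logging` stands in for whatever logging facade the client actually uses.

```java
import java.util.Properties;
import java.util.TreeMap;
import java.util.logging.Logger;

// Hypothetical client code; not part of Parquet.
public class ClientSideLoggingSketch {
    private static final Logger LOG = Logger.getLogger("client.job");

    // Renders the settings on a single line, in sorted key order,
    // so the log entry is stable and easy to grep.
    public static String describe(Properties props) {
        return new TreeMap<>(props).toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("parquet.block.size", "134217728");
        props.setProperty("parquet.compression", "SNAPPY");

        // The caller owns these values, so the caller logs them at INFO.
        LOG.info("Submitting write with settings " + describe(props));

        // ... the settings would be passed on to the Parquet writer here ...
    }
}
```

With this in place, the library-side DEBUG output is only needed to validate that the values arrived as expected.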

Anyway, I have proposed simplifying the logging into a single statement (to reduce spam in the logs) and logging the large properties block at DEBUG only. That block is unwieldy and introduces newlines, which can make parsing the logs tricky (it breaks the normal pattern of one log message per line).
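The pattern proposed here can be sketched roughly as follows. This is not the code from the PR: `WriterLoggingSketch`, `summarize`, and `logSettings` are hypothetical names, and `java.util.logging` is used as a self-contained stand-in for the project's logger, with `FINE` playing the role of DEBUG.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical illustration of "one INFO line, multi-line dump at DEBUG".
public class WriterLoggingSketch {
    private static final Logger LOG =
        Logger.getLogger(WriterLoggingSketch.class.getName());

    // Collapses the key settings into one log line (no newlines),
    // preserving the one-message-per-line pattern.
    public static String summarize(Map<String, Object> settings) {
        StringBuilder sb = new StringBuilder("Writer settings:");
        for (Map.Entry<String, Object> e : settings.entrySet()) {
            sb.append(' ').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void logSettings(Map<String, Object> settings, String propertiesDump) {
        // A single INFO statement instead of one statement per setting.
        LOG.info(summarize(settings));
        // The multi-line block is emitted only at FINE (the DEBUG equivalent);
        // the supplier is not evaluated unless that level is enabled.
        LOG.fine(() -> "Writer properties:\n" + propertiesDump);
    }

    public static void main(String[] args) {
        Map<String, Object> settings = new LinkedHashMap<>();
        settings.put("blockSize", 134217728L);
        settings.put("maxPaddingSize", 8388608);
        logSettings(settings, "first property line\nsecond property line");
    }
}
```

At the default level only the one-line summary appears, so line-oriented log parsers never see the multi-line block unless DEBUG-level output is explicitly enabled.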

gszadovszky

I wrote that some data might be worth logging at INFO level because the block size might be chosen automatically based on the file system instead of the configuration. I've checked the code and, unfortunately, at this point it is the value from the config. So I am fine either way: having all these values logged at DEBUG level, or the current solution. (Sorry for the confusion.)

belugabehr · 2021-05-17T18:52:45Z

@gszadovszky Let us start with the current change and we can re-visit later if this is still too verbose.

* 'master' of https://github.com/apache/parquet-mr: (222 commits)
  PARQUET-2052: Integer overflow when writing huge binary using dictionary encoding (apache#910)
  PARQUET-2041: Add zstd to `parquet.compression` description of ParquetOutputFormat Javadoc (apache#899)
  PARQUET-2050: Expose repetition & definition level from ColumnIO (apache#908)
  PARQUET-1761: Lower Logging Level in ParquetOutputFormat (apache#745)
  PARQUET-2046: Upgrade Apache POM to 23 (apache#904)
  PARQUET-2048: Deprecate BaseRecordReader (apache#906)
  PARQUET-1922: Deprecate IOExceptionUtils (apache#825)
  PARQUET-2037: Write INT96 with parquet-avro (apache#901)
  PARQUET-2044: Enable ZSTD buffer pool by default (apache#903)
  PARQUET-2038: Upgrade Jackson version used in parquet encryption. (apache#898)
  Revert "[WIP] Refactor GroupReadSupport to unuse deprecated api (apache#894)"
  PARQUET-2027: Fix calculating directory offset for merge (apache#896)
  [WIP] Refactor GroupReadSupport to unuse deprecated api (apache#894)
  PARQUET-2030: Expose page size row check configurations to ParquetWriter.Builder (apache#895)
  PARQUET-2031: Upgrade to parquet-format 2.9.0 (apache#897)
  PARQUET-1448: Review of ParquetFileReader (apache#892)
  PARQUET-2020: Remove deprecated modules (apache#888)
  PARQUET-2025: Update Snappy version to 1.1.8.3 (apache#893)
  PARQUET-2022: ZstdDecompressorStream should close `zstdInputStream` (apache#889)
  PARQUET-1982: Random access to row groups in ParquetFileReader (apache#871)
  ...

# Conflicts:
# parquet-column/src/main/java/org/apache/parquet/example/data/simple/SimpleGroup.java
# parquet-hadoop/pom.xml
# parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
# parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java

PARQUET-1761: Lower Logging Level in ParquetOutputFormat

05e1c04

belugabehr force-pushed the PARQUET-1761 branch from 810fae1 to 05e1c04 Compare May 13, 2021 19:51

gszadovszky reviewed May 14, 2021

Debug log the entire properties structure

3e93183

gszadovszky approved these changes May 14, 2021

gszadovszky merged commit 875a4bb into apache:master May 18, 2021

asfimport mentioned this pull request Jun 23, 2024

Lower Logging Level in ParquetOutputFormat #2436

Closed
