@SteNicholas
Member

@SteNicholas SteNicholas commented Mar 27, 2024

What changes were proposed in this pull request?

Support Netty-level logging at the network layer for Celeborn. To enable Netty-level logging, a log handler must be added to the channel pipeline. NettyLogger is introduced as a new class that constructs a log handler depending on the configured log level:

  • In case of <Logger name="org.apache.celeborn.common.network.util.NettyLogger" level="DEBUG" additivity="false">: a custom log handler is created that does not dump the message contents. This keeps the log more compact. Moreover, when network-level encryption is switched on, this level might be sufficient.
  • In case of <Logger name="org.apache.celeborn.common.network.util.NettyLogger" level="TRACE" additivity="false">: Netty's own log handler is used, which dumps the message contents.
  • Otherwise (when the logger is at neither TRACE nor DEBUG), the pipeline does not contain a log handler. The default setting therefore carries no runtime penalty, but a long-running service must be restarted for a new log level to take effect.
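
The level-based selection in the list above can be illustrated with a minimal, standard-library-only sketch. It does not use Netty or the actual NettyLogger code from this patch: `NettyLoggerSketch`, `Level`, `LogFormatter`, and `select` are hypothetical names chosen for illustration, and the hex dump only approximates what a content-dumping handler would print.

```java
import java.util.Optional;

// Illustrative model only: these types are assumptions, not Celeborn or Netty classes.
class NettyLoggerSketch {
  /** Logger levels that matter for the selection; INFO stands for "anything else". */
  enum Level { TRACE, DEBUG, INFO }

  /** Hypothetical stand-in for a log handler: formats one channel event. */
  interface LogFormatter {
    String format(String eventName, byte[] payload);
  }

  /** Returns the formatter to install for the given level, or empty for none. */
  static Optional<LogFormatter> select(Level level) {
    switch (level) {
      case TRACE: {
        // Verbose variant: logs the payload size and a hex dump of the contents.
        LogFormatter verbose =
            (event, payload) -> event + " " + payload.length + "B " + toHex(payload);
        return Optional.of(verbose);
      }
      case DEBUG: {
        // Compact variant: logs only the payload size, never the contents.
        LogFormatter compact = (event, payload) -> event + " " + payload.length + "B";
        return Optional.of(compact);
      }
      default:
        // Nothing is installed, so the default level adds no per-message cost.
        return Optional.empty();
    }
  }

  /** Renders bytes as lowercase hex, two digits per byte. */
  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] payload = {0x01, 0x02, (byte) 0xff};
    for (Level level : Level.values()) {
      String line = select(level)
          .map(f -> f.format("WRITE", payload))
          .orElse("(no handler installed)");
      System.out.println(level + ": " + line);
    }
  }
}
```

Because the choice is made once, when the formatter is selected, changing the level afterwards has no effect on an already-built pipeline. This mirrors the restart requirement noted in the last bullet.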

Backport:

  • [SPARK-36719][CORE] Supporting Netty Logging at the network layer (apache/spark#33962)
  • [SPARK-45377][CORE] Handle InputStream in NettyLogger (apache/spark#43165)

Why are the changes needed?

This level of logging proved sufficient when debugging some external-shuffle-related problems. Compared with tcpdump output, these log lines can be more easily correlated with Celeborn's internal calls. Moreover, the log layout can be configured to include thread names, so that when a timeout occurs a busy thread can be identified.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Tested manually in a local environment.

@SteNicholas
Member Author

SteNicholas commented Mar 27, 2024

cc @mridulm, @otterc, @FMX, @RexXiong, @waitinfuture.

@SteNicholas SteNicholas changed the title from "[CELEBORN-1359] Supporting Netty Logging at the network layer" to "[CELEBORN-1359] Support Netty Logging at the network layer" Mar 27, 2024
@pan3793
Member

pan3793 commented Mar 27, 2024

Code change LGTM. Please update the PR description: we are using LOG4J2 with log4j2.xml rather than LOG4J1 with log4j.properties

log4j.logger.XXX

@SteNicholas
Member Author

@pan3793, I have updated the description of this pull request. PTAL.

@SteNicholas SteNicholas requested a review from pan3793 March 27, 2024 16:35
Contributor

@mridulm mridulm left a comment

Thanks for this!
This was on my list here: CELEBORN-1350 :-)

Contributor

@otterc otterc left a comment

LGTM. Just have a small question

Contributor

@RexXiong RexXiong left a comment

LGTM, thanks! Merged to main (v0.5.0).

@RexXiong RexXiong closed this in 6fdeced Mar 28, 2024
CodingCat pushed a commit to CodingCat/incubator-celeborn that referenced this pull request Apr 1, 2024
### What changes were proposed in this pull request?

Support Netty-level logging at the network layer for Celeborn. To enable Netty-level logging, a log handler must be added to the channel pipeline. `NettyLogger` is introduced as a new class that constructs a log handler depending on the configured log level:

- In case of `<Logger name="org.apache.celeborn.common.network.util.NettyLogger" level="DEBUG" additivity="false">`: a custom log handler is created that does not dump the message contents. This keeps the log more compact. Moreover, when network-level encryption is switched on, this level might be sufficient.
- In case of `<Logger name="org.apache.celeborn.common.network.util.NettyLogger" level="TRACE" additivity="false">`: Netty's own log handler is used, which dumps the message contents.
- Otherwise (when the logger is at neither `TRACE` nor `DEBUG`), the pipeline does not contain a log handler. The default setting therefore carries no runtime penalty, but a long-running service must be restarted for a new log level to take effect.

Backport:

- [[SPARK-36719][CORE] Supporting Netty Logging at the network layer](apache/spark#33962)
- [[SPARK-45377][CORE] Handle InputStream in NettyLogger](apache/spark#43165)

### Why are the changes needed?

This level of logging proved sufficient when debugging some external-shuffle-related problems. Compared with tcpdump output, these log lines can be more easily correlated with Celeborn's internal calls. Moreover, the log layout can be configured to include thread names, so that when a timeout occurs a busy thread can be identified.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested manually in a local environment.

Closes apache#2423 from SteNicholas/CELEBORN-1359.

Authored-by: SteNicholas <[email protected]>
Signed-off-by: Shuang <[email protected]>