-
Notifications
You must be signed in to change notification settings - Fork 9
[wip] align hub with log4j #100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe changes in this pull request involve updates to the Changes
Possibly related PRs
Suggested reviewers
Poem
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
4c84914 to
48b5fba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (3)
hub/conf/log4j2.properties (2)
9-9: Consider environment-specific logging patternsThe current console pattern includes full stacktraces which might be excessive for console output. Consider:
- Using shorter stacktrace length for console
- Adding environment-specific configurations
-appender.console.layout.pattern = %highlight{%d{ISO8601} [%thread] %-5level %logger{36} - %msg%n%throwable{10}} +appender.console.layout.pattern = %highlight{%d{ISO8601} [%thread] %-5level %logger{36} - %msg%n%throwable{3}} + +# Add environment-specific filter +appender.console.filter.env.type = ScriptFilter +appender.console.filter.env.script = ${sys:ENV:-dev}.equalsIgnoreCase('prod') || level >= INFOAlso applies to: 10-10, 11-11
59-63: Consider adding dedicated error appenderFor better monitoring and alerting capabilities, consider adding a dedicated appender for ERROR and FATAL level messages.
+# Error appender for critical issues +appender.error.type = RollingFile +appender.error.name = ERROR +appender.error.fileName = ${LOG_DIR}/chronon-hub-error.log +appender.error.filePattern = ${LOG_DIR}/chronon-hub-error.%d{yyyy-MM-dd}-%i.log.gz +appender.error.layout.type = PatternLayout +appender.error.layout.pattern = %d{ISO8601} [%thread] %-5level %logger{36} - %msg%n%throwable +appender.error.filter.threshold.type = ThresholdFilter +appender.error.filter.threshold.level = ERROR + rootLogger.level = INFO -rootLogger.appenderRefs = asyncFile, asyncStdout +rootLogger.appenderRefs = asyncFile, asyncStdout, error rootLogger.appenderRef.asyncFile.ref = ASYNCFILE rootLogger.appenderRef.asyncStdout.ref = ASYNCSTDOUT +rootLogger.appenderRef.error.ref = ERRORbuild.sbt (1)
259-259: Document logging architecture changesThe transition to log4j2 represents a significant architectural change. Consider:
- Documenting the logging strategy in a central location
- Creating a logging guide for developers
- Adding logging configuration templates for different environments
- Setting up log aggregation and monitoring strategy
Would you like help in creating:
- A logging architecture document?
- Log4j2 configuration templates for different environments?
- A migration guide for developers?
Also applies to: 266-266
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
build.sbt(2 hunks)hub/conf/log4j2.properties(1 hunks)
🔇 Additional comments (1)
build.sbt (1)
70-76: Verify logging functionality after excluding log4j-1.2-api
The exclusion of the legacy log4j-1.2-api from Spark dependencies is a good practice. However, since this is a significant change to the logging infrastructure, we should ensure it doesn't break existing logging functionality.
Let's check for any direct usage of log4j 1.2 API in the codebase:
✅ Verification successful
Excluding log4j-1.2-api is safe and aligns with proper Log4j2 migration
The codebase shows a proper migration path from Log4j 1.x to Log4j2:
- Log4j2 configurations are present in key components (hub, spark, demo)
- The configurations use modern Log4j2 syntax and features (async logging, proper patterns)
- While there's a legacy log4j.properties in online/src/main/resources, it's only used in test scripts for Spark configuration and doesn't affect the main application logging
The exclusion of log4j-1.2-api is safe as the codebase has already adopted Log4j2 configurations. The test scripts that use Log4j 1.x syntax are isolated to Spark configuration and won't be affected by this dependency change.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any direct usage of log4j 1.2 API
rg -l "org\.apache\.log4j\."
# Search for log4j 1.x style configuration files
fd "log4j\.properties|log4j\.xml"
Length of output: 241
Script:
#!/bin/bash
# Check the content of log4j.properties to understand the logging configuration
cat online/src/main/resources/log4j.properties
# Check the spark scripts to understand how log4j is being used
cat api/py/test/sample/scripts/spark_submit.sh
cat api/py/test/sample/scripts/spark_streaming.sh
# Search for any Log4j2 configuration files that might be replacing the old config
fd "log4j2\.properties|log4j2\.xml|log4j2\.yaml|log4j2\.yml"
Length of output: 7920
Script:
#!/bin/bash
# Check the content of Log4j2 configuration files to verify proper migration
cat docker-init/demo/log4j2.properties
cat hub/conf/log4j2.properties
cat spark/src/main/resources/log4j2.properties
# Search for any direct code references to Log4j2 API to confirm migration
rg -l "org\.apache\.logging\.log4j\."
Length of output: 3696
| appender.file.type = RollingFile | ||
| appender.file.name = FILE | ||
| appender.file.fileName = ${LOG_DIR}/chronon-hub.log | ||
| appender.file.filePattern = ${LOG_DIR}/chronon-hub.%d{yyyy-dd-MM}-%i.log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix date format and consider compression for rolled files
There are two issues with the rolling file configuration:
- The date pattern
yyyy-dd-MMis incorrect (day before month) - No compression is configured for rolled files, which could save significant disk space
-appender.file.filePattern = ${LOG_DIR}/chronon-hub.%d{yyyy-dd-MM}-%i.log
+appender.file.filePattern = ${LOG_DIR}/chronon-hub.%d{yyyy-MM-dd}-%i.log.gz
appender.file.strategy.type = DefaultRolloverStrategy
appender.file.strategy.max = 30
-appender.file.strategy.totalSizeCap = 10GB
+appender.file.strategy.totalSizeCap = 10GB
+appender.file.strategy.compressionLevel = 9Also applies to: 24-24, 25-25, 26-26, 27-27
| appender.asyncFile.name = ASYNCFILE | ||
| appender.asyncFile.appenderRefs = fileRef | ||
| appender.asyncFile.appenderRef.fileRef.ref = FILE | ||
| appender.asyncFile.queueSize = 512 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjust async configuration to prevent memory issues
The current async configuration could lead to memory issues:
- Zero discard threshold with non-blocking mode could cause OutOfMemoryError
- Queue size might be insufficient for high-volume logging
appender.asyncFile.queueSize = 512
-appender.asyncFile.discardThreshold = 0
-appender.asyncFile.blocking = false
+appender.asyncFile.discardThreshold = 256
+appender.asyncFile.blocking = true
appender.asyncStdout.queueSize = 512
-appender.asyncStdout.discardThreshold = 0
-appender.asyncStdout.blocking = false
+appender.asyncStdout.discardThreshold = 256
+appender.asyncStdout.blocking = trueAlso applies to: 37-37, 38-38, 45-45, 46-46, 47-47
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scary, if true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nikhil-zlai, the concern is that configuring the async appenders with blocking = false and discardThreshold = 0 can lead to unbounded queue growth under heavy logging load. This can potentially cause an OutOfMemoryError because the queue will keep growing if the logging rate exceeds the appender's capacity to process logs. By setting blocking = true, the application will block when the queue is full, which helps prevent memory issues. Adjusting discardThreshold to a higher value allows for controlled discarding of log events when the queue reaches certain thresholds.
| appender.asyncStdout.blocking = false | ||
|
|
||
| # Logger for "play" with INFO level | ||
| logger.play.name = play |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Prevent duplicate logging by adjusting additivity
Setting additivity=true for these loggers will cause duplicate log entries as messages will be logged by both the specific logger and the root logger.
logger.play.name = play
logger.play.level = INFO
-logger.play.additivity = true
+logger.play.additivity = false
logger.htmlunit.name = com.gargoylesoftware.htmlunit.javascript
logger.htmlunit.level = OFF
-logger.htmlunit.additivity = true
+logger.htmlunit.additivity = falseAlso applies to: 51-51, 52-52, 55-55, 56-56, 57-57
| # Define property for log directory | ||
| property.LOG_DIR=${LOG_DIR:-/tmp/chronon/logs} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Avoid using /tmp for default log directory
Using /tmp as the default log directory is risky as:
- Logs will be lost on system restart
/tmpmight have limited space- Not suitable for production environments
Consider using a more appropriate default path:
-property.LOG_DIR=${LOG_DIR:-/tmp/chronon/logs}
+property.LOG_DIR=${LOG_DIR:-/var/log/chronon}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Define property for log directory | |
| property.LOG_DIR=${LOG_DIR:-/tmp/chronon/logs} | |
| # Define property for log directory | |
| property.LOG_DIR=${LOG_DIR:-/var/log/chronon} |
| # Define property for log directory | ||
| property.LOG_DIR=${LOG_DIR:-/tmp/chronon/logs} | ||
|
|
||
| # Appender: Console (STDOUT) with INFO filter and custom pattern | ||
| appender.console.type = Console | ||
| appender.console.name = STDOUT | ||
| appender.console.target = SYSTEM_OUT | ||
| appender.console.layout.type = PatternLayout | ||
| appender.console.layout.pattern = %highlight{%d{ISO8601} [%thread] %-5level %logger{36} - %msg%n%throwable{10}} | ||
| appender.console.filter.threshold.type = ThresholdFilter | ||
| appender.console.filter.threshold.level = INFO | ||
|
|
||
| #Rolling File Appender | ||
| appender.file.type = RollingFile | ||
| appender.file.name = FILE | ||
| appender.file.fileName = ${LOG_DIR}/chronon-hub.log | ||
| appender.file.filePattern = ${LOG_DIR}/chronon-hub.%d{yyyy-dd-MM}-%i.log | ||
| appender.file.layout.type = PatternLayout | ||
| appender.file.layout.pattern = %d{ISO8601} [%thread] %-5level %logger{36} - %msg%n%throwable | ||
| appender.file.policies.type = Policies | ||
| appender.file.policies.time.type = TimeBasedTriggeringPolicy | ||
| appender.file.policies.time.interval = 1 | ||
| appender.file.policies.size.type = SizeBasedTriggeringPolicy | ||
| appender.file.policies.size.size = 100MB | ||
| appender.file.strategy.type = DefaultRolloverStrategy | ||
| appender.file.strategy.max = 30 | ||
| appender.file.strategy.totalSizeCap = 10GB | ||
| appender.file.filter.threshold.type = ThresholdFilter | ||
| appender.file.filter.threshold.level = INFO | ||
|
|
||
| # Async Appender for File | ||
| appender.asyncFile.type = Async | ||
| appender.asyncFile.name = ASYNCFILE | ||
| appender.asyncFile.appenderRefs = fileRef | ||
| appender.asyncFile.appenderRef.fileRef.ref = FILE | ||
| appender.asyncFile.queueSize = 512 | ||
| appender.asyncFile.discardThreshold = 0 | ||
| appender.asyncFile.blocking = false | ||
|
|
||
| # Async Appender for Console (STDOUT) | ||
| appender.asyncStdout.type = Async | ||
| appender.asyncStdout.name = ASYNCSTDOUT | ||
| appender.asyncStdout.appenderRefs = consoleRef | ||
| appender.asyncStdout.appenderRef.consoleRef.ref = STDOUT | ||
| appender.asyncStdout.queueSize = 512 | ||
| appender.asyncStdout.discardThreshold = 0 | ||
| appender.asyncStdout.blocking = false | ||
|
|
||
| # Logger for "play" with INFO level | ||
| logger.play.name = play | ||
| logger.play.level = INFO | ||
| logger.play.additivity = true | ||
|
|
||
| # Logger for "com.gargoylesoftware.htmlunit.javascript" with OFF level | ||
| logger.htmlunit.name = com.gargoylesoftware.htmlunit.javascript | ||
| logger.htmlunit.level = OFF | ||
| logger.htmlunit.additivity = true | ||
|
|
||
| # Root Logger | ||
| rootLogger.level = INFO | ||
| rootLogger.appenderRefs = asyncFile, asyncStdout | ||
| rootLogger.appenderRef.asyncFile.ref = ASYNCFILE | ||
| rootLogger.appenderRef.asyncStdout.ref = ASYNCSTDOUT No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Log4j2 configuration needs alignment with existing logging setup
The new log4j2.properties configuration requires adjustments to align with the existing logging infrastructure:
- The repository already has a logback.xml in the same directory (hub/conf/) with nearly identical configuration, suggesting a potential conflict or migration in progress
- Another log4j configuration exists in online/src/main/resources/ using the older log4j 1.x format
- The new log4j2.properties duplicates the functionality of logback.xml (same log directory, pattern, levels, and async configuration)
Recommendations:
- Clarify whether this is intended to replace logback.xml, as having both configurations may cause conflicts
- If replacing, ensure all dependencies are updated to use log4j2 instead of logback
- If not replacing, this file should be removed to prevent confusion
- Consider standardizing on a single logging framework across all components
🔗 Analysis chain
Verify logging configuration alignment
Let's ensure this configuration aligns with existing logging standards and requirements across the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for other log4j2 configurations to ensure consistency
fd -e properties -e xml -e yaml -e yml -x grep -l "log4j" {} \;
# Search for logging-related environment variables
rg -l "LOG_DIR|log\.dir|logging\.dir"
# Check for any existing logging documentation
fd -e md -e txt -x grep -l "logging" {} \;
Length of output: 619
Script:
#!/bin/bash
# Check content of the other log4j configuration
cat ./online/src/main/resources/log4j.properties
# Check content of logback configuration
cat hub/conf/logback.xml
# Check if there are any logging-related environment variables in docker files
rg -l "LOG_DIR" --type docker
# Look for logging setup in documentation
rg "log.*configuration|logging.*setup" ./docs/source/setup/
Length of output: 3220
| "org.scala-lang.modules" %% "scala-parser-combinators" % "2.3.0", | ||
| "org.scala-lang.modules" %% "scala-java8-compat" % "1.0.2" | ||
| ), | ||
| libraryDependencies ++= log4j2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Both log4j2 and logback configurations present - potential conflict
The codebase currently has both logging implementations configured:
- log4j2 is added as a dependency with proper configuration in
hub/conf/log4j2.properties - logback is still present with configuration in
hub/conf/logback.xml
This dual configuration could lead to unexpected behavior since:
- Both logging implementations are present without explicit exclusion of logback
- The Play framework's default logging might conflict with log4j2
- The presence of both
logback.xmlandlog4j2.propertiesmay cause confusion about which configuration is active
Recommendations:
- Either remove
logback.xmland its dependencies if migrating to log4j2 - Or remove log4j2 if staying with Play's default logging
- Add explicit exclusion for the unused logging implementation
🔗 Analysis chain
Verify Play Framework logging after logback exclusion
While adding log4j2 and excluding logback helps standardize logging across the project, Play Framework typically uses logback by default. Please ensure:
- Play's logging functionality continues to work as expected
- All necessary log4j2 configuration is in place
Let's verify the logging configuration:
Also applies to: 266-266
Let me gather more information about the logging configuration and dependencies to better understand the setup.
Let me check one more thing to understand how log4j2 is defined and if there are any exclusions for logback.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for log4j2 configuration files
fd "log4j2\.properties|log4j2\.xml|log4j2\.yaml|log4j2\.yml" hub/
# Check for any remaining logback configuration
fd "logback.*\.xml" hub/
# Search for any logback-specific code
rg "ch\.qos\.logback\." hub/
Length of output: 947
Script:
#!/bin/bash
# Check the content of log4j2.properties
cat hub/conf/log4j2.properties
# Search for logging-related dependencies in build files
rg "libraryDependencies.*log" build.sbt
# Search for Play logging configuration in application.conf
rg "logger|logging" hub/conf/application.conf
Length of output: 2651
Script:
#!/bin/bash
# Check for log4j2 and logback dependency definitions
rg "val.*log4j2" build.sbt -A 5
rg "exclude.*logback" build.sbt
Length of output: 325
| @@ -0,0 +1,63 @@ | |||
| # Define property for log directory | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd propose dropping this - we already have a logback.xml that configures this stuff. We're also thinking of swapping Play for Vertx once we're past the monitoring poc so we can leave things as-is for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only reason I'm attempting this is so we can align on logging frameworks across the board (and also so that we don't have to write xml lol) - happy to drop this and just ignore investment work for the PoC for now too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spark recommended one is log4j2, right? if so, i think we should convert all logback xml to this.
also +1 on not having to write xml
|
lgtm, but will let piyush stamp |
Summary
Checklist
Summary by CodeRabbit
New Features
Bug Fixes
Chores