Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Dec 14, 2021

What changes were proposed in this pull request?

This patch proposes to migrate from log4j1 to log4j2 in Spark.

Why are the changes needed?

Apache Spark uses log4j 1.x, so hence is not susceptible to the recent log4j v2 vulnerability. However, log4j 1.x has reached end of life and is no longer supported. Vulnerabilities reported after August 2015 against log4j 1.x were not checked and will not be fixed. As a result, we are recommended to upgrade Spark to Log4j 2 as soon as possible.

Does this PR introduce any user-facing change?

Yes. Users may need to rewrite their log4j properties file for log4j2. As of version 2.4, log4j now supports configuration via properties files. Note that the property syntax is not the same as the syntax used in log4j 1, but during the migration I found the syntax is pretty close so the migration should be straightforward.

How was this patch tested?

Passed all existing tests.

@SparkQA

This comment has been minimized.

@viirya viirya marked this pull request as draft December 14, 2021 09:41
@SparkQA
Copy link

SparkQA commented Dec 14, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50650/

@SparkQA
Copy link

SparkQA commented Dec 14, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50651/

@SparkQA
Copy link

SparkQA commented Dec 14, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50650/

Copy link

@laychengtan1401 laychengtan1401 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log4j 2.12 still expose to vulnerability unless change to 2.15

@SparkQA
Copy link

SparkQA commented Dec 14, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50651/

pom.xml Outdated
<sbt.project.name>spark</sbt.project.name>
<slf4j.version>1.7.30</slf4j.version>
<log4j.version>1.2.17</log4j.version>
<log4j.version>2.12.1</log4j.version>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bump to 2.16.0 please

@SparkQA
Copy link

SparkQA commented Dec 14, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50665/

@SparkQA
Copy link

SparkQA commented Dec 14, 2021

Test build #146178 has finished for PR 34895 at commit 0d7bd5d.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 14, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50667/

@SparkQA
Copy link

SparkQA commented Dec 14, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50665/

@SparkQA
Copy link

SparkQA commented Dec 14, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50667/

@SparkQA
Copy link

SparkQA commented Dec 18, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50820/

@SparkQA
Copy link

SparkQA commented Dec 18, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50821/

@SparkQA
Copy link

SparkQA commented Dec 18, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50820/

@SparkQA
Copy link

SparkQA commented Dec 18, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50821/

@dongjoon-hyun
Copy link
Member

catalyst & hive-thriftserer GitHub Action job passed!

@SparkQA
Copy link

SparkQA commented Dec 18, 2021

Test build #146346 has finished for PR 34895 at commit 2aaed30.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Merged to master for Apache Spark 3.3.

@dongjoon-hyun
Copy link
Member

Thank you, @viirya , @dbtsai , @srowen , @laychengtan1401 , @ChethanGB , @FranciscoBorges , @enigmatic00 and all.

@viirya
Copy link
Member Author

viirya commented Dec 18, 2021

Thank you @dongjoon-hyun and all. The log4j1 properties files are useless now. I will open another PR to remove them.

@SparkQA
Copy link

SparkQA commented Dec 18, 2021

Test build #146349 has finished for PR 34895 at commit 67e5bae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class RocksDB implements KVStore
  • public static class TypeAliases
  • class RocksDBIterator<T> implements KVStoreIterator<T>
  • class RocksDBTypeInfo
  • class Index
  • (defaultdict(<class 'list'>,
  • class ExecutorRollPlugin extends SparkPlugin
  • class ExecutorRollDriverPlugin extends DriverPlugin with Logging
  • case class RegrCount(left: Expression, right: Expression)
  • case class MapContainsKey(
  • case class AesEncrypt(
  • case class AesDecrypt(

@sunchao
Copy link
Member

sunchao commented Dec 18, 2021

Awesome job @viirya !

Looks like there is another CVE in 2.16: https://logging.apache.org/log4j/2.x/security.html. Should we upgrade to 2.17?

@viirya
Copy link
Member Author

viirya commented Dec 18, 2021

@sunchao yea, I will upgrade to 2.17.

public static class LogAppender extends AbstractAppender {

protected LogAppender(String name,
Filter filter,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent seems fine here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

// If Log4j 1.2 is being used, but is not initialized, load a default properties file
if (Logging.isLog4j12()) {
val log4j12Initialized = LogManager.getRootLogger.getAllAppenders.hasMoreElements
if (!Logging.isLog4j12()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code guarded by if(!Logging.isLog4j12()) only works with Log4J2 and not other SLF4J bindings (such as Logback). I think it might be a bit clearer and slightly more correct to replace this with a positive if(Logging.isLog4j2()) check which checks that the Log4J2 binding is in use (e.g. that the binder class is org.apache.logging.slf4j.Log4jLoggerFactory).

Similarly, I think log4j12Initialized should be renamed to log4j2Initialized (since we're not using Log4J 1.2 in the branch where that variable is defined).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I will fix them.

import org.apache.spark.SparkFunSuite
import org.apache.spark.util.Utils

class LoggingSuite extends SparkFunSuite {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we need to delete this suite? I think this suite's removal means that we no longer have unit test coverage of SparkShellLoggingFilter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I think I removed it at the beginning of the attempt. I will add it back with other fixes regarding your other comments.

Comment on lines +64 to +66
val fa = Log4jFileAppender.createAppender(localLogFile, "false", "false",
DriverLogger.APPENDER_NAME, "true", "false", "false", "4000", layout, null,
"false", null, config);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My IDE says that this createAppender method is deprecated. Since the old 1. code was only setting the filename and layout, could this instead be

    val fa = Log4jFileAppender.newBuilder().withFileName(localLogFile).setLayout(layout).build()

or do we need to set some of those other arguments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can rewrite it using builder.

val logger = LogManager.getRootLogger().asInstanceOf[Logger]
val fa = logger.getAppenders.get(DriverLogger.APPENDER_NAME)
logger.removeAppender(fa)
fa.stop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we omit this fa.stop() since we have a Utils.tryLogNonFatalError(fa.stop()) on the following line?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay.

Comment on lines +2427 to +2430
val rootLogger = org.apache.logging.log4j.LogManager.getRootLogger()
.asInstanceOf[org.apache.logging.log4j.core.Logger]
rootLogger.setLevel(l)
rootLogger.get().setLevel(l)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know if this is the best way to programmatically change the root logger's level? I noticed that https://stackoverflow.com/questions/23434252/programmatically-change-log-level-in-log4j2 mentions several techniques which differ from the one used here.

I ask because I'm not very familiar with Log4J2 and I'm unsure if there's subtle pitfalls to avoid (e.g. will this work if we don't call updateLoggers()?).

If we wind up changing this code then we should also update similar code in SparkFunSuite.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is a good question. I think I feel the same regarding log4j2 API. This is the way I finally made it work. I will look if there is more proper way for it.

logger.setLevel(level.get)
logger.get().setLevel(level.get)
}
case _ =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since loggers is a Seq[Logger], is this branch ever reachable? Unless I'm mistaken, I think think we'll always hit the case logger: Logger case.

while (logger.getParent() != null) {
if (logger.getLevel != null || logger.getAllAppenders.hasMoreElements) {
return Filter.NEUTRAL
private class SparkShellLoggingFilter extends Filter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could potentially simplify this class by having it extend AbstractFilter instead of Filter:

From AbstractFilter's Javadoc:

/**
 * Users should extend this class to implement filters. Filters can be either context wide or attached to
 * an appender. A filter may choose to support being called only from the context or only from an appender in
 * which case it will only implement the required method(s). The rest will default to return {@link org.apache.logging.log4j.core.Filter.Result#NEUTRAL}.
 * <p>
 * Garbage-free note: the methods with unrolled varargs by default delegate to the
 * {@link #filter(Logger, Level, Marker, String, Object...) filter method with vararg parameters}.
 * Subclasses that want to be garbage-free should override these methods to implement the appropriate filtering
 * without creating a vararg array.
 * </p>
 */
public abstract class AbstractFilter extends AbstractLifeCycle implements Filter {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I will try to rewrite this with AbstractFilter. I think the functionality should be the same but yea it looks better to use AbstractFilter here.

Comment on lines +717 to 720
<!--
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove this commented-out block since it is no longer being used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, they were removed in a followup later.

HyukjinKwon pushed a commit that referenced this pull request Dec 24, 2021
### What changes were proposed in this pull request?

Change log level of tests to WARN

### Why are the changes needed?

After #34895, the log level of tests becomes either "INFO" or "DEBUG", this increases the log output file to over 100MB!
E.g https://github.com/gengliangwang/spark/runs/4623214509?check_suite_focus=true

This makes it difficult to find the CI test failures. Changing the log level of tests to WARN should help.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

It's just conf change.

Closes #35011 from gengliangwang/loglevel.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
@steveloughran
Copy link
Contributor

steveloughran commented Dec 30, 2021

Pretty bold timing. I would have expected private builds to simply ship with a patched version of log4j 1.17, as we do https://mvnrepository.com/artifact/log4j/log4j/1.2.17-cloudera1

Given all projects using log4j 2.x have had to do three emergency releases in the space of the week, you are potentially adding more work in the immediate future. in hadoop we are now looking at logback

important have you made sure any/ all transit to dependencies of log4j 1.x have been excluded, including its SLF4J binding library? there may be some transient dependencies waiting to cause problems

turboFei added a commit to apache/kyuubi that referenced this pull request Jan 26, 2022
<!--
Thanks for sending a pull request!

Here are some tips for you:
  1. If this is your first time, please read our contributor guidelines: https://kyuubi.readthedocs.io/en/latest/community/contributions.html
  2. If the PR is related to an issue in https://github.com/apache/incubator-kyuubi/issues, add '[KYUUBI #XXXX]' in your PR title, e.g., '[KYUUBI #XXXX] Your PR title ...'.
  3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][KYUUBI #XXXX] Your PR title ...'.
-->

### _Why are the changes needed?_
<!--
Please clarify why the changes are needed. For instance,
  1. If you add a feature, you can talk about the use case of it.
  2. If you fix a bug, you can clarify why it is a bug.
-->
This patch proposes to migrate from log4j1 to log4j2 in Kyuubi.
Refer the spark patch: apache/spark#34895

### Does this PR introduce any user-facing change?
Yes. Users may need to rewrite their log4j properties file for log4j2. As of version 2.4, log4j now supports configuration via properties files. Note that the property syntax is not the same as the syntax used in log4j 1, but during the migration I found the syntax is pretty close so the migration should be straightforward.

### _How was this patch tested?_
Passed all existing tests.

Closes #1769 from turboFei/log4j2.

Closes #1769

8613779 [Fei Wang] remove log4j dependencies from spark-sql-engine
b2fe6db [Fei Wang] Use String to present default log level
762e2d0 [Fei Wang] only remove org.apache.logging.log4j:log4j-slf4j-impl
8a91208 [Fei Wang] remove dependencies from spark-sql-engine
7e3a498 [Fei Wang] address commments
051f49f [Fei Wang] address comments
85316a0 [Fei Wang] Keep compatible with log4j12
01d1a84 [Fei Wang] for log4j1
b9e17e1 [Fei Wang] refactor
e24391e [Fei Wang] revert log count
3880300 [Fei Wang] add log4j2.properties.template
4f0b22f [Fei Wang] save
7ce8411 [Fei Wang] modify log level
1ea5ca5 [Fei Wang] add log4j to engine
c4a86d4 [Fei Wang] use AbstractFilter
27b08b6 [Fei Wang] remove more
8cc15ae [Fei Wang] reformat
c13ec29 [Fei Wang] save temporarily
33a38e2 [Fei Wang] exclude log4j12 from spark-sql
9129a64 [Fei Wang] refactor
5362b43 [Fei Wang] make it run at first
7f27f51 [Fei Wang] more
56f4f1f [Fei Wang] fix logging
a74b6d3 [Fei Wang] start appender
dea964a [Fei Wang] fix build erorr at first
e20b750 [Fei Wang] address comments
2ec02b4 [Fei Wang] fix LogDivertAppender
dded129 [Fei Wang] more
c63e000 [Fei Wang] add log4j2.properties

Authored-by: Fei Wang <[email protected]>
Signed-off-by: Fei Wang <[email protected]>
sumwale pushed a commit to TIBCOSoftware/snappy-spark that referenced this pull request Feb 9, 2022
This patch proposes to migrate from log4j1 to log4j2 in Spark.

Apache Spark uses log4j 1.x, so hence is not susceptible to the recent log4j v2 vulnerability. However, log4j 1.x has reached end of life and is no longer supported. Vulnerabilities reported after August 2015 against log4j 1.x were not checked and will not be fixed. As a result, we are recommended to upgrade Spark to Log4j 2 as soon as possible.

Yes. Users may need to rewrite their log4j properties file for log4j2. As of version 2.4, log4j now supports configuration via properties files. Note that the property syntax is not the same as the syntax used in log4j 1, but during the migration I found the syntax is pretty close so the migration should be straightforward.

Passed all existing tests.

Closes apache#34895 from viirya/log4j2.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
jiaoew1991 pushed a commit to Kyligence/spark that referenced this pull request Apr 2, 2022
This patch proposes to migrate from log4j1 to log4j2 in Spark.

Apache Spark uses log4j 1.x, so hence is not susceptible to the recent log4j v2 vulnerability. However, log4j 1.x has reached end of life and is no longer supported. Vulnerabilities reported after August 2015 against log4j 1.x were not checked and will not be fixed. As a result, we are recommended to upgrade Spark to Log4j 2 as soon as possible.

Yes. Users may need to rewrite their log4j properties file for log4j2. As of version 2.4, log4j now supports configuration via properties files. Note that the property syntax is not the same as the syntax used in log4j 1, but during the migration I found the syntax is pretty close so the migration should be straightforward.

Passed all existing tests.

Closes apache#34895 from viirya/log4j2.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
asiunov pushed a commit to ascend-io/spark that referenced this pull request Aug 25, 2022
### What changes were proposed in this pull request?

This patch proposes to migrate from log4j1 to log4j2 in Spark.

### Why are the changes needed?

Apache Spark uses log4j 1.x, so hence is not susceptible to the recent log4j v2 vulnerability. However, log4j 1.x has reached end of life and is no longer supported. Vulnerabilities reported after August 2015 against log4j 1.x were not checked and will not be fixed. As a result, we are recommended to upgrade Spark to Log4j 2 as soon as possible.

### Does this PR introduce _any_ user-facing change?

Yes. Users may need to rewrite their log4j properties file for log4j2. As of version 2.4, log4j now supports configuration via properties files. Note that the property syntax is not the same as the syntax used in log4j 1, but during the migration I found the syntax is pretty close so the migration should be straightforward.

### How was this patch tested?

Passed all existing tests.

Closes apache#34895 from viirya/log4j2.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

(cherry picked from commit ca5cabe)

appender.console.type = Console
appender.console.name = STDOUT
appender.console.target = SYSTEM_OUT
Copy link
Contributor

@Kimahriman Kimahriman Sep 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random thing I just started noticing, the default logging now goes to stdout instead of stderr, was that intentional? The log4j2.properties.template uses stderr too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

rootLogger.level = info
rootLogger.appenderRef.stdout.ref = STDOUT
appender.console.type = Console
appender.console.name = STDOUT
appender.console.target = SYSTEM_OUT
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n%ex

look different, WDYT? @viirya

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be when I copied across properties file and missed to change.

Copy link
Contributor

@LuciferYang LuciferYang Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

roughly scanned, seems only this place changes from stderr to stdout, #37854 try fix this

@viirya viirya deleted the log4j2 branch December 27, 2023 18:25
@viirya viirya restored the log4j2 branch December 27, 2023 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.