Skip to content

Conversation

@hvanhovell
Copy link
Contributor

What changes were proposed in this pull request?

The explain output of DataSourceScanExec can contain sensitive information (like Amazon keys). Such information should not end up in logs, or be exposed to non privileged users.

This PR addresses this by adding a redaction facility for the DataSourceScanExec.treeString. A user can enable this by setting a regex in the spark.redaction.string.regex configuration.

How was this patch tested?

Added a unit test to check the output of DataSourceScanExec.

@hvanhovell
Copy link
Contributor Author

cc @liufengdb

@rxin
Copy link
Contributor

rxin commented Mar 23, 2017

LGTM

@SparkQA
Copy link

SparkQA commented Mar 23, 2017

Test build #75093 has finished for PR 17397 at commit 2648ce5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liufengdb
Copy link

lgtm


/**
* Shorthand for calling redactString() without specifying redacting rules
* */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: * */ -> */

@gatorsmile
Copy link
Member

gatorsmile commented Mar 23, 2017

Maybe we can verify the user input or capture the exception? If users input incorrect value, it could get a confusing error message later.

Unclosed group near index 19
spark-~!@#$%^^&*(12
                   ^
java.util.regex.PatternSyntaxException: Unclosed group near index 19
spark-~!@#$%^^&*(12
                   ^
	at java.util.regex.Pattern.error(Pattern.java:1955)
	at java.util.regex.Pattern.accept(Pattern.java:1813)
	at java.util.regex.Pattern.group0(Pattern.java:2908)
	at java.util.regex.Pattern.sequence(Pattern.java:2051)
	at java.util.regex.Pattern.expr(Pattern.java:1996)
	at java.util.regex.Pattern.compile(Pattern.java:1696)
	at java.util.regex.Pattern.<init>(Pattern.java:1351)
	at java.util.regex.Pattern.compile(Pattern.java:1028)
	at scala.util.matching.Regex.<init>(Regex.scala:191)
	at scala.collection.immutable.StringLike$class.r(StringLike.scala:255)
	at scala.collection.immutable.StringOps.r(StringOps.scala:29)
	at scala.collection.immutable.StringLike$class.r(StringLike.scala:244)
	at scala.collection.immutable.StringOps.r(StringOps.scala:29)
	at org.apache.spark.util.Utils$.redact(Utils.scala:2605)
	at org.apache.spark.sql.execution.DataSourceScanExec$class.org$apache$spark$sql$execution$DataSourceScanExec$$redact(DataSourceScanExec.scala:69)
	at org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$3.apply(DataSourceScanExec.scala:53)
	at org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$3.apply(DataSourceScanExec.scala:51)

@SparkQA
Copy link

SparkQA commented Mar 23, 2017

Test build #75100 has finished for PR 17397 at commit 2c160ea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

Thanks! @hvanhovell Could we also output the conf name?

spark-[{. is not a valid regex
java.lang.IllegalArgumentException: spark-[{. is not a valid regex
	at org.apache.spark.internal.config.ConfigHelpers$.regexFromString(ConfigBuilder.scala:74)
	at org.apache.spark.internal.config.ConfigBuilder$$anonfun$regexConf$1.apply(ConfigBuilder.scala:228)
	at org.apache.spark.internal.config.ConfigBuilder$$anonfun$regexConf$1.apply(ConfigBuilder.scala:228)
	at scala.Option.map(Option.scala:146)
	at org.apache.spark.internal.config.ConfigEntryWithDefault.readFrom(ConfigEntry.scala:79)
	at org.apache.spark.SparkConf.get(SparkConf.scala:259)
	at org.apache.spark.util.Utils$.redact(Utils.scala:2604)
	at org.apache.spark.sql.execution.DataSourceScanExec$class.org$apache$spark$sql$execution$DataSourceScanExec$$redact(DataSourceScanExec.scala:69)

@SparkQA
Copy link

SparkQA commented Mar 24, 2017

Test build #75166 has finished for PR 17397 at commit cff62f6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

LGTM pending Jenkins

@SparkQA
Copy link

SparkQA commented Mar 24, 2017

Test build #75174 has finished for PR 17397 at commit 73cd8c4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

Thanks! Merging to master.

@asfgit asfgit closed this in 91fa80f Mar 24, 2017
@ueshin
Copy link
Member

ueshin commented Mar 24, 2017

@hvanhovell This seems to have broken the 2.10 build:

[error] /home/jenkins/workspace/spark-master-compile-maven-scala-2.10/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala:228: value regex is not a member of scala.util.matching.Regex
[error]     new TypedConfigBuilder(this, regexFromString(_, this.key), _.regex)
[error]                                                                  ^

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-scala-2.10/4004/console

@hvanhovell
Copy link
Contributor Author

@ueshin here you go: #17420

@ueshin
Copy link
Member

ueshin commented Mar 25, 2017

I see, thanks!

@ueshin
Copy link
Member

ueshin commented Mar 27, 2017

@gatorsmile
Copy link
Member

Sorry, I did not catch it. Let me submit a follow-up to fix this test case issue.

@gatorsmile
Copy link
Member

Submitted the PR #17448. Please check whether the fix is appropriate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants