[SPARK-33504][CORE] The application log in the Spark history server contains sensitive attributes should be redacted #30446
Redaction support looks kind of inconsistent. The current PR takes the approach of … But then, all of these were last changed four years or so ago (#15971).
ok to test

core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala

Kubernetes integration test starting

Kubernetes integration test status success

Test build #131441 has finished for PR 30446 at commit

Test build #131466 has finished for PR 30446 at commit

Test build #131467 has finished for PR 30446 at commit

Test build #131468 has finished for PR 30446 at commit

Test build #131469 has finished for PR 30446 at commit

Test build #131471 has finished for PR 30446 at commit

Test build #131473 has finished for PR 30446 at commit

Kubernetes integration test starting

Kubernetes integration test status failure

Test build #131499 has finished for PR 30446 at commit

@mridulm thanks for your great advice. Please review it again when you have time.

So both of the configs in the JIRA look like they are user-generated configs. It sounds like you updated the spark.redaction.regex config, but that didn't work for these events. It seems like we just missed these, so thanks for catching this. I think the current approach makes sense and matches our previous implementations.

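The regex-based redaction mentioned above can be illustrated with a minimal sketch. According to the Spark configuration docs, when `spark.redaction.regex` matches a property key or value, the value is redacted. The Scala below mimics that rule on plain key/value pairs; the `RedactionSketch` object, the sample pattern, and the replacement string are assumptions for illustration, not Spark's internal API.

```scala
import scala.util.matching.Regex

object RedactionSketch {
  // Assumed placeholder text; not taken from Spark's source.
  val Replacement = "*********(redacted)"

  // Replace the value of a pair when the pattern matches either its key
  // or its value, mirroring the documented spark.redaction.regex semantics.
  def redact(pattern: Regex, kvs: Seq[(String, String)]): Seq[(String, String)] =
    kvs.map { case (k, v) =>
      if (pattern.findFirstIn(k).isDefined || pattern.findFirstIn(v).isDefined) (k, Replacement)
      else (k, v)
    }
}
```

For example, with the pattern `(?i)secret|password`, the pair `spark.ssl.keyPassword -> hunter2` comes back with its value replaced (the key matches case-insensitively), while `spark.app.name -> etl` passes through unchanged. Checking the value as well is what lets a secret stored under an innocuous key name be caught.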
Test build #131591 has finished for PR 30446 at commit

@tgravescs The question I had was about when redaction is applied: while logging the event, or when surfacing it in the UI/CLI/etc. I am not sure whether this was done intentionally (allowing a superuser to look at the unredacted properties, and so …). Thoughts? Thanks.

Sorry for my delay, I'm behind on reviews. I think we should redact both in the UI/REST API and in event logging, as people may not realize the event log contains the password, and the history server could let people who shouldn't see it view it. If people want different behavior, I would say we add a config to allow it but have it off by default, but I would wait until someone requests this. I think it was an oversight that these events weren't redacted, so if you see other cases where we aren't redacting, we should fix them.

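Redacting at event-logging time, as suggested above, could look roughly like the sketch below. `JobStart` and `EventLogRedaction` are hypothetical stand-ins so the example runs without Spark (Spark's own `SparkListenerJobStart` and `SparkListenerStageSubmitted` do carry a `java.util.Properties` field named `properties`). The logger persists a redacted copy of the event, so in-process listeners still see the original values while the event log, and therefore the history server, never stores the secret.

```scala
import java.util.Properties
import scala.util.matching.Regex

// Hypothetical, simplified stand-in for a job-start listener event.
final case class JobStart(jobId: Int, properties: Properties)

object EventLogRedaction {
  // Assumed placeholder text; not taken from Spark's source.
  val Replacement = "*********(redacted)"

  // Build a redacted copy; the caller's Properties object is never mutated.
  def redactForLog(pattern: Regex, props: Properties): Properties = {
    if (props == null) return null // events may carry no properties at all
    val copy = new Properties()
    val it = props.stringPropertyNames().iterator()
    while (it.hasNext) {
      val key = it.next()
      val value = props.getProperty(key)
      val sensitive = pattern.findFirstIn(key).isDefined || pattern.findFirstIn(value).isDefined
      copy.setProperty(key, if (sensitive) Replacement else value)
    }
    copy
  }

  // What a logger would write: the same event with redacted properties.
  def toLoggedEvent(pattern: Regex, event: JobStart): JobStart =
    event.copy(properties = redactForLog(pattern, event.properties))
}
```

Because `toLoggedEvent` only builds a new `Properties` object, the original event handed to other listeners is left untouched.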
Thanks for clarifying, Tom! Sounds good to me. We can do follow-up work to redact other events as well, at event logging time.

@mridulm @tgravescs thanks for your review!

@mridulm Does this need someone else to continue reviewing, or can someone help me merge the code? Thanks!

I will merge this shortly |
…ontains sensitive attributes should be redacted

### What changes were proposed in this pull request?
To make sure sensitive attributes are redacted in the history server log.

### Why are the changes needed?
We found that secure attributes such as passwords in SparkListenerJobStart and SparkListenerStageSubmitted events were not redacted, so sensitive attributes could be viewed directly.
The screenshot can be viewed in the attachment of JIRA SPARK-33504.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual testing works well; I have also added a unit test case.

Closes #30446 from akiyamaneko/eventlog_unredact.

Authored-by: neko <[email protected]>
Signed-off-by: Thomas Graves <[email protected]>
(cherry picked from commit 28dad1b)
Signed-off-by: Thomas Graves <[email protected]>

thanks @akiyamaneko @mridulm, merged to master and branch-3.0

Hi, @tgravescs.

Could you take a look, please? I guess we need to recover branch-3.0 first by reverting it, and then make a PR to

yeah, let's revert first

#30576 is the revert. @akiyamaneko, could you port this to branch-3.0 and put up a PR?
…server contains sensitive attributes should be redacted"

### What changes were proposed in this pull request?
Revert SPARK-33504 on branch-3.0 because of a compilation error. Original PR #30446

This reverts commit e59179b.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #30576 from tgravescs/revert33504.

Authored-by: Thomas Graves <[email protected]>
Signed-off-by: Thomas Graves <[email protected]>

@akiyamaneko do you have time to backport?

@tgravescs Sorry for the delay. Is there anything I can do?

I don't think anyone has backported it yet, so if you have time it would be great to put up a version against branch-3.0.
…ver contains sensitive attributes should be redacted

### What changes were proposed in this pull request?
To make sure sensitive attributes are redacted in the history server log. This is the backport of original PR #30446.

### Why are the changes needed?
We found that secure attributes such as passwords in SparkListenerJobStart and SparkListenerStageSubmitted events were not redacted, so sensitive attributes could be viewed directly.
The screenshot can be viewed in the attachment of JIRA SPARK-33504.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit test.

Closes #31631 from viirya/SPARK-33504-3.0.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
What changes were proposed in this pull request?
To make sure sensitive attributes are redacted in the history server log.
Why are the changes needed?
We found that secure attributes such as passwords in SparkListenerJobStart and SparkListenerStageSubmitted events were not redacted, so sensitive attributes could be viewed directly.
The screenshot can be viewed in the attachment of JIRA SPARK-33504.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Manual testing works well; I have also added a unit test case.