[SPARK-31923][Core] Ignore internal accumulators that use unrecognized types rather than crashing #28744
Closed
zsxwing wants to merge 1 commit into apache:master
Conversation
Test build #123592 has finished for PR 28744 at commit
Member
Author
cc @tdas @cloud-fan
Contributor
LGTM. We should backport this to branch-3.0 as well, since it is a good, narrow bug fix with low risk.
Member
Author
Thanks! I will also merge this to branch-2.4 for the same reason.
asfgit pushed a commit that referenced this pull request on Jun 8, 2020
…d types rather than crashing

### What changes were proposed in this pull request?

Ignore internal accumulators that use unrecognized types rather than crashing so that an event log containing such accumulators can still be converted to JSON and logged.

### Why are the changes needed?

A user may use internal accumulators by adding the `internal.metrics.` prefix to the accumulator name to hide sensitive information from the UI (all accumulators except internal ones are shown in the Spark UI).

However, `org.apache.spark.util.JsonProtocol.accumValueToJson` assumes an internal accumulator has only 3 possible types: `int`, `long`, and `java.util.List[(BlockId, BlockStatus)]`. When an internal accumulator uses an unexpected type, `accumValueToJson` crashes.

An event log that contains such an accumulator will be dropped because it cannot be converted to JSON, which causes confusing UI issues when it is rendered in the Spark History Server. For example, if `SparkListenerTaskEnd` is dropped because of this issue, the user will see the task as still running even though it has finished.

It's better to make `accumValueToJson` more robust because the accumulator name is chosen by the user.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

The new unit tests.

Closes #28744 from zsxwing/fix-internal-accum.

Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
(cherry picked from commit b333ed0)
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
Member
Author
Merged to master and branch-3.0. There is a minor conflict with branch-2.4, so I will submit a backport PR.
zsxwing added a commit to zsxwing/spark that referenced this pull request on Jun 8, 2020
…d types rather than crashing
asfgit pushed a commit that referenced this pull request on Jun 8, 2020
…d types rather than crashing (branch-2.4)

### What changes were proposed in this pull request?

Backport #28744 to branch-2.4.

### Why are the changes needed?

Low-risk fix for branch-2.4.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit tests.

Closes #28758 from zsxwing/SPARK-31923-2.4.

Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
wankunde pushed a commit to wankunde/spark that referenced this pull request on Mar 1, 2021
…d types rather than crashing
### What changes were proposed in this pull request?

Ignore internal accumulators that use unrecognized types rather than crashing so that an event log containing such accumulators can still be converted to JSON and logged.

### Why are the changes needed?

A user may use internal accumulators by adding the `internal.metrics.` prefix to the accumulator name to hide sensitive information from the UI (all accumulators except internal ones are shown in the Spark UI).

However, `org.apache.spark.util.JsonProtocol.accumValueToJson` assumes an internal accumulator has only 3 possible types: `int`, `long`, and `java.util.List[(BlockId, BlockStatus)]`. When an internal accumulator uses an unexpected type, `accumValueToJson` crashes.

An event log that contains such an accumulator will be dropped because it cannot be converted to JSON, which causes confusing UI issues when it is rendered in the Spark History Server. For example, if `SparkListenerTaskEnd` is dropped because of this issue, the user will see the task as still running even though it has finished.

It's better to make `accumValueToJson` more robust because the accumulator name is chosen by the user.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

The new unit tests.
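
To make the intended behavior concrete, here is a minimal Python sketch of a tolerant serializer. It is a model of the idea only, not the Scala code in `JsonProtocol`. The `internal.metrics.` prefix comes from the description above. Everything else is an assumption made for illustration: the function names, the `SKIP` sentinel, the JSON field names, and the choice to render non-internal accumulators as strings.

```python
import json
from typing import Any, Optional

# Prefix named in the PR description; all other names here are hypothetical.
METRICS_PREFIX = "internal.metrics."

# Sentinel meaning "leave this value out of the JSON output".
SKIP = object()


def accum_value_to_json(name: Optional[str], value: Any) -> Any:
    """Encode an accumulator value, ignoring unrecognized internal types."""
    if name is None or not name.startswith(METRICS_PREFIX):
        # Assumption: non-internal accumulators are rendered as strings.
        return str(value)
    if isinstance(value, bool):
        # bool subclasses int in Python; it is not one of the expected types.
        return SKIP
    if isinstance(value, int):
        # Stands in for the `int` and `long` cases.
        return value
    if isinstance(value, list):
        # Stands in for the list of (BlockId, BlockStatus) pairs;
        # elements of any other shape are ignored rather than raising.
        return [
            {"Block ID": str(item[0]), "Status": str(item[1])}
            for item in value
            if isinstance(item, tuple) and len(item) == 2
        ]
    # Unrecognized type: skip it instead of failing the whole event.
    return SKIP


def accumulable_to_json(acc_id: int, name: Optional[str], value: Any) -> dict:
    """Build one accumulator entry, omitting a value that was skipped."""
    entry = {"ID": acc_id, "Name": name}
    encoded = accum_value_to_json(name, value)
    if encoded is not SKIP:
        entry["Value"] = encoded
    return entry


if __name__ == "__main__":
    # An internal accumulator holding an unexpected type still serializes.
    print(json.dumps(accumulable_to_json(1, "internal.metrics.secret", {"k": "v"})))
    print(json.dumps(accumulable_to_json(2, "internal.metrics.count", 42)))
```

In this sketch the entry for the unrecognized accumulator is still emitted, only without a value, so the enclosing event can be written to the log instead of being dropped.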