[SPARK-52458][CORE] Support spark.eventLog.excludedPatterns
#51163
I addressed your comments.
QQ: Is it possible that logs become unable to be rendered if some events are missing?
Of course, yes. Users need to provide a meaningful configuration. For example, event names like the following should not be used. As you can see in the PR description, those also survive compaction. A meaningful and intuitive set of configurations is the Spark UI ones in the PR description, like the following. Also, they should be excluded together. For example, In addition, this can be used to prevent the user-defined
yaooqinn left a comment
+1, LGTM
Thank you, @yaooqinn!
Merged to master for Apache Spark 4.1.0.
### What changes were proposed in this pull request?

This PR aims to document newly added `core` module configurations as a part of Apache Spark 4.1.0 preparation.

### Why are the changes needed?

To help the users use new features easily.

- #47856
- #51130
- #51163
- #51604
- #51630
- #51708
- #51885
- #52091
- #52382

### Does this PR introduce _any_ user-facing change?

No behavior change because this is a documentation update.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52626 from dongjoon-hyun/SPARK-53926.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This PR aims to support `spark.eventLog.excludedPatterns` to exclude specific SparkEvents. This has two goals.
logEvent.
spark/common/utils/src/main/scala/org/apache/spark/scheduler/SparkListenerEvent.scala
Lines 25 to 28 in 68136fd
Why are the changes needed?
Historically, Apache Spark has provided multiple ways to manage event logs to save storage cost.
- `spark.history.fs.cleaner.maxAge`: Delete old Spark jobs by age
- `spark.history.fs.cleaner.maxNum`: Delete old Spark jobs by the total number of jobs
- `spark.history.fs.eventLog.rolling.maxFilesToRetain`: Decompress + Compact + Compress back

For example, after compaction, Spark event logs only have the following.
This PR aims to provide a simple alternative to allow the users to skip specific Spark events completely.
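To illustrate the idea of skipping events by name, here is a minimal Python sketch. It is not the code in this PR. It assumes that the setting is a comma-separated list of regular expressions and that each one is matched against the whole event name; the actual matching rules are defined by the PR's implementation. `Event`, `parse_excluded_patterns`, `should_log`, `filter_events`, and the `MyApp...` event names are hypothetical and used only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Event:
    """Hypothetical stand-in for a Spark listener event; only the name matters here."""
    name: str


def parse_excluded_patterns(value: str) -> List[re.Pattern]:
    # Assumption: the value is a comma-separated list of regular expressions.
    # Empty entries (including an empty setting) are ignored.
    return [re.compile(p.strip()) for p in value.split(",") if p.strip()]


def should_log(event: Event, excluded: Iterable[re.Pattern]) -> bool:
    # Assumption: an event is skipped when any pattern matches its whole name.
    return not any(p.fullmatch(event.name) for p in excluded)


def filter_events(events: Iterable[Event], excluded_value: str) -> List[Event]:
    # Keep only the events that are not excluded by the configured patterns.
    excluded = parse_excluded_patterns(excluded_value)
    return [e for e in events if should_log(e, excluded)]


if __name__ == "__main__":
    events = [
        Event("SparkListenerJobStart"),
        Event("MyAppHeartbeatEvent"),  # hypothetical user-defined event
        Event("MyAppDebugEvent"),      # hypothetical user-defined event
    ]
    for e in filter_events(events, "MyApp.*Event"):
        print(e.name)
```

The sketch filters before anything is written, so excluded events never reach storage. That differs from the cleaner and compaction settings above, which act on logs that already exist.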
Does this PR introduce any user-facing change?
No. This is a new feature.
How was this patch tested?
Pass the CIs with the newly added test case.
Was this patch authored or co-authored using generative AI tooling?
No.