-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-30060][CORE] Rename metrics enable/disable configs #26692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-30060][CORE] Rename metrics enable/disable configs #26692
Conversation
|
ok to test |
|
|
||
| val STREAMING_METRICS_ENABLED = | ||
| buildConf("spark.sql.streaming.metricsEnabled") | ||
| buildConf("spark.metrics.sparkStreamingSource.enabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a breaking change to rename an already released config. Can we leave it as it is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, what about renaming together at 3.0? We still can support legacy config names together, cannot we?
Also, cc @gatorsmile since he suggest renaming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old config would have to be deprecated and be an alias for the new one, at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. +1 for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fully agree on the need to deprecate the old parameter and adding an alias.
Another point I see about renaming spark.sql.streaming.metricsEnabled to spark.metrics.sparkStreamingSource.enabled, is that the configuration lives in [[SQLConf]] where all configuration parameters appear to have spark.sql as their prefix. For example, would using spark.sql.metrics.sparkStreamingSource.enabled be a better choice? I see pros and cons to that. Any advice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the voice of name name here: we may want to differentiate DStream and Structured Streaming. Before the change it is clear that it's for Structured Streaming (sql.streaming) which new name is not clear about that. If we want to only mention streaming it should also mention sql. If we are OK to have structuredStreaming as a part of name, that's fine.
|
Test build #114531 has finished for PR 26692 at commit
|
| .createWithDefault(true) | ||
|
|
||
| val STREAMING_METRICS_ENABLED = | ||
| buildConf("spark.sql.streaming.metricsEnabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to change this existing naming space, you should change all others consistently, e.g. spark.sql.streaming.fileSource.schema.forceNullable. sql.streaming looks pretty clear to me.
You should also consider moving from SQLConf to SparkConf if you want to change the name space consistently. (e.g., spark.pyspark vs spark.sql...pyspark)
It doesn't look like worth changing to me given the overhead. Let's keep this name space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that changing spark.sql.streaming.metricsEnabled appears to open a can of worms. Maybe we should just stick to the other 2 parameters that are easier to change.
dd089f0 to
852e865
Compare
|
Test build #114622 has finished for PR 26692 at commit
|
| .intConf | ||
| .createWithDefault(Int.MaxValue) | ||
|
|
||
| val APP_STATUS_METRICS_ENABLED = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we are consolidating this, shall we update APP_STATUS_METRICS_ENABLED -> METRICS_APP_STATUS_SOURCE_ENABLED together more consistently?
HyukjinKwon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me too except @dongjoon-hyun's comment. It should be best to keep PR description updated.
|
Test build #114756 has finished for PR 26692 at commit
|
|
retest this please |
|
Test build #114763 has finished for PR 26692 at commit
|
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Merged to master.
Thank you all!
### What changes were proposed in this pull request? This proposes to introduce a naming convention for Spark metrics configuration parameters used to enable/disable metrics source reporting using the Dropwizard metrics library: `spark.metrics.sourceNameCamelCase.enabled` and update 2 parameters to use this naming convention. ### Why are the changes needed? Currently Spark has a few parameters to enable/disable metrics reporting. Their naming pattern is not uniform and this can create confusion. Currently we have: `spark.metrics.static.sources.enabled` `spark.app.status.metrics.enabled` `spark.sql.streaming.metricsEnabled` ### Does this PR introduce any user-facing change? Update parameters for enabling/disabling metrics reporting new in Spark 3.0: `spark.metrics.static.sources.enabled` -> `spark.metrics.staticSources.enabled`, `spark.app.status.metrics.enabled` -> `spark.metrics.appStatusSource.enabled`. Note: `spark.sql.streaming.metricsEnabled` is left unchanged as it is already in use in Spark 2.x. ### How was this patch tested? Manually tested Closes apache#26692 from LucaCanali/uniformNamingMetricsEnableParameters. Authored-by: Luca Canali <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This proposes to introduce a naming convention for Spark metrics configuration parameters used to enable/disable metrics source reporting using the Dropwizard metrics library:
spark.metrics.sourceNameCamelCase.enabledand update 2 parameters to use this naming convention.Why are the changes needed?
Currently Spark has a few parameters to enable/disable metrics reporting. Their naming pattern is not uniform and this can create confusion. Currently we have:
spark.metrics.static.sources.enabledspark.app.status.metrics.enabledspark.sql.streaming.metricsEnabledDoes this PR introduce any user-facing change?
Update parameters for enabling/disabling metrics reporting new in Spark 3.0:
spark.metrics.static.sources.enabled->spark.metrics.staticSources.enabled,spark.app.status.metrics.enabled->spark.metrics.appStatusSource.enabled.Note:
spark.sql.streaming.metricsEnabledis left unchanged as it is already in use in Spark 2.x.How was this patch tested?
Manually tested