[SPARK-34445][SQL][DOCS] Make spark.sql.legacy.replaceDatabricksSparkAvro.enabled as non-internal
#31571
Conversation
There are other legacy configs that are non-internal:
Though I am not sure why the configs above are not internal; maybe it is a mistake, because they are not mentioned in the public docs. @HyukjinKwon WDYT?
@HyukjinKwon @gengliangwang @cloud-fan Could you take a look at this, please?
Test build #135169 has started for PR 31571 at commit
Given that this has been there since 2.4.0, I'm not sure why Apache Spark 3.2.0 suddenly requires this. I'm a little reluctant to advertise this option more explicitly on the Apache Spark side.
This additional exposure puts us in a more difficult situation when we want to remove this completely from the Apache Spark codebase in the future. Eventually, the Apache Spark Avro data source should be the best and de facto standard.
FYI, I found this config while documenting other "internal" configs, see #31564 (review).
It has already been explicitly advertised in the public docs:
Do you consider a situation when you could just remove this config w/o deprecation, only because it is marked as internal? From my point of view, even if the config is "internal" de jure, it is external de facto. In this situation, it cannot just be removed from users silently. I do believe we should make it external de jure as well, so that in the future it can be removed only via deprecation; otherwise we could potentially break users' apps. @HyukjinKwon @cloud-fan WDYT?
No, I don't.
No. I disagree with the expression,
Like https://github.com/databricks/spark-avro says, only
    .createWithDefault(false)

  val LEGACY_REPLACE_DATABRICKS_SPARK_AVRO_ENABLED =
    buildConf("spark.sql.legacy.replaceDatabricksSparkAvro.enabled")
@gengliangwang do we need this conf? We do the same thing in com.databricks.spark.csv already by default internally. We could just make it exposed and deprecate this config.
In this way, I think it will address most of the concerns raised.
And then we could remove both the com.databricks.spark.csv and com.databricks.spark.avro fallbacks together in the future, of course (Spark 4.0?). I don't think it makes sense to keep both mappings forever.
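A simplified sketch of the kind of class-name fallback mapping discussed above; the map contents and lookup helper are illustrative only, not the actual provider-resolution code in Spark:

```scala
// Illustrative only: map legacy provider names to the built-in source names.
val backwardCompatibilityMap: Map[String, String] = Map(
  "com.databricks.spark.csv"  -> "csv",
  "com.databricks.spark.avro" -> "avro"
)

// Resolve a user-supplied format name; the Avro fallback is applied only when
// the legacy flag is enabled, mirroring the behavior under discussion.
def resolveProvider(name: String, avroFallbackEnabled: Boolean): String = name match {
  case "com.databricks.spark.avro" if !avroFallbackEnabled => name
  case other => backwardCompatibilityMap.getOrElse(other, other)
}
```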
Let's deprecate it and then remove it in later versions. It's OK if a deprecated internal config is mentioned in public docs for migration purposes.
Here is the PR #31578 to deprecate the config.
Thank you all!
All legacy configs should be internal. These two are very likely mistakes.
We already marked all legacy configs as internal before the 3.0 release, see #27448. Those two configs were probably added after that commit. I opened PR #31577 with a test that checks that all legacy SQL configs are internal.
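A rough sketch of what such a check could look like; the registry accessor (sqlConfEntries) and the isPublic flag are assumptions for illustration and may not match the actual test in #31577:

```scala
import scala.collection.JavaConverters._
import org.apache.spark.sql.internal.SQLConf

// Hypothetical check: every registered spark.sql.legacy.* entry should be
// marked internal (i.e. not public / not listed in the generated docs).
val legacyEntries = SQLConf.sqlConfEntries.values.asScala
  .filter(_.key.startsWith("spark.sql.legacy."))

legacyEntries.foreach { entry =>
  assert(!entry.isPublic, s"${entry.key} is expected to be an internal config")
}
```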
Since #31578 is merged, I'll close this one.
What changes were proposed in this pull request?
Remove `.internal()` from the SQL legacy config `spark.sql.legacy.replaceDatabricksSparkAvro.enabled`.
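A sketch of the entry as defined in SQLConf, with the doc string and any version call omitted; the only change proposed here is dropping the `.internal()` builder call:

```scala
// Inside object SQLConf, following the usual buildConf(...) builder pattern.
val LEGACY_REPLACE_DATABRICKS_SPARK_AVRO_ENABLED =
  buildConf("spark.sql.legacy.replaceDatabricksSparkAvro.enabled")
    // .internal()   <- removed by this PR so the entry is included in the
    //                  generated public SQL configuration docs
    .doc("...")      // original doc text unchanged, omitted here
    .booleanConf
    .createWithDefault(true)
```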
Why are the changes needed?
In fact, the SQL config `spark.sql.legacy.replaceDatabricksSparkAvro.enabled` has already been documented publicly, see http://spark.apache.org/docs/latest/sql-data-sources-avro.html. So, it cannot be considered an internal config.

Does this PR introduce any user-facing change?
This updates the list of auto-generated SQL configs.
How was this patch tested?
By generating docs: