[SPARK-39700][SQL][DOCS] Update two-parameter listColumns/getTable/getFunction/tableExists/functionExists functions docs to mention limitation
#37105
…nctionName). Such API should use the version with single tableName/FunctionName parameter.
|
R: @cloud-fan |
I don't think this deprecation is technically required at Apache Spark 3.4, @amaliujia .
Given the 3 layer namespace support in the Catalog API, an API that takes, for example, tableName can now accept a.b.c.
In addition, the Apache Spark community does not delete APIs at all these days. I'm not sure about the benefit of the deprecation proposed by this PR. Are there any other reasons for the deprecation?
|
@dongjoon-hyun we deprecated |
|
I sent out a discussion about deprecating a trigger to the dev@ mailing list. That said, what about initiating a discussion on the dev@ mailing list so that any reasonable objections on the topic can be raised?
|
Can one of the admins verify this patch? |
|
I can drive a discussion in dev@ for this deprecation idea. |
|
Do you think we have reached a consensus that this will be an educational deprecation?
|
Did you talk with @cloud-fan ? |
|
We shared conflicting opinions, but haven't reached an agreement on the API deprecation at the community level yet.
|
Let's do a "soft deprecation": explain the limitation of the current API and suggest that users use the alternatives in the API doc, but do not use the Java deprecation annotation.
|
Thank you for the decision, @cloud-fan . I believe it makes sense in this case. |
|
Thanks all! I will update this PR and the dev@ email thread to reflect the decision. |
|
Thank you for your patience and leading this discussion, @amaliujia . |
|
I updated this PR to reflect our discussion. |
dongjoon-hyun
left a comment
+1, LGTM. Thank you for your patience, @amaliujia .
Merged to master for Apache Spark 3.4.
|
Thank you all for your review! |
/**
 * Returns a list of columns for the given table/view in the specified database.
 *
 * This API does not support 3 layer namespace since 3.4.0. To use 3 layer namespace,
3 layer namespace is a bit confusing, how about
This API does not support specifying the catalog name. To specify the catalog name, please use
`listColumns(qualifiedTableNameWithCatalog)` instead.
Just a question to @cloud-fan because I agree with you that that naming is confusing.
3 layer namespace is a bit confusing
I've monitored many commits in the community.
a2c1038031 [SPARK-39579][SQL][PYTHON][R] Make ListFunctions/getFunction/functionExists compatible with 3 layer namespace
6e7a571532 [SPARK-39649][PYTHON] Make listDatabases / getDatabase / listColumns / refreshTable in PySpark support 3-layer-namespace
cbb4e7da69 [SPARK-39646][SQL] Make setCurrentDatabase compatible with 3 layer namespace
b0d297c6d1 [SPARK-39645][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace
8c02823b49 [SPARK-39583][SQL] Make RefreshTable be compatible with 3 layer namespace
ed1a3402d2 [SPARK-39598][PYTHON] Make *cache*, *catalog* in the python side support 3-layer-namespace
c1106fbe22 [SPARK-39597][PYTHON] Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace
1f15f2c6ad [SPARK-39615][SQL] Make listColumns be compatible with 3 layer namespace
b2d249b1aa [SPARK-39555][PYTHON] Make createTable and listTables in the python side support 3-layer-namespace
ca5f7e6c35 [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace
cb55efadea [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace
Comments like this.
spark/python/pyspark/sql/catalog.py
Line 464 in 0cc96f7
multi-layer-namespace identifier, then try to ``tableName`` as a normal table
Even in function naming like this.
private def getTable3LNamespace(tableName: String): Table = {
I believe we need a naming rule for this to promote new naming or demote it by preventing further usage. Which way do you prefer, @cloud-fan ?
3 layer name is more common in traditional databases as the identifier is 3 parts: catalog.schema.name. But Spark is more general and the identifier has n parts: catalog.ns1.ns2....name.
I think qualifiedNameWithCatalog is more accurate.
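To make the n-part identifier shape concrete, here is a minimal Python sketch that splits a name of the form catalog.ns1.ns2....name into its pieces. It is an illustration only, not Spark's parser: `QualifiedName` and `split_qualified_name` are hypothetical names, the first part is assumed to always be the catalog, and quoted parts that contain dots are not handled.

```python
from typing import NamedTuple, Tuple


class QualifiedName(NamedTuple):
    """Hypothetical container for a name that is qualified with its catalog."""
    catalog: str
    namespace: Tuple[str, ...]  # zero or more namespace levels
    name: str


def split_qualified_name(identifier: str) -> QualifiedName:
    """Split 'catalog.ns1.ns2.name' into catalog, namespace levels, and name.

    Assumptions (not taken from Spark): the first part is always the catalog,
    and parts are separated by plain dots with no quoting.
    """
    parts = identifier.split(".")
    if len(parts) < 2 or any(not part for part in parts):
        raise ValueError(f"not a catalog-qualified name: {identifier!r}")
    return QualifiedName(parts[0], tuple(parts[1:-1]), parts[-1])


# A traditional 3-part name has exactly one namespace level.
three_part = split_qualified_name("spark_catalog.default.t")
# The general form allows any number of namespace levels.
n_part = split_qualified_name("cat.ns1.ns2.tbl")
```

In this sketch the 3-part name is the special case where `namespace` has length one, which is why a term such as qualifiedNameWithCatalog covers more than "3 layer namespace" does.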
let me clean the naming up in a followup PR
Thank you so much, @cloud-fan !
Thank you for this discussion!
I'm working on it: #37287
Thank you, @cloud-fan .
What changes were proposed in this pull request?
This PR aims to update the docs of the two-parameter `listColumns`/`getTable`/`getFunction`/`tableExists`/`functionExists` functions to mention the limitation. To use a 3 layer namespace, users can use the single-parameter functions.
Why are the changes needed?
We can support existing users without any overhead and advertise the new 3 layer namespace API at the same time.
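The limitation can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not Spark code: `ToyCatalogs`, `table_exists_in_db`, and `table_exists` are hypothetical stand-ins for the two-parameter and single-parameter forms, and resolving the two-parameter form against one fixed catalog (named `spark_catalog` here) is an assumption made for the example.

```python
from typing import Dict, Set, Tuple


class ToyCatalogs:
    """Hypothetical in-memory stand-in; this is not Spark's Catalog class."""

    def __init__(
        self,
        tables: Dict[str, Set[Tuple[str, str]]],  # {catalog: {(db, table), ...}}
        default_catalog: str = "spark_catalog",
        current_db: str = "default",
    ) -> None:
        self._tables = tables
        self.default_catalog = default_catalog
        self.current_db = current_db

    def table_exists_in_db(self, db: str, table: str) -> bool:
        """Two-parameter form: there is no slot for a catalog name, so the
        lookup is always made in the fixed default catalog (an assumption)."""
        return (db, table) in self._tables.get(self.default_catalog, set())

    def table_exists(self, name: str) -> bool:
        """Single-parameter form: the name may carry the catalog itself."""
        parts = name.split(".")  # simplification: no quoting support
        if len(parts) == 1:
            catalog, db, table = self.default_catalog, self.current_db, parts[0]
        elif len(parts) == 2:
            catalog = self.default_catalog
            db, table = parts
        elif len(parts) == 3:
            catalog, db, table = parts
        else:
            raise ValueError(f"unsupported name in this toy model: {name!r}")
        return (db, table) in self._tables.get(catalog, set())


catalogs = ToyCatalogs({
    "spark_catalog": {("sales", "orders")},
    "lake": {("sales", "events")},
})

# The two-parameter form cannot reach the table that lives in "lake".
found_with_two_params = catalogs.table_exists_in_db("sales", "events")
# The single-parameter form can, because the catalog is part of the name.
found_with_one_param = catalogs.table_exists("lake.sales.events")
```

Existing two-parameter calls keep working unchanged in this model, which mirrors the "support existing users without any overhead" point, while only the single-parameter form can name a catalog.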
Does this PR introduce any user-facing change?
No. This is a doc change.
How was this patch tested?
N/A