Skip to content

Conversation

@amaliujia
Copy link
Contributor

@amaliujia amaliujia commented Jul 6, 2022

What changes were proposed in this pull request?

This PR aims to update two-parameter listColumns/getTable/getFunction/tableExists/functionExists function's doc to mention the limitation. To use 3 layer namespace, the users can use single parameter functions.

Why are the changes needed?

We can support the existing users without any overhead and advertise new 3 layer namespace API at the same time.

Does this PR introduce any user-facing change?

No. This is a doc change.

How was this patch tested?

N/A

…nctionName). Such API should use the version with single tableName/FunctionName parameter.
@amaliujia
Copy link
Contributor Author

R: @cloud-fan

@github-actions github-actions bot added the SQL label Jul 6, 2022
@amaliujia amaliujia changed the title [SPARK-39700] Deprecate API that has parameters (DBName, tableName/FunctionName) [SPARK-39700] Deprecate Catalog API that has input parameters (dbName, tableName/FunctionName) Jul 6, 2022
@amaliujia amaliujia closed this Jul 6, 2022
@amaliujia amaliujia reopened this Jul 6, 2022
@amaliujia amaliujia changed the title [SPARK-39700] Deprecate Catalog API that has input parameters (dbName, tableName/FunctionName) [SPARK-39700][SQL] Deprecate Catalog API that has input parameters (dbName, tableName/FunctionName) Jul 6, 2022
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this deprecation is technically required at Apache Spark 3.4, @amaliujia .

Given the 3l namespace support in Catalog API, now if a API takes, for example, tableName, it could be a.b.c.

In addition, Apache Spark community does not delete APIs at all in these days. I'm not sure about the benefit of deprecation proposed by this PR. Is there any other reasons for deprecations?

@cloud-fan
Copy link
Contributor

@dongjoon-hyun we deprecated createExternalTable when we add createTable. I think it's good to keep the API clean and mark unneeded (and inconsistent) APIs as deprecated. The more important thing is: with deprecation, hopefully users will move away from these APIs slowly, and at some point, we can actually remove these APIs.

@HeartSaVioR
Copy link
Contributor

I sent out the discussion for deprecating trigger in dev@ mailing list. That said, what about initiating the discussion in dev@ mailing list so that there are reasonable objections on the topic?

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@amaliujia
Copy link
Contributor Author

I can drive a discussion in dev@ for this deprecation idea.

@amaliujia
Copy link
Contributor Author

amaliujia commented Jul 14, 2022

@dongjoon-hyun

Do you think if we have reached a consensus that this will be an educational deprecation?

@dongjoon-hyun
Copy link
Member

Did you talk with @cloud-fan ?

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jul 14, 2022

We shared conflicting opinions, but didn't make an agreement on the API deprecation yet in the community level.

@cloud-fan
Copy link
Contributor

Let's do a "soft deprecation": explain the limitation of the current API and suggest users to use alternatives in the API doc, but do not use the java deprecation annotation.

@dongjoon-hyun
Copy link
Member

Thank you for the decision, @cloud-fan . I believe it makes sense in this case.

@amaliujia
Copy link
Contributor Author

amaliujia commented Jul 14, 2022

Thanks all! I will update this PR and the dev@ email thread to reflect the decision.

@dongjoon-hyun
Copy link
Member

Thank you for your patience and leading this discussion, @amaliujia .

@amaliujia
Copy link
Contributor Author

@cloud-fan @dongjoon-hyun

I updated this PR to reflect our discussion.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-39700][SQL] Deprecate Catalog API that has input parameters (dbName, tableName/FunctionName) [SPARK-39700][SQL] Update two-parameter listColumns/getTable/getFunction/tableExists/functionExists functions docs to mention limitation Jul 20, 2022
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-39700][SQL] Update two-parameter listColumns/getTable/getFunction/tableExists/functionExists functions docs to mention limitation [SPARK-39700][SQL][DOCS] Update two-parameter listColumns/getTable/getFunction/tableExists/functionExists functions docs to mention limitation Jul 20, 2022
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you for your patience, @amaliujia .
Merged to master for Apache Spark 3.4.

@amaliujia amaliujia deleted the deprecateapi branch July 20, 2022 00:59
@amaliujia
Copy link
Contributor Author

Thank you all for your review!

/**
* Returns a list of columns for the given table/view in the specified database.
*
* This API does not support 3 layer namespace since 3.4.0. To use 3 layer namespace,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 layer namespace is a bit confusing, how about

This API does not support specifying the catalog name. To specify the catalog name, please use
`listColumns(qualifiedTableNameWithCatalog)` instead.

Copy link
Member

@dongjoon-hyun dongjoon-hyun Jul 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question to @cloud-fan because I agree with you that that naming is confusing.

3 layer namespace is a bit confusing

I've monitored many commits in the community.

a2c1038031 [SPARK-39579][SQL][PYTHON][R] Make ListFunctions/getFunction/functionExists compatible with 3 layer namespace
6e7a571532 [SPARK-39649][PYTHON] Make listDatabases / getDatabase / listColumns / refreshTable in PySpark support 3-layer-namespace
cbb4e7da69 [SPARK-39646][SQL] Make setCurrentDatabase compatible with 3 layer namespace
b0d297c6d1 [SPARK-39645][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace
8c02823b49 [SPARK-39583][SQL] Make RefreshTable be compatible with 3 layer namespace
ed1a3402d2 [SPARK-39598][PYTHON] Make *cache*, *catalog* in the python side support 3-layer-namespace
c1106fbe22 [SPARK-39597][PYTHON] Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace
1f15f2c6ad [SPARK-39615][SQL] Make listColumns be compatible with 3 layer namespace
b2d249b1aa [SPARK-39555][PYTHON] Make createTable and listTables in the python side support 3-layer-namespace
ca5f7e6c35 [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace
cb55efadea [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

Comments like this.

multi-layer-namespace identifier, then try to ``tableName`` as a normal table

Even in function naming like this.

private def getTable3LNamespace(tableName: String): Table = {

I believe we need a naming rule for this to promote new naming or demote it by preventing further usage. Which way do you prefer, @cloud-fan ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 layer name is more common in traditional databases as the identifier is 3 parts: catalog.schema.name. But Spark is more general and the identifier has n parts: catalog.ns1.ns2....name.

I think qualifiedNameWithCatalog is more accurate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me clean the naming up in a followup PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much, @cloud-fan !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this discussion!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working on it: #37287

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @cloud-fan .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants