[SPARK-35242][SQL] Support changing session catalog's default database #37679
roczei wants to merge 20 commits into apache:master from roczei:SPARK-35242
Conversation
@cloud-fan, @hddong, @dongjoon-hyun could you please review my changes? Thanks!

Can one of the admins verify this patch?

Could you please take a look when you have some time? This PR is a follow-up PR for #32364. Thanks!
Resolved (outdated) review thread on sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
we should also update

Thanks @cloud-fan, I will check these!
@cloud-fan, did you mean this change? |
Yes |
@cloud-fan Ok, I have fixed this. |
@cloud-fan Good question. We cannot delete the user-specified default database because of the following if statement in the actual code: and this is the latest state of master: As you can see, I am just using the same logic. If you think we should only deny dropping "default" and allow dropping the value of spark.sql.catalog.spark_catalog.defaultDatabase, that is fine with me. The change is very simple: and here is the validation: Is this solution acceptable for you?
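The simpler validation discussed above (deny dropping only the built-in "default" database, while a user-configured default database may still be dropped) can be sketched as a tiny standalone guard. This is an illustrative sketch, not the actual SessionCatalog code; the object name and error message are hypothetical:

```scala
// Illustrative sketch of the drop-database guard discussed in this thread.
// Not the real SessionCatalog implementation; names and messages are made up.
object DropDatabaseGuard {
  def validateDrop(dbName: String): Unit = {
    // Only the built-in "default" database is protected; a database
    // configured via spark.sql.catalog.spark_catalog.defaultDatabase
    // may still be dropped.
    if (dbName.equalsIgnoreCase("default")) {
      throw new IllegalArgumentException("Can not drop default database")
    }
  }
}
```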
SGTM!

Thanks @cloud-fan, I have implemented this and all tests passed. As far as I can see, we have addressed all of your feedback.
Resolved (outdated) review threads on:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala (4 threads)
- sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala (2 threads)
- sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala (2 threads)
- sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/CatalogManagerSuite.scala
Hi @cloud-fan, all build issues have been fixed and all of your feedback has been addressed. Latest state:
Resolved (outdated) review threads on:
- sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
- sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala
cloud-fan left a comment:
LGTM except for 2 minor comments
Thanks @cloud-fan! I have addressed these two comments.

thanks, merging to master!

Thank you very much for your help!
What changes were proposed in this pull request?
This PR is a follow-up PR for #32364. It has been closed by github-actions because it hasn't been updated in a while. The previous PR has added a new custom parameter (spark.sql.catalog.$SESSION_CATALOG_NAME.defaultDatabase) to configure the session catalog's default database which is required by some use cases where the user does not have access to the default database.
Therefore I have created a new PR based on this and added these changes in addition:
Why are the changes needed?
If our user does not have any permissions for the Hive default database in Ranger, it will fail with the following error:
The idea is that we introduce a new configuration parameter where we can set a different database name for the default database. Our user has enough permissions for this in Ranger.
For example:
spark-shell --conf spark.sql.catalog.spark_catalog.defaultDatabase=other_db

Does this PR introduce any user-facing change?
There will be a new configuration parameter, as mentioned above, but its default value is "default", as it was previously.
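The backward-compatible fallback can be pictured with a minimal sketch (illustrative only; a plain Map stands in for Spark's real SQLConf, and the object name is hypothetical):

```scala
object DefaultDatabaseConf {
  // The session catalog is named "spark_catalog" in the config key.
  val Key = "spark.sql.catalog.spark_catalog.defaultDatabase"

  // When the key is unset, fall back to "default", matching the
  // backward-compatible behavior described above.
  def defaultDatabase(conf: Map[String, String]): String =
    conf.getOrElse(Key, "default")
}
```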
How was this patch tested?
https://github.com/roczei/spark/actions/runs/2934863118
Scenario a) hrt_10 does not have access to the default database in Hive:
This is the expected behavior because it will use the "default" db name.
Scenario b) Use the "other" database where the hrt_10 user has proper permissions
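Scenario b) can be reproduced with an invocation like the following (a config-fragment sketch; the database name "other" comes from the test description above):

```shell
# Launch with the new config so the session catalog's default database
# points at a database the hrt_10 user can access.
spark-shell --conf spark.sql.catalog.spark_catalog.defaultDatabase=other
```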