-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-37590][SQL][TEST] Unify v1 and v2 ALTER TABLE .. SET PROPERTIES tests #34842
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-37590][SQL][TEST] Unify v1 and v2 ALTER TABLE .. SET PROPERTIES tests #34842
Conversation
...test/scala/org/apache/spark/sql/execution/command/AlterNamespaceSetPropertiesSuiteBase.scala
Show resolved
Hide resolved
|
Kubernetes integration test starting |
| assert(getProperties(ns) === "", s"$key is a reserved namespace property and ignored") | ||
| val meta = spark.sessionState.catalogManager.catalog(catalog) | ||
| .asNamespaceCatalog.loadNamespaceMetadata(namespace.split('.')) | ||
| assert(meta.get(key) == null || !meta.get(key).contains("foo"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a behavior difference between v1 and v2? null vs empty string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was taken from DataSourceV2SQLSuite.scala, but further looking into this, there seems to be a difference.
In the above loop, the key will be either location or owner, and loadNamespaceMetadata returns the following:
| key | v1 catalog | v2 catalog |
|---|---|---|
| location | non-null | null |
| owner | null | non-null |
I think the null case is interesting.
- v2 catalog returns
nullforlocationbecauseCREATE NAMESPACEdoesn't create a default location if not specified, where as v1 catalog creates a default location. This is expected for v2 catalog, right? - v1 catalog returns
nullforownersince the following doesn't setownerto metadata:. Do you know if this was intentional?spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala
Lines 347 to 355 in 8f6e439
def toMetadata: util.Map[String, String] = { val metadata = mutable.HashMap[String, String]() catalogDatabase.properties.foreach { case (key, value) => metadata.put(key, value) } metadata.put(SupportsNamespaces.PROP_LOCATION, catalogDatabase.locationUri.toString) metadata.put(SupportsNamespaces.PROP_COMMENT, catalogDatabase.description)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the location, I think it's OK. The catalog implementation should decide the default location (or even no location if the source is not file-based). We should accept this difference.
For the owner, it seems a bug that V2SessionCatalog does not propagate the owner field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the owner, the difference comes because v2 command always adds the owner when the namespace is created:
Lines 45 to 47 in 7b00011
| val ownership = | |
| Map(PROP_OWNER -> Utils.getCurrentUserName()) | |
| catalog.createNamespace(ns, (properties ++ ownership).asJava) |
, whereas for v1 command doesn't add the owner when the database is created.
Instead, for v1 Hive catalog, the user property is inserted when the database is retrieved:
spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
Lines 370 to 373 in 7b00011
| override def getDatabase(dbName: String): CatalogDatabase = withHiveState { | |
| Option(shim.getDatabase(client, dbName)).map { d => | |
| val params = Option(d.getParameters).map(_.asScala.toMap).getOrElse(Map()) ++ | |
| Map(PROP_OWNER -> shim.getDatabaseOwnerName(d)) |
Meanwhile, the v1 in-memory catalog implementation doesn't add the owner when the database is retrieved, so we see null owner above. (and the owner is a part of the property, so updating V2SessionCatalog doesn't really address the issue).
One thing we can do is to update the v1 in-memory catalog to add the owner when the database is created or retrieved, but it is still not consistent since adding the owner is a responsibility of the command in v2, but a responsibility of the catalog in v1. Any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be too risky to change the v1 behavior now (Hive metastore fills the owner field). Let's just update the v1 in-memory catalog to fill the owner field as well.
@yaooqinn which one do you think should set the owner field? Spark or the underlying catalog?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for v1 and v2 database and table creation, we both respect the sparkUser now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for alter properties and if it's not an explicitly ower change, we shall respect the catalog settings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an owner while creating a database in InMemoryCatalog, explicitly added a location while creating a namespace in the test (to handle the difference for v2 catalog), and removed the meta.get(key) == null check.
|
Kubernetes integration test status failure |
|
Test build #146025 has finished for PR 34842 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #146078 has finished for PR 34842 at commit
|
|
thanks, merging to master! |
…V2 command by default ### What changes were proposed in this pull request? This PR proposes to use V2 commands as default as outlined in [SPARK-36588](https://issues.apache.org/jira/browse/SPARK-36588), and this PR migrates `ALTER NAMESPACE ... SET PROPERTIES` to use v2 command by default. Note that the work to tests covering both v1/v2 were done in #34842. ### Why are the changes needed? It's been a while since we introduced the v2 commands, and it seems reasonable to use v2 commands by default even for the session catalog, with a legacy config to fall back to the v1 commands. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing *AlterNamespaceSetPropertiesSuite tests cover this PR's change. Closes #34891 from imback82/v2_alter_ns_set_properties. Authored-by: Terry Kim <[email protected]> Signed-off-by: Kousuke Saruta <[email protected]>
| withNamespace(ns) { | ||
| // Set the location explicitly because v2 catalog may not set the default location. | ||
| // Without this, `meta.get(key)` below may return null. | ||
| sql(s"CREATE NAMESPACE $ns LOCATION '/tmp'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not all test environments can access /tmp because it's a directory under root. I think we should change it to tmp which will be qualified with the warehouse path.
@imback82 can you make a followup to fix this? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will do. Thanks.
…ace location ### What changes were proposed in this pull request? This is a follow up to address #34842 (comment), where setting `/tmp` as a namespace location may break certain test environments. This PR also fixes a minor string interpolation issue (unnecessary `s`) in the same file. ### Why are the changes needed? To fix a test issue that the namespace location may cause in certain environments where `/tmp` is not accessible. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Updated the test Closes #34930 from imback82/SPARK-37590-followup. Authored-by: Terry Kim <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
ALTER NAMESPACE ... SET PROPERTIESparsing tests toAlterNamespaceSetPropertiesParserSuite.ALTER NAMESPACE ... SET PROPERTIEStests into one traitorg.apache.spark.sql.execution.command.AlterNamespaceSetPropertiesSuiteBase, and put datasource specific tests to thev1.AlterNamespaceSetPropertiesSuiteandv2.AlterNamespaceSetPropertiesSuite.The changes follow the approach of #30287.
Why are the changes needed?
ALTER NAMESPACE ... SET PROPERTIEStests for both DSv1/Hive DSv1 and DSv2Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing unit tests and new tests.