imback82 (Contributor) commented Dec 9, 2021

### What changes were proposed in this pull request?

  1. Move `ALTER NAMESPACE ... SET PROPERTIES` parsing tests to `AlterNamespaceSetPropertiesParserSuite`.
  2. Put common `ALTER NAMESPACE ... SET PROPERTIES` tests into one trait, `org.apache.spark.sql.execution.command.AlterNamespaceSetPropertiesSuiteBase`, and put datasource-specific tests into `v1.AlterNamespaceSetPropertiesSuite` and `v2.AlterNamespaceSetPropertiesSuite`.
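The trait-based layout in item 2 can be pictured with a small plain-Scala sketch. It uses neither ScalaTest nor Spark: `CatalogUnderTest`, `SetPropertiesSuiteBase`, `MapCatalog`, and the two suite objects are hypothetical stand-ins, and both suites plug in the same toy catalog here purely for brevity (the real v1 and v2 suites each target their own catalogs).

```scala
import scala.collection.mutable

// Hypothetical stand-in for a catalog that supports ALTER NAMESPACE ... SET PROPERTIES.
trait CatalogUnderTest {
  def createNamespace(ns: String): Unit
  def setProperties(ns: String, props: Map[String, String]): Unit
  def properties(ns: String): Map[String, String]
}

// Common tests live in the base trait; each concrete suite only chooses the catalog.
trait SetPropertiesSuiteBase {
  def catalogVersion: String
  def newCatalog(): CatalogUnderTest

  // A shared check that every implementation must pass.
  def testSetPropertiesOverwrites(): Unit = {
    val catalog = newCatalog()
    catalog.createNamespace("ns")
    catalog.setProperties("ns", Map("a" -> "1"))
    catalog.setProperties("ns", Map("a" -> "2", "b" -> "3"))
    assert(catalog.properties("ns") == Map("a" -> "2", "b" -> "3"),
      s"$catalogVersion: a later SET PROPERTIES should overwrite earlier values")
  }

  def runAll(): Seq[String] = {
    testSetPropertiesOverwrites()
    Seq(s"$catalogVersion: testSetPropertiesOverwrites passed")
  }
}

// A trivial map-backed catalog (hypothetical), shared by both suites below.
class MapCatalog extends CatalogUnderTest {
  private val store = mutable.Map.empty[String, Map[String, String]]
  def createNamespace(ns: String): Unit = store.update(ns, Map.empty)
  def setProperties(ns: String, props: Map[String, String]): Unit =
    store.update(ns, store(ns) ++ props)
  def properties(ns: String): Map[String, String] = store(ns)
}

object V1SetPropertiesSuite extends SetPropertiesSuiteBase {
  val catalogVersion = "V1"
  def newCatalog(): CatalogUnderTest = new MapCatalog
}

object V2SetPropertiesSuite extends SetPropertiesSuiteBase {
  val catalogVersion = "V2"
  def newCatalog(): CatalogUnderTest = new MapCatalog
}
```

`V1SetPropertiesSuite.runAll()` and `V2SetPropertiesSuite.runAll()` execute the same shared check; a catalog-specific test would be added to only one of the objects.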

The changes follow the approach of #30287.

### Why are the changes needed?

  1. The unification will allow us to run common `ALTER NAMESPACE ... SET PROPERTIES` tests for DSv1, Hive DSv1, and DSv2.
  2. We can detect missing features and differences between DSv1 and DSv2 implementations.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing unit tests and new tests.

github-actions bot added the SQL label Dec 9, 2021
SparkQA commented Dec 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50501/

```scala
assert(getProperties(ns) === "", s"$key is a reserved namespace property and ignored")
val meta = spark.sessionState.catalogManager.catalog(catalog)
  .asNamespaceCatalog.loadNamespaceMetadata(namespace.split('.'))
assert(meta.get(key) == null || !meta.get(key).contains("foo"),
```

Contributor

Is this a behavior difference between v1 and v2 (null vs. empty string)?

Contributor Author

This was taken from DataSourceV2SQLSuite.scala, but on closer inspection there does seem to be a difference.

In the loop above, `key` is either `location` or `owner`, and `loadNamespaceMetadata` returns the following:

| key | v1 catalog | v2 catalog |
|---|---|---|
| location | non-null | null |
| owner | null | non-null |
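The null entries in the table are why the test's assertion has to tolerate a missing key. A minimal sketch of the same check written with `Option`, assuming plain `java.util.Map` metadata with the keys `location` and `owner` (the values below are made up to match the table's shapes):

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

object ReservedKeyCheck {
  // True when a reserved key was not overwritten by the user-supplied value "foo".
  // Option(...) turns a missing (null) entry into None, and forall on None is true,
  // so both the v1 and v2 shapes from the table are handled without a null check.
  def notOverwritten(meta: JMap[String, String], key: String): Boolean =
    Option(meta.get(key)).forall(v => !v.contains("foo"))

  // Hypothetical v1-shaped metadata: a location but no owner.
  def v1Meta: JMap[String, String] = {
    val m = new JHashMap[String, String]()
    m.put("location", "file:/warehouse/ns.db")
    m
  }

  // Hypothetical v2-shaped metadata: an owner but no location.
  def v2Meta: JMap[String, String] = {
    val m = new JHashMap[String, String]()
    m.put("owner", "alice")
    m
  }
}
```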

I think the null case is interesting.

  1. v2 catalog returns null for `location` because CREATE NAMESPACE doesn't create a default location if none is specified, whereas v1 catalog creates one. This is expected for v2 catalog, right?
  2. v1 catalog returns null for `owner` because the following code doesn't put the owner into the metadata:
     ```scala
     def toMetadata: util.Map[String, String] = {
       val metadata = mutable.HashMap[String, String]()
       catalogDatabase.properties.foreach {
         case (key, value) => metadata.put(key, value)
       }
       metadata.put(SupportsNamespaces.PROP_LOCATION, catalogDatabase.locationUri.toString)
       metadata.put(SupportsNamespaces.PROP_COMMENT, catalogDatabase.description)
     ```
     Do you know if this was intentional?
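To see why `owner` is missing, here is a simplified, self-contained sketch of the quoted `toMetadata` logic. `SimpleDatabase` is a hypothetical stand-in for `CatalogDatabase`, the key strings are assumed to match the `SupportsNamespaces` constants, and a `java.util.HashMap` replaces the original `mutable.HashMap` to avoid a conversion step.

```scala
import java.net.URI
import java.util.{HashMap => JHashMap, Map => JMap}

// Hypothetical, simplified stand-in for CatalogDatabase.
final case class SimpleDatabase(
    properties: Map[String, String],
    locationUri: URI,
    description: String)

object ToMetadataSketch {
  // Assumed to mirror SupportsNamespaces.PROP_LOCATION and PROP_COMMENT.
  val PropLocation = "location"
  val PropComment = "comment"

  // Same shape as the quoted toMetadata: copy the properties, then add location
  // and comment. Nothing here adds an owner, so it appears only if it is
  // already among the database properties.
  def toMetadata(db: SimpleDatabase): JMap[String, String] = {
    val metadata = new JHashMap[String, String]()
    db.properties.foreach { case (key, value) => metadata.put(key, value) }
    metadata.put(PropLocation, db.locationUri.toString)
    metadata.put(PropComment, db.description)
    metadata
  }
}
```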

Contributor

For the location, I think it's OK. The catalog implementation should decide the default location (or even no location if the source is not file-based). We should accept this difference.

For the owner, it seems to be a bug that V2SessionCatalog does not propagate the owner field.

Contributor Author

For the owner, the difference arises because the v2 command always adds the owner when the namespace is created:

```scala
val ownership =
  Map(PROP_OWNER -> Utils.getCurrentUserName())
catalog.createNamespace(ns, (properties ++ ownership).asJava)
```

whereas the v1 command doesn't add the owner when the database is created.
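On the v2 side, the merge order in `properties ++ ownership` matters: with Scala's immutable `Map`, `++` lets the right-hand map win on duplicate keys. A sketch, with the current user passed in as a plain parameter (a hypothetical stand-in for `Utils.getCurrentUserName()`) and `owner` assumed as the key:

```scala
object CreateNamespaceOwnership {
  val PropOwner = "owner" // assumed to match SupportsNamespaces.PROP_OWNER

  // Mirrors `properties ++ ownership`: on a key collision the right-hand map wins,
  // so the creating user always ends up as the owner.
  def propertiesForCreate(
      properties: Map[String, String],
      currentUser: String): Map[String, String] = {
    val ownership = Map(PropOwner -> currentUser)
    properties ++ ownership
  }
}
```

For example, `propertiesForCreate(Map("owner" -> "bob"), "alice")` yields an owner of `alice`.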

Instead, for the v1 Hive catalog, the owner property is inserted when the database is retrieved:

```scala
override def getDatabase(dbName: String): CatalogDatabase = withHiveState {
  Option(shim.getDatabase(client, dbName)).map { d =>
    val params = Option(d.getParameters).map(_.asScala.toMap).getOrElse(Map()) ++
      Map(PROP_OWNER -> shim.getDatabaseOwnerName(d))
```

Meanwhile, the v1 in-memory catalog implementation doesn't add the owner when the database is retrieved, which is why we see a null owner above. (The owner is part of the properties, so updating V2SessionCatalog doesn't really address the issue.)

One option is to update the v1 in-memory catalog to add the owner when the database is created or retrieved, but that is still inconsistent: adding the owner is the command's responsibility in v2 but the catalog's responsibility in v1. Any thoughts?

Contributor

It may be too risky to change the v1 behavior now (Hive metastore fills the owner field). Let's just update the v1 in-memory catalog to fill the owner field as well.

@yaooqinn, which do you think should set the owner field: Spark or the underlying catalog?

Member

For both v1 and v2 database and table creation, we now respect sparkUser.

Member

For ALTER ... SET PROPERTIES, unless it's an explicit owner change, we should respect the catalog settings.
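One way to read this rule, as a hypothetical sketch: a plain SET PROPERTIES drops reserved keys (consistent with the test above asserting that reserved properties are ignored), so whatever owner the catalog recorded survives, while only an explicit owner change overwrites it. The reserved-key set and both function names are assumptions, not Spark's API.

```scala
object AlterNamespaceSketch {
  // Assumed reserved keys, matching the `location`/`owner` keys discussed above.
  val Reserved: Set[String] = Set("location", "owner")

  // SET PROPERTIES: reserved keys in the update are ignored, so an owner already
  // recorded by the catalog survives a plain property change.
  def setProperties(
      existing: Map[String, String],
      updates: Map[String, String]): Map[String, String] =
    existing ++ updates.filter { case (k, _) => !Reserved.contains(k) }

  // Explicit owner change: the only path in this sketch that overwrites the owner.
  def setOwner(existing: Map[String, String], newOwner: String): Map[String, String] =
    existing + ("owner" -> newOwner)
}
```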

Contributor Author

I added an owner when creating a database in InMemoryCatalog, explicitly set a location when creating the namespace in the test (to handle the v2 catalog difference), and removed the `meta.get(key) == null` check.
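A toy version of the first change, recording the owner when the database is created rather than when it is read. `ToyInMemoryCatalog` is hypothetical and much simpler than Spark's `InMemoryCatalog`; keeping an owner the caller already supplied is an assumption of this sketch.

```scala
import scala.collection.mutable

// Hypothetical toy catalog for illustration; not Spark's InMemoryCatalog.
class ToyInMemoryCatalog(currentUser: () => String) {
  private val databases = mutable.Map.empty[String, Map[String, String]]

  // Record the owner at creation time. Keeping a caller-supplied owner is an
  // assumption of this sketch.
  def createDatabase(name: String, properties: Map[String, String]): Unit = {
    val withOwner =
      if (properties.contains("owner")) properties
      else properties + ("owner" -> currentUser())
    databases.update(name, withOwner)
  }

  // Retrieval returns exactly what was stored; no owner is injected on read.
  def getDatabase(name: String): Map[String, String] = databases(name)
}
```

Because the owner is stored up front, no retrieval-time injection (as in the Hive client code quoted earlier) is needed for it to show up.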

SparkQA commented Dec 9, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50501/

SparkQA commented Dec 9, 2021

Test build #146025 has finished for PR 34842 at commit bef81fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Dec 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50553/

SparkQA commented Dec 10, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50553/

SparkQA commented Dec 11, 2021

Test build #146078 has finished for PR 34842 at commit aaf0e5c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan (Contributor)

thanks, merging to master!

cloud-fan closed this in 414771d Dec 13, 2021
sarutak pushed a commit that referenced this pull request Dec 14, 2021
…V2 command by default

### What changes were proposed in this pull request?

This PR proposes to use V2 commands by default, as outlined in [SPARK-36588](https://issues.apache.org/jira/browse/SPARK-36588); specifically, it migrates `ALTER NAMESPACE ... SET PROPERTIES` to use the v2 command by default.

Note that the work to make the tests cover both v1 and v2 was done in #34842.

### Why are the changes needed?

It's been a while since we introduced the v2 commands, and it seems reasonable to use them by default even for the session catalog, with a legacy config to fall back to the v1 commands.
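The dispatch described here can be sketched as a simple choice function. The command objects and the `useV1Command` flag are hypothetical placeholders for Spark's actual plan nodes and legacy config; the sketch only encodes the stated policy: non-session catalogs always use v2, and the session catalog uses v2 unless the legacy flag asks for v1.

```scala
// Hypothetical command markers; Spark's real plan nodes and config differ.
sealed trait SetPropertiesCommand
case object V1SetPropertiesCommand extends SetPropertiesCommand
case object V2SetPropertiesCommand extends SetPropertiesCommand

object CommandSelection {
  // `useV1Command` stands in for the legacy fallback config; `isSessionCatalog`
  // says whether the target namespace lives in the session catalog.
  def choose(isSessionCatalog: Boolean, useV1Command: Boolean): SetPropertiesCommand =
    if (isSessionCatalog && useV1Command) V1SetPropertiesCommand
    else V2SetPropertiesCommand
}
```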

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing *AlterNamespaceSetPropertiesSuite tests cover this PR's change.

Closes #34891 from imback82/v2_alter_ns_set_properties.

Authored-by: Terry Kim
Signed-off-by: Kousuke Saruta

```scala
withNamespace(ns) {
  // Set the location explicitly because v2 catalog may not set the default location.
  // Without this, `meta.get(key)` below may return null.
  sql(s"CREATE NAMESPACE $ns LOCATION '/tmp'")
```

Contributor

Not all test environments can access `/tmp` because it's a directory under the root. I think we should change it to `tmp`, which will be qualified with the warehouse path.

@imback82, can you make a follow-up to fix this? Thanks!
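The suggested fix relies on a relative location being resolved against the warehouse directory. A rough analogy using `java.nio.file` (this is not Spark's implementation, and the warehouse path is made up):

```scala
import java.nio.file.{Path, Paths}

object LocationQualifier {
  // Absolute locations are kept as-is; relative ones (such as `tmp`) are resolved
  // under the warehouse directory, so tests never touch the real `/tmp`.
  def qualify(warehouse: Path, location: String): Path = {
    val p = Paths.get(location)
    if (p.isAbsolute) p.normalize() else warehouse.resolve(p).normalize()
  }
}
```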

Contributor Author

Yes, will do. Thanks.

cloud-fan pushed a commit that referenced this pull request Dec 17, 2021
…ace location

### What changes were proposed in this pull request?

This is a follow-up to address #34842 (comment), where setting `/tmp` as a namespace location may break certain test environments.

This PR also fixes a minor string interpolation issue (an unnecessary `s` prefix) in the same file.

### Why are the changes needed?

To fix a test issue caused by the namespace location in environments where `/tmp` is not accessible.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Updated the test

Closes #34930 from imback82/SPARK-37590-followup.

Authored-by: Terry Kim
Signed-off-by: Wenchen Fan