Skip to content

Conversation

@panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Mar 4, 2024

What changes were proposed in this pull request?

The pr aims to use createTable(..., schema: StructType, ...) instead of createTable(..., columns: Array[Column], ...) in UT.

Why are the changes needed?

Because the TableCatalog#createTable(..., schema: StructType, ...) method has been marked as deprecated after version 3.4.0.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Pass GA.
  • Update existed UT.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Mar 4, 2024
@panbingkun panbingkun changed the title [SPARK-47265][SQL][TESTS] Enable test of v2 data sources in FileBasedDataSourceSuite [WIP][SPARK-47265][SQL][TESTS] Enable test of v2 data sources in FileBasedDataSourceSuite Mar 4, 2024
@panbingkun panbingkun closed this Mar 4, 2024
…..)` instead of `createTable(..., columns: Array[Column], ...)` in UT
…FileBasedDataSourceSuite`"

This reverts commit 3618b30.
@panbingkun panbingkun reopened this Mar 4, 2024
@panbingkun panbingkun changed the title [WIP][SPARK-47265][SQL][TESTS] Enable test of v2 data sources in FileBasedDataSourceSuite [WIP][SPARK-47265][SQL][TESTS] Use createTable(..., schema: StructType, ...) instead of createTable(..., columns: Array[Column], ...) in UT Mar 4, 2024
@LuciferYang
Copy link
Contributor

hmm.... Is the substitution relationship written in reverse...

@panbingkun panbingkun changed the title [WIP][SPARK-47265][SQL][TESTS] Use createTable(..., schema: StructType, ...) instead of createTable(..., columns: Array[Column], ...) in UT [WIP][SPARK-47265][SQL][TESTS] Replace createTable(..., schema: StructType, ...) with createTable(..., columns: Array[Column], ...) in UT Mar 4, 2024
@LuciferYang
Copy link
Contributor

[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala:26:48: Unused import
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unused-imports, site=org.apache.spark.sql.connector
[error] import org.apache.spark.sql.connector.catalog.{CatalogV2Util, Column, Identifier, InMemoryTableCatalog}
[error] 

@panbingkun panbingkun changed the title [WIP][SPARK-47265][SQL][TESTS] Replace createTable(..., schema: StructType, ...) with createTable(..., columns: Array[Column], ...) in UT [SPARK-47265][SQL][TESTS] Replace createTable(..., schema: StructType, ...) with createTable(..., columns: Array[Column], ...) in UT Mar 5, 2024
@panbingkun
Copy link
Contributor Author

cc @cloud-fan

@panbingkun panbingkun marked this pull request as ready for review March 5, 2024 05:38

override def createTable(
ident: Identifier,
schema: StructType,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove this overload now if no one is calling it?


override def createTable(
ident: Identifier,
schema: StructType,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

}

// TODO: remove it when no tests calling this deprecated method.
override def createTable(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will move the specific implementation of createTable from createTable(..., StructType, ...) to createTable(..., Array[Column], ...), so that when we delete the deprecated method createTable(..., StructType, ...) in TableCatalog, we can directly delete the method createTable(..., StructType, ...) here.

}

// TODO: remove it when no tests calling this deprecated method.
// TODO: remove it when the deprecated method `createTable(..., StructType, ...)`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove it now? Spark won't call this method, only tests will

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to delete this now, then we need to remove the method TableCatalog#createTable(..., StructType, ...),

I am a bit worried that although TableCatalog is marked with @Evolving, third parties (eg: iceberg) relying on this interface will immediately feel the break changes, such as:
https://github.com/apache/iceberg/blob/f0b3733e0e7071894bdf63133bcc8c00754a1080/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSessionCatalog.java#L172-L182
image

Do we really need to do this now?
At present, Spark can indeed directly delete it internally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, so we only keep it to pass compilation? Then we can just throw an internal exception in this method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, very good suggestion! I will try it out, thanks.

Copy link
Contributor

@LuciferYang LuciferYang Mar 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply from the definition of @Evolving, it is indeed possible to make this change in 4.0.

I'm not sure if we need to pay attention to direct calls to this method from non-Spark internal code(s there really such a possibility?). If so, throwing an exception only delays the problem from compile time to runtime...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not about callers, but about existing TableCatalog implementations and it's better to not break them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

null // Return null to save the `loadTable` call for CREATE TABLE without AS SELECT.
}

// TODO: remove it when the deprecated method `createTable(..., StructType, ...)`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@panbingkun panbingkun requested a review from cloud-fan March 7, 2024 06:38
}

// TODO: remove it when no tests calling this deprecated method.
// TODO: remove it when the deprecated method `createTable(..., StructType, ...)`
Copy link
Contributor

@LuciferYang LuciferYang Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove this TODO? Isn't the current solution the final one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually a bit hesitant, because if we eventually decide to completely remove this in a future version, keeping it now may be a good reminder, and I want to hear everyone's opinions. @cloud-fan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove the TODO now. We should never remove the API and implementation will have to provide a fake implementation that just fails.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding some new comments to explain the reason for retaining this API and throwing exceptions by default while removing the TODO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove the TODO now. We should never remove the API and implementation will have to provide a fake implementation that just fails.

Done.

null // Return null to save the `loadTable` call for CREATE TABLE without AS SELECT.
}

// TODO: remove it when the deprecated method `createTable(..., StructType, ...)`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 5013105 Mar 8, 2024
yaooqinn pushed a commit that referenced this pull request Aug 15, 2024
…fail

### What changes were proposed in this pull request?

This is a followup of #45368 . `DelegatingCatalogExtension` can invoke any method in `V2SessionCatalog`, and we should make sure the two overloads of `createTable` in `V2SessionCatalog` both work.

### Why are the changes needed?

Avoid breaking custom catalogs that extend `DelegatingCatalogExtension`.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

N/A

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #47755 from cloud-fan/create-table.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants