[SPARK-37878][SQL] Migrate SHOW CREATE TABLE to use v2 command by default #35204
Conversation
@cloud-fan @imback82 Could you take a look? Thank you very much.
Why is this change needed?
When a v2 command works with a v1 table, we resolve the table by loading it through V2SessionCatalog and then creating V1Table(sessionCatalog.getTableMetadata(...)). During getTableMetadata, char/varchar types in the schema are replaced with string, so the V1Table always carries string types even when the original columns are char/varchar. That is why the schema of V1Table is re-constructed here.
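For readers following along, here is a minimal sketch of that round trip (the metadata key and helper names are assumptions for illustration, not the exact catalog code):

```scala
import org.apache.spark.sql.types._

// Conceptual sketch only: the original char/varchar type is stashed as a string in the
// field metadata, so the "raw" schema can be re-constructed after the column was
// widened to StringType by the session catalog.
object RawSchemaSketch {
  val rawTypeKey = "__CHAR_VARCHAR_TYPE_STRING" // assumed metadata key

  // What getTableMetadata effectively does: widen char/varchar to string,
  // remembering the original type in the field metadata.
  def widen(field: StructField): StructField = field.dataType match {
    case CharType(_) | VarcharType(_) =>
      val md = new MetadataBuilder().withMetadata(field.metadata)
        .putString(rawTypeKey, field.dataType.sql).build()
      field.copy(dataType = StringType, metadata = md)
    case _ => field
  }

  // What the raw-schema re-construction does: read the remembered type back.
  def restore(field: StructField): StructField =
    if (field.metadata.contains(rawTypeKey)) {
      field.copy(dataType = DataType.fromDDL(field.metadata.getString(rawTypeKey)))
    } else field
}
```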
maybe we can remove #L496
I think we should restore the raw table schema in SHOW CREATE TABLE, not here.
Other v2 commands working with v1 tables may hit this problem as well, but I will update it as you suggested. Thank you very much.
done.
Is it simply `identifier.asMultiPartIdentifier.quoted`?
`TableIdentifier` does not support `asMultiPartIdentifier`; only the v2 `Identifier` does.
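A minimal sketch of what a `quoted(identifier: TableIdentifier)` helper could look like (the quoting rule is an assumption, mirroring the quote-if-needed behavior discussed in this thread):

```scala
import org.apache.spark.sql.catalyst.TableIdentifier

object QuotedIdentifierSketch {
  // Assumed rule: plain identifiers stay bare, everything else is backtick-quoted
  // with embedded backticks doubled.
  def quoteIfNeeded(part: String): String =
    if (part.matches("[a-zA-Z0-9_]+")) part else "`" + part.replace("`", "``") + "`"

  // Build `db`.`table` (or just `table`) from the v1 TableIdentifier, quoting each
  // part only when needed, to match the v2 Identifier output.
  def quoted(identifier: TableIdentifier): String =
    (identifier.database.toSeq :+ identifier.table).map(quoteIfNeeded).mkString(".")
}
```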
We don't need to special-case the v1 table. We should recognize these special partition transforms (bucket and sorted_bucket) and generate the CLUSTERED BY syntax for them.
@huaxingao have we unified this part? We discussed this before, and the parser should already generate partition transforms for the bucket spec instead of a `BucketSpec`.
Haven't changed this yet. Remember that `c.partitioning ++ c.tableSpec.bucketSpec.map(_.asTransform)` didn't work inside AstBuilder because sortColumnNames needed to be empty. Now it should work. I can do a follow-up to fix this.
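For reference, a rough sketch of the clause this discussion is about (string building only; names are assumptions for illustration):

```scala
object BucketClauseSketch {
  // How a bucket spec (bucket columns, optional sort columns, number of buckets)
  // maps onto the CLUSTERED BY ... [SORTED BY ...] INTO n BUCKETS clause that
  // SHOW CREATE TABLE prints.
  def clusteredByClause(numBuckets: Int, bucketCols: Seq[String], sortCols: Seq[String]): String = {
    val sorted = if (sortCols.nonEmpty) s" SORTED BY (${sortCols.mkString(", ")})" else ""
    s"CLUSTERED BY (${bucketCols.mkString(", ")})$sorted INTO $numBuckets BUCKETS"
  }
}
```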
nit: we can turn this if into an assert, as this can never happen
Isn't it covered by `TABLE_RESERVED_PROPERTIES`?
`PROP_EXTERNAL` is not included in `TABLE_RESERVED_PROPERTIES`. Do you think it should be?
Yeah, it should; it's indeed a reserved property.
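An illustrative sketch of the effect (the reserved-key set here lists only the keys named in this thread, not the full `TABLE_RESERVED_PROPERTIES` list):

```scala
object ReservedPropsSketch {
  // Once "external" joins the reserved set, it is filtered out of the printable
  // TBLPROPERTIES together with the other reserved keys.
  val reservedKeys = Set("external", "location", "owner", "provider")

  def printableProps(tableProperties: Map[String, String]): Map[String, String] =
    tableProperties.filterNot { case (k, _) => reservedKeys.contains(k.toLowerCase) }
}
```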
done.
```diff
  private def showTableDataColumns(table: Table, builder: StringBuilder): Unit = {
-   val columns = table.schema().fields.map(_.toDDL)
+   val columns = CharVarcharUtils.getRawSchema(table.schema(), SQLConf.get).fields.map(_.toDDL)
```
```diff
-   val columns = CharVarcharUtils.getRawSchema(table.schema(), SQLConf.get).fields.map(_.toDDL)
+   val columns = CharVarcharUtils.getRawSchema(table.schema(), conf).fields.map(_.toDDL)
```
We are within a physical plan, so we can access `conf` here.
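For context, a sketch of why `conf` is in scope (the class here is hypothetical; it only assumes the `SQLConfHelper` mix-in and the two-argument `getRawSchema` shown in the diff above):

```scala
import org.apache.spark.sql.catalyst.SQLConfHelper
import org.apache.spark.sql.catalyst.util.CharVarcharUtils
import org.apache.spark.sql.types.StructType

// Hypothetical node: with SQLConfHelper mixed in, the active SQLConf is reachable
// as `conf`, so there is no need for the SQLConf.get thread-local lookup.
case class ColumnsDDLSketch(schema: StructType) extends SQLConfHelper {
  def columnDDL: Seq[String] =
    CharVarcharUtils.getRawSchema(schema, conf).fields.map(_.toDDL).toSeq
}
```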
```diff
  if (tableOptions.nonEmpty) {
-   val props = tableOptions.toSeq.sortBy(_._1).map { case (key, value) =>
+   val props = SQLConf.get.redactOptions(tableOptions).toSeq.sortBy(_._1).map { case (key, value) =>
      s"'${escapeSingleQuotedString(key)}' = '${escapeSingleQuotedString(value)}'"
```
Ditto: use `conf` instead of `SQLConf.get`.
```diff
  -- !query output
- CREATE TABLE `default`.`tbl` (
+ CREATE TABLE default.tbl (
    `a` FLOAT,
```
To be consistent, shall we also quote the column name only if needed? This can be done in a new PR, as it's not related to the v2 command migration.
ok. I will do it.
@cloud-fan Updated the PR. Since the test case failed (#testcase), maybe adding `external` to `TABLE_RESERVED_PROPERTIES` in a separate PR is more suitable?
Do you know why the test failed?
without …
@Peng-Lei can you be more specific? What's the expectation of the tests and how do we break it? I see that we exclude …
Expectation of the tests: …
At first I was really confused why we filter …
@Peng-Lei We should update …
Thanks, merging to master!
…me if needed

### What changes were proposed in this pull request?
Quote the column name only when needed, instead of always.

### Why are the changes needed?
[#comments](#35204 (comment))

### Does this PR introduce _any_ user-facing change?
Yes, it changes the schema string that users get back, e.g.:
```
"STRUCT<`_c0`: STRING, `_c1`: INT>" => "STRUCT<_c0: STRING, _c1: INT>"
```
For end users, there are three ways to get a schema directly:
1. functions such as `schema_of_json` and `schema_of_csv`
2. the table schema, via `df.schema` or SHOW CREATE TABLE
3. calling `toDDL` on a `StructType` or `StructField`

### How was this patch tested?
Existing test cases.

Closes #35227 from Peng-Lei/Quote-Column.

Authored-by: PengLei <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request?
Take `external` as a reserved table property, and do not allow end users to set `external` when `spark.sql.legacy.notReserveProperties` is `false`.

### Why are the changes needed?
[#discussion](#35204 (comment)). Keep it consistent with other properties like `location`, `owner`, `provider`, and so on.

### Does this PR introduce _any_ user-facing change?
Yes. End users can no longer use `external` as a property key when creating a table with TBLPROPERTIES or altering a table to set TBLPROPERTIES.

### How was this patch tested?
Existing test cases.

Closes #35268 from Peng-Lei/SPARK-37950.

Authored-by: PengLei <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
…n" property ### What changes were proposed in this pull request? This is a followup of #35204 . #35204 introduced a potential regression: it removes the "location" table property from `V1Table` if the table is not external. The intention was to avoid putting the LOCATION clause for managed tables in `ShowCreateTableExec`. However, if we use the v2 DESCRIBE TABLE command by default in the future, this will bring a behavior change and v2 DESCRIBE TABLE command won't print the table location for managed tables. This PR fixes this regression by using a different idea to fix the SHOW CREATE TABLE issue: 1. introduce a new reserved table property `is_managed_location`, to indicate that the location is managed by the catalog, not user given. 2. `ShowCreateTableExec` only generates the LOCATION clause if the "location" property is present and is not managed. ### Why are the changes needed? avoid a potential regression ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests. We can add a test when we use v2 DESCRIBE TABLE command by default. Closes #36498 from cloud-fan/regression. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…n" property ### What changes were proposed in this pull request? This is a followup of #35204 . #35204 introduced a potential regression: it removes the "location" table property from `V1Table` if the table is not external. The intention was to avoid putting the LOCATION clause for managed tables in `ShowCreateTableExec`. However, if we use the v2 DESCRIBE TABLE command by default in the future, this will bring a behavior change and v2 DESCRIBE TABLE command won't print the table location for managed tables. This PR fixes this regression by using a different idea to fix the SHOW CREATE TABLE issue: 1. introduce a new reserved table property `is_managed_location`, to indicate that the location is managed by the catalog, not user given. 2. `ShowCreateTableExec` only generates the LOCATION clause if the "location" property is present and is not managed. ### Why are the changes needed? avoid a potential regression ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests. We can add a test when we use v2 DESCRIBE TABLE command by default. Closes #36498 from cloud-fan/regression. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit fa2bda5) Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
1. Add `quoted(identifier: TableIdentifier)` to quote the table name in the V1 command (SHOW CREATE TABLE [AS SERDE]) so it matches V2 behavior; parts are only quoted when `quoteIfNeeded` requires it.
2. Add `V2TableProperties` for `V1Table`. The `location` property is only added when `external == true`.
3. For `V1Table.Schema`, re-construct the original schema from the raw type string.
4. For SHOW CREATE TABLE, make `ShowTablePartitions` match V1 behavior.

Why are the changes needed?
It's been a while since we introduced the v2 commands, and it seems reasonable to use v2 commands by default even for the session catalog, with a legacy config to fall back to the v1 commands.
Does this PR introduce any user-facing change?
Use the V2 command by default for SHOW CREATE TABLE; if `LEGACY_USE_V1_COMMAND == true`, the V1 command is used instead.
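For illustration, a spark-shell sketch of the fallback (the config key `spark.sql.legacy.useV1Command` is an assumption derived from the `LEGACY_USE_V1_COMMAND` flag named above):

```scala
// Assumed config key: with it unset (default) the v2 SHOW CREATE TABLE command runs;
// with it set to true, Spark falls back to the v1 command.
spark.sql("CREATE TABLE default.tbl (a FLOAT) USING parquet")
spark.sql("SHOW CREATE TABLE default.tbl").show(truncate = false)

spark.conf.set("spark.sql.legacy.useV1Command", "true")
spark.sql("SHOW CREATE TABLE default.tbl").show(truncate = false)
```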
How was this patch tested?
build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *ShowCreateTableSuite"