[SPARK-19724][SQL]create a managed table with an existed default table should throw an exception #20886
Test build #88530 has finished for PR 20886 at commit

Test build #88564 has finished for PR 20886 at commit

Test build #88565 has finished for PR 20886 at commit

Test build #88742 has finished for PR 20886 at commit

@gatorsmile This PR is ready for review.

Test build #88749 has finished for PR 20886 at commit

retest this please.

Test build #88769 has finished for PR 20886 at commit

Test build #88796 has finished for PR 20886 at commit
Move them to DDLSuite? Change the format based on isUsingHiveMetastore?
Do we always check defaultTablePath? Is it possible for the table location to point to a different location?
I think managed tables are always created on default path. So the check here should be correct.
CatalogTable already contains a TableIdentifier. Why not use that one directly?
This is a behavior change, we need to make it configurable and also document it in the migration guide.

Test build #88799 has finished for PR 20886 at commit

Test build #88807 has started for PR 20886 at commit

retest this please.

retest this please.

Test build #88846 has finished for PR 20886 at commit
    .createWithDefault(false)

  val ALLOW_NONEMPTY_MANAGED_TABLE_LOCATION =
    buildConf("spark.sql.allowNonemptyManagedTableLocation")
spark.sql.allowCreateManagedTableUsingNonemptyLocation
Also this should be an internal conf
val fs = tableLocation.getFileSystem(hadoopConf)

if (fs.exists(tableLocation) && fs.listStatus(tableLocation).nonEmpty) {
  throw new AnalysisException(s"Can not create the managed table('${table.identifier}')" +
Can not -> Not allowed to
Test build #88860 has finished for PR 20886 at commit

Test build #88874 has finished for PR 20886 at commit

retest this please.

Test build #88878 has finished for PR 20886 at commit

LGTM Thanks! Merged to master
…le should throw an exception

## What changes were proposed in this pull request?

This PR is to finish apache#17272. This JIRA is follow-up work after SPARK-19583.

As we discussed in that PR, the following DDL for a managed table with an existing default location should throw an exception:

CREATE TABLE ... (PARTITIONED BY ...) AS SELECT ...
CREATE TABLE ... (PARTITIONED BY ...)

Currently, some situations are not consistent with the above logic:

CREATE TABLE ... (PARTITIONED BY ...) succeeds with an existing default location. Situation: for both hive/datasource (with HiveExternalCatalog/InMemoryCatalog).
CREATE TABLE ... (PARTITIONED BY ...) AS SELECT ... Situation: hive table succeeds with an existing default location.

This PR makes the above two situations consistent with the logic that table creation should throw an exception when the default location already exists.

## How was this patch tested?

unit test added

Author: Gengliang Wang
Closes apache#20886 from gengliangwang/pr-17272.
…noreIfExists is true

## What changes were proposed in this pull request?

In PR apache#20886, I mistakenly checked the table location only when `ignoreIfExists` is false, following the original deprecated PR. That was wrong: when `ignoreIfExists` is true and the target table doesn't exist, we should also check the table location. In other words, **`ignoreIfExists` has nothing to do with table location validation**. This is a follow-up PR to fix the mistake.

## How was this patch tested?

Add one unit test.

Author: Gengliang Wang
Closes apache#21001 from gengliangwang/SPARK-19724-followup.
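The point of the follow-up commit above can be sketched as a small decision function. This is only an illustration, not the code from either PR: the object `CreateManagedTableFlow`, its `Outcome` types, and the behavior of the two table-already-exists branches are assumptions made for the example. The part taken from the commit message is that the location check runs whenever the table does not yet exist, regardless of `ignoreIfExists`.

```scala
// Hypothetical model of the create-table decision; names are illustrative only.
object CreateManagedTableFlow {
  sealed trait Outcome
  case object Created extends Outcome
  case object SkippedExisting extends Outcome
  final case class Rejected(reason: String) extends Outcome

  def create(tableExists: Boolean, ignoreIfExists: Boolean, locationNonEmpty: Boolean): Outcome =
    if (tableExists) {
      // Assumed semantics: an existing table is either silently skipped or rejected.
      if (ignoreIfExists) SkippedExisting else Rejected("table already exists")
    } else if (locationNonEmpty) {
      // Reached for both values of ignoreIfExists: location validation is independent of it.
      Rejected("default location is not empty")
    } else {
      Created
    }
}
```

Under this model, `create(tableExists = false, ignoreIfExists = true, locationNonEmpty = true)` is rejected, which is the case the follow-up commit describes as previously missed.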
What changes were proposed in this pull request?
This PR is to finish #17272.
This JIRA is follow-up work after SPARK-19583.
As we discussed in that PR, the following DDL for a managed table with an existing default location should throw an exception:
CREATE TABLE ... (PARTITIONED BY ...) AS SELECT ...
CREATE TABLE ... (PARTITIONED BY ...)
Currently, some situations are not consistent with the above logic:
CREATE TABLE ... (PARTITIONED BY ...) succeeds with an existing default location
situation: for both hive/datasource (with HiveExternalCatalog/InMemoryCatalog)
CREATE TABLE ... (PARTITIONED BY ...) AS SELECT ...
situation: hive table succeeds with an existing default location
This PR makes the above two situations consistent with the logic that table creation should throw an exception when the default location already exists.
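A minimal sketch of the rule described above, written against `java.nio.file` instead of the Hadoop `FileSystem` API used in the PR so that it runs without Spark or Hadoop on the classpath. The names `ManagedTableLocationCheck`, `NonEmptyLocationException`, and the `allowNonEmpty` parameter are hypothetical; `allowNonEmpty` stands in for the configuration flag requested in the review, and the exception message wording is illustrative.

```scala
import java.nio.file.{Files, Path}

// Hypothetical stand-in for Spark's AnalysisException, so the sketch has no Spark dependency.
final class NonEmptyLocationException(message: String) extends RuntimeException(message)

object ManagedTableLocationCheck {

  // True when `location` exists and holds at least one entry.
  // An existing plain file is treated as non-empty in this sketch.
  def isNonEmpty(location: Path): Boolean =
    if (Files.isDirectory(location)) {
      val entries = Files.list(location)
      try entries.findAny().isPresent
      finally entries.close()
    } else {
      Files.exists(location)
    }

  // `allowNonEmpty` plays the role of the configuration flag discussed in the review.
  def validate(tableName: String, location: Path, allowNonEmpty: Boolean): Unit =
    if (!allowNonEmpty && isNonEmpty(location)) {
      // Message wording is illustrative, not the text used in the PR.
      throw new NonEmptyLocationException(
        s"Not allowed to create the managed table('$tableName'): " +
          s"the location('$location') already exists and is not empty.")
    }
}
```

With the flag off, a fresh or missing directory passes validation, while a directory that already contains files causes the call to throw, matching the behavior the description asks for in both the `CREATE TABLE` and `CREATE TABLE ... AS SELECT` cases.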
How was this patch tested?
unit test added