[SPARK-27926][SQL] Allow altering table add columns with CSVFileFormat/JsonFileFormat provider #24776
I just submitted a PR (#24780) that actually refactors Maybe it would be worth merging your changes into my PR? Any other ideas?
Test build #106104 has finished for PR 24776 at commit
@emanuelebardelli I think we can merge this one first, and then you can update your refactoring PR.
  // come in here.
- case _: JsonDataSourceV2 | _: CSVDataSourceV2 | _: ParquetFileFormat | _: OrcDataSourceV2 =>
+ case _: CSVFileFormat | _: JsonFileFormat | _: ParquetFileFormat =>
+ case _: JsonDataSourceV2 | _: CSVDataSourceV2 | _: OrcDataSourceV2 =>
Do these V2 data sources also support ADD COLUMNS? Do we have test cases for them?
No, V2 doesn't support ADD COLUMN. If an operation requires catalog support, Spark falls back from V2 to V1.
Currently, DataSource.lookupDataSource always returns CSVDataSourceV2/JsonDataSourceV2/OrcDataSourceV2 for "csv"/"json"/"orc", so we need to match them here.
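To illustrate the point about matching both the V1 and V2 classes, here is a minimal standalone sketch of the whitelist check. The classes below are empty stand-ins that only borrow the names from the diff (the real ones live in Spark), and `OtherFormat`, `AddColumnsWhitelist`, and `supportsAddColumns` are hypothetical names introduced for this example; the actual command raises an error for unsupported providers rather than returning a Boolean.

```scala
// Stand-in marker classes named after those in the diff; NOT the real Spark classes.
sealed trait Provider
final class CSVFileFormat extends Provider
final class JsonFileFormat extends Provider
final class ParquetFileFormat extends Provider
final class CSVDataSourceV2 extends Provider
final class JsonDataSourceV2 extends Provider
final class OrcDataSourceV2 extends Provider
// Hypothetical provider that is not on the whitelist.
final class OtherFormat extends Provider

object AddColumnsWhitelist {
  // Hypothetical helper: true when the resolved provider instance is whitelisted.
  def supportsAddColumns(provider: Provider): Boolean = provider match {
    // V1 file formats: reached when a table names the provider class explicitly.
    case _: CSVFileFormat | _: JsonFileFormat | _: ParquetFileFormat => true
    // V2 sources: what the short names "csv"/"json"/"orc" resolve to.
    case _: JsonDataSourceV2 | _: CSVDataSourceV2 | _: OrcDataSourceV2 => true
    case _ => false
  }
}
```

With only the V2 case present, a table whose provider is the fully qualified CSVFileFormat class would fall through to the rejecting branch, which is the regression this PR addresses.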
LGTM Thanks! Merged to master.
HyukjinKwon left a comment
LGTM too
What changes were proposed in this pull request?
In the earlier csv/json migration work, CSVFileFormat/JsonFileFormat was removed from the table provider whitelist of AlterTableAddColumnsCommand.verifyAlterTableAddColumn:
#24005
#24058
This is a regression. If a table is created with the provider org.apache.spark.sql.execution.datasources.csv.CSVFileFormat or org.apache.spark.sql.execution.datasources.json.JsonFileFormat, Spark should allow the "alter table add column" operation.
How was this patch tested?
Unit test
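The unit test itself is not shown in this thread. As a rough sketch of the scenario it covers, the snippet below creates a table with the fully qualified CSVFileFormat provider and then adds a column. It assumes Spark SQL is on the classpath, in a version that includes this fix; the object name `AddColumnsRepro`, the table name `t_csv`, and the column names are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

object AddColumnsRepro {
  // Returns the table's column names after the ALTER TABLE statement.
  def run(): Seq[String] = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("add-columns-repro")
      .getOrCreate()
    try {
      spark.sql("DROP TABLE IF EXISTS t_csv")
      // Name the provider by its class rather than the short name "csv".
      spark.sql(
        "CREATE TABLE t_csv (c1 INT) " +
          "USING org.apache.spark.sql.execution.datasources.csv.CSVFileFormat")
      // Rejected before this change; expected to succeed with it.
      spark.sql("ALTER TABLE t_csv ADD COLUMNS (c2 INT)")
      spark.table("t_csv").schema.fieldNames.toSeq
    } finally {
      spark.sql("DROP TABLE IF EXISTS t_csv")
      spark.stop()
    }
  }
}
```

Swapping in org.apache.spark.sql.execution.datasources.json.JsonFileFormat as the provider should exercise the JSON side of the same check.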