[FEATURE]: Migrate external tables not supported by the "sync" command #889
Comments
Migration strategy:
Changes required
Reference:
@qziyuan isn't table format already there?
It looks like we have to pre-empt this decision-making in the create_table_mapping CSV
@nfx For Hive SerDe tables, the current table format, derived from table.provider, will all be "HIVE", so we need extra info about the serde and the input/output formats to differentiate them.
We could either …
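For illustration, one way to obtain that differentiating info is to parse the `Serde Library` row that Spark's `DESCRIBE FORMATTED` emits. The helper below is a hypothetical sketch, not a ucx API:

```python
# Hypothetical sketch: classify a "HIVE"-provider table by its SerDe library.
# Assumes an active SparkSession `spark`; `hiveserde_kind` is not ucx code.
def hiveserde_kind(spark, full_table_name: str):
    """Return "PARQUET", "ORC", "AVRO", or None for an unrecognized serde."""
    rows = spark.sql(f"DESCRIBE FORMATTED {full_table_name}").collect()
    details = {row.col_name.strip(): (row.data_type or "").strip() for row in rows}
    serde = details.get("Serde Library", "").lower()
    for kind in ("parquet", "orc", "avro"):
        if kind in serde:
            return kind.upper()
    return None  # e.g. a custom serde that cannot be migrated in place
```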
…erde tables (#1412)

## Changes
1. Add a `MigrateHiveSerdeTablesInPlace` workflow to in-place upgrade external Parquet, Orc, and Avro hiveserde tables.
2. Add functions in `tables.py` to describe the table and extract the hiveserde details, and to update the DDL from `show create table` by replacing the old table name with the migration target and the dbfs mount table location, if any; the new DDL is used to create the new table in UC for the in-place migration.
3. Add a `_migrate_external_table_hiveserde` function in `table_migrate.py`, and two new arguments to the `TablesMigrator` class: `mounts`, used to replace the dbfs mnt table location if any, and `hiveserde_in_place_migrate`, used to control which hiveserde type is migrated in the current run, so multiple tasks can run in parallel with each migrating one type of hiveserde.

This PR also removes the majority of the code from PR #1432, because only a subset of table formats can be in-place migrated to UC with DDL from `show create table`; simply creating tables with the updated DDL for all `What.EXTERNAL_NO_SYNC` would fail.

### Linked issues
Closes #889

### Functionality
- [ ] added relevant user documentation
- [ ] added new CLI command
- [ ] modified existing command: `databricks labs ucx ...`
- [ ] added a new workflow
- [ ] modified existing workflow: `...`
- [ ] added a new table
- [ ] modified existing table: `...`

### Tests
- [x] manually tested
- [x] added unit tests
- [x] added integration tests
- [ ] verified on staging environment (screenshot attached)
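As a rough illustration of step 2, the DDL rewrite amounts to string surgery on the `show create table` output. The helper and the example mount mapping below are assumptions for the sketch, not the actual `tables.py` code:

```python
# Minimal sketch of rewriting a hiveserde table's DDL for UC (assumed helper,
# not the ucx implementation).
def rewrite_ddl(create_stmt: str, src_table: str, dst_table: str,
                mounts: dict[str, str]) -> str:
    """Point the DDL at the UC target and swap dbfs mount paths for cloud URLs."""
    ddl = create_stmt.replace(src_table, dst_table, 1)
    for mount_point, cloud_url in mounts.items():
        # e.g. {"/mnt/landing": "abfss://landing@acct.dfs.core.windows.net"}
        ddl = ddl.replace(f"dbfs:{mount_point}", cloud_url.rstrip("/"))
    return ddl

# Usage, assuming a SparkSession `spark` and placeholder table names:
new_ddl = rewrite_ddl(
    spark.sql("SHOW CREATE TABLE hive_metastore.db.tbl").collect()[0][0],
    "hive_metastore.db.tbl",
    "main.db.tbl",
    {"/mnt/landing": "abfss://landing@acct.dfs.core.windows.net"},
)
```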
Is there an existing issue for this?
Problem statement
Tables whose format is not among those supported by the sync command are currently not migrated to UC.
Fine-grained:
Related issues:
- EXTERNAL tables from cloud storage accounts #333
- `databricks labs ucx migrate-tables` and a related workflow #670

Proposed Solution
Allow users to migrate unsupported table types by converting them to Delta.
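A minimal sketch of what that conversion could look like, assuming a SparkSession and placeholder table names (this is not an existing ucx command):

```python
# Hedged sketch: copy a table whose format `SYNC` cannot handle into UC as
# Delta via CTAS; managed tables created in Unity Catalog default to Delta.
def migrate_as_delta(spark, src_table: str, dst_table: str) -> None:
    spark.sql(
        f"CREATE TABLE IF NOT EXISTS {dst_table} "
        f"AS SELECT * FROM {src_table}"
    )

# Example with placeholder names:
migrate_as_delta(spark, "hive_metastore.sales.raw_orders", "main.sales.raw_orders")
```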
Additional Context
No response