[FEATURE]: Create UC Storage Credential, Schema, and Table Grants based on Azure SPN access #907

Closed
1 task done
Tracked by #333
nkvuong opened this issue Feb 6, 2024 · 0 comments · Fixed by #1077
Labels: enhancement (New feature or request), migrate/access-control (Access Control to things), migrate/external (go/uc/upgrade SYNC EXTERNAL TABLES step)

Comments

@nkvuong
Contributor

nkvuong commented Feb 6, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Problem statement

Currently, SPNs are used to govern access to Lakehouse tables. We need to recreate these ACLs in UC.

If a cluster, cluster policy, SQL warehouse, or instance pool has an Azure service principal configured, grant the principals that can use that compute resource the relevant permissions on the corresponding storage credentials.

Related issues:

Proposed Solution

  • Select the clusters, policies, SQL warehouses, and instance pools that triggered the Azure service principal failure.
  • Map users/groups to the objects they can access directly or through an existing cluster.
  • Create table ACLs based on the SPN access.
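
The three steps above can be sketched roughly as follows. This is only an illustration of the idea, not UCX code: `ComputeObject`, `Grant`, `derive_table_grants`, and the `spn_tables` mapping are hypothetical names, and the `SELECT` privilege is a placeholder assumption, since the issue does not say which privileges should be granted.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ComputeObject:
    """Hypothetical record for a cluster, policy, SQL warehouse, or instance pool."""
    object_id: str
    spn_application_id: Optional[str]  # Azure SPN found in the configuration, if any
    principals: Tuple[str, ...]        # users/groups that can use this object


@dataclass(frozen=True)
class Grant:
    """Hypothetical UC table grant to be created."""
    principal: str
    privilege: str
    table: str


def derive_table_grants(
    compute: Iterable[ComputeObject],
    spn_tables: Dict[str, List[str]],
    privilege: str = "SELECT",  # placeholder assumption, not taken from the issue
) -> List[Grant]:
    """Derive table grants for principals from the SPN their compute uses."""
    grants = set()
    for obj in compute:
        # Step 1: keep only objects that carry an Azure SPN configuration.
        if obj.spn_application_id is None:
            continue
        # Step 3: every table the SPN can reach ...
        for table in spn_tables.get(obj.spn_application_id, []):
            # Step 2: ... is granted to each principal mapped to the object.
            for principal in obj.principals:
                grants.add(Grant(principal, privilege, table))
    return sorted(grants, key=lambda g: (g.principal, g.table))
```

Using a set before sorting deduplicates grants when the same principal reaches the same table through several compute objects.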

Additional Context

No response

nkvuong added the enhancement (New feature or request) and needs-triage labels Feb 6, 2024
nkvuong added this to UCX Feb 6, 2024
github-project-automation bot moved this to Triage in UCX Feb 6, 2024
nfx added the migrate/external (go/uc/upgrade SYNC EXTERNAL TABLES step) label and removed the needs-triage label Mar 5, 2024
nfx changed the title from "[FEATURE]: Create UC ACLs based on Azure SPN access" to "[FEATURE]: Create UC Storage Credential, Schema, and Table Grants based on Azure SPN access" Mar 13, 2024
HariGS-DB self-assigned this Mar 13, 2024
nfx pushed a commit that referenced this issue Mar 21, 2024
## Changes

### Linked issues

Resolves #340
To be followed up with PRs for #887 and #907.

### Functionality 

- [x] modified existing workflow: `migrate-tables`

### Tests

- [x] manually tested
- [x] added unit tests
- [x] added integration tests
- [x] verified on staging environment (screenshot attached)
nfx added a commit that referenced this issue Mar 21, 2024
* Added Legacy Table ACL grants migration ([#1054](#1054)). This commit introduces a legacy table ACL grants migration to the `migrate-tables` workflow, resolving issue [#340](#340) and paving the way for follow-up PRs [#887](#887) and [#907](#907). A new `GrantsCrawler` class is added for crawling grants, along with a `GroupManager` class to manage groups during migration. The `TablesMigrate` class is updated to accept an instance of `GrantsCrawler` and `GroupManager` in its constructor. The migration process has been thoroughly tested with unit tests, integration tests, and manual testing on a staging environment. The changes include the addition of a new Enum class `AclMigrationWhat` and updates to the `Table` dataclass, and affect the way tables are selected for migration based on rules. The logging and error handling have been improved in the `skip_schema` function.
* Added `databricks labs ucx cluster-remap` command to remap legacy cluster configurations to UC-compatible ([#994](#994)). This change adds the `databricks labs ucx cluster-remap` command, which remaps legacy cluster configurations to UC-compatible ones, along with user documentation for the remapping process. It also adds the `create-catalogs-schemas` and `revert-cluster-remap` commands: the former extends the creation and management of UC external catalogs and schemas, and the latter reverts a cluster remap. This change does not modify existing commands or workflows and does not introduce new tables. The new command and associated functions have been manually tested.
* Added `migrate-tables` workflow ([#1051](#1051)). This change adds the `migrate-tables` workflow for migrating tables from the Hive Metastore to Unity Catalog, along with new classes and methods for table mapping, migration status refreshing, and migrating tables. Two new instance variables in the `WorkspaceConfig` class, `min_workers` and `max_workers` (defaults 1 and 10 respectively), allow finer-grained control over allocated resources. A new `trigger` function initializes a configuration, SQL backend, and `WorkspaceClient` from the provided configuration file, and a new `run_task` function looks up the specified task, logs relevant information, and runs the task's function with the provided arguments. The `Task` class's `fn` attribute now takes an `Installation` object as a parameter. The `migrate_dbfs_root_delta_tables` and `migrate_external_tables_sync` methods migrate Delta tables located in the DBFS root and synchronize external tables, respectively, using the workspace client to access the catalogs. Integration tests have been added for these methods.
* Added handling for `SYNC` command failures ([#1073](#1073)). This pull request introduces changes to improve handling of `SYNC` command failures during external table migrations in the Hive metastore. Previously, the `SYNC` command's result was not checked, and failures were not logged. Now, the `_migrate_external_table` method in `table_migrate.py` fetches the result of the `SYNC` command execution, logs a warning message for failures, and returns `False` if the command fails. A new integration test has been added to simulate a failed `SYNC` command due to a non-existent catalog and schema, ensuring the migration tool handles such failures. A new test case has also been added to verify the handling of `SYNC` command failures during external table migrations, using a mock backend to simulate failures and checking for appropriate log messages. These changes enhance the reliability and robustness of the migration process, providing clearer error diagnosis and handling for potential `SYNC` command failures.
* Added initial version of `databricks labs ucx migrate-local-code` command ([#1067](#1067)). A new `databricks labs ucx migrate-local-code` command has been added to facilitate migration of local code to a Databricks environment, specifically targeting Python and SQL files. This initial version is experimental and aims to help users and administrators manage code migration, maintain consistency across workspaces, and enhance compatibility with the Unity Catalog, a component of Databricks' data and AI offerings. The command introduces a new `Files` class for applying migrations to code files, considering their language. It also updates the `.gitignore` file and the pyproject.toml file to ensure appropriate version control management. Additionally, new classes and methods have been added to support code analysis, transformation, and linting for various programming languages. These improvements will aid in streamlining the migration process and ensuring compatibility with Databricks' environment.
* Added instance pool to cluster policy ([#1078](#1078)). A new field, `instance_pool_id`, has been added to the cluster policy configuration in `policy.py`, allowing users to specify the ID of an instance pool to be applied to all workflow clusters in the policy. This ID can be manually set or automatically retrieved by the system. A new private method, `_get_instance_pool_id()`, has been added to handle the retrieval of the instance pool ID. Additionally, a new test for table migration jobs has been added to `test_installation.py` to ensure the migration job is correctly configured with the specified parallelism, minimum and maximum number of workers, and instance pool ID. A new test case for creating a cluster policy with an instance pool has also been added to `tests/unit/installer/test_policy.py` to ensure the instance pool is added to the cluster policy during creation. These changes provide users with more control over instance pools and cluster policies, and improve the overall functionality of the library.
* Fixed `ucx move` logic for `MANAGED` & `EXTERNAL` tables ([#1062](#1062)). The `ucx move` command has been updated to allow for the movement of UC tables/views after the table upgrade process, providing flexibility in managing catalog structure. The command now supports moving multiple tables simultaneously, dropping managed tables/views upon confirmation, and deep-cloning managed tables while dropping and recreating external tables. A refactoring of the `TableMove` class has improved code organization and readability, and the associated unit tests have been updated to reflect these changes. This feature is targeted towards developers and administrators seeking to adjust their catalog structure after table upgrades, with the added ability to manage exceptional conditions gracefully.
* Fixed integration testing with random product names ([#1074](#1074)). The `trigger` function in the `tasks.py` module now creates the `Installation` object locally, with a new `install_folder` argument, and passes it to the `run_task` function. The `install_folder` is determined by taking the parent directory of `config_path`, converting it to a POSIX-style path, and removing the leading "/Workspace" prefix, so that `run_task` receives the correct installation folder. This locally created object replaces the previous use of `Installation.current`.
* Refactor installer to separate workflows methods from the installer class ([#1055](#1055)). In this release, the installer in the `cli.py` file has been refactored to improve modularity and maintainability. The installation and workflow functionalities have been separated by importing a new class called `WorkflowsInstallation` from `databricks.labs.ucx.installer.workflows`. The `WorkspaceInstallation` class is no longer used in various functions, and the new `WorkflowsInstallation` class is used instead. Additionally, a new mixin class called `InstallationMixin` has been introduced, which includes methods for uninstalling UCX, removing jobs, and validating installation steps. The `WorkflowsInstallation` class now inherits from this mixin class. A new file, `workflows.py`, has been added to the `databricks/labs/ucx/installer` directory, which contains methods for managing Databricks jobs. The new `WorkflowsInstallation` class is responsible for deploying workflows, uploading wheels to DBFS or WSFS, and creating debug notebooks. The refactoring also includes the addition of new methods for handling specific workflows, such as `run_workflow`, `validate_step`, and `repair_run`, which are now contained in the `WorkflowsInstallation` class. The `test_install.py` file in the `tests/unit` directory has also been updated to include new imports and test functions to accommodate these changes.
* Skip unsupported locations while migrating to external location in Azure ([#1066](#1066)). In this release, we have updated the functionality of migrating to an external location in Azure. A new private method `_filter_unsupported_location` has been added to the `locations.py` file, which checks if the location URLs are supported and removes the unsupported ones from the list. Only locations starting with "abfss://" are considered supported. Unsupported locations are logged with a warning message. Additionally, a new test `test_skip_unsupported_location` has been introduced to verify that the `location_migration` function correctly skips unsupported locations during migration to external locations in Azure. The test checks if the correct log messages are generated for skipped unsupported locations, and it mocks various scenarios such as crawled HMS external locations, storage credentials, UC external locations, and installation with permission mapping. The mock crawled HMS external locations contain two unsupported locations: `adl://` and `wasbs://`. This ensures that the function handles unsupported locations correctly, avoiding any unnecessary errors or exceptions during migration.
* Triggering Assessment Workflow from Installer based on User Prompt ([#1007](#1007)). A new functionality has been added to the installer that allows users to trigger an assessment workflow based on a prompt during the installation process. The `_trigger_workflow` method has been implemented, which can be initiated with a step string argument. This method retrieves the job ID for the specified step from the `_state.jobs` dictionary, generates the job URL, and triggers the job using the `run_now` method from the `jobs` class of the Workspace object. Users will be asked to confirm triggering the assessment workflow and will have the option to open the job URL in a web browser after triggering it. A new unit test, `test_triggering_assessment_wf`, has been introduced to the `test_install.py` file to verify the functionality of triggering an assessment workflow based on user prompt. This test uses existing classes and functions, such as `MockBackend`, `MockPrompts`, `WorkspaceConfig`, and `WorkspaceInstallation`, to run the `WorkspaceInstallation.run` method with a mocked `WorkspaceConfig` object and a mock installation. The test also includes a user prompt to confirm triggering the assessment job and opening the assessment job URL. The new functionality and test improve the installation process by enabling users to easily trigger the assessment workflow based on their specific needs.
* Updated README.md for Service Principal Installation Limit ([#1076](#1076)). This release includes an update to the README.md file to clarify that installing UCX with a Service Principal is not supported. Previously, the file indicated that Databricks Workspace Administrator privileges were required for the user running the installation, but did not explicitly state that Service Principal installation is not supported. The updated text now includes this information, ensuring that users have a clear understanding of the requirements and limitations of the installation process. The rest of the file remains unchanged and continues to provide instructions for installing UCX, including required software and network access. No new methods or functionality have been added, and no existing functionality has been changed beyond the addition of this clarification. The changes in this release have been manually tested to ensure they are functioning as intended.
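
Two of the notes above describe small, self-contained rules: deriving `install_folder` from `config_path` (#1074) and keeping only `abfss://` locations (#1066). A minimal sketch of both follows, assuming plain string inputs. The function names `install_folder_from_config` and `split_supported_locations` are hypothetical and are not the UCX implementations.

```python
import logging
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple

logger = logging.getLogger(__name__)


def install_folder_from_config(config_path: str) -> str:
    """Parent directory of the config file, without a leading "/Workspace"."""
    folder = PurePosixPath(config_path).parent.as_posix()
    prefix = "/Workspace"
    # Assumption: strip the prefix only when it is a whole path component,
    # so a folder such as "/WorkspaceOther" is left untouched.
    if folder == prefix or folder.startswith(prefix + "/"):
        folder = folder[len(prefix):]
    return folder


def split_supported_locations(urls: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate abfss:// locations from everything else, warning on the rest."""
    supported: List[str] = []
    skipped: List[str] = []
    for url in urls:
        if url.startswith("abfss://"):
            supported.append(url)
        else:
            logger.warning("Skipping unsupported location: %s", url)
            skipped.append(url)
    return supported, skipped
```

Returning the skipped locations alongside the supported ones is a design choice of this sketch; it makes the filtering easy to assert on in a test without inspecting log output.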
nfx mentioned this issue Mar 21, 2024
nfx added a commit that referenced this issue Mar 21, 2024
dmoore247 pushed a commit that referenced this issue Mar 23, 2024
dmoore247 pushed a commit that referenced this issue Mar 23, 2024
* Added Legacy Table ACL grants migration
([#1054](#1054)). This
commit introduces a legacy table ACL grants migration to the
`migrate-tables` workflow, resolving issue
[#340](#340) and paving the
way for follow-up PRs
[#887](#887) and
[#907](#907). A new
`GrantsCrawler` class is added for crawling grants, along with a
`GroupManager` class to manage groups during migration. The
`TablesMigrate` class is updated to accept an instance of
`GrantsCrawler` and `GroupManager` in its constructor. The migration
process has been thoroughly tested with unit tests, integration tests,
and manual testing on a staging environment. The changes include the
addition of a new Enum class `AclMigrationWhat` and updates to the
`Table` dataclass, and affect the way tables are selected for migration
based on rules. The logging and error handling have been improved in the
`skip_schema` function.
* Added `databricks labs ucx cluster-remap` command to remap legacy
cluster configurations to UC-compatible
([#994](#994)). In this
open-source library update, we have developed and added the `databricks
labs ucx cluster-remap` command, which facilitates the remapping of
legacy cluster configurations to UC-compatible ones. This new CLI
command comes with user documentation to guide the cluster remapping
process. Additionally, we have expanded the functionality of creating
and managing UC external catalogs and schemas with the inclusion of
`create-catalogs-schemas` and `revert-cluster-remap` commands. This
change does not modify existing commands or workflows and does not
introduce new tables. The `databricks labs ucx cluster-remap` command
allows users to re-map and revert the re-mapping of clusters from Unity
Catalog (UC) using the CLI, ensuring compatibility and streamlining the
migration process. The new command and associated functions have been
manually tested for functionality.
* Added `migrate-tables` workflow
([#1051](#1051)). The
`migrate-tables` workflow has been added for migrating tables from the
Hive Metastore to the Unity Catalog, along with new classes and methods
for table mapping, migration status refreshing, and migrating tables.
Two new instance variables in the `WorkspaceConfig` class, `min_workers`
and `max_workers` (defaults 1 and 10 respectively), allow more
fine-grained control over the resources allocated to the workflow. A new
`trigger` function initializes a configuration, SQL backend, and
WorkspaceClient based on the provided configuration file, and a new
`run_task` function looks up the specified task, logs relevant
information, and runs the task's function with the provided arguments.
The `Task` class's `fn` attribute now includes an `Installation` object
as a parameter. The `migrate_dbfs_root_delta_tables` and
`migrate_external_tables_sync` methods migrate Delta tables located in
the DBFS root and synchronize external tables, respectively, using the
workspace client to access the catalogs. Integration tests have been
added for these new methods.
* Added handling for `SYNC` command failures
([#1073](#1073)). This pull
request introduces changes to improve handling of `SYNC` command
failures during external table migrations in the Hive metastore.
Previously, the `SYNC` command's result was not checked, and failures
were not logged. Now, the `_migrate_external_table` method in
`table_migrate.py` fetches the result of the `SYNC` command execution,
logs a warning message for failures, and returns `False` if the command
fails. A new integration test has been added to simulate a failed `SYNC`
command due to a non-existent catalog and schema, ensuring the migration
tool handles such failures. A new test case has also been added to
verify the handling of `SYNC` command failures during external table
migrations, using a mock backend to simulate failures and checking for
appropriate log messages. These changes enhance the reliability and
robustness of the migration process, providing clearer error diagnosis
and handling for potential `SYNC` command failures.
* Added initial version of `databricks labs ucx migrate-local-code`
command ([#1067](#1067)). A
new `databricks labs ucx migrate-local-code` command has been added to
facilitate migration of local code, specifically Python and SQL files,
and to improve its compatibility with Unity Catalog. This initial
version is experimental. The command introduces a new `Files` class for
applying migrations to code files, considering their language. It also
updates the `.gitignore` and `pyproject.toml` files. Additionally, new
classes and methods have been added to support code analysis,
transformation, and linting for various programming languages.
* Added instance pool to cluster policy
([#1078](#1078)). A new
field, `instance_pool_id`, has been added to the cluster policy
configuration in `policy.py`, allowing users to specify the ID of an
instance pool to be applied to all workflow clusters in the policy. This
ID can be manually set or automatically retrieved by the system. A new
private method, `_get_instance_pool_id()`, has been added to handle the
retrieval of the instance pool ID. Additionally, a new test for table
migration jobs has been added to `test_installation.py` to ensure the
migration job is correctly configured with the specified parallelism,
minimum and maximum number of workers, and instance pool ID. A new test
case for creating a cluster policy with an instance pool has also been
added to `tests/unit/installer/test_policy.py` to ensure the instance
pool is added to the cluster policy during creation. These changes
provide users with more control over instance pools and cluster
policies, and improve the overall functionality of the library.
* Fixed `ucx move` logic for `MANAGED` & `EXTERNAL` tables
([#1062](#1062)). The `ucx
move` command has been updated to allow for the movement of UC
tables/views after the table upgrade process, providing flexibility in
managing catalog structure. The command now supports moving multiple
tables simultaneously, dropping managed tables/views upon confirmation,
and deep-cloning managed tables while dropping and recreating external
tables. A refactoring of the `TableMove` class has improved code
organization and readability, and the associated unit tests have been
updated to reflect these changes. This feature is targeted towards
developers and administrators seeking to adjust their catalog structure
after table upgrades, with the added ability to manage exceptional
conditions gracefully.
* Fixed integration testing with random product names
([#1074](#1074)). The
`trigger` function in the `tasks.py` module now creates the
`Installation` object locally, with a new `install_folder` argument, and
passes it to the `run_task` function. The `install_folder` is the parent
directory of `config_path`, converted to a POSIX-style path with the
leading "/Workspace" prefix removed. This ensures that `run_task`
receives the correct installation folder. The `Installation.current`
call has been replaced with the newly created `Installation` object.
* Refactor installer to separate workflows methods from the installer
class ([#1055](#1055)). In
this release, the installer in the `cli.py` file has been refactored to
improve modularity and maintainability. The installation and workflow
functionalities have been separated by importing a new class called
`WorkflowsInstallation` from `databricks.labs.ucx.installer.workflows`.
The `WorkspaceInstallation` class is no longer used in various
functions, and the new `WorkflowsInstallation` class is used instead.
Additionally, a new mixin class called `InstallationMixin` has been
introduced, which includes methods for uninstalling UCX, removing jobs,
and validating installation steps. The `WorkflowsInstallation` class now
inherits from this mixin class. A new file, `workflows.py`, has been
added to the `databricks/labs/ucx/installer` directory, which contains
methods for managing Databricks jobs. The new `WorkflowsInstallation`
class is responsible for deploying workflows, uploading wheels to DBFS
or WSFS, and creating debug notebooks. The refactoring also includes the
addition of new methods for handling specific workflows, such as
`run_workflow`, `validate_step`, and `repair_run`, which are now
contained in the `WorkflowsInstallation` class. The `test_install.py`
file in the `tests/unit` directory has also been updated to include new
imports and test functions to accommodate these changes.
* Skip unsupported locations while migrating to external location in
Azure ([#1066](#1066)). In
this release, we have updated the functionality of migrating to an
external location in Azure. A new private method
`_filter_unsupported_location` has been added to the `locations.py`
file, which checks if the location URLs are supported and removes the
unsupported ones from the list. Only locations starting with "abfss://"
are considered supported. Unsupported locations are logged with a
warning message. Additionally, a new test
`test_skip_unsupported_location` has been introduced to verify that the
`location_migration` function correctly skips unsupported locations
during migration to external locations in Azure. The test checks if the
correct log messages are generated for skipped unsupported locations,
and it mocks various scenarios such as crawled HMS external locations,
storage credentials, UC external locations, and installation with
permission mapping. The mock crawled HMS external locations contain two
unsupported locations: `adl://` and `wasbs://`. This ensures that the
function handles unsupported locations correctly, avoiding any
unnecessary errors or exceptions during migration.
* Triggering Assessment Workflow from Installer based on User Prompt
([#1007](#1007)). A new
functionality has been added to the installer that allows users to
trigger an assessment workflow based on a prompt during the installation
process. The `_trigger_workflow` method has been implemented, which can
be initiated with a step string argument. This method retrieves the job
ID for the specified step from the `_state.jobs` dictionary, generates
the job URL, and triggers the job using the `run_now` method from the
`jobs` class of the Workspace object. Users will be asked to confirm
triggering the assessment workflow and will have the option to open the
job URL in a web browser after triggering it. A new unit test,
`test_triggering_assessment_wf`, has been introduced to the
`test_install.py` file to verify the functionality of triggering an
assessment workflow based on user prompt. This test uses existing
classes and functions, such as `MockBackend`, `MockPrompts`,
`WorkspaceConfig`, and `WorkspaceInstallation`, to run the
`WorkspaceInstallation.run` method with a mocked `WorkspaceConfig`
object and a mock installation. The test also includes a user prompt to
confirm triggering the assessment job and opening the assessment job
URL. The new functionality and test improve the installation process by
enabling users to easily trigger the assessment workflow based on their
specific needs.
* Updated README.md for Service Principal Installation Limit
([#1076](#1076)). This
release includes an update to the README.md file to clarify that
installing UCX with a Service Principal is not supported. Previously,
the file indicated that Databricks Workspace Administrator privileges
were required for the user running the installation, but did not
explicitly state that Service Principal installation is not supported.
The updated text now includes this information, ensuring that users have
a clear understanding of the requirements and limitations of the
installation process. The rest of the file remains unchanged and
continues to provide instructions for installing UCX, including required
software and network access. No new methods or functionality have been
added, and no existing functionality has been changed beyond the
addition of this clarification. The changes in this release have been
manually tested to ensure they are functioning as intended.
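
Two of the entries above describe small, self-contained rules: #1066 keeps only locations starting with "abfss://" and logs a warning for the rest, and #1074 derives `install_folder` from the parent directory of `config_path` with the leading "/Workspace" prefix removed. A minimal Python sketch of both rules follows. The function names `filter_unsupported_locations` and `install_folder_from_config` and the example paths are hypothetical illustrations, not taken from the UCX code base, and the actual implementation may differ.

```python
import logging
from pathlib import PurePosixPath

logger = logging.getLogger(__name__)

# Per the #1066 entry, only abfss:// locations are treated as supported.
SUPPORTED_PREFIX = "abfss://"


def filter_unsupported_locations(locations: list[str]) -> list[str]:
    """Hypothetical helper: keep abfss:// locations, warn about the others."""
    supported = []
    for location in locations:
        if location.startswith(SUPPORTED_PREFIX):
            supported.append(location)
        else:
            logger.warning("Skipping unsupported location: %s", location)
    return supported


def install_folder_from_config(config_path: str) -> str:
    """Hypothetical helper: parent of config_path as a POSIX-style path,
    with a leading /Workspace prefix removed if present."""
    folder = PurePosixPath(config_path).parent.as_posix()
    return folder.removeprefix("/Workspace")


if __name__ == "__main__":
    # Example inputs are made up for illustration only.
    print(filter_unsupported_locations(
        ["abfss://container@account.dfs.core.windows.net/data", "adl://store/raw"]
    ))
    print(install_folder_from_config("/Workspace/Users/me@example.com/.ucx/config.yml"))
```

The sketch uses `PurePosixPath` so the result does not depend on the operating system running it, and `str.removeprefix` (Python 3.9+) so paths without the "/Workspace" prefix pass through unchanged.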
@nfx nfx added the migrate/access-control Access Control to things label Mar 25, 2024
@nfx nfx closed this as completed in #1077 Mar 31, 2024
@github-project-automation github-project-automation bot moved this from Triage to Archive in UCX Mar 31, 2024
4 participants