[BUG]: principal-prefix-access/create-missing-principals does not create all the roles from external locations #3505

pritishpai · 2025-01-09T17:51:51Z

Is there an existing issue for this?

I have searched the existing issues

Current Behavior

The cli commands do not create all the UC credentials automatically based on the external locations identified in the assessment. Current workaround is to create the roles manually. The locations for which the roles are not created do not have a trailing slash after the bucket name.

Expected Behavior

The commands should automatically create the required UC credentials

Steps To Reproduce

No response

Cloud

AWS

Operating System

macOS

Version

latest via Databricks CLI

Relevant log output

No response

…age account (#3510) Closes #3505

* Implement disposition field in SQL backend ([#3477](#3477)). This commit introduces the `query_statement_disposition` configuration value to handle large SQL queries during assessment results export for workspaces with numerous findings. A new parameter is added to the `config.yml` file, allowing users to specify the disposition method for running large SQL statements. The modification includes changes to the `databricks labs install ucx` and `databricks labs ucx export-assessment` commands and updates to the SqlBackend definition. The `Disposition` enum is utilized to specify the disposition method in tests, which have been manually verified. This feature, developed by Michele Daddetta and Guenia Izquierdo Delgado, resolves issue [#3447](#3447) and is based on changes from PR [#3455](#3455). * AWS role issue with external locations pointing to the root of a storage account ([#3510](#3510)). In this release, the `AWSResources` class in `aws.py` has been updated to improve S3 bucket ARN pattern matching by modifying the regular expression pattern for matching. The `_identify_missing_paths` function in `access.py` has been enhanced to check for AWS role compatibility with external locations that point to the root of a storage account using `PurePath` class. Additionally, new unit tests have been added to `tests/unit/aws/test_access.py` to ensure the correct creation of all necessary UC roles, including the new external location `s3://BUCKET4` with an appropriate access level. These changes improve the accuracy of ARN pattern matching and enhance compatibility checking and testing for AWS roles and external locations. This release is part of the ongoing development of the AWS assessment tool and addresses issues [#3510](#3510) and [#3505](#3505). * Added dashboards to migration progress dashboard ([#3314](#3314)). This commit, co-authored by Guenia Izquierdo Delgado, modifies the migration progress dashboard to include linting resources, adds new dashboards, and improves overall functionality and maintainability. The changes include modifying the existing 'Migration [main]' dashboard and updating associated unit and integration tests. New dashboards such as `Dashboards migrated` and `Dashboard pending migration` provide valuable insights into the migration progress, displaying successful migrations and pending migration status by owner. The commit also reorganizes some existing queries and adds new methods to support the new functionality, addressing dependencies from issue [#3424](#3424) and progressing work on issue [#3045](#3045), while breaking up issue [#3112](#3112). * Added history log encoder for dashboards ([#3424](#3424)). This commit introduces a history log encoder for dashboards in the context of a larger application, addressing issues [#3368](#3368) and [#3369](#3369). The `experimental-migration-progress` workflow has been modified, and new classes, properties, and methods have been added to handle dashboard-related progress encoding. Specifically, the `Dashboard` class, `DashboardOwnership` class, and `DashboardProgressEncoder` class have been introduced, along with several methods for assessing dashboard ownership. These changes are tested through manual testing, unit tests, and integration tests. Additionally, the existing `TableProgressEncoder` class has been updated with new tests for failure scenarios involving tables that have not been migrated. The `WorkspacePathOwnership` method has been added to determine the owner of a given workspace path, and a new unit test has been added to test table creation from historical data. * Create specific failure for Python syntax error while parsing with Astroid ([#3498](#3498)). This commit enhances the Python linting functionality in our open-source library by introducing a specific failure message for syntax errors that occur during code parsing with Astroid. Previously, a generic `system-error` message was displayed, which provided limited guidance for users. Now, a new failure type called `python-parse-error` is displayed when a SyntaxError is raised during parsing, with detailed information such as the error message, line, and column numbers. This change aligns the failure type with `sql-parse-error` and adds a default GitHub issue template to report the error. Additionally, the commit renames `system-error` to `python-parse-error` to maintain consistency and updates the README to explain the new failure type. The commit also includes new unit tests to ensure that the new failure type is being handled correctly, and modifies the Python linting-related code to add a new method `Tree.maybe_parse()` to handle syntax errors. * DBR 16 and later support ([#3481](#3481)). This pull request introduces support for Databricks Runtime (DBR) 16 in the optional conversion of Hive Metastore (HMS) tables to external tables within the `migrate-tables` workflow. The update includes modifications to the existing `migrate-tables` workflow, such as the addition of a `_get_entity_storage_locations` method to check for the presence of the `entityStorageLocations` property in the table metadata, which is required for the `CatalogTable` constructor in DBR 16.0. The changes have been tested manually on DBR16, passed integration tests on DBR15, and verified on a staging environment using DBR16. Additionally, the `test_running_real_assessment_job` function in `test_workflows.py` has been updated to include the `skip_job_wait=True` parameter when running the `run_workflow` method for the `assessment` workflow, improving testing efficiency. The commit also includes a deprecated test case for converting managed tables to external before migrating, with a note about its failure from DBR 16.0 onwards due to a JDK update. The test case remains unchanged, but the note serves as a reminder for further investigation. The `run_workflow` function in the test cases has been modified to include a `skip_job_wait` parameter, allowing tests to bypass waiting for job completion, reducing overall test runtime and improving the developer experience. * Exclude ucx dashboards from Lakeview dashboard crawler ([#3450](#3450)). In this release, we have introduced modifications to the `assessment` workflow, specifically in the `dashboards.py` file, to exclude dashboards from the UCX package in the Lakeview dashboard crawler and prevent false positives. The `lakeview_crawler` method in the `application.py` file has been updated to include a new argument `exclude_dashboard_ids`, set to the list of dashboard IDs in the `install_state.dashboards` object. This ensures that these dashboards are excluded from the crawler. Additionally, two new unit tests have been added to ensure the exclusion functionality works correctly. The first test checks if the crawler skips the dashboard with the ID specified in the `exclude_dashboard_ids` parameter, and the second test ensures that the `exclude_dashboard_ids` parameter takes priority over the `include_dashboard_ids` parameter when both are provided. The changes have been manually tested and verified on the staging environment, and the linked issue [#3441](#3441) has been resolved. * Fixed issue in installing UCX on UC enabled workspace ([#3501](#3501)). In this release, we have updated the UCX policy definition for `spark_version` from a fixed value to an allowlist with a default value. This change resolves an issue where enabling UC on a workspace caused the cluster definition to take on `single_user` and `user_isolation` values instead of `Legacy_Single_User` and 'Legacy_Table_ACL'. The policy was found to be overriding these values, and changing `spark_version` from fixed to allowlist resolved the issue. Additionally, the job definition now uses the default value if no value is provided by setting `apply_policy_default_values` to true. This change resolves issue [#3420](#3420). No new methods have been added, and existing functionality has not been significantly altered. To test this change, updated unit tests, integration tests, and a static installation test should be performed. The code modification includes a new method called `test_job_cluster_on_uc_enabled_workspace` which tests the behavior of installation on a UC-enabled workspace, verifying that the correct data security modes are set for different job clusters. The changes in this release are backward compatible and do not affect existing functionality. The modification to the UCX policy ensures that the correct spark version and node type are selected, while also allowing for flexibility in data security modes. The updated tests provide confidence in the correct behavior of the installation process on both standard and UC-enabled workspaces. * Fixed typo in workflow name (in error message) ([#3491](#3491)). This PR fixes a minor typo in an error message that appears when group permissions fail to migrate successfully. The typo, found in the name of the workflow for validating permissions, has been corrected from `validate-group-permissions` to "validate-groups-permissions". This change enhances the user experience by providing clearer instructions for addressing issues with group permissions during migration. No new methods have been introduced, and existing functionality has been modified solely for the correction of the typo. The change does not impact any other parts of the codebase. This project is geared towards software engineers who seek to utilize its features. * Refactor `PipelineMigrator`'s to add `include_pipeline_ids` ([#3495](#3495)). In this release, the `PipelineMigrator` class in the `pipelines_migrate.py` file has been refactored to enhance the pipeline migration process. The refactor introduces a new parameter `include_pipeline_ids`, which allows users to specify a list of pipelines to migrate. Previously, users could only skip pipelines that were already migrated or explicitly specified using the `skip_pipeline_ids` parameter. With this refactor, users now have more control over the migration process by being able to explicitly include and exclude pipelines using the `include_pipeline_ids` and `exclude_pipeline_ids` parameters, respectively. Additionally, the implementation of the `PipelineMigrator` class has been simplified, and unit tests and integration tests have been updated to reflect these changes. As a software engineer, it is important to thoroughly test and validate this new behavior to ensure compatibility with existing systems. * Schedule the migration progress workflow to run daily ([#3485](#3485)). This PR introduces a daily schedule for the UCX installation's migration progress workflow, refactoring workflow management/installation plumbing to enable Cron-based scheduling and setting the default schedule for the migration progress workflow to run at 5 a.m. UTC. Relevant user documentation has been updated, and the existing `migration-progress-experimental` workflow has been modified. New test methods have been added to check for the presence of workflows and tasks, as well as validate the workflow's schedule and pause status. These changes improve automation and maintainability of the UCX installation process, while ensuring that existing functionalities are working correctly. * Scope crawled pipelines in PipelineCrawler ([#3513](#3513)). In this release, the `PipelineCrawler` class in the `databricks/labs/ucx/assessment` directory has been updated with a new optional argument `include_pipeline_ids` in the constructor. This argument is a list of strings that represent the IDs of pipelines to be crawled. If not provided, all pipelines will be crawled. The `_crawl` method has been modified to accept a list of pipeline IDs and now obtains a list of pipeline IDs instead of pipeline objects. For each pipeline ID, the method tries to get the pipeline and extract its configuration, while also checking for any failures. Additionally, assertions have been added to ensure that the `pipeline_id` and `spec.configuration` attributes are not `None`. A new test function `test_include_pipeline_ids()` has been introduced to verify the functionality of this argument. These changes improve the functionality of the `PipelineCrawler` class by allowing users to crawl specific pipelines based on their IDs. * Updated databricks-labs-blueprint requirement from <0.10,>=0.9.1 to >=0.9.1,<0.11 ([#3519](#3519)). In this update, the requirement for the `databricks-labs-blueprint` package has been updated to a version greater than or equal to 0.9.1 and strictly less than 0.11, previously it was greater than or equal to 0.9.1 and strictly less than 0.10. This change allows the latest version of the package to be used. Additionally, the commit includes release notes, a changelog, and commit information for the updated package, as well as instructions for Dependabot commands and options. The changes are limited to the `pyproject.toml` file and do not have any impact on other parts of the codebase. * Updated sqlglot requirement from <26.1,>=25.5.0 to >=25.5.0,<26.2 ([#3500](#3500)). In this pull request, we have updated the version requirement for the `sqlglot` dependency in the 'pyproject.toml' file. The previous version constraint was for a version greater than or equal to 25.5.0 and less than 26.1, but it has been relaxed to permit versions greater than or equal to 25.5.0 and less than 26.2. This change was made to enable the use of the latest version of 'sqlglot', which includes several new features, bug fixes, and breaking changes as detailed in the 26.1.0 changelog. We have also included the commit history for the `sqlglot` repository to provide further context and reference. This update aims to ensure compatibility with the latest version of `sqlglot` while also providing transparency regarding the changes implemented. * Updated sqlglot requirement from <26.2,>=25.5.0 to >=25.5.0,<26.3 ([#3528](#3528)). In this release, we have updated the required version constraint of the `sqlglot` library in the `pyproject.toml` file. The previous constraint `>=25.5.0,<26.2` has been updated to `>=25.5.0,<26.3`. This change allows the project to utilize the latest version of `sqlglot` within the newly specified range while maintaining compatibility with the project's existing requirements. Notably, this update does not introduce any new methods to the project; it only affects the version constraint for the `sqlglot` library. Software engineers integrating this project can now benefit from the latest `sqlglot` versions within the specified range. * Updated table-migration workflows to also capture updated migration progress into the history log ([#3239](#3239)). This pull request enhances the table-migration workflows by logging updated migration progress in the history log, providing improved visibility into the migration process. The workflows, including `migrate-tables`, `migrate-external-hiveserde-tables-in-place-experimental`, `migrate-external-tables-ctas`, `scan-tables-in-mounts-experimental`, and `migrate-tables-in-mounts-experimental`, have been updated to include this new logging functionality. In addition to these changes, the documentation has been updated to reflect which workflows update which tables, and the `TableMigrationStatus` data initialization behavior has been modified. New and updated unit and integration tests have been manually tested to ensure the changes are functioning correctly. Co-authored by Serge Smertin and Cor Zuurmond. Dependency updates: * Updated sqlglot requirement from <26.1,>=25.5.0 to >=25.5.0,<26.2 ([#3500](#3500)). * Updated databricks-labs-blueprint requirement from <0.10,>=0.9.1 to >=0.9.1,<0.11 ([#3519](#3519)).

pritishpai added bug needs-triage labels Jan 9, 2025

pritishpai added this to UCX Jan 9, 2025

github-project-automation bot moved this to Todo in UCX Jan 9, 2025

pritishpai assigned FastLee Jan 9, 2025

FastLee mentioned this issue Jan 10, 2025

AWS role issue with external locations pointing to the root of a storage account #3510

Merged

github-merge-queue bot pushed a commit that referenced this issue Jan 13, 2025

AWS role issue with external locations pointing to the root of a stor…

1c543c4

…age account (#3510) Closes #3505

FastLee closed this as completed in #3510 Jan 13, 2025

github-project-automation bot moved this from Todo to Done in UCX Jan 13, 2025

gueniai mentioned this issue Jan 16, 2025

Release v0.54.0 #3530

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG]: principal-prefix-access/create-missing-principals does not create all the roles from external locations #3505

[BUG]: principal-prefix-access/create-missing-principals does not create all the roles from external locations #3505

pritishpai commented Jan 9, 2025 •

edited

Loading

[BUG]: principal-prefix-access/create-missing-principals does not create all the roles from external locations #3505

[BUG]: principal-prefix-access/create-missing-principals does not create all the roles from external locations #3505

Comments

pritishpai commented Jan 9, 2025 • edited Loading

Is there an existing issue for this?

Current Behavior

Expected Behavior

Steps To Reproduce

Cloud

Operating System

Version

Relevant log output

pritishpai commented Jan 9, 2025 •

edited

Loading