
added doc to clarify that principal-prefix-access for azure will only list abfss storage accounts #2212

Merged
merged 2 commits into main from bug/1552 on Jul 19, 2024

Conversation

HariGS-DB (Contributor)

Changes

This PR updates the documentation to clarify that the `principal-prefix-access` CLI command for Azure will only list abfss:// storage accounts. Storage accounts accessed via adl:// or wasb:// will not be listed, as those protocols are not used for credential migration.
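For illustration, the scheme-based selection the documentation now describes can be sketched as follows. The helper below is hypothetical and not part of ucx; it only demonstrates keeping abfss:// locations while skipping unsupported protocols:

```python
# Minimal sketch of the documented behaviour: only abfss:// (ADLS Gen2)
# locations are kept for credential migration; adl:// and wasb:// are
# skipped. Hypothetical helper, not the actual ucx implementation.
def filter_supported_locations(locations: list[str]) -> list[str]:
    return [loc for loc in locations if loc.startswith("abfss://")]

locations = [
    "abfss://container@account.dfs.core.windows.net/path",
    "wasb://container@account.blob.core.windows.net/path",
    "adl://account.azuredatalakestore.net/path",
]
print(filter_supported_locations(locations))
```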

Linked issues

#1065

Resolves #1552

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs ucx ...
  • added a new workflow
  • modified existing workflow: ...
  • added a new table
  • modified existing table: ...

Tests

  • manually tested
  • added unit tests
  • added integration tests
  • verified on staging environment (screenshot attached)

@HariGS-DB HariGS-DB marked this pull request as ready for review July 18, 2024 21:56
@HariGS-DB HariGS-DB requested review from a team and nkvuong July 18, 2024 21:56

✅ 2/2 passed, 13s total

Running from acceptance #4743

@HariGS-DB HariGS-DB requested review from nfx and removed request for nkvuong July 19, 2024 09:05
@JCZuurmond (Member) left a comment


Small nit

README.md (comment resolved)
@nfx nfx merged commit 76a638f into main Jul 19, 2024
7 checks passed
@nfx nfx deleted the bug/1552 branch July 19, 2024 10:26
nfx added a commit that referenced this pull request Jul 19, 2024
* Added `lsql` lakeview dashboard-as-code implementation ([#1920](#1920)). The open-source library has been updated with new features in its dashboard creation functionality. The `assessment_report` and `estimates_report` jobs, along with their corresponding tasks, have been removed. The `crawl_groups` task has been modified to accept a new parameter, `group_manager`. These changes are part of a larger implementation of the `lsql` Lakeview dashboard-as-code system for creating dashboards. The new implementation has been tested through manual testing, existing unit tests, integration tests, and verification on a staging environment, and is expected to improve the functionality and maintainability of the dashboards. The removal of the `assessment_report` and `estimates_report` jobs and tasks may indicate that their functionality has been incorporated into the new `lsql` implementation or is no longer necessary. The new `crawl_groups` task parameter may be used in conjunction with the new `lsql` implementation to enhance the assessment and estimation of groups.
* Added new widget to get table count ([#2202](#2202)). A new widget has been introduced that presents a table count summary, categorized by type (external or managed), location (DBFS root, mount, cloud), and format (delta, parquet, etc.). This enhancement is complemented by an additional SQL file, responsible for generating necessary count statistics. The script discerns the table type and location through location string analysis and subsequent categorization. The output is structured and ordered by table type. It's important to note that no existing functionality has been altered, and the new feature is self-contained within the added SQL file. To ensure the correct functioning of this addition, relevant documentation and manual tests have been incorporated.
* Added support for DBFS when building the dependency graph for tasks ([#2199](#2199)). In this update, we have added support for the Databricks File System (DBFS) when building the dependency graph for tasks during workflow assessment. This enhancement allows for the use of wheels, eggs, requirements.txt files, and PySpark jobs located in DBFS when assessing workflows. The `DependencyGraph` object's `register_library` method has been updated to handle paths in both Workspace and DBFS formats. Additionally, we have introduced the `_as_path` method and the `_temporary_copy` context manager to manage file copying and path determination. This development resolves issue [#1558](#1558) and includes modifications to the existing `assessment` workflow and new unit tests.
* Applied `databricks labs lsql fmt` for SQL files ([#2184](#2184)). The engineering team has developed and applied formatting to several SQL files using the `databricks labs lsql fmt` tool from various pull requests, including <databrickslabs/lsql#221>. These changes improve code readability and consistency without affecting functionality. The formatting includes adding comment delimiters, converting subqueries to nested SELECT statements, renaming columns for clarity, updating comments, modifying conditional statements, and improving indentation. The impacted SQL files include queries related to data migration complexity, assessing data modeling complexity, generating table estimates, and calculating data migration effort. Manual testing has been performed to ensure that the update does not introduce any issues in the installed dashboards.
* Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 ([#2182](#2182)). In this release, the version of `sigstore/gh-action-sigstore-python` is bumped to 3.0.0 from 2.1.1 in the project's GitHub Actions workflow. This new version brings several changes, additions, and removals, such as the removal of certain settings like `fulcio-url`, `rekor-url`, `ctfe`, and `rekor-root-pubkey`, and output settings like `signature`, `certificate`, and `bundle`. The `inputs` field is now parsed according to POSIX shell lexing rules and is optional if `release-signing-artifacts` is true and the action's event is a `release` event. The default suffix has changed from `.sigstore` to `.sigstore.json`. Additionally, various deprecations present in `sigstore-python`'s 2.x series have been resolved. This PR also includes several commits, including preparing for version 3.0.0, cleaning up workflows, and removing old output settings. There are no conflicts with this PR, and Dependabot will resolve them automatically. Users can trigger Dependabot actions by commenting on this PR with specific commands.
* Consistently cleanup linter codes ([#2194](#2194)). This commit introduces changes to the linting functionality of PySpark, focusing on enhancing code consistency and accuracy. New checks have been added for detecting code incompatibilities with UC Shared Clusters, targeting Python UDF unsupported eval types, spark.catalog.X APIs on DBR versions earlier than 14.3, and the use of commandContext. A new file, python-udfs_14_3.py, containing tests for these incompatibilities has been added. The commit also resolves false linting advice for homonymous method names and updates the code for static analysis message codes, improving self-documentation and maintainability. These changes are limited to the linting functionality of PySpark and do not affect any other functionalities. Co-authored by Eric Vergnaud and Serge Smertin.
* Disable the builtin pip version check when running pip commands ([#2214](#2214)). In this release, we have introduced a modification to disable the built-in pip version check when using pip to install dependencies. This change involves altering the existing workflow of the `_install_pip` method to include the `--disable-pip-version-check` flag in the pip install command, reducing noise in pip-related errors and messages, and enhancing user experience. We have conducted manual and unit testing to ensure that the changes do not introduce any regressions and that existing functionalities remain unaffected. The error message has been updated to reflect the new pip behavior, including the `--disable-pip-version-check` flag in the message. Overall, these changes improve the user experience by reducing unnecessary error messages and providing clearer error information.
* Document `principal-prefix-access` for Azure will only list abfss storage accounts ([#2212](#2212)). The documentation for the `principal-prefix-access` CLI command has been updated to clarify its behavior on Azure: the command lists only Azure Data Lake Storage Gen2 (abfss://) storage accounts and skips accounts accessed via unsupported protocols such as wasb:// or adl://. These protocols are not compatible with Unity Catalog (UC) and are therefore ignored during credential migration. The clarification helps users migrating credentials to UC understand why only abfss:// storage accounts appear in the command's output.
* Group migration: change error logging format ([#2215](#2215)). In this release, we have updated the error logging format for failed permissions migrations during the experimental group migration workflow to enhance readability and debugging capabilities. Previously, the logs only stated that a migration failure occurred without further details. Now, the new format includes both the source and destination account names, as well as a description of the simulated failure during the migration process. This improves the transparency and usefulness of the error logs for debugging and troubleshooting purposes. Additionally, we have added unit tests to ensure the proper logging of failed migrations, ensuring the reliability of the group migration process for our users. This update demonstrates our commitment to providing clear and informative error messages to make the software engineering experience better.
* Improve error handling as already exists error occurs ([#2077](#2077)). The recent change enhances error handling for the `create-catalogs-schemas` CLI command, addressing an issue where the command would fail if the catalog or schema already existed. The modification involves the introduction of the `_get_missing_catalogs_schemas` method to avoid recreating existing ones. The `create_all_catalogs_schemas` method has been updated to include try-except blocks for `_create_catalog_validate` and `_create_schema` methods, skipping creation if a `BadRequest` error occurs with the message "already exists." This ensures that no overwriting of existing catalogs and schemas takes place. A new test case, "test_create_catalogs_schemas_handles_existing," has been added to verify the command's handling of existing catalogs and schemas. This change resolves issue [#1939](#1939) and is manually tested; no new methods were added, and existing functionality was changed only within the test file.
* Support run assessment as a collection ([#1925](#1925)). This commit introduces the capability to run eligible CLI commands as a collection, with an initial implementation for the assessment run command. A new parameter `collection_workspace_id` has been added to determine whether the current installation workflow is run or if an account context is created to iterate through all workspaces of the specified collection and run the assessment workflow. The `join_collection` method has been updated to accept a list of workspace IDs and a boolean value. Unit tests have been added and existing tests have been updated to ensure proper functionality. The `databricks labs ucx` command has also been modified to support this feature, with the `join_collection` method syncing workspaces in the collection when the `sync` flag is set to True.
* Test UCX over Python v3.10, v3.11, and v3.12 ([#2195](#2195)). In this release, we introduce significant enhancements to our GitHub Actions CI workflow, enabling more comprehensive testing of UCX over Python versions 3.10, 3.11, and 3.12. We've implemented a new matrix strategy in the `push.yml` workflow file, dynamically setting the `python-version` using the `${{ matrix.pyVersion }}` variable. This allows developers to test UCX with specific Python versions by setting the `HATCH_PYTHON` variable. Additionally, we've updated the `pyproject.toml` file, removing the Python 3.10 requirement and improving virtual environment integration with popular IDEs. The `test_migrator_supported_language_with_fixer` function in `test_files.py` has been refactored for a more efficient 'migrator.apply' method test using temporary directories and files. This release aims to ensure compatibility, identify version-specific issues, and improve the user experience for developers.
* Updated databricks-labs-blueprint requirement from ~=0.7.0 to >=0.7,<0.9 ([#2191](#2191)). In this pull request, the `databricks-labs-blueprint` package requirement has been updated from version `~=0.7.0` to `>=0.7,<0.9`. This update ensures compatibility with the project's requirements while allowing the use of the latest version of the package. The pull request also includes release notes and changelog information from the `databrickslabs/blueprint` repository, detailing various improvements and bug fixes, such as support for Python 3.12, type annotations for path-related unit tests, and fixes for the `WorkspacePath` class. A list of commits and their corresponding hashes is provided for engineers to review the changes made in the update and ensure compatibility with their projects.
* Updated databricks-labs-lsql requirement from <0.7,>=0.5 to >=0.5,<0.8 ([#2189](#2189)). In this update, the version requirement of the `databricks-labs-lsql` dependency has been updated from `<0.7,>=0.5` to `>=0.5,<0.8`. This change allows for the use of the latest version of the `databricks-labs-lsql` package while ensuring compatibility with the current system. Additionally, this commit includes the release notes, changelog, and commit details from the `databricks-labs-lsql` repository for version 0.7.1. These documents provide information on various bug fixes, improvements, and changes, such as updating the `sigstore/gh-action-sigstore-python` package from 2.1.1 to 3.0.0, using a default factory to create `Tile._position`, and other enhancements. The changelog includes detailed information about releases and features, while the commit details highlight the changes and contributors for each individual commit.
* Updated sqlglot requirement from <25.6,>=25.5.0 to >=25.5.0,<25.7 ([#2211](#2211)). In this update, we have revised the requirement range for the `sqlglot` library to '>=25.5.0,<25.7' from '<25.6,>=25.5.0'. This modification allows us to utilize the latest version of sqlglot, which is v25.6.0, while ensuring that the version does not surpass 25.7. This change is part of issue [#2211](#2211), and the new version includes several enhancements such as support for ORDER BY ALL, FROM ROWS FROM (...) in PostgreSQL, and exp.TimestampAdd in Presto and Trino. Furthermore, the update encompasses modifications to the bigquery, clickhouse, and duckdb dialects, as well as several bug fixes. These improvements are aimed at increasing functionality, stability, and addressing issues in the library.
* Yield `DependencyProblem` if job on runtime DBR14+ and using .egg dependency ([#2020](#2020)). In this release, we have introduced a new method, `_register_egg`, to handle the registration of libraries in .egg format in the `build_dependency_graph` method. This method checks the runtime version of Databricks. If the version is DBR14 or higher, it yields `DependencyProblem` with code 'not-supported', indicating that installing eggs is no longer supported in Databricks 14.0 or higher. For lower runtime versions, the method downloads the .egg file from the workspace, writes it to a temporary directory, and then registers the library with the `DependencyGraph`. The existing functionality, such as registering libraries in .whl format and registering notebooks, remains unchanged. This release also includes a new test case, `test_job_dependency_problem_egg_dbr14plus`, which creates a job with an .egg dependency and verifies that the expected `DependencyProblem` is raised when using .egg dependencies in a job on Databricks Runtime (DBR) version 14 or higher. This change addresses issue [#1793](#1793) and improves dependency management, making it easier for software engineers to adopt and work seamlessly with the project.
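The runtime-version check described in the last entry can be sketched roughly as follows. The class and generator below are simplified assumptions, not the exact ucx code:

```python
from dataclasses import dataclass


@dataclass
class DependencyProblem:
    code: str
    message: str


def register_egg(runtime_version: tuple[int, int], egg_path: str):
    # DBR 14.0+ no longer supports installing .egg files, so yield a
    # problem instead of attempting the install (simplified sketch of
    # the behaviour described above; not the actual _register_egg).
    if runtime_version >= (14, 0):
        yield DependencyProblem(
            "not-supported",
            f"Installing eggs is no longer supported on Databricks 14.0 "
            f"or higher: {egg_path}",
        )
        return
    # On older runtimes the real code downloads the egg to a temporary
    # directory and registers it with the DependencyGraph (omitted here).


problems = list(register_egg((14, 3), "dist/lib.egg"))
```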

Dependency updates:

 * Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 ([#2182](#2182)).
 * Updated databricks-labs-lsql requirement from <0.7,>=0.5 to >=0.5,<0.8 ([#2189](#2189)).
 * Updated databricks-labs-blueprint requirement from ~=0.7.0 to >=0.7,<0.9 ([#2191](#2191)).
 * Updated sqlglot requirement from <25.6,>=25.5.0 to >=25.5.0,<25.7 ([#2211](#2211)).
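Returning to the `--disable-pip-version-check` change listed above: the modification amounts to one extra flag on the pip invocation. A minimal sketch, with hypothetical helper names rather than ucx's actual `_install_pip`:

```python
import subprocess
import sys


def pip_install_command(requirements: str) -> list[str]:
    # --disable-pip-version-check suppresses pip's "a newer version of
    # pip is available" notice, reducing noise in pip output.
    return [
        sys.executable, "-m", "pip", "install",
        "--disable-pip-version-check", "-r", requirements,
    ]


def install_dependencies(requirements: str) -> None:
    # Run the install, raising CalledProcessError on failure.
    subprocess.run(pip_install_command(requirements), check=True)
```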
@nfx nfx mentioned this pull request Jul 19, 2024
nfx added a commit that referenced this pull request Jul 19, 2024
Development

Successfully merging this pull request may close these issues.

[BUG]: wasbs:// and dbfs:// mount external tables are not listing in principal-prefix-access command
3 participants