
added doc to clarify that principal-prefix-access for azure will only list abfss storage accounts #2212

Merged
merged 2 commits into main from bug/1552 on Jul 19, 2024

Conversation

HariGS-DB (Contributor)

Changes

This PR updates the documentation to clarify that the `principal-prefix-access` CLI command for Azure will only list abfss:// storage accounts. Storage accounts accessed via adl:// or wasb:// will not be listed, as those protocols are not used for credential migration.
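For illustration, the scheme-based selection the documentation now describes can be sketched as follows. The helper below is hypothetical and not part of ucx; it only demonstrates keeping abfss:// locations while skipping unsupported protocols:

```python
# Minimal sketch of the documented behaviour: only abfss:// (ADLS Gen2)
# locations are kept for credential migration; adl:// and wasb:// are
# skipped. Hypothetical helper, not the actual ucx implementation.
def filter_supported_locations(locations: list[str]) -> list[str]:
    return [loc for loc in locations if loc.startswith("abfss://")]

locations = [
    "abfss://container@account.dfs.core.windows.net/path",
    "wasb://container@account.blob.core.windows.net/path",
    "adl://account.azuredatalakestore.net/path",
]
print(filter_supported_locations(locations))
```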

Linked issues

#1065

Resolves #1552

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs ucx ...
  • added a new workflow
  • modified existing workflow: ...
  • added a new table
  • modified existing table: ...

Tests

  • manually tested
  • added unit tests
  • added integration tests
  • verified on staging environment (screenshot attached)

@HariGS-DB HariGS-DB marked this pull request as ready for review July 18, 2024 21:56
@HariGS-DB HariGS-DB requested review from a team and nkvuong July 18, 2024 21:56

✅ 2/2 passed, 13s total

Running from acceptance #4743

@HariGS-DB HariGS-DB requested review from nfx and removed request for nkvuong July 19, 2024 09:05
@JCZuurmond (Member) left a comment


Small nit

README.md (comment resolved)
@nfx nfx merged commit 76a638f into main Jul 19, 2024
7 checks passed
@nfx nfx deleted the bug/1552 branch July 19, 2024 10:26
nfx added a commit that referenced this pull request Jul 19, 2024
* Added `lsql` lakeview dashboard-as-code implementation ([#1920](#1920)). The open-source library has been updated with new features in its dashboard creation functionality. The `assessment_report` and `estimates_report` jobs, along with their corresponding tasks, have been removed. The `crawl_groups` task has been modified to accept a new parameter, `group_manager`. These changes are part of a larger implementation of the `lsql` Lakeview dashboard-as-code system for creating dashboards. The new implementation has been tested through manual testing, existing unit tests, integration tests, and verification on a staging environment, and is expected to improve the functionality and maintainability of the dashboards. The removal of the `assessment_report` and `estimates_report` jobs and tasks may indicate that their functionality has been incorporated into the new `lsql` implementation or is no longer necessary. The new `crawl_groups` task parameter may be used in conjunction with the new `lsql` implementation to enhance the assessment and estimation of groups.
* Added new widget to get table count ([#2202](#2202)). A new widget has been introduced that presents a table count summary, categorized by type (external or managed), location (DBFS root, mount, cloud), and format (delta, parquet, etc.). This enhancement is complemented by an additional SQL file, responsible for generating necessary count statistics. The script discerns the table type and location through location string analysis and subsequent categorization. The output is structured and ordered by table type. It's important to note that no existing functionality has been altered, and the new feature is self-contained within the added SQL file. To ensure the correct functioning of this addition, relevant documentation and manual tests have been incorporated.
* Added support for DBFS when building the dependency graph for tasks ([#2199](#2199)). In this update, we have added support for the Databricks File System (DBFS) when building the dependency graph for tasks during workflow assessment. This enhancement allows for the use of wheels, eggs, requirements.txt files, and PySpark jobs located in DBFS when assessing workflows. The `DependencyGraph` object's `register_library` method has been updated to handle paths in both Workspace and DBFS formats. Additionally, we have introduced the `_as_path` method and the `_temporary_copy` context manager to manage file copying and path determination. This development resolves issue [#1558](#1558) and includes modifications to the existing `assessment` workflow and new unit tests.
* Applied `databricks labs lsql fmt` for SQL files ([#2184](#2184)). The engineering team has developed and applied formatting to several SQL files using the `databricks labs lsql fmt` tool from various pull requests, including <databrickslabs/lsql#221>. These changes improve code readability and consistency without affecting functionality. The formatting includes adding comment delimiters, converting subqueries to nested SELECT statements, renaming columns for clarity, updating comments, modifying conditional statements, and improving indentation. The impacted SQL files include queries related to data migration complexity, assessing data modeling complexity, generating table estimates, and calculating data migration effort. Manual testing has been performed to ensure that the update does not introduce any issues in the installed dashboards.
* Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 ([#2182](#2182)). In this release, the version of `sigstore/gh-action-sigstore-python` is bumped to 3.0.0 from 2.1.1 in the project's GitHub Actions workflow. This new version brings several changes, additions, and removals, such as the removal of certain settings like `fulcio-url`, `rekor-url`, `ctfe`, and `rekor-root-pubkey`, and output settings like `signature`, `certificate`, and `bundle`. The `inputs` field is now parsed according to POSIX shell lexing rules and is optional if `release-signing-artifacts` is true and the action's event is a `release` event. The default suffix has changed from `.sigstore` to `.sigstore.json`. Additionally, various deprecations present in `sigstore-python`'s 2.x series have been resolved. This PR also includes several commits, including preparing for version 3.0.0, cleaning up workflows, and removing old output settings. There are no conflicts with this PR, and Dependabot will resolve them automatically. Users can trigger Dependabot actions by commenting on this PR with specific commands.
* Consistently cleanup linter codes ([#2194](#2194)). This commit introduces changes to the linting functionality of PySpark, focusing on enhancing code consistency and accuracy. New checks have been added for detecting code incompatibilities with UC Shared Clusters, targeting Python UDF unsupported eval types, spark.catalog.X APIs on DBR versions earlier than 14.3, and the use of commandContext. A new file, python-udfs_14_3.py, containing tests for these incompatibilities has been added. The commit also resolves false linting advice for homonymous method names and updates the code for static analysis message codes, improving self-documentation and maintainability. These changes are limited to the linting functionality of PySpark and do not affect any other functionalities. Co-authored by Eric Vergnaud and Serge Smertin.
* Disable the builtin pip version check when running pip commands ([#2214](#2214)). In this release, we have introduced a modification to disable the built-in pip version check when using pip to install dependencies. This change involves altering the existing workflow of the `_install_pip` method to include the `--disable-pip-version-check` flag in the pip install command, reducing noise in pip-related errors and messages, and enhancing user experience. We have conducted manual and unit testing to ensure that the changes do not introduce any regressions and that existing functionalities remain unaffected. The error message has been updated to reflect the new pip behavior, including the `--disable-pip-version-check` flag in the message. Overall, these changes improve the user experience by reducing unnecessary error messages and providing clearer error information.
* Document `principal-prefix-access` for Azure will only list abfss storage accounts ([#2212](#2212)). The documentation for the `principal-prefix-access` CLI command has been updated to clarify its behavior on Azure: the command lists only Azure Data Lake Storage Gen2 (abfss://) storage accounts and skips accounts accessed via unsupported protocols such as wasb:// or adl://. These protocols are not compatible with Unity Catalog (UC) and are therefore ignored during credential migration. The clarification helps users migrating credentials to UC understand why only abfss:// storage accounts appear in the command's output.
* Group migration: change error logging format ([#2215](#2215)). In this release, we have updated the error logging format for failed permissions migrations during the experimental group migration workflow to enhance readability and debugging capabilities. Previously, the logs only stated that a migration failure occurred without further details. Now, the new format includes both the source and destination account names, as well as a description of the simulated failure during the migration process. This improves the transparency and usefulness of the error logs for debugging and troubleshooting purposes. Additionally, we have added unit tests to ensure the proper logging of failed migrations, ensuring the reliability of the group migration process for our users. This update demonstrates our commitment to providing clear and informative error messages to make the software engineering experience better.
* Improve error handling as already exists error occurs ([#2077](#2077)). The recent change enhances error handling for the `create-catalogs-schemas` CLI command, addressing an issue where the command would fail if the catalog or schema already existed. The modification involves the introduction of the `_get_missing_catalogs_schemas` method to avoid recreating existing ones. The `create_all_catalogs_schemas` method has been updated to include try-except blocks for `_create_catalog_validate` and `_create_schema` methods, skipping creation if a `BadRequest` error occurs with the message "already exists." This ensures that no overwriting of existing catalogs and schemas takes place. A new test case, "test_create_catalogs_schemas_handles_existing," has been added to verify the command's handling of existing catalogs and schemas. This change resolves issue [#1939](#1939) and is manually tested; no new methods were added, and existing functionality was changed only within the test file.
* Support run assessment as a collection ([#1925](#1925)). This commit introduces the capability to run eligible CLI commands as a collection, with an initial implementation for the assessment run command. A new parameter `collection_workspace_id` has been added to determine whether the current installation workflow is run or if an account context is created to iterate through all workspaces of the specified collection and run the assessment workflow. The `join_collection` method has been updated to accept a list of workspace IDs and a boolean value. Unit tests have been added and existing tests have been updated to ensure proper functionality. The `databricks labs ucx` command has also been modified to support this feature, with the `join_collection` method syncing workspaces in the collection when the `sync` flag is set to True.
* Test UCX over Python v3.10, v3.11, and v3.12 ([#2195](#2195)). In this release, we introduce significant enhancements to our GitHub Actions CI workflow, enabling more comprehensive testing of UCX over Python versions 3.10, 3.11, and 3.12. We've implemented a new matrix strategy in the `push.yml` workflow file, dynamically setting the `python-version` using the `${{ matrix.pyVersion }}` variable. This allows developers to test UCX with specific Python versions by setting the `HATCH_PYTHON` variable. Additionally, we've updated the `pyproject.toml` file, removing the Python 3.10 requirement and improving virtual environment integration with popular IDEs. The `test_migrator_supported_language_with_fixer` function in `test_files.py` has been refactored for a more efficient 'migrator.apply' method test using temporary directories and files. This release aims to ensure compatibility, identify version-specific issues, and improve the user experience for developers.
* Updated databricks-labs-blueprint requirement from ~=0.7.0 to >=0.7,<0.9 ([#2191](#2191)). In this pull request, the `databricks-labs-blueprint` package requirement has been updated from version `~=0.7.0` to `>=0.7,<0.9`. This update ensures compatibility with the project's requirements while allowing the use of the latest version of the package. The pull request also includes release notes and changelog information from the `databrickslabs/blueprint` repository, detailing various improvements and bug fixes, such as support for Python 3.12, type annotations for path-related unit tests, and fixes for the `WorkspacePath` class. A list of commits and their corresponding hashes is provided for engineers to review the changes made in the update and ensure compatibility with their projects.
* Updated databricks-labs-lsql requirement from <0.7,>=0.5 to >=0.5,<0.8 ([#2189](#2189)). In this update, the version requirement of the `databricks-labs-lsql` dependency has been updated from `<0.7,>=0.5` to `>=0.5,<0.8`. This change allows for the use of the latest version of the `databricks-labs-lsql` package while ensuring compatibility with the current system. Additionally, this commit includes the release notes, changelog, and commit details from the `databricks-labs-lsql` repository for version 0.7.1. These documents provide information on various bug fixes, improvements, and changes, such as updating the `sigstore/gh-action-sigstore-python` package from 2.1.1 to 3.0.0, using a default factory to create `Tile._position`, and other enhancements. The changelog includes detailed information about releases and features, while the commit details highlight the changes and contributors for each individual commit.
* Updated sqlglot requirement from <25.6,>=25.5.0 to >=25.5.0,<25.7 ([#2211](#2211)). In this update, we have revised the requirement range for the `sqlglot` library to '>=25.5.0,<25.7' from '<25.6,>=25.5.0'. This modification allows us to utilize the latest version of sqlglot, which is v25.6.0, while ensuring that the version does not surpass 25.7. This change is part of issue [#2211](#2211), and the new version includes several enhancements such as support for ORDER BY ALL, FROM ROWS FROM (...) in PostgreSQL, and exp.TimestampAdd in Presto and Trino. Furthermore, the update encompasses modifications to the bigquery, clickhouse, and duckdb dialects, as well as several bug fixes. These improvements are aimed at increasing functionality, stability, and addressing issues in the library.
* Yield `DependencyProblem` if job on runtime DBR14+ and using .egg dependency ([#2020](#2020)). In this release, we have introduced a new method, `_register_egg`, to handle the registration of libraries in .egg format in the `build_dependency_graph` method. This method checks the runtime version of Databricks. If the version is DBR14 or higher, it yields `DependencyProblem` with code 'not-supported', indicating that installing eggs is no longer supported in Databricks 14.0 or higher. For lower runtime versions, the method downloads the .egg file from the workspace, writes it to a temporary directory, and then registers the library with the `DependencyGraph`. The existing functionality, such as registering libraries in .whl format and registering notebooks, remains unchanged. This release also includes a new test case, `test_job_dependency_problem_egg_dbr14plus`, which creates a job with an .egg dependency and verifies that the expected `DependencyProblem` is raised when using .egg dependencies in a job on Databricks Runtime (DBR) version 14 or higher. This change addresses issue [#1793](#1793) and improves dependency management, making it easier for software engineers to adopt and work seamlessly with the project.
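The runtime-version check described in the last entry can be sketched roughly as follows. The class and generator below are simplified assumptions, not the exact ucx code:

```python
from dataclasses import dataclass


@dataclass
class DependencyProblem:
    code: str
    message: str


def register_egg(runtime_version: tuple[int, int], egg_path: str):
    # DBR 14.0+ no longer supports installing .egg files, so yield a
    # problem instead of attempting the install (simplified sketch of
    # the behaviour described above; not the actual _register_egg).
    if runtime_version >= (14, 0):
        yield DependencyProblem(
            "not-supported",
            f"Installing eggs is no longer supported on Databricks 14.0 "
            f"or higher: {egg_path}",
        )
        return
    # On older runtimes the real code downloads the egg to a temporary
    # directory and registers it with the DependencyGraph (omitted here).


problems = list(register_egg((14, 3), "dist/lib.egg"))
```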

Dependency updates:

 * Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 ([#2182](#2182)).
 * Updated databricks-labs-lsql requirement from <0.7,>=0.5 to >=0.5,<0.8 ([#2189](#2189)).
 * Updated databricks-labs-blueprint requirement from ~=0.7.0 to >=0.7,<0.9 ([#2191](#2191)).
 * Updated sqlglot requirement from <25.6,>=25.5.0 to >=25.5.0,<25.7 ([#2211](#2211)).
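Returning to the `--disable-pip-version-check` change listed above: the modification amounts to one extra flag on the pip invocation. A minimal sketch, with hypothetical helper names rather than ucx's actual `_install_pip`:

```python
import subprocess
import sys


def pip_install_command(requirements: str) -> list[str]:
    # --disable-pip-version-check suppresses pip's "a newer version of
    # pip is available" notice, reducing noise in pip output.
    return [
        sys.executable, "-m", "pip", "install",
        "--disable-pip-version-check", "-r", requirements,
    ]


def install_dependencies(requirements: str) -> None:
    # Run the install, raising CalledProcessError on failure.
    subprocess.run(pip_install_command(requirements), check=True)
```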
@nfx nfx mentioned this pull request Jul 19, 2024
nfx added a commit that referenced this pull request Jul 19, 2024
Development

Successfully merging this pull request may close these issues.

[BUG]: wasbs:// and dbfs:// mount external tables are not listing in principal-prefix-access command
3 participants