
Update databricks-labs-blueprint requirement from ~=0.7.0 to >=0.7,<0.9 #2191

Merged

nfx merged 2 commits into main from dependabot/pip/databricks-labs-blueprint-gte-0.7-and-lt-0.9 on Jul 16, 2024

Conversation

dependabot[bot]
Contributor

@dependabot dependabot bot commented on behalf of github Jul 16, 2024

Updates the requirements on databricks-labs-blueprint to permit the latest version.
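
For context, `~=0.7.0` is a compatible-release specifier pinned to the 0.7.x series, while `>=0.7,<0.9` also admits 0.8.x releases such as the v0.8.0 described below. A quick check of the two ranges with the `packaging` library (the package versions are from this PR; the snippet itself is only illustrative):

```python
from packaging.specifiers import SpecifierSet

old = SpecifierSet("~=0.7.0")    # compatible release: >=0.7.0, ==0.7.*
new = SpecifierSet(">=0.7,<0.9")

print("0.7.3" in old, "0.7.3" in new)  # True True
print("0.8.0" in old, "0.8.0" in new)  # False True -- what this PR unlocks
print("0.9.0" in new)                  # False -- still capped below 0.9
```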

Release notes

Sourced from databricks-labs-blueprint's releases.

v0.8.0

  • Added DBFSPath as os.PathLike implementation (#131). The open-source library has been updated with a new class DBFSPath, an implementation of os.PathLike for Databricks File System (DBFS) paths. This new class extends the existing WorkspacePath support and provides pathlib-like functionality for DBFS paths, including methods for creating directories, renaming and deleting files and directories, and reading and writing files. The addition of DBFSPath includes type-hinting for improved code linting and is integrated into the test suite with new and updated tests for path-like objects. The behavior of the exists and unlink methods has been updated for WorkspacePath to improve performance and raise appropriate errors. (A usage sketch follows this list.)
  • Fixed .as_uri() and .absolute() implementations for WorkspacePath (#127). In this release, the WorkspacePath class in the paths.py module has been updated with several improvements to the .as_uri() and .absolute() methods. These methods now utilize pathlib internals, providing better cross-version compatibility. The .as_uri() method now uses an f-string for concatenation and returns the UTF-8 encoded string representation of the WorkspacePath object via a new __bytes__() dunder method. Additionally, the .absolute() method has been implemented for the trivial (no-op) case and now supports returning the absolute path of files or directories in Databricks Workspace. Furthermore, the glob() and rglob() methods have been enhanced to support case-sensitive pattern matching based on a new case_sensitive parameter. To ensure the integrity of these changes, two new test cases, test_as_uri() and test_absolute(), have been added, thoroughly testing the functionality of these methods.
  • Fixed WorkspacePath support for python 3.11 (#121). The WorkspacePath class in our open-source library has been updated to improve compatibility with Python 3.11. The .expanduser() and .glob() methods have been modified to address internal changes in Python 3.11. The is_dir() and is_file() methods now include a follow_symlinks parameter, although it is not currently used. A new method, _scandir(), has been added for compatibility with Python 3.11. The expanduser() method has also been updated to expand ~ (but not ~user) constructs. Additionally, a new method is_notebook() has been introduced to check if the path points to a notebook in Databricks Workspace. These changes aim to ensure that the library functions smoothly with the latest version of Python and provides additional functionality for users working with Databricks Workspace.
  • Properly verify versions of python (#118). In this release, we have made significant updates to the pyproject.toml file to enhance project dependency and development environment management. We have added several new packages to the dependencies section to expand the library's functionality and compatibility. Additionally, we have removed the python field, as it is no longer necessary. We have also updated the path field to specify the location of the virtual environment, which can improve integration with popular development tools such as Visual Studio Code and PyCharm. These changes are intended to streamline the development process and make it easier to manage dependencies and set up the development environment.
  • Type annotations on path-related unit tests (#128). In this open-source library update, type annotations have been added to path-related unit tests to enhance code clarity and maintainability. The tests encompass various scenarios, including verifying if a path exists, creating, removing, and checking directories, and testing file attributes such as distinguishing directories, notebooks, and regular files. The additions also cover functionality for opening and manipulating files in different modes like read binary, write binary, read text, and write text. Furthermore, tests for checking file permissions, handling errors, and globbing (pattern-based file path matching) have been incorporated. The tests interact with a WorkspaceClient mock object, simulating file system interactions. This enhancement bolsters the library's reliability and assists developers in creating robust, well-documented code when working with file system paths.
  • Updated WorkspacePath to support Python 3.12 (#122). In this release, the WorkspacePath implementation has been updated to ensure compatibility with Python 3.12, in addition to Python 3.10 and 3.11. The class was modified to replace most of the internal implementation and add extensive tests for public interfaces, ensuring that the superclass implementations are not used unless they are known to be safe. This change is in response to the significant changes in the superclass implementations between Python 3.11 and 3.12, which were found to be incompatible with each other. The WorkspacePath class now includes several new methods and tests to ensure that it functions seamlessly with different versions of Python. These changes include testing for initialization, equality, hash, comparison, path components, and various path manipulations. This update enhances the library's adaptability and ensures it functions correctly with different versions of Python. Classifiers have also been updated to include support for Python 3.12.
  • WorkspacePath fixes for the .resolve() implementation (#129). The .resolve() method for WorkspacePath has been updated to improve its handling of relative paths and the strict argument. Previously, relative paths were not properly validated and would be returned as-is. Now, relative paths will cause the method to fail. The strict argument is now checked, and if set to True and the path does not exist, a FileNotFoundError will be raised. The method .absolute() is used to obtain the absolute path of the file or directory in Databricks Workspace and is used in the implementation of .resolve(). A new test, test_resolve(), has been added to verify these changes, covering scenarios where the path is absolute, the path exists, the path does not exist, and the path is relative. In the case of relative paths, a NotImplementedError is raised, as .resolve() is not supported for them.
  • WorkspacePath: Fix the .rename() and .replace() implementations to return the target path (#130). The .rename() and .replace() methods of the WorkspacePath class have been updated to return the target path as part of the public API, with .rename() no longer accepting the overwrite keyword argument and always failing if the target path already exists. A new private method, ._rename(), has been added to include the overwrite argument and is used by both .rename() and .replace(). This update is a preparatory step for factoring out common code to support DBFS paths. The tests have been updated accordingly, combining and adding functions to test the new and updated methods. The .unlink() method's behavior remains unchanged. Please note that the exact error raised when .rename() fails due to an existing target path is yet to be defined.
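
Taken together, the path changes in v0.8.0 give DBFS the same pathlib-style ergonomics that WorkspacePath already had. A rough usage sketch, assuming DBFSPath lives alongside WorkspacePath in databricks.labs.blueprint.paths and that both take a WorkspaceClient plus a path string (the release notes suggest this but do not spell it out; the user path is a placeholder):

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.paths import DBFSPath, WorkspacePath

ws = WorkspaceClient()  # credentials resolved from the environment

# Workspace paths behave like pathlib.Path objects
readme = WorkspacePath(ws, "/Users/[email protected]/README.md")
readme.write_text("# hello")
print(readme.read_text())
print(readme.as_uri())  # browser-accessible URL; implementation fixed in #127

# v0.8.0 extends the same interface to DBFS
data = DBFSPath(ws, "/tmp/demo/data.csv")
data.write_text("a,b\n1,2\n")
print(data.exists())

# Per #130, .rename() returns the target path and fails if it already
# exists; use .replace() to overwrite instead
moved = data.rename("/tmp/demo/data-v2.csv")
```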

Dependency updates:

  • Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 (#133).

Contributors: @asnare, @nfx, @dependabot[bot]

Changelog

Sourced from databricks-labs-blueprint's changelog.

0.8.0

  • Added DBFSPath as os.PathLike implementation (#131). The open-source library has been updated with a new class DBFSPath, an implementation of os.PathLike for Databricks File System (DBFS) paths. This new class extends the existing WorkspacePath support and provides pathlib-like functionality for DBFS paths, including methods for creating directories, renaming and deleting files and directories, and reading and writing files. The addition of DBFSPath includes type-hinting for improved code linting and is integrated into the test suite with new and updated tests for path-like objects. The behavior of the exists and unlink methods has been updated for WorkspacePath to improve performance and raise appropriate errors.
  • Fixed .as_uri() and .absolute() implementations for WorkspacePath (#127). In this release, the WorkspacePath class in the paths.py module has been updated with several improvements to the .as_uri() and .absolute() methods. These methods now utilize pathlib internals, providing better cross-version compatibility. The .as_uri() method now uses an f-string for concatenation and returns the UTF-8 encoded string representation of the WorkspacePath object via a new __bytes__() dunder method. Additionally, the .absolute() method has been implemented for the trivial (no-op) case and now supports returning the absolute path of files or directories in Databricks Workspace. Furthermore, the glob() and rglob() methods have been enhanced to support case-sensitive pattern matching based on a new case_sensitive parameter. To ensure the integrity of these changes, two new test cases, test_as_uri() and test_absolute(), have been added, thoroughly testing the functionality of these methods.
  • Fixed WorkspacePath support for python 3.11 (#121). The WorkspacePath class in our open-source library has been updated to improve compatibility with Python 3.11. The .expanduser() and .glob() methods have been modified to address internal changes in Python 3.11. The is_dir() and is_file() methods now include a follow_symlinks parameter, although it is not currently used. A new method, _scandir(), has been added for compatibility with Python 3.11. The expanduser() method has also been updated to expand ~ (but not ~user) constructs. Additionally, a new method is_notebook() has been introduced to check if the path points to a notebook in Databricks Workspace. These changes aim to ensure that the library functions smoothly with the latest version of Python and provides additional functionality for users working with Databricks Workspace.
  • Properly verify versions of python (#118). In this release, we have made significant updates to the pyproject.toml file to enhance project dependency and development environment management. We have added several new packages to the dependencies section to expand the library's functionality and compatibility. Additionally, we have removed the python field, as it is no longer necessary. We have also updated the path field to specify the location of the virtual environment, which can improve integration with popular development tools such as Visual Studio Code and PyCharm. These changes are intended to streamline the development process and make it easier to manage dependencies and set up the development environment.
  • Type annotations on path-related unit tests (#128). In this open-source library update, type annotations have been added to path-related unit tests to enhance code clarity and maintainability. The tests encompass various scenarios, including verifying if a path exists, creating, removing, and checking directories, and testing file attributes such as distinguishing directories, notebooks, and regular files. The additions also cover functionality for opening and manipulating files in different modes like read binary, write binary, read text, and write text. Furthermore, tests for checking file permissions, handling errors, and globbing (pattern-based file path matching) have been incorporated. The tests interact with a WorkspaceClient mock object, simulating file system interactions. This enhancement bolsters the library's reliability and assists developers in creating robust, well-documented code when working with file system paths.
  • Updated WorkspacePath to support Python 3.12 (#122). In this release, the WorkspacePath implementation has been updated to ensure compatibility with Python 3.12, in addition to Python 3.10 and 3.11. The class was modified to replace most of the internal implementation and add extensive tests for public interfaces, ensuring that the superclass implementations are not used unless they are known to be safe. This change is in response to the significant changes in the superclass implementations between Python 3.11 and 3.12, which were found to be incompatible with each other. The WorkspacePath class now includes several new methods and tests to ensure that it functions seamlessly with different versions of Python. These changes include testing for initialization, equality, hash, comparison, path components, and various path manipulations. This update enhances the library's adaptability and ensures it functions correctly with different versions of Python. Classifiers have also been updated to include support for Python 3.12.
  • WorkspacePath fixes for the .resolve() implementation (#129). The .resolve() method for WorkspacePath has been updated to improve its handling of relative paths and the strict argument. Previously, relative paths were not properly validated and would be returned as-is. Now, relative paths will cause the method to fail. The strict argument is now checked, and if set to True and the path does not exist, a FileNotFoundError will be raised. The method .absolute() is used to obtain the absolute path of the file or directory in Databricks Workspace and is used in the implementation of .resolve(). A new test, test_resolve(), has been added to verify these changes, covering scenarios where the path is absolute, the path exists, the path does not exist, and the path is relative. In the case of relative paths, a NotImplementedError is raised, as .resolve() is not supported for them.
  • WorkspacePath: Fix the .rename() and .replace() implementations to return the target path (#130). The .rename() and .replace() methods of the WorkspacePath class have been updated to return the target path as part of the public API, with .rename() no longer accepting the overwrite keyword argument and always failing if the target path already exists. A new private method, ._rename(), has been added to include the overwrite argument and is used by both .rename() and .replace(). This update is a preparatory step for factoring out common code to support DBFS paths. The tests have been updated accordingly, combining and adding functions to test the new and updated methods. The .unlink() method's behavior remains unchanged. Please note that the exact error raised when .rename() fails due to an existing target path is yet to be defined.

Dependency updates:

  • Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 (#133).

0.7.0

  • Added databricks.labs.blueprint.paths.WorkspacePath as pathlib.Path equivalent (#115). This commit introduces the databricks.labs.blueprint.paths.WorkspacePath library, providing Python-native pathlib.Path-like interfaces to simplify working with Databricks Workspace paths. The library includes WorkspacePath and WorkspacePathDuringTest classes offering advanced functionality for handling user home folders, relative file paths, browser URLs, and file manipulation methods such as read/write_text(), read/write_bytes(), and glob(). This addition brings enhanced, Pythonic ways to interact with Databricks Workspace paths, including creating and moving files, managing directories, and generating browser-accessible URIs. Additionally, the commit includes updates to existing methods and introduces new fixtures for creating notebooks, accompanied by extensive unit tests to ensure reliability and functionality.
  • Added propagation of blueprint version into User-Agent header when it is used as library (#114). A new feature has been introduced that propagates the blueprint version and the name of the command line interface (CLI) command in the User-Agent header whenever blueprint is used as a library (see the sketch after this list). It adds two new pairs of OtherInfo: blueprint/X.Y.Z to indicate that the request is made using the blueprint library, and cmd/<name> to store the name of the CLI command used for making the request. The implementation uses the with_user_agent_extra function from databricks.sdk.config to set the user agent consistently with the Databricks CLI. The test file test_useragent.py gains a new test case, test_user_agent_is_propagated, which checks that the blueprint version and the command name are correctly propagated to the User-Agent header. A context manager, http_fixture_server, has been added that creates an HTTP server with a custom handler, which extracts the blueprint version and the command name from the User-Agent header and stores them in the user_agent dictionary. The test case calls the foo command with a mocked WorkspaceClient instance, sets the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables, and then asserts that the blueprint version and the command name are present and correctly set in the user_agent dictionary.
  • Bump actions/checkout from 4.1.6 to 4.1.7 (#112). In this release, the version of the "actions/checkout" action used in the Checkout Code step of the acceptance workflow has been updated from 4.1.6 to 4.1.7. This update may include bug fixes, performance improvements, and new features, although specific changes are not mentioned in the commit message. The Unshallow step remains unchanged, continuing to fetch and clean up the repository's history. This update ensures that the latest enhancements from the "actions/checkout" action are utilized, aiming to improve the reliability and performance of the code checkout process in the GitHub Actions workflow. Software engineers should be aware of this update and its potential impact on their workflows.
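
The User-Agent propagation in #114 boils down to two registrations via the SDK helper the note names. A minimal sketch, where the version string and command name are placeholders:

```python
from databricks.sdk.config import with_user_agent_extra

# Adds "blueprint/0.7.0" and "cmd/foo" to the User-Agent header,
# consistent with how the Databricks CLI identifies itself.
with_user_agent_extra("blueprint", "0.7.0")
with_user_agent_extra("cmd", "foo")
```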

Dependency updates:

  • Bump actions/checkout from 4.1.6 to 4.1.7 (#112).

0.6.3

  • fixed Command.get_argument_type bug with UnionType (#110). In this release, the Command.get_argument_type method has been updated to include special handling for UnionType, resolving a bug that caused the function to crash when encountering this type. The method now returns the string representation of the annotation if the argument is a UnionType, providing more accurate and reliable results. To facilitate this, modifications were made using the types module. Additionally, the foo function has a new optional argument optional_arg of type str, with a default value of None. This argument is passed to the some function in the assertion. The Prompts type has been added to the foo function signature, and an assertion has been added to verify if prompts is an instance of Prompts. Lastly, the default value of the address argument has been changed from an empty string to "default", and the same changes have been applied to the test_injects_prompts test function.
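
The underlying issue: annotations like `str | None` are `types.UnionType` instances rather than classes, so code that assumed a class crashed. A minimal reproduction of the inspection involved (not the library's actual code):

```python
import inspect
import types

def foo(name: str, optional_arg: str | None = None) -> None:
    ...

ann = inspect.signature(foo).parameters["optional_arg"].annotation
print(isinstance(ann, types.UnionType))  # True on Python 3.10+
print(str(ann))                          # "str | None" -- what the fix returns
```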

0.6.2

  • Applied type casting & remove empty kwarg for Command (#108). A new method, get_argument_type, has been added to the Command class in the cli.py file to determine the type of a given argument name based on the function's signature. The _route method has been updated to remove any empty keyword arguments from the kwargs dictionary, and apply type casting based on the argument type using the get_argument_type method. This ensures that the kwargs passed into App.command are correctly typed and eliminates any empty keyword arguments, which were previously passed as empty strings. In the test file for the command-line interface, the foo command's keyword arguments have been updated to include age (int), salary (float), is_customer (bool), and address (str) types, with the name argument remaining and a default value for address. The test_commands and test_injects_prompts functions have been updated accordingly. These changes aim to improve the input validation and type safety of the App.command method.
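
A hypothetical free-function restatement of that casting logic, reusing the `foo` signature described above; this is illustrative only, not the actual `Command._route` implementation:

```python
import inspect

def cast_kwargs(fn, raw_kwargs: dict[str, str]) -> dict:
    """Drop empty kwargs and cast the rest based on fn's annotations."""
    sig = inspect.signature(fn)
    cast = {}
    for name, value in raw_kwargs.items():
        if value == "":
            continue  # omitted CLI flags arrive as empty strings
        ann = sig.parameters[name].annotation
        if ann is bool:
            cast[name] = value.lower() in ("true", "1", "yes")
        elif ann in (int, float):
            cast[name] = ann(value)
        else:
            cast[name] = value
    return cast

def foo(name: str, age: int, salary: float, is_customer: bool,
        address: str = "default") -> None:
    ...

print(cast_kwargs(foo, {"name": "bob", "age": "42", "salary": "1.5",
                        "is_customer": "true", "address": ""}))
# {'name': 'bob', 'age': 42, 'salary': 1.5, 'is_customer': True}
```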

0.6.1

  • Made ProductInfo.version a cached_property to avoid failure when comparing wheel uploads in development (#105). In this release, the apply method of a class has been updated to sort upgrade scripts in semantic versioning order before applying them, addressing potential issues with version comparison during development. The implementation of ProductInfo.version has been refactored to a cached_property called _version, which calculates and caches the project version, addressing a failure during wheel upload comparisons in development. The Wheels class constructor has also been updated to include explicit keyword-only arguments, and a deprecation warning has been added. These changes aim to improve the reliability and predictability of the upgrade process and the library as a whole.
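
The `cached_property` pattern guarantees the version is computed once per process, so repeated comparisons within a run always see the same value. A sketch of the shape, where `_infer_version` is a stand-in for the real discovery logic:

```python
from functools import cached_property

class ProductInfo:
    @cached_property
    def _version(self) -> str:
        # The potentially unstable lookup (e.g. a dev build number) runs
        # once; later reads return the cached value, so wheel-upload
        # comparisons during development always agree.
        return self._infer_version()

    def _infer_version(self) -> str:
        return "0.6.1+g1a2b3c"  # placeholder
```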

Dependency updates:

  • Bump actions/checkout from 4.1.5 to 4.1.6 (#106).

0.6.0

  • Added upstream wheel uploads for Databricks Workspaces without Public Internet access (#99). This commit introduces a new feature for uploading upstream wheel dependencies to Databricks Workspaces without Public Internet access. A new flag has been added to upload functions, allowing users to include or exclude dependencies in the download list. The WheelsV2 class has been updated with a new method, upload_wheel_dependencies(prefixes), which checks if each wheel's name starts with any of the provided prefixes before uploading it to the Workspace File System (WSFS). This feature also includes two new tests to verify the functionality of uploading the main wheel package and dependent wheel packages, optimizing downloads based on specific use cases. This enables users to more easily use the package in offline environments with restricted internet access, particularly for Databricks Workspaces with extra layers of network security.
  • Fixed bug for double-uploading of unreleased wheels in air-gapped setups (#103). In this release, we have addressed a bug in the upload_wheel_dependencies method of the WheelsV2 class, which caused double-uploading of unreleased wheels in air-gapped setups. This issue occurred due to the condition if wheel.name == self._local_wheel.name not being met, resulting in undefined behavior. We have introduced a cached property _current_version to tackle this bug for unreleased versions uploaded to air-gapped workspaces. We also added a new method, upload_to_wsfs(), that uploads files to the workspace file system (WSFS) in the integration test. This release also includes new tests to ensure that only the Databricks SDK is uploaded and that the number of installation files is correct. These changes have resolved the double-uploading issue, and the number of installation files, Databricks SDK, Blueprint, and version.json metadata are now uploaded correctly to WSFS.
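
A compact restatement of the upload filter described in these two notes, combining the prefix matching from #99 with the #103 guard against re-uploading the local wheel; this is a hypothetical helper, not `WheelsV2`'s actual code:

```python
from pathlib import Path
from typing import Callable

def upload_wheel_dependencies(wheels: list[Path], local_wheel: Path,
                              prefixes: tuple[str, ...],
                              upload: Callable[[Path], None]) -> None:
    for wheel in wheels:
        if wheel.name == local_wheel.name:
            continue  # the #103 fix: never double-upload the local wheel
        if wheel.name.startswith(prefixes):  # str.startswith accepts a tuple
            upload(wheel)
```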

0.5.0

... (truncated)

Commits
  • 49c74a3 Release v0.8.0 (#134)
  • 40cb3a4 Added DBFSPath as os.PathLike implementation (#131)
  • 3d32bb3 Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 (#133)
  • 4d38663 Improved integration tests for WorkspacePath (#132)
  • bef22a7 WorkspacePath: Fix the .rename() and .replace() implementations to return t...
  • 3e76b65 WorkspacePath fixes for the .resolve() implementation (#129)
  • 5ca08fc Type annotations on path-related unit tests (#128)
  • 29dd960 Fix .as_uri() and .absolute() implementations for WorkspacePath (#127)
  • 3831e28 Update WorkspacePath to support Python 3.12 (#122)
  • a4cf2df Fix build workflow: verify/lint (#123)
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Updates the requirements on [databricks-labs-blueprint](https://github.com/databrickslabs/blueprint) to permit the latest version.
- [Release notes](https://github.com/databrickslabs/blueprint/releases)
- [Changelog](https://github.com/databrickslabs/blueprint/blob/main/CHANGELOG.md)
- [Commits](databrickslabs/blueprint@v0.7.0...v0.8.0)

---
updated-dependencies:
- dependency-name: databricks-labs-blueprint
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot bot requested review from a team and aminmovahed-db July 16, 2024 14:58
@dependabot dependabot bot added the dependencies and python labels Jul 16, 2024
pyproject.toml (review comment marked outdated and resolved)
@nfx nfx temporarily deployed to account-admin July 16, 2024 15:00 — with GitHub Actions Inactive

✅ 2/2 passed, 17s total

Running from acceptance #4685

@nfx nfx merged commit bfc72e2 into main Jul 16, 2024
4 checks passed
@nfx nfx deleted the dependabot/pip/databricks-labs-blueprint-gte-0.7-and-lt-0.9 branch July 16, 2024 15:03
nfx added a commit that referenced this pull request Jul 19, 2024
* Added `lsql` lakeview dashboard-as-code implementation ([#1920](#1920)). The open-source library has been updated with new features in its dashboard creation functionality. The `assessment_report` and `estimates_report` jobs, along with their corresponding tasks, have been removed. The `crawl_groups` task has been modified to accept a new parameter, `group_manager`. These changes are part of a larger implementation of the `lsql` Lakeview dashboard-as-code system for creating dashboards. The new implementation has been tested through manual testing, existing unit tests, integration tests, and verification on a staging environment, and is expected to improve the functionality and maintainability of the dashboards. The removal of the `assessment_report` and `estimates_report` jobs and tasks may indicate that their functionality has been incorporated into the new `lsql` implementation or is no longer necessary. The new `crawl_groups` task parameter may be used in conjunction with the new `lsql` implementation to enhance the assessment and estimation of groups.
* Added new widget to get table count ([#2202](#2202)). A new widget has been introduced that presents a table count summary, categorized by type (external or managed), location (DBFS root, mount, cloud), and format (delta, parquet, etc.). This enhancement is complemented by an additional SQL file, responsible for generating necessary count statistics. The script discerns the table type and location through location string analysis and subsequent categorization. The output is structured and ordered by table type. It's important to note that no existing functionality has been altered, and the new feature is self-contained within the added SQL file. To ensure the correct functioning of this addition, relevant documentation and manual tests have been incorporated.
* Added support for DBFS when building the dependency graph for tasks ([#2199](#2199)). In this update, we have added support for the Databricks File System (DBFS) when building the dependency graph for tasks during workflow assessment. This enhancement allows for the use of wheels, eggs, requirements.txt files, and PySpark jobs located in DBFS when assessing workflows. The `DependencyGraph` object's `register_library` method has been updated to handle paths in both Workspace and DBFS formats. Additionally, we have introduced the `_as_path` method and the `_temporary_copy` context manager to manage file copying and path determination. This development resolves issue [#1558](#1558) and includes modifications to the existing `assessment` workflow and new unit tests.
* Applied `databricks labs lsql fmt` for SQL files ([#2184](#2184)). The engineering team has developed and applied formatting to several SQL files using the `databricks labs lsql fmt` tool from various pull requests, including <databrickslabs/lsql#221>. These changes improve code readability and consistency without affecting functionality. The formatting includes adding comment delimiters, converting subqueries to nested SELECT statements, renaming columns for clarity, updating comments, modifying conditional statements, and improving indentation. The impacted SQL files include queries related to data migration complexity, assessing data modeling complexity, generating table estimates, and calculating data migration effort. Manual testing has been performed to ensure that the update does not introduce any issues in the installed dashboards.
* Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 ([#2182](#2182)). In this release, the version of `sigstore/gh-action-sigstore-python` is bumped to 3.0.0 from 2.1.1 in the project's GitHub Actions workflow. This new version brings several changes, additions, and removals, such as the removal of certain settings like `fulcio-url`, `rekor-url`, `ctfe`, and `rekor-root-pubkey`, and output settings like `signature`, `certificate`, and `bundle`. The `inputs` field is now parsed according to POSIX shell lexing rules and is optional if `release-signing-artifacts` is true and the action's event is a `release` event. The default suffix has changed from `.sigstore` to `.sigstore.json`. Additionally, various deprecations present in `sigstore-python`'s 2.x series have been resolved. This PR also includes several commits, including preparing for version 3.0.0, cleaning up workflows, and removing old output settings. Dependabot will resolve any conflicts with this PR automatically, and users can trigger Dependabot actions by commenting on it with specific commands.
* Consistently cleanup linter codes ([#2194](#2194)). This commit introduces changes to the linting functionality of PySpark, focusing on enhancing code consistency and accuracy. New checks have been added for detecting code incompatibilities with UC Shared Clusters, targeting Python UDF unsupported eval types, spark.catalog.X APIs on DBR versions earlier than 14.3, and the use of commandContext. A new file, python-udfs_14_3.py, containing tests for these incompatibilities has been added. The commit also resolves false linting advice for homonymous method names and updates the code for static analysis message codes, improving self-documentation and maintainability. These changes are limited to the linting functionality of PySpark and do not affect any other functionalities. Co-authored by Eric Vergnaud and Serge Smertin.
* Disable the builtin pip version check when running pip commands ([#2214](#2214)). In this release, we have introduced a modification to disable the built-in pip version check when using pip to install dependencies. This change alters the existing workflow of the `_install_pip` method to include the `--disable-pip-version-check` flag in the pip install command (see the sketch after this list), reducing noise in pip-related errors and messages and enhancing the user experience. We have conducted manual and unit testing to ensure that the changes do not introduce any regressions and that existing functionality remains unaffected. The error message has been updated to reflect the new pip behavior, including the `--disable-pip-version-check` flag in the message.
* Document `principal-prefix-access` for azure will only list abfss storage accounts ([#2212](#2212)). In this release, we have updated the documentation for the `principal-prefix-access` CLI command in the context of Azure. This command now exclusively lists Azure Storage Blob Gen2 accounts and disregards unsupported storage formats such as wasb:// or adl://. This change is significant as these unsupported storage formats are not compatible with Unity Catalog (UC) and will be disregarded during the migration process. This update clarifies the behavior of the command, ensuring that only relevant storage accounts are displayed. This modification is crucial for users who are migrating credentials to UC, as it prevents the incorporation of unsupported storage accounts, resulting in a more streamlined and efficient migration process.
* Group migration: change error logging format ([#2215](#2215)). In this release, we have updated the error logging format for failed permissions migrations during the experimental group migration workflow to enhance readability and debugging capabilities. Previously, the logs only stated that a migration failure occurred without further details. Now, the new format includes both the source and destination account names, as well as a description of the simulated failure during the migration process. This improves the transparency and usefulness of the error logs for debugging and troubleshooting purposes. Additionally, we have added unit tests to ensure the proper logging of failed migrations, ensuring the reliability of the group migration process for our users. This update demonstrates our commitment to providing clear and informative error messages to make the software engineering experience better.
* Improve error handling as already exists error occurs ([#2077](#2077)). The recent change enhances error handling for the `create-catalogs-schemas` CLI command, addressing an issue where the command would fail if the catalog or schema already existed. The modification involves the introduction of the `_get_missing_catalogs_schemas` method to avoid recreating existing ones. The `create_all_catalogs_schemas` method has been updated to include try-except blocks for `_create_catalog_validate` and `_create_schema` methods, skipping creation if a `BadRequest` error occurs with the message "already exists." This ensures that no overwriting of existing catalogs and schemas takes place. A new test case, "test_create_catalogs_schemas_handles_existing," has been added to verify the command's handling of existing catalogs and schemas. This change resolves issue [#1939](#1939) and is manually tested; no new methods were added, and existing functionality was changed only within the test file.
* Support run assessment as a collection ([#1925](#1925)). This commit introduces the capability to run eligible CLI commands as a collection, with an initial implementation for the assessment run command. A new parameter `collection_workspace_id` has been added to determine whether the current installation workflow is run or if an account context is created to iterate through all workspaces of the specified collection and run the assessment workflow. The `join_collection` method has been updated to accept a list of workspace IDs and a boolean value. Unit tests have been added and existing tests have been updated to ensure proper functionality. The `databricks labs ucx` command has also been modified to support this feature, with the `join_collection` method syncing workspaces in the collection when the `sync` flag is set to True.
* Test UCX over Python v3.10, v3.11, and v3.12 ([#2195](#2195)). In this release, we introduce significant enhancements to our GitHub Actions CI workflow, enabling more comprehensive testing of UCX over Python versions 3.10, 3.11, and 3.12. We've implemented a new matrix strategy in the `push.yml` workflow file, dynamically setting the `python-version` using the `${{ matrix.pyVersion }}` variable. This allows developers to test UCX with specific Python versions by setting the `HATCH_PYTHON` variable. Additionally, we've updated the `pyproject.toml` file, removing the Python 3.10 requirement and improving virtual environment integration with popular IDEs. The `test_migrator_supported_language_with_fixer` function in `test_files.py` has been refactored for a more efficient 'migrator.apply' method test using temporary directories and files. This release aims to ensure compatibility, identify version-specific issues, and improve the user experience for developers.
* Updated databricks-labs-blueprint requirement from ~=0.7.0 to >=0.7,<0.9 ([#2191](#2191)). In this pull request, the `databricks-labs-blueprint` package requirement has been updated from version `~=0.7.0` to `>=0.7,<0.9`. This update ensures compatibility with the project's requirements while allowing the use of the latest version of the package. The pull request also includes release notes and changelog information from the `databrickslabs/blueprint` repository, detailing various improvements and bug fixes, such as support for Python 3.12, type annotations for path-related unit tests, and fixes for the `WorkspacePath` class. A list of commits and their corresponding hashes is provided for engineers to review the changes made in the update and ensure compatibility with their projects.
* Updated databricks-labs-lsql requirement from <0.7,>=0.5 to >=0.5,<0.8 ([#2189](#2189)). In this update, the version requirement of the `databricks-labs-lsql` dependency has been updated from `<0.7,>=0.5` to `>=0.5,<0.8`. This change allows for the use of the latest version of the `databricks-labs-lsql` package while ensuring compatibility with the current system. Additionally, this commit includes the release notes, changelog, and commit details from the `databricks-labs-lsql` repository for version 0.7.1. These documents provide information on various bug fixes, improvements, and changes, such as updating the `sigstore/gh-action-sigstore-python` package from 2.1.1 to 3.0.0, using a default factory to create `Tile._position`, and other enhancements. The changelog includes detailed information about releases and features, while the commit details highlight the changes and contributors for each individual commit.
* Updated sqlglot requirement from <25.6,>=25.5.0 to >=25.5.0,<25.7 ([#2211](#2211)). In this update, we have revised the requirement range for the `sqlglot` library to '>=25.5.0,<25.7' from '<25.6,>=25.5.0'. This modification allows us to utilize the latest version of sqlglot, which is v25.6.0, while ensuring that the version does not surpass 25.7. This change is part of issue [#2211](#2211), and the new version includes several enhancements such as support for ORDER BY ALL, FROM ROWS FROM (...) in PostgreSQL, and exp.TimestampAdd in Presto and Trino. Furthermore, the update encompasses modifications to the bigquery, clickhouse, and duckdb dialects, as well as several bug fixes. These improvements are aimed at increasing functionality, stability, and addressing issues in the library.
* Yield `DependencyProblem` if job on runtime DBR14+ and using .egg dependency ([#2020](#2020)). In this release, we have introduced a new method, `_register_egg`, to handle the registration of libraries in .egg format in the `build_dependency_graph` method. This method checks the runtime version of Databricks. If the version is DBR14 or higher, it yields `DependencyProblem` with code 'not-supported', indicating that installing eggs is no longer supported in Databricks 14.0 or higher. For lower runtime versions, the method downloads the .egg file from the workspace, writes it to a temporary directory, and then registers the library with the `DependencyGraph`. The existing functionality, such as registering libraries in .whl format and registering notebooks, remains unchanged. This release also includes a new test case, `test_job_dependency_problem_egg_dbr14plus`, which creates a job with an .egg dependency and verifies that the expected `DependencyProblem` is raised when using .egg dependencies in a job on Databricks Runtime (DBR) version 14 or higher. This change addresses issue [#1793](#1793) and improves dependency management, making it easier for software engineers to adopt and work seamlessly with the project.
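
For reference, `--disable-pip-version-check` from #2214 above is a standard pip flag; the change amounts to adding it to the install invocation, roughly:

```python
import subprocess
import sys

# Equivalent to: pip install --disable-pip-version-check "<spec>"
subprocess.run(
    [sys.executable, "-m", "pip", "install",
     "--disable-pip-version-check",  # suppress "a new pip is available" noise
     "databricks-labs-blueprint>=0.7,<0.9"],
    check=True,
)
```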

Dependency updates:

 * Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 ([#2182](#2182)).
 * Updated databricks-labs-lsql requirement from <0.7,>=0.5 to >=0.5,<0.8 ([#2189](#2189)).
 * Updated databricks-labs-blueprint requirement from ~=0.7.0 to >=0.7,<0.9 ([#2191](#2191)).
 * Updated sqlglot requirement from <25.6,>=25.5.0 to >=25.5.0,<25.7 ([#2211](#2211)).
@nfx nfx mentioned this pull request Jul 19, 2024