
Update databricks-labs-lsql requirement from ~=0.5.0 to >=0.5,<0.7 #2160

Merged: 1 commit merged into main from dependabot/pip/databricks-labs-lsql-gte-0.5-and-lt-0.7 on Jul 11, 2024

Conversation

dependabot[bot] (Contributor) commented on behalf of GitHub on Jul 11, 2024

Updates the requirements on databricks-labs-lsql to permit the latest version.

Release notes

Sourced from databricks-labs-lsql's releases.

v0.6.0

  • Added method to dashboards to get dashboard url (#211). In this release, we have added a new get_url method to the lakeview dashboards object in the lsql library (a hedged usage sketch follows this list). The method uses the Databricks SDK to retrieve the dashboard URL; previously the URL was constructed by concatenating the host and dashboard ID, so the new method keeps the URL correct even if the format changes in the future. A new unit test covers retrieving the dashboard URL via the workspace client.
  • Extend replace database in query (#210). This commit extends the database replacement functionality in the DashboardMetadata class, allowing users to specify which database and catalog to replace. The enhancement includes support for catalog replacement and a new replace_database method in the DashboardMetadata class, which replaces the catalog and/or database in the query based on provided parameters. These changes enhance the flexibility and customization of the database replacement feature in queries, making it easier for users to control how their data is displayed in the dashboard. The create_dashboard function has also been updated to use the new method for replacing the database and catalog. Additionally, the TileMetadata update method has been replaced with a new merge method, and the QueryTile and Tile classes have new properties and methods for handling content, width, height, and position. The commit also includes several unit tests to ensure the new functionality works as expected.
  • Improve object oriented dashboard-as-code implementation (#208). In this release, the object-oriented implementation of the dashboard-as-code feature has been significantly improved, addressing previous pull request comments (#201). The TileMetadata dataclass now includes methods for updating and comparing tile metadata, and the DashboardMetadata class has been removed and its functionality incorporated into the Dashboards class. The Dashboards class now generates tiles, datasets, and layouts for dashboards using the provided query_transformer. The code's readability and maintainability have been further enhanced by replacing the use of the copy module with dataclasses.replace for creating object copies. Additionally, updates have been made to the unit tests for dashboard functionality in the project, with new methods and attributes added to check for valid dashboard metadata and handle duplicate query or widget IDs, as well as to specify the order in which tiles and widgets should be displayed in the dashboard.
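
For illustration, a minimal usage sketch of the 0.6.0 additions; it assumes the `Dashboards` class in `databricks.labs.lsql.dashboards` exposes `get_url` as described above, and the exact signatures are assumptions rather than a verified API:

```python
# Hedged sketch based on the release notes above; signatures are assumptions.
from databricks.sdk import WorkspaceClient
from databricks.labs.lsql.dashboards import Dashboards

ws = WorkspaceClient()  # authenticates from the environment
dashboards = Dashboards(ws)

# 0.6.0 builds the URL via the SDK instead of concatenating
# host + dashboard id by hand:
url = dashboards.get_url("<dashboard-id>")
print(url)

# 0.6.0 also adds catalog/database replacement on DashboardMetadata;
# the parameter names here are assumptions:
# metadata = metadata.replace_database(catalog="main", database="prod")
```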

Contributors: @JCZuurmond

Changelog

Sourced from databricks-labs-lsql's changelog.

0.6.0

  (Identical to the v0.6.0 release notes quoted above.)

0.5.0

  • Added Command Execution backend which uses Command Execution API on a cluster (#95). In this release, the databricks-labs-lsql library gains a new Command Execution backend built on the Command Execution API (a hedged backend sketch follows this list). A new CommandExecutionBackend class initializes a CommandExecutor instance from a cluster ID, workspace client, and language; its execute method runs SQL commands on the specified cluster, and its fetch method returns the query result as an iterator of Row objects. The existing StatementExecutionBackend class now inherits from a new abstract base class, ExecutionBackend, which includes a save_table method for saving data to tables and serves as the common base for both Statement and Command Execution backends. The StatementExecutionBackend constructor now accepts a max_records_per_batch parameter, and the execute and fetch methods use the new _only_n_bytes method for logging truncated SQL statements. The CommandExecutionBackend class exposes execute, fetch, and save_table methods for running commands on a cluster and saving the results to tables in the Databricks workspace.
  • Added basic integration with Lakeview Dashboards (#66). In this release, we've added basic integration with Lakeview Dashboards to the project, enhancing its capabilities. This includes updating the databricks-labs-blueprint dependency to version 0.4.2 with the [yaml] extra, allowing for additional functionality related to handling YAML files. A new file, dashboards.py, has been introduced, providing a class for interacting with Databricks dashboards, along with methods for retrieving and saving dashboard configurations. Additionally, a new __init__.py file under the src/databricks/labs/lsql/lakeview directory imports all classes and functions from the model.py module, providing a foundation for further development and customization. The release also introduces a new file, model.py, containing code generated from OpenAPI specs by the Databricks SDK Generator, and a template file, model.py.tmpl, used for handling JSON data during integration with Lakeview Dashboards. A new file, polymorphism.py, provides utilities for checking if a value can be assigned to a specific type, supporting correct data typing and formatting with Lakeview Dashboards. Furthermore, a .gitignore file has been added to the tests/integration directory as part of the initial steps in adding integration testing to ensure compatibility with the Lakeview Dashboards platform. Lastly, the test_dashboards.py file in the tests/integration directory contains a function, test_load_dashboard(ws), which uses the Dashboards class to save a dashboard from a source to a destination path, facilitating testing during the integration process.
  • Added dashboard-as-code functionality (#201). This commit introduces dashboard-as-code functionality for the UCX project, enabling the creation and management of dashboards using code. The feature resolves multiple issues and includes a new create-dashboard command for creating unpublished dashboards. The functionality is available in the lsql lab and allows for specifying the order and width of widgets, overriding default widget identifiers, and supporting various SQL and markdown header arguments. The dashboard.yml file is used to define top-level metadata for the dashboard. This commit also includes extensive documentation and examples for using the dashboard as a library and configuring different options.
  • Automate opening integration test dashboard in debug mode (#167). A new feature has been added to automatically open the integration test dashboard in debug mode, making it easier for software engineers to debug and troubleshoot. This has been achieved by importing the webbrowser and is_in_debug modules from "databricks.labs.blueprint.entrypoint", and adding a check in the create function to determine if the code is running in debug mode. If it is, a dashboard URL is constructed from the workspace configuration and dashboard ID, and then opened in a web browser using "webbrowser.open". This allows for a more streamlined debugging process for the integration test dashboard. No other parts of the code have been affected by this change.
  • Automatically tile widgets (#109). In this release, we've introduced an automatic widget tiling feature for the dashboard creation process in our open-source library. The Dashboards class now includes a new class variable, _maximum_dashboard_width, set to 6, representing the maximum width allowed for each row of widgets in the dashboard. The create_dashboard method has been updated to accept a new self parameter, turning it into an instance method. A new _get_position method has been introduced to calculate and return the next available position for placing a widget, and a _get_width_and_height method has been added to return the width and height for a widget specification, initially handling CounterSpec instances. Additionally, we've added new unit tests to improve testing coverage, ensuring that widgets are created, positioned, and sized correctly. These tests also cover the correct positioning of widgets based on their order and available space, as well as the expected width and height for each widget.
  • Bump actions/checkout from 4.1.3 to 4.1.6 (#102). In the latest release, the 'actions/checkout' GitHub Action has been updated from version 4.1.3 to 4.1.6, which includes checking the platform to set the archive extension appropriately. This release also bumps the version of github/codeql-action from 2 to 3, actions/setup-node from 1 to 4, and actions/upload-artifact from 2 to 4. Additionally, the minor-actions-dependencies group was updated with two new versions. Disabling extensions.worktreeConfig when disabling sparse-checkout was introduced in version 4.1.4. The release notes and changelog for this update can be found in the provided link. This commit was made by dependabot[bot] with contributions from cory-miller and jww3.
  • Bump actions/checkout from 4.1.6 to 4.1.7 (#151). In the latest release, the 'actions/checkout' GitHub action has been updated from version 4.1.6 to 4.1.7 in the project's push workflow, which checks out the repository at the start of the workflow. This change brings potential bug fixes, performance improvements, or new features compared to the previous version. The update only affects the version number in the YAML configuration for the 'actions/checkout' step in the release.yml file, with no new methods or alterations to existing functionality. This update aims to ensure a smooth and enhanced user experience for those utilizing the project's push workflows by taking advantage of the possible improvements or bug fixes in the new version of 'actions/checkout'.
  • Create a dashboard with a counter from a single query (#107). In this release, we have introduced several enhancements to our dashboard-as-code approach, including the creation of a Dashboards class that provides methods for getting, saving, and deploying dashboards. A new method, create_dashboard, has been added to create a dashboard with a single page containing a counter widget. The counter widget is associated with a query that counts the number of rows in a specified dataset. The deploy_dashboard method has also been added to deploy the dashboard to the workspace. Additionally, we have implemented a new feature for creating dashboards with a counter from a single query, including modifications to the test_dashboards.py file and the addition of four new tests. These changes improve the robustness of the dashboard creation process and provide a more automated way to view important metrics.
  • Create text widget from markdown file (#142). A new feature has been implemented in the library that allows for the creation of a text widget from a markdown file, enhancing customization and readability for users. This development resolves issue #1
  • Design document for dashboards-as-code (#105). The latest release introduces "Dashboards as Code", a method for defining and managing dashboards through configuration files, enabling version control and controlled changes. The building blocks are .sql, .md, and dashboard.yml files: .sql files define queries and determine tile order, while dashboard.yml specifies top-level metadata and tile overrides. Metadata can be inferred or explicitly defined in the query or files, and tile order can be determined by SQL file order, the tiles order in dashboard.yml, or SQL file metadata. The project can also be used as a library for embedding dashboard generation in your own code. Configuration precedence follows command-line flags, then SQL file headers, then dashboard.yml, then SQL query content. The command-line interface is used for dashboard generation from configuration files.
  • Ensure propagation of lsql version into User-Agent header when it is used as library (#206). In this release, the pyproject.toml file has been updated to ensure that the correct version of the lsql library is propagated into the User-Agent header when used as a library, improving attribution. The databricks-sdk version has been updated from 0.22.0 to 0.29.0, and the __init__.py file of the lsql library has been modified to add the with_user_agent_extra function from the databricks.sdk.core package for correct attribution. The backends.py file has also been updated with improved type handling in the _row_to_sql and save_table functions for accurate SQL insertion and handling of user-defined classes. Additionally, a test has been added to ensure that the lsql version is correctly propagated in the User-Agent header when used as a library. These changes offer improved functionality and accurate type handling, making it easier for developers to identify the library version when used in other projects.
  • Fixed counter encodings (#143). In this release, we have improved the encoding of counters in the lsql dashboard by modifying the create_dashboard function in the dashboards.py file. Previously, the counter field encoding was hardcoded as "count," but has been changed to dynamically determine the first field name of the given fields, ensuring that counters are expected to have only one field. Additionally, a new integration test has been added to the tests/integration/test_dashboards.py file to ensure that the dashboard deployment functionality correctly handles SQL queries that do not perform a count. A new test for the Dashboards class has also been added to check that counter field encoding names are created as expected. The WorkspaceClient is mocked and not called in this test. These changes enhance the accuracy of counter encoding and improve the overall functionality and reliability of the lsql dashboard.
  • Fixed non-existing reference and typo in the documentation (#104). In this release, we've made improvements to the documentation of our open-source library, specifically addressing issue #104. The changes include fixing a non-existent reference and a typo in the Library size comparison section of the "comparison.md" document. This section provides guidance for selecting a library based on factors like library size, unified authentication, and compatibility with various Databricks warehouses and SQL Python APIs. The updates clarify the required dependency size for simple applications and scripts, and offer more detailed information about each library option. We've also added a new subsection titled Detailed comparison to provide a more comprehensive overview of each library's features. These changes are intended to help software engineers better understand which library is best suited for their specific needs, particularly for applications that require data transfer of large amounts of data serialized in Apache Arrow format and low result fetching latency, where we recommend using the Databricks SQL Connector for Python for efficient data transfer and low latency.
  • Fixed parsing message (#146). In this release, the warning message logged during the creation of a dashboard when a ParseError occurs has been updated to provide clearer and more detailed information about the parsing error. The new error message now includes the specific query being parsed and the exact parsing error, enabling developers to quickly identify the cause of parsing issues. This change ensures that engineers can efficiently diagnose and address parsing errors, improving the overall development and debugging experience with a more informative log format: "Parsing {query}: {error}".
  • Improve dashboard as code (#108). The Dashboards class in the 'dashboards.py' file has been updated to improve functionality and usability, with changes such as the addition of a type variable T for type checking and more descriptive names for methods. The save_to_folder method now accepts a Dashboard object and returns a Dashboard object, and a new static method create_dashboard has been added. Additionally, two new methods _with_better_names and _replace_names have been added for improved readability. The get_dashboard method now returns a Dashboard object instead of a dictionary. The save_to_folder method now also formats SQL code before saving it to file. These changes aim to enhance the functionality and readability of the codebase and provide more user-friendly methods for interacting with the Dashboards class. In addition to the changes in the Dashboards class, there have been updates in the organization of the project structure. The 'queries/counter.sql' file has been moved to 'dashboards/one_counter/counter.sql' in the 'tests/integration' directory. This modification enhances the organization of the project. Furthermore, several tests for the Dashboards class have been introduced in the 'databricks.labs.lsql.dashboards' module, demonstrating various functionalities of the class and ensuring that it functions as intended. The tests cover saving SQL and YML files to a specified folder, creating a dataset and a counter widget for each query, deploying dashboards with a given display name or dashboard ID, and testing the behavior of the save_to_folder and deploy_dashboard methods. Lastly, the commit removes the test_load_dashboard function and updates the test_dashboard_creates_one_dataset_per_query and test_dashboard_creates_one_counter_widget_per_query functions to use the updated Dashboard class. A new replace_recursively function is introduced to replace specific fields in a dataclass recursively. A new test function test_dashboards_deploys_exported_dashboard_definition has been added, which reads a dashboard definition from a JSON file, deploys it, and checks if it's successfully deployed using the Dashboards class. A new test function test_dashboard_deploys_dashboard_the_same_as_created_dashboard has also been added, which compares the original and deployed dashboards to ensure they are identical. Overall, these changes aim to improve the functionality and readability of the codebase and provide more user-friendly methods for interacting with the Dashboards class, as well as enhance the organization of the project structure and add new tests for the Dashboards class to ensure it functions as intended.
  • Infer fields from a query (#111). The Dashboards class in the dashboards.py file has been updated with the addition of a new method, _get_fields, which accepts a SQL query as input and returns a list of Field objects using the sqlglot library to parse the query and extract the necessary information. The create_dashboard method has been modified to call this new function when creating Query objects for each dataset. If a ParseError occurs, a warning is logged and iteration continues. This allows for the automatic population of fields when creating a new dashboard, eliminating the need for manual specification. Additionally, new tests have been added for invalid queries and for checking if the fields in a query have the expected names. These tests include test_dashboards_skips_invalid_query and test_dashboards_gets_fields_with_expected_names, which utilize the caplog fixture and create temporary query files to verify functionality. Existing functionality related to creating dashboards remains unchanged.
  • Make constant all caps (#140). In this release, the project's 'dashboards.py' file has been updated to improve code readability and maintainability. A constant variable _maximum_dashboard_width has been changed to all caps, becoming '_MAXIMUM_DASHBOARD_WIDTH'. This modification affects the Dashboards class and its methods, particularly _get_fields and '_get_position'. The _get_position method has been revised to use the new all caps constant variable. This change ensures better visibility of constants within the code, addressing issue #140. It's important to note that this modification only impacts the 'dashboards.py' file and does not affect any other functionalities.
  • Read display name from dashboard.yml (#144). In this release, we have introduced a new DashboardMetadata dataclass that reads the display name of a dashboard from a dashboard.yml file located in the dashboard's directory. If the dashboard.yml file is absent, the folder name will be used as the display name. This change improves the readability and maintainability of the dashboard configuration by explicitly defining the display name and reducing the need to specify widget information in multiple places. We have also added a new fixture called make_dashboard for creating and cleaning up lakeview dashboards in the test suite. The fixture handles creation and deletion of the dashboard and provides an option to set a custom display name. Additionally, we have added and modified several unit tests to ensure the proper handling of the DashboardMetadata class and the dashboard creation process, including tests for missing, present, or incorrect display_name keys in the YAML file. The dashboards.deploy_dashboard() function has been updated to handle cases where only dashboard_id is provided.
  • Set widget id in query header (#154). In this release, we've made significant improvements to widget metadata handling in our open-source library. We've introduced a new WidgetMetadata class that replaces the previous WidgetMetadata dataclass, now featuring a path attribute, spec_type property, and optional parameters for order, width, height, and _id. The _get_widgets method has been updated to accept an Iterable of WidgetMetadata objects, and both _get_layouts and _get_widgets methods now sort widgets using the order field. A new class method, WidgetMetadata.from_path, handles parsing widget metadata from a file path, replacing the removed _get_width_and_height method. Additionally, the WidgetMetadata class is now used in the deploy_dashboard method, and the test suite for the dashboards module has been enhanced with updated test_widget_metadata_replaces_width_and_height and test_widget_metadata_replaces_attribute functions, as well as new tests for specific scenarios. Issue #154 has been addressed by setting the widget id in the query header, and the aforementioned changes improve flexibility and ease of use for dashboard development.
  • Use order key in query header if defined (#149). In this release, we've introduced a new feature to use an order key in the query header if defined, enhancing the flexibility and control over the dashboard creation process. The WidgetMetadata dataclass now includes an optional order parameter of type int, and the _get_arguments_parser() method accepts the --order flag with type int. The replace_from_arguments() method has been updated to support the new order parameter, with a default value of self.order. The create_dashboard() method now implements a new _get_datasets() method to retrieve datasets from the dashboard folder and introduces a _get_widgets() method, which accepts a list of files, iterates over them, and yields tuples containing widgets and their corresponding metadata, including the order. These improvements enable the use of an order key in query headers, ensuring the correct order of widgets in the dashboard creation process. Additionally, a new test case has been added to verify the correct behavior of the dashboard deployment with a specified order key in the query header. This feature resolves issue #148.
  • Use widget width and height defined in query header (#147). In this release, the handling of metadata in SQL files has been updated to use the file header, instead of only the first line, for improved readability and flexibility. This change includes a new WidgetMetadata class for defining the width and height of a widget in a dashboard, as well as new methods for parsing the widget metadata from a provided path (a hedged folder-and-header sketch also follows this list). The release also updates the documentation to cover the supported widget arguments -w/--width and -h/--height, and resolves issue #114 by adding a test for deploying a dashboard with a big widget via the new function test_dashboard_deploys_dashboard_with_big_widget. Additionally, new test cases cover creating dashboards with custom-sized widgets based on query header width and height values.
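
For orientation, two hedged sketches of the 0.5.0 features above. First, the execution backends: class and method names come from the notes, while constructor details are assumptions.

```python
# Hedged sketch; names per the 0.5.0 notes above, exact signatures assumed.
from databricks.sdk import WorkspaceClient
from databricks.labs.lsql.backends import StatementExecutionBackend

ws = WorkspaceClient()
backend = StatementExecutionBackend(ws, "<warehouse-id>")

backend.execute("CREATE TABLE IF NOT EXISTS hive_metastore.tmp.t AS SELECT 1 AS id")
for row in backend.fetch("SELECT id FROM hive_metastore.tmp.t"):  # yields Row objects
    print(row)
```

Second, the dashboard-as-code flow; the header syntax for --width, --height, and --order is an assumption based on the flags documented above:

```python
# Hedged sketch; create_dashboard per the notes, header syntax assumed.
from pathlib import Path
from databricks.sdk import WorkspaceClient
from databricks.labs.lsql.dashboards import Dashboards

folder = Path("dashboards/my_dashboard")
folder.mkdir(parents=True, exist_ok=True)
# Width, height, and order ride along in the query header comment:
(folder / "counter.sql").write_text(
    "-- --width 2 --height 4 --order 1\n"
    "SELECT COUNT(*) AS count FROM samples.nyctaxi.trips\n"
)
dashboard = Dashboards(WorkspaceClient()).create_dashboard(folder)
```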

Dependency updates:

  • Bump actions/checkout from 4.1.3 to 4.1.6 (#102).
  • Bump actions/checkout from 4.1.6 to 4.1.7 (#151).

0.4.3

  • Bump actions/checkout from 4.1.2 to 4.1.3 (#97). The actions/checkout dependency has been updated from version 4.1.2 to 4.1.3 in the update-main-version.yml file. This new version includes a check to verify the git version before attempting to disable sparse-checkout, and adds an SSH user parameter to improve functionality and compatibility. The release notes and CHANGELOG.md file provide detailed information on the specific changes and improvements. The pull request also includes a detailed commit history and links to corresponding issues and pull requests on GitHub for transparency. You can review and merge the pull request to update the actions/checkout dependency in your project.
  • Maintain PySpark compatibility for databricks.labs.lsql.core.Row (#99). In this release, we have added a new method asDict to the Row class in the databricks.labs.lsql.core module to maintain compatibility with PySpark (a short sketch follows this list). The method returns a dictionary representation of the Row object, with keys corresponding to column names and values to the values in each column; it simply delegates to the existing as_dict method, so the two behave identically. The optional recursive argument, which in PySpark enables recursive conversion of nested Row objects to nested dictionaries, is accepted but not currently implemented and always defaults to False. Additionally, the fetch function in the backends.py file has been modified to return Row objects of pyspark.sql when using self._spark.sql(sql).collect(); this change is temporary and marked with a TODO comment, and error handling has been added to the fetch function to ensure it operates as expected.
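
A short compatibility sketch, assuming `Row` accepts PySpark-style keyword construction:

```python
# Hedged sketch; asDict/as_dict behavior is per the changelog entry above.
from databricks.labs.lsql.core import Row

row = Row(foo=1, bar="baz")
# asDict delegates to as_dict, so both return the same mapping:
assert row.asDict() == row.as_dict() == {"foo": 1, "bar": "baz"}
```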

Dependency updates:

  • Bump actions/checkout from 4.1.2 to 4.1.3 (#97).

0.4.2

  • Added more NotFound error type (#94). In the latest update, error handling in the core.py file of the databricks/labs/lsql package has been enhanced: the _raise_if_needed function now raises a NotFound error when the error message includes the phrase "does not exist" (a behavior sketch follows below). This lets the system categorize such SQL query errors as NotFound, improving overall error handling and reporting. This change was a collaborative effort, as indicated by the co-authored-by statement in the commit.
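
Not the actual implementation, but a minimal sketch of the behavior described, mapping a "does not exist" message onto the SDK's NotFound error:

```python
# Illustrative only; the real _raise_if_needed inspects statement errors,
# this mirrors just the message check described above.
from databricks.sdk.errors import NotFound

def raise_if_needed(error_message: str) -> None:
    if "does not exist" in error_message:
        raise NotFound(error_message)
```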

... (truncated)

Commits

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Updates the requirements on [databricks-labs-lsql](https://github.com/databrickslabs/lsql) to permit the latest version.
- [Release notes](https://github.com/databrickslabs/lsql/releases)
- [Changelog](https://github.com/databrickslabs/lsql/blob/main/CHANGELOG.md)
- [Commits](databrickslabs/lsql@v0.5.0...v0.6.0)

---
updated-dependencies:
- dependency-name: databricks-labs-lsql
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot bot requested a review from a team July 11, 2024 15:13
@dependabot dependabot bot added the dependencies and python (Pull requests that update Python code) labels Jul 11, 2024
@dependabot dependabot bot requested a review from ericvergnaud July 11, 2024 15:13
@JCZuurmond JCZuurmond enabled auto-merge July 11, 2024 15:14
@JCZuurmond (Member) commented:

Need it for #1920

@nfx nfx disabled auto-merge July 11, 2024 15:30
@nfx nfx merged commit 4e6fcc8 into main Jul 11, 2024
3 of 4 checks passed
@nfx nfx deleted the dependabot/pip/databricks-labs-lsql-gte-0.5-and-lt-0.7 branch July 11, 2024 15:30
nfx added a commit that referenced this pull request Jul 12, 2024
* Fixed `Table Access Control is not enabled on this cluster` error ([#2167](#2167)). A fix has been implemented to address the `Table Access Control is not enabled on this cluster` error, changing it to a warning when the exception is raised. This modification involves the introduction of a new constant `CLUSTER_WITHOUT_ACL_FRAGMENT` to represent the error message and updates to the `snapshot` and `grants` methods to conditionally log a warning instead of raising an error when the exception is caught. These changes improve the robustness of the integration test by handling exceptions when many test schemas are being created and deleted quickly, without introducing any new functionality. However, the change has not been thoroughly tested.
* Fixed infinite recursion when checking module of expression ([#2159](#2159)). In this release, we have addressed an infinite recursion issue ([#2159](#2159)) that occurred when checking the module of an expression. The `append_statements` method has been updated to no longer overwrite existing statements for globals when appending trees, instead extending the existing list of statements for the global with new values. This modification ensures that the accuracy of module checks is improved and prevents the infinite recursion issue. Additionally, unit tests have been added to verify the correct behavior of the changes and confirm the resolution of both the infinite recursion issue and the appending behavior. This enhancement was a collaborative effort with Eric Vergnaud.
* Fixed parsing unsupported magic syntax ([#2157](#2157)). In this update, we addressed a crash that occurred when parsing unsupported magic syntax in a notebook's source code, by modifying the `_read_notebook_path` function in the `cells.py` file. Specifically, the `start` variable, which marks the position of the command in a line, is now obtained with the `find()` method instead of `index()` (see the snippet after this list). This change resolves the crash and makes the parser more robust across magic syntax variants. The commit also includes a manual test to confirm the fix, which addresses one of the two reported issues.
* Infer values from child notebook in magic line ([#2091](#2091)). This commit introduces improvements to the notebook linter for enhanced value inference during linting. By utilizing values from child notebooks loaded via the `%run` magic line, the linter can now provide more accurate suggestions and error detection. The `FileLinter` class has been updated to include a `session_state` parameter, allowing it to access variables and objects defined in child notebooks. New methods such as `append_tree()`, `append_nodes()`, and `append_globals()` have been added to the `BaseLinter` class for better code tree manipulation, enabling more accurate linting of combined code trees. Additionally, unit tests have been added to ensure the correct behavior of this feature. This change addresses issue [#1201](#1201) and progresses issue [#1901](#1901).
* Updated databricks-labs-lsql requirement from ~=0.5.0 to >=0.5,<0.7 ([#2160](#2160)). In this update, the version constraint for the databricks-labs-lsql library has been updated from ~=0.5.0 to >=0.5,<0.7, allowing the project to utilize the latest features and bug fixes available in the library while maintaining compatibility with the existing codebase. This change ensures that the project can take advantage of any improvements or additions made to databricks-labs-lsql version 0.6.0 and above. For reference, the release notes for databricks-labs-lsql version 0.6.0 have been included in the commit, detailing the new features and improvements that come with the updated library.
* Whitelist phonetics ([#2163](#2163)). This release introduces a whitelist for phonetics functionality in the `known.json` configuration file, allowing engineers to utilize five new phonetics methods: `phonetics`, `phonetics.metaphone`, `phonetics.nysiis`, `phonetics.soundex`, and `phonetics.utils`. These methods have been manually tested and are now available for use, contributing to issue [#2163](#2163) and progressing issue [#1901](#1901). As an adopting engineer, this addition enables you to incorporate these phonetics methods into your system's functionality, expanding the capabilities of the open-source library.
* Whitelist pydantic ([#2162](#2162)). In this release, we have added the Pydantic library to the `known.json` file, which manages our project's third-party libraries. Pydantic is a data validation library for Python that allows developers to define data models and enforce type constraints, improving data consistency and correctness in the application. With this change, Pydantic and its submodules have been whitelisted and can be used in the project without being flagged as unknown libraries. This improvement enables us to utilize Pydantic's features for data validation and modeling, ensuring higher data quality and reducing the likelihood of errors in our application.
* Whitelist statsmodels ([#2161](#2161)). In this change, the statsmodels library has been whitelisted for use in the project. Statsmodels is a comprehensive Python library for statistics and econometrics that offers a variety of tools for statistical modeling, testing, and visualization. With this update, the library has been added to the project's configuration file, enabling users to utilize its features without causing any conflicts. The modification does not affect the existing functionality of the project, but rather expands the range of statistical models and analysis tools available to users. Additionally, a test has been included to verify the successful integration of the library. These enhancements streamline the process of conducting statistical analysis and modeling within the project.
* whitelist dbignite ([#2132](#2132)). A new commit has been made to whitelist the dbignite repository and add a set of codes and messages in the "known.json" file related to the use of RDD APIs on UC Shared Clusters and the change in the default format from Parquet to Delta in Databricks Runtime 8.0. The affected components include dbignite.fhir_mapping_model, dbignite.fhir_resource, dbignite.hosp_feeds, dbignite.hosp_feeds.adt, dbignite.omop, dbignite.omop.data_model, dbignite.omop.schemas, dbignite.omop.utils, and dbignite.readers. These changes are intended to provide information and warnings regarding the use of the specified APIs on UC Shared Clusters and the change in default format. It is important to note that no new methods have been added, and no existing functionality has been changed as part of this update. The focus of this commit is solely on the addition of the dbignite repository and its associated codes and messages.
* whitelist duckdb ([#2134](#2134)). In this release, we have whitelisted the DuckDB library by adding it to the "known.json" file in the source code. DuckDB is an in-memory analytical database written in C++. This addition includes several modules such as `adbc_driver_duckdb`, `duckdb.bytes_io_wrapper`, `duckdb.experimental`, `duckdb.filesystem`, `duckdb.functional`, and `duckdb.typing`. Of particular note is the `duckdb.experimental.spark.sql.session` module, which includes a change in the default format for Databricks Runtime 8.0, from Parquet to Delta. This change is indicated by the `table-migrate` code and message in the commit. Additionally, the commit includes tests that have been manually verified. DuckDB is a powerful new addition to our library, and we are excited to make it available to our users.
* whitelist fs ([#2136](#2136)). In this release, we have added the `fs` package to the `known.json` file, allowing its use in our open-source library. The `fs` package contains a wide range of modules and sub-packages, including `fs._bulk`, `fs.appfs`, `fs.base`, `fs.compress`, `fs.copy`, `fs.error_tools`, `fs.errors`, `fs.filesize`, `fs.ftpfs`, `fs.glob`, `fs.info`, `fs.iotools`, `fs.lrucache`, `fs.memoryfs`, `fs.mirror`, `fs.mode`, `fs.mountfs`, `fs.move`, `fs.multifs`, `fs.opener`, `fs.osfs`, `fs.path`, `fs.permissions`, `fs.subfs`, `fs.tarfs`, `fs.tempfs`, `fs.time`, `fs.tools`, `fs.tree`, `fs.walk`, `fs.wildcard`, `fs.wrap`, `fs.wrapfs`, and `fs.zipfs`. These additions address issue [#1901](#1901) and have been thoroughly manually tested to ensure proper functionality.
* whitelist httpx ([#2139](#2139)). In this release, we have updated the "known.json" file to include the `httpx` library along with all its submodules. This change serves to whitelist the library, and it does not introduce any new functionality or impact existing functionalities. The addition of `httpx` is purely for informational purposes, and it will not result in the inclusion of new methods or functions. Rest assured, the team has manually tested the changes, and the project's behavior remains unaffected. We recommend this update to software engineers looking to adopt our project, highlighting that the addition of `httpx` will only influence the library whitelist and not the overall functionality.
* whitelist jsonschema and jsonschema-specifications ([#2140](#2140)). In this release, we have made changes to the "known.json" file to whitelist the `jsonschema` and `jsonschema-specifications` libraries. This modification addresses issue [#1901](#1901) and does not introduce any new functionality or tests. The `jsonschema` library is utilized for schema validation, while the `jsonschema-specifications` library offers additional specifications for the `jsonschema` library. By adding these libraries to the "known.json" file, we ensure that they are recognized as approved dependencies and are not flagged as unknown or unapproved in the future. This enhancement improves the reliability and efficiency of our dependency management system, making it easier for software engineers to work with these libraries.
* whitelist pickleshare ([#2141](#2141)). A new commit has been added to whitelist Pickleshare, a Python module for managing persistent data structures, in the known.json file. This change aligns with issue [#1901](#1901) and is a preparatory step to ensure Pickleshare's compatibility with the project. The Pillow module is already included in the whitelist. No new functionality has been introduced, and existing functionality remains unchanged. The purpose of the whitelist is not explicitly stated in the given context. As a software engineer integrating this project, you are advised to verify the necessity of whitelisting Pickleshare for your specific use case.
* whitelist referencing ([#2142](#2142)). This commit introduces a new whitelist referencing feature, which includes the creation of a `referencing` section in the "known.json" file. The new section contains several entries, including "referencing._attrs", "referencing._core", "referencing.exceptions", "referencing.jsonschema", "referencing.retrieval", and "referencing.typing", all of which are initially empty. This change is a step towards completing issue [#2142](#2142) and addresses issue [#1901](#1901). Manual testing has been conducted to ensure the proper functioning of the new functionality. This enhancement was co-authored by Eric Vergnaud.
* whitelist slicer ([#2143](#2143)). A new security measure has been implemented in the slicer module with the addition of a whitelist that specifies allowed modules and functions. The whitelist is implemented as a JSON object in the `known.json` file, preventing unauthorized access or usage of certain parts of the codebase. A test has been included to verify the functionality of the whitelist, ensuring that the slicer module is secure and functioning as intended. No new methods were added and existing functionality remains unchanged. The changes are localized to the `known.json` file and the slicer module, enhancing the security and integrity of the project. This feature was developed by Eric Vergnaud and myself.
* whitelist sparse ([#2144](#2144)). In this release, we have whitelisted the `sparse` module, adding it to the known.json file. This module encompasses various sub-modules and components such as _common, _compressed, _coo, _dok, _io, _numba_extension, _settings, _slicing, _sparse_array, _umath, _utils, finch_backend, and numba_backend. Each component may contain additional classes, functions, or functionality, and the numba_backend sub-module includes further sub-components. This change aims to improve organization, enhance codebase understanding, and prevent accidental deletion or modification of critical code. The modification is in reference to issue [#1901](#1901) for additional context. Comprehensive testing has been carried out to guarantee the correct implementation of the whitelisting.
* whitelist splink ([#2145](#2145)). In this release, we have added the `splink` library to our known_json file, which includes various modules and functions for entity resolution and data linking. This change is in line with issue [#190](#190)
* whitelist toolz ([#2146](#2146)). In this release, we have whitelisted the `toolz` library and added it to the known.json file. The `toolz` library is a collection of functional utilities, compatible with CPython, PyPy, Jython, and IronPython, and is a port of various modules from Python's standard library and other open-source packages. The newly added modules include tlz, toolz, toolz._signatures, toolz._version, toolz.compatibility, toolz.curried, toolz.dicttoolz, toolz.functoolz, toolz.itertoolz, toolz.recipes, toolz.sandbox, toolz.sandbox.core, toolz.sandbox.parallel, and toolz.utils. These changes have been manually tested and may address issue [#1901](#1901).
* whitelist xmod ([#2147](#2147)). In this release, we have made a modification to the open-source library that involves whitelisting `xmod` in the known.json file. This change includes the addition of a new key for `xmod` with an empty array as its initial value. It is important to note that this modification does not alter the existing functionality of the code. The development team has thoroughly tested the changes through manual testing to ensure proper implementation. This update is a significant milestone towards the progress of issue [#1901](#1901). Software engineers are encouraged to incorporate these updates in their code to leverage the new whitelisting functionality for "xmod."
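
As referenced in the parsing fix above, the crash comes down to standard Python string semantics: `str.index` raises ValueError when the substring is absent, while `str.find` returns -1, which the parser can branch on.

```python
# Plain Python behavior, not the UCX code itself:
line = "print('no magic here')"
try:
    line.index("%run")  # raises ValueError because the command is absent
except ValueError:
    pass
assert line.find("%run") == -1  # find() signals absence with -1
```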

Dependency updates:

 * Updated databricks-labs-lsql requirement from ~=0.5.0 to >=0.5,<0.7 ([#2160](#2160)).
@nfx nfx mentioned this pull request Jul 12, 2024