Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved error handling for all queries executed during table migration #1679

Merged
merged 3 commits into from
May 10, 2024

Conversation

nkvuong
Copy link
Contributor

@nkvuong nkvuong commented May 10, 2024

Changes

  • As we are migrating multiple tables & views in parallel, any error in SQL execution should not fail the workflow. Added error handling to all sql_backend.execute commands

Linked issues

Resolves #1676

Tests

  • manually tested
  • added unit tests
  • verified on staging environment (screenshot attached)

@nkvuong nkvuong requested review from a team and dmoore247 May 10, 2024 10:36
Copy link

codecov bot commented May 10, 2024

Codecov Report

Attention: Patch coverage is 91.42857% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 88.49%. Comparing base (b4de9b6) to head (1cc0d24).

Files Patch % Lines
...atabricks/labs/ucx/hive_metastore/table_migrate.py 91.42% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1679      +/-   ##
==========================================
- Coverage   88.49%   88.49%   -0.01%     
==========================================
  Files          88       88              
  Lines       11024    11043      +19     
  Branches     1946     1946              
==========================================
+ Hits         9756     9772      +16     
- Misses        894      897       +3     
  Partials      374      374              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nkvuong nkvuong temporarily deployed to account-admin May 10, 2024 11:10 — with GitHub Actions Inactive
Copy link

✅ 165/165 passed, 25 skipped, 2h11m20s total

Running from acceptance #3119

@nkvuong nkvuong enabled auto-merge May 10, 2024 11:51
@nkvuong nkvuong requested review from JCZuurmond and removed request for dmoore247 May 10, 2024 11:51
Copy link
Member

@JCZuurmond JCZuurmond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should adapt the ViewSequencer to handle the situation of a view not being migrated.

For example, it might confuse the users when they see a View {src_view.src.key} is not supported for migration message for the views that are dependent on an invalid view as it does not explain why they are not supported for migration

self._backend.execute(src_view.src.sql_alter_from(src_view.rule.as_uc_table_key, self._ws.get_workspace_id()))
try:
self._backend.execute(view_migrate_sql)
self._backend.execute(src_view.src.sql_alter_to(src_view.rule.as_uc_table_key))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe leave the sql_alter_to and sql_alter_from out for now? We want nudge users to notify us if the view is migrated, but the table properties can not be set

src_view.src.sql_alter_from(src_view.rule.as_uc_table_key, self._ws.get_workspace_id())
)
except DatabricksError as e:
logger.warning(f"Failed to migrate view {src_view.src.key} to {src_view.rule.as_uc_table_key}: {e}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should log this an error. The job should still fail, but it should not fail early

Copy link
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@nfx nfx changed the title Added error handling for all queries executed during table migration Improved error handling for all queries executed during table migration May 10, 2024
@nfx nfx disabled auto-merge May 10, 2024 15:04
@nfx nfx merged commit 2532625 into main May 10, 2024
8 checks passed
@nfx nfx deleted the fix/error_handling branch May 10, 2024 15:04
nfx added a commit that referenced this pull request May 10, 2024
* Improved error handling for `migrate-tables` workflows ([#1674](#1674)). This commit enhances the error handling for `migrate-tables` workflows by introducing new tests that cover specific scenarios where failures may occur during table migration. The changes include the creation of mock objects and injecting failures for the `get_tables_to_migrate` method of the `TableMapping` class. Three new tests have been added, each testing a specific scenario, including token errors when checking table properties, errors when trying to get properties for a non-existing table, and errors when trying to unset the `upgraded_to` property. The commit also asserts that specific error messages are logged during these failures. These improvements ensure better visibility and debugging capabilities during table migration. The code was manually tested, and unit tests were added and verified on a staging environment, ensuring that the new error handling mechanisms function as intended.
* Improved error handling for all queries executed during table migration ([#1679](#1679)). This release includes improved error handling during table migration in our data workflow, resolving issue [#167](#167)
* Removed dependency on internal `pathlib` implementations ([#1672](#1672)). In this release, we have introduced a new custom `_DatabricksFlavor` class as a replacement for the internal `pathlib._Flavor` implementations, specifically designed for the Databricks environment. This class handles various functionalities including separation of paths, joining and parsing of parts, and casefolding of strings, among others. The `make_uri` method has also been updated to generate the correct URI for the workspace. This change removes the dependency on internal `pathlib._Flavor` implementations which were not available on Windows. As part of this change, the `test_wspath.py` file in the `tests/integration/mixins` directory has been updated, with the `test_exists` and `test_mkdirs` methods being modified to reflect the removal of `_Flavor`. These updates improve the compatibility and reliability of the codebase on Windows systems.
* Updated databricks-labs-blueprint requirement from ~=0.4.3 to >=0.4.3,<0.6.0 ([#1670](#1670)). In this update, we have adjusted the requirement for `databricks-labs-blueprint` from version `~=0.4.3` to `>=0.4.3,<0.6.0`, ensuring the latest version can be installed while remaining below `0.6.0`. This change is part of issue [#1670](#1670) and includes the release notes and changelog in the commit message, highlighting improvements and updates in version `0.5.0`. These enhancements consist of content assertion in `MockInstallation`, better handling of partial functions in `parallel.Threads`, and adjusted configurations aligned with the UCX project. The commit also covers various dependency updates and bug fixes, providing a more robust and efficient library experience for software engineers.

Dependency updates:

 * Updated databricks-labs-blueprint requirement from ~=0.4.3 to >=0.4.3,<0.6.0 ([#1670](#1670)).
@nfx nfx mentioned this pull request May 10, 2024
nfx added a commit that referenced this pull request May 10, 2024
* Improved error handling for `migrate-tables` workflows
([#1674](#1674)). This
commit enhances the error handling for `migrate-tables` workflows by
introducing new tests that cover specific scenarios where failures may
occur during table migration. The changes include the creation of mock
objects and injecting failures for the `get_tables_to_migrate` method of
the `TableMapping` class. Three new tests have been added, each testing
a specific scenario, including token errors when checking table
properties, errors when trying to get properties for a non-existing
table, and errors when trying to unset the `upgraded_to` property. The
commit also asserts that specific error messages are logged during these
failures. These improvements ensure better visibility and debugging
capabilities during table migration. The code was manually tested, and
unit tests were added and verified on a staging environment, ensuring
that the new error handling mechanisms function as intended.
* Improved error handling for all queries executed during table
migration ([#1679](#1679)).
This release includes improved error handling during table migration in
our data workflow, resolving issue
[#167](#167)
* Removed dependency on internal `pathlib` implementations
([#1672](#1672)). In this
release, we have introduced a new custom `_DatabricksFlavor` class as a
replacement for the internal `pathlib._Flavor` implementations,
specifically designed for the Databricks environment. This class handles
various functionalities including separation of paths, joining and
parsing of parts, and casefolding of strings, among others. The
`make_uri` method has also been updated to generate the correct URI for
the workspace. This change removes the dependency on internal
`pathlib._Flavor` implementations which were not available on Windows.
As part of this change, the `test_wspath.py` file in the
`tests/integration/mixins` directory has been updated, with the
`test_exists` and `test_mkdirs` methods being modified to reflect the
removal of `_Flavor`. These updates improve the compatibility and
reliability of the codebase on Windows systems.
* Updated databricks-labs-blueprint requirement from ~=0.4.3 to
>=0.4.3,<0.6.0
([#1670](#1670)). In this
update, we have adjusted the requirement for `databricks-labs-blueprint`
from version `~=0.4.3` to `>=0.4.3,<0.6.0`, ensuring the latest version
can be installed while remaining below `0.6.0`. This change is part of
issue [#1670](#1670) and
includes the release notes and changelog in the commit message,
highlighting improvements and updates in version `0.5.0`. These
enhancements consist of content assertion in `MockInstallation`, better
handling of partial functions in `parallel.Threads`, and adjusted
configurations aligned with the UCX project. The commit also covers
various dependency updates and bug fixes, providing a more robust and
efficient library experience for software engineers.

Dependency updates:

* Updated databricks-labs-blueprint requirement from ~=0.4.3 to
>=0.4.3,<0.6.0
([#1670](#1670)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]: Error when running migrate-tables workflow. Views in an invalid state break the migrate-views task
3 participants