
Extract command codes and unify the checks for spark_conf, cluster_policy, init_scripts #855

Merged: 9 commits into main on Jan 30, 2024

Conversation

@qziyuan (Contributor) commented on Jan 29, 2024

Changes

Extract the command codes for checking spark_conf, cluster_policy, and init_scripts, and put them into unified functions in
src/databricks/labs/ucx/assessment/crawlers.py.
The new functions will then be called by the following modules (a simplified sketch follows the list):

  • src/databricks/labs/ucx/assessment/clusters.py
  • src/databricks/labs/ucx/assessment/init_scripts.py
  • src/databricks/labs/ucx/assessment/jobs.py
  • src/databricks/labs/ucx/assessment/pipelines.py
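For illustration only, a minimal sketch of the shape such a unified check could take. The marker tuple, message, and function body below are hypothetical simplifications, not the exact crawlers.py implementation:

```python
# Hypothetical, simplified Azure service-principal markers; the real
# values live in src/databricks/labs/ucx/assessment/crawlers.py.
_AZURE_SP_CONF = ("fs.azure.account.auth.type", "fs.azure.account.oauth2.client.id")
_AZURE_SP_CONF_FAILURE_MSG = "Uses azure service principal credentials config in"


def check_spark_conf(conf: dict[str, str], source: str) -> list[str]:
    """Collect failure messages for a spark_conf, whatever its origin.

    One function can serve clusters.py, jobs.py, pipelines.py, and
    init_scripts.py alike, which is the unification this PR is after.
    """
    failures: list[str] = []
    for key in conf:
        if any(key.startswith(marker) for marker in _AZURE_SP_CONF):
            failures.append(f"{_AZURE_SP_CONF_FAILURE_MSG} {source}.")
    return failures
```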

Linked issues

Resolves #823

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs ucx ...
  • added a new workflow
  • modified existing workflow: ...
  • added a new table
  • modified existing table: ...

Tests

  • manually tested
  • added unit tests
  • added integration tests
  • verified on staging environment (screenshot attached)

…wlers.py and let clusters, jobs, pipelines, and init_scripts call those functions.
- Fix conflicts in assessment/clusters.py and assessment/jobs.py from PR #825 and PR #838.
- Move the _check_cluster_failures logic into assessment/crawlers.py and let jobs and clusters call this function.
- Apply the _check_cluster_failures changes from PR #845.
- Merge changes from PR #845.
- Move the _check_cluster_failures logic to assessment/crawlers.py.
- Remove the ClusterInfo logic from _check_cluster_failures, as it is tied to assessment/clusters.py and should not be involved when assessment/jobs.py calls it.
- Filter out job clusters when scanning all-purpose clusters.
- Add _try_fetch for ClustersCrawler.
@qziyuan requested review from a team and renardeinside on January 29, 2024 at 20:08
codecov bot commented on Jan 29, 2024

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (606dd72) 85.67% compared to head (06ebd30) 85.86%.
Report is 1 commit behind head on main.

| Files | Patch % | Lines |
| --- | --- | --- |
| src/databricks/labs/ucx/assessment/clusters.py | 86.15% | 4 Missing and 5 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #855      +/-   ##
==========================================
+ Coverage   85.67%   85.86%   +0.19%     
==========================================
  Files          42       42              
  Lines        5311     5335      +24     
  Branches      969      968       -1     
==========================================
+ Hits         4550     4581      +31     
+ Misses        542      537       -5     
+ Partials      219      217       -2     


@nfx (Collaborator) left a comment:

Don't import private functions from modules. Fix the bugs related to union types; you've effectively re-introduced a bug in this PR. I'm fine with getting rid of mixins, but then make the top-level functions in crawlers.py type-annotated. I'd still prefer those functions to live in domain-grouped modules, e.g. all cluster-related functions and classes in clusters.py, all job-related functions in jobs.py, etc.

@@ -92,3 +96,72 @@ def spark_version_compatibility(spark_version: str | None) -> str:
if (10, 0) <= version < (11, 3):
return "kinda works"
return "supported"


def _check_spark_conf(conf: dict[str, str], source) -> list[str]:
@nfx (Collaborator):

These functions are private (starting with _). You cannot export private methods from a module.

@qziyuan (Contributor, Author):

Moved those functions back to the cluster mixin. I made check_cluster_failures and check_spark_conf public functions now, as the former is used by jobs.py and the latter is also used by pipelines.py.

return None


def _check_cluster_policy(ws: WorkspaceClient, cluster, source):
@nfx (Collaborator):

Let's add type annotations to top-level members if we really don't want mixins.

@qziyuan (Contributor, Author):

Fixed; type annotations were added even though we went back to the mixin.

return failures


def _check_cluster_failures(ws: WorkspaceClient, cluster: ClusterDetails | compute.ClusterSpec, source):
@nfx (Collaborator):

Don't do union types; they already caused a lot of bugs.

Convert the cluster spec into cluster details instead:

https://github.com/databrickslabs/ucx/blob/main/src/databricks/labs/ucx/assessment/jobs.py#L88-L89
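For context, a sketch of the conversion being suggested, assuming the SDK dataclasses round-trip through as_dict/from_dict (the helper name is hypothetical):

```python
from databricks.sdk.service.compute import ClusterDetails, ClusterSpec


def to_cluster_details(spec: ClusterSpec) -> ClusterDetails:
    # Round-trip through the dict form so downstream checks accept a
    # single type instead of a ClusterDetails | ClusterSpec union.
    return ClusterDetails.from_dict(spec.as_dict())
```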

@qziyuan (Contributor, Author):

Fixed; it was a miss when merging with #845.

return failures


def _safe_get_cluster_policy(ws: WorkspaceClient, policy_id: str) -> Policy | None:
@nfx (Collaborator):

So, a logical question: do we want diamond-shaped dependencies or not? E.g. _safe_get_cluster_policy: where should it live, in crawlers.py or in clusters.py? What about _check_cluster_failures?

flowchart TD
    assessment --> crawlers
    crawlers --> clusters
    crawlers --> jobs
    crawlers --> init_scripts
    crawlers --> pipelines
    crawlers --> azure_spns
    clusters --> runtime.py
    jobs --> runtime.py
    init_scripts --> runtime.py
    pipelines --> runtime.py
    azure_spns --> runtime.py

or something like this:

flowchart TD
    assessment --> crawlers
    azure_spns -->|mixin| clusters
    crawlers --> clusters
    clusters -->|mixin| jobs
    crawlers --> jobs
    azure_spns -->|mixin| init_scripts
    crawlers --> init_scripts
    jobs -->|mixin| pipelines
    crawlers --> pipelines
    crawlers --> azure_spns
    clusters --> runtime.py
    jobs --> runtime.py
    init_scripts --> runtime.py
    pipelines --> runtime.py
    azure_spns --> runtime.py

@qziyuan (Contributor, Author) replied on Jan 29, 2024:

I'd like to select the first dependency structure; it's clearer to me.
Per my understanding, clusters.py crawls and checks all the all-purpose clusters, jobs.py crawls and checks all jobs (so far it only checks job clusters, but we may need to check the job code in the future), and pipelines.py crawls and checks all DLT pipelines (right now it only checks the pipeline config, but it should check the pipeline cluster as well).
It's cleaner for them to inherit the _check_cluster_failures function to check the spark conf, init scripts, and cluster policy, instead of letting jobs and pipelines inherit _check_cluster_failures from clusters.
Some logic may stay in the domain-grouped modules, because it is not commonly shared across crawlers:

  • checking the all-purpose cluster mode (we don't have this logic yet, but may in the future). The cluster may need to be put into shared mode, which has more limitations, but this check does not apply to job clusters.
  • checking the job code (we don't have this logic yet, but may in the future)

return failures


def _check_cluster_init_script(ws: WorkspaceClient, init_scripts, source):
@nfx (Collaborator):

If we're not doing mixins, then why is this function defined so far away from _get_init_script_data?

@qziyuan (Contributor, Author):

Moved them to the mixin and put these two functions next to each other.

Comment on lines 8 to 9
from databricks.sdk.service import compute
from databricks.sdk.service.compute import ClusterDetails, Policy
@nfx (Collaborator):

Line 9 is redundant because you've imported the whole compute package. Use one style of imports; don't mix.
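To spell out the nit, a tiny sketch of picking one style instead of mixing both:

```python
# Either import the package and qualify names through it...
from databricks.sdk.service import compute

policy: compute.Policy | None = None

# ...or import the names directly, which makes the package import redundant:
# from databricks.sdk.service.compute import ClusterDetails, Policy
```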

@qziyuan (Contributor, Author):

Fixed.

logger,
spark_version_compatibility,
)
from databricks.labs.ucx.assessment.crawlers import _check_cluster_failures, logger
@nfx (Collaborator):

Don't import logger; initialize one at the top of the module.
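For reference, the standard module-level logger idiom being asked for:

```python
import logging

# Each module gets its own logger named after the module path.
logger = logging.getLogger(__name__)
```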

@qziyuan (Contributor, Author):

Fixed.

… 2. Make jobs inherit the cluster mixin so it can reuse `check_cluster_failures`. 3. Make pipelines inherit the cluster mixin so it can reuse `check_spark_conf`. 4. Move `check_init_script` to the mixin in `init_scripts.py`.
@nfx (Collaborator) left a comment:

A few nits remaining.

try:
data = self._ws.dbfs.read(file_api_format_destination).data
if data is not None:
return base64.b64decode(data).decode("utf-8")
@nfx (Collaborator):

That's 9 levels of nesting. Can you reduce the nesting so it's a bit more readable?

return failures

def _get_init_script_data(self, init_script_info: InitScriptInfo) -> str | None:
if init_script_info.dbfs is not None and init_script_info.dbfs.destination is not None:
@nfx (Collaborator):

Nesting can be reduced by refactoring "if COND: 20 lines" into "if not COND: continue", followed by the 20 lines at the outer level.
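A minimal sketch of that guard-clause style applied to the snippet above, using early returns as the function-level analogue of continue; the destination parsing and error handling here are assumptions, not the merged code:

```python
import base64

from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import NotFound
from databricks.sdk.service.compute import InitScriptInfo


def get_init_script_data(ws: WorkspaceClient, info: InitScriptInfo) -> str | None:
    # Each failure mode returns early, so the happy path never nests
    # more than one level deep.
    if info.dbfs is None or info.dbfs.destination is None:
        return None
    parts = info.dbfs.destination.split(":")  # e.g. "dbfs:/path/to/script.sh"
    if len(parts) != 2:
        return None
    try:
        data = ws.dbfs.read(parts[1]).data
    except NotFound:
        return None
    if data is None:
        return None
    return base64.b64decode(data).decode("utf-8")
```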


from databricks.labs.ucx.assessment.crawlers import (
_AZURE_SP_CONF_FAILURE_MSG,
_INIT_SCRIPT_DBFS_PATH,
@nfx (Collaborator):

This constant can live in the init script mixin

@@ -20,7 +22,7 @@ class JobInfo:
creator: str | None = None


class JobsMixin(ClustersMixin):
class JobsMixin:
@nfx (Collaborator):

For consistency, shouldn't it be named CheckJobsMixin?

@nfx merged commit 2ab1321 into main on Jan 30, 2024
7 checks passed
@nfx deleted the fix/unify_conf_check_823 branch on January 30, 2024 at 09:04
nfx added a commit that referenced this pull request Feb 1, 2024
* Added "what" property for migration to scope down table migrations ([#856](#856)).
* Added job count in the assessment dashboard ([#858](#858)).
* Adopted `installation` package from `databricks-labs-blueprint` ([#860](#860)).
* Debug logs to print only the first 96 bytes of SQL query by default, tunable by `debug_truncate_bytes` SDK configuration property ([#859](#859)).
* Extract command codes and unify the checks for spark_conf, cluster_policy, init_scripts ([#855](#855)).
* Improved installation failure with actionable message ([#840](#840)).
* Improved validating groups membership cli command ([#816](#816)).

Dependency updates:

 * Updated databricks-labs-blueprint requirement from ~=0.1.0 to ~=0.2.4 ([#867](#867)).
@nfx mentioned this pull request on Feb 1, 2024
nfx added a commit that referenced this pull request Feb 1, 2024
dmoore247 pushed commits that referenced this pull request Mar 23, 2024