
[BUG]: jobs, pipelines, policies and clusters assessment is incorrect and needs cleanup to extract common code paths #823

Closed
nfx opened this issue Jan 22, 2024 · 2 comments · Fixed by #855
Labels
step/assessment go/uc/upgrade - Assessment Step

Comments

nfx (Collaborator) commented Jan 22, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Cluster and cluster-policy scanning is duplicated four times, across the clusters, policies, jobs, and DLT crawlers. This is unmaintainable, contains incorrect checks, and has to be refactored to extract the common parts.
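
For illustration, the extracted common path could be a single shared check that every crawler calls. This is a hypothetical sketch with made-up names and rules, not the actual ucx code:

def spark_conf_failures(conf: dict[str, str] | None) -> list[str]:
    # Collect UC-incompatibility findings for a cluster's spark_conf.
    failures: list[str] = []
    if not conf:
        return failures
    if any(key.startswith("spark.databricks.passthrough") for key in conf):
        failures.append("uses credential passthrough")
    if any("fs.azure.account" in key for key in conf):
        failures.append("embeds service principal credentials in spark_conf")
    return failures

The clusters, policies, jobs, and DLT crawlers would then each call this one helper instead of re-implementing the check four times.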

Expected Behavior

Tests are concise and refer to fixtures in JSON files:

https://github.com/databrickslabs/ucx/blob/main/tests/unit/assessment/test_clusters.py#L111-L116

# imports assumed; module paths may differ across ucx versions
from databricks.labs.ucx.assessment.clusters import ClustersCrawler
# workspace_client_mock and MockBackend are ucx unit-test helpers

def test_cluster_assessment_cluster_policy_no_spark_conf():
    ws = workspace_client_mock(clusters="no-spark-conf.json")
    crawler = ClustersCrawler(ws, MockBackend(), "ucx")
    result_set1 = list(crawler.snapshot())
    assert len(result_set1) == 1
    assert result_set1[0].success == 1
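
The fixture referenced above is just the raw cluster payload serialized to JSON. A hypothetical no-spark-conf.json could look like this (field values are illustrative, not the actual fixture contents):

[
  {
    "autoscale": {"min_workers": 1, "max_workers": 6},
    "cluster_id": "0123-456789-abcdef12",
    "cluster_name": "example-cluster",
    "cluster_source": "UI",
    "policy_id": "single-user-policy",
    "spark_version": "13.3.x-scala2.12"
  }
]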

Steps To Reproduce

No response

Cloud

AWS

Operating System

macOS

Version

latest via Databricks CLI

Relevant log output

No response

qziyuan (Contributor) commented Jan 29, 2024

@nfx Do you want all unit tests to be refactored to use a JSON fixture file, like

ws = workspace_client_mock(clusters="assortment-conf.json")

instead of things like:

# these types come from the Databricks SDK:
# from databricks.sdk.service.compute import AutoScale, ClusterDetails, ClusterSource
sample_clusters = [
    ClusterDetails(
        autoscale=AutoScale(min_workers=1, max_workers=6),
        cluster_source=ClusterSource.UI,
        spark_context_id=5134472582179565315,
        spark_env_vars=None,
        spark_conf={
            "spark.hadoop.fs.azure.account.oauth2.client.id.abcde.dfs.core.windows.net": "1234567890",
            "spark.databricks.delta.formatCheck.enabled": "false",
        },
        spark_version="9.3.x-cpu-ml-scala2.12",
        cluster_id="0810-225833-atlanta69",
        cluster_name="Tech Summit FY24 Cluster-1",
        policy_id="bdqwbdqiwd1111",
    )
]
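
A minimal sketch of what such a fixture-backed helper could look like, assuming the JSON files sit in a fixtures directory next to the tests (the real workspace_client_mock in ucx may differ):

import json
from pathlib import Path
from unittest.mock import create_autospec

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import ClusterDetails

_FIXTURES = Path(__file__).parent / "fixtures"

def workspace_client_mock(clusters: str) -> WorkspaceClient:
    # Return a WorkspaceClient mock whose clusters.list() yields
    # ClusterDetails deserialized from the named JSON fixture.
    ws = create_autospec(WorkspaceClient)
    raw = json.loads((_FIXTURES / clusters).read_text())
    ws.clusters.list.return_value = [ClusterDetails.from_dict(c) for c in raw]
    return ws

Each test scenario then shrinks to one small JSON file instead of a page of inline ClusterDetails construction.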

qziyuan (Contributor) commented Jan 29, 2024

@nfx To deduplicate the cluster scanning, does it make sense to have one crawler scan all cluster, init-script, and cluster-policy info once and save it to a Delta table, and then have the clusters, jobs, pipelines, and init-scripts assessments load the cluster info from that table instead of each calling the API individually?
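
One rough sketch of that shape, with hypothetical class and method names (save_table stands in for whatever persistence interface the backend actually exposes):

import json
from dataclasses import dataclass

@dataclass
class ClusterInfo:
    cluster_id: str
    policy_id: str | None
    spark_conf_json: str  # serialized spark_conf for downstream checks

class SharedClusterScan:
    # Scan all clusters once and persist the result, so the clusters, jobs,
    # pipelines, and init-script assessments read the table instead of the API.

    def __init__(self, ws, backend, full_table_name: str):
        self._ws = ws
        self._backend = backend
        self._table = full_table_name

    def run(self) -> None:
        rows = [
            ClusterInfo(c.cluster_id, c.policy_id, json.dumps(c.spark_conf or {}))
            for c in self._ws.clusters.list()
        ]
        self._backend.save_table(self._table, rows, ClusterInfo)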
