Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crawler for Externally Orchestrated Jobs with Failing Configuration #395

Closed
wants to merge 2 commits into from

Conversation

zpappa
Copy link

@zpappa zpappa commented Oct 6, 2023

Resolves #266

Added ExternallyOrchestratedJobsWithFailingConfigCrawler
Added a crawler to look at JobRuns from the SDK and determine which of the job runs are from the RunsSubmit API

Added Unit Tests
Added tests to cover basic logic and some edge cases

Integration Tests Pending

@codecov
Copy link

codecov bot commented Oct 6, 2023

Codecov Report

Attention: 52 lines in your changes are missing coverage. Please review.

Comparison is base (606dd72) 85.67% compared to head (02dfe85) 82.61%.
Report is 1 commits behind head on main.

❗ Current head 02dfe85 differs from pull request most recent head 7801cc8. Consider uploading reports for the commit 7801cc8 to get more accurate results

Files Patch % Lines
src/databricks/labs/ucx/assessment/crawlers.py 55.17% 34 Missing and 18 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #395      +/-   ##
==========================================
- Coverage   85.67%   82.61%   -3.07%     
==========================================
  Files          42       30      -12     
  Lines        5311     2490    -2821     
  Branches      969      445     -524     
==========================================
- Hits         4550     2057    -2493     
+ Misses        542      326     -216     
+ Partials      219      107     -112     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a passing integration test. please attach a screenshot once you get it to work locally

@pohlposition pohlposition added the step/assessment go/uc/upgrade - Assessment Step label Oct 6, 2023
@pohlposition pohlposition added this to the 1 week milestone Oct 6, 2023
@@ -57,6 +61,14 @@ class PipelineInfo:
failures: str


@dataclass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to capture only the failing ones or all of them with the failing one anotated?

def is_custom_image(version_string: str):
"""
Is this a custom version?
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it require implementation?

pattern = r"^(?P<major>\d+)?\.(?P<minor>\d+)?\.(?P<patch>[\dx]+)?.*"
lvg = re.match(pattern, left_version)
rvg = re.match(pattern, right_version)
left = (int(lvg.group("major")), int(lvg.group("minor")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a proper unit test for that?

@@ -84,6 +119,26 @@ def spark_version_compatibility(spark_version: str) -> str:
return "supported"


def get_job_cluster_from_task(
task: RunTask, job_run: BaseRun, all_clusters: dict[str, ClusterDetails]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re factor job cluster to use the same mechanism

self._ws = ws

def _crawl(self) -> list[ExternallyOrchestratedJobRunWithFailingConfiguration]:
no_of_days_back = datetime.timedelta(days=30) # todo make configurable in yaml?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make timedelta externally configurable

@nfx nfx added the step/assign metastore go/uc/upgrade Assign Metastore label Oct 19, 2023
@zpappa zpappa force-pushed the feature/external-orchestrator-job-run-crawler branch from 02dfe85 to ea103b4 Compare October 20, 2023 13:02
@zpappa zpappa force-pushed the feature/external-orchestrator-job-run-crawler branch from ea103b4 to 14daf86 Compare October 20, 2023 13:03
@CLAassistant
Copy link

CLAassistant commented Nov 27, 2023

CLA assistant check
All committers have signed the CLA.

@nfx nfx assigned renardeinside and unassigned dipankarkush-db Dec 14, 2023
@FastLee FastLee closed this Jan 26, 2024
@nfx nfx reopened this Jan 29, 2024
…estrator-job-run-crawler

# Conflicts:
#	src/databricks/labs/ucx/assessment/crawlers.py
#	tests/unit/assessment/test_assessment.py
@nfx nfx closed this Jan 30, 2024
@databrickslabs databrickslabs locked and limited conversation to collaborators Jan 30, 2024
@nfx nfx deleted the feature/external-orchestrator-job-run-crawler branch April 4, 2024 22:32
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
step/assessment go/uc/upgrade - Assessment Step step/assign metastore go/uc/upgrade Assign Metastore
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Assessment for RunSubmit API usages
7 participants