'Submission_method: Job Cluster' should provide databricks job run information (job_id / run_id) #420

jeffrey-harrison · 2023-08-17T18:40:55Z

Describe the feature

Currently, we can't get the run-id of a job submitted via 'job_cluster'. As a result, we can't set view permissions for job runs when they are submitted via a Databricks service principal, nor can we link to the relevant job logs.

Send run-id and other job run information back to the user when job runs are submitted.

run-id exists here:

dbt-databricks/dbt/adapters/databricks/python_submissions.py

Line 123 in de5d66f

run_id = self._submit_job(whole_file_path, cluster_spec)

Job run information exists in the same function:

dbt-databricks/dbt/adapters/databricks/python_submissions.py

Line 143 in de5d66f

result_state = json_run_output["metadata"]["state"]["result_state"]
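As a rough illustration of the information being requested, the sketch below collects the identifying fields from a run-output payload into one dictionary. Only the `metadata.state.result_state` path comes from the line quoted above; the `job_id` and `run_id` keys under `metadata`, the helper name `summarize_run`, and the sample payload are assumptions made for this example, not the adapter's actual code.

```python
from typing import Any, Dict, Optional


def summarize_run(json_run_output: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Collect identifying fields from a run-output payload (hypothetical helper).

    Missing keys yield None instead of raising, so a partial payload
    can still be reported.
    """
    metadata = json_run_output.get("metadata", {})
    state = metadata.get("state", {})
    return {
        # Assumed locations: job_id and run_id directly under "metadata".
        "job_id": metadata.get("job_id"),
        "run_id": metadata.get("run_id"),
        # This path matches the line quoted from python_submissions.py.
        "result_state": state.get("result_state"),
    }


# Made-up payload for illustration only.
sample = {
    "metadata": {
        "job_id": 11,
        "run_id": 42,
        "state": {"result_state": "SUCCESS"},
    }
}
print(summarize_run(sample))
```

Returning a plain dictionary keeps the result easy to log or to pass to a follow-up API call.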

Describe alternatives you've considered

Currently, we query for all job runs in the past 24 hours and set permissions. This results in an excessive number of API calls to databricks.

Additional context

Job runs are not viewable after submission. We could fix this with an API call if we had the run-id.

Who will this benefit?

This will benefit users who use dbt-databricks to submit python models.

Are you interested in contributing this feature?

I should be able to contribute, depending on the complexity of returning this information.

benc-db · 2023-09-05T23:48:42Z

> Send run-id and other job run information back to the user when job runs are submitted.

What did you have in mind?

benc-db · 2023-09-13T16:31:52Z

@jeffrey-harrison I think this is a reasonable request. When you say 'Send run-id and other job run information back to the user when job runs are submitted', what form are you expecting? A line in the log?

jeffrey-harrison · 2023-09-21T21:52:51Z

A line in stdout would work.

@benc-db I can make a quick PR that just calls the INFO logger if that works.

jeffrey-harrison · 2023-09-21T22:16:30Z

#454

Let me know what I need to add to get it passed. I'm assuming that users will see logger.INFO output.

I made an attempt to add a test, but couldn't quite figure out how to capture whatever the adapter logger is doing.

    @patch("requests.get")
    @patch("requests.post")
    def test_submit_job_logging(self, mock_post, mock_get):
        log_prefix = "Submitted databricks job: "
        log_dict = {"run_id": "1"}
        logger_name = "Databricks"

        with self.assertLogs("stdout", level='INFO') as cm:
            # Mock the start command
            mock_post.return_value.status_code = 200
            # Mock the status command
            mock_get.return_value.status_code = 200
            mock_get.return_value.json = Mock(return_value=log_dict)

            with patch.object(BaseDatabricksHelper, "__init__", lambda x, y, z: None):
                job_helper = BaseDatabricksHelper(Mock(), Mock())
                job_helper.schema = "schema"
                job_helper.identifier = "identifier"
                job_helper.parsed_model = {}
                job_helper.parsed_model["config"] = {}
                job_helper.credentials = Mock()
                job_helper.auth_header = Mock()
                job_helper._submit_job("/test/path", {})

        expected_log = f"INFO:{logger_name}:{log_prefix}{log_dict}"
        assert expected_log in cm.output
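For comparison, here is a minimal sketch of how `assertLogs` captures output when the code under test logs through the standard-library `logging` module. The logger name `Databricks` and the function `submit_job_stub` are stand-ins invented for this example. One possible reason the attempt above captures nothing is that the adapter logger may route messages through dbt's own event system rather than stdlib `logging`, in which case `assertLogs` would not see them; that is an assumption, not something confirmed in this thread. Note also that the name passed to `assertLogs` has to match the logger that actually emits the record.

```python
import logging
import unittest

# Hypothetical stand-in for the adapter logger; assumes stdlib logging.
logger = logging.getLogger("Databricks")


def submit_job_stub(run_info: dict) -> dict:
    """Stand-in for a submit call that logs the run information at INFO level."""
    logger.info(f"Submitted databricks job: {run_info}")
    return run_info


class TestSubmitJobLogging(unittest.TestCase):
    def test_logs_run_info(self):
        run_info = {"run_id": "1"}
        # Capture records from the same logger the stub writes to.
        with self.assertLogs("Databricks", level="INFO") as cm:
            submit_job_stub(run_info)
        # assertLogs formats each record as "LEVEL:logger_name:message".
        expected = f"INFO:Databricks:Submitted databricks job: {run_info}"
        self.assertIn(expected, cm.output)
```

If the real logger does not go through stdlib `logging`, patching the logger's `info` method with a `Mock` and asserting on its call arguments is one alternative.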

benc-db · 2023-09-21T22:18:24Z

I'll take a look tomorrow and see what I can do to massage the test. Thanks :).

benc-db · 2023-09-25T16:53:50Z

Was out sick Friday, but I see your PR and will take a look shortly.

jeffrey-harrison added the enhancement (New feature or request) label Aug 17, 2023

jeffrey-harrison changed the title ~~'Submission_method: Job Cluster' should provide databricks job run information (run-id)~~ 'Submission_method: Job Cluster' should provide databricks job run information (job_id / run_id) Aug 24, 2023

jeffrey-harrison mentioned this issue Sep 21, 2023 in "Log databricks job info on Python submission" #454 (merged, 3 tasks)

benc-db closed this as completed in #454 Sep 25, 2023
