'Submission_method: Job Cluster' should provide databricks job run information (job_id / run_id) #420

Closed
jeffrey-harrison opened this issue Aug 17, 2023 · 6 comments · Fixed by #454
Labels
enhancement New feature or request

Comments

@jeffrey-harrison
Contributor

jeffrey-harrison commented Aug 17, 2023

Describe the feature

Currently, there is no way to get the run-id of a job submitted via 'job_cluster'. As a result, we can't set view permissions on job runs submitted by a Databricks service principal, nor can we link to the relevant job logs.

Send run-id and other job run information back to the user when job runs are submitted.

run-id exists here:

run_id = self._submit_job(whole_file_path, cluster_spec)

Job run information exists in the same function:

result_state = json_run_output["metadata"]["state"]["result_state"]

Describe alternatives you've considered

As a workaround, we currently query for all job runs in the past 24 hours and set permissions on them. This results in an excessive number of API calls to Databricks.

Additional context

Job runs are not viewable after submission. We could fix this with an API call if we had the run-id.
Screenshot 2023-08-17 at 11 09 26 AM
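
With a run-id in hand, a single lookup could replace the 24-hour scan. The sketch below assumes the Databricks Jobs API `runs/get` endpoint (`GET /api/2.1/jobs/runs/get?run_id=...`), whose response includes `job_id` and `run_page_url`. The helper names, host, and sample payload are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode


def build_runs_get_url(host, run_id):
    """Return the URL for looking up a single run by its id."""
    query = urlencode({"run_id": run_id})
    return f"https://{host}/api/2.1/jobs/runs/get?{query}"


def summarize_run(payload):
    """Pick out the fields needed to link to a run or set permissions on it."""
    return {
        "job_id": payload.get("job_id"),
        "run_id": payload.get("run_id"),
        "run_page_url": payload.get("run_page_url"),
    }


if __name__ == "__main__":
    # Hypothetical host and response body, for illustration only.
    print(build_runs_get_url("example.cloud.databricks.com", 42))
    sample = {
        "job_id": 7,
        "run_id": 42,
        "run_page_url": "https://example.cloud.databricks.com/#job/7/run/42",
    }
    print(summarize_run(sample))
```

The point of the sketch is that one request per submitted run is enough once the run-id is known, instead of listing every recent run.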

Who will this benefit?

This will benefit users who use dbt-databricks to submit python models.

Are you interested in contributing this feature?

I should be able to contribute, depending on the complexity of returning this information.

@jeffrey-harrison jeffrey-harrison added the enhancement New feature or request label Aug 17, 2023
@jeffrey-harrison jeffrey-harrison changed the title 'Submission_method: Job Cluster' should provide databricks job run information (run-id) 'Submission_method: Job Cluster' should provide databricks job run information (job_id / run_id) Aug 24, 2023
@benc-db
Collaborator

benc-db commented Sep 5, 2023

'Send run-id and other job run information back to the user when job runs are submitted.'

What did you have in mind?

@benc-db
Collaborator

benc-db commented Sep 13, 2023

@jeffrey-harrison I think this is a reasonable request. When you say 'Send run-id and other job run information back to the user when job runs are submitted', what form are you expecting? A line in the log?

@jeffrey-harrison
Contributor Author

jeffrey-harrison commented Sep 21, 2023

A line in stdout would work.

@benc-db I can make a quick PR that just calls the INFO logger if that works.
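
For illustration, a minimal sketch of what such a log line might look like, using the standard library `logging` module as a stand-in for the adapter's logger. The function names, the logger name, and the message prefix are assumptions, not necessarily the code in #454.

```python
import logging

# Stand-in logger; the adapter's real logger may be configured differently.
logger = logging.getLogger("Databricks")


def format_run_message(run_id, prefix="Submitted databricks job: "):
    """Build the log line that reports the run id of a submitted job run."""
    return f"{prefix}{run_id}"


def log_submitted_run(run_id):
    """Emit the run id at INFO level and return the message for convenience."""
    message = format_run_message(run_id)
    logger.info(message)
    return message


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_submitted_run(12345)
```

Keeping the formatting in its own function makes the message easy to check without touching the logging setup.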

@jeffrey-harrison
Contributor Author

jeffrey-harrison commented Sep 21, 2023

#454

Let me know what I need to add to get it passed. I'm assuming that users will see INFO-level log output.

I made an attempt to add a test, but couldn't quite figure out how to capture whatever the adapter logger is doing.

    @patch("requests.get")
    @patch("requests.post")
    def test_submit_job_logging(self, mock_post, mock_get):
        log_prefix = "Submitted databricks job: "
        log_dict = {"run_id": "1"}
        logger_name = "Databricks"

        with self.assertLogs("stdout", level='INFO') as cm:
            # Mock the start command
            mock_post.return_value.status_code = 200
            # Mock the status command
            mock_get.return_value.status_code = 200
            mock_get.return_value.json = Mock(return_value=log_dict)

            with patch.object(BaseDatabricksHelper, "__init__", lambda x, y, z: None):
                job_helper = BaseDatabricksHelper(Mock(), Mock())
                job_helper.schema = "schema"
                job_helper.identifier = "identifier"
                job_helper.parsed_model = {}
                job_helper.parsed_model["config"] = {}
                job_helper.credentials = Mock()
                job_helper.auth_header = Mock()
                job_helper._submit_job("/test/path", {})

        expected_log = f"INFO:{logger_name}:{log_prefix}{log_dict}"
        assert expected_log in cm.output
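
One possible way around the capture problem, sketched under the assumption that the adapter's logger does not route through the standard `logging` module (in which case `assertLogs` would record nothing): replace the logger's `info` method with a `Mock` and assert on the call. `FakeHelper`, `_Logger`, and the module-level `logger` here are hypothetical stand-ins, not the real `BaseDatabricksHelper` or adapter logger.

```python
import unittest
from unittest.mock import Mock, patch


class _Logger:
    """Hypothetical stand-in for an adapter logger exposing info()."""

    def info(self, message):
        print(message)


logger = _Logger()


class FakeHelper:
    """Hypothetical stand-in for the job helper; only the logging path is modeled."""

    def _submit_job(self, path, cluster_spec):
        run_id = "1"  # a real helper would read this from the submit response
        logger.info(f"Submitted databricks job: {run_id}")
        return run_id


class TestSubmitJobLogging(unittest.TestCase):
    def test_submit_job_logs_run_id(self):
        # Swap logger.info for a Mock so the call is recorded, not printed.
        with patch.object(logger, "info", Mock()) as mock_info:
            run_id = FakeHelper()._submit_job("/test/path", {})
        self.assertEqual(run_id, "1")
        mock_info.assert_called_once_with("Submitted databricks job: 1")
```

Saved to a file, this can be run with `python -m unittest`. Asserting on the mocked call avoids depending on how or where the logger writes its output.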

@benc-db
Collaborator

benc-db commented Sep 21, 2023

I'll take a look tomorrow and see what I can do to massage the test. Thanks :).

@benc-db
Collaborator

benc-db commented Sep 25, 2023

Was out sick Friday, but I see your PR and will take a look shortly.
