'Submission_method: Job Cluster' should provide databricks job run information (job_id / run_id) #420
Comments
What did you have in mind?
@jeffrey-harrison I think this is a reasonable request. Just wondering: when you say "Send run-id and other job run information back to the user when job runs are submitted", in what form are you expecting it? As a line in the log?
A line in stdout would work. @benc-db I can make a quick PR that just calls the INFO logger if that works.
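For context, the proposed "quick PR" amounts to a single INFO call. A minimal sketch, assuming the adapter exposes a `Databricks`-named logger and that the Jobs API response dict is available at the submit site (the `log_submitted_run` helper name is illustrative, not the actual dbt-databricks API):

```python
import logging

# Illustrative stand-in: dbt-databricks wires up its own "Databricks"
# logger; this standalone logger only shows the shape of the INFO call.
logger = logging.getLogger("Databricks")

def log_submitted_run(response_json: dict) -> None:
    # Surface the run metadata returned by the Jobs API (run_id, etc.)
    # so the user can locate the run in the Databricks UI.
    logger.info("Submitted databricks job: %s", response_json)
```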
Let me know what I need to add to get it passed. I made an attempt to add a test, but couldn't quite figure out how to capture whatever the adapter logger is doing; I'm assuming users will see the log line in stdout.

```python
from unittest.mock import Mock, patch

@patch("requests.get")
@patch("requests.post")
def test_submit_job_logging(self, mock_post, mock_get):
    log_prefix = "Submitted databricks job: "
    log_dict = {"run_id": "1"}
    logger_name = "Databricks"
    # Capture records from the adapter's logger by name,
    # rather than trying to assert on "stdout"
    with self.assertLogs(logger_name, level="INFO") as cm:
        # Mock the start command
        mock_post.return_value.status_code = 200
        # Mock the status command
        mock_get.return_value.status_code = 200
        mock_get.return_value.json = Mock(return_value=log_dict)
        with patch.object(BaseDatabricksHelper, "__init__", lambda x, y, z: None):
            job_helper = BaseDatabricksHelper(Mock(), Mock())
            job_helper.schema = "schema"
            job_helper.identifier = "identifier"
            job_helper.parsed_model = {}
            job_helper.parsed_model["config"] = {}
            job_helper.credentials = Mock()
            job_helper.auth_header = Mock()
            job_helper._submit_job("/test/path", {})
    expected_log = f"INFO:{logger_name}:{log_prefix}{log_dict}"
    assert expected_log in cm.output
```
I'll take a look tomorrow and see what I can do to massage the test. Thanks :).
Was out sick Friday, but I see your PR and will take a look shortly. |
Describe the feature
Currently, we can't get the run-id of a job submitted via 'job_cluster'. As a result, we can't set view permissions for job runs when they are submitted via a Databricks service principal, nor can we link to the relevant job logs.
Send run-id and other job run information back to the user when job runs are submitted.
The run-id exists here:
dbt-databricks/dbt/adapters/databricks/python_submissions.py, line 123 (commit de5d66f)
Job run information exists in the same function:
dbt-databricks/dbt/adapters/databricks/python_submissions.py, line 143 (commit de5d66f)
Describe alternatives you've considered
Currently, we query for all job runs in the past 24 hours and set permissions on each of them. This results in an excessive number of API calls to Databricks.
Additional context
Job runs are not viewable after submission. We could fix this with an API call if we had the run-id.
Who will this benefit?
This will benefit users who use dbt-databricks to submit python models.
Are you interested in contributing this feature?
I should be able to contribute, depending on the complexity of returning this information.