Feat: Add debugging terminal support for CustomJob, HyperparameterTun… #699
b71b3fb to 84304ed
6518fb3 to 41eca99
google/cloud/aiplatform/jobs.py (outdated)
        (Dict[str, str]) - web access uris of the custom job
    """
    self._sync_gca_resource()
    return self._gca_resource.web_access_uris
Does this field need to be cast to a dict?
It works the same as labels: we return labels as-is, and that is a dict.
That looks like a bug:
isinstance(ds.labels, dict)
# False
type(ds.labels)
# google.protobuf.pyext._message.ScalarMapContainer
I created an issue to track that here: b/203653647
I'd prefer not to carry that issue over.
ahhh, noted
google/cloud/aiplatform/jobs.py (outdated)
    self._sync_gca_resource()

    if self._gca_resource.trials:
        return self._gca_resource.trials[-1].web_access_uris
Can trials execute in parallel?
Trials can execute in parallel when parallel_trial_count is set. I updated HyperparameterTuningJob to check the web_access_uris of all trials running in parallel.
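Because several trials may be running at once, reading only `trials[-1]` can miss some of them. A minimal sketch of collecting URIs from every running trial, keyed by trial id; the `Trial` dataclass and the set of "active" states are hypothetical simplifications, not the real proto message or the PR's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Trial:
    """Hypothetical, simplified view of a trial; not the real proto message."""

    id: str
    state: str
    web_access_uris: Dict[str, str] = field(default_factory=dict)


# Hypothetical choice of states treated as "still running" in this sketch.
_ACTIVE_STATES = {"REQUESTED", "ACTIVE"}


def active_trial_web_access_uris(trials: List[Trial]) -> Dict[str, Dict[str, str]]:
    """Map each running trial's id to a copy of its web access URIs.

    With parallel_trial_count > 1 several trials run concurrently, so every
    trial is inspected instead of only the last one.
    """
    return {
        trial.id: dict(trial.web_access_uris)
        for trial in trials
        if trial.state in _ACTIVE_STATES and trial.web_access_uris
    }
```

Trials that have finished, or that have not yet been assigned URIs, are skipped.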
    self._gca_resource.training_task_metadata
    and self._gca_resource.training_task_metadata.get("backingCustomJob")
    and self._gca_resource.training_task_inputs.get("enable_web_access")
    and not self._has_logged_web_access_uris
Is it possible that the web_access_uris change during the run, for example if one of the workers fails?
With the current service setup, if a worker fails and restarts, its web_access_uris redirect to the new worker; the URIs themselves do not change.
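Since the URIs stay stable across worker restarts, logging them once per job is enough. A minimal sketch of the log-once guard in the condition quoted above, assuming a hypothetical `WebAccessLoggerSketch` helper that is called on each status poll; the flag name mirrors `_has_logged_web_access_uris`, but the class itself is illustrative, not the SDK's implementation.

```python
import logging
from typing import Dict, Optional

_LOGGER = logging.getLogger(__name__)


class WebAccessLoggerSketch:
    """Hypothetical helper: logs web access URIs at most once per job."""

    def __init__(self, enable_web_access: bool) -> None:
        self._enable_web_access = enable_web_access
        self._has_logged_web_access_uris = False

    def maybe_log(self, web_access_uris: Optional[Dict[str, str]]) -> bool:
        """Log URIs the first time they are available; return True if logged now."""
        if (
            self._enable_web_access
            and web_access_uris
            and not self._has_logged_web_access_uris
        ):
            for worker, uri in web_access_uris.items():
                _LOGGER.info("%s: %s", worker, uri)
            # Logging once is sufficient because a restarted worker is reached
            # through the same URI rather than a new one.
            self._has_logged_web_access_uris = True
            return True
        return False
```

Empty or missing URIs (for example, before workers are provisioned) do not set the flag, so a later poll can still log them.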
58c656d to 0fbb148
81a4ae4 to 5e44f24
71f5813 to 706423f
706423f to 7214e21
7214e21 to f8b67ea
Fixes #<b/195449603> 🦕