Description
System Information
- Framework: TensorFlow
- Framework Version: 1.12
- Python Version: 3.6
- CPU or GPU: N/A
- Python SDK Version: 1.18.7
- Are you using a custom image: No
Describe the problem
Minimal repro / logs
The crash is reproducible on any completed TensorFlow job that was executed in script mode. After attaching to such a job and requesting its hyperparameters, the SDK crashes:
```python
from sagemaker.tensorflow import TensorFlow

job = TensorFlow.attach(training_job_name='JOB NAME')
hp = job.hyperparameters()
```
Analysis
When submitting a TensorFlow job in script mode, it is no longer possible to specify checkpoint_path (if it is specified, the SDK raises an exception stating that the parameter is not supported in script mode).
When attaching to a completed job, the SDK (for obvious reasons) does not set a value for checkpoint_path, nor does it populate the private attribute _current_job_name (which is supposed to hold the job name).
When hyperparameters() is called, it first resolves checkpoint_path, even though script mode never uses that value (it is only read in the else branch of a subsequent if). Since checkpoint_path is not defined, the method calls _default_s3_path() to construct a default one. That function simply concatenates a series of sub-paths, including the _current_job_name attribute, which is None, and this is where the SDK crashes.
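The failure mode can be sketched as follows; the function name and path layout here are illustrative assumptions, not the SDK's actual source:

```python
import os

def default_s3_path(bucket, current_job_name):
    # Hypothetical stand-in for the SDK's _default_s3_path(): it concatenates
    # a series of sub-paths, one of which is the current job name.
    return os.path.join(bucket, current_job_name, "checkpoints")

# After attach(), _current_job_name is None, so the concatenation fails.
try:
    default_s3_path("s3://my-bucket", None)
except TypeError as e:
    print("crash:", e)
```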
To fix this, the line that resolves the checkpoint path should be moved into the else branch where it is actually used.
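A minimal sketch of the proposed fix, assuming a simplified estimator (class, method, and attribute names are stand-ins for the SDK's real code):

```python
import os

class EstimatorSketch:
    # Hypothetical stand-in for the TensorFlow estimator described above.
    def __init__(self, script_mode, checkpoint_path=None, current_job_name=None):
        self.script_mode = script_mode
        self.checkpoint_path = checkpoint_path
        self._current_job_name = current_job_name

    def _default_s3_path(self, directory):
        # Raises TypeError when _current_job_name is None, mirroring the crash.
        return os.path.join("s3://bucket", self._current_job_name, directory)

    def hyperparameters(self):
        hyperparameters = {}
        if self.script_mode:
            # Script mode never uses checkpoint_path, so it is not resolved here.
            pass
        else:
            # Moved into the else branch: only the legacy mode resolves the path.
            path = self.checkpoint_path or self._default_s3_path("checkpoints")
            hyperparameters["checkpoint_path"] = path
        return hyperparameters

# An attached script-mode job (both checkpoint_path and _current_job_name are
# None) never reaches _default_s3_path(), so hyperparameters() succeeds.
attached = EstimatorSketch(script_mode=True)
print(attached.hyperparameters())  # {}
```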