How to debug SageMaker local mode with a custom image #261

@yshvrdhn

System Information

  • Framework / algorithm: Keras (TensorFlow) / Mask R-CNN
  • Framework version: Keras 2.2, TensorFlow 1.7
  • Python version: Py3 (Python 3.6)
  • CPU or GPU: GPU
  • Custom image: yes

Describe the problem

Hi, I am trying to debug the Docker image that I am using for SageMaker. However, when I run the notebook in local mode, it fails with the error below. How do I access the logs for the run?

RuntimeError                              Traceback (most recent call last)
~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/local/image.py in train(self, input_data_config, hyperparameters)
    110         try:
--> 111             _stream_output(process)
    112         except RuntimeError as e:

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/local/image.py in _stream_output(process)
    588     if exit_code != 0:
--> 589         raise RuntimeError("Process exited with code: %s" % exit_code)
    590 

RuntimeError: Process exited with code: 1

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<timed exec> in <module>()

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name)
    176         self._prepare_for_training(job_name=job_name)
    177 
--> 178         self.latest_training_job = _TrainingJob.start_new(self, inputs)
    179         if wait:
    180             self.latest_training_job.wait(logs=logs)

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in start_new(cls, estimator, inputs)
    361                                           job_name=estimator._current_job_name, output_config=config['output_config'],
    362                                           resource_config=config['resource_config'], hyperparameters=hyperparameters,
--> 363                                           stop_condition=config['stop_condition'], tags=estimator.tags)
    364 
    365         return cls(estimator.sagemaker_session, estimator._current_job_name)

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in train(self, image, input_mode, input_config, role, job_name, output_config, resource_config, hyperparameters, stop_condition, tags)
    262         LOGGER.info('Creating training-job with name: {}'.format(job_name))
    263         LOGGER.debug('train request: {}'.format(json.dumps(train_request, indent=4)))
--> 264         self.sagemaker_client.create_training_job(**train_request)
    265 
    266     def tune(self, job_name, strategy, objective_type, objective_metric_name,

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/local/local_session.py in create_training_job(self, TrainingJobName, AlgorithmSpecification, RoleArn, InputDataConfig, OutputDataConfig, ResourceConfig, StoppingCondition, HyperParameters, Tags)
     73                                    data_distribution)
     74 
---> 75         self.s3_model_artifacts = self.train_container.train(InputDataConfig, HyperParameters)
     76 
     77     def describe_training_job(self, TrainingJobName):

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/local/image.py in train(self, input_data_config, hyperparameters)
    113             # _stream_output() doesn't have the command line. We will handle the exception
    114             # which contains the exit code and append the command line to it.
--> 115             msg = "Failed to run: %s, %s" % (compose_command, e.message)
    116             raise RuntimeError(msg)
    117 

AttributeError: 'RuntimeError' object has no attribute 'message'
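
Note on the second traceback: the AttributeError appears to hide the real failure. The handler at image.py line 115 reads `e.message`, which Python 3 exceptions do not have, so the original "Process exited with code: 1" never reaches the final message. Below is a minimal sketch of the pattern using `str(e)` instead. `run_container` and `describe_failure` are hypothetical stand-ins for illustration, not the SDK's actual functions, and the command list is a placeholder.

```python
def run_container():
    # Hypothetical stand-in for the container run that exits non-zero.
    raise RuntimeError("Process exited with code: 1")


def describe_failure(compose_command):
    # Mirrors the shape of the handler in the traceback, but formats the
    # exception with str(e), which works on both Python 2 and Python 3.
    try:
        run_container()
    except RuntimeError as e:
        return "Failed to run: %s, %s" % (compose_command, str(e))


# Placeholder command; the real compose command is not shown in the traceback.
msg = describe_failure(["docker-compose", "up"])
print(msg)
```

With this change the exit code and the command line would both show up in the error, although the container's own output is still needed to see why it exited with code 1.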
