Local mode is not working in the latest sdk #187

@Stanpol

System Information

  • Framework: Tensorflow
  • Framework Version: 1.8.0
  • Python Version: 3.5
  • CPU or GPU: CPU
  • Python SDK Version: latest
  • Are you using a custom image: No

Describe the problem

Local mode is not working. This problem seems similar to #144.

I installed the latest sagemaker-python-sdk with pip install git+https://github.com/aws/sagemaker-python-sdk
and I'm trying to run the following code, adapted from the provided example (https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_distributed_mnist/tensorflow_local_mode_mnist.ipynb):

import sagemaker
import utils
from tensorflow.contrib.learn.python.learn.datasets import mnist
import tensorflow as tf

data_sets = mnist.read_data_sets('data', dtype=tf.uint8, reshape=False, validation_size=5000)

utils.convert_to(data_sets.train, 'train', 'data')
utils.convert_to(data_sets.validation, 'validation', 'data')
utils.convert_to(data_sets.test, 'test', 'data')

import boto3
session = boto3.Session(profile_name='my_profile')
s3 = session.resource('s3')
for bucket in s3.buckets.all():
    print(bucket.name)

# The MFA code is entered at this point.

sagemaker_session = sagemaker.Session(boto_session=session)
role = sagemaker.get_execution_role(sagemaker_session=sagemaker_session)
inputs = sagemaker_session.upload_data(path='data', key_prefix='data/mnist', bucket='sagemaker-my-bucket')

from sagemaker.tensorflow import TensorFlow

mnist_estimator = TensorFlow(entry_point='mnist.py',
                             role=role,
                             training_steps=10,
                             evaluation_steps=10,
                             train_instance_count=2,
                             train_instance_type='local',
                             sagemaker_session=sagemaker_session)

mnist_estimator.fit(inputs)

I get the following error:

Minimal repro / logs

ClientError: An error occurred (ValidationException) when calling the CreateTrainingJob operation: 1 validation error detected: Value 'local' at 'resourceConfig.instanceType' failed to satisfy constraint: Member must satisfy enum value set: [ml.p2.xlarge, ml.m5.4xlarge, ml.m4.16xlarge, ml.p3.16xlarge, ml.m5.large, ml.p2.16xlarge, ml.c4.2xlarge, ml.c5.2xlarge, ml.c4.4xlarge, ml.c5.4xlarge, ml.c4.8xlarge, ml.c5.9xlarge, ml.c5.xlarge, ml.c4.xlarge, ml.c5.18xlarge, ml.p3.2xlarge, ml.m5.xlarge, ml.m4.10xlarge, ml.m5.12xlarge, ml.m4.xlarge, ml.m5.24xlarge, ml.m4.2xlarge, ml.p2.8xlarge, ml.m5.2xlarge, ml.p3.8xlarge, ml.m4.4xlarge]
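
For reference, the message above is an enum check on 'resourceConfig.instanceType' raised while calling CreateTrainingJob, so the request was validated by the hosted API with the value 'local'. Below is a minimal sketch of that check, with the allowed values copied from the error message. The helper name `passes_service_validation` is hypothetical and not part of the SDK; it only illustrates why 'local' is rejected at that stage.

```python
# Allowed values copied verbatim from the ValidationException message.
HOSTED_INSTANCE_TYPES = frozenset({
    "ml.p2.xlarge", "ml.m5.4xlarge", "ml.m4.16xlarge", "ml.p3.16xlarge",
    "ml.m5.large", "ml.p2.16xlarge", "ml.c4.2xlarge", "ml.c5.2xlarge",
    "ml.c4.4xlarge", "ml.c5.4xlarge", "ml.c4.8xlarge", "ml.c5.9xlarge",
    "ml.c5.xlarge", "ml.c4.xlarge", "ml.c5.18xlarge", "ml.p3.2xlarge",
    "ml.m5.xlarge", "ml.m4.10xlarge", "ml.m5.12xlarge", "ml.m4.xlarge",
    "ml.m5.24xlarge", "ml.m4.2xlarge", "ml.p2.8xlarge", "ml.m5.2xlarge",
    "ml.p3.8xlarge", "ml.m4.4xlarge",
})


def passes_service_validation(instance_type):
    """Hypothetical helper: mimic the enum check reported in the error.

    Returns True only for values listed in the error message, so 'local'
    fails here just as it does in the CreateTrainingJob call.
    """
    return instance_type in HOSTED_INSTANCE_TYPES


if __name__ == "__main__":
    for candidate in ("local", "ml.m4.xlarge"):
        print(candidate, passes_service_validation(candidate))
```

A check like this could be run before `fit()` to confirm whether a given `train_instance_type` value would be accepted by the hosted service or is expected to be handled locally.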
