How to install smdistributed? #2452
-
Hi, the easiest way is to use one of the built-in TensorFlow/PyTorch containers directly, or to start from one as a base image. You can also build your own container, starting from the SageMaker Training Toolkit at https://github.com/aws/sagemaker-training-toolkit. More details are at https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-use-api.html
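To illustrate the built-in container route, here is a rough sketch of requesting smdistributed through the SageMaker Python SDK's `PyTorch` estimator and its `distribution` argument. The helper names (`smdistributed_config`, `build_estimator`), the `train.py` entry point, the instance type, the framework and Python versions, and the `partitions` value are all placeholders I picked for illustration, so check them against the supported-frameworks pages before relying on them.

```python
def smdistributed_config(model_parallel=True, processes_per_host=8, partitions=2):
    """Build a `distribution` dict for a SageMaker framework estimator.

    Hypothetical helper: the parameter values are illustrative assumptions.
    """
    if model_parallel:
        return {
            "smdistributed": {
                "modelparallel": {
                    "enabled": True,
                    # Assumed setting; valid keys depend on the library version.
                    "parameters": {"partitions": partitions},
                }
            },
            # Model parallel jobs are launched through MPI.
            "mpi": {"enabled": True, "processes_per_host": processes_per_host},
        }
    return {"smdistributed": {"dataparallel": {"enabled": True}}}


def build_estimator(role, entry_point="train.py"):
    """Create a PyTorch estimator that runs in a built-in container."""
    # Imported lazily so the config helper works without the SDK installed.
    from sagemaker.pytorch import PyTorch

    return PyTorch(
        entry_point=entry_point,         # placeholder training script
        role=role,                       # your SageMaker execution role ARN
        instance_count=1,
        instance_type="ml.p3.16xlarge",  # placeholder; must be a supported type
        framework_version="1.8.1",       # placeholder; pick a supported version
        py_version="py36",               # placeholder; must match the framework
        distribution=smdistributed_config(),
    )
```

With this approach nothing is installed locally: `import smdistributed` only has to work inside the training container, where `train.py` runs.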
-
Note: SageMaker distributed is not currently supported in the China regions. I'm adding this note because I see that you've pointed to a doc for the China region.
-
I see, I'm actually not sure why I ended up at a
-
You can install it with
pip install https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.4.1/cu121/2024-10-09/smdistributed_dataparallel-2.5.0-cp311-cp311-linux_x86_64.whl
The available binaries (i.e. links to the .whl files) are listed at https://docs.aws.amazon.com/sagemaker/latest/dg/distributed-data-parallel-support.html#distributed-data-parallel-supported-frameworks
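Since each wheel is built for one specific Python version (the `cp311` in the filename above), it can help to check the interpreter before installing and to confirm the import afterwards. This is only a sketch: the helper names are made up for illustration, and the check looks at the Python tag only, not at the PyTorch or CUDA versions encoded in the URL path.

```python
import importlib.util
import re
import sys

WHEEL_URL = (
    "https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.4.1/cu121/"
    "2024-10-09/smdistributed_dataparallel-2.5.0-cp311-cp311-linux_x86_64.whl"
)


def wheel_python_tag(wheel_url):
    """Return the CPython tag (e.g. 'cp311') from a wheel filename, or None."""
    filename = wheel_url.rsplit("/", 1)[-1]
    match = re.search(r"-(cp\d+)-cp\d+-", filename)
    return match.group(1) if match else None


def matches_interpreter(wheel_url, version_info=sys.version_info):
    """True if the wheel's Python tag matches the given interpreter version."""
    expected = f"cp{version_info[0]}{version_info[1]}"
    return wheel_python_tag(wheel_url) == expected


def is_installed(top_level_package="smdistributed"):
    """True if the top-level package can be found on the import path."""
    return importlib.util.find_spec(top_level_package) is not None


if __name__ == "__main__":
    print("wheel tag:", wheel_python_tag(WHEEL_URL))
    print("matches this Python:", matches_interpreter(WHEEL_URL))
    print("smdistributed importable:", is_installed())
```

If the tag does not match, pip will refuse the wheel as unsupported on the current platform, so picking the matching link from the supported-frameworks page is the first thing to check.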
-
What did you find confusing? Please describe.
I installed sagemaker with
pip install sagemaker --upgrade
and am attempting to use distributed model parallel with PyTorch. However, I'm unable to import smdistributed.
The docs at https://sagemaker.readthedocs.io/en/stable/api/training/smp_versions/v1.2.0/smd_model_parallel_pytorch.html don't have installation instructions for smdistributed. How do I get smdistributed installed? Thank you!
I am also looking at https://docs.amazonaws.cn/en_us/sagemaker/latest/dg/model-parallel-customize-training-script-pt.html, which directs me to https://sagemaker.readthedocs.io/en/stable/api/training/smp_versions/v1.2.0/smd_model_parallel_common_api.html#smp.init to initialize the SageMaker distributed environment. But again, I'm not sure how to get the smdistributed library.
https://github.com/aws/amazon-sagemaker-examples has some smdistributed examples but doesn't provide any clear installation instructions. environment.yml in that repo seems to indicate that all that's needed is sagemaker, which I have installed.
Describe how documentation can be improved
I could not find clear installation instructions for smdistributed. Would it be possible to add these?