Multi GPU training on g5.12xlarge #4412

H3zi · 2024-02-05T13:03:32Z

Hi all,
Is there a built-in way in SageMaker to run distributed training on a g5.12xlarge instance (4 GPUs)?
From what I can see, SMDDP (the SageMaker distributed data parallel library) does not support g5.12xlarge instances, so enabling it through the estimator's `distribution` config will not work.

I can hack my way around this with PyTorch/Accelerate, but why is there no straightforward way to do it with the SageMaker SDK?
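For concreteness, the estimator settings in question look roughly like the sketch below. It only builds the keyword arguments for `sagemaker.pytorch.PyTorch` as a plain dict, so it runs without AWS credentials. The script name `train.py`, the role placeholder, and the framework and Python versions are illustrative assumptions. The `torch_distributed` entry is the SDK's torchrun-based launcher option; whether it is accepted on g5.12xlarge for a given framework version is an assumption to check against the SDK documentation, not something confirmed here.

```python
# Candidate values for the `distribution` argument of sagemaker.pytorch.PyTorch.
DISTRIBUTION_OPTIONS = {
    # SageMaker distributed data parallel (SMDDP).
    # Per the post above, this does not appear to support g5.12xlarge.
    "smddp": {"smdistributed": {"dataparallel": {"enabled": True}}},
    # torchrun-based launcher. Support on g5 instances is an assumption to verify.
    "torchrun": {"torch_distributed": {"enabled": True}},
}


def build_estimator_kwargs(strategy, instance_type="ml.g5.12xlarge", instance_count=1):
    """Return keyword arguments for sagemaker.pytorch.PyTorch (values are illustrative)."""
    return {
        "entry_point": "train.py",  # hypothetical training script
        "role": "<your-sagemaker-execution-role-arn>",  # placeholder, not a real ARN
        "framework_version": "2.1.0",  # assumed version
        "py_version": "py310",  # assumed Python version for that container
        "instance_type": instance_type,
        "instance_count": instance_count,
        "distribution": DISTRIBUTION_OPTIONS[strategy],
    }


if __name__ == "__main__":
    kwargs = build_estimator_kwargs("torchrun")
    print(kwargs["distribution"])
    # To launch a job (requires the sagemaker package and AWS credentials):
    # from sagemaker.pytorch import PyTorch
    # PyTorch(**kwargs).fit()
```

Keeping the options in one dict makes it easy to swap the launcher without touching the rest of the estimator arguments.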

Thanks

Replies: 0 comments