revert node-count val #253

mohamedzeidan2021 · 2025-09-02T22:41:18Z

What's changing and why?

The PyTorch job expects exactly one of the following two options (not both):

node-count
combination of accelerators, vcpu, memory-in-gib must be specified for instance-type ml.p4d.24xlarge

This change reverts the default of node_count back to None, so node-count is no longer set implicitly and users can choose either option through the PyTorch job flags.
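The "one or the other" rule above can be sketched as a small check. This is only an illustration, not the CLI's actual validation code: the function name `check_resource_spec`, its parameter names, and the assumption that all three resource values must be given together are hypothetical.

```python
from typing import Optional


def check_resource_spec(
    node_count: Optional[int] = None,
    accelerators: Optional[int] = None,
    vcpu: Optional[float] = None,
    memory_in_gib: Optional[float] = None,
) -> str:
    """Hypothetical helper: report which option was chosen, or raise ValueError."""
    resources = (accelerators, vcpu, memory_in_gib)
    has_node_count = node_count is not None
    has_any_resource = any(r is not None for r in resources)
    has_all_resources = all(r is not None for r in resources)

    # Both options at once is rejected.
    if has_node_count and has_any_resource:
        raise ValueError(
            "specify node-count or accelerators/vcpu/memory-in-gib, not both"
        )
    if has_node_count:
        return "node-count"
    # Assumption: the resource option needs all three values.
    if has_all_resources:
        return "resources"
    raise ValueError(
        "specify node-count, or all of accelerators, vcpu, and memory-in-gib"
    )
```

With a default of None, `node_count` only takes part in the check when the user passes it explicitly. A non-None default would make every resource-based invocation look like it set both options.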

How was this change tested?

Tested locally by running commands such as:

hyp create hyp-pytorch-job \
  --version 1.1 \
  --job-name my-training-job \
  --image pytorch/pytorch:latest \
  --command '[python, train.py]' \
  --instance-type ml.g5.8xlarge \
  --tasks-per-node 1 \
  --accelerators 1 \
  --accelerators-limit 1 \
  --vcpu 8 \
  --vcpu-limit 8 \
  --memory 32 \
  --memory-limit 32

Unit test coverage

All new/modified code has unit tests
Coverage verified for changed code
N/A - no testable code changes

Do we need integration tests?

Yes - integration tests added
No - unit tests sufficient
No - infrastructure/config change only
Unsure - please advise

Checklist

PR title clearly describes the change
No sensitive information exposed and security is maintained
Ready for review

Co-authored-by: Mohamed Zeidan <[email protected]>

revert node-count val

8b037ad

mohamedzeidan2021 requested a review from a team as a code owner September 2, 2025 22:41

mohamedzeidan2021 temporarily deployed to auto-approve September 2, 2025 22:41 — with GitHub Actions Inactive

zhaoqizqwang approved these changes Sep 2, 2025

jam-jee approved these changes Sep 2, 2025

mohamedzeidan2021 merged commit d30a69b into aws:main Sep 2, 2025
5 of 6 checks passed

papriwal pushed a commit to papriwal/sagemaker-hyperpod-cli that referenced this pull request Sep 4, 2025

revert node-count val (aws#253)

06dca41

Co-authored-by: Mohamed Zeidan <[email protected]>

mohamedzeidan2021 added a commit to mohamedzeidan2021/sagemaker-hyperpod-cli that referenced this pull request Sep 10, 2025

revert node-count val (aws#253)

17251eb

Co-authored-by: Mohamed Zeidan <[email protected]>
