cp: chore(fix): Deployment parallelism (2189) into r0.3.0#2406
cp: chore(fix): Deployment parallelism (2189) into r0.3.0#2406
chore(fix): Deployment parallelism (2189) into r0.3.0#2406Conversation
Signed-off-by: oliver könig <okoenig@nvidia.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
|
/ok to test 02b5f5d |
📝 WalkthroughWalkthroughThree configuration-related scripts are modified: parallelism arguments removed from evaluation deployment commands, the command string reformatted for readability, and the maximum experiment name length adjusted for dgxc clusters. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/evaluation/launch_evaluation_pipeline.py (1)
14-14:⚠️ Potential issue | 🟡 MinorShebang line should be on line 1, before the copyright header.
The
#!/usr/bin/env python3shebang on line 14 is placed after the copyright comment block. For the OS to recognize the interpreter directive, it must be the very first line of the file. This is a pre-existing issue but worth fixing.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/evaluation/launch_evaluation_pipeline.py` at line 14, Move the interpreter directive to the very top of the script so the OS recognizes it: place the line "#!/usr/bin/env python3" as line 1 of launch_evaluation_pipeline.py (before any copyright or comment block), removing the existing duplicate later in the file; ensure there are no blank lines or BOM before the shebang.
🧹 Nitpick comments (3)
examples/evaluation/launch_evaluation_pipeline.py (1)
114-123: Reformatted command looks correct; minor: missing space before|on line 118.The multi-line reformatting is a readability improvement. However, there's a missing space before the pipe on line 118 (
{args.num_gpus}| tee), unlike line 122 which has{args.parallelism} | tee. While bash handles this fine, consistency is nice.Consistency fix
- {args.num_gpus}| tee -a deploy.log & \ + {args.num_gpus} | tee -a deploy.log & \🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/evaluation/launch_evaluation_pipeline.py` around lines 114 - 123, The f-string command block used for launching the evaluation pipeline has a missing space before the pipe after {args.num_gpus} causing an inconsistency (see the multiline f-string assigned to command in launch_evaluation_pipeline.py); edit that string to insert a space so it reads "{args.num_gpus} | tee -a deploy.log & \\" to match the later "{args.parallelism} | tee -a eval.log" and keep formatting consistent.scripts/performance/setup_experiment.py (1)
387-389: Consider extracting the magic number33into a named constant.The truncation limit changed from 37 to 33, but the rationale for the specific value isn't documented. A named constant (e.g.,
DGXC_MAX_EXP_NAME_LENGTH = 33) near the top of the file would make the constraint self-documenting and easier to find if it needs to change again.Proposed refactor
Near the top of the file (e.g., after line 60):
DGXC_MAX_EXP_NAME_LENGTH = 33Then at line 388:
- exp_name[:33] if dgxc_cluster is not None else exp_name + exp_name[:DGXC_MAX_EXP_NAME_LENGTH] if dgxc_cluster is not None else exp_name🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/performance/setup_experiment.py` around lines 387 - 389, Replace the magic number 33 with a named constant: define DGXC_MAX_EXP_NAME_LENGTH = 33 near the top of the file (e.g., after imports or around line 60) and update the truncation site that uses exp_name and dgxc_cluster to use DGXC_MAX_EXP_NAME_LENGTH (exp_name = exp_name[:DGXC_MAX_EXP_NAME_LENGTH] if dgxc_cluster is not None else exp_name); add a brief comment next to the constant explaining the Kubernetes name-length constraint so future maintainers can find and change it easily.examples/evaluation/deploy.sh (1)
1-16: Missing shebang and NVIDIA copyright header.The script lacks both a shebang line (
#!/bin/bash) and the NVIDIA copyright header. While these are pre-existing omissions (not introduced by this PR), the coding guidelines require an NVIDIA copyright header on all shell scripts, and best practice calls for a shebang. Consider adding them in a follow-up.As per coding guidelines: "Add NVIDIA copyright header to all Python files and shell scripts at the top of the file."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/evaluation/deploy.sh` around lines 1 - 16, Add a bash shebang and the required NVIDIA copyright header at the top of this shell script (deploy.sh) so it complies with coding guidelines; place the shebang (#!/bin/bash) as the very first line and immediately below it insert the standard NVIDIA copyright/header block used across the repo, then preserve the existing logic that references MEGATRON_CHECKPOINT, NUM_REPLICAS, NUM_GPUS and the python invocation without other changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@examples/evaluation/launch_evaluation_pipeline.py`:
- Line 14: Move the interpreter directive to the very top of the script so the
OS recognizes it: place the line "#!/usr/bin/env python3" as line 1 of
launch_evaluation_pipeline.py (before any copyright or comment block), removing
the existing duplicate later in the file; ensure there are no blank lines or BOM
before the shebang.
---
Nitpick comments:
In `@examples/evaluation/deploy.sh`:
- Around line 1-16: Add a bash shebang and the required NVIDIA copyright header
at the top of this shell script (deploy.sh) so it complies with coding
guidelines; place the shebang (#!/bin/bash) as the very first line and
immediately below it insert the standard NVIDIA copyright/header block used
across the repo, then preserve the existing logic that references
MEGATRON_CHECKPOINT, NUM_REPLICAS, NUM_GPUS and the python invocation without
other changes.
In `@examples/evaluation/launch_evaluation_pipeline.py`:
- Around line 114-123: The f-string command block used for launching the
evaluation pipeline has a missing space before the pipe after {args.num_gpus}
causing an inconsistency (see the multiline f-string assigned to command in
launch_evaluation_pipeline.py); edit that string to insert a space so it reads
"{args.num_gpus} | tee -a deploy.log & \\" to match the later
"{args.parallelism} | tee -a eval.log" and keep formatting consistent.
In `@scripts/performance/setup_experiment.py`:
- Around line 387-389: Replace the magic number 33 with a named constant: define
DGXC_MAX_EXP_NAME_LENGTH = 33 near the top of the file (e.g., after imports or
around line 60) and update the truncation site that uses exp_name and
dgxc_cluster to use DGXC_MAX_EXP_NAME_LENGTH (exp_name =
exp_name[:DGXC_MAX_EXP_NAME_LENGTH] if dgxc_cluster is not None else exp_name);
add a brief comment next to the constant explaining the Kubernetes name-length
constraint so future maintainers can find and change it easily.
|
will be merged via #2509 |
beep boop [🤖]: Hi @ko3n1g 👋,
Summary by CodeRabbit
Changes
Chores