Contributor

@AnuradhaKaruppiah AnuradhaKaruppiah commented Oct 3, 2025

Description

Closes

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • Documentation
    • Added a GPU Cluster Sizing section to the CLI reference covering the nat sizing calc command, modes (online/offline), options, and defaults.
    • Included detailed usage/help output and option descriptions (e.g., config file, latency/runtime targets, users, GPU count, concurrency, passes, output, endpoint, timeouts).
    • Provided three example workflows: online metrics collection, offline estimation, and combined runs.
    • Surfaced the sizing guidance in the Optimize section to highlight this capability.

@AnuradhaKaruppiah AnuradhaKaruppiah self-assigned this Oct 3, 2025
@AnuradhaKaruppiah AnuradhaKaruppiah added the doc Improvements or additions to documentation label Oct 3, 2025
@AnuradhaKaruppiah AnuradhaKaruppiah requested a review from a team as a code owner October 3, 2025 17:02
@AnuradhaKaruppiah AnuradhaKaruppiah added the improvement Improvement to existing functionality label Oct 3, 2025

coderabbitai bot commented Oct 3, 2025

Walkthrough

Adds a new "GPU Cluster Sizing" section to the CLI reference documenting the nat sizing calc command, its flags, defaults, and example usages for online, offline, and combined runs. Also mirrors this sizing content near the end of the Optimize section within the same file.

Changes

| Cohort / File(s) | Summary of Changes |
|---|---|
| Docs: CLI Reference (docs/source/reference/cli.md) | Introduces GPU Cluster Sizing section for nat sizing calc; documents options (e.g., config_file, offline_mode, target_* params, endpoint, timeouts), defaults, help output; adds online/offline/combined example flows; mirrors sizing content in the Optimize section. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant U as User
    participant CLI as nat sizing calc
    participant EP as Metrics Endpoint (online)
    participant FS as Filesystem (offline)

    U->>CLI: Run with flags
    alt Online mode enabled
        CLI->>EP: Collect runtime metrics (timeout)
        EP-->>CLI: Metrics data
        CLI->>CLI: Compute sizing based on collected metrics
    else Offline mode
        CLI->>FS: Read config/test data
        FS-->>CLI: Local inputs
        CLI->>CLI: Estimate sizing from local inputs
    else Combined
        CLI->>EP: Collect metrics
        EP-->>CLI: Metrics data
        CLI->>FS: Read local inputs
        FS-->>CLI: Local inputs
        CLI->>CLI: Synthesize sizing from both sources
    end
    CLI-->>U: Write outputs to calc_output_dir (append if set)
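
The three branches of the diagram can be read as a simple dispatch on the run mode. The following is a minimal sketch that only mirrors the diagram; `Mode` and `plan_run` are hypothetical names introduced for illustration and are not part of the nat CLI or its implementation.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical run modes matching the diagram's three branches."""
    ONLINE = "online"
    OFFLINE = "offline"
    COMBINED = "combined"


def plan_run(mode: Mode) -> list[str]:
    """Return the ordered steps the diagram shows for the given branch."""
    steps: list[str] = []
    # Online and combined runs first collect metrics from the endpoint.
    if mode in (Mode.ONLINE, Mode.COMBINED):
        steps.append("collect metrics from endpoint")
    # Offline and combined runs read local inputs from the filesystem.
    if mode in (Mode.OFFLINE, Mode.COMBINED):
        steps.append("read local inputs from filesystem")
    # Every branch ends by computing the sizing and writing the outputs.
    steps.append("compute sizing")
    steps.append("write outputs to calc_output_dir")
    return steps
```

Calling `plan_run(Mode.COMBINED)` lists both data-gathering steps before the shared compute and write steps, which is the only difference between the branches in the diagram.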

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Suggested labels

non-breaking

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title uses the imperative verb “Add,” concisely describes introducing a sizing calculation summary into the main CLI documentation, and is within the 72-character guideline, accurately reflecting the primary change in the pull request. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 014d05c and 3f76a92.

📒 Files selected for processing (1)
  • docs/source/reference/cli.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
docs/source/**/*.md

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

docs/source/**/*.md: Use the official naming throughout documentation: first use “NVIDIA NeMo Agent toolkit”, subsequent “NeMo Agent toolkit”; never use deprecated names (Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq)
Documentation sources are Markdown files under docs/source; images belong in docs/source/_static
Keep docs in sync with code; documentation pipeline must pass Sphinx and link checks; avoid TODOs/FIXMEs/placeholders; avoid offensive/outdated terms; ensure spelling correctness
Do not use words listed in ci/vale/styles/config/vocabularies/nat/reject.txt; accepted terms in accept.txt are allowed
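
As a rough illustration of the naming rule above, a documentation check could scan text for the deprecated names. This is only a sketch: `find_deprecated_names` is a hypothetical helper, not part of the repository's CI, and matching whole words case-insensitively is an assumption made here.

```python
import re

# Deprecated names taken from the rule above. Whole-word,
# case-insensitive matching is an assumption of this sketch.
DEPRECATED_NAMES = ("Agent Intelligence toolkit", "aiqtoolkit", "AgentIQ", "AIQ")

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in DEPRECATED_NAMES) + r")\b",
    re.IGNORECASE,
)


def find_deprecated_names(text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_text) pairs for each deprecated name."""
    hits: list[tuple[int, str]] = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits
```

Line numbers are 1-based, so the result can be reported directly against the Markdown source.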

Files:

  • docs/source/reference/cli.md
**/*

⚙️ CodeRabbit configuration file

**/*: Code Review Instructions

  • Ensure the code follows best practices and coding standards.
  • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.
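
The two exception-handling cases above can be sketched side by side. The function names and the port-parsing scenario are made up for illustration; only the standard library `logging` calls are real.

```python
import logging

logger = logging.getLogger(__name__)


def parse_port(raw: str) -> int:
    """Re-raise case: log with logger.error() and use a bare raise."""
    try:
        return int(raw)
    except ValueError:
        # logger.error() records only the message, so the traceback is not
        # printed twice when the caller eventually logs the exception.
        logger.error("Invalid port value: %r", raw)
        raise  # bare raise preserves the original stack trace


def parse_port_or_default(raw: str, default: int = 8000) -> int:
    """No re-raise: logger.exception() captures the full stack trace."""
    try:
        return int(raw)
    except ValueError:
        logger.exception("Invalid port value %r, using default %d", raw, default)
        return default
```

The first function leaves traceback logging to whoever finally handles the error; the second handles the error itself, so it is the only place the traceback can be recorded.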

Documentation Review Instructions

  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes. Ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file; words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.

  • All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up to date whenever a file is changed.

Files:

  • docs/source/reference/cli.md
docs/source/**/*

⚙️ CodeRabbit configuration file

This directory contains the source code for the documentation. All documentation should be written in Markdown format. Any image files should be placed in the docs/source/_static directory.

Files:

  • docs/source/reference/cli.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: CI Pipeline / Check


@coderabbitai coderabbitai bot added the non-breaking Non-breaking change label Oct 3, 2025
@AnuradhaKaruppiah AnuradhaKaruppiah removed the improvement Improvement to existing functionality label Oct 3, 2025
@AnuradhaKaruppiah
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 5be77b0 into NVIDIA:release/1.3 Oct 3, 2025
17 checks passed
@AnuradhaKaruppiah AnuradhaKaruppiah deleted the ak-sizing-calc-cli branch October 7, 2025 21:37
Labels

doc Improvements or additions to documentation non-breaking Non-breaking change
