
Conversation

@nzmora-nvidia
Collaborator

@nzmora-nvidia nzmora-nvidia commented Aug 26, 2025

Summary by CodeRabbit

  • Tests
    • Added a helper to centralize TinyLlama path resolution for reliable model lookup.
    • Benchmarks and dataset setup now detect and prefer local model copies and pass model location through workflows.
    • Benchmark/test interfaces extended to accept a model path parameter; runs can still emit optional JSON reports.
    • Cache-size metrics parsing improved (MB→bytes conversion) for more accurate reporting.
    • External hub alias preserved.

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages that don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause the top of tree to break.

@coderabbitai
Contributor

coderabbitai bot commented Aug 26, 2025

📝 Walkthrough

Walkthrough

Centralized TinyLlama path/name resolution was added, and tests were updated to prefer local model directories. The run_benchmark, prepare_dataset, and run_multiple_benchmarks_with_outlier_removal signatures were extended to accept model path info; import os was moved to module scope, and KV-cache parsing was adjusted to read MB values and convert them to bytes.

Changes

Cohort / File(s) Summary of Changes
Single test file — helper & API adjustments
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py
Added tiny_llama_details() to centralize model path/name resolution; moved import os to module scope; changed prepare_dataset(root_dir, temp_dir, model_path_or_name) signature and call sites; extended run_benchmark(...) and run_multiple_benchmarks_with_outlier_removal(...) signatures to accept model_path; CLI now appends --model_path <model_path> when a local model dir exists; updated dataset path usage to model_path_or_name; enhanced KV-cache parsing to recognize "Current cache size (MB):" and convert MB→bytes; updated tests to use the new helper and signatures.
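
To make the walkthrough concrete, the sketch below reconstructs the two central pieces from the diffs quoted later in this thread. It is only an illustration: build_bench_cmd is a hypothetical wrapper name, and llm_models_root() / _hf_model_dir_or_hub_id() are existing test utilities in this repository whose imports are omitted here.

import os

# Centralized TinyLlama resolution (mirrors the tiny_llama_details() helper added in the PR).
def tiny_llama_details():
    """Return (model_path_or_name, model_name, model_path) for the TinyLlama test model."""
    model_path = f"{llm_models_root()}/llama-models-v2/TinyLlama-1.1B-Chat-v1.0"
    model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
    model_path_or_name = _hf_model_dir_or_hub_id(model_path, model_name)
    return model_path_or_name, model_name, model_path

# Hypothetical wrapper showing how the benchmark command gains --model_path when a local
# model directory exists (simplified from the run_benchmark() changes in this PR).
def build_bench_cmd(model_name, model_path, backend, dataset_path, max_batch_size):
    cmd = ["python", "-m", "tensorrt_llm.commands.bench", "--model", model_name]
    if os.path.isdir(model_path):  # prefer the local copy; robust to CI network issues
        cmd.extend(["--model_path", model_path])
    cmd += [
        "throughput",
        "--backend", backend,
        "--dataset", str(dataset_path),
        "--max_batch_size", str(max_batch_size),
    ]
    return cmd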

Sequence Diagram(s)

sequenceDiagram
  participant Test as Unit Test
  participant Helper as tiny_llama_details()
  participant Runner as run_benchmark()
  participant Bench as trtllm-bench

  Test->>Helper: request model_path_or_name, model_name, model_path
  Helper-->>Test: return details
  Test->>Runner: call(model_name, model_path, dataset_path, temp_dir, ...)
  alt model_path is local directory
    Runner->>Bench: python -m tensorrt_llm.commands.bench --model <model_name> --model_path <model_path> --dataset <dataset_path> ...
  else remote hub id used
    Runner->>Bench: python -m tensorrt_llm.commands.bench --model <model_name> --dataset <dataset_path> ...
  end
  Bench-->>Runner: benchmark output (stdout/json)
  Runner-->>Test: parsed results (including KV-cache converted to bytes)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • lucaslie
  • suyoggupta

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbit in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbit in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbit gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbit read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbit help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbit ignore or @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbit summary or @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbit or @coderabbitai title anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@nzmora-nvidia nzmora-nvidia changed the title https://nvbugs/5474453: fix path to tested model [https://nvbugs/5474453][fix]: fix path to tested model Aug 26, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (1)

1-6: Add the NVIDIA copyright header and import sys.

Compliance: all .py files should carry the current-year NVIDIA header. Also, importing sys here will enable using sys.executable in subprocess calls below.

Apply this diff at the file top:

+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+
 import json
 import re
 import subprocess
 import tempfile
+import sys
 from pathlib import Path
🧹 Nitpick comments (3)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (3)

476-479: DRY up model path/hub-id and prefer Path joins over hard-coded slashes.

Avoid duplicate literals and OS-specific separators by centralizing the TinyLlama path and hub ID.

Apply this minimal diff at both call sites:

-        f"{llm_models_root()}/llama-models-v2/TinyLlama-1.1B-Chat-v1.0", "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
+        TINYLLAMA_LOCAL_DIR, TINYLLAMA_HUB_ID

Add these constants near the imports (outside the selected ranges):

# TinyLlama test model references
TINYLLAMA_LOCAL_DIR = str(Path(llm_models_root(), "llama-models-v2", "TinyLlama-1.1B-Chat-v1.0"))
TINYLLAMA_HUB_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

Also applies to: 585-587


66-70: Use sys.executable to ensure the same Python interpreter is used in subprocesses.

Invoking "python" vs "python3" can diverge from the active interpreter and virtualenv. Use sys.executable for reproducibility in CI.

Apply these diffs:

  1. Bench command launcher:
-    cmd = [
-        "python",
+    cmd = [
+        sys.executable,
         "-m",
         "tensorrt_llm.commands.bench",
  2. Dataset preparation script:
-    command = [
-        "python3",
+    command = [
+        sys.executable,
         f"{dataset_tool}",
         "--stdout",

Note: This assumes the earlier import of sys at the file top.

Also applies to: 239-255


404-414: require_metrics=False still fails the test — behavior contradicts the docstring.

The function says “If False, just warn,” but both branches still assert. This turns a soft check into a hard failure and can create unnecessary flakiness when metrics aren’t produced.

Apply this diff to make the parameter effective:

-        if require_metrics:
-            assert False, f"REQUIRED {message}"
-        else:
-            print(f"ℹ️ {message}")
-            assert False, "KV cache metrics are missing"
+        if require_metrics:
+            assert False, f"REQUIRED {message}"
+        else:
+            print(f"ℹ️ {message}")
+            return None, None
@@
-        if require_metrics:
-            assert False, message
-        else:
-            print(f"ℹ️ KV cache validation skipped - {message}")
-            assert False, "KV cache metrics are missing"
+        if require_metrics:
+            assert False, message
+        else:
+            print(f"ℹ️ KV cache validation skipped - {message}")
+            return None, None

Callers that set require_metrics=False should handle a (None, None) return.

Also applies to: 424-429
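
To illustrate the suggested behavior, here is a hedged, self-contained sketch of a caller handling the soft-failure path; the helper below is a toy stand-in with assumed names, not the test's actual function.

from typing import Optional, Tuple

def extract_kv_cache_metrics(log_output: str, require_metrics: bool = True) -> Tuple[Optional[int], Optional[int]]:
    """Toy stand-in: soft-fail with (None, None) when metrics are absent and require_metrics is False."""
    if "Current cache size (MB):" not in log_output:
        if require_metrics:
            raise AssertionError("KV cache metrics are missing")
        return None, None
    # Real parsing would extract the MB value here and convert it to bytes.
    return 0, 0

# Caller handling the relaxed mode:
cache_size, free_mem = extract_kv_cache_metrics("no metrics in this log", require_metrics=False)
if cache_size is None:
    print("KV cache metrics unavailable; skipping cache-size validation")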

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 87d1d3a and bd265e0.

📒 Files selected for processing (1)
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Code must target Python 3.8+
Indent Python code with 4 spaces; do not use tabs
Preserve module namespaces when importing; import modules/packages and access members via the module (e.g., from package.subpackage import foo; foo.SomeClass())
Python file names should be snake_case
Python class names should be PascalCase
Python functions/methods and local variables should be snake_case; variables beginning with a number should be prefixed with k_ (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE prefixed with G_ (e.g., G_MY_GLOBAL); constants should be UPPER_SNAKE_CASE
Avoid shadowing variables from outer scopes; initialize all externally visible members in __init__
Prefer docstrings for interfaces used outside a file; comments should be reserved for in-function or file-local interfaces
Use Google-style docstrings for classes and functions; attributes and variables may be documented inline with trailing string literals
Avoid reflection when simpler, explicit code suffices (e.g., avoid dict(**locals()) patterns)
In try/except, catch the narrowest exceptions possible
For duck-typing patterns, keep the try body minimal and move logic to else to avoid masking unrelated failures

Files:

  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py
**/*.{c,cc,cpp,cxx,h,hh,hpp,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA copyright header (current year) to all source files (.cpp, .h, .cu, .py, etc.)

Files:

  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (3)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (3)

476-479: Path fix to TinyLlama local dir looks correct and consistent.

Both the local path prefix and the unchanged HF hub ID align with the intent of NVBugs 5474453. No logic or API impact.


585-587: Second call site updated consistently.

The same local dir change is applied here; hub ID alias remains stable. Good consistency across tests.


476-479: No remaining direct references to the old TinyLlama local path

I ran the recommended grep commands and confirmed:

  • No occurrences of the legacy llm_models_root()/TinyLlama-1.1B-Chat-v1.0 path remain in the codebase.
  • All remaining matches for TinyLlama-1.1B-Chat-v1.0 are part of the HF Hub identifier string ("TinyLlama/TinyLlama-1.1B-Chat-v1.0") and should not be modified.

This sweep shows the old local‐directory references have been fully updated.

@nzmora-nvidia nzmora-nvidia force-pushed the user/nzmora/nvbug-5474453 branch from cedf44c to f148008 Compare August 26, 2025 22:46
@nzmora-nvidia nzmora-nvidia force-pushed the user/nzmora/nvbug-5474453 branch from 39d1eb6 to 4a1e7f4 Compare August 27, 2025 07:11
@nzmora-nvidia nzmora-nvidia changed the title [https://nvbugs/5474453][fix]: fix path to tested model [https://nvbugs/5474453][fix] fix path to tested model Aug 27, 2025
@nzmora-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #16651 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #16651 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #12501 completed with status: 'FAILURE'

@nzmora-nvidia nzmora-nvidia force-pushed the user/nzmora/nvbug-5474453 branch from 5c7e625 to cb6f998 Compare August 27, 2025 14:22
@nzmora-nvidia
Collaborator Author

/bot run

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (2)

15-19: Centralize path building with Path and document return order.

Using Path ensures portability; add a short docstring to avoid tuple-order mistakes.

Apply this diff:

-def tiny_llama_details():
-    model_path = f"{llm_models_root()}/llama-models-v2/TinyLlama-1.1B-Chat-v1.0"
-    model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
-    model_path_or_name = _hf_model_dir_or_hub_id(model_path, model_name)
-    return model_path_or_name, model_name, model_path
+def tiny_llama_details():
+    """Return (model_path_or_name, model_name, model_path) for TinyLlama."""
+    model_path = str(Path(llm_models_root()) / "llama-models-v2" / "TinyLlama-1.1B-Chat-v1.0")
+    model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
+    model_path_or_name = _hf_model_dir_or_hub_id(model_path, model_name)
+    return model_path_or_name, model_name, model_path

242-255: Use the active interpreter for dataset generation for portability.

Replace hardcoded "python3" with sys.executable to avoid env mismatches.

Apply this diff:

-    command = [
-        "python3",
+    command = [
+        sys.executable,
         f"{dataset_tool}",
         "--stdout",
         "--tokenizer",
         model_path_or_name,

Also add at the top of the file (outside this hunk):

import sys
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4a1e7f4 and cb6f998.

📒 Files selected for processing (1)
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (11 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Code must target Python 3.8+
Indent Python code with 4 spaces; do not use tabs
Preserve module namespaces when importing; import modules/packages and access members via the module (e.g., from package.subpackage import foo; foo.SomeClass())
Python file names should be snake_case
Python class names should be PascalCase
Python functions/methods and local variables should be snake_case; variables beginning with a number should be prefixed with k_ (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE prefixed with G_ (e.g., G_MY_GLOBAL); constants should be UPPER_SNAKE_CASE
Avoid shadowing variables from outer scopes; initialize all externally visible members in __init__
Prefer docstrings for interfaces used outside a file; comments should be reserved for in-function or file-local interfaces
Use Google-style docstrings for classes and functions; attributes and variables may be documented inline with trailing string literals
Avoid reflection when simpler, explicit code suffices (e.g., avoid dict(**locals()) patterns)
In try/except, catch the narrowest exceptions possible
For duck-typing patterns, keep the try body minimal and move logic to else to avoid masking unrelated failures

Files:

  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py
**/*.{c,cc,cpp,cxx,h,hh,hpp,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA copyright header (current year) to all source files (.cpp, .h, .cu, .py, etc.)

Files:

  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py
🧬 Code graph analysis (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (1)
tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py (1)
  • _hf_model_dir_or_hub_id (266-273)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (10)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (10)

2-2: Top-level import of os is appropriate.


91-94: Good: prefer local model copy when available.

Passing --model_path conditionally makes tests resilient to hub/network hiccups.


487-489: LGTM: centralized TinyLlama resolution is correctly consumed.


502-503: LGTM: dataset now keyed by model_path_or_name to support local or hub tokenizers.


507-517: LGTM: autodeploy run passes model_path and captures report.


531-541: LGTM: PyTorch run mirrors autodeploy invocation and includes model_path.


595-597: LGTM: reuse helper for TinyLlama details in simple smoke test.


608-610: LGTM: call sites updated to new signatures.


1-1: Add mandatory NVIDIA copyright header.

Per project guidelines, prepend the current-year NVIDIA header to all .py files.

Apply this diff:

+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+
 import json
⛔ Skipped due to learnings
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

59-70: Docstring updated and run_benchmark call sites verified – no further changes required
The docstring now fully documents the new model_path parameter. A quick search confirmed all run_benchmark calls (e.g., in tests/scripts/perf-sanity/run_benchmark_serve.py, tests/unittest/_torch/auto_deploy/unit/singlegpu/utils/test_benchmark_mlp.py, and tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py) match the updated signature and require no modifications.

@tensorrt-cicd
Collaborator

PR_Github #16701 [ run ] triggered by Bot

@nzmora-nvidia
Collaborator Author

/bot kill

@nzmora-nvidia nzmora-nvidia force-pushed the user/nzmora/nvbug-5474453 branch from cb6f998 to 01a6584 Compare August 27, 2025 14:57
@nzmora-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #16705 [ kill ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #16701 [ run ] completed with state ABORTED

@tensorrt-cicd
Collaborator

PR_Github #16705 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit 01a6584

@tensorrt-cicd
Collaborator

PR_Github #16707 [ run ] triggered by Bot

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (2)

15-20: Centralized TinyLlama path resolution looks good.

Nice consolidation; reduces duplication and favors local cache when available. Minor: consider a small NamedTuple to self-document the triple.

Example:

from typing import NamedTuple

class TinyLlamaDetails(NamedTuple):
    model_path_or_name: str
    model_name: str
    model_path: str

59-69: Make model_path optional to preserve call-site flexibility.

New param is fine, but a default plus a guard avoids churn if reused elsewhere.

Apply this diff:

 def run_benchmark(
     model_name: str,
-    model_path: str,
+    model_path: str = "",
     dataset_path: str,
     temp_dir: str,

And later:

-    if os.path.isdir(model_path):
+    if model_path and os.path.isdir(model_path):
         cmd.extend(["--model_path", model_path])

Optional: use sys.executable consistently for subprocess invocations to avoid environment ambiguity (python vs python3).

Outside-range snippet:

import sys
# ...
cmd = [sys.executable, "-m", "tensorrt_llm.commands.bench", "--model", model_name]
# ...
command = [sys.executable, f"{dataset_tool}", "--stdout", "--tokenizer", model_path_or_name, ...]
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between cb6f998 and 01a6584.

📒 Files selected for processing (1)
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (11 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Code must target Python 3.8+
Indent Python code with 4 spaces; do not use tabs
Preserve module namespaces when importing; import modules/packages and access members via the module (e.g., from package.subpackage import foo; foo.SomeClass())
Python file names should be snake_case
Python class names should be PascalCase
Python functions/methods and local variables should be snake_case; variables beginning with a number should be prefixed with k_ (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE prefixed with G_ (e.g., G_MY_GLOBAL); constants should be UPPER_SNAKE_CASE
Avoid shadowing variables from outer scopes; initialize all externally visible members in __init__
Prefer docstrings for interfaces used outside a file; comments should be reserved for in-function or file-local interfaces
Use Google-style docstrings for classes and functions; attributes and variables may be documented inline with trailing string literals
Avoid reflection when simpler, explicit code suffices (e.g., avoid dict(**locals()) patterns)
In try/except, catch the narrowest exceptions possible
For duck-typing patterns, keep the try body minimal and move logic to else to avoid masking unrelated failures

Files:

  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py
**/*.{c,cc,cpp,cxx,h,hh,hpp,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA copyright header (current year) to all source files (.cpp, .h, .cu, .py, etc.)

Files:

  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py
🧬 Code graph analysis (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (1)
tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py (1)
  • _hf_model_dir_or_hub_id (266-273)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (6)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (6)

2-2: Good move: import at module scope.

Bringing os to module scope avoids repeat imports and supports the new isdir checks.


245-258: prepare_dataset now uses model_path_or_name for tokenizer — good.

This matches the _hf_model_dir_or_hub_id behavior and improves CI resilience.
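
For readers who have not seen the utility, a rough sketch of the selection behavior being relied on here; the real implementation lives in tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py, and this approximation only captures the prefer-local-else-hub idea.

import os

def _hf_model_dir_or_hub_id(model_dir: str, hub_id: str) -> str:
    # Prefer the local checkout when it exists; otherwise fall back to the HF Hub id.
    return model_dir if os.path.isdir(model_dir) else hub_id

# Example (paths are illustrative): resolves to the local directory on machines with the
# model cache mounted, and to the hub id everywhere else.
print(_hf_model_dir_or_hub_id("/models/llama-models-v2/TinyLlama-1.1B-Chat-v1.0",
                              "TinyLlama/TinyLlama-1.1B-Chat-v1.0"))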


490-490: LGTM: single source of truth for model identifiers in test.


599-599: LGTM: tests consume tiny_llama_details() consistently.


611-612: LGTM: call sites updated to new signatures.

Dataset uses model_path_or_name; benchmark receives model_path for local override.


1-3: Missing NVIDIA copyright header (2025).

Per repo guidelines, prepend the NVIDIA copyright header at the top of all source files.

Apply this diff:

+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+
 import json
 import os
 import re
⛔ Skipped due to learnings
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

@tensorrt-cicd
Collaborator

PR_Github #16707 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #12539 completed with status: 'FAILURE'

@nzmora-nvidia nzmora-nvidia force-pushed the user/nzmora/nvbug-5474453 branch from 01a6584 to 4020f72 Compare August 27, 2025 19:47
@nzmora-nvidia
Collaborator Author

/bot run

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (1)

139-155: Fix CLI invocation: place subcommand before options and avoid unsupported --model_path.

Click/Typer subcommand options must follow the subcommand. Also, trtllm-bench doesn’t support --model_path (see prior comment), and --model already accepts a local directory. Prefer passing the resolved local path via --model.

Apply this diff:

-    cmd = ["python", "-m", "tensorrt_llm.commands.bench", "--model", model_name]
-
-    # If the model exists locally, then using the local copy will make the test robust to CI network issues
-    if os.path.isdir(model_path):
-        cmd.extend(["--model_path", model_path])
-
-    cmd.extend(
-        [
-            "throughput",
-            "--backend",
-            backend,
-            "--dataset",
-            str(dataset_path),
-            "--max_batch_size",
-            str(max_batch_size),
-        ]
-    )
+    # Prefer local directory if it exists; otherwise use hub id
+    model_identifier = model_path if os.path.isdir(model_path) else model_name
+    cmd = [
+        sys.executable, "-m", "tensorrt_llm.commands.bench",
+        "throughput",
+        "--model", model_identifier,
+        "--backend", backend,
+        "--dataset", str(dataset_path),
+        "--max_batch_size", str(max_batch_size),
+    ]

Add missing import (outside this hunk):

import sys
🧹 Nitpick comments (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (1)

86-89: Harden KV-cache parsing: accept decimal MB and MiB; convert robustly.

Logs may emit fractional MB and “MiB”. Parse floats, normalize units, and store bytes for cache size.

Apply this diff:

-    patterns = {
-        "current_cache_size": r"Current cache size \(MB\):\s*(\d+)",
-        "free_mem_pre_mb": r"Free memory before forward pass \(MB\):\s*(\d+)",
-        "free_mem_post_mb": r"Free memory after forward pass \(MB\):\s*(\d+)",
-    }
+    patterns = {
+        "current_cache_size": r"Current cache size \((?:MB|MiB)\):\s*([\d.]+)",
+        "free_mem_pre_mb": r"Free memory before forward pass \(MB\):\s*([\d.]+)",
+        "free_mem_post_mb": r"Free memory after forward pass \(MB\):\s*([\d.]+)",
+    }
@@
-        if match:
-            value = int(match.group(1))
-            metrics[metric_name] = value
+        if match:
+            raw = match.group(1)
+            if metric_name == "current_cache_size":
+                metrics[metric_name] = float(raw)  # MB for now; convert to bytes below
+            else:
+                metrics[metric_name] = int(float(raw))  # keep MB as int
@@
-    try:
-        metrics["current_cache_size"] = metrics["current_cache_size"] * 1024 * 1024
+    try:
+        metrics["current_cache_size"] = int(metrics["current_cache_size"] * 1024 * 1024)
     except KeyError:
         print("  ❌ Could not find current_cache_size")

Also applies to: 101-105, 92-98

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 01a6584 and 4020f72.

📒 Files selected for processing (1)
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (14 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Code must target Python 3.8+
Indent Python code with 4 spaces; do not use tabs
Preserve module namespaces when importing; import modules/packages and access members via the module (e.g., from package.subpackage import foo; foo.SomeClass())
Python file names should be snake_case
Python class names should be PascalCase
Python functions/methods and local variables should be snake_case; variables beginning with a number should be prefixed with k_ (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE prefixed with G_ (e.g., G_MY_GLOBAL); constants should be UPPER_SNAKE_CASE
Avoid shadowing variables from outer scopes; initialize all externally visible members in __init__
Prefer docstrings for interfaces used outside a file; comments should be reserved for in-function or file-local interfaces
Use Google-style docstrings for classes and functions; attributes and variables may be documented inline with trailing string literals
Avoid reflection when simpler, explicit code suffices (e.g., avoid dict(**locals()) patterns)
In try/except, catch the narrowest exceptions possible
For duck-typing patterns, keep the try body minimal and move logic to else to avoid masking unrelated failures

Files:

  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py
**/*.{c,cc,cpp,cxx,h,hh,hpp,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA copyright header (current year) to all source files (.cpp, .h, .cu, .py, etc.)

Files:

  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py
🧬 Code graph analysis (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (1)
tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py (1)
  • _hf_model_dir_or_hub_id (266-273)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (5)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (5)

304-317: Dataset tokenizer now uses resolved path-or-name: LGTM.

Passing model_path_or_name improves robustness offline. No further action.


672-677: Centralized TinyLlama path/name resolution: LGTM.

Good consolidation; reduces duplication and brittleness.


803-817: Smoke test uses new helpers correctly, but depends on the CLI fix above.

Once the CLI construction is corrected (subcommand ordering and model arg), this should pass.


1-1: Add NVIDIA copyright header (2025).

All source files must include the NVIDIA header.

Apply at file top:

+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
⛔ Skipped due to learnings
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

124-132: All run_benchmark and run_multiple_benchmarks_with_outlier_removal call sites have been updated – no stale invocations detected.
No further changes are required.

@tensorrt-cicd
Collaborator

PR_Github #16743 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #16743 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #12566 completed with status: 'FAILURE'

@nzmora-nvidia nzmora-nvidia force-pushed the user/nzmora/nvbug-5474453 branch from 4020f72 to 5de41ff Compare August 28, 2025 07:21
@nzmora-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #16814 [ run ] triggered by Bot

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (2)

139-155: bench CLI likely doesn’t support --model_path; use --model with local path and sys.executable.

This flag caused failures previously; pass the local directory via --model instead. Also prefer sys.executable for venv correctness.

-    cmd = ["python", "-m", "tensorrt_llm.commands.bench", "--model", model_name]
-
-    # If the model exists locally, then using the local copy will make the test robust to CI network issues
-    if os.path.isdir(model_path):
-        cmd.extend(["--model_path", model_path])
-
-    cmd.extend(
+    local_model = model_path if os.path.isdir(model_path) else model_name
+    cmd = [sys.executable, "-m", "tensorrt_llm.commands.bench", "--model", local_model]
+    cmd.extend(
         [
             "throughput",
             "--backend",
             backend,
             "--dataset",
             str(dataset_path),
             "--max_batch_size",
             str(max_batch_size),
         ]
     )

Add missing import:

+import sys

139-155: Remove unsupported --model_path flag from test
The tensorrt_llm.commands.bench throughput CLI does not support --model_path, so drop the conditional extension in tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (139–155).

🧹 Nitpick comments (2)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (2)

73-78: Prefer Path joins for portability in tiny_llama_details.

Avoid manual string concatenation; keep path ops OS-agnostic.

-def tiny_llama_details():
-    model_path = f"{llm_models_root()}/llama-models-v2/TinyLlama-1.1B-Chat-v1.0"
-    model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
-    model_path_or_name = _hf_model_dir_or_hub_id(model_path, model_name)
-    return model_path_or_name, model_name, model_path
+def tiny_llama_details():
+    local_dir = Path(llm_models_root()) / "llama-models-v2" / "TinyLlama-1.1B-Chat-v1.0"
+    model_path = str(local_dir)
+    model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
+    model_path_or_name = _hf_model_dir_or_hub_id(model_path, model_name)
+    return model_path_or_name, model_name, model_path

101-105: KV cache size parsing: add bytes-format fallback to prevent missing metric.

If logs already output bytes (no “(MB)”), the current logic never sets current_cache_size; add a robust fallback.

-    try:
-        metrics["current_cache_size"] = metrics["current_cache_size"] * 1024 * 1024
-    except KeyError:
-        print("  ❌ Could not find current_cache_size")
+    # Normalize current_cache_size to bytes. Prefer MB key; fallback to bytes-only line.
+    if "current_cache_size" in metrics:
+        metrics["current_cache_size"] *= 1024 * 1024
+    else:
+        m = re.search(r"Current cache size:\s*(\d+)", log_output, re.IGNORECASE)
+        if m:
+            metrics["current_cache_size"] = int(m.group(1))
+            print("  ✅ Found current_cache_size (bytes).")
+        else:
+            print("  ❌ Could not find current_cache_size")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4020f72 and 5de41ff.

📒 Files selected for processing (1)
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (14 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py
🧬 Code graph analysis (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (1)
tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py (1)
  • _hf_model_dir_or_hub_id (266-273)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (5)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (5)

304-317: Tokenizer source generalized to local-or-hub.

Passing model_path_or_name to prepare_dataset improves robustness when local cache exists. Looks good.


527-536: Propagating model_path through the averaging runner is consistent.

Matches the updated run_benchmark signature and keeps behavior identical otherwise.

Also applies to: 558-566


672-673: Centralized TinyLlama resolution is cleaner and reduces duplication.

Using tiny_llama_details() in both the unified comparison and the smoke test is a good consolidation.

Also applies to: 687-694, 717-718


803-816: Smoke test updated to new API and centralized paths.

Invocation aligns with the new run_benchmark signature and dataset prep.


124-132: All run_benchmark call sites updated. Verified that every invocation now supplies the new model_path parameter.

@tensorrt-cicd
Collaborator

PR_Github #16814 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #12622 completed with status: 'FAILURE'

@QiJune
Collaborator

QiJune commented Aug 28, 2025

/bot skip --comment "test_ad_trtllm_bench passed"

@QiJune QiJune enabled auto-merge (squash) August 28, 2025 11:40
@tensorrt-cicd
Collaborator

PR_Github #16854 [ skip ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #16854 [ skip ] completed with state SUCCESS
Skipping testing for commit 5de41ff

@QiJune QiJune merged commit 08f9356 into NVIDIA:main Aug 28, 2025
7 checks passed

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants