[https://nvbugs/5474453][fix] fix path to tested model #7272
Conversation
📝 Walkthrough
Centralized TinyLlama path/name resolution was added and tests updated to prefer local model directories.
Sequence Diagram(s)
sequenceDiagram
participant Test as Unit Test
participant Helper as tiny_llama_details()
participant Runner as run_benchmark()
participant Bench as trtllm-bench
Test->>Helper: request model_path_or_name, model_name, model_path
Helper-->>Test: return details
Test->>Runner: call(model_name, model_path, dataset_path, temp_dir, ...)
alt model_path is local directory
Runner->>Bench: python -m tensorrt_llm.commands.bench --model <model_name> --model_path <model_path> --dataset <dataset_path> ...
else remote hub id used
Runner->>Bench: python -m tensorrt_llm.commands.bench --model <model_name> --dataset <dataset_path> ...
end
Bench-->>Runner: benchmark output (stdout/json)
Runner-->>Test: parsed results (including KV-cache converted to bytes)
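To make the alt/else branches above concrete, here is a simplified sketch of how the benchmark command could be assembled. It is based on the test code quoted in the reviews below; the signature is trimmed (temp_dir and output parsing omitted), and note that a later review questions whether trtllm-bench actually accepts --model_path.

import os
import subprocess

def run_benchmark(model_name, model_path, dataset_path, backend, max_batch_size=32):
    # Start from the hub id; prefer a local checkout when one exists so CI does
    # not depend on network access to the Hugging Face hub.
    cmd = ["python", "-m", "tensorrt_llm.commands.bench", "--model", model_name]
    if os.path.isdir(model_path):
        cmd.extend(["--model_path", model_path])
    cmd.extend(
        ["throughput", "--backend", backend,
         "--dataset", str(dataset_path), "--max_batch_size", str(max_batch_size)]
    )
    return subprocess.run(cmd, capture_output=True, text=True)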
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (1)
1-6: Add the NVIDIA copyright header and import sys. Compliance: all .py files should carry the current-year NVIDIA header. Also, importing sys here will enable using sys.executable in subprocess calls below.
Apply this diff at the file top:
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+
 import json
 import re
 import subprocess
 import tempfile
+import sys
 from pathlib import Path
🧹 Nitpick comments (3)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (3)
476-479: DRY up model path/hub-id and prefer Path joins over hard-coded slashes. Avoid duplicate literals and OS-specific separators by centralizing the TinyLlama path and hub ID.
Apply this minimal diff at both call sites:
- f"{llm_models_root()}/llama-models-v2/TinyLlama-1.1B-Chat-v1.0", "TinyLlama/TinyLlama-1.1B-Chat-v1.0" + TINYLLAMA_LOCAL_DIR, TINYLLAMA_HUB_IDAdd these constants near the imports (outside the selected ranges):
# TinyLlama test model references TINYLLAMA_LOCAL_DIR = str(Path(llm_models_root(), "llama-models-v2", "TinyLlama-1.1B-Chat-v1.0")) TINYLLAMA_HUB_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"Also applies to: 585-587
66-70: Use sys.executable to ensure the same Python interpreter is used in subprocesses. Invoking "python" vs "python3" can diverge from the active interpreter and virtualenv. Use sys.executable for reproducibility in CI.
Apply these diffs:
- Bench command launcher:
-    cmd = [
-        "python",
+    cmd = [
+        sys.executable,
         "-m",
         "tensorrt_llm.commands.bench",
- Dataset preparation script:
-    command = [
-        "python3",
+    command = [
+        sys.executable,
         f"{dataset_tool}",
         "--stdout",

Note: This assumes the earlier import of sys at the file top.
Also applies to: 239-255
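A quick aside on why this matters: sys.executable is the absolute path of the interpreter running the test process, so subprocesses launched with it stay in the same virtualenv, whereas a bare "python" or "python3" resolves through PATH and may pick a different installation. A minimal standalone check:

import subprocess
import sys

# The child process started via sys.executable reports the same environment
# prefix as the parent, which is what the reviewer is after here.
print("interpreter:", sys.executable)
subprocess.run(
    [sys.executable, "-c", "import sys; print('child prefix:', sys.prefix)"],
    check=True,
)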
404-414: require_metrics=False still fails the test — behavior contradicts the docstring. The function says “If False, just warn,” but both branches still assert. This turns a soft check into a hard failure and can create unnecessary flakiness when metrics aren’t produced.
Apply this diff to make the parameter effective:
-    if require_metrics:
-        assert False, f"REQUIRED {message}"
-    else:
-        print(f"ℹ️ {message}")
-        assert False, "KV cache metrics are missing"
+    if require_metrics:
+        assert False, f"REQUIRED {message}"
+    else:
+        print(f"ℹ️ {message}")
+        return None, None
@@
-    if require_metrics:
-        assert False, message
-    else:
-        print(f"ℹ️ KV cache validation skipped - {message}")
-        assert False, "KV cache metrics are missing"
+    if require_metrics:
+        assert False, message
+    else:
+        print(f"ℹ️ KV cache validation skipped - {message}")
+        return None, None

Callers that set require_metrics=False should handle a (None, None) return.
Also applies to: 424-429
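For illustration, a caller that opts into the soft check could then tolerate the (None, None) result. The helper below is a stub standing in for the validation function discussed above, kept only so the snippet runs on its own:

# Hypothetical caller-side handling of require_metrics=False.
def validate_kv_cache_metrics(log_output, require_metrics=True):
    # Stub: pretend no KV-cache metrics were found in the log.
    return None, None

cache_size_bytes, free_mem_delta = validate_kv_cache_metrics("", require_metrics=False)
if cache_size_bytes is None:
    print("KV cache metrics unavailable; skipping size assertions for this run.")
else:
    assert cache_size_bytes > 0, "KV cache size should be positive when reported"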
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Code must target Python 3.8+
Indent Python code with 4 spaces; do not use tabs
Preserve module namespaces when importing; import modules/packages and access members via the module (e.g., from package.subpackage import foo; foo.SomeClass())
Python file names should be snake_case
Python class names should be PascalCase
Python functions/methods and local variables should be snake_case; variables beginning with a number should be prefixed with k_ (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE prefixed with G_ (e.g., G_MY_GLOBAL); constants should be UPPER_SNAKE_CASE
Avoid shadowing variables from outer scopes; initialize all externally visible members in __init__
Prefer docstrings for interfaces used outside a file; comments should be reserved for in-function or file-local interfaces
Use Google-style docstrings for classes and functions; attributes and variables may be documented inline with trailing string literals
Avoid reflection when simpler, explicit code suffices (e.g., avoid dict(**locals()) patterns)
In try/except, catch the narrowest exceptions possible
For duck-typing patterns, keep the try body minimal and move logic to else to avoid masking unrelated failures
Files:
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py
**/*.{c,cc,cpp,cxx,h,hh,hpp,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend the NVIDIA copyright header (current year) to all source files (.cpp, .h, .cu, .py, etc.)
Files:
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (3)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (3)
476-479: Path fix to TinyLlama local dir looks correct and consistent. Both the local path prefix and the unchanged HF hub ID align with the intent of NVBugs 5474453. No logic or API impact.
585-587: Second call site updated consistently. The same local dir change is applied here; hub ID alias remains stable. Good consistency across tests.
476-479: No remaining direct references to the old TinyLlama local path. I ran the recommended grep commands and confirmed:
- No occurrences of the legacy llm_models_root()/TinyLlama-1.1B-Chat-v1.0 path remain in the codebase.
- All remaining matches for TinyLlama-1.1B-Chat-v1.0 are part of the HF Hub identifier string ("TinyLlama/TinyLlama-1.1B-Chat-v1.0") and should not be modified.
This sweep shows the old local-directory references have been fully updated.
cedf44c to f148008
39d1eb6 to 4a1e7f4
/bot run
PR_Github #16651 [ run ] triggered by Bot
PR_Github #16651 [ run ] completed with state
5c7e625 to cb6f998
/bot run
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (2)
15-19: Centralize path building with Path and document return order. Using Path ensures portability; add a short docstring to avoid tuple-order mistakes.
Apply this diff:
-def tiny_llama_details():
-    model_path = f"{llm_models_root()}/llama-models-v2/TinyLlama-1.1B-Chat-v1.0"
-    model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
-    model_path_or_name = _hf_model_dir_or_hub_id(model_path, model_name)
-    return model_path_or_name, model_name, model_path
+def tiny_llama_details():
+    """Return (model_path_or_name, model_name, model_path) for TinyLlama."""
+    model_path = str(Path(llm_models_root()) / "llama-models-v2" / "TinyLlama-1.1B-Chat-v1.0")
+    model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
+    model_path_or_name = _hf_model_dir_or_hub_id(model_path, model_name)
+    return model_path_or_name, model_name, model_path
242-255: Use the active interpreter for dataset generation for portability. Replace hardcoded "python3" with sys.executable to avoid env mismatches.
Apply this diff:
-    command = [
-        "python3",
+    command = [
+        sys.executable,
         f"{dataset_tool}",
         "--stdout",
         "--tokenizer",
         model_path_or_name,

Also add at the top of the file (outside this hunk):
import sys
📒 Files selected for processing (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (1)
tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py (1)
_hf_model_dir_or_hub_id(266-273)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (10)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (10)
2-2: Top-level import of os is appropriate.
91-94: Good: prefer local model copy when available. Passing --model_path conditionally makes tests resilient to hub/network hiccups.
487-489: LGTM: centralized TinyLlama resolution is correctly consumed.
502-503: LGTM: dataset now keyed by model_path_or_name to support local or hub tokenizers.
507-517: LGTM: autodeploy run passes model_path and captures report.
531-541: LGTM: PyTorch run mirrors autodeploy invocation and includes model_path.
595-597: LGTM: reuse helper for TinyLlama details in simple smoke test.
608-610: LGTM: call sites updated to new signatures.
1-1: Add mandatory NVIDIA copyright header. Per project guidelines, prepend the current-year NVIDIA header to all .py files.
Apply this diff:
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+
 import json

⛔ Skipped due to learnings

Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
59-70: Docstring updated and run_benchmark call sites verified – no further changes required
The docstring now fully documents the new model_path parameter. A quick search confirmed all run_benchmark calls (e.g., in tests/scripts/perf-sanity/run_benchmark_serve.py, tests/unittest/_torch/auto_deploy/unit/singlegpu/utils/test_benchmark_mlp.py, and tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py) match the updated signature and require no modifications.
PR_Github #16701 [ run ] triggered by Bot
/bot kill
cb6f998 to 01a6584
/bot run
PR_Github #16705 [ kill ] triggered by Bot
PR_Github #16701 [ run ] completed with state
PR_Github #16705 [ kill ] completed with state
PR_Github #16707 [ run ] triggered by Bot
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (2)
15-20: Centralized TinyLlama path resolution looks good. Nice consolidation; reduces duplication and favors local cache when available. Minor: consider a small NamedTuple to self-document the triple.
Example:
from typing import NamedTuple

class TinyLlamaDetails(NamedTuple):
    model_path_or_name: str
    model_name: str
    model_path: str
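A self-contained illustration of how the NamedTuple keeps tuple compatibility while naming the fields (the paths below are placeholders, not the real test values):

from typing import NamedTuple

class TinyLlamaDetails(NamedTuple):
    model_path_or_name: str
    model_name: str
    model_path: str

def tiny_llama_details() -> TinyLlamaDetails:
    # Placeholder values; the real helper resolves a local cache directory first.
    return TinyLlamaDetails(
        model_path_or_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        model_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        model_path="/models/llama-models-v2/TinyLlama-1.1B-Chat-v1.0",
    )

details = tiny_llama_details()
print(details.model_name)            # fields are self-documenting at call sites
path_or_name, name, path = details   # still unpacks like the original tuple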
59-69: Make model_path optional to preserve call-site flexibility. New param is fine, but a default plus a guard avoids churn if reused elsewhere.
Apply this diff:
 def run_benchmark(
     model_name: str,
-    model_path: str,
+    model_path: str = "",
     dataset_path: str,
     temp_dir: str,

And later:

-    if os.path.isdir(model_path):
+    if model_path and os.path.isdir(model_path):
         cmd.extend(["--model_path", model_path])

Optional: use sys.executable consistently for subprocess invocations to avoid environment ambiguity (python vs python3).
Outside-range snippet:
import sys
# ...
cmd = [sys.executable, "-m", "tensorrt_llm.commands.bench", "--model", model_name]
# ...
command = [sys.executable, f"{dataset_tool}", "--stdout", "--tokenizer", model_path_or_name, ...]
📒 Files selected for processing (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (1)
tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py (1)
_hf_model_dir_or_hub_id(266-273)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (6)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (6)
2-2: Good move: import at module scope. Bringing os to module scope avoids repeat imports and supports the new isdir checks.
245-258: prepare_dataset now uses model_path_or_name for tokenizer — good. This matches the _hf_model_dir_or_hub_id behavior and improves CI resilience.
490-490: LGTM: single source of truth for model identifiers in test.
599-599: LGTM: tests consume tiny_llama_details() consistently.
611-612: LGTM: call sites updated to new signatures. Dataset uses model_path_or_name; benchmark receives model_path for local override.
1-3: Missing NVIDIA copyright header (2025). Per repo guidelines, prepend the NVIDIA copyright header at the top of all source files.
Apply this diff:
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+
 import json
 import os
 import re

⛔ Skipped due to learnings

Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
PR_Github #16707 [ run ] completed with state
01a6584 to 4020f72
/bot run
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (1)
139-155: Fix CLI invocation: place subcommand before options and avoid unsupported --model_path. Click/Typer subcommand options must follow the subcommand. Also, trtllm-bench doesn’t support --model_path (see prior comment), and --model already accepts a local directory. Prefer passing the resolved local path via --model.
Apply this diff:
-    cmd = ["python", "-m", "tensorrt_llm.commands.bench", "--model", model_name]
-
-    # If the model exists locally, then using the local copy will make the test robust to CI network issues
-    if os.path.isdir(model_path):
-        cmd.extend(["--model_path", model_path])
-
-    cmd.extend(
-        [
-            "throughput",
-            "--backend",
-            backend,
-            "--dataset",
-            str(dataset_path),
-            "--max_batch_size",
-            str(max_batch_size),
-        ]
-    )
+    # Prefer local directory if it exists; otherwise use hub id
+    model_identifier = model_path if os.path.isdir(model_path) else model_name
+    cmd = [
+        sys.executable, "-m", "tensorrt_llm.commands.bench",
+        "throughput",
+        "--model", model_identifier,
+        "--backend", backend,
+        "--dataset", str(dataset_path),
+        "--max_batch_size", str(max_batch_size),
+    ]

Add missing import (outside this hunk):
import sys
🧹 Nitpick comments (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (1)
86-89: Harden KV-cache parsing: accept decimal MB and MiB; convert robustly. Logs may emit fractional MB and “MiB”. Parse floats, normalize units, and store bytes for cache size.
Apply this diff:
-    patterns = {
-        "current_cache_size": r"Current cache size \(MB\):\s*(\d+)",
-        "free_mem_pre_mb": r"Free memory before forward pass \(MB\):\s*(\d+)",
-        "free_mem_post_mb": r"Free memory after forward pass \(MB\):\s*(\d+)",
-    }
+    patterns = {
+        "current_cache_size": r"Current cache size \((?:MB|MiB)\):\s*([\d.]+)",
+        "free_mem_pre_mb": r"Free memory before forward pass \(MB\):\s*([\d.]+)",
+        "free_mem_post_mb": r"Free memory after forward pass \(MB\):\s*([\d.]+)",
+    }
@@
-    if match:
-        value = int(match.group(1))
-        metrics[metric_name] = value
+    if match:
+        raw = match.group(1)
+        if metric_name == "current_cache_size":
+            metrics[metric_name] = float(raw)  # MB for now; convert to bytes below
+        else:
+            metrics[metric_name] = int(float(raw))  # keep MB as int
@@
-    try:
-        metrics["current_cache_size"] = metrics["current_cache_size"] * 1024 * 1024
+    try:
+        metrics["current_cache_size"] = int(metrics["current_cache_size"] * 1024 * 1024)
     except KeyError:
         print("  ❌ Could not find current_cache_size")

Also applies to: 101-105, 92-98
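To see the proposed patterns behave as intended, here is a small standalone check; the log lines are synthetic, shaped to match the formats the regexes expect:

import re

log_output = """
Current cache size (MiB): 512.5
Free memory before forward pass (MB): 40960
Free memory after forward pass (MB): 40321
"""

patterns = {
    "current_cache_size": r"Current cache size \((?:MB|MiB)\):\s*([\d.]+)",
    "free_mem_pre_mb": r"Free memory before forward pass \(MB\):\s*([\d.]+)",
    "free_mem_post_mb": r"Free memory after forward pass \(MB\):\s*([\d.]+)",
}

metrics = {}
for name, pattern in patterns.items():
    match = re.search(pattern, log_output)
    if match:
        raw = match.group(1)
        metrics[name] = float(raw) if name == "current_cache_size" else int(float(raw))

# The test stores the cache size in bytes downstream, so convert MB to bytes.
metrics["current_cache_size"] = int(metrics["current_cache_size"] * 1024 * 1024)
print(metrics)  # {'current_cache_size': 537395200, 'free_mem_pre_mb': 40960, 'free_mem_post_mb': 40321}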
📒 Files selected for processing (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (14 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (1)
tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py (1)
_hf_model_dir_or_hub_id(266-273)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (5)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (5)
304-317: Dataset tokenizer now uses resolved path-or-name: LGTM. Passing model_path_or_name improves robustness offline. No further action.
672-677: Centralized TinyLlama path/name resolution: LGTM. Good consolidation; reduces duplication and brittleness.
803-817: Smoke test uses new helpers correctly, but depends on the CLI fix above. Once the CLI construction is corrected (subcommand ordering and model arg), this should pass.
1-1: Add NVIDIA copyright header (2025). All source files must include the NVIDIA header.
Apply at file top:
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

⛔ Skipped due to learnings

Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
124-132: All run_benchmark and run_multiple_benchmarks_with_outlier_removal call sites have been updated – no stale invocations detected.
No further changes are required.
PR_Github #16743 [ run ] triggered by Bot
PR_Github #16743 [ run ] completed with state
Signed-off-by: Neta Zmora <[email protected]>
4020f72 to 5de41ff
/bot run
PR_Github #16814 [ run ] triggered by Bot
Actionable comments posted: 0
♻️ Duplicate comments (2)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (2)
139-155: bench CLI likely doesn’t support --model_path; use --model with local path and sys.executable. This flag caused failures previously; pass the local directory via --model instead. Also prefer sys.executable for venv correctness.
-    cmd = ["python", "-m", "tensorrt_llm.commands.bench", "--model", model_name]
-
-    # If the model exists locally, then using the local copy will make the test robust to CI network issues
-    if os.path.isdir(model_path):
-        cmd.extend(["--model_path", model_path])
-
-    cmd.extend(
+    local_model = model_path if os.path.isdir(model_path) else model_name
+    cmd = [sys.executable, "-m", "tensorrt_llm.commands.bench", "--model", local_model]
+    cmd.extend(
         [
             "throughput",
             "--backend",
             backend,
             "--dataset",
             str(dataset_path),
             "--max_batch_size",
             str(max_batch_size),
         ]
     )

Add missing import:
+import sys
139-155: Remove unsupported --model_path flag from test
CLI tensorrt_llm.commands.bench throughput does not support --model_path, so drop the conditional extension
in tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (139–155).
🧹 Nitpick comments (2)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (2)
73-78: Prefer Path joins for portability in tiny_llama_details. Avoid manual string concatenation; keep path ops OS-agnostic.
-def tiny_llama_details():
-    model_path = f"{llm_models_root()}/llama-models-v2/TinyLlama-1.1B-Chat-v1.0"
-    model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
-    model_path_or_name = _hf_model_dir_or_hub_id(model_path, model_name)
-    return model_path_or_name, model_name, model_path
+def tiny_llama_details():
+    local_dir = Path(llm_models_root()) / "llama-models-v2" / "TinyLlama-1.1B-Chat-v1.0"
+    model_path = str(local_dir)
+    model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
+    model_path_or_name = _hf_model_dir_or_hub_id(model_path, model_name)
+    return model_path_or_name, model_name, model_path

101-105: KV cache size parsing: add bytes-format fallback to prevent missing metric. If logs already output bytes (no “(MB)”), the current logic never sets current_cache_size; add a robust fallback.
-    try:
-        metrics["current_cache_size"] = metrics["current_cache_size"] * 1024 * 1024
-    except KeyError:
-        print("  ❌ Could not find current_cache_size")
+    # Normalize current_cache_size to bytes. Prefer MB key; fallback to bytes-only line.
+    if "current_cache_size" in metrics:
+        metrics["current_cache_size"] *= 1024 * 1024
+    else:
+        m = re.search(r"Current cache size:\s*(\d+)", log_output, re.IGNORECASE)
+        if m:
+            metrics["current_cache_size"] = int(m.group(1))
+            print("  ✅ Found current_cache_size (bytes).")
+        else:
+            print("  ❌ Could not find current_cache_size")
📒 Files selected for processing (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (14 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (1)
tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py (1)
_hf_model_dir_or_hub_id(266-273)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (5)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py (5)
304-317: Tokenizer source generalized to local-or-hub. Passing model_path_or_name to prepare_dataset improves robustness when local cache exists. Looks good.
527-536: Propagating model_path through the averaging runner is consistent. Matches the updated run_benchmark signature and keeps behavior identical otherwise.
Also applies to: 558-566
672-673: Centralized TinyLlama resolution is cleaner and reduces duplication. Using tiny_llama_details() in both the unified comparison and the smoke test is a good consolidation.
Also applies to: 687-694, 717-718
803-816: Smoke test updated to new API and centralized paths. Invocation aligns with the new run_benchmark signature and dataset prep.
122-132: All run_benchmark call sites updated. Verified that every invocation now supplies the new model_path parameter.
PR_Github #16814 [ run ] completed with state
/bot skip --comment "test_ad_trtllm_bench passed"
PR_Github #16854 [ skip ] triggered by Bot
PR_Github #16854 [ skip ] completed with state
Summary by CodeRabbit
Description
Test Coverage
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.
--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.
For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.
kill
Kill all running builds associated with pull request.
skip
skip --comment COMMENT
Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.