[None][chore] Print device info in trtllm-bench report #8584
Conversation
📝 Walkthrough

Add optional GPU querying to reporting.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Reporter as report_statistics
    participant Stats as get_statistics_dict
    participant Query as _query_gpu_info
    participant Env as CUDA_VISIBLE_DEVICES / os
    participant Torch as torch (optional)
    participant NVML as pynvml (optional)
    Reporter->>Stats: request statistics
    Stats->>Query: query machine/gpu info
    Query->>Env: read CUDA_VISIBLE_DEVICES (select GPU index)
    alt torch available
        Query->>Torch: query device count / properties
    else
        Query-->>Query: skip torch checks
    end
    alt pynvml available
        Query->>NVML: init & read memory clocks
        NVML-->>Query: memory clock value
    else
        Query-->>Query: no NVML, skip clocks
    end
    Query-->>Stats: return machine_info (or None)
    Stats-->>Reporter: include machine_info in statistics dict
    Reporter-->>Reporter: format and append machine block to log output
```
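The GPU-index selection step in the diagram (reading `CUDA_VISIBLE_DEVICES`) can be sketched in plain Python. The function name and the injectable `env` parameter here are hypothetical, chosen so the sketch is testable without a GPU:

```python
import os

def first_visible_gpu_index(env=None):
    """Return the physical index of the first GPU visible to the process.

    Mirrors the CUDA_VISIBLE_DEVICES parsing step from the diagram: the
    first entry of the comma-separated list wins; an unset or empty
    variable falls back to device 0.
    """
    env = os.environ if env is None else env
    cuda_visible = env.get("CUDA_VISIBLE_DEVICES", "").strip()
    return int(cuda_visible.split(",")[0].strip()) if cuda_visible else 0

print(first_visible_gpu_index({"CUDA_VISIBLE_DEVICES": "3,1"}))  # 3
print(first_visible_gpu_index({}))                               # 0
```

Note that this (like the PR code) assumes integer entries; `CUDA_VISIBLE_DEVICES` can also hold GPU UUIDs, which would raise `ValueError` here.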
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tensorrt_llm/bench/dataclasses/reporting.py (1)
1-22: Add the required NVIDIA Apache-2.0 copyright header with the current year (2025) to the top of the file.

Per coding guidelines, all Python source files must include the NVIDIA Apache-2.0 copyright header with the current year at the top. The file currently starts directly with imports, missing this required header.

Add the following header as the first lines of the file:

```python
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tensorrt_llm/bench/dataclasses/reporting.py (9 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use only spaces, no tabs; indent with 4 spaces.
Files:
tensorrt_llm/bench/dataclasses/reporting.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.
Files:
tensorrt_llm/bench/dataclasses/reporting.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).
Files:
tensorrt_llm/bench/dataclasses/reporting.py
🧬 Code graph analysis (1)
tensorrt_llm/bench/dataclasses/reporting.py (2)
tensorrt_llm/profiler.py (1)
PyNVMLContext (113-120)
tensorrt_llm/bench/dataclasses/configuration.py (1)
world_size (194-195)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (3)
tensorrt_llm/bench/dataclasses/reporting.py (3)
206-263: LGTM! Robust error handling for GPU querying.

The implementation correctly:
- Handles missing pynvml gracefully
- Uses PyNVMLContext for proper NVML lifecycle management
- Implements per-GPU error handling to avoid failing the entire query if one GPU is inaccessible
- Handles byte string decoding for GPU names
- Returns structured data with clear error notes
753-761: LGTM! Machine info properly integrated into the report.

The machine information is correctly added to the logging output in the appropriate position.
65-65: LGTM! Documentation improvements.

The updated comments and docstrings improve clarity.
Also applies to: 123-124
Hi @galagam, using the torch.cuda API will simplify the code.
d901997 to c721f03
d0743c7 to bb3e851
@coderabbitai full review
✅ Actions performed
Full review triggered.
Actionable comments posted: 0
🧹 Nitpick comments (3)
tensorrt_llm/bench/dataclasses/reporting.py (3)
209-236: Consider adding debug logging for GPU query failures.

The method silently returns None on exceptions, which may complicate debugging when GPU information is unavailable. While the broad exception handling is acceptable for non-critical functionality, adding a debug-level log message would help operators understand why machine details are missing.

Apply this diff to add logging:

```diff
+from tensorrt_llm.logger import logger
+
 @staticmethod
 def _query_gpu_info() -> Dict[str, Any]:
     """Query first GPU info (all GPUs must be identical for TRT-LLM)."""
     if not torch.cuda.is_available():
+        logger.debug("CUDA not available, skipping GPU info query")
         return None
     try:
         cuda_visible = os.environ.get("CUDA_VISIBLE_DEVICES", "").strip()
         physical_idx = int(
             cuda_visible.split(",")[0].strip()) if cuda_visible else 0
         props = torch.cuda.get_device_properties(physical_idx)
         gpu_info = {
             "name": getattr(props, "name", "Unknown"),
             "memory.total": float(getattr(props, "total_memory", 0.0)) /
             (1024.0**3),
             "clocks.mem": None,
         }
         if pynvml:
             # Memory clock information is not reported by torch, using NVML instead
             handle = pynvml.nvmlDeviceGetHandleByIndex(physical_idx)
             gpu_info["clocks.mem"] = pynvml.nvmlDeviceGetMaxClockInfo(
                 handle, pynvml.NVML_CLOCK_MEM) / 1000.0
         return gpu_info
     except (RuntimeError, AssertionError):
+        logger.debug("Failed to query GPU info", exc_info=True)
         return None
```
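The `memory.total` conversion in the diff above divides the byte count reported by torch by 1024³ to get GiB. As a standalone sketch (the helper name is ours, not from the PR):

```python
def bytes_to_gib(n_bytes):
    # Same conversion as in the diff: bytes -> GiB via three powers of 1024
    return float(n_bytes) / (1024.0 ** 3)

# An 80 GiB device reports 80 * 1024**3 bytes of total memory
print(bytes_to_gib(80 * 1024 ** 3))  # 80.0
```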
229-236: Optional: Restructure exception handling per static analysis hint.

The static analysis tool suggests moving the success-path return statement into an else block to make the control flow more explicit. This is a stylistic preference that can improve code clarity.

Apply this diff:

```diff
         if pynvml:
             # Memory clock information is not reported by torch, using NVML instead
             handle = pynvml.nvmlDeviceGetHandleByIndex(physical_idx)
             gpu_info["clocks.mem"] = pynvml.nvmlDeviceGetMaxClockInfo(
                 handle, pynvml.NVML_CLOCK_MEM) / 1000.0
-        return gpu_info
-    except (RuntimeError, AssertionError):
-        return None
+    except (RuntimeError, AssertionError):
+        return None
+    else:
+        return gpu_info
```
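The TRY300 pattern the diff above applies, shown on a trivial standalone function (names are ours, purely illustrative): keep the try body minimal, return from except on failure, and move the success-path return into else:

```python
def parse_port(raw):
    """Parse a port number, returning None on malformed input (TRY300 style)."""
    try:
        port = int(raw)   # keep the try body minimal
    except ValueError:
        return None       # failure path handled in except
    else:
        return port       # success-path return lives in else

print(parse_port("8080"))  # 8080
print(parse_port("oops"))  # None
```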
211-211: Consider adding validation or enhancing documentation for the GPU homogeneity assumption.

The docstring documents that "all GPUs must be identical for TRT-LLM," but the code only queries the first GPU without validating that other GPUs match. For heterogeneous setups, consider either:
- Enhancing the docstring to explicitly warn this is a hard requirement (with implications if violated), or
- Adding a lightweight check that validates all visible GPUs match and logs a warning if they don't
No GPU homogeneity validation was found elsewhere in the codebase, so this reporting method is the only place documenting the assumption to users.
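A lightweight homogeneity check along the lines suggested above might look like this. It operates on already-queried device names so it stays independent of torch; the function name and return shape are assumptions, not code from the PR:

```python
def check_gpu_homogeneity(device_names):
    """Return (is_homogeneous, distinct_names) for a list of GPU name strings.

    The caller would gather device_names from the device-properties query
    and log a warning when is_homogeneous is False.
    """
    distinct = sorted(set(device_names))
    return len(distinct) <= 1, distinct

ok, kinds = check_gpu_homogeneity(["NVIDIA H100", "NVIDIA H100"])
print(ok)     # True
ok, kinds = check_gpu_homogeneity(["NVIDIA H100", "NVIDIA A100"])
print(kinds)  # ['NVIDIA A100', 'NVIDIA H100']
```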
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tensorrt_llm/bench/dataclasses/reporting.py (6 hunks)
🧠 Learnings (6)
📚 Learning: 2025-09-23T14:58:05.372Z
Learnt from: nv-lschneider
Repo: NVIDIA/TensorRT-LLM PR: 7910
File: cpp/tensorrt_llm/kernels/nccl_device/config.cu:42-49
Timestamp: 2025-09-23T14:58:05.372Z
Learning: In TensorRT-LLM NCCL device kernels (cpp/tensorrt_llm/kernels/nccl_device/), the token partitioning intentionally uses ceil-like distribution (same token_per_rank for all ranks) to ensure all ranks launch the same number of blocks. This is required for optimal NCCL device API barrier performance, even though it may launch extra blocks for non-existent tokens on later ranks. Runtime bounds checking in the kernel (blockID validation) handles the overshoot cases.
Applied to files:
tensorrt_llm/bench/dataclasses/reporting.py
📚 Learning: 2025-10-13T19:45:03.518Z
Learnt from: nv-lschneider
Repo: NVIDIA/TensorRT-LLM PR: 7910
File: tests/unittest/_torch/multi_gpu/test_nccl_device.py:138-149
Timestamp: 2025-10-13T19:45:03.518Z
Learning: In test_nccl_device.py, the NCCL device AllReduce implementation compares the entire residual tensor on each rank, unlike the UB implementation which compares per-rank chunks. The residual chunking calculations in the test are intentionally overridden to reflect this design difference.
Applied to files:
tensorrt_llm/bench/dataclasses/reporting.py
📚 Learning: 2025-08-28T10:22:02.288Z
Learnt from: ixlmar
Repo: NVIDIA/TensorRT-LLM PR: 7294
File: tensorrt_llm/_torch/pyexecutor/sampler.py:1191-1197
Timestamp: 2025-08-28T10:22:02.288Z
Learning: In tensorrt_llm/_torch/pyexecutor/sampler.py, the object identity comparison `softmax_req_indices is not group_req_indices_cuda` on line ~1191 is intentional and used as an optimization to determine whether to reuse an existing indexer or create a new one, based on which code path was taken during tensor assignment.
Applied to files:
tensorrt_llm/bench/dataclasses/reporting.py
📚 Learning: 2025-07-17T09:01:27.402Z
Learnt from: amitz-nv
Repo: NVIDIA/TensorRT-LLM PR: 5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
Applied to files:
tensorrt_llm/bench/dataclasses/reporting.py
📚 Learning: 2025-08-26T09:49:04.956Z
Learnt from: pengbowang-nv
Repo: NVIDIA/TensorRT-LLM PR: 7192
File: tests/integration/test_lists/test-db/l0_dgx_b200.yml:56-72
Timestamp: 2025-08-26T09:49:04.956Z
Learning: In TensorRT-LLM test configuration files, the test scheduling system handles wildcard matching with special rules that prevent duplicate test execution even when the same tests appear in multiple yaml files with overlapping GPU wildcards (e.g., "*b200*" and "*gb200*").
Applied to files:
tensorrt_llm/bench/dataclasses/reporting.py
📚 Learning: 2025-08-26T09:37:10.463Z
Learnt from: jiaganc
Repo: NVIDIA/TensorRT-LLM PR: 7031
File: tensorrt_llm/bench/dataclasses/configuration.py:90-104
Timestamp: 2025-08-26T09:37:10.463Z
Learning: In TensorRT-LLM's bench configuration, the `get_pytorch_perf_config()` method returns `self.pytorch_config` which is a Dict[str, Any] that can contain default values including `cuda_graph_config`, making the fallback `llm_args["cuda_graph_config"]` safe to use.
Applied to files:
tensorrt_llm/bench/dataclasses/reporting.py
🪛 Ruff (0.14.2)
tensorrt_llm/bench/dataclasses/reporting.py
234-234: Consider moving this statement to an else block
(TRY300)
🔇 Additional comments (3)
tensorrt_llm/bench/dataclasses/reporting.py (3)
4-4: LGTM: Import additions are well-structured.

The unconditional torch import is appropriate since the method checks torch.cuda.is_available() at runtime. The optional pynvml import with graceful fallback is a good pattern for non-critical functionality.

Also applies to: 8-8, 10-13
313-314: LGTM: Clean integration of machine details.

The machine details are properly integrated into the statistics dictionary with a clear comment explaining the single-GPU query approach.
570-582: LGTM: Proper handling of None values in formatting.

The conditional formatting correctly handles None values for memory and clock fields, avoiding the TypeError that would occur if .2f were applied to a string. This addresses the previous review feedback.
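The None-tolerant formatting described above can be sketched as follows; the helper name, unit strings, and fallback text are assumptions for illustration:

```python
def fmt_gpu_field(value, unit, precision=2):
    """Format a numeric GPU stat, falling back to 'N/A' when it is None."""
    if value is None:
        return "N/A"
    return f"{value:.{precision}f} {unit}"

print(fmt_gpu_field(79.18, "GB"))   # 79.18 GB
print(fmt_gpu_field(None, "GHz"))   # N/A
print(fmt_gpu_field(1.593, "GHz"))  # 1.59 GHz
```

Checking for None before applying the `.2f` format spec is what prevents the TypeError the review refers to.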
@FrankD412 can you please review?
bb3e851 to ac45a24
Signed-off-by: Gal Hubara Agam <[email protected]>
ac45a24 to dcfaf04
/bot run

PR_Github #24864 [ run ] triggered by Bot. Commit:

PR_Github #24864 [ run ] completed with state

/bot run

PR_Github #24907 [ run ] triggered by Bot. Commit:

PR_Github #24907 [ run ] completed with state
Description

Add printing of device information in the trtllm-bench report.

Prints GPU name, memory size (GB), and memory clock (GHz).

Assumes a homogeneous setup and queries the first device only.

e.g.