[None][doc] Add table with one-line deployment commands to quickstart #8173
base: main
Conversation
/bot run
📝 Walkthrough

Documentation reorganized to replace quick-starts with deployment guides, standardize YAML-based configuration references, and simplify launch commands. Added multiple example config YAMLs and updated examples to use them. Implemented a helper to auto-update version placeholders across all Markdown files. Minor textual fixes in a benchmark doc.
Sequence Diagram(s)

sequenceDiagram
autonumber
actor Dev as Developer
participant Helper as docs/source/helper.py
participant FS as Docs Markdown (*.md)
participant Ver as tensorrt_llm/version.py
Dev->>Helper: run update_version()
Helper->>Ver: load __version__
Ver-->>Helper: return version (e.g., x.y.z)
Helper->>FS: glob all Markdown files
loop For each Markdown file
Helper->>FS: read content
Helper->>Helper: replace version placeholders/URLs
Helper->>FS: write updated content
end
Helper-->>Dev: completed updates
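For readers skimming the walkthrough, a rough shell equivalent of what `docs/source/helper.py` is described as doing is sketched below. This is not the actual implementation; the literal `x.y.z` placeholder and the glob scope are assumptions made for illustration.

```shell
# Rough sketch only -- not the real docs/source/helper.py logic.
# Assumes the Markdown files use a literal "x.y.z" placeholder for the
# container version; the real placeholder/URL patterns may differ.
VERSION=$(python3 -c "import runpy; print(runpy.run_path('tensorrt_llm/version.py')['__version__'])")
find docs/source -name '*.md' -print0 | xargs -0 sed -i "s/x\.y\.z/${VERSION}/g"
```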
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (7)
docs/source/commands/trtllm-serve/run-benchmark-with-trtllm-serve.md (1)
8-8: Align “OpenAI-compatible” capitalization. Other occurrences in this doc (and surrounding docs) use “OpenAI-compatible” with a lowercase “c”. Keeping this bullet consistent avoids jitter in the rendered text.

Apply this diff:

- * Launch the OpenAI-Compatible Server with NGC container
+ * Launch the OpenAI-compatible server with NGC container

examples/configs/qwen3-disagg-prefill.yaml (1)
1-9: LGTM! Configuration appropriate for disaggregated prefill scenario. The configuration is valid. Note that this file shares ~90% of settings with `qwen3.yaml` (same max_batch_size, max_num_tokens, kv_cache settings, etc.), differing mainly in `trust_remote_code: true` and the absence of explicit CUDA graph batch sizes.

Consider whether these configs could leverage YAML anchors/aliases or a shared base config to reduce duplication while maintaining clarity. However, keeping them separate may be preferable for documentation clarity and ease of use.
examples/configs/llama-4-scout.yaml (1)
1-13: LGTM! Configuration is correct. This configuration is identical to `llama-3.3-70b.yaml`. While this duplication may be intentional for model-specific discoverability and ease of use, consider whether a shared base configuration or YAML anchors could reduce maintenance overhead (a rough sketch of the anchor idea follows below).

If Llama-4 Scout and Llama-3.3-70B share identical deployment characteristics, you could:

- Use a shared base config with model-specific overrides, or
- Add a comment explaining why the configs are identical despite different model names

However, keeping them separate may be preferable for user experience and documentation clarity.
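As a purely illustrative sketch of the anchor/alias idea raised in the two comments above: anchors only deduplicate settings within a single YAML document, so this is not a drop-in replacement for the separate per-model files, and every value shown is a placeholder rather than a tuned setting from the real configs.

```shell
# Hypothetical sketch: YAML anchors/aliases sharing a base block inside one file.
# Values are placeholders, not the actual tuned settings.
cat <<'EOF' > shared-config-sketch.yaml
shared: &shared
  kv_cache_config:
    dtype: fp8            # placeholder; the real configs carry more settings
llama-3.3-70b:
  <<: *shared             # merge the shared block, override per model as needed
llama-4-scout:
  <<: *shared
EOF
```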
docs/source/deployment-guide/deployment-guide-for-llama4-scout-on-trtllm.md (1)
89-96: Clarify the YAML key name. The heading shows `backend pytorch` without a colon, but the YAML option is `backend: pytorch`. Please add the colon (or separate the value) so readers copy the correct key/value form.

docs/source/deployment-guide/deployment-guide-for-llama3.3-70b-on-trtllm.md (1)
91-97: Add the colon to the `backend` key. Like the other guides, this heading should read `backend: pytorch` so the YAML syntax is accurate. Please update the inline code snippet accordingly.

docs/source/deployment-guide/deployment-guide-for-deepseek-r1-on-trtllm.md (1)
105-112: Fix the `backend` YAML notation. Please change the inline code to `backend: pytorch` so the example matches valid YAML syntax.

docs/source/deployment-guide/deployment-guide-for-gpt-oss-on-trtllm.md (1)
102-110: Use proper YAML syntax for `backend`. The inline code should read `backend: pytorch`; without the colon it’s misleading. Please update the heading to show the correct key/value form.
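Since all four nitpicks above are about the same punctuation slip, here is a minimal copy-pasteable sketch of the corrected key/value form; the file name is made up for the example, and the model is borrowed from another snippet in this PR rather than taken from these guides.

```shell
# Minimal sketch of the corrected YAML form flagged above.
cat <<'EOF' > extra-llm-api-options.yaml
backend: pytorch
EOF
# Model name reused from another example in this PR purely for illustration.
trtllm-serve nvidia/Llama-3.3-70B-Instruct-FP8 --extra_llm_api_options extra-llm-api-options.yaml
```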
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
- docs/source/commands/trtllm-serve/run-benchmark-with-trtllm-serve.md (1 hunks)
- docs/source/deployment-guide/deployment-guide-for-deepseek-r1-on-trtllm.md (3 hunks)
- docs/source/deployment-guide/deployment-guide-for-gpt-oss-on-trtllm.md (3 hunks)
- docs/source/deployment-guide/deployment-guide-for-llama3.3-70b-on-trtllm.md (3 hunks)
- docs/source/deployment-guide/deployment-guide-for-llama4-scout-on-trtllm.md (3 hunks)
- docs/source/deployment-guide/index.rst (1 hunks)
- docs/source/helper.py (1 hunks)
- docs/source/quick-start-guide.md (4 hunks)
- examples/configs/README.md (1 hunks)
- examples/configs/deepseek-r1-deepgemm.yaml (1 hunks)
- examples/configs/deepseek-r1-latency.yaml (1 hunks)
- examples/configs/deepseek-r1-throughput.yaml (1 hunks)
- examples/configs/gpt-oss-120b-latency.yaml (1 hunks)
- examples/configs/gpt-oss-120b-throughput.yaml (1 hunks)
- examples/configs/llama-3.3-70b.yaml (1 hunks)
- examples/configs/llama-4-scout.yaml (1 hunks)
- examples/configs/qwen3-disagg-prefill.yaml (1 hunks)
- examples/configs/qwen3.yaml (1 hunks)
- examples/models/core/qwen/README.md (2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use only spaces, no tabs; indent with 4 spaces.
Files:
docs/source/helper.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py:
Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.
Files:
docs/source/helper.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).
Files:
docs/source/helper.py
🔇 Additional comments (9)
examples/configs/deepseek-r1-latency.yaml (1)
1-15: Latency profile config looks solid. Coverage of TP, EP, KV cache, and relaxed MTP params matches the intended low-latency tuning. Thanks for keeping it compact.
examples/configs/deepseek-r1-throughput.yaml (1)
1-17: Throughput configuration LGTM. High-batch capture, fp8 KV cache, and MTP params are coherently set for throughput workloads.
examples/configs/deepseek-r1-deepgemm.yaml (1)
1-20: DeepGEMM profile matches expectations. Extends the throughput template with the DEEPGEMM MOE backend while keeping capture/KV/MTP settings aligned; looks good end to end.
examples/configs/llama-3.3-70b.yaml (1)
1-13: LGTM! Configuration looks appropriate for Llama-3.3-70b. The settings are well-balanced for a 70B model deployment with single-GPU configuration (TP=1) and fp8 KV cache optimization.
examples/configs/README.md (1)
1-5: LGTM! Clear and helpful documentation. The README effectively explains the purpose of the config files and how to use them with `trtllm-serve`.

docs/source/deployment-guide/index.rst (1)
9-12: LGTM! Straightforward rename aligning with the new documentation structure. The change from "quick-start-recipe" to "deployment-guide" improves clarity and consistency.
examples/configs/qwen3.yaml (1)
1-21: LGTM! Well-configured for Qwen3 with granular CUDA graph batch sizes. The explicit batch_sizes list in `cuda_graph_config` enables optimized graph caching for common batch sizes.

examples/configs/gpt-oss-120b-latency.yaml (1)
1-15: LGTM! Well-tuned latency configuration for GPT-OSS 120B. The high parallelism settings (TP=8, EP=8) and latency-focused parameters (`stream_interval: 20`, `num_postprocess_workers: 4`) are appropriate for a large MoE model deployment.
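For reference, the settings called out in this comment roughly amount to the following sketch; treat it as an approximation, not a copy of `examples/configs/gpt-oss-120b-latency.yaml`, which may contain additional options.

```shell
# Approximate reconstruction of the latency-oriented settings mentioned above.
cat <<'EOF' > gpt-oss-120b-latency-sketch.yaml
tensor_parallel_size: 8
moe_expert_parallel_size: 8
stream_interval: 20
num_postprocess_workers: 4
EOF
```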
docs/source/helper.py (1)

348-370: Confirm targeted version replacement. 6 of 120 Markdown files contain the placeholder and will be updated; scanning all .md files has negligible performance impact. Verify that no additional files need this update and that no exclusions are necessary.
| [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) | Any | Max Throughput | [gpt-oss-120b-throughput.yaml](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/configs/gpt-oss-120b-throughput.yaml) | `trtllm-serve openai/gpt-oss-120b --extra_llm_api_options /app/tensorrt_llm/examples/configs/gpt-oss-120b-throughput.yaml` |
| [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) | Any | Min Latency | [gpt-oss-120b-latency.yaml](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/configs/gpt-oss-120b-latency.yaml) | `trtllm-serve openai/gpt-oss-120b --extra_llm_api_options /app/tensorrt_llm/examples/configs/gpt-oss-120b-latency.yaml` |
Correct GPU requirements for GPT-OSS 120B.

Labeling the GPU requirement as “Any” is incorrect. The referenced config (`gpt-oss-120b-throughput.yaml`) sets `tensor_parallel_size: 8` and `moe_expert_parallel_size: 8`, which assumes clustered Blackwell-class GPUs (e.g., B200/GB200) with sufficient memory and interconnect. Please update the table to list the actual supported GPU SKUs and parallelism expectations so users don’t attempt this on unsupported hardware.
🤖 Prompt for AI Agents
docs/source/quick-start-guide.md around lines 111-112: the table incorrectly
lists the GPU requirement for gpt-oss-120b as "Any"; update the GPU column and
optionally an adjacent notes column to reflect the actual supported SKUs and
required parallelism (e.g., "Blackwell-class GPUs (B200/GB200) with
NVLink/High-speed interconnect; tensor_parallel_size: 8,
moe_expert_parallel_size: 8") and ensure the throughput/latency rows reference
these requirements so users know the memory and networking expectations before
attempting to run the provided YAML configs.
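For anyone trying one of the table rows end to end, a hedged sketch is below; the default host/port are assumptions (port 8000 matches the serve examples elsewhere in this PR), and the hardware caveats from this comment still apply.

```shell
# Launch one of the table's one-line commands, then query the OpenAI-compatible API.
# Adjust the model name, host, and port for your deployment.
trtllm-serve openai/gpt-oss-120b \
  --extra_llm_api_options /app/tensorrt_llm/examples/configs/gpt-oss-120b-throughput.yaml &

# Once the server reports it is ready:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-oss-120b", "messages": [{"role": "user", "content": "Hello!"}]}'
```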
backend: DEEPGEMM
max_num_tokens: 3200
EOF
EXTRA_LLM_API_FILE=/app/tensorrt_llm/examples/configs/deepseek-r1-deepgemm.yaml
Maybe better to parametrize as `TRTLLM_ROOT` or `CODE_DIR` instead of `/app/tensorrt_llm`? The exact root is user/container-specific, right? I know that pulling the official docker scripts will mount tensorrt_llm in `/code/tensorrt_llm`, but I personally put it in `$HOME/tensorrt_llm` in my dev workflow, etc.
Just to confirm, TRTLLM_ROOT / CODE_DIR would still have to be defined manually right?
yepyep
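A small sketch of the parametrization discussed in this thread; `TRTLLM_ROOT` is just the hypothetical variable name proposed above, defaulting to the container path used in the guide.

```shell
# Hypothetical parametrization: default to the container path, but let developers
# point at their own checkout (e.g. /code/tensorrt_llm or $HOME/tensorrt_llm).
TRTLLM_ROOT="${TRTLLM_ROOT:-/app/tensorrt_llm}"
EXTRA_LLM_API_FILE="${TRTLLM_ROOT}/examples/configs/deepseek-r1-deepgemm.yaml"
```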
@anish-shanbhag Thank you for elevating the other parts of the documentation by fixing typos and issues even if you didn't have to 😄
…kstart Signed-off-by: Anish Shanbhag <[email protected]>
93947ce to db38972 (Compare)
We maintain YAML configuration files with recommended performance settings in the [`examples/configs`](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/configs) directory. These config files are present in the TensorRT LLM container at the path `/app/tensorrt_llm/examples/configs`. You can use these out-of-the-box, or adjust them to your specific use case.

```shell
EXTRA_LLM_API_FILE=/app/tensorrt_llm/examples/configs/qwen3.yaml
```
@nv-guomingz in the TRT-LLM NGC container, will the newly added .yaml files be part of it?
yes, those newly added .yaml files will be part of it, please refer to this.
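One quick way to confirm that locally; the image path and tag below are assumptions, so substitute whichever TensorRT LLM NGC release container you actually pull.

```shell
# Assumed NGC image path/tag -- replace with the container you use.
TRTLLM_IMAGE=nvcr.io/nvidia/tensorrt-llm/release:latest
docker run --rm "${TRTLLM_IMAGE}" ls /app/tensorrt_llm/examples/configs
```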
Hi Anish, thanks for submitting the PR. @litaotju @nv-guomingz Hi Tao/Guoming, this PR touches files you are familiar with; can you also help review it? Thanks
## Launch Docker on a node with NVIDIA GPUs deployed
## Launch Docker Container
I believe the original version is much simpler. The principle for the quick start guide is to make it as simple as possible.
You can also directly load pre-quantized models [quantized checkpoints on Hugging Face](https://huggingface.co/collections/nvidia/model-optimizer-66aa84f7966b3150262481a4) in the LLM constructor.
To learn more about the LLM API, check out the [](llm-api/index) and [](examples/llm_api_examples).

## Quick Start for Popular Models
I prefer to move this part into the deployment-guide section.
@laikhtewari could you please comment on this change to quick-start-guide.md?
os.path.join(os.path.dirname(__file__),
"../../tensorrt_llm/version.py"))
"""Replace the placeholder container version in all docs source files."""
version_path = (Path(__file__).parent.parent.parent / "tensorrt_llm" /
@litaotju, this line change will update the deployment guide folder from the fixed version 1.0.0rc6 to the latest release version, e.g., 1.20.0rc1. Is that the expected behavior?
--ep_size 1 \
--trust_remote_code \
--extra_llm_api_options ${EXTRA_LLM_API_FILE}
trtllm-serve nvidia/Llama-3.3-70B-Instruct-FP8 --host 0.0.0.0 --port 8000 --extra_llm_api_options ${EXTRA_LLM_API_FILE}
Description

This is a first step in making it easier for users to leverage known best LLM API configs for popular models. The PR makes a few main changes:

- The deployment guides previously defined their recommended YAML configs inline before launching `trtllm-serve`. This change moves all of these configs into a dedicated `examples/configs` directory which is available automatically in the TRTLLM container.
- The guides previously mixed `trtllm-serve` CLI options and LLM API options for configuration; this change aims to standardize around keeping all options within the config files.
- Adds a "Quick Start for Popular Models" table within the Quick Start Guide that contains one-line `trtllm-serve` commands to deploy popular models including DSR1, gpt-oss, etc.

Subsequent changes will aim to streamline this even further, including:

- Updating `trtllm-serve` to automatically leverage these configs when possible

The table looks like this in the rendered docs:
Summary by CodeRabbit

- New Features
- Documentation
- Chores

Test Coverage

N/A
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user-friendly way for developers to interact with a Jenkins server.

Run `/bot [-h|--help]` to print this help message. See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
- `--reuse-test (optional)pipeline-id` (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- `--disable-reuse-test` (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- `--disable-fail-fast` (OPTIONAL) : Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- `--stage-list "A10-PyTorch-1, xxx"` (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- `--test-backend "pytorch, cpp"` (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- `--only-multi-gpu-test` (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL) : Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"` (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- `--detailed-log` (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- `--debug` (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see `docs/source/reference/ci-overview.md` and the `scripts/test_to_stage_mapping.py` helper.

kill
kill
Kill all running builds associated with pull request.
skip
skip --comment COMMENT
Skip testing for latest commit on pull request.
`--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.