
Conversation

@nv-guomingz nv-guomingz (Collaborator) commented Oct 11, 2025

Summary by CodeRabbit

  • Documentation

    • Added a quick-start deployment guide for running Qwen3-Next on TensorRT-LLM, covering setup, configuration, serving, health checks, benchmarking, and troubleshooting.
    • Updated Supported Models to include Qwen3NextForCausalLM, with entries in feature support matrices.
  • Tests

    • Added accuracy references for Qwen3-Next-80B-A3B-Thinking on GSM8K and MMLU.
    • Introduced new integration tests for this model and updated environment-specific test suites, including disaggregated serving and model-specific groups.

This is a cherry-pick PR of #8195.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with the Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is given. If the Git commit ID has changed, this option is always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail-fast on build/test/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since lack of care and validation can break top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action also kills all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since lack of care and validation can break top of tree.
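The subcommand grammar above can be sketched with Python's argparse. This is a hypothetical illustration of the command structure only, not the bot's actual implementation; flag names mirror the help text, and side effects such as killing prior jobs are out of scope.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the /bot subcommand grammar (illustrative only)."""
    parser = argparse.ArgumentParser(prog="/bot")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", help="Launch build/test pipelines")
    # --reuse-test takes an optional pipeline-id; the bare flag means "last pipeline"
    run.add_argument("--reuse-test", nargs="?", const="last", metavar="pipeline-id")
    run.add_argument("--disable-reuse-test", action="store_true")
    run.add_argument("--disable-fail-fast", action="store_true")
    run.add_argument("--skip-test", action="store_true")
    run.add_argument("--stage-list")    # e.g. "A10-PyTorch-1, xxx"
    run.add_argument("--gpu-type")      # e.g. "A30, H100_PCIe"
    run.add_argument("--test-backend")  # subset of pytorch, cpp, tensorrt, triton
    run.add_argument("--add-multi-gpu-test", action="store_true")
    run.add_argument("--only-multi-gpu-test", action="store_true")
    run.add_argument("--disable-multi-gpu-test", action="store_true")
    run.add_argument("--post-merge", action="store_true")
    run.add_argument("--extra-stage")
    run.add_argument("--detailed-log", action="store_true")
    run.add_argument("--debug", action="store_true")

    sub.add_parser("kill", help="Kill all running builds for the PR")

    skip = sub.add_parser("skip", help="Skip testing for the latest commit")
    skip.add_argument("--comment", required=True)  # a reason is mandatory

    sub.add_parser("reuse-pipeline", help="Reuse a previous pipeline")
    return parser


args = build_parser().parse_args(
    ["run", "--gpu-type", "A30, H100_PCIe", "--disable-fail-fast"])
print(args.command, args.gpu_type, args.disable_fail_fast)  # → run A30, H100_PCIe True
```

For example, `/bot skip` without `--comment` would be rejected by this grammar, matching the requirement stated in the help text above.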

@nv-guomingz nv-guomingz requested a review from a team as a code owner October 11, 2025 14:23
@nv-guomingz nv-guomingz force-pushed the user/guomingz/cherry-pick-8195 branch from a2757b0 to c23c27b Compare October 11, 2025 14:23
@nv-guomingz (Collaborator, Author)

/bot run

coderabbitai bot (Contributor) commented Oct 11, 2025

📝 Walkthrough

Adds documentation for deploying Qwen3-Next on TensorRT-LLM and updates the deployment guide index. Updates supported models docs. Adds accuracy reference entries for Qwen3-Next-80B-A3B-Thinking. Introduces a new PyTorch accuracy test class and updates DGX H100/B200 test lists, including new disaggregated serving tests and some e2e test removals.

Changes

  • Deployment Guide: Qwen3-Next on TensorRT-LLM (docs/source/deployment-guide/index.rst, docs/source/deployment-guide/quick-start-recipe-for-qwen3-next-on-trtllm.md)
    Added a new quick-start recipe page and linked it in the deployment guide toctree. Content covers prerequisites, Docker usage, YAML/CLI configs, launch steps, health checks, benchmarking, and troubleshooting.
  • Supported Models Documentation (docs/source/models/supported-models.md)
    Added Qwen3NextForCausalLM to supported models and feature matrices with initial capability flags.
  • Accuracy References (tests/integration/defs/accuracy/references/gsm8k.yaml, tests/integration/defs/accuracy/references/mmlu.yaml)
    Added Qwen3/Qwen3-Next-80B-A3B-Thinking entries with reported accuracies (GSM8K: 81.577, MMLU: 86).
  • Accuracy Test: PyTorch (tests/integration/defs/accuracy/test_llm_api_pytorch.py)
    Added TestQwen3NextThinking class to evaluate MMLU and GSM8K with specific TP/EP/PP, KV cache, CUDA graph, and token settings, with device guarding.
  • Test Lists: DGX B200/H100 (tests/integration/test_lists/test-db/l0_dgx_b200.yml, tests/integration/test_lists/test-db/l0_dgx_h100.yml)
    B200: added a new accuracy test invocation for Qwen3NextThinking. H100: added disaggregated serving and model-specific test groups, added multiple tests, and removed two e2e entries in the final block.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Description Check (⚠️ Warning): The PR description contains only a brief cherry-pick note and unfilled placeholders under the required "Description" and "Test Coverage" sections, failing to explain what changes were made or how they are tested. Resolution: populate the "Description" section with a concise overview of the cherry-picked changes and list the relevant tests under "Test Coverage" to satisfy the repository's template requirements.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
  • Title Check (✅ Passed): The title uses the required ticket placeholder and type and concisely summarizes adding Qwen3-Next documentation to the deployment guide and a test case to the L0 pipeline, making the main changes clear.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai bot (Contributor) commented Oct 11, 2025

📝 Walkthrough

Adds a new TensorRT-LLM quick-start guide for Qwen3-Next, updates supported models docs, introduces accuracy reference entries for GSM8K and MMLU, adds an integration test class for Qwen3-Next-80B-A3B-Thinking with auto-dtype and CUDA Graph/KV cache configs, and adjusts test list YAMLs to include the new test and reorganize entries.

Changes

  • Docs: Qwen3-Next TRT-LLM quick start (docs/source/deployment-guide/quick-start-recipe-for-qwen3-next-on-trtllm.md)
    New end-to-end deployment guide: prerequisites, Docker build/run, server YAML config (incl. cuda_graph_config, moe_config), launch commands, options, health checks, sample requests, troubleshooting, and benchmarking script/workflow.
  • Docs: Supported models update (docs/source/models/supported-models.md)
    Adds Qwen3NextForCausalLM entry to supported models and feature matrix for the PyTorch backend.
  • Accuracy references (tests/integration/defs/accuracy/references/gsm8k.yaml, tests/integration/defs/accuracy/references/mmlu.yaml)
    Adds Qwen3/Qwen3-Next-80B-A3B-Thinking accuracy records (GSM8K: 81.577; MMLU: 86).
  • Integration test: Qwen3-Next Thinking (tests/integration/defs/accuracy/test_llm_api_pytorch.py)
    Adds TestQwen3NextThinking class with an auto-dtype test using KV cache and CUDA Graph configs; runs MMLU and GSM8K with TP=4; skipped under Hopper constraints.
  • Test lists updates (tests/integration/test_lists/test-db/l0_dgx_b200.yml, tests/integration/test_lists/test-db/l0_dgx_h100.yml)
    Adds the new Qwen3-Next-80B test to the B200 pre-merge list; reorganizes the H100 list with headers, adds entries in an earlier block, and removes duplicates from the bottom block.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)
  • Description Check (⚠️ Warning): The description remains the unfilled template with only a single summary line about cherry-picking; it does not explain what was changed, why, or which tests cover the new functionality. Resolution: complete the template with a concise "Description" section, list the specific tests added or updated under "Test Coverage," and fill out all required sections.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
  • Title Check (❓ Inconclusive): The title references adding documentation and a test case but contains a typo ("guid" instead of "guide"), omits the model support updates, and could more precisely highlight the Quick Start recipe for Qwen3-Next on TRT-LLM. As written it only partially captures the scope of changes. Resolution: correct the typo and refine the title, for example "[None][doc] Add Qwen3-Next Quick Start guide to TRT-LLM and include L0 integration tests."
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment


@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
docs/source/deployment-guide/quick-start-recipe-for-qwen3-next-on-trtllm.md (1)

86-89: Clarify the relationship between command-line and YAML parameters.

The documentation for --kv_cache_free_gpu_memory_fraction doesn't clarify how it relates to kv_cache_config.free_gpu_memory_fraction in the YAML file (line 50). If both are set, which takes precedence?

Consider adding a note about parameter precedence or consolidating all KV cache settings in the YAML file to avoid confusion.
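As an illustration of the consolidation this comment suggests, a single YAML fragment could hold both KV cache knobs in one place. This is a hypothetical sketch: the key names follow the guide's kv_cache_config section, and the 0.9 value is an assumed placeholder, not a recommendation.

```yaml
# Hypothetical extra_llm_api_options fragment: keep all KV cache settings
# in the YAML file rather than splitting them across YAML and CLI flags.
kv_cache_config:
  free_gpu_memory_fraction: 0.9   # placeholder value; tune per deployment
  enable_block_reuse: false
```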

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 56a539c and c23c27b.

📒 Files selected for processing (7)
  • docs/source/deployment-guide/quick-start-recipe-for-qwen3-next-on-trtllm.md (1 hunks)
  • docs/source/models/supported-models.md (2 hunks)
  • tests/integration/defs/accuracy/references/gsm8k.yaml (1 hunks)
  • tests/integration/defs/accuracy/references/mmlu.yaml (1 hunks)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py (1 hunks)
  • tests/integration/test_lists/test-db/l0_dgx_b200.yml (1 hunks)
  • tests/integration/test_lists/test-db/l0_dgx_h100.yml (3 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
🧠 Learnings (2)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/test_lists/test-db/l0_dgx_h100.yml
📚 Learning: 2025-09-09T09:40:45.658Z
Learnt from: fredricz-20070104
PR: NVIDIA/TensorRT-LLM#7645
File: tests/integration/test_lists/qa/llm_function_core.txt:648-648
Timestamp: 2025-09-09T09:40:45.658Z
Learning: In TensorRT-LLM test lists, it's common and intentional for the same test to appear in multiple test list files when they serve different purposes (e.g., llm_function_core.txt for comprehensive core functionality testing and llm_function_core_sanity.txt for quick sanity checks). This duplication allows tests to be run in different testing contexts.

Applied to files:

  • tests/integration/test_lists/test-db/l0_dgx_h100.yml
🧬 Code graph analysis (1)
tests/integration/defs/accuracy/test_llm_api_pytorch.py (3)
tests/integration/defs/accuracy/accuracy_core.py (5)
  • LlmapiAccuracyTestHarness (844-855)
  • MMLU (315-329)
  • evaluate (184-245)
  • evaluate (763-773)
  • GSM8K (332-347)
tests/integration/defs/conftest.py (2)
  • llm_models_root (79-93)
  • get_device_count (1979-1981)
tensorrt_llm/llmapi/llm_args.py (2)
  • KvCacheConfig (1106-1240)
  • CudaGraphConfig (109-166)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (10)
docs/source/models/supported-models.md (2)

26-26: LGTM! Model entry follows the existing pattern.

The new Qwen3NextForCausalLM entry is properly formatted and includes the correct HuggingFace model reference that matches the test and documentation additions elsewhere in the PR.


37-37: Verify feature support limitations are intentional.

All features in the support matrix are marked as "No" for Qwen3NextForCausalLM, which indicates very limited functionality compared to other models like Qwen3MoeForCausalLM (line 36) which has extensive feature support. Please confirm this is intentional for the initial release.

Based on the deployment guide (docs/source/deployment-guide/quick-start-recipe-for-qwen3-next-on-trtllm.md lines 41-49), the model does use CUDA Graph and KV cache configurations, which suggests some features may be supported but not documented here yet.

tests/integration/defs/accuracy/references/gsm8k.yaml (1)

126-127: LGTM! GSM8K accuracy reference properly added.

The new accuracy reference for Qwen3-Next-80B-A3B-Thinking is correctly formatted and will be used by the test in tests/integration/defs/accuracy/test_llm_api_pytorch.py (line 3555-3556).

tests/integration/defs/accuracy/references/mmlu.yaml (1)

232-233: LGTM! MMLU accuracy reference properly added.

The new accuracy reference for Qwen3-Next-80B-A3B-Thinking is correctly formatted and will be used by the test in tests/integration/defs/accuracy/test_llm_api_pytorch.py (line 3553-3554).

tests/integration/test_lists/test-db/l0_dgx_b200.yml (1)

42-42: LGTM! Test entry correctly added to B200 test list.

The new test entry for Qwen3NextThinking is properly placed in the 4-GPU pre-merge block and follows the established naming convention. It correctly references the test method added in tests/integration/defs/accuracy/test_llm_api_pytorch.py (lines 3537-3556).

tests/integration/test_lists/test-db/l0_dgx_h100.yml (2)

22-22: LGTM! Test organization improved with section headers.

Adding "Disaggregated serving tests" headers improves the readability of the test list by clearly delineating different test categories.

Also applies to: 76-76


74-75: Verify if these test additions are related to the Qwen3-Next PR.

These test_e2e.py additions appear unrelated to the Qwen3-Next model support being added in this PR. If they are incidental changes that were already in the branch, consider whether they should be in a separate PR for cleaner history.

tests/integration/defs/accuracy/test_llm_api_pytorch.py (3)

3528-3532: LGTM! Test class follows established patterns.

The new test class is properly structured following the LlmapiAccuracyTestHarness pattern used by other model tests in this file. The MODEL_NAME and MODEL_PATH setup is consistent with the model reference added in other files.


3533-3556: Verify test configuration aligns with deployment guide.

The test configuration looks correct, but please verify the following alignments with the deployment guide:

  1. max_batch_size mismatch: Test uses 720 (line 3544), deployment guide uses 16 (docs/source/deployment-guide/quick-start-recipe-for-qwen3-next-on-trtllm.md line 63). Is this intentional for testing purposes?

  2. Missing moe_config: The deployment guide specifies moe_config: backend: TRTLLM (line 45), but the test doesn't explicitly set this. Will it default appropriately?

  3. stream_interval: The deployment guide sets stream_interval: 20 (line 46) but the test doesn't configure this parameter.

The core configuration (KV cache with enable_block_reuse=False, CUDA Graph with padding enabled, 4-way TP and EP) correctly matches the deployment guide's recommendations.


3538-3539: Device count validation is good practice.

The explicit device count check prevents the test from running with incorrect parallelism configurations, which would lead to confusing failures.
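For reference, the guide settings cited in the review items above would combine into roughly the following YAML. Values are transcribed from this review (max_batch_size 16, moe_config backend TRTLLM, stream_interval 20, CUDA Graph padding, block reuse disabled); this is a sketch, not the guide's verbatim file.

```yaml
# Sketch of the deployment guide's recommended extra_llm_api_options,
# reconstructed from the review comments above (not verbatim).
max_batch_size: 16
stream_interval: 20
moe_config:
  backend: TRTLLM
cuda_graph_config:
  enable_padding: true
kv_cache_config:
  enable_block_reuse: false
```

The test's max_batch_size of 720 diverges from this, which is the first question raised above.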

@tensorrt-cicd (Collaborator)

PR_Github #21079 [ run ] triggered by Bot

@nv-guomingz nv-guomingz added the Cherry-pick It's a label that applies to Cherry-pick PR. label Oct 11, 2025
@tensorrt-cicd (Collaborator)

PR_Github #21079 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #15931 completed with status: 'FAILURE'

@nv-guomingz (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #21082 [ run ] triggered by Bot

@faradawn faradawn (Collaborator) left a comment

Saw that Blackwell support is added to the guide and that the unit test is refactored. Looks good. Anything you need me to modify?

@tensorrt-cicd (Collaborator)

PR_Github #21082 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #15933 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@nv-guomingz nv-guomingz force-pushed the user/guomingz/cherry-pick-8195 branch from c23c27b to b0339df Compare October 12, 2025 14:59
@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
docs/source/deployment-guide/quick-start-recipe-for-qwen3-next-on-trtllm.md (2)

34-34: Align prose with enable_block_reuse

The text still tells readers to flip kv_cache_reuse, but the actual server knob is enable_block_reuse. Please name the correct option so the instructions match the YAML and implementation.

-Note that we should set kv_cache_reuse to false.
+Note that we should set `enable_block_reuse` to false to disable KV cache reuse for this recipe.

221-222: Remove attention-DP tuning advice

Attention data parallelism remains unsupported for Qwen3NextForCausalLM (see the supported-models matrix), so the throughput tip shouldn’t instruct users to sweep “with attention DP on.” Please reword or drop that clause.

-To achieve max through-put, with attention DP on, one needs to sweep up to `concurrency = max_batch_size * num_gpus`.
+To achieve max throughput, sweep concurrency up to `concurrency = max_batch_size * num_gpus`; attention data parallelism is unsupported for this model today.
🧹 Nitpick comments (1)
docs/source/deployment-guide/quick-start-recipe-for-qwen3-next-on-trtllm.md (1)

24-30: Annotate fenced blocks with languages

Adding the language hints keeps markdownlint happy and improves highlighting—shell for the Docker commands and json for the sample response.

For the Docker commands (lines 24-30):

-```
+```shell
 cd TensorRT-LLM
 make -C docker release_build IMAGE_TAG=qwen3-next-local
 make -C docker release_run IMAGE_NAME=tensorrt_llm IMAGE_TAG=qwen3-next-local LOCAL_USER=1

For the sample response (lines 172-174):

-```
+```json
 {"id":"chatcmpl-64ac201c77bf46a7a3a4eca7759b1fd8","object":"chat.completion","created":1759022940,"model":"Qwen/Qwen3-Next-80B-A3B-Thinking",...}

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a7ea544 and b0339df.

📒 Files selected for processing (8)
  • docs/source/deployment-guide/index.rst (1 hunks)
  • docs/source/deployment-guide/quick-start-recipe-for-qwen3-next-on-trtllm.md (1 hunks)
  • docs/source/models/supported-models.md (2 hunks)
  • tests/integration/defs/accuracy/references/gsm8k.yaml (1 hunks)
  • tests/integration/defs/accuracy/references/mmlu.yaml (1 hunks)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py (1 hunks)
  • tests/integration/test_lists/test-db/l0_dgx_b200.yml (1 hunks)
  • tests/integration/test_lists/test-db/l0_dgx_h100.yml (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/integration/defs/accuracy/test_llm_api_pytorch.py (4)
tests/integration/defs/accuracy/accuracy_core.py (5)
  • LlmapiAccuracyTestHarness (844-855)
  • MMLU (315-329)
  • evaluate (184-245)
  • evaluate (763-773)
  • GSM8K (332-347)
tests/integration/defs/conftest.py (2)
  • llm_models_root (79-93)
  • get_device_count (1979-1981)
tensorrt_llm/llmapi/llm_args.py (2)
  • KvCacheConfig (1106-1240)
  • CudaGraphConfig (109-166)
tensorrt_llm/llmapi/llm.py (1)
  • LLM (1084-1100)
🪛 markdownlint-cli2 (0.18.1)
docs/source/deployment-guide/quick-start-recipe-for-qwen3-next-on-trtllm.md

24-24: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


172-172: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check

@nv-guomingz nv-guomingz force-pushed the user/guomingz/cherry-pick-8195 branch from b0339df to f5e0dc5 Compare October 12, 2025 15:54
Signed-off-by: Faradawn Yang <[email protected]>

Signed-off-by: Robin Kobus <[email protected]>

Signed-off-by: nv-guomingz <[email protected]>
@nv-guomingz nv-guomingz force-pushed the user/guomingz/cherry-pick-8195 branch from f5e0dc5 to 54e73bd Compare October 12, 2025 15:55
@nv-guomingz (Collaborator, Author)

/bot reuse-pipeline

@nv-guomingz nv-guomingz enabled auto-merge (squash) October 12, 2025 15:56
@tensorrt-cicd (Collaborator)

PR_Github #21100 [ reuse-pipeline ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #21100 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #21082 for commit 54e73bd

@nv-guomingz nv-guomingz merged commit 989c25f into NVIDIA:main Oct 13, 2025
5 checks passed
@faradawn (Collaborator)

Related:

CI tests PR: #8111

Doc PR: #8007

@nv-guomingz nv-guomingz changed the title [None][doc] Add qwen3-next doc into deployment guid and test case into L0. [None][doc] Add qwen3-next doc into deployment guide and test case into L0. Oct 16, 2025

Labels

Cherry-pick It's a label that applies to Cherry-pick PR.
