Draft
1168 commits
655d0f4
[https://nvbugs/5455140][fix] unwaive DSR1-fp4 throughput_tp8 (#7022)
lfr-0531 Aug 19, 2025
07506bc
[None][chore] Remove duplicate test waives (#7044)
yiqingy0 Aug 19, 2025
8f95f35
[None][infra] Waive failed tests on main (#7037)
EmmaQiaoCh Aug 19, 2025
7e135d2
[None][feat] Use Separate QKV Input Layout for Context MLA (#6538)
zhhuang-nv Aug 19, 2025
e07fcc3
[https://nvbugs/5444937][chore] Fixing KV events tests (#7004)
pcastonguay Aug 19, 2025
d26a5a9
[https://nvbugs/5451296][bug] Cherry-pick #7017 from release/1.0 bran…
chzblych Aug 19, 2025
7334f93
[None][fix] Accommodate Phi3/4 to work with ModelOpt's FP8 ckpts in T…
moraxu Aug 19, 2025
0e30fe4
[None][fix] Fix assertion errors of quantization when using online EP…
jinyangyuan-nvidia Aug 19, 2025
c02592d
[None][autodeploy] Add group attention pattern for solar-pro-preview …
Fridah-nv Aug 19, 2025
30da5d3
[None][chore] unwaive test_disaggregated_genbs1 (#6944)
bo-nv Aug 20, 2025
fc85e3d
[None][fix] fix llmapi import error (#7030)
crazydemo Aug 20, 2025
ce53832
[TRTLLM-7326][feat] Add standalone multimodal encoder (#6743)
chang-l Aug 20, 2025
3f6a926
[None][infra] update feature_combination_matrix of disaggregated and …
leslie-fang25 Aug 20, 2025
9e71b4f
[TRTLLM-7205][feat] add llama4 tp4 tests (#6989)
xinhe-nv Aug 20, 2025
e270884
[None][infra] "[TRTLLM-6960][fix] enable scaled_mm tests (#6936)" (#7…
Tabrizian Aug 20, 2025
020fed9
[TRTLLM-6341][chore] Preliminary refactors on the kv cache manager be…
eopXD Aug 20, 2025
20f54cb
[None][fix] fix scaffolding dynasor test (#7070)
dc3671 Aug 20, 2025
983fb8e
[None][chore] Update namelist in blossom-ci (#7015)
karljang Aug 20, 2025
b95cab2
[None][ci] move unittests to sub-directories (#6635)
Funatiq Aug 20, 2025
f84dd64
[None][infra] Waive failed tests on main branch 8/20 (#7092)
EmmaQiaoCh Aug 20, 2025
8ac7dec
[None][fix] Fix W4A8 MoE kernel issue (#7072)
yuhyao Aug 20, 2025
92daec1
[TRTLLM-7348] [feat] Enable Cross-Attention to use XQA kernels for Wh…
DomBrown Aug 20, 2025
e5e4170
[None][chore] Only check the bindings lib for current build (#7026)
liji-nv Aug 20, 2025
a918de7
[None][ci] move some tests of b200 to post merge (#7093)
QiJune Aug 20, 2025
73d2daa
[https://nvbugs/5457489][fix] unwaive some tests (#6991)
byshiue Aug 21, 2025
0893afa
[TRTLLM-6771][feat] Support MMMU for multimodal models (#6828)
yechank-nvidia Aug 21, 2025
75b8a90
[None][fix] Fix llama4 multimodal by skipping request validation (#6957)
chang-l Aug 21, 2025
9f51f8d
[None][infra] Upgrade UCX to v1.19.x and NIXL to 0.5.0 (#7024)
BatshevaBlack Aug 21, 2025
f03053b
[None][fix] update accelerate dependency to 1.7+ for AutoDeploy (#7077)
Fridah-nv Aug 21, 2025
41ff490
[None][fix] Fix const modifier inconsistency in log function declarat…
Fan-Yunfan Aug 21, 2025
21f4434
[None][chore] waive failed cases on H100 (#7084)
xinhe-nv Aug 21, 2025
cbcea33
[fix]: use safeInitRowMax instead of fp32_lowest to avoid NaN (#7087)
lowsfer Aug 21, 2025
647a526
[https://nvbugs/5443039][fix] Fix AutoDeploy pattern matcher for torc…
Fridah-nv Aug 21, 2025
ba0a86e
[https://nvbugs/5437405][fix] qwen3 235b eagle3 ci (#7000)
byshiue Aug 21, 2025
2d40e87
[None][doc] Update gpt-oss deployment guide to latest release image (…
farshadghodsian Aug 21, 2025
c7269ea
[https://nvbugs/5392414] [fix] Add customized default routing method …
ChristinaZ Aug 21, 2025
90bfc8c
[https://nvbugs/5453827][fix] Fix RPATH of th_common shared library t…
tongyuantongyu Aug 21, 2025
9a2b44d
[None][chore] No-op changes to support context parallelism in disaggr…
brb-nv Aug 21, 2025
f49dafe
[https://nvbugs/5394409][feat] Support Mistral Small 3.1 multimodal i…
dbari Aug 21, 2025
344bc45
[None][infra] Waive failed case for main branch (#7129)
EmmaQiaoCh Aug 21, 2025
e18dacc
[#4403][refactor] Move fusion, kvcache, and compile to modular infere…
Fridah-nv Aug 21, 2025
f7c597e
[None][perf] Make finalize fusion part of the tactic selection logic …
djns99 Aug 21, 2025
6f245ec
[None][chore] Mass integration of release/1.0 (#6864)
dominicshanshan Aug 22, 2025
c5036cb
[None][docs] update stale link for AutoDeploy (#7135)
suyoggupta Aug 22, 2025
07c711e
[TRTLLM-6825][fix] Update lora for phi4-mm (#6817)
Wanli-Jiang Aug 22, 2025
4017f7c
[None][chore] Add failed cases into waives.txt (#7109)
xinhe-nv Aug 22, 2025
983dd7e
[None][fix] Fix mm_placholder_counts extraction issue. (#7118)
hyukn Aug 22, 2025
099f081
[TRTLLM-7155][feat] Unify sampler handle logits implementation. (#6867)
dcampora Aug 22, 2025
a49cf68
[TRTLLM-5801][infra] Add more RTX Pro 6000 test stages (#5126)
EmmaQiaoCh Aug 22, 2025
898f37f
[None][feat] Enable nanobind as the default binding library (#6608)
Linda-Stadter Aug 22, 2025
d94cc3f
[TRTLLM-7321][doc] Add GPT-OSS Deployment Guide into official doc sit…
dongfengy Aug 22, 2025
b8b2bd4
[TRTLLM-7245][feat] add test_multi_nodes_eval tests (#7108)
xinhe-nv Aug 22, 2025
1388e84
[None][ci] move all B200 TensorRT test cases to post merge (#7165)
QiJune Aug 22, 2025
907bc22
[None][chore] Bump version to 1.1.0rc2 (#7167)
yiqingy0 Aug 22, 2025
e3de575
[#7136][feat] trtllm-serve + autodeploy integration (#7141)
suyoggupta Aug 22, 2025
c232ba8
[TRTLLM-4921][feat] Enable chunked prefill for Nemotron-H (#6334)
tomeras91 Aug 22, 2025
37543a9
[None][refactor] Simplify decoder state initialization for speculativ…
Funatiq Aug 22, 2025
b36460d
[None][feat] Deepseek: Start Eagle work (#6210)
IzzyPutterman Aug 22, 2025
81fd468
[None][fix] Correct KV cache percentage report out. (#7102)
FrankD412 Aug 22, 2025
3d54a1a
[None] [feat] nsys profile output kernel classifier (#7020)
gracehonv Aug 23, 2025
96ff82e
[None][fix] Waive test (#7185)
Tabrizian Aug 24, 2025
35e0ae4
[https://nvbugs/5467232][fix] Fix load_torch_hf_lora to override lora…
amitz-nv Aug 24, 2025
19a0ea3
[TRTLLM-6743][feat] Optimize and refactor alltoall in WideEP (#6973)
dongxuy04 Aug 24, 2025
48155f5
[TRTLLM-7321][doc] Refine GPT-OSS doc (#7180)
dongfengy Aug 24, 2025
ec35481
[None][infra] Prepare for single GPU GB200 test pipeline (#7073)
chzblych Aug 24, 2025
0680566
[None][chore] Enable auto deploy accuracy test in CI (#7179)
ajrasane Aug 24, 2025
31979ae
[None] [ci] Reorganize CMake and Python integration test infrastructu…
Funatiq Aug 24, 2025
486bc76
[None][infra] Split DGX_B200 stage into multiple parts and pre-/post-…
yiqingy0 Aug 25, 2025
6e13160
[TRTLLM-7096][infra] Testing cache transmission functionality in Pyth…
bo-nv Aug 25, 2025
3ba9afc
[None][feat] add gpt-osss tests to sanity list (#7158)
xinhe-nv Aug 25, 2025
c038fb3
[None][chore] cherry-pick 6940 (#7097)
bo-nv Aug 25, 2025
9c5b464
[None][feat] Apply AutoTuner to fp8_block_scale_deep_gemm to trigger …
hyukn Aug 25, 2025
630e67b
[None][ci] waive test_mamba2_chunk_scan_combined_prefill_chunking[seq…
QiJune Aug 25, 2025
f61b74f
[None][test] add l20 specific qa test list (#7067)
crazydemo Aug 25, 2025
be6d92f
[None][fix] Fix MoE load balancer config loading (#7150)
syuoni Aug 25, 2025
a1e03af
[TRTLLM-7346][fix] Improve performance of PyTorchModelEngine._get_lor…
amitz-nv Aug 25, 2025
b32e00e
[None][chore] remove CLI support for mamba cache dtype setting (#7119)
shaharmor98 Aug 25, 2025
bea5e07
[None][refactor] refactor the CUDA graph runner to manage all CUDA gr…
QiJune Aug 25, 2025
200db3b
[None][infra] Waive failed tests on main branch (#7201)
EmmaQiaoCh Aug 25, 2025
6a44e5b
[https://nvbugs/5440241][fix] Fix 70B GSM8K Accuracy drop (#6967)
chenfeiz0326 Aug 25, 2025
788fc62
[None][fix] Update to pull LLM from a central location. (#6458)
FrankD412 Aug 25, 2025
e8e7e52
[None][chore] Refactored the handle logits pp communication (#7154)
dcampora Aug 25, 2025
bf1b958
[TRTLLM-7319][perf] Fuse slicing into MoE. (#6728)
bobboli Aug 25, 2025
97d550b
[None] [AutoDeploy] canonicalize_graph before shape prop for consiste…
lucaslie Aug 25, 2025
2101d46
[TRTLLM-6342][feat] TP Sharding read from the model config (#6972)
greg-kwasniewski1 Aug 25, 2025
9df15b2
[None][doc] update feature_combination_matrix doc (#6691)
leslie-fang25 Aug 26, 2025
b845eb7
[None][test] add kv cache size in bench metric and fix failed cases (…
ruodil Aug 26, 2025
20922b7
[None][chore] Create PyExecutor from TorchLlmArgs Part 1 (#7105)
leslie-fang25 Aug 26, 2025
4f84a45
[https://nvbugs/5452463][doc] update disagg doc about UCX_MAX_RNDV_RA…
zhengd-nv Aug 26, 2025
9257648
[None][feat] Skip prefetching consolidated safetensors when appropria…
2ez4bz Aug 26, 2025
b165f8b
fix/improve kvcache allocation in PyTorch runtime (#5933)
qixiang-99 Aug 26, 2025
bbc1478
[None][chore] Update CI allowlist 2025-08-25 (#7229)
yuanjingx87 Aug 26, 2025
d8bd884
[None][test] Update qwen3 timeout to 60 minutes (#7200)
nvamyt Aug 26, 2025
1a929a1
[https://nvbugs/5457504][fix] fix kv cache event test in disaggregate…
zhengd-nv Aug 26, 2025
cf50ba2
[TRTLLM-6549][feat] add perf metrics endpoint to openai server and op…
zhengd-nv Aug 26, 2025
bf377d0
[None][doc] Display tech blog for nvidia.github.io domain. (#7241)
nv-guomingz Aug 26, 2025
23ed0c8
[https://nvbugs/5477332][fix] Relax atol in test_mamba2_chunk_scan_co…
amitz-nv Aug 26, 2025
f01101f
[None][feat] Hopper Fp8 context mla (#7116)
zhou-yuxin Aug 26, 2025
a142c0c
[None][infra] Add retry 3 times if ssh cluster failed (#6859)
EmmaQiaoCh Aug 26, 2025
80043af
[None][chore] Add failed cases into waives.txt (#7251)
xinhe-nv Aug 26, 2025
2d0c9b3
[None][fix] Updated blog9_Deploying_GPT_OSS_on_TRTLLM (#7260)
Maurits-de-Groot Aug 26, 2025
baef70e
[None][ci] move qwen3 tests from b200 to gb200 (#7257)
QiJune Aug 26, 2025
040f4c7
[None][perf] Accelerate global scale calculations for deepEP fp4 comb…
yilin-void Aug 26, 2025
78ecfbb
[None][fix] Fix data type of KV Cache percentage in bench. (#7230)
FrankD412 Aug 26, 2025
0f947c6
[None][doc] Update autodeploy README.md, deprecate lm_eval in example…
Fridah-nv Aug 26, 2025
87d1d3a
[None][update] Update disagg code owners (#7266)
Tabrizian Aug 26, 2025
0282354
[TRTLLM-6633][feat] Padding for piecewise cudagraph (#6750)
liji-nv Aug 26, 2025
ccb6aad
[https://nvbugs/5412456][fix] Remove from waives.txt (#7248)
zhou-yuxin Aug 27, 2025
e12868b
[None][fix] Remove and fuse some element-wise ops in the ds-r1-fp8 mo…
lfr-0531 Aug 27, 2025
ff40474
[None][opt] Balance the request based on number of tokens in Attentio…
Shunkangz Aug 27, 2025
d0d8903
[TRTLLM-6960][fix] replace flasky scaled_mm test with more stable con…
dc3671 Aug 27, 2025
bc84758
[None][feat] Add logging for OAI disagg server (#7232)
Tabrizian Aug 27, 2025
6c7813e
[TRTLLM-7457][ci] Update & cleanup unittest parallel config (#7254)
tongyuantongyu Aug 27, 2025
82bd187
[None][chore] update disagg readme and scripts for pipeline paralleli…
raayandhar Aug 27, 2025
bed5bc9
[None][chore] Wrap the swiglu into custom op to avoid redundant devic…
hyukn Aug 27, 2025
abdb273
[None][fix] Fix possible hang issue in WideEP and move some tests to …
dongxuy04 Aug 27, 2025
e08c7cf
[None][ci] remove test_llm_api_autodeploy from B200 test db (#7282)
QiJune Aug 27, 2025
f167b1f
[https://nvbugs/5453727][fix] Fix bug of how GPT-OSS setup the parame…
byshiue Aug 27, 2025
dbd4f21
[None][fix] Update maxnt of llama_v3.2_1b bench (#7279)
nvamyt Aug 27, 2025
8b21613
[None][refactor] Move draft token padding out of Drafter (#7134)
mikeiovine Aug 27, 2025
f082e48
[TRTLLM-7250][fix] waive failed cases (#7292)
xinhe-nv Aug 27, 2025
8dc62ff
[None][infra] Waive failed tests on main (#7300)
EmmaQiaoCh Aug 27, 2025
d09add5
[None][ci] parallelize unit tests of auto deploy in B200 (#7291)
QiJune Aug 27, 2025
462169b
[https://nvbugs/5458798][fix] AD perf test outliers handling, tighten…
MrGeva Aug 27, 2025
9d345b3
[https://nvbugs/5453727][fix] unwaive qwen3 CI tests (#7293)
byshiue Aug 27, 2025
7cfa475
[None][fix] Remove the wheel from intermediate docker storage (#7175)
MartinMarciniszyn Aug 27, 2025
8a619be
[None] [chore] Make disagg example compatible with recommended usage …
kaiyux Aug 27, 2025
f30768e
[TRTLLM-6822][infra] Add PR-Checklist github action and modify PR tem…
venkywonka Aug 28, 2025
c1e7fb9
[TRTLLM-7207][feat] Chat completions API for gpt-oss (#7261)
LinPoly Aug 28, 2025
39c9ffd
[None][ci] fix test list name (#7321)
QiJune Aug 28, 2025
7f4adca
[None][fix] Disable mandatory PR checklist enforcement (#7325)
venkywonka Aug 28, 2025
4541655
[https://nvbugs/5430124][ci] Unwaive Mistral 3.1 Small tests (#7274)
2ez4bz Aug 28, 2025
ae89163
[None][ci] skip TestGPTOSS (#7333)
QiJune Aug 28, 2025
53163bf
[TRTLLM-6876][feat] Add low precision all2all for mnnvl (#7155)
zongfeijing Aug 28, 2025
23f72c8
[None] [feat] Use numa to bind CPU (#7304)
kaiyux Aug 28, 2025
08f9356
[https://nvbugs/5474453][fix] fix path to tested model (#7272)
nzmora-nvidia Aug 28, 2025
c4f8233
[None][doc] add adp balance blog (#7213)
yunruis Aug 28, 2025
1e644fa
[None][infra] Waive failed tests on main branch 08/26 (#7346)
EmmaQiaoCh Aug 28, 2025
a419b77
[None][fix] mxfp4 padding bug for TRT-LLM and CUTLASS MoE backends (#…
nekorobov Aug 28, 2025
460a34c
[None][chore] Some improvements for CI stability (#7199)
chzblych Aug 28, 2025
367ff88
[None][feat] Refactor llama4 for multimodal encoder IFB (#6844)
dongfengy Aug 28, 2025
b093d94
[https://nvbugs/5445466][fix] Bypass MLP TP split for MNNVL in DeepSe…
timlee0212 Aug 28, 2025
ccb800f
[TRTLLM-7457][ci] Update unittest parallel config (#7297)
tongyuantongyu Aug 29, 2025
e0253ee
[None][perf] Disable Swap AB when num tokens exceeds N dimension (#7104)
djns99 Aug 29, 2025
085dc19
[TRTLLM-6646][test] NIM migration to TRT-LLM LLMAPI : Add QWQ-32b tor…
aalanwyr Aug 29, 2025
ce580ce
[None][feat] KV Cache Connector API (#7228)
richardhuo-nv Aug 29, 2025
2e43753
[None] [chore] Update .coderabbit.yaml review configuration (#7351)
venkywonka Aug 29, 2025
31b0f0f
[https://nvbugs/5445466][fix] Eliminate race when loading HF dynamic …
chang-l Aug 29, 2025
091b67a
[TRTLLM-7280][test] Add beam search CudaGraph + Overlap Scheduler tes…
fredricz-20070104 Aug 29, 2025
f617b03
[None][fix] fix doc formula (#7367)
yunruis Aug 29, 2025
37a1bd8
[https://nvbugs/5481385][fix] Fix max_seq_len in cuda graph warmup an…
lfr-0531 Aug 29, 2025
62459d5
[None][chore] Update pre-merge test to add DeepSeek/LLaMA and gpt-oss…
pengbowang-nv Aug 29, 2025
15ec2b8
[None][infra] Waive failed tests on main branch 08/29 (#7370)
EmmaQiaoCh Aug 29, 2025
642ff13
[None][doc] Exposing the ADP balance strategy tech blog (#7380)
juney-nvidia Aug 29, 2025
43cb50f
[None][feat] Update TargetInfo to accommodate CP in disagg (#7224)
brb-nv Aug 29, 2025
9bb0c95
[None][docs] Update Dynasor paper info (#7137)
AndyDai-nv Aug 30, 2025
e09c025
[None] [fix] store blog 10 media via lfs (#7375)
Funatiq Aug 30, 2025
5f939b9
[None][chore] Add failed cases into waives.txt (#7342)
xinhe-nv Aug 30, 2025
ec595a8
[None][chore] Bump version to 1.1.0rc2 (#7394)
yiqingy0 Aug 31, 2025
a7ed26d
[TRTLLM-6747][feat] Merge add sparse exp and shared exp into local re…
zongfeijing Sep 1, 2025
e257cb3
[None][feat] Support NVFP4 KV Cache (#6244)
Tom-Zheng Sep 1, 2025
c5148f5
[None][ci] Some improvements for Slurm CI setup (#7407)
chzblych Sep 1, 2025
c7147d2
[TRTLLM-6975][test] Add multi-turn test cases for VLM models (#6749)
crazydemo Aug 13, 2025
7841ea6
[None][chore] waive GB300 known issues (#6812)
xinhe-nv Aug 13, 2025
deba288
[None][fix] fix Llama3 eagle3 test case OOM (#6832)
crazydemo Aug 13, 2025
3e99744
[https://nvbugs/5375594][fix] fix oom issue on structural_tag test ca…
nv-guomingz Aug 13, 2025
2480aed
[TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 (#6731)
2ez4bz Aug 14, 2025
3aeee19
[None][infra] Setup the code review rule on the release branch (#6725)
yiqingy0 Aug 14, 2025
cf0c47c
[None][fix] Fix batching bug in Mistral3 model (#6841)
2ez4bz Aug 14, 2025
b821883
[None][fix] Revert phi4-mm aggregate mode (#6907)
amukkara Aug 14, 2025
e106045
[None][fix] Complete the last missing allreduce op in Llama3/4. (#6850)
hyukn Aug 15, 2025
0253036
[None][chore] Add docs for Gemma3 VLMs (#6880)
brb-nv Aug 15, 2025
612c26b
[None][doc] add legacy section for tensorrt engine (#6724)
Superjomn Aug 15, 2025
b4d41d6
[TRTLLM-7048][feat] add benchmark TRT flow test for MIG (#6884)
xinhe-nv Aug 15, 2025
665a1a7
[https://nvbugs/5451434][fix] Fix triton docker build (#6898)
Tabrizian Aug 15, 2025
ac07418
[None][ci] unwaive test_ptp_star_attention_example (#6943)
Superjomn Aug 15, 2025
de55763
[https://nvbugs/5455836][fix] Fix llama 4 FP4 (#6911)
mikeiovine Aug 15, 2025
093a037
[None][infra] update CODEOWNERS for release (#6905)
venkywonka Aug 15, 2025
261ffac
[https://nvbugs/5412562][feat] Allocate MoE workspace only when neces…
nv-yilinf Aug 18, 2025
704fca4
[TRTLLM-6835][fix] Fix potential hang caused by python multiprocessin…
lancelly Aug 18, 2025
d15dcdc
[https://nvbugs/5448525][fix] Mistral Small 3.1 accuracy tests (#6909)
2ez4bz Aug 18, 2025
d5bc5cd
[https://nvbugs/5375646][fix] update waives.txt for nvbug 5375646 (#6…
nv-guomingz Aug 18, 2025
29cdcdb
[None][fix] update skip config (#6891)
crazydemo Aug 18, 2025
f4dc1ed
[https://nvbugs/5449218][fix] Fix KvCacheConfig error in test_perf (#…
peaceh-nv Aug 18, 2025
09bca7c
[None][infra] Waive failed tests for release branch 0818 (#6993)
EmmaQiaoCh Aug 18, 2025
21291f3
[None][chore] Remove duplicate test waives (#6999)
yiqingy0 Aug 18, 2025
93e623b
[https://nvbugs/5449155][fix] Fix DeepSeek R1 weight loading for TP16…
achartier Aug 19, 2025
ed4087a
[https://nvbugs/5374016][fix] improve error message (#6893)
QiJune Aug 19, 2025
44cc308
[https://nvbugs/5474037][fix] Fix building tritonbuild/tritonrelease …
dbari Aug 22, 2025
b0558c7
[None][fix] Fix build of tritonbuild/tritonrelease image (#7003)
dbari Aug 20, 2025
efaefca
[None][test] Update case that not support passing quantization fp8 fo…
nvamyt Sep 1, 2025
2b286ae
[None][infra] Disable GB200-PyTorch-1 due to OOM issue (#7386)
yuanjingx87 Sep 1, 2025
16e9d11
[https://nvbugs/5481087][fix] fix bug of ci when we use mocker (#7332)
byshiue Sep 1, 2025
01dfd3a
[None][infra] Waive failed case on main 0901 (#7447)
EmmaQiaoCh Sep 1, 2025
b3c57a7
[TRTLLM-7353][feat] Implement capturable drafting loops for speculati…
mikeiovine Sep 1, 2025
9f2dc30
[None] [doc] Update DeepSeek example doc (#7358)
jiahanc Sep 1, 2025
1b9c4cc
[None][fix] Fix nanobind failure (#7425)
Tom-Zheng Sep 1, 2025
e81c50d
[None][chore] Use llm args in create_py_executor (#7239)
leslie-fang25 Sep 1, 2025
60df6b2
[https://nvbugs/5485430][fix] Copy the nanobind file when using preco…
jiaganc Sep 2, 2025
ff2439f
[None][infra] Using local variables in rerun function (#7198)
yiqingy0 Sep 2, 2025
a07bb16
[None][ci] Correct docker args for GPU devices and remove some stale …
chzblych Sep 2, 2025
f90375f
[https://nvbugs/5476580][fix] unwaive test_nvfp4_4gpus (#7454)
Superjomn Sep 2, 2025
3799e5d
[None][test] auto reuse torch empty cache on qa test (#7421)
crazydemo Sep 2, 2025
9c8d216
[None][doc] fix example in docstring (#7410)
tomeras91 Sep 2, 2025
c3c9573
[TRTLLM-6643][feat] Add DeepSeek-v3-0324 e2e torch test (#7413)
aalanwyr Sep 2, 2025
7279297
[None][infra] waive test case failed on post-merge (#7471)
HuiGao-NV Sep 2, 2025
eefe5f2
[TRTLLM-7208][feat] Implement basic functionalities for Responses API…
JunyiXu-nv Sep 2, 2025
90479c5
[https://nvbugs/5453992][unwaive] Unwaive llama quickstart test (#7242)
peaceh-nv Sep 2, 2025
aae5d22
[None][infra] Waive failed tests on main branch 0902 (#7482)
EmmaQiaoCh Sep 2, 2025
f58a183
[None][chore] Fix formatting error in Gemma3 readme (#7352)
karljang Sep 2, 2025
bcc55bc
[https://nvbugs/5470782][fix] Add specific test names for test_deepse…
SimengLiu-nv Sep 2, 2025
75c1bb6
[https://nvbugs/5458798][fix] Disabled test_trtllm_bench_backend_comp…
MrGeva Sep 2, 2025
b4340ec
[None][chore] Add note about trtllm-serve to the devel container (#7483)
MartinMarciniszyn Sep 2, 2025
42697ea
[None][chore] rm executor config in kv cache connector (#7372)
leslie-fang25 Sep 3, 2025
109f272
[None][perf] Add MOE support for dynamic cluster shapes and custom ep…
djns99 Sep 3, 2025
572551b
[None][perf] Autotune TRT-LLM Gen MoE when using CUDA graphs (#7285)
jinyangyuan-nvidia Sep 3, 2025
4223a9a
[TRTLLM-7261][feat] Support phi-4 model in pytorch backend (#7371)
Wanli-Jiang Sep 3, 2025
9a4f606
[https://nvbugs/5480289][fix] release slot manager in mtp MTPHiddenSt…
yweng0828 Sep 3, 2025
79d93f9
[https://nvbugs/5488141][fix] Unwaive llama3 test_eagle3 (#7486)
mikeiovine Sep 3, 2025
ae51368
[https://nvbugs/5472947][fix] wait on isend handles before reusing bu…
amukkara Sep 3, 2025
cebbf48
[TRTLLM-7363][test] Add 8-GPU test cases for RTX6000 (#7083)
StanleySun639 Sep 3, 2025
7c73c2f
[https://nvbugs/5485593][fix] improve accuracy/test_disaggregated_ser…
reasonsolo Sep 3, 2025
f156221
[None][doc] add GPT OSS Eagle3 blog (#7140)
IzzyPutterman Sep 3, 2025
64e3bfa
[None][fix] Fix KV cache recompute in draft_target spec decode (#7348)
mikeiovine Sep 3, 2025
5ff3a65
[TRTLLM-7028][feat] Enable guided decoding with speculative decoding …
syuoni Sep 3, 2025
bd9ba97
[None][chore] Remove two unused parameters in create_py_executor (#7458)
leslie-fang25 Sep 3, 2025
51a2b87
[#7222][autodeploy] Separate run_shape_prop as another graph utility …
Fridah-nv Sep 3, 2025
c1aa7f3
[None][fix] Fix a numerical stability issue for XQA with spec dec (#7…
lowsfer Sep 4, 2025
d97c1e6
[https://nvbugs/5470769][fix] fix disagg-serving accuracy test case (…
reasonsolo Sep 4, 2025
db8eb0a
[TRTLLM-7876][test] Test trtllm-serve with --extra_llm_api_options (#…
StanleySun639 Sep 4, 2025
2a2dfe2
[https://nvbugs/5485102][fix] Correctly set stride for piecewise outp…
liji-nv Sep 4, 2025
a117e7a
[TRTLLM-7442][model] Remove unnecessary D2H copies (#7273)
2ez4bz Sep 4, 2025
931816f
[TRTLLM-6199][infra] Update for using open driver from BSL (#7430)
EmmaQiaoCh Sep 4, 2025
c622f61
[None][fix] Fix a typo in the Slurm CI codes (#7485)
chzblych Sep 4, 2025
3755f8a
[TRTLLM-6342][fix] Fixed triggering BMM sharding (#7389)
greg-kwasniewski1 Sep 4, 2025
7090b28
[None][fix] fix hunyuan_moe init bug (#7502)
sorenwu Sep 4, 2025
ced5512
[None][chore] Bump version to 1.1.0rc4 (#7525)
yiqingy0 Sep 4, 2025
cce9556
[https://nvbugs/5485886][fix] Fix resource free of Eagle3ResourceMana…
kris1025 Sep 4, 2025
0de3f83
[TRTLLM-6893][infra] Disable the x86 / SBSA build stage when run Buil…
ZhanruiSunCh Sep 4, 2025
5bcda75
[https://nvbugs/5477730][fix] Fix the alltoall case when tp_size larg…
WeiHaocheng Sep 4, 2025
4e3dded
[TRTLLM-6308][feat] Support Aggregate mode for phi4-mm (#7521)
Wanli-Jiang Sep 4, 2025
d38b8e3
[None][ci] set TORCHINDUCTOR_COMPILE_THREADS for thop/parallel tests …
QiJune Sep 4, 2025
b46e0ae
[None][test] update nim and full test list (#7468)
crazydemo Sep 4, 2025
262b004
Pass mode & directory
tshmilnvidia Jul 15, 2025
e47bdc9
Nixl Loopback Agent
tshmilnvidia Jul 15, 2025
8498b36
copyBlock with NixlLoopbackAgent
tshmilnvidia Sep 2, 2025
2cc0072
GDS_MT backend support for LoopbackAgent
tshmilnvidia Jul 15, 2025
283263a
Add GDS memory mode tests to kvCacheManagerTest
glevnv Aug 7, 2025
e0d7a83
Add LoopbackAgent tests to transferAgentTest
glevnv Aug 11, 2025
ed5fd42
Fix nixl installation on aarch64 CI
tshmilnvidia Aug 27, 2025
The diff you're trying to view is too large. We only load the first 3000 changed files.
3 changes: 3 additions & 0 deletions .clang-tidy
@@ -1,5 +1,6 @@
Checks: '*,
-altera-id-dependent-backward-branch,
-altera-struct-pack-align,
-altera-unroll-loops,
-boost-use-ranges,
-cppcoreguidelines-avoid-do-while,
@@ -9,8 +10,10 @@ Checks: '*,
-fuchsia-default-arguments-calls,
-fuchsia-default-arguments-declarations,
-fuchsia-overloaded-operator,
-fuchsia-virtual-inheritance,
-hicpp-vararg,
-llvm-else-after-return,
-llvmlibc-*,
-misc-include-cleaner,
-misc-non-private-member-variables-in-classes,
-modernize-use-trailing-return-type'
2 changes: 1 addition & 1 deletion .clangd
@@ -29,7 +29,7 @@ CompileFlags:
 # Tweak the clangd parse settings for all files
 CompileFlags:
   Compiler: clang++
-  CompilationDatabase: .
+  CompilationDatabase: cpp/build
   Add:
     # report all errors
     - "-ferror-limit=0"
40 changes: 40 additions & 0 deletions .coderabbit.yaml
@@ -0,0 +1,40 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
# https://docs.coderabbit.ai/getting-started/configure-coderabbit/
# In PR, comment "@coderabbitai configuration" to get the full config including defaults
language: "en-US"
reviews:
  profile: chill
  auto_title_placeholder: '@coderabbitai title'
  auto_title_instructions: 'Format: "[<category>] <title>". Category must be one of: fix, feat, doc, infra, style, refactor, perf, test, chore, revert. Enclose the category in square brackets. Title should be concise (<= 60 chars). Example: "[feat] Add logit_bias support".'
  commit_status: false
  collapse_walkthrough: true
  assess_linked_issues: true
  related_issues: true
  related_prs: true
  suggested_labels: true
  suggested_reviewers: true
  poem: false
  review_status: false
  auto_review:
    auto_incremental_review: false
    drafts: false
    base_branches: ["main", "release/.+"]
knowledge_base:
  code_guidelines:
    enabled: true
    filePatterns: ["**/CODING_GUIDELINES.md"]
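Reviewer note: the `auto_title_instructions` string in this file describes a title convention, `[<category>] <title>`. Below is a minimal sketch of a checker for that convention. It assumes the 60-character limit applies to the text after the bracketed category, which the config does not specify. The name `is_valid_auto_title` is hypothetical and is not part of CodeRabbit or this repository.

```python
import re

# Categories copied from the auto_title_instructions string in .coderabbit.yaml.
ALLOWED_CATEGORIES = ("fix", "feat", "doc", "infra", "style", "refactor",
                      "perf", "test", "chore", "revert")

# Assumption: the "<= 60 chars" limit counts only the text after "[category] ".
MAX_TITLE_CHARS = 60

_TITLE_RE = re.compile(r"^\[(?P<category>[a-z]+)\] (?P<title>\S.*)$")


def is_valid_auto_title(text: str) -> bool:
    """Hypothetical check for the "[<category>] <title>" convention."""
    match = _TITLE_RE.match(text)
    if match is None:
        return False
    if match.group("category") not in ALLOWED_CATEGORIES:
        return False
    return len(match.group("title")) <= MAX_TITLE_CHARS
```

The example title quoted in the config, `[feat] Add logit_bias support`, passes this check, while a title with an unlisted category such as `[feature] ...` does not.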
10 changes: 10 additions & 0 deletions .devcontainer/devcontainer.env
@@ -0,0 +1,10 @@
# Environment variables used to configure the Dev Container setup.
#
# The syntax needs to be compatible with
# https://docs.docker.com/compose/how-tos/environment-variables/variable-interpolation/#env-file-syntax
#
# Edit this file as necessary. For local changes not to be committed back
# to the repository, create/edit devcontainer.env.user instead.
HF_HOME_DEFAULT="${HOME}/.cache/huggingface"
HF_HOME_XDG_DEFAULT="${XDG_CACHE_HOME:-${HF_HOME_DEFAULT}}"
LOCAL_HF_HOME="${HF_HOME:-${HF_HOME_XDG_DEFAULT}}"
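Reviewer note: the three assignments above form a fallback chain built on the shell `${VAR:-default}` expansion, which uses the default when the variable is unset or empty. The sketch below mirrors that chain in Python under the assumption that the file is evaluated by a POSIX shell. The helper names `_expand_default` and `resolve_local_hf_home` are hypothetical and do not appear in the PR.

```python
from typing import Mapping


def _expand_default(env: Mapping[str, str], name: str, default: str) -> str:
    """Mimic ${name:-default}: fall back when name is unset or empty."""
    value = env.get(name, "")
    return value if value else default


def resolve_local_hf_home(env: Mapping[str, str]) -> str:
    """Hypothetical mirror of the LOCAL_HF_HOME chain in devcontainer.env."""
    # HF_HOME_DEFAULT="${HOME}/.cache/huggingface"
    hf_home_default = f"{env.get('HOME', '')}/.cache/huggingface"
    # HF_HOME_XDG_DEFAULT="${XDG_CACHE_HOME:-${HF_HOME_DEFAULT}}"
    hf_home_xdg_default = _expand_default(env, "XDG_CACHE_HOME", hf_home_default)
    # LOCAL_HF_HOME="${HF_HOME:-${HF_HOME_XDG_DEFAULT}}"
    return _expand_default(env, "HF_HOME", hf_home_xdg_default)
```

With only `HOME` set, the result is `$HOME/.cache/huggingface`. A non-empty `XDG_CACHE_HOME` replaces that path as written in the file, and a non-empty `HF_HOME` takes precedence over both.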
16 changes: 6 additions & 10 deletions .devcontainer/devcontainer.json
@@ -3,24 +3,18 @@
 {
     "name": "TRT-LLM Devcontainer",
     "dockerComposeFile": [
-        "docker-compose.yml"
+        "docker-compose.yml",
+        "docker-compose.override.yml"
     ],
     "service": "tensorrt_llm-dev",
     "remoteUser": "ubuntu",
     "containerEnv": {
-        // "CCACHE_DIR" : "/home/coder/${localWorkspaceFolderBasename}/cpp/.ccache",
-        // "CCACHE_BASEDIR" : "/home/coder/${localWorkspaceFolderBasename}",
         "HF_TOKEN": "${localEnv:HF_TOKEN}",
         "HF_HOME": "/huggingface",
         "HISTFILE": "${containerWorkspaceFolder}/.cache/._bash_history"
     },
     "workspaceFolder": "/workspaces/tensorrt_llm",
-    // "workspaceFolder": "/home/coder/${localWorkspaceFolderBasename}",
-    // "workspaceMount": "source=${localWorkspaceFolder},target=/home/coder/${localWorkspaceFolderBasename},type=bind,consistency=consistent",
-    "mounts": [
-        "source=${localEnv:HOME}/.cache/huggingface,target=/huggingface,type=bind", // HF cache
-        "source=/home/scratch.trt_llm_data/,target=/home/scratch.trt_llm_data/,type=bind,consistency=consistent"
-    ],
+    "initializeCommand": "cd ${localWorkspaceFolder} && ./.devcontainer/make_env.py",
     // Note: sourcing .profile is required since we use a local user and the python interpreter is
     // global (/usr/bin/python). In this case, pip will default to a local user path which is not
     // by default in the PATH. In interactive devcontainer shells, .profile is sourced by default.
@@ -43,7 +37,9 @@
                 // "ms-vscode.cmake-tools",
                 // Git & Github
                 // "GitHub.vscode-pull-request-github"
-                "eamodio.gitlens"
+                "eamodio.gitlens",
+                // Docs
+                "ms-vscode.live-server"
             ],
             "settings": {
                 "C_Cpp.intelliSenseEngine": "disabled",
8 changes: 8 additions & 0 deletions .devcontainer/docker-compose.override-example.yml
@@ -0,0 +1,8 @@
# Example .devcontainer/docker-compose.override.yml
version: "3.9"
services:
  tensorrt_llm-dev:
    volumes:
      # Uncomment the following lines to enable
      # # Mount TRTLLM data volume:
      # - /home/scratch.trt_llm_data/:/home/scratch.trt_llm_data/:ro
5 changes: 3 additions & 2 deletions .devcontainer/docker-compose.yml
@@ -1,7 +1,7 @@
 version: "3.9"
 services:
   tensorrt_llm-dev:
-    image: urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:pytorch-25.05-py3-x86_64-ubuntu24.04-trt10.11.0.33-skip-tritondevel-202506051650-4885
+    image: ${DEV_CONTAINER_IMAGE}
     network_mode: host
     ipc: host

@@ -22,7 +22,8 @@ services:
               capabilities: [gpu]

     volumes:
-      - ..:/workspaces/tensorrt_llm:cached
+      - ${SOURCE_DIR}:/workspaces/tensorrt_llm
+      - ${LOCAL_HF_HOME}:/huggingface # HF cache

     environment:
       - CCACHE_DIR=/workspaces/tensorrt_llm/cpp/.ccache
221 changes: 221 additions & 0 deletions .devcontainer/make_env.py
@@ -0,0 +1,221 @@
#!/usr/bin/env python3

import json
import logging
import os
import re
import shlex
import subprocess
import sys
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Dict, List, Optional

JENKINS_PROPS_PATH = Path("jenkins/current_image_tags.properties")
DEV_CONTAINER_ENV_PATH = Path(".devcontainer/devcontainer.env")
DEV_CONTAINER_USER_ENV_PATH = Path(".devcontainer/devcontainer.env.user")
DOT_ENV_PATH = Path(".devcontainer/.env")
COMPOSE_OVERRIDE_PATH = Path(".devcontainer/docker-compose.override.yml")
COMPOSE_OVERRIDE_EXAMPLE_PATH = Path(
    ".devcontainer/docker-compose.override-example.yml")

HOME_DIR_VAR = "HOME_DIR"
SOURCE_DIR_VAR = "SOURCE_DIR"
DEV_CONTAINER_IMAGE_VAR = "DEV_CONTAINER_IMAGE"
BUILD_LOCAL_VAR = "BUILD_LOCAL"
JENKINS_IMAGE_VAR = "LLM_DOCKER_IMAGE"
LOCAL_HF_HOME_VAR = "LOCAL_HF_HOME"

LOGGER = logging.getLogger("make_env")


def _load_env(env_files: List[Path]) -> Dict[str, str]:
    """Evaluate files using 'sh' and return resulting environment."""
    with TemporaryDirectory("trtllm_make_env") as temp_dir:
        json_path = Path(temp_dir) / 'env.json'
        subprocess.run(
            ("(echo set -a && cat " +
             " ".join(shlex.quote(str(env_file)) for env_file in env_files) +
             " && echo && echo exec /usr/bin/env python3 -c \"'import json; import os; print(json.dumps(dict(os.environ)))'\""
             + f") | sh > {json_path}"),
            shell=True,
            check=True,
        )
        with open(json_path, "r") as f:
            env = json.load(f)
    return env


def _detect_rootless() -> bool:
    proc = subprocess.run("./docker/detect_rootless.sh",
                          capture_output=True,
                          check=True,
                          shell=True)
    return bool(int(proc.stdout.decode("utf-8").strip()))


def _handle_rootless(env_inout: Dict[str, str]):
    is_rootless = _detect_rootless()
    if is_rootless:
        LOGGER.info("Docker Rootless Mode detected.")
        if HOME_DIR_VAR not in env_inout:
            raise ValueError(
                "Docker Rootless Mode requires setting HOME_DIR in devcontainer.env.user"
            )
        if SOURCE_DIR_VAR not in env_inout:
            raise ValueError(
                "Docker Rootless Mode requires setting SOURCE_DIR in devcontainer.env.user"
            )

        # Handle HF_HOME
        if "HF_HOME" in os.environ and "HF_HOME" in env_inout:
            raise ValueError(
                "Docker Rootless Mode requires either not setting HF_HOME at all or overriding it in devcontainer.env.user"
            )
        if env_inout[LOCAL_HF_HOME_VAR].startswith(env_inout["HOME"]):
            env_inout[LOCAL_HF_HOME_VAR] = env_inout[LOCAL_HF_HOME_VAR].replace(
                env_inout["HOME"], env_inout[HOME_DIR_VAR], 1)
    else:
        env_inout[HOME_DIR_VAR] = env_inout["HOME"]
        env_inout[SOURCE_DIR_VAR] = os.getcwd()


def _select_prebuilt_image(env: Dict[str, str]) -> Optional[str]:
    # Jenkins image
    candidate_images: List[str] = [env[JENKINS_IMAGE_VAR]]

    # NGC images
    proc = subprocess.run(
        r"git tag --sort=creatordate --merged=HEAD | grep -E '^v[0-9]+\.[0-9]+\.[0-9]+' | sed -E 's/^v(.*)$/\1/' | tac",
        shell=True,
        capture_output=True,
        check=True,
    )
    # stdout is bytes; decode it so the image name does not contain a b'...' repr
    for git_tag in proc.stdout.decode("utf-8").splitlines():
        git_tag = git_tag.strip()
        candidate_images.append(f"nvcr.io/nvidia/tensorrt-llm/devel:{git_tag}")

    # Check image availability
    for candidate_image in candidate_images:
        LOGGER.info(f"Trying image {candidate_image}")

        try:
            subprocess.run(
                f"docker run --rm -it --pull=missing --entrypoint=/bin/true {shlex.quote(candidate_image)}",
                check=True,
                shell=True)
        except subprocess.CalledProcessError:
            continue

        LOGGER.info(f"Using image {candidate_image}")
        return candidate_image

    LOGGER.info("No pre-built image found!")
    return None


def _build_local_image() -> str:
    LOGGER.info("Building container image locally")

    with TemporaryDirectory("trtllm_make_env") as temp_dir:
        log_path = Path(temp_dir) / "build.log"
        subprocess.run(
            f"make -C docker devel_build | tee {shlex.quote(str(log_path))}",
            check=True,
            shell=True,
        )
        with open(log_path) as f:
            build_log = f.read()

    # Handle escaped and actual line breaks
    build_log_lines = re.sub(r"\\\n", " ", build_log).splitlines()
    for build_log_line in build_log_lines:
        try:
            tokens = shlex.split(build_log_line)
        except ValueError:
            # Not a well-formed shell command line (e.g. unbalanced quotes)
            continue
        if tokens[:3] != ["docker", "buildx", "build"]:
            continue
        token = None
        while tokens and not (token := tokens.pop(0)).startswith("--tag"):
            pass
        if token is None:
            continue
        if token.startswith("--tag="):
            # Inline form: --tag=<image>
            token = token.removeprefix("--tag=")
        else:
            # Separate form: --tag <image>
            if not tokens:
                continue
            token = tokens.pop(0)
        return token  # this is the image URI
    raise RuntimeError(
        f"Could not parse --tag argument from build log: {build_log}")


def _ensure_compose_override():
    if not COMPOSE_OVERRIDE_PATH.exists():
        LOGGER.info(
            f"Creating initial {COMPOSE_OVERRIDE_PATH} from {COMPOSE_OVERRIDE_EXAMPLE_PATH}"
        )
        COMPOSE_OVERRIDE_PATH.write_bytes(
            COMPOSE_OVERRIDE_EXAMPLE_PATH.read_bytes())


def _update_dot_env(env: Dict[str, str]):
    LOGGER.info(f"Updating {DOT_ENV_PATH}")

    output_lines = [
        "# NOTE: This file is generated by make_env.py, modify devcontainer.env.user instead of this file.\n",
        "\n",
    ]

    for env_key, env_value in env.items():
        if os.environ.get(env_key) == env_value:
            # Only storing differences w.r.t. base env
            continue
        output_lines.append(f"{env_key}=\"{shlex.quote(env_value)}\"\n")

    with open(DOT_ENV_PATH, "w") as f:
        f.writelines(output_lines)


def main():
    env_files = [
        JENKINS_PROPS_PATH,
        DEV_CONTAINER_ENV_PATH,
    ]

    if DEV_CONTAINER_USER_ENV_PATH.exists():
        env_files.append(DEV_CONTAINER_USER_ENV_PATH)

    env = _load_env(env_files)
    _handle_rootless(env_inout=env)

    # Determine container image to use
    image_uri = env.get(DEV_CONTAINER_IMAGE_VAR)
    if image_uri:
        LOGGER.info(f"Using user-provided container image: {image_uri}")
    else:
        build_local = bool(int(
            env[BUILD_LOCAL_VAR].strip())) if BUILD_LOCAL_VAR in env else None
        image_uri = None
        if not build_local:
            image_uri = _select_prebuilt_image(env)
        if image_uri is None:
            if build_local is False:
                raise RuntimeError(
                    "No suitable container image found and local build disabled."
                )
            image_uri = _build_local_image()
            LOGGER.info(f"Using locally built container image: {image_uri}")
        env[DEV_CONTAINER_IMAGE_VAR] = image_uri

    _ensure_compose_override()

    _update_dot_env(env)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    try:
        main()
    except Exception as e:
        LOGGER.error(f"{e.__class__.__name__}: {e}")
        sys.exit(-1)
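
Note on `_build_local_image`: a logged `docker buildx build` command can carry the image name either as `--tag=<image>` or as `--tag <image>`. The sketch below isolates that lookup so it can be checked on its own. `extract_tag` is a hypothetical helper written for illustration and is not part of `make_env.py`; the image names in the usage lines are made up.

```python
import shlex
from typing import List, Optional


def extract_tag(command_line: str) -> Optional[str]:
    """Return the value of the first --tag option, or None if absent.

    Hypothetical helper for illustration only; not part of make_env.py.
    """
    try:
        tokens: List[str] = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes: not a well-formed command line
        return None
    if tokens[:3] != ["docker", "buildx", "build"]:
        return None
    for index, token in enumerate(tokens):
        if token.startswith("--tag="):
            # Inline form: --tag=<image>
            return token[len("--tag="):]
        if token == "--tag" and index + 1 < len(tokens):
            # Separate form: --tag <image>
            return tokens[index + 1]
    return None


if __name__ == "__main__":
    print(extract_tag("docker buildx build --tag=example/image:1.0 ."))
    print(extract_tag("docker buildx build --pull --tag example/image:dev ."))
```

Keeping the lookup in a pure function like this would make it possible to unit-test the log parsing without running `make`.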
1 change: 1 addition & 0 deletions .dockerignore
@@ -9,5 +9,6 @@ examples/**/.git
examples/**/*.bin
examples/**/*.engine
examples/**/*.onnx
examples/**/*.safetensors
examples/**/c-model
examples/models/core/gpt/gpt*
24 changes: 24 additions & 0 deletions .editorconfig
@@ -0,0 +1,24 @@
# Auto basic formatting when saving file with EditorConfig https://editorconfig.org/

# top-most EditorConfig file
root = true

[*]
end_of_line = lf
trim_trailing_whitespace = true
insert_final_newline = true

# make
[Makefile*]
indent_style = tab
indent_size = 4

# c++
[*.{cpp,cu,h}]
indent_style = space
indent_size = 4

# python
[*.py]
indent_style = space
indent_size = 4