Prototyping partial graph capture #138

lucaslie · 2025-09-24T01:03:18Z

Work in progress on partial graph capture, a prerequisite for cudagraph capture for VLMs.

Summary

There are two major changes required:

Nested graph inside a larger nn.Module --> make all transforms aware of this and differentiate between transforms on the full model and transforms on the individual subgraphs
Switching the infra from positional args to kwargs-only for inputs. Relying on positional args is not a scalable way forward, as the subgraphs are almost certainly going to be called with keyword arguments.
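A minimal sketch of what "kwargs-only" could look like at a call boundary, assuming only the Python standard library. `to_kwargs` and `sub_forward` are hypothetical names for illustration and are not taken from this PR; the idea is to map whatever mix of positional and keyword arguments a caller used onto parameter names, so downstream code only ever sees a dict keyed by name.

```python
import inspect
from typing import Any, Callable, Dict


def to_kwargs(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Map a mixed positional/keyword call onto parameter names (hypothetical helper)."""
    bound = inspect.signature(fn).bind(*args, **kwargs)
    # Only arguments that were actually passed are kept; defaults stay implicit.
    return dict(bound.arguments)


def sub_forward(input_ids, attention_mask=None, pixel_values=None):
    """Toy stand-in for a submodule's forward; not an API from this PR."""
    return input_ids


# A caller mixes positional and keyword arguments ...
call_kwargs = to_kwargs(sub_forward, [1, 2, 3], pixel_values="img")
print(call_kwargs)

# ... and the normalized dict can be replayed as a pure keyword call.
result = sub_forward(**call_kwargs)
```

This sketch does not handle functions that declare `*args` or `**kwargs` parameters; those would show up as nested entries in the bound arguments and need separate treatment.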

Other thoughts:

Thinking about auto-capturing the subgraph based on the inputs to the sub-forward function --> the subgraph is hard-coded right now
Better args/kwargs handling and adding extra flexibility there...
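One possible direction for the auto-capture idea, sketched with plain Python classes rather than real `nn.Module`s: wrap the sub-forward function, run the full model once on a sample input, and record which keyword inputs the sub-forward actually received. Everything here (`record_calls`, `ToyVLM`, `ToyLanguageModel`) is a hypothetical illustration and not the mechanism used in this PR.

```python
from typing import Any, Callable, Dict, List


def record_calls(fn: Callable[..., Any], log: List[Dict[str, str]]) -> Callable[..., Any]:
    """Wrap fn so each keyword-only call is logged before it runs (hypothetical helper)."""

    def wrapper(**kwargs: Any) -> Any:
        # Store input names and their types; a real system might store shapes/dtypes.
        log.append({name: type(value).__name__ for name, value in kwargs.items()})
        return fn(**kwargs)

    return wrapper


class ToyLanguageModel:
    """Toy stand-in for the text submodule that would be captured as a subgraph."""

    def forward(self, *, input_ids, position_ids=None):
        return [token + 1 for token in input_ids]


class ToyVLM:
    """Toy stand-in for the outer model that calls the submodule with kwargs."""

    def __init__(self):
        self.language_model = ToyLanguageModel()

    def forward(self, *, input_ids, pixel_values=None):
        position_ids = list(range(len(input_ids)))
        return self.language_model.forward(input_ids=input_ids, position_ids=position_ids)


model = ToyVLM()
calls: List[Dict[str, str]] = []
# Shadow the submodule's forward on the instance with the recording wrapper.
model.language_model.forward = record_calls(model.language_model.forward, calls)

out = model.forward(input_ids=[1, 2, 3])
print(calls)
```

The recorded call log is what a capture step could use in place of a hard-coded input list, under the assumption that one sample forward pass is representative of how the submodule is called.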

`config.yaml` to play around with

# model: meta-llama/Meta-Llama-3.1-8B-Instruct
# model: mistralai/Magistral-Small-2507
# model: Qwen/Qwen2.5-VL-7B-Instruct
# model: meta-llama/Llama-4-Scout-17B-16E-Instruct
model: mistralai/Mistral-Small-3.1-24B-Instruct-2503
args:
  # mode: graph
  world_size: 0
  runtime: demollm
  compile_backend: torch-opt
  attn_page_size: 64
  max_input_len: 4096
  max_seq_len: 8192
  attn_backend: flashinfer
  # model_factory: AutoModelForImageTextToText
  # model_factory: AutoModelForCausalLM
  model_factory: Mistral3VLM
  skip_loading_weights: true
  model_kwargs:
    text_config:
        num_hidden_layers: 2
    # tp_plan: auto
    # tp_plan: null
    # device_map: cuda
    # num_hidden_layers: 3
    # _attn_implementation: eager
benchmark:
  enabled: false
dry_run: false
# prompt:
#   batch_size: 4
#   queries:
#     - "How big is the universe? "
#     - {"prompt": "In simple words and a single sentence, explain the concept of gravity: "}
#     # see for chat template format: https://huggingface.co/docs/transformers/en/chat_templating_multimodal
#     - - role: user
#         content:
#           - type: text
#             text: How to fix slicing in golf?
#     - - role: user
#         content:
#           - type: text
#             text: Please describe the natural scenery you see in the following images
#           - type: image
#             url: https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore.png
#           - type: image
#             url: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png


lucaslie · 2025-10-09T16:32:06Z

see NVIDIA#8203

karljang and others added 30 commits September 19, 2025 08:42

[NVIDIA#7704][chore] Enable MathJax to fix formulas in documentation (N…

8fcd115

…VIDIA#7744) Signed-off-by: Kanghwan Jang <[email protected]>

[TRTLLM-6342][feat] Support for partial sharding from factory (NVIDIA…

8adaf0b

…#7393) Signed-off-by: greg-kwasniewski1 <[email protected]> Signed-off-by: Grzegorz Kwasniewski <[email protected]>

[https://nvbugs/5520490][fix] Fix intermittent test failures by avoid…

2e317a7

…ing external web data pulls (NVIDIA#7879) Signed-off-by: Chang Liu (Enterprise Products) <[email protected]>

[None][doc] Update tech blog12 (NVIDIA#7884)

e943a39

Signed-off-by: Enwei Zhu <[email protected]>

[TRTLLM-7731][feat] KV cache transmission in disagg with CP on gen si…

e10a027

…de (NVIDIA#7624) Signed-off-by: Balaram Buddharaju <[email protected]>

[TRTLLM-8188][chore] refactor GenerationExecutorWorker with WorkerBas…

4509d97

…e for better code reusing (NVIDIA#7840) Signed-off-by: Yan Chunwei <[email protected]>

[https://nvbugs/5517404][fix] Use the correct cuda graph for dynamic …

897c4dd

…spec dec (NVIDIA#7728) Signed-off-by: ziyixiong-nv <[email protected]>

[TRTLLM-6286] [perf] Add NoSmem epilogue schedule and dynamic cluster…

822cb01

… shape for sm10x group gemm (NVIDIA#7757) Signed-off-by: Xiwen Yu <[email protected]> Signed-off-by: djns99 <[email protected]> Co-authored-by: djns99 <[email protected]>

[TRTLLM-7008][fix] cherrypick to main Add automatic shared memory del…

9eb8084

…ete if already exist (NVIDIA#7727) Signed-off-by: Dongxu Yang <[email protected]>

[None][fix] Disable torch.compile for CapturableGuidedDecoder (NVIDIA…

639d410

…#7871) Signed-off-by: Enwei Zhu <[email protected]>

[None][fix] cherrypick to main: Fix possible mpi broadcast and gather…

b057fc9

… issue on large object (NVIDIA#7854) Signed-off-by: Dongxu Yang <[email protected]>

[https://nvbugs/5512556][unwaive] Unwaive DeepSeek PP tests (NVIDIA#7828

9dc7316

) Signed-off-by: peaceh <[email protected]>

[https://nvbugs/5513423][fix] Correctly respect min_tokens in PyTorch…

8aead22

… Workflow (NVIDIA#7808) Signed-off-by: Stefan Niebler <[email protected]> Co-authored-by: Daniel Cámpora <[email protected]>

[None][fix] Fix DeepGEMM commit (NVIDIA#7875)

8484aa9

Signed-off-by: Barry Kang <[email protected]>

[https://nvbugs/5467548][fix] DeepSeek illegal memory access. (NVIDIA…

a15f08d

…#7298) Signed-off-by: Bo Li <[email protected]> Signed-off-by: Wangshanshan <[email protected]>

[https://nvbugs/5448767][fix] disable kv cache reuse for disagg pp>1 …

293d9fb

…tests (NVIDIA#7354) Signed-off-by: Lizhi Zhou <[email protected]> Signed-off-by: Wangshanshan <[email protected]>

[https://nvbugs/5474169][fix]Adjust max seq len for kvcache for memor…

123f5cb

…y estimation (NVIDIA#7391) Signed-off-by: Hui Gao <[email protected]> Signed-off-by: Wangshanshan <[email protected]>

[https://nvbugs/5448754][fix] Download HF model for all nodes. (NVIDI…

2d46dda

…A#6824) Signed-off-by: Yuxian Qiu <[email protected]> Signed-off-by: Wangshanshan <[email protected]>

[https://nvbugs/5351244][fix] test_mpi_session (NVIDIA#7501)

afca2fc

Signed-off-by: Yan Chunwei <[email protected]> Signed-off-by: Wangshanshan <[email protected]>

[https://nvbugs/5496960][fix] Fix Gemma model forward. (NVIDIA#7509)

3cc16c2

Signed-off-by: Yukun He <[email protected]> Signed-off-by: Wangshanshan <[email protected]>

[https://nvbugs/5474169][fix] seq_len mismatch between kv cache manag…

af34c97

…er and graph attn metadata (NVIDIA#7606) Signed-off-by: Hui Gao <[email protected]> Signed-off-by: Wangshanshan <[email protected]>

[https://nvbugs/5503423][waive] Waive Llama3.1-70B-FP8 test on RTX PR…

541b7fd

…O 6000 (NVIDIA#7603) Signed-off-by: peaceh <[email protected]> Signed-off-by: Wangshanshan <[email protected]>

[https://nvbugs/5470782][chore] Remove the skip statement in 1.0 rele… (

9999584

NVIDIA#7573) Signed-off-by: Simeng Liu <[email protected]> Signed-off-by: Wangshanshan <[email protected]>

[https://nvbugs/5416501][doc] add known issues to llmapi doc (NVIDIA#…

2ffc339

…7560) Signed-off-by: Yan Chunwei <[email protected]> Co-authored-by: Ryan McCormick <[email protected]> Signed-off-by: Wangshanshan <[email protected]>

[None][doc] add blackwell information into support matrix (NVIDIA#6740)

8fed8ee

Signed-off-by: nv-guomingz <[email protected]> Signed-off-by: Wangshanshan <[email protected]>

[None][doc] Fix a invalid link and a typo. (NVIDIA#7634)

5c54173

Signed-off-by: nv-guomingz <[email protected]> Signed-off-by: Wangshanshan <[email protected]>

[None][doc] Use hash id for external link (NVIDIA#7641)

ab915fb

Signed-off-by: nv-guomingz <[email protected]> Signed-off-by: Wangshanshan <[email protected]>

[https://nvbugs/5501557][fix] Fix out-of-bounds vector access for mod…

8879ec4

…el with multiple layer types (NVIDIA#7636) Signed-off-by: Balaram Buddharaju <[email protected]> Signed-off-by: Wangshanshan <[email protected]>

[None][ci] Test waives for the release/1.0 branch 09/15 (NVIDIA#7700)

5c8b022

Signed-off-by: Yanchao Lu <[email protected]> Signed-off-by: Wangshanshan <[email protected]>

[None][doc] Add labels description note into llm api section (NVIDIA#…

bc7b503

…7696) Signed-off-by: nv-guomingz <[email protected]> Signed-off-by: Wangshanshan <[email protected]>

lucaslie and others added 7 commits October 3, 2025 04:57

[None][feat] AutoDeploy: dive deeper into token generation bugs + ena…

5faa5e9

…ble_block_reuse (NVIDIA#8108) Signed-off-by: Lucas Liebenwein <[email protected]>

[None][fix] Fix Qwen3 FP8 per-tensor when requesting TRTLLM-GEN MoE b…

9db4366

…ackend (NVIDIA#8075) Signed-off-by: Aurelien Chartier <[email protected]>

[None][feat] AutoDeploy add autotuning when capturing cudagraphs (NVI…

d821524

…DIA#8120) Signed-off-by: Suyog Gupta <[email protected]>

[https://nvbugs/5537878][fix] Reserve an extra slot for padded batch (N…

7bc2d9e

…VIDIA#7998) Signed-off-by: ziyixiong-nv <[email protected]>

[None][feat] AutoDeploy: compiler backends based on nn.Module (NVIDIA…

aaf2c3c

…#8126) Signed-off-by: Lucas Liebenwein <[email protected]>

[None][fix] Fix MTP 2-model (NVIDIA#8115)

ca82911

Signed-off-by: Mike Iovine <[email protected]> Signed-off-by: Mike Iovine <[email protected]>

[TRTLLM-6496][feat] Add LoRa Torch tests for the latest NIM model list (

38da871

NVIDIA#6806) Signed-off-by: Michal Guzek <[email protected]>

lucaslie force-pushed the ll/subgraphs branch from 0bfafc8 to b42bd81 Compare October 3, 2025 22:05

lucaslie and others added 10 commits October 3, 2025 15:39

[None][feat] AutoDeploy: Nemotron-H accuracy test (NVIDIA#8133)

2c454e8

Signed-off-by: Lucas Liebenwein <[email protected]>

[None][feat] AutoDeploy: graph/module inputs with kwargs instead of a…

9d098e3

…rgs (NVIDIA#8137) Signed-off-by: Lucas Liebenwein <[email protected]>

[TRTLLM-7349][feat] Adding new orchestrator type -- ray (NVIDIA#7520)

88ea2c4

Signed-off-by: Erin Ho <[email protected]> Co-authored-by: Yuan Tong <[email protected]> Co-authored-by: Erin Ho <[email protected]>

[None][autodeploy] small refactors on attention matching (NVIDIA#8079)

744246d

Signed-off-by: Frida Hou <[email protected]> Signed-off-by: Fridah-nv <[email protected]>

[NVIDIA#5255][autodeploy] Update FuseAllreduceResidualRMSNorm to use …

f6654f2

…pattern matcher utility; remove fuse_collective (NVIDIA#7545) Signed-off-by: Frida Hou <[email protected]> Signed-off-by: Fridah-nv <[email protected]>

[TRTLLM-8189][chore] enhance GenerationExecutor with RPC (part1) (NVI…

fb51de6

…DIA#5543) Signed-off-by: Yan Chunwei <[email protected]> Signed-off-by: chunweiy <[email protected]> Signed-off-by: Superjomn <[email protected]> Signed-off-by: chunweiy <[email protected]>

[https://nvbugs/5521949][fix] Re-enable test_bielik_11b_v2_2_instruct…

8060aad

…_multi_lora, fix its API use with pytorch flow LoRA (NVIDIA#8146) Signed-off-by: Amit Zuker <[email protected]>

[None][fix] Adding docker folder to Dockerfile (NVIDIA#8138)

fba351a

Signed-off-by: Patrice Castonguay <[email protected]>

[None][chore] fix llmargs conflict (NVIDIA#8152)

54ab976

Signed-off-by: Yan Chunwei <[email protected]>

[TRTLLM-8413][chore] resolve sampling defaults in OpenAI API backend (N…

98b3af4

…VIDIA#8121) Signed-off-by: ixlmar <[email protected]>

lucaslie force-pushed the ll/subgraphs branch 6 times, most recently from 5bf9cde to 024fd2f Compare October 8, 2025 20:49

subgraph pipeline

43965a2

Signed-off-by: Lucas Liebenwein <[email protected]>

lucaslie force-pushed the ll/subgraphs branch from 024fd2f to 14733a0 Compare October 8, 2025 20:54

factory<>export interface for subgraphs

c3465d0

Signed-off-by: Lucas Liebenwein <[email protected]>

lucaslie force-pushed the ll/subgraphs branch from 14733a0 to c3465d0 Compare October 8, 2025 20:56

lucaslie closed this Oct 9, 2025

99 participants
