forked from NVIDIA/TensorRT-LLM
Prototyping partial graph capture #138
Closed
…VIDIA#7744) Signed-off-by: Kanghwan Jang <[email protected]>
…#7393) Signed-off-by: greg-kwasniewski1 <[email protected]> Signed-off-by: Grzegorz Kwasniewski <[email protected]>
…ing external web data pulls (NVIDIA#7879) Signed-off-by: Chang Liu (Enterprise Products) <[email protected]>
Signed-off-by: Enwei Zhu <[email protected]>
…de (NVIDIA#7624) Signed-off-by: Balaram Buddharaju <[email protected]>
…e for better code reusing (NVIDIA#7840) Signed-off-by: Yan Chunwei <[email protected]>
…spec dec (NVIDIA#7728) Signed-off-by: ziyixiong-nv <[email protected]>
… shape for sm10x group gemm (NVIDIA#7757) Signed-off-by: Xiwen Yu <[email protected]> Signed-off-by: djns99 <[email protected]> Co-authored-by: djns99 <[email protected]>
…ete if already exist (NVIDIA#7727) Signed-off-by: Dongxu Yang <[email protected]>
…#7871) Signed-off-by: Enwei Zhu <[email protected]>
… issue on large object (NVIDIA#7854) Signed-off-by: Dongxu Yang <[email protected]>
) Signed-off-by: peaceh <[email protected]>
… Workflow (NVIDIA#7808) Signed-off-by: Stefan Niebler <[email protected]> Co-authored-by: Daniel Cámpora <[email protected]>
Signed-off-by: Barry Kang <[email protected]>
…#7298) Signed-off-by: Bo Li <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
…tests (NVIDIA#7354) Signed-off-by: Lizhi Zhou <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
…y estimation (NVIDIA#7391) Signed-off-by: Hui Gao <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
…A#6824) Signed-off-by: Yuxian Qiu <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
Signed-off-by: Yan Chunwei <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
Signed-off-by: Yukun He <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
…er and graph attn metadata (NVIDIA#7606) Signed-off-by: Hui Gao <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
…O 6000 (NVIDIA#7603) Signed-off-by: peaceh <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
NVIDIA#7573) Signed-off-by: Simeng Liu <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
…7560) Signed-off-by: Yan Chunwei <[email protected]> Co-authored-by: Ryan McCormick <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
Signed-off-by: nv-guomingz <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
Signed-off-by: nv-guomingz <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
Signed-off-by: nv-guomingz <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
…el with multiple layer types (NVIDIA#7636) Signed-off-by: Balaram Buddharaju <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
Signed-off-by: Yanchao Lu <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
…7696) Signed-off-by: nv-guomingz <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
…ble_block_reuse (NVIDIA#8108) Signed-off-by: Lucas Liebenwein <[email protected]>
…ackend (NVIDIA#8075) Signed-off-by: Aurelien Chartier <[email protected]>
…DIA#8120) Signed-off-by: Suyog Gupta <[email protected]>
…VIDIA#7998) Signed-off-by: ziyixiong-nv <[email protected]>
…#8126) Signed-off-by: Lucas Liebenwein <[email protected]>
Signed-off-by: Mike Iovine <[email protected]> Signed-off-by: Mike Iovine <[email protected]>
NVIDIA#6806) Signed-off-by: Michal Guzek <[email protected]>
Signed-off-by: Lucas Liebenwein <[email protected]>
…rgs (NVIDIA#8137) Signed-off-by: Lucas Liebenwein <[email protected]>
Signed-off-by: Erin Ho <[email protected]> Co-authored-by: Yuan Tong <[email protected]> Co-authored-by: Erin Ho <[email protected]>
Signed-off-by: Frida Hou <[email protected]> Signed-off-by: Fridah-nv <[email protected]>
…pattern matcher utility; remove fuse_collective (NVIDIA#7545) Signed-off-by: Frida Hou <[email protected]> Signed-off-by: Fridah-nv <[email protected]>
…DIA#5543) Signed-off-by: Yan Chunwei <[email protected]> Signed-off-by: chunweiy <[email protected]> Signed-off-by: Superjomn <[email protected]> Signed-off-by: chunweiy <[email protected]>
…_multi_lora, fix its API use with pytorch flow LoRA (NVIDIA#8146) Signed-off-by: Amit Zuker <[email protected]>
Signed-off-by: Patrice Castonguay <[email protected]>
Signed-off-by: Yan Chunwei <[email protected]>
…VIDIA#8121) Signed-off-by: ixlmar <[email protected]>
Force-pushed from 5bf9cde to 024fd2f
Signed-off-by: Lucas Liebenwein <[email protected]>
Signed-off-by: Lucas Liebenwein <[email protected]>
see NVIDIA#8203
Work in progress on partial graph capture, a prerequisite for CUDA graph capture for VLMs.
Summary
There are two major changes required:
nn.Module --> make all transforms aware and differentiate between transforms on the full model and on the individual subgraphs
Other thoughts:
config.yaml to play around with
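To illustrate the distinction between transforms on the full model and transforms on individual subgraphs, here is a minimal sketch in plain Python. It is not the code from this PR: `Module`, `Subgraph`, `Transform`, `run_transforms`, the `scope` field, and the transform names `load_weights` and `fuse_linear_relu` are all hypothetical stand-ins, and no torch or TensorRT-LLM APIs are used. The assumption is that only part of the module tree has been captured as a graph, and each transform declares whether it targets the root model or the captured subgraphs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List


@dataclass
class Module:
    """Hypothetical stand-in for an nn.Module-like container (not torch)."""
    name: str
    children: Dict[str, "Module"] = field(default_factory=dict)

    def walk(self) -> Iterator["Module"]:
        # Depth-first traversal: this module first, then its children.
        yield self
        for child in self.children.values():
            yield from child.walk()


@dataclass
class Subgraph(Module):
    """Hypothetical stand-in for a captured subgraph inside the module tree."""
    ops: List[str] = field(default_factory=list)


@dataclass
class Transform:
    name: str
    scope: str  # assumed values: "model" or "subgraph"
    fn: Callable[[Module], None]


def run_transforms(root: Module, transforms: List[Transform]) -> List[str]:
    """Apply each transform to the root or to every captured subgraph."""
    log = []
    for t in transforms:
        if t.scope == "model":
            targets = [root]
        elif t.scope == "subgraph":
            targets = [m for m in root.walk() if isinstance(m, Subgraph)]
        else:
            raise ValueError(f"unknown scope: {t.scope}")
        for target in targets:
            t.fn(target)
            log.append(f"{t.name}:{target.name}")
    return log


def fuse_linear_relu(graph: Subgraph) -> None:
    # Toy graph rewrite: collapse a linear+relu pair into one fused op.
    if graph.ops == ["linear", "relu"]:
        graph.ops = ["linear_relu"]


# Only the "text" part is captured as a graph; "vision" stays a plain module.
root = Module("vlm", {
    "vision": Module("vision"),
    "text": Subgraph("text", ops=["linear", "relu"]),
})
transforms = [
    Transform("load_weights", "model", lambda m: None),  # placeholder no-op
    Transform("fuse_linear_relu", "subgraph", fuse_linear_relu),
]
log = run_transforms(root, transforms)
```

In this sketch the model-scoped transform runs once on the root, while the subgraph-scoped transform runs only on the captured `text` subgraph and leaves the uncaptured `vision` module untouched.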