[fix] [nvbug/5252057] Fix kv cache reuse on PyTorch multimodal #4025

yechank-nvidia · 2025-05-02T08:01:02Z

Description

This PR fixes a bug caused by an argument mismatch introduced in !3781.

The mismatch turned kv_cache_reuse ON for the Qwen-VL series models, which triggers the bug.
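
The diff itself is not reproduced in this conversation, so the following is only a generic sketch of how an argument mismatch can silently flip a flag such as kv_cache_reuse. Every function and parameter name below is hypothetical and is not taken from the TensorRT-LLM code base.

```python
# Hypothetical names throughout; this is not TensorRT-LLM code.

def make_cache_options_v1(max_tokens, kv_cache_reuse=False):
    """Older signature: the second positional argument is the reuse flag."""
    return {"max_tokens": max_tokens, "kv_cache_reuse": kv_cache_reuse}


def make_cache_options_v2(max_tokens, free_fraction=0.9, kv_cache_reuse=True):
    """Newer signature: a parameter was inserted and the default flipped."""
    return {
        "max_tokens": max_tokens,
        "free_fraction": free_fraction,
        "kv_cache_reuse": kv_cache_reuse,
    }


# A caller written against v1 that was not updated: False now binds to
# free_fraction, and kv_cache_reuse silently keeps its default of True.
stale_call = make_cache_options_v2(4096, False)

# Passing the flag by keyword keeps the intent explicit across refactors.
fixed_call = make_cache_options_v2(4096, kv_cache_reuse=False)
```

In this sketch the stale call raises no error, which is one way a mismatch of this kind could go unnoticed until a model that depends on the flag misbehaves.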

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
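
As an illustration of the command syntax described above, here is a minimal sketch of how a comment such as `/bot run --gpu-type "A30, H100_PCIe"` could be parsed. It is not the bot's actual implementation: the functions `build_parser` and `parse_comment` are hypothetical, and only a subset of the documented flags is modeled.

```python
import argparse
import shlex


def build_parser():
    """Build a parser for a subset of the documented /bot subcommands."""
    parser = argparse.ArgumentParser(prog="/bot")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run")
    run.add_argument("--disable-fail-fast", action="store_true")
    run.add_argument("--skip-test", action="store_true")
    run.add_argument("--stage-list", default=None)
    run.add_argument("--gpu-type", default=None)
    run.add_argument("--post-merge", action="store_true")

    sub.add_parser("kill")

    skip = sub.add_parser("skip")
    skip.add_argument("--comment", required=True)  # documented as required

    sub.add_parser("reuse-pipeline")
    return parser


def parse_comment(comment):
    """Parse a PR comment; return None if it is not a /bot command."""
    tokens = shlex.split(comment)
    if not tokens or tokens[0] != "/bot":
        return None
    return build_parser().parse_args(tokens[1:])


args = parse_comment('/bot run --disable-fail-fast --gpu-type "A30, H100_PCIe"')
# The quoted value arrives as one string; split it into individual GPU types.
gpu_types = [g.strip() for g in args.gpu_type.split(",")]
```

The quoted, comma-separated values in the help text (for example "A30, H100_PCIe") are treated here as a single argument that is split afterwards, which is an assumption about how the lists are meant to be read.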

yechank-nvidia · 2025-05-02T08:07:29Z

/bot run

tensorrt-cicd · 2025-05-02T08:13:18Z

PR_Github #3964 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-02T09:31:42Z

PR_Github #3964 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #2809 completed with status: 'FAILURE'

amukkara

LGTM

Signed-off-by: yechank <[email protected]>

yechank-nvidia · 2025-05-02T14:39:17Z

/bot run

tensorrt-cicd · 2025-05-02T14:44:57Z

PR_Github #3975 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-02T16:59:54Z

PR_Github #3975 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #2818 completed with status: 'SUCCESS'

symphonylyh · 2025-05-02T17:54:56Z

@yechank-nvidia Thanks for fixing it!
btw, this is concerning. Do you know why the CI wasn't able to catch this when the previous PR missed the change?

yechank-nvidia · 2025-05-07T00:23:16Z

@symphonylyh Not sure why it didn't trigger a failure.

But for Qwen2.5-VL models, even though kv_cache_reuse was ON, I think reuse does not always happen. Depending on how the LLM groups requests together, it sometimes uses kv_cache_reuse and sometimes does not. That results in non-deterministic behavior in the test.
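
The scheduling dependence described in this comment can be illustrated with a toy model. This is only a sketch under a stated assumption, namely that a request can reuse cached blocks only from requests that finished in an earlier batch. The function `run_batches` and the request tuples are hypothetical and do not reflect the actual scheduler.

```python
# Toy model of prefix-based KV cache reuse; all names are hypothetical.

def run_batches(batches):
    """Process batches in order and report which requests reused a prefix.

    Assumption: a request can reuse a prefix only if an earlier batch
    already finished a request with the same prefix. Requests in the same
    batch cannot reuse each other's blocks.
    """
    cached_prefixes = set()
    reused = {}
    for batch in batches:
        for request_id, prefix in batch:
            reused[request_id] = prefix in cached_prefixes
        # Blocks become reusable only after the whole batch completes.
        cached_prefixes.update(prefix for _, prefix in batch)
    return reused


req_a = ("a", "image-prompt")
req_b = ("b", "image-prompt")

same_batch = run_batches([[req_a, req_b]])          # b is scheduled with a
separate_batches = run_batches([[req_a], [req_b]])  # b is scheduled after a
```

With the same two requests, reuse happens for request b in one schedule and not in the other. Under this assumption, a test that only fails when reuse actually occurs would pass or fail depending on how the requests were grouped.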

yechank-nvidia mentioned this pull request May 2, 2025

Truncate image embeddings for Qwen2.5-7B-instruct model #4008

Closed

yechank-nvidia requested review from amukkara, rakib-hasan and symphonylyh May 2, 2025 08:10

amukkara approved these changes May 2, 2025

fix: [nvbug/5252057] Fix kv cache reuse on PyTorch multimodal

2fc903f

Signed-off-by: yechank <[email protected]>

yechank-nvidia force-pushed the kv_cache_multimodal branch from 86af897 to 2fc903f Compare May 2, 2025 14:39

symphonylyh approved these changes May 2, 2025

symphonylyh merged commit 061a620 into NVIDIA:main May 2, 2025
3 checks passed

yechank-nvidia self-assigned this Aug 4, 2025

4 participants
