
Sunny-bot1 (Collaborator) commented Nov 12, 2025

Motivation

A device-to-host (DtoH) memcpy is time-consuming. The CPU copies of get_block_shape_and_split_kv_block's outputs are used only for host-side branching in attention, and that branching is not executed when the CUDA graph is replayed. We therefore skip capturing these memcpy ops so that they are not executed as part of the CUDA graph.
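To make the reasoning concrete, here is a toy, host-only model of graph capture. It is a sketch under stated assumptions, not CUDA's or Paddle's API: `ToyGraph` and `CopiesDuringReplay` are hypothetical names. Ops launched while capturing are recorded and re-run on every replay, whereas host code (such as an `if` on CPU data) runs only once, at capture time. An unguarded DtoH copy is therefore paid on every replayed step, while a copy skipped during capture is never recorded.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical toy model of graph capture (not the CUDA or Paddle API).
// While capturing, Launch() records an op instead of running it; Replay()
// re-runs every recorded op. Plain host code is evaluated only once, during
// capture, and is never re-evaluated by Replay().
struct ToyGraph {
  bool capturing = false;
  std::vector<std::function<void()>> ops;

  void Launch(const std::function<void()>& op) {
    if (capturing) {
      ops.push_back(op);  // recorded into the graph
    } else {
      op();  // executed eagerly
    }
  }

  void Replay() {
    for (const auto& op : ops) op();
  }
};

// Captures one decode step, replays it `steps` times, and returns how many
// simulated DtoH copies were executed during replay.
int CopiesDuringReplay(bool skip_copy_when_capturing, int steps) {
  int copies = 0;
  ToyGraph graph;
  graph.capturing = true;
  graph.Launch([] { /* attention kernels would be recorded here */ });
  if (!(skip_copy_when_capturing && graph.capturing)) {
    graph.Launch([&copies] { ++copies; });  // simulated DtoH memcpy
  }
  graph.capturing = false;
  for (int i = 0; i < steps; ++i) graph.Replay();
  return copies;
}
```

With the guard disabled, every replay repeats the recorded copy; with it enabled, the copy never enters the graph.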

Modifications

In the decoder branch of get_block_shape_and_split_kv_block, execute copy_ only when phi::backends::gpu::IsCUDAGraphCapturing() returns false.
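A minimal sketch of the guarded copy follows. It assumes a global flag stands in for phi::backends::gpu::IsCUDAGraphCapturing(), and that plain `std::vector<int>` buffers stand in for the GPU and CPU tensors; `g_is_capturing`, `IsCapturingForSketch`, and `MaybeCopyDeviceToHost` are hypothetical names used only for illustration. The point it shows is that, while capturing, the host buffer is not refreshed and keeps whatever it held before.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for phi::backends::gpu::IsCUDAGraphCapturing();
// in this sketch it is just a global flag toggled by the caller.
bool g_is_capturing = false;
bool IsCapturingForSketch() { return g_is_capturing; }

// Sketch of the guarded copy: the host-side buffer is refreshed from the
// "device" buffer only outside of graph capture. During capture the copy is
// skipped, so `host` keeps its previous contents.
void MaybeCopyDeviceToHost(const std::vector<int>& device,
                           std::vector<int>& host) {
  if (!IsCapturingForSketch()) {
    host = device;  // stands in for max_len_tensor_cpu.copy_(...)
  }
}
```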

Usage or Command

pass

Accuracy Tests

pass

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot commented Nov 12, 2025

Thanks for your contribution!

@Sunny-bot1 Sunny-bot1 changed the title [Optimization] Skip DtoH capture [Optimization] Skip memcpy(DtoH) capture in get_block_shape_and_split_kv_block Nov 12, 2025
@zhoutianzi666 zhoutianzi666 merged commit 5b24013 into PaddlePaddle:develop Nov 13, 2025
14 of 15 checks passed
Comment on lines +291 to +296
// Note (sunxin): Skip capturing the DtoH copy (it's time-consuming); CPU data
// is only for branching in attention.
if (!phi::backends::gpu::IsCUDAGraphCapturing()) {
max_len_tensor_cpu.copy_(
max_len_tensor_gpu, max_len_tensor_cpu.place(), false);
}
Review comment (Collaborator):

Will this change actually take effect?

Reply from the PR author (Collaborator):

Quoting: "Will this change actually take effect?"

Before: (screenshot not preserved)

After: (screenshot not preserved)

StareAtYou added a commit to StareAtYou/FastDeploy that referenced this pull request Nov 13, 2025
Sunny-bot1 added a commit to Sunny-bot1/FastDeploy that referenced this pull request Nov 14, 2025
yuanlehome pushed a commit that referenced this pull request Nov 14, 2025
* Revert "[BugFix][Metax] Fix metax compile issue in get_block_shape_and_split_kv_block (#5000)"

This reverts commit 05da8e3.

* Revert "skip DtoH capture (#4988)"

This reverts commit 5b24013.
StareAtYou added a commit to StareAtYou/FastDeploy that referenced this pull request Nov 14, 2025

4 participants