pre copy pinned data to gpu #8386

Merged
2 commits merged on May 10, 2024

wanghuancoder
Contributor

PR types

Performance optimization

PR changes

Models

Description

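BERT training performs a large number of host-to-device (H2D) copies, which interrupt the CPU while it is queuing kernels ahead of the GPU. The root cause is that all 7 Tensors provided by BERT's DataLoader are pinned. These 7 Tensors are used at scattered points throughout the training step, and each use triggers a blocking H2D copy, so the CPU is interrupted many times. The fix is to call Tensor.cuda(blocking=False) on all of them up front, before training on the batch.
Before the change:
1715138264739
After the change:
1715138294974

Other models probably have the same problem and should consider this optimization as well.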
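
The fix described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the helper name `precopy_batch_to_gpu` is hypothetical, and it assumes each batch is a list or tuple whose tensor entries expose the `Tensor.cuda(blocking=False)` call named in the description. Entries without a `cuda` method are passed through unchanged.

```python
def precopy_batch_to_gpu(batch):
    """Queue a non-blocking H2D copy for every tensor in the batch up front.

    Hypothetical helper: `batch` is assumed to be a list or tuple of pinned
    tensors as yielded by the DataLoader.
    """
    moved = []
    for item in batch:
        if hasattr(item, "cuda"):
            # blocking=False queues the copy without stalling the host
            # thread, so the CPU can keep launching kernels.
            moved.append(item.cuda(blocking=False))
        else:
            # Non-tensor entries (ints, None, ...) are left as they are.
            moved.append(item)
    return moved


# Assumed usage inside a training loop (names are illustrative):
#
#     for batch in train_data_loader:
#         batch = precopy_batch_to_gpu(batch)
#         loss = model(*batch)
```

Doing all copies in one place at the start of the step means the later uses of each tensor find it already on the device, instead of each one issuing its own blocking copy.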

paddle-bot bot commented May 8, 2024

Thanks for your contribution!

Contributor

@Aurelius84 Aurelius84 left a comment

LGTM

Collaborator

@wawltor wawltor left a comment

LGTM

@wawltor wawltor merged commit 16ef8f4 into PaddlePaddle:develop May 10, 2024
6 of 9 checks passed
3 participants