pre copy pinned data to gpu #8386

wanghuancoder · 2024-05-08T03:18:47Z

PR types

Performance optimization

PR changes

Models

Description

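BERT training performs a large number of H2D (host-to-device) copies, which interrupt the CPU's preloading of kernels. The main problem is that all 7 Tensors provided by BERT's Dataloader are pinned. These 7 Tensors are used at scattered points throughout training, and each use requires a blocking H2D copy, so the CPU is interrupted many times. The fix is to call Tensor.cuda(blocking=False) on all of them up front, before training.
Before the change:

After the change:

Other models probably have a similar problem and should all consider this optimization.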
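The idea in the description can be sketched as a small helper. This is only an illustration, not the code merged in this PR: `pre_copy_to_gpu` is a hypothetical name, and the only call taken from the description is `Tensor.cuda(blocking=False)`. The helper is duck-typed, so it assumes that every tensor in the batch exposes that method.

```python
from typing import Any


def pre_copy_to_gpu(batch: Any) -> Any:
    """Request a non-blocking H2D copy for every tensor-like item in `batch`.

    Hypothetical helper: any object with a `.cuda(blocking=...)` method is
    copied; dicts, lists and tuples are walked recursively; everything else
    is returned unchanged.
    """
    if isinstance(batch, dict):
        return {key: pre_copy_to_gpu(value) for key, value in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(pre_copy_to_gpu(item) for item in batch)
    if hasattr(batch, "cuda"):
        # blocking=False asks for an asynchronous copy instead of a blocking one.
        return batch.cuda(blocking=False)
    return batch


# Sketch of use inside a training loop (all names are placeholders):
# for batch in train_data_loader:
#     batch = pre_copy_to_gpu(batch)  # issue all copies once, up front
#     loss = model(**batch)
```

Issuing every copy in one place means the later uses of the tensors find them already on the device, rather than each use triggering its own blocking copy.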

paddle-bot · 2024-05-08T03:18:53Z

Thanks for your contribution!

Aurelius84

LGTM

wawltor

LGTM

pre copy pinned data to gpu

2cec34b

Aurelius84 reviewed May 10, 2024

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i…

4bbcd42

…nto refine_bert_dataloader

wawltor approved these changes May 10, 2024

wawltor merged commit 16ef8f4 into PaddlePaddle:develop May 10, 2024
6 of 9 checks passed
