
Refine clip_by_global_norm #38209

Merged
merged 6 commits into PaddlePaddle:develop on Dec 27, 2021

Conversation

zhangbo9674
Contributor

PR types

Performance optimization

PR changes

APIs

Describe

Optimize ClipByGlobalNorm performance.
Taking a 10*10 paddle.nn.Linear as an example and running 100 rounds of optimization, the timing breakdown of the clip_by_global_norm calls is as follows:
(1) Before the optimization:
[screenshot]
(2) After the optimization:
[screenshot]
The before/after time ratio is 0.77/0.54 = 1.43, i.e. a 29.9% speedup.
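For context, clipping by global norm scales every gradient by clip_norm / max(global_norm, clip_norm), where the global norm is the L2 norm over all gradient elements combined. A framework-agnostic NumPy sketch (the helper name is hypothetical, not Paddle's implementation):

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    # Global norm = sqrt of the sum of squares over all elements of all grads.
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # Scale only takes effect when global_norm exceeds clip_norm.
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

grads = [np.array([3.0, 4.0]), np.array([12.0])]   # global norm = sqrt(9+16+144) = 13
clipped, norm = clip_by_global_norm(grads, clip_norm=6.5)
# After clipping, the global norm of `clipped` equals 6.5 (scale = 0.5).
```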

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

zhiqiu
zhiqiu previously approved these changes Dec 27, 2021
Contributor


LGTM

if g.dtype == core.VarDesc.VarType.FP16 else clip_var)
new_grad = layers.elementwise_mul(x=g, y=clip_input)
params_and_grads.append((p, new_grad))
if global_norm_var > max_global_norm:
Contributor


Suggestion: evaluate global_norm_var > max_global_norm once, above the loop, as a bool flag, so the compare OP does not have to be run repeatedly inside the loop.
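The suggestion above can be sketched as follows: hoist the comparison out of the per-parameter loop so it is evaluated once rather than once per gradient. This is a minimal NumPy illustration of the idea (hypothetical helper, not the actual Paddle code path):

```python
import numpy as np

def clip_grads(grads, max_norm):
    # Global norm across all gradients.
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # Hoisted comparison: one Python bool, computed once,
    # instead of a compare OP launched for every gradient in the loop.
    need_clip = bool(global_norm > max_norm)
    if not need_clip:
        return grads
    scale = max_norm / global_norm
    return [g * scale for g in grads]
```

With the flag computed up front, the loop body reduces to an unconditional multiply (or no work at all), which is where the measured speedup comes from.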

Contributor Author


Thanks, done!

@zhangbo9674 zhangbo9674 merged commit 65f7fa0 into PaddlePaddle:develop Dec 27, 2021
@zhangbo9674 zhangbo9674 deleted the dev/clip_by_global_norm branch March 2, 2023 02:58