Fix set value grad #59034

zoooo0820 · 2023-11-15T10:40:29Z

PR types

Bug fixes

PR changes

OPs

Description

Pcard-66985

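Previously, to support the distributed dynamic semi-automatic parallel mode, the set_value operator and its backward operator were migrated to phi. Before that migration, set_value handled both the scalar-value and tensor-value cases, chosen by whether the input ValueTensor was present. Its backward operator, set_value_grad, worked the same way.

phi requires the two cases to be separate operators: set_value (value is a scalar) and set_value_with_tensor (value is a tensor), so the backward operators must be separated as well. In the legacy operator definition under fluid/operator, which predates phi, the backward behavior of the scalar case was incorrectly set to assign, so the phi migration in #58893 was based on an incorrect reference. As a result, when value is a scalar, the backward result of the assignment currently does not match expectations and needs to be fixed.
This PR adds a new kernel, set_value_with_scalar_grad, to compute this case in place of the previous incorrect assign behavior. Underneath, it still reuses SetValueGradImpl.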
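To make the difference concrete, here is a minimal 1-D sketch of the two backward behaviors described above. It assumes the usual gradient semantics for a scalar slice assignment: positions overwritten by the scalar no longer depend on the input, so their input gradient is zero. The helper names `ScalarSetValueInputGrad` and `AssignGrad` are hypothetical and are not Paddle APIs; the real kernel goes through SetValueGradImpl and handles multi-dimensional slices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Paddle API): input gradient for
//   out = x; out[start:end] = scalar;
// The overwritten slice does not depend on x, so its gradient is zeroed.
std::vector<float> ScalarSetValueInputGrad(const std::vector<float>& out_grad,
                                           std::size_t start,
                                           std::size_t end) {
  assert(start <= end && end <= out_grad.size());
  std::vector<float> x_grad = out_grad;  // start from the upstream gradient
  for (std::size_t i = start; i < end; ++i) {
    x_grad[i] = 0.0f;  // overwritten positions contribute nothing to x
  }
  return x_grad;
}

// Hypothetical helper: the previous "assign" behavior, which forwards the
// upstream gradient unchanged and therefore keeps the overwritten slice.
std::vector<float> AssignGrad(const std::vector<float>& out_grad) {
  return out_grad;
}
```

With an upstream gradient of all ones and a slice covering the two middle elements of a four-element tensor, the first helper zeroes the two middle entries while the second leaves them at one, which is the mismatch this PR addresses.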

wanghuancoder

This change also needs a review from Yongkang. With this change, does SetValueGradOpTranscriber in paddle/fluid/ir_adaptor/translator/op_translator.cc need to be adjusted?

wanghuancoder · 2023-11-16T03:06:18Z

paddle/phi/kernels/impl/set_value_grad_kernel_impl.h

+  switch (rank) {
+    case 1:
+      SetValueGradImpl<T, Context, 1>(dev_ctx,


Could this just call SetValueGradKernel directly, passing nullptr for value_grad?
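For readers unfamiliar with the pattern in the quoted diff, the `switch (rank)` maps a runtime tensor rank onto a compile-time template argument. The sketch below shows only that general technique. `GradImplForRank` is a hypothetical stand-in for SetValueGradImpl, and the supported range of 1 to 6 is an assumption made for the sketch, not something stated in this PR.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a rank-templated implementation such as
// SetValueGradImpl<T, Context, Rank>; here it only reports its rank.
template <int Rank>
int GradImplForRank() {
  return Rank;
}

// Map a runtime rank to the matching template instantiation.
// The range 1..6 is an assumption for this sketch.
int DispatchByRank(int rank) {
  switch (rank) {
    case 1: return GradImplForRank<1>();
    case 2: return GradImplForRank<2>();
    case 3: return GradImplForRank<3>();
    case 4: return GradImplForRank<4>();
    case 5: return GradImplForRank<5>();
    case 6: return GradImplForRank<6>();
    default:
      throw std::invalid_argument("unsupported rank in this sketch");
  }
}
```

The reviewer's suggestion would avoid repeating such a switch in the new kernel by delegating to the existing entry point, which already performs the dispatch.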

wanghuancoder · 2023-11-16T03:07:59Z

paddle/fluid/operators/set_value_op.cc

+    op->SetType("set_value_grad");
+    op->SetInput(framework::GradVarName("Out"), this->OutputGrad("Out"));


This needs discussion: when a ValueTensor is present, call set_value_grad; when it is not, call set_value_with_scalar_grad.
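The selection rule proposed in this comment can be sketched as a small standalone function. `SelectSetValueGradOpType` is a hypothetical name used only for illustration. In the actual grad op maker the choice would be made from whether the ValueTensor input exists, and the later commit "always has input valuetensor" suggests the merged code may differ from this sketch.

```cpp
#include <string>

// Hypothetical helper (not part of Paddle): choose the backward op type
// from whether the forward op received a ValueTensor input.
std::string SelectSetValueGradOpType(bool has_value_tensor) {
  // Tensor value: the value itself needs a gradient, so use set_value_grad.
  // Scalar value: only the input needs a gradient.
  return has_value_tensor ? "set_value_grad" : "set_value_with_scalar_grad";
}
```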

kangguangli · 2023-11-20T03:30:50Z

paddle/fluid/operators/set_value_op.cc

-      op->SetType("assign");
-      op->SetInput("X", this->OutputGrad("Out"));
-      op->SetOutput("Out", this->InputGrad("Input"));
+      op->SetType("set_value_with_scalar_grad");


This line still does not appear to be covered. Does a new Op, set_value_with_scalar_grad, need to be registered?

paddle-ci-bot · 2023-11-26T03:16:27Z

Sorry to inform you that 6cc1f71's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

kangguangli

LGTM

jeff41404

LGTM

* first fix the UT * fix set value grad * polish code * add static mode backward test * always has input valuetensor * add dygraph test

* Fix set value grad (#59034) * first fix the UT * fix set value grad * polish code * add static mode backward test * always has input valuetensor * add dygraph test * Fix shape error in combined-indexing setitem (#60447) * add ut * fix shape error in combine-indexing * fix ut * Set value with scalar (#60452) * set_value with scalar * fix ut * remove test_pir * remove one test since 2.6 not support uint8-add

zoooo0820 added 2 commits November 15, 2023 08:43

first fix the UT

29eaaa1

fix set value grad

8a5f39f

zoooo0820 force-pushed the fix_set_value_grad branch from 0bc76a7 to 8a5f39f Compare November 15, 2023 10:42

wanghuancoder reviewed Nov 16, 2023

zoooo0820 added 2 commits November 16, 2023 12:52

polish code

c833770

add static mode backward test

6cc1f71

kangguangli reviewed Nov 20, 2023

zoooo0820 added 3 commits December 20, 2023 06:46

Merge branch 'develop' into fix_set_value_grad

6dae6f7

always has input valuetensor

5005daf

add dygraph test

f24fd45

kangguangli approved these changes Dec 21, 2023

jeff41404 approved these changes Dec 25, 2023

jeff41404 merged commit 85e3693 into PaddlePaddle:develop Dec 25, 2023
29 checks passed

zoooo0820 deleted the fix_set_value_grad branch December 25, 2023 07:11

Wanglongzhi2001 pushed a commit to Wanglongzhi2001/Paddle that referenced this pull request Jan 7, 2024

Fix set value grad (PaddlePaddle#59034)

0e425c9

* first fix the UT * fix set value grad * polish code * add static mode backward test * always has input valuetensor * add dygraph test

zoooo0820 added a commit to zoooo0820/Paddle that referenced this pull request Jan 18, 2024

Fix set value grad (PaddlePaddle#59034)

501e520

* first fix the UT * fix set value grad * polish code * add static mode backward test * always has input valuetensor * add dygraph test

zoooo0820 mentioned this pull request Jan 18, 2024

[Cherry-pick] fix set_value with scalar grad #60930

Merged
