Balance Allocation for Multiple Cards On Device KVStore #252
Comments
This could be due to the memory allocation policy used by the KVStore under kvstore_type = 'device'; I suspect it should not happen with kvstore_type = 'local'. With the device-type KVStore we need to allocate temporary reduction memory on each device, and we currently do this by random assignment (here: https://github.com/dmlc/mxnet/blob/master/src/kvstore/kvstore_device.h#L36) to balance the temporary weight memory across devices. If the weights are not uniformly sized (e.g. one weight is a particularly big chunk), this can cause the imbalance.
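A minimal sketch of the random placement described above (illustrative only, not the actual code in kvstore_device.h; the function name and signature are made up for this example): each key's reduction buffer is placed on a device chosen at random, so the per-device totals depend on how the large keys happen to land.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Returns, for each key, the index of the device that holds its temporary
// reduction buffer. `sizes[i]` is the size of key i; placement is random,
// so large keys can pile up on a single device.
std::vector<size_t> RandomAssign(const std::vector<size_t>& sizes,
                                 size_t num_devices, unsigned seed = 0) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<size_t> pick(0, num_devices - 1);
  std::vector<size_t> device_of_key(sizes.size());
  for (size_t i = 0; i < sizes.size(); ++i) {
    device_of_key[i] = pick(rng);
  }
  return device_of_key;
}
```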
Yes, I am using kvstore=device. Could a more deterministic allocation strategy be added? For example, go through the weights greedily from largest to smallest and always place the next one on the device that currently holds the least weight. I think many people like to fill GPU memory as fully as possible, and with random assignment there is the worry of occasionally running out of memory. Even if the assignment stays random, it would be good to at least have a default seed for the memory allocation, so memory usage does not change from run to run :)
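A minimal sketch of the deterministic greedy strategy proposed above (largest-first bin packing): sort keys by size in descending order and always place the next key on the device with the smallest total so far. This is an illustration under assumed names, not a patch to kvstore_device.h.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Deterministic assignment: largest keys first, each to the currently
// least-loaded device. `sizes[i]` is the size of key i.
std::vector<size_t> GreedyAssign(const std::vector<size_t>& sizes,
                                 size_t num_devices) {
  std::vector<size_t> order(sizes.size());
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(),
            [&](size_t a, size_t b) { return sizes[a] > sizes[b]; });

  std::vector<size_t> load(num_devices, 0);        // total size per device
  std::vector<size_t> device_of_key(sizes.size());
  for (size_t key : order) {
    size_t dev = std::min_element(load.begin(), load.end()) - load.begin();
    device_of_key[key] = dev;
    load[dev] += sizes[key];
  }
  return device_of_key;                            // same result every run
}
```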
This seems like a good idea. The allocation strategy code is here: https://github.com/dmlc/mxnet/blob/master/src/kvstore/kvstore_device.h#L36 Maybe you could hack it a bit and contribute the change back :)?
Yes, I will.
Another possible way: do random assignment if the size < bigarray_bound_; otherwise, evenly split the array into num_dev parts and assign one part to each device.
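A minimal sketch of the splitting half of that idea, assuming a key whose size is at or above bigarray_bound_: the array is cut into num_devices contiguous parts, one per device. The `Slice` struct and function name are made up for illustration and are not MXNet API.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Slice { size_t begin, end, device; };

// Evenly split a big key of `size` elements into one contiguous slice per
// device; the last slice may be shorter when size is not divisible.
std::vector<Slice> SplitBigKey(size_t size, size_t num_devices) {
  std::vector<Slice> slices;
  size_t step = (size + num_devices - 1) / num_devices;  // ceil division
  for (size_t d = 0; d < num_devices; ++d) {
    size_t b = d * step;
    size_t e = std::min(size, b + step);
    if (b < e) slices.push_back({b, e, d});
  }
  return slices;
}
```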
This dep checking of slice might block other slices, as we do not
I am training an 11-layer VGGNet with a total batch size of 36 on three GTX 780 GPUs, conv workspace set to 256, CUDA 7.0, and without cuDNN (which I felt would avoid some nondeterminism in memory allocation). I see GPU memory usage of 2795 MB, 2661 MB, and 2383 MB. Where does this decreasing gap of 100+ MB between cards come from? I would like the difference to be smaller so GPU memory is used more efficiently: since the cards are identical, usable memory is limited by the GPU with the highest usage, and when the batch size is too small my data sometimes does not converge :(