
Multiple GPU Support #26

Closed
GraV1337y opened this issue Dec 8, 2021 · 2 comments

GraV1337y (Contributor) commented Dec 8, 2021

Description
We follow this idea and spawn one learner on each available GPU. Then we use Torch DDP to average the gradients like we already do when using multiple nodes.

For starters, we are not using policy workers on multiple GPUs, as suggested here:

To take full advantage of this, we also need to support policy workers on multiple GPUs. This requires exchanging the parameter vectors between learner and policy worker through CPU memory, rather than shared GPU memory. This can be a step 1 of the implementation.

The only advantage of this would be to save some memory (and maybe some time transferring the gradients through DDP), by not storing the model multiple times per node.

Tasks

  • Add host list as parameter for multi-sample-factory
  • Initialize num_policies learner workers on each GPU
  • Initialize policy_workers_per_policy * num_policies policy workers on each GPU
  • Initialize one SharedBuffer per GPU
  • Add each GPU separately to the DDP host list
  • Assign actor workers a fixed SharedBuffer to write to
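The layout in the tasks above ultimately reduces to DDP-style gradient averaging across the per-GPU learners. A minimal pure-Python sketch of the all-reduce-mean step (the function name is hypothetical, not MSF code; DDP performs the same reduction with NCCL all-reduce):

```python
def allreduce_mean_gradients(grads_per_learner):
    """Average gradients element-wise across learners (one per GPU).

    grads_per_learner: list of gradient vectors (lists of floats),
    one entry per learner process. This only illustrates the math
    that DDP's all-reduce performs on the actual gradient tensors.
    """
    num_learners = len(grads_per_learner)
    return [sum(column) / num_learners
            for column in zip(*grads_per_learner)]
```

For example, two learners with gradients [1.0, 2.0] and [3.0, 4.0] yield the averaged gradient [2.0, 3.0], which every learner then applies identically.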
@GraV1337y GraV1337y added the enhancement New feature or request label Dec 8, 2021
KonstantinRamthun (Member) commented
With commit 9012399 we get a deadlock, because the learners are initialized synchronously: learner 0 waits for the DDP initialization of the following learners, which have not been started yet.

Log output

[2021-12-14 10:52:34,672][17487] Default env families supported: ['doom_*', 'atari_*', 'dmlab_*', 'mujoco_*', 'MiniGrid*', 'unity_*']
[2021-12-14 10:52:34,671][32665] Default env families supported: ['doom_*', 'atari_*', 'dmlab_*', 'mujoco_*', 'MiniGrid*', 'unity_*']
[2021-12-14 10:52:35,366][17487] Env registry entry created: unity_
[2021-12-14 10:52:35,366][32665] Env registry entry created: unity_
[2021-12-14 10:52:35,470][17487] Saved parameter configuration for experiment saving_training_iss26 not found!
[2021-12-14 10:52:35,470][17487] Starting experiment from scratch!
[2021-12-14 10:52:35,470][32665] Saved parameter configuration for experiment saving_training_iss26 not found!
[2021-12-14 10:52:35,470][32665] Starting experiment from scratch!
[2021-12-14 10:52:37,505][32665] Queried available GPUs: 0,1
[2021-12-14 10:52:37,505][17487] Queried available GPUs: 0,1
[INFO] Connected to Unity environment with package version 2.0.0-pre.3 and communication version 1.5.0
[INFO] Connected to Unity environment with package version 2.0.0-pre.3 and communication version 1.5.0
[INFO] Connected new brain: GoalKeeping?team=0
[INFO] Connected new brain: GoalKeeping?team=0
[WARNING] The environment contains multiple observations. You must define allow_multiple_obs=True to receive them all. Otherwise, only the first visual observation (or vector observation if there are no visual observations) will be provi$
[WARNING] The environment contains multiple observations. You must define allow_multiple_obs=True to receive them all. Otherwise, only the first visual observation (or vector observation if there are no visual observations) will be provi$
/work/grudelpg/envs/multi-sample-factory-env/lib/python3.9/site-packages/gym/logger.py:34: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize("%s: %s" % ("WARN", msg % args), "yellow"))
/work/grudelpg/envs/multi-sample-factory-env/lib/python3.9/site-packages/gym/logger.py:34: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize("%s: %s" % ("WARN", msg % args), "yellow"))
[2021-12-14 10:52:43,084][17487] Using a total of 240 trajectory buffers
[2021-12-14 10:52:43,085][17487] Allocating shared memory for trajectories
[2021-12-14 10:52:43,085][32665] Using a total of 240 trajectory buffers
[2021-12-14 10:52:43,085][32665] Allocating shared memory for trajectories
[2021-12-14 10:52:44,738][32665] Initializing learners...
[2021-12-14 10:52:44,739][32665] Initializing the learner 0 for policy 0
[2021-12-14 10:52:44,752][00322] Set environment var CUDA_VISIBLE_DEVICES to '0' for learner process 0
[2021-12-14 10:52:44,770][17487] Initializing learners...
[2021-12-14 10:52:44,771][17487] Initializing the learner 0 for policy 0
[2021-12-14 10:52:44,783][17610] Set environment var CUDA_VISIBLE_DEVICES to '0' for learner process 0
[2021-12-14 10:52:44,790][00322] Visible devices: 1
[2021-12-14 10:52:44,793][00322] Starting seed is not provided
[2021-12-14 10:52:44,820][17610] Visible devices: 1
[2021-12-14 10:52:44,822][17610] Starting seed is not provided
[2021-12-14 10:52:45,871][00322] Waiting for the learner to initialize...
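This deadlock is the classic rendezvous problem: DDP's process-group initialization blocks until every rank has joined, so initializing learner 0 and waiting on it before launching the remaining learners can never complete. A toy illustration of the fix using a threading.Barrier as a stand-in for the DDP rendezvous (all names here are hypothetical, not MSF code):

```python
import threading

def init_learners(num_learners):
    """Launch all learner threads before waiting on any of them.

    The Barrier stands in for torch.distributed.init_process_group,
    which blocks until every rank has called it. The buggy pattern
    (start learner 0, wait for it, then start learner 1) would hang
    forever at the first rendezvous.wait().
    """
    rendezvous = threading.Barrier(num_learners)
    ready = []

    def learner(rank):
        rendezvous.wait(timeout=5)  # every rank must arrive here
        ready.append(rank)

    threads = [threading.Thread(target=learner, args=(r,))
               for r in range(num_learners)]
    for t in threads:  # launch everything first...
        t.start()
    for t in threads:  # ...only then wait for initialization
        t.join()
    return sorted(ready)
```

Applied to the learners, this means spawning all learner processes asynchronously and only afterwards waiting for their DDP initialization to finish.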

KonstantinRamthun (Member) commented

Currently our models are all initialized randomly. As stated here, torch DDP

performs an all-reduce step on gradients and assumes that they will be modified by the optimizer in all processes in the same way.

We need a method that initializes the models deterministically, because we don't want full models to be sent from one node to another. Deterministic initializers include e.g. torch.nn.init.constant_, torch.nn.init.ones_, torch.nn.init.zeros_, and torch.nn.init.eye_. To fix this, we could do one of the following:

  1. Add an additional option for the --policy_initialization parameter that initializes models deterministically. The disadvantage: users of MSF must set this option by hand when using multiple nodes, which is easy to forget.
  2. Check in this method whether --num_policies is greater than 1 and --with_pbt is false. If so, apply deterministic initialization automatically.
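Option 2 could look roughly like the following check (a sketch only; the argument names mirror the existing CLI flags, but the helper itself is hypothetical):

```python
def resolve_policy_initialization(policy_initialization, num_policies, with_pbt):
    """Force a deterministic initialization scheme when gradients will be
    averaged across processes via DDP, so all replicas start from identical
    weights without broadcasting the model between nodes.

    Mirrors option 2 above: override the user's choice only when
    num_policies > 1 and PBT is disabled.
    """
    if num_policies > 1 and not with_pbt:
        # e.g. backed by torch.nn.init.constant_ / zeros_ on the model
        return "deterministic"
    return policy_initialization
```

Users would then never need to remember to set the option by hand, which avoids the drawback of option 1.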
