Question about multiple CPU usage. #137
We use Ray Tune for running all the trials/experiments, meaning that all the resources configurable through the command line correspond to Tune's resources. You can basically configure two things: 1) what resources are available to the Tune runner, and 2) what resources are required for each trial to run. These resources are not hard constraints; they are only used for scheduling purposes. I.e., if your machine has 40 CPUs available but you tell Tune that fewer are available, it will only schedule trials onto that smaller budget. For example, if you have a machine with 24 CPUs and 8 GPUs, then requesting 6 CPUs and 2 GPUs per trial would let Tune run up to 4 trials in parallel.
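To make the scheduling arithmetic concrete, here is a tiny self-contained sketch (not code from this repo; the numbers just mirror the example above) of how the per-trial request bounds the number of concurrent trials:

```python
# Minimal illustration of Tune-style scheduling: the per-trial request
# bounds how many trials can run at the same time.
machine = {"cpu": 24, "gpu": 8}     # what the runner believes is available
per_trial = {"cpu": 6, "gpu": 2}    # what each trial asks for

max_concurrent = min(
    machine["cpu"] // per_trial["cpu"],  # 24 // 6 = 4
    machine["gpu"] // per_trial["gpu"],  # 8 // 2 = 4
)
print(max_concurrent)  # -> 4 trials can run in parallel
```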
By default, I'd say you only ever want to configure the trial resources (2 above) and not the machine resources themselves, since Ray automatically determines the machine resources. The machine resources (1 above) are passed in at softlearning/examples/instrument.py, line 233 in 0596f68.
The trial resources (2 above) are passed into tune.run in softlearning/examples/instrument.py, lines 238 to 244 in 0596f68.
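For reference, here is a minimal sketch of that split using Ray's generic API; this is not the repo's actual code, and `my_trainable` and the numbers are placeholders:

```python
import ray
from ray import tune


def my_trainable(config):
    # Placeholder trainable; in softlearning this would be the actual
    # experiment runner constructed in examples/instrument.py.
    pass


# 1) Machine resources: tell Ray what the runner may use in total.
#    If omitted, Ray detects the machine's CPUs/GPUs automatically.
ray.init(num_cpus=24, num_gpus=8)

# 2) Trial resources: tell Tune what each trial needs. With the numbers
#    below, up to 24 // 6 = 4 trials run concurrently.
tune.run(
    my_trainable,
    num_samples=4,
    resources_per_trial={"cpu": 6, "gpu": 2},
)
```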
I think there should be more than one trial created if you set the trial resources correctly; make sure you're running with those settings. Also note that even though we currently use only one environment per trial for sampling, all the numerical frameworks (i.e. NumPy and TensorFlow) can still make use of multiple CPUs within a trial. Did that answer your questions? Let me know if any of that is still unclear.
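As a side note on the point about numerical frameworks, here is a hedged sketch of how the CPU threads used inside a single process can be capped; these environment variables and the TensorFlow 2.x threading calls are generic mechanisms, not softlearning-specific options:

```python
import os

# Cap the thread pools used by NumPy/BLAS backends; these must be set
# before numpy is imported for them to take effect.
os.environ["OMP_NUM_THREADS"] = "2"
os.environ["MKL_NUM_THREADS"] = "2"

import numpy as np  # imported after the env vars above
import tensorflow as tf  # assuming TensorFlow 2.x

# Cap TensorFlow's own CPU thread pools for this process; call these
# before any ops have been executed.
tf.config.threading.set_intra_op_parallelism_threads(2)
tf.config.threading.set_inter_op_parallelism_threads(2)
```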
Thanks! So just so I'm understanding this right: even if I have 4 trials, only one environment is created, so the different trials just speed up the numpy and tensorflow calculations. In that case, say I have 12 CPUs and 4 GPUs, is there a difference between using 4 trials each with 3 CPUs and 1 GPU versus using 1 trial with all 12 CPUs and 4 GPUs? Also, you said the Tune runner will try to use all the available resources to run experiments when the --cpu --gpu flags are not given. What if I don't want to consume all the resources on a machine, because I'm sharing it with others or want to run multiple experiments? Which settings should I use then?
Not exactly. We still create 1 environment for each trial. That is, each trial is completely independent of other trials (unless you use some fancier hyperparameter tuning).
Yeah, there's a difference here. Imagine you sweep over four different hyperparameter values: with four trials, each configuration runs as its own independent experiment in parallel, whereas a single trial with all the resources only ever runs one configuration at a time.

Here are some very rough estimates of how I allocate my resources. For runs that use low-level state (i.e. not vision observations), I typically run 3-6 trials per GPU, as long as each trial has at least 1 or 2 CPUs. If you have no GPUs available, you need more than 1 CPU per trial, the optimum probably being somewhere around 4. For vision-based experiments, it really depends on your GPU, image size, and convnet size. In some cases you can just run as many trials as you can fit in GPU memory, but typically I run something like 2-3 trials per GTX 1080 with 64x64 images and a convnet of a couple of layers. These numbers aim to minimize the cost per trial. Obviously, if you want to maximize the speed of a single trial without caring about the cost, you'd just allocate all your resources to one trial at a time :)
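If it helps, one way to write those rough rules of thumb down is as illustrative Ray Tune per-trial resource requests; fractional GPU values are how Tune lets several trials share one GPU, and the exact numbers here are only suggestions:

```python
# The rough rules of thumb above, written as illustrative per-trial
# resource requests (fractional GPU values let several trials share one GPU).
STATE_BASED = {"cpu": 2, "gpu": 0.25}   # roughly 4 trials sharing each GPU
VISION_64x64 = {"cpu": 2, "gpu": 0.5}   # roughly 2 trials per GTX 1080
CPU_ONLY = {"cpu": 4, "gpu": 0}         # about 4 CPUs per trial, no GPU

# e.g. tune.run(my_trainable, resources_per_trial=STATE_BASED, num_samples=8)
```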
What's the proper way of limiting CPU (or GPU) usage? I tried setting --cpus 6, --trial-cpus 6, or both, but all of them seem to use all 12 of my CPUs. Also, from my understanding, softlearning only ever creates 1 trial and only one environment, so why would more than 1 CPU ever be needed?