Summary of the question (problem) I had and the answer I got:
I was trying to run the Hydra ax-sweeper for hyperparameter optimization on the MineRL environment dataset. This dataset is quite big, around 30 GB+ when fully loaded into RAM, and I used multiprocessing to speed up the loading.
After about 2-3 runs the code freezes at the data loading phase. When I check the system's memory usage it is maxed out (90 GB RAM + swap), so nothing more can be loaded. My question was about how memory is allocated and deallocated between ax-sweeper runs. My suspicion is that it does not work the way I initially thought (that is, I assumed main(cfg: DictConfig) decorated with @hydra.main(...) runs to termination for each sweep, sequentially).
The answer I got was:
The hydra-ax-sweeper may run trials in parallel, depending on the result of calling the get_max_parallelism function defined in ax.service.ax_client. I suspect that your machine is running out of memory because of this parallelism.
Hydra's Ax plugin does not currently have a config group for configuring this max_parallelism setting, so it is automatically set by ax.
The answer also suggested a quick workaround: move the data loading step outside main(). That is doable, of course, but I load the data based on the config file parameters, so I would have to move all of that outside the entire Hydra pipeline.
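One possible middle ground, sketched here under assumptions rather than as a confirmed fix: keep the load inside main() but cache it on the config values that select the data. The names load_dataset and run_trial and the keys env_name and max_samples are hypothetical stand-ins. This can only help if the trials run inside the same Python process; it does nothing if each trial gets its own process.

```python
from functools import lru_cache


@lru_cache(maxsize=1)  # keep at most one dataset alive at a time
def load_dataset(env_name: str, max_samples: int) -> tuple:
    """Hypothetical loader, keyed only on the config values that select the data."""
    # Stand-in for the expensive multi-GB load.
    return tuple(f"{env_name}-{i}" for i in range(max_samples))


def run_trial(cfg: dict) -> int:
    """Stand-in for the body of main(cfg).

    Only plain hashable values are passed to the cached loader, because
    lru_cache needs hashable arguments and a config object may not be hashable.
    """
    data = load_dataset(cfg["env_name"], cfg["max_samples"])
    return len(data)
```

Two trials that differ only in, say, a learning rate would then reuse the same loaded data, while a trial with different data settings evicts the old dataset instead of stacking a second copy on top of it.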
Motivation
Being able to disable the parallelism between sweeper runs might make things easier for some users; I'm fairly certain that parallel execution by default is not what most people expect.
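To illustrate what I mean by limiting the parallelism, here is a minimal sketch of capping how many trials go out at once. This is not Hydra's or Ax's actual code; batch_trials and max_parallelism are hypothetical names used only for illustration.

```python
from typing import Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def batch_trials(
    trials: Sequence[T], max_parallelism: Optional[int] = None
) -> Iterator[List[T]]:
    """Split suggested trials into batches of at most max_parallelism items.

    None means "no cap": all trials are returned as a single batch.
    """
    if max_parallelism is not None and max_parallelism < 1:
        raise ValueError("max_parallelism must be at least 1 or None")
    size = max_parallelism or max(len(trials), 1)
    for start in range(0, len(trials), size):
        yield list(trials[start:start + size])
```

With max_parallelism=1 every trial ends up in its own batch, which is the strictly sequential behaviour I originally assumed.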
Additional context
I'm pretty new to how Ax works, so I might be misunderstanding things here; if that's the case, I'm sorry for bothering.
Hi @jieru-hu, sorry for the late reply on this. The project I was working on ended (abruptly).
But I got a similar(?) issue, probably related to parallelism in the ax-sweeper, and I tried the above changes to no avail.
Maybe I should open a new issue for that problem, but in short: inside the @hydra.main() run I was also using multiprocessing.Pool, and somehow all of the subprocesses froze at some arbitrary data_as_tensor = torch.as_tensor(numpy_data) call (one torch.as_tensor() call worked, but the second call just froze).
The problem went away if I either used the base sweeper (no parallelism) or changed multiprocess.Pool to multiprocess.pool.ThreadPool (no parallelism) while using ax-sweeper. I tried the original code (using the multiprocess library) with the solution you mentioned, but the freezing still happened.
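For anyone hitting the same freeze, the swap looks roughly like this. It is a minimal sketch using the standard library's multiprocessing.pool.ThreadPool; load_chunk is a hypothetical stand-in for my real loading function, and the torch.as_tensor() call is left out so the example runs on its own.

```python
from multiprocessing.pool import ThreadPool
from typing import List, Sequence


def load_chunk(chunk_id: int) -> List[int]:
    """Hypothetical stand-in for loading one piece of the dataset."""
    return [chunk_id * 10 + i for i in range(3)]


def load_all(chunk_ids: Sequence[int], workers: int = 4) -> List[List[int]]:
    """Load all chunks with worker threads instead of worker processes.

    ThreadPool has the same map() interface as Pool, so this is a drop-in
    swap, but the workers share one process and no child processes are created.
    """
    with ThreadPool(processes=workers) as pool:
        return pool.map(load_chunk, chunk_ids)
```

The trade-off is that threads share the interpreter, so pure-Python CPU-bound loading will not speed up the way it does with separate processes.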
My reply is late, but thank you for the prompt answer!
(For now I'll just move my code to ThreadPool, so I guess this issue can be closed.)
🚀 Feature Request
This feature request stems from a question I asked on Stack Overflow about a memory-leak-like pattern I saw in my code.