I've been trying to make the StableBaselinesAgent (PR #148) compatible with additive fits, but I ran into some issues:
In the _fit_worker auxiliary function, we reseed external libraries. I believe this is done to guarantee reproducibility in distributed training. However, calling .fit(X) twice will not give the same result as a single .fit(2X), because the seed is reset halfway through training. Here is the code.
In the load method of AgentHandlers, we reseed the environment after loading the agent, which causes similar issues. I also noticed that the handler's seed is used to reseed the environment, which is different from the seed that was originally used. Here is the code.
I would love to know your opinions on the matter!
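To make the first issue concrete, here is a minimal sketch using only the standard library. `ToyAgent` and its `reseed` method are hypothetical stand-ins, not rlberry or Stable Baselines code; "training" just records the random numbers consumed, so the effect of resetting the seed between two fits is easy to see.

```python
import random


class ToyAgent:
    """Hypothetical agent whose 'training' only records its random draws."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.history = []

    def fit(self, budget):
        # Each training step consumes one number from the agent's RNG.
        for _ in range(budget):
            self.history.append(self.rng.random())

    def reseed(self, seed):
        # Mimics an external reseed happening between two fit() calls.
        self.rng.seed(seed)


# Reference: a single fit with budget 2X.
single = ToyAgent(seed=42)
single.fit(20)

# Additive case: two fits of X, with the RNG state carried over.
additive = ToyAgent(seed=42)
additive.fit(10)
additive.fit(10)

# Reseeded case: the RNG is reset between the two fits.
reseeded = ToyAgent(seed=42)
reseeded.fit(10)
reseeded.reseed(42)
reseeded.fit(10)

print(additive.history == single.history)  # additive fits match a single fit
print(reseeded.history == single.history)  # reseeding breaks the match
print(reseeded.history[:10] == reseeded.history[10:])  # second fit replays the first
```

In this toy setting the reseeded agent simply replays its first ten draws, which is the kind of divergence from a single .fit(2X) described above.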
Maybe we could modify the rlberry Seeder so that it accepts a PyTorch generator as a seed_seq.
I looked into torch's RNG, and it doesn't seem compatible with anything but itself (it can't import a NumPy RNG, for instance), so I don't think it would be easy to reseed the torch generator in the manager. It would be better to import the torch generator as an rlberry Seeder.
> when doing two fits .fit(X), the result will not be the same as doing a single .fit(2X) because the seed will be reset halfway throughout training.
I think it's OK to have this behavior in AgentManager only, as long as the whole pipeline (parameters -> manager -> outputs) is reproducible. I believe it's important to enforce the additive property of fit() only at the Agent level, to make sure that the optimization done by AgentManager.optimize_hyperparams makes sense when fit_fraction < 1 (that is, when fit() is called several times to evaluate hyperparameters).
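As a rough illustration of this distinction, the sketch below uses a hypothetical `run_pipeline` function (not the actual AgentManager code) that draws a fresh seed before every fit. The whole pipeline is reproducible from the manager seed, even though two fits of X do not match one fit of 2X.

```python
import random


def run_pipeline(manager_seed, n_fits, budget):
    """Hypothetical manager loop that reseeds the worker before every fit."""
    seeder = random.Random(manager_seed)
    outputs = []
    for _ in range(n_fits):
        # Derive a fresh worker seed from the manager's own RNG.
        worker_rng = random.Random(seeder.getrandbits(32))
        outputs.append([worker_rng.random() for _ in range(budget)])
    return outputs


# Same parameters give the same outputs: the pipeline is reproducible.
run_a = run_pipeline(manager_seed=0, n_fits=2, budget=5)
run_b = run_pipeline(manager_seed=0, n_fits=2, budget=5)
print(run_a == run_b)

# Two fits of X are not the same as one fit of 2X at this level.
split = run_a[0] + run_a[1]
single = run_pipeline(manager_seed=0, n_fits=1, budget=10)[0]
print(split == single)
```

Under these assumptions, reproducibility and additivity are separate properties: the first holds at the manager level, while the second would have to be guaranteed by the agent's own fit().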