Hi,
I've read the discussion in #15 and #50, but I still don't understand why the current sharedRmsprop implementation avoids thread races. In fact, the code still occasionally outputs NaN on my machine unless I set the number of threads to one. By tracking the error, I can tell it is due to these two lines:
as state.tmp can become zero while it is used as a divisor in the rest of the code. I guess the zeros come from state.tmp:copy(state.g) in another thread, where state.g happens to include 0s. Meanwhile, by changing them to
the error seems to disappear.
Is my modification reasonable, or do I just have to update OpenBLAS or something?
The async mode is not "thread safe" in the classical sense at all; it just happens to work.
I didn't see NaNs while running long experiments with the latest code. Make sure that Torch is really using OpenBLAS (e.g. look at the loaded libraries); this solved #50.
Are you getting NaNs when running an Atari environment?
Your modification looks sensible, and if you're really getting NaNs with OpenBLAS and your modification fixes it, then put in a PR.
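One way to check which BLAS the running Torch process actually loaded, on Linux, is to read that process's memory map in /proc/<pid>/maps. The following is a minimal sketch under that assumption. It is Linux-only, and the helper name `blas_libraries` and script name `blas_check.py` are hypothetical. It lists mapped files whose names contain "blas". A libopenblas entry suggests OpenBLAS is in use, while seeing only a reference libblas entry suggests it is not. BLAS builds whose file names lack "blas" (MKL, for example) will not be reported.

```python
import sys


def blas_libraries(maps_text):
    """Return sorted paths of mapped files whose file name contains 'blas'.

    `maps_text` is the content of /proc/<pid>/maps, whose columns are
    address, perms, offset, dev, inode and an optional pathname.
    """
    libs = set()
    for line in maps_text.splitlines():
        parts = line.split(None, 5)
        if len(parts) == 6:  # anonymous mappings have no pathname column
            path = parts[5].strip()
            if "blas" in path.rsplit("/", 1)[-1].lower():
                libs.add(path)
    return sorted(libs)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 blas_check.py <pid of the running Torch process>
    with open(f"/proc/{sys.argv[1]}/maps") as maps:
        for lib in blas_libraries(maps.read()):
            print(lib)
```

Run it against the PID of the training process (e.g. the `th` process) while it is running.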