Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why is the current sharedRmsprop thread safe? #59

Open
pengsun opened this issue Nov 16, 2016 · 2 comments
Open

Why is the current sharedRmsprop thread safe? #59

pengsun opened this issue Nov 16, 2016 · 2 comments

Comments

@pengsun
Copy link

pengsun commented Nov 16, 2016

Hi,

I've read the discussion per #15 and #50, but I still don't understand why the current sharedRmsprop impl avoids thread racing? Actually, the code still occasionally outputs NaN on my machine unless I set the thread be one. By tracking the error I can tell it is due to these two lines:

 state.g:mul(momentum):addcmul(1 - momentum, dfdx, dfdx)
 state.tmp:copy(state.g):add(epsilon):sqrt()

as state.tmp can become zero while being divided in the rest code and I guess the zeros are due to state.tmp:copy(state.g) from other thread where state.g happens to include 0s...

Meanwhile, by changing them to

  state.g:mul(momentum):addcmul(1 - momentum, dfdx, dfdx)
  state.tmp:sqrt(torch.add(state.g, epsilon))

the error seems to disappear.

It my modification reasonable? Or I just have to update OpenBLAS or something?

@Kaixhin
Copy link
Owner

Kaixhin commented Nov 16, 2016

Looks fine to me but I'll leave it to @lake4790k's discretion.

@lake4790k
Copy link
Collaborator

The async mode is not "thread safe" in the classical sense at all, just happens to work.

I didn't see NaNs while I was running long experiments with the latest code. Make sure that torch is really using OpenBLAS (eg. look at loaded libs), this solved #50.

Are you getting NaNs running Atari environment?

Your modification looks sensible and if you're really getting the NaNs with OpenBLAS and your modification fixes it then put in a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants