
accuracy of forwardgrad isn't as good as regular backprop #8

Open
ilonadem opened this issue Jun 28, 2022 · 1 comment

Comments


ilonadem commented Jun 28, 2022

Hi, really cool implementation!

I noticed that when I run your examples, although both models converge, the accuracy of the forward-gradient method is always worse than that of regular backprop. The paper mentions that the accuracy of the forward gradient should be pretty comparable, even identical, to that of backpropagation; is this behavior expected?

I was able to improve things marginally by having the model perform several random perturbations in each forward pass and averaging them for the parameter update (since this makes it likelier to find the direction of the true gradient), but I was never able to match backprop performance.

belerico (Contributor) commented Jun 30, 2022

Hi @ilonadem, thank you for the kind words. To your questions:

> I noticed that when I run your examples, although both models converge, the accuracy of the forward-gradient method is always worse than that of regular backprop. The paper mentions that the accuracy of the forward gradient should be pretty comparable, even identical, to that of backpropagation; is this behavior expected?

Yes, they should be pretty comparable, although it's something we have never measured. To that end we could add some evaluation functions for the trained models and report the results; see the sketch below. If you already have something, you can also open a PR :)
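For instance, a minimal evaluation helper could look like the following (a sketch only; `evaluate_accuracy` and the data-loader interface are assumptions, not code from this repo):

```python
import torch

@torch.no_grad()
def evaluate_accuracy(model, loader, device="cpu"):
    # Hypothetical helper: fraction of correctly classified samples.
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        preds = model(x).argmax(dim=-1)
        correct += (preds == y).sum().item()
        total += y.numel()
    return correct / total
```

Running this on models trained with both methods would make the accuracy gap measurable rather than anecdotal.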

> I was able to improve things marginally by having the model perform several random perturbations in each forward pass and averaging them for the parameter update (since this makes it likelier to find the direction of the true gradient), but I was never able to match backprop performance.

Yes, in our example we estimate the expected value with only one sample, and the more samples you use, the more precise the estimate becomes. This is also something that could be useful to have in our examples: we could expose the number of samples used in the estimation as a Hydra parameter. Again, feel free to open a PR in case :)
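For reference, here is a minimal sketch of the multi-sample estimator, assuming PyTorch's `torch.func` API (`forward_grad_estimate` and the parameter handling are illustrative, not the repo's actual code):

```python
import torch
from torch.func import functional_call, jvp

def forward_grad_estimate(model, loss_fn, x, y, num_samples=1):
    # Hypothetical helper: estimate the gradient of the loss w.r.t. the
    # parameters with forward-mode JVPs, averaged over `num_samples`
    # random tangent directions.
    params = dict(model.named_parameters())
    names = list(params.keys())
    primals = tuple(params[n] for n in names)

    def loss_wrt_params(*flat_params):
        out = functional_call(model, dict(zip(names, flat_params)), (x,))
        return loss_fn(out, y)

    grad_est = [torch.zeros_like(p) for p in primals]
    for _ in range(num_samples):
        # Random tangent direction v ~ N(0, I), one tensor per parameter.
        tangents = tuple(torch.randn_like(p) for p in primals)
        # One forward-mode pass gives the directional derivative <grad L, v>.
        _, dir_deriv = jvp(loss_wrt_params, primals, tangents)
        # Forward gradient: <grad L, v> * v is an unbiased estimate of grad L.
        for g, v in zip(grad_est, tangents):
            g.add_(dir_deriv * v)
    return {n: g / num_samples for n, g in zip(names, grad_est)}
```

The returned estimates can then be applied with any optimizer (e.g. plain SGD). Increasing `num_samples` trades extra forward passes for lower estimator variance, which is consistent with the marginal improvement you observed when averaging several perturbations.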
