Hi, really cool implementation!

I noticed that when I run your examples, although both models converge, the accuracy of the forward-gradient method is always worse than that of regular backprop. The paper mentions that the accuracy of the forward gradient should be roughly comparable to that of backpropagation, so is this behavior expected?

I was able to improve things marginally by having the model perform several random perturbations per forward pass and averaging them for the parameter update (since this makes it likelier to find the direction of the true gradient), but I was never able to match backprop performance.
Hi @ilonadem, thank you for the kind words. Coming to your questions:
> I noticed that when I run your examples, although both models converge, the accuracy of the forward-gradient method is always worse than that of regular backprop. The paper mentions that the accuracy of the forward gradient should be roughly comparable to that of backpropagation, so is this behavior expected?
Yes, they should be pretty comparable, although it's something we have never measured. To this end, we could add some test functions that evaluate the trained models and report the results. If you already have something, you can also open a PR :)
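For reference, a minimal sketch of what such a test function could look like (the `model`, `test_loader`, and device handling here are hypothetical placeholders, not names from this repo):

```python
import torch

@torch.no_grad()
def test_accuracy(model, test_loader, device="cpu"):
    """Fraction of correctly classified samples on a held-out set (hypothetical helper)."""
    model.eval()
    correct, total = 0, 0
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        preds = model(inputs).argmax(dim=1)          # predicted class per sample
        correct += (preds == targets).sum().item()
        total += targets.numel()
    return correct / total

# e.g. run the same evaluation on a model trained with forward gradients
# and on one trained with backprop, then compare the two numbers.
```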
> I was able to improve things marginally by having the model perform several random perturbations per forward pass and averaging them for the parameter update (since this makes it likelier to find the direction of the true gradient), but I was never able to match backprop performance.
Yes, in our example we estimate the expected value with only one sample, and the more samples you use, the more precise the estimate becomes. This is also something that would be useful to have in our examples: we could expose the number of samples used in the estimation as a Hydra parameter. Again, feel free to open a PR in case :)
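As a rough illustration of what averaging over several tangent directions could look like, here is a minimal sketch assuming a scalar loss of a single flat parameter tensor and PyTorch ≥ 2.0 (for `torch.func.jvp`); it is not the repo's actual training loop:

```python
import torch
from torch.func import jvp

def forward_gradient(loss_fn, params, num_samples=1):
    """Estimate the gradient of loss_fn at params with forward-mode JVPs.

    Each sample draws a random direction v ~ N(0, I), computes the
    directional derivative d = grad(loss_fn) . v with a single JVP, and
    uses d * v as an unbiased single-sample gradient estimate; averaging
    over num_samples reduces the variance of the estimate.
    """
    estimate = torch.zeros_like(params)
    for _ in range(num_samples):
        v = torch.randn_like(params)            # random tangent direction
        _, d = jvp(loss_fn, (params,), (v,))    # d = grad . v (scalar)
        estimate += d * v
    return estimate / num_samples

# Toy check: for f(p) = sum(p**2) the true gradient is 2 * p, and the
# estimate approaches it as num_samples grows.
theta = torch.randn(10)
grad_est = forward_gradient(lambda p: (p ** 2).sum(), theta, num_samples=100)
```

Exposing `num_samples` through a Hydra config entry would then just be a matter of threading this argument into the training step.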