Hello
How are you?
Thanks for contributing to this project.
I made my own AutoClipper class based on your code.
Please check if there is any problem.
I am unsure about the buffer length for the gradient history.
You only discussed the effect of the percentile value on training performance.
What about the effect of the history buffer length?
For example, what if we set the buffer length to the number of steps in one epoch?
Hi there! I think those are all fine ideas, just didn't have time to explore all the variations when I was working on this. FWIW I use AutoClip still every day in my own work. I think it's totally reasonable to only keep track for the past so-and-so iterations, but just be careful you're not clipping the gradient too much. Maybe 80 would be a reasonable default for p if you were keeping track of things in shorter histories.
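For anyone who wants to try the bounded-history variant discussed above, here is a minimal sketch, not the project's actual code. The class name `WindowedAutoClipper`, its parameters, and the `_percentile` helper are all hypothetical; the only idea taken from the thread is "keep the last N gradient norms and clip at the p-th percentile of them". It is plain Python, so the returned threshold still has to be passed to whatever clipping call your framework provides.

```python
from collections import deque
import math


def _percentile(sorted_vals, p):
    """Linearly interpolated percentile of an already sorted, non-empty list."""
    rank = (len(sorted_vals) - 1) * p / 100.0
    lo = math.floor(rank)
    hi = math.ceil(rank)
    frac = rank - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


class WindowedAutoClipper:
    """Hypothetical helper: percentile-based clip threshold over a bounded history."""

    def __init__(self, percentile=80.0, history_len=1000):
        if not 0.0 <= percentile <= 100.0:
            raise ValueError("percentile must be in [0, 100]")
        if history_len < 1:
            raise ValueError("history_len must be at least 1")
        self.percentile = percentile
        # A deque with maxlen drops the oldest norm automatically once it is full.
        self.history = deque(maxlen=history_len)

    def threshold(self, grad_norm):
        """Record this step's gradient norm and return the clip threshold."""
        self.history.append(float(grad_norm))
        return _percentile(sorted(self.history), self.percentile)
```

To match the "one epoch" idea from the question, `history_len` could be set to the number of optimizer steps per epoch (for example the length of your data loader, assuming one step per batch). In PyTorch, the value returned by `threshold(...)` would be used as `max_norm` in `torch.nn.utils.clip_grad_norm_`. Sorting the buffer on every step is fine for a sketch but costs O(N log N) per step for large windows.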