
Improve runtime efficiency of TensorFlow gradient_tape computation of gradients and hessians. #1

Closed
StatMixedML opened this issue Jun 8, 2022 · 0 comments
Labels
help wanted Extra attention is needed

StatMixedML commented Jun 8, 2022

Summary

As we state in our paper, DGBM is not yet competitive in terms of runtime, with runtimes consistently exceeding those of other approaches by several orders of magnitude. From the experiments we have conducted so far, it appears that the automatic derivation of the Hessian poses a significant computational bottleneck. The reason is that when computing the Hessian (i.e., the gradient of a gradient) of a loss function with vector input (i.e., multi-parameter optimization), the second gradient does not return the element-wise Hessian but rather the row-wise sum of the Hessian. For a more detailed discussion, we refer to tensorflow/tensorflow#29064.
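To make the behavior concrete, here is a minimal sketch (a constructed toy example, not code from our paper or the linked issue) showing that nesting two `tf.GradientTape` contexts returns the row-wise sums of the Hessian rather than its element-wise diagonal:

```python
import tensorflow as tf

# Toy loss with a dense Hessian: loss = (sum_i x_i)^2,
# so H_ij = 2 for all i, j.
x = tf.Variable([1.0, 2.0, 3.0])

with tf.GradientTape() as t2:
    with tf.GradientTape() as t1:
        loss = tf.reduce_sum(x) ** 2
    # First gradient: 2 * sum(x) for every element -> [12, 12, 12]
    grad = t1.gradient(loss, x)

# Second gradient: NOT the Hessian diagonal [2, 2, 2], but the
# row-wise sums of the 3x3 Hessian (each row is [2, 2, 2]) -> [6, 6, 6]
hess = t2.gradient(grad, x)
```

Since every row of the Hessian is `[2, 2, 2]`, the nested tape returns `6` per element, while the true element-wise (diagonal) Hessian is `[2, 2, 2]`.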

Ask to the community

As we show in this notebook, we can circumvent the problem of incorrect Hessians. However, these implementations are not as efficient as the original nested gradient_tape version.
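One such workaround (a sketch of the general approach, not necessarily the notebook's exact implementation) replaces the second `gradient` call with `GradientTape.jacobian`, which materializes the full Hessian matrix so the element-wise diagonal can be extracted. This is correct but costly, since the full matrix is built even though only its diagonal is needed:

```python
import tensorflow as tf

# Same toy loss as above: loss = (sum_i x_i)^2, with H_ij = 2 for all i, j.
x = tf.Variable([1.0, 2.0, 3.0])

with tf.GradientTape() as t2:
    with tf.GradientTape() as t1:
        loss = tf.reduce_sum(x) ** 2
    grad = t1.gradient(loss, x)

# jacobian of the gradient w.r.t. x gives the full 3x3 Hessian ...
full_hess = t2.jacobian(grad, x)
# ... from which the element-wise (diagonal) Hessian can be taken -> [2, 2, 2]
diag_hess = tf.linalg.diag_part(full_hess)
```

The inefficiency the issue describes follows directly: `jacobian` performs one backward pass per output element (or uses extra memory when vectorized), whereas the incorrect nested-tape version needs only a single second backward pass.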

I am reaching out to the TensorFlow community asking for help and guidance on how to further improve the computational efficiency of gradients and hessians.
