As we state in our paper, DGBM is not yet competitive in terms of runtime, consistently exceeding the runtimes of other approaches by several orders of magnitude. From the experiments we have conducted so far, it appears that the automatic derivation of the Hessian poses a significant computational bottleneck. The reason is that when computing the Hessian (i.e., the gradient of a gradient) of a loss function with vector input (i.e., multi-parameter optimization), the second gradient appears to return the row-wise sum of the Hessian instead of the element-wise Hessian. For a more detailed discussion, we refer to tensorflow/tensorflow#29064.
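The behavior described above can be reproduced with a minimal sketch (the loss function here is a toy example, not the DGBM loss): for a loss whose Hessian has off-diagonal entries, a nested gradient-of-gradient returns the row sums of the Hessian, while `tape.jacobian` on the first gradient recovers the full element-wise Hessian.

```python
import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])

with tf.GradientTape(persistent=True) as outer:
    with tf.GradientTape() as inner:
        # Toy loss with a non-diagonal Hessian: (sum_i x_i)^2,
        # whose Hessian is the constant matrix 2 * ones((3, 3)).
        loss = tf.reduce_sum(x) ** 2
    grad = inner.gradient(loss, x)  # 2 * sum(x) in every component

# Gradient of the gradient: returns the ROW-WISE SUM of the Hessian,
# i.e. [6., 6., 6.] here, not the matrix itself.
row_sums = outer.gradient(grad, x)

# Jacobian of the gradient: the full element-wise Hessian, 2 * ones((3, 3)).
hessian = outer.jacobian(grad, x)
```

The `jacobian` call yields the correct matrix, but it is substantially more expensive than a plain gradient-of-gradient, which is where the runtime bottleneck arises.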
A question for the community
As we show in this notebook, we can circumvent the problem of incorrect Hessians. However, these implementations are not as efficient as the original nested `GradientTape` version.
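One possible workaround of this kind (a sketch under our own assumptions, not necessarily the notebook's implementation) is to differentiate each gradient component separately and assemble the element-wise second derivatives; the per-component loop is exactly what makes such variants slower than a single nested gradient call:

```python
import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])

with tf.GradientTape(persistent=True) as outer:
    with tf.GradientTape() as inner:
        # Toy loss: sum_i x_i**3, with diagonal Hessian diag(6 * x_i).
        loss = tf.reduce_sum(x ** 3)
    grad = inner.gradient(loss, x)        # [3 * x_i**2] = [3., 12., 27.]
    grad_components = tf.unstack(grad)    # unstack inside the outer tape

# Differentiate each scalar gradient component and pick out its own entry,
# giving the diagonal of the element-wise Hessian: [6., 12., 18.].
diag = tf.stack([outer.gradient(g, x)[i]
                 for i, g in enumerate(grad_components)])
```

The persistent outer tape is required because `outer.gradient` is called once per parameter, which is precisely the overhead that does not occur in the single nested gradient-of-gradient call.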
I am reaching out to the TensorFlow community asking for help and guidance on how to further improve the computational efficiency of gradients and hessians.