AdaMax

AdaMax Example

In Adam, the update rule for individual weights scales their gradients inversely proportionally to a (scaled) $L_2$ norm of their individual current and past gradients:

$$v_t = \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot |g_t|^2$$

$$\theta_t = \theta_{t-1} - \alpha \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

where $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected first and second moment estimates.
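As a minimal sketch of that rule (assuming plain NumPy arrays; the function name `adam_step` and its defaults are illustrative, not this repository's code):

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: each weight's gradient is scaled by the inverse of a
    bias-corrected L2 norm of its current and past gradients."""
    m = beta1 * m + (1 - beta1) * g       # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * g**2    # second moment (L2 accumulator)
    m_hat = m / (1 - beta1**t)            # bias correction for startup
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```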

The $L_2$ norm can be generalized to the $L_p$ norm:

$$v_t = \beta_2^p \cdot v_{t-1} + (1 - \beta_2^p) \cdot |g_t|^p$$

Such variants generally become numerically unstable for large $p$, which is why the $L_1$ and $L_2$ norms are most common in practice. However, in the special case where we let $p \to \infty$, a surprisingly simple and stable algorithm emerges.

To avoid confusion with Adam, we use $u_t$ to denote the infinity norm-constrained $v_t$:

$$u_t = \max(\beta_2 \cdot u_{t-1},\ |g_t|)$$
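This simple form follows by unrolling the $L_p$ recursion and taking the limit, where only the largest decayed gradient magnitude survives (following the derivation in the Adam paper):

$$u_t = \lim_{p \to \infty} (v_t)^{1/p} = \lim_{p \to \infty} \left( (1 - \beta_2^p) \sum_{i=1}^{t} \beta_2^{p(t-i)} \cdot |g_i|^p \right)^{1/p} = \max\left( \beta_2^{t-1} |g_1|,\ \beta_2^{t-2} |g_2|,\ \dots,\ |g_t| \right)$$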

We can now plug $u_t$ into the Adam update equation, replacing $\sqrt{\hat{v}_t} + \epsilon$, to obtain the AdaMax update rule:

$$\theta_t = \theta_{t-1} - \frac{\alpha}{1 - \beta_1^t} \cdot \frac{m_t}{u_t}$$

Note that because $u_t$ relies on a $\max$ operation, it is not biased towards zero at initialization, so no bias correction for $u_t$ is needed.

Code
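A minimal NumPy sketch of the rule above (the name `adamax_step` and the toy usage are illustrative, not this repository's actual code):

```python
import numpy as np

def adamax_step(theta, g, m, u, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdaMax step: Adam's L2 accumulator is replaced by the
    exponentially weighted infinity norm u_t = max(beta2 * u_{t-1}, |g_t|)."""
    m = beta1 * m + (1 - beta1) * g       # biased first moment, as in Adam
    u = np.maximum(beta2 * u, np.abs(g))  # infinity-norm accumulator; no bias correction needed
    # eps guards against division by zero (a practical safeguard, not in the paper's formula)
    theta = theta - (lr / (1 - beta1**t)) * m / (u + eps)
    return theta, m, u

# Toy usage: minimize f(x) = x^2
theta = np.array([5.0])
m, u = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 501):
    g = 2 * theta                         # gradient of x^2
    theta, m, u = adamax_step(theta, g, m, u, t, lr=0.1)
print(theta)                              # ends up close to 0
```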

Resources
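- Kingma, D. P., & Ba, J. (2015). Adam: A Method for Stochastic Optimization. ICLR 2015. https://arxiv.org/abs/1412.6980 (AdaMax is introduced in the extensions section of this paper.)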