I'm implementing muP for the OLMo model, and am facing an issue with the coordinate check.
The l1 that increases with width is that of the network output. Following the docs, I set both the readout init and the query init to zero. I also made sure that the initialization is applied after `set_base_shapes` is called.
What other things can I check to debug the issue?
@AkshitaB (very delayed reply but still might be helpful)
In my experience, zero-initializing the query and readout didn't help either. What I did see is that, although the readout norms grow with width in early iterations, they stabilise across widths after enough iterations (around 30). Your plot may already hint at this at t=4, so running the coordinate check for more steps may flatten your readout norms.
Even if it doesn't, that has never stopped muTransfer from working for me in practice. What matters most is that the other layers' norms look flat, which is the case for you :)
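To put a number on "flat" when comparing early and late steps, one option is to fit the log-log slope of l1 against width separately for each step. The sketch below is plain Python and does not use the `mup` package. The `(width, step, l1)` record layout, the helper names, and the numbers are all assumptions made up for illustration, not measurements from any real model. A slope near zero means the norm is roughly width-independent at that step.

```python
import math
from collections import defaultdict


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x); all values must be > 0."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)  # needs at least two distinct widths
    return num / den


def slope_per_step(records):
    """Map each training step to the log-log slope of l1 versus width.

    `records` is an iterable of (width, step, l1) tuples, a hypothetical
    layout chosen for this sketch.
    """
    by_step = defaultdict(list)
    for width, step, l1 in records:
        by_step[step].append((width, l1))
    slopes = {}
    for step in sorted(by_step):
        points = sorted(by_step[step])
        widths = [w for w, _ in points]
        norms = [v for _, v in points]
        slopes[step] = loglog_slope(widths, norms)
    return slopes


# Synthetic data, invented for illustration only: l1 grows like
# width ** (0.5 / step), so the width dependence weakens as steps increase.
records = [
    (w, t, w ** (0.5 / t))
    for w in (256, 512, 1024, 2048)
    for t in (1, 2, 4, 8)
]

for step, slope in slope_per_step(records).items():
    print(f"step {step}: slope {slope:.4f}")
```

If the readout slope shrinks toward zero as the step count grows while the other layers stay near zero throughout, that matches the behaviour described above.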