Increasing coord check for the network output #71

Open
AkshitaB opened this issue Apr 11, 2024 · 2 comments

Comments

@AkshitaB
I'm implementing muP for the OLMo model, and am facing an issue with the coordinate check.

[Figure: sp_trsfmr_adamw_coord]
[Figure: μp_trsfmr_adamw_coord]

The increasing l1 in the plots is the one for the network output. Following the docs, I set both the readout init and the query init to zero. I also made sure that the initialization is applied after set_base_shapes is called.

What other things can I check to debug the issue?

@SeunghyunSEO

Hi @AkshitaB, I'm reproducing muP too these days.
Can you share the architecture? Or have you already solved the problem?

@ofivite

ofivite commented Jun 21, 2024

@AkshitaB (very delayed reply but still might be helpful)

In my experience, zero-initializing the query/readout didn't help either. What I did see is that, although the readout norms grow during early iterations, they stabilise across widths after enough iterations (around 30). Your plot for t=4 may already hint at this, so running the coordinate check for more steps might flatten your readout norms.

But even if not, it has never been a problem for me in practice for getting muTransfer to work. What matters most is that the other layers' norms look flat, which is the case for you :)
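
One way to check the "stabilises across widths after enough iterations" observation numerically rather than by eye is to fit, for each step t, the slope of log(l1) against log(width) for the output layer. A slope near 0 means the norm is flat across widths at that step. The snippet below is only a rough sketch: `width_slopes`, `unstable_steps`, and the `tol` threshold are hypothetical helpers of mine, not part of mup, and it assumes you have already pulled (width, t, l1) triples for the output layer out of your coordinate check results. The numbers in `example` are made up for illustration and are not measurements from this issue.

```python
import math
from collections import defaultdict


def width_slopes(records):
    """Least-squares slope of log(l1) vs log(width), computed separately per step t.

    `records` is an iterable of (width, t, l1) tuples with positive l1 values.
    Each step needs at least two distinct widths, otherwise the fit is undefined.
    """
    by_step = defaultdict(list)
    for width, t, l1 in records:
        by_step[t].append((math.log(width), math.log(l1)))

    slopes = {}
    for t, pts in sorted(by_step.items()):
        n = len(pts)
        mean_x = sum(x for x, _ in pts) / n
        mean_y = sum(y for _, y in pts) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in pts)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
        slopes[t] = sxy / sxx
    return slopes


def unstable_steps(records, tol=0.1):
    """Steps whose log-log slope across widths exceeds `tol` in magnitude.

    `tol` is an arbitrary threshold chosen for this sketch.
    """
    return [t for t, s in width_slopes(records).items() if abs(s) > tol]


# Made-up numbers for illustration only.
example = [
    (256, 1, 0.10), (512, 1, 0.20), (1024, 1, 0.40),     # l1 doubles with width
    (256, 30, 0.50), (512, 30, 0.50), (1024, 30, 0.50),  # flat across widths
]
print(unstable_steps(example))  # [1]
```

If the list of flagged steps shrinks to nothing as t grows, that would match the behaviour described above; if the slope for the output layer stays well away from 0 even at later steps, the growth is probably not just an early-iteration transient.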
