Hey guys, first of all thanks for the awesome work!
I've implemented muP in the llm.c project (see here). The coord checks look flat / correct (I went up to 15 steps and they are still flat), but I am not getting any performance improvement from muP.
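To make "flat" concrete, here is a minimal sketch of the kind of statistic a coord check looks at. It is a toy NumPy illustration, not the llm.c implementation: the helper names `coord_size` and `width_slope` are hypothetical, and it only looks at a single random linear layer at initialization rather than a GPT over several training steps. The idea is that the mean absolute activation should stay roughly constant as width grows, so the log-log slope against width is near zero.

```python
import numpy as np

def coord_size(activations):
    """Mean absolute entry of an activation tensor (hypothetical helper)."""
    return float(np.mean(np.abs(activations)))

def width_slope(widths, sizes):
    """Slope of log(size) vs. log(width); a value near 0 means 'flat'."""
    slope, _intercept = np.polyfit(np.log(widths), np.log(sizes), 1)
    return float(slope)

rng = np.random.default_rng(0)
widths = [128, 256, 512, 1024]
scaled_sizes, unscaled_sizes = [], []

for n in widths:
    x = rng.standard_normal((64, n))                      # toy input batch
    w_scaled = rng.standard_normal((n, n)) / np.sqrt(n)   # variance 1/fan_in
    w_unscaled = rng.standard_normal((n, n))              # variance 1, no width scaling
    scaled_sizes.append(coord_size(x @ w_scaled))
    unscaled_sizes.append(coord_size(x @ w_unscaled))

flat_slope = width_slope(widths, scaled_sizes)       # expected to be close to 0
growing_slope = width_slope(widths, unscaled_sizes)  # expected to be close to 0.5

print(f"scaled init slope:   {flat_slope:+.3f}")
print(f"unscaled init slope: {growing_slope:+.3f}")
```

In a real coord check the same statistic would be recorded per layer and per training step for each width, which is what the 15-step runs above refer to.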
Could this be due to the smaller scale? We're testing it on 1.5B LLMs. Should we expect different behavior at ~7B?
I wrote up a mini document on what I've done to support muP in llm.c here, under mup.md.
Am I missing something here?