Not getting perf improvements from muP at ~1.5B scale #76

gordicaleksa · 2024-07-19T15:27:51Z

Hey guys, first of all thanks for the awesome work!

I've implemented muP in the llm.c project (see here). The coord checks look flat, i.e. correct (I went up to 15 steps and they were still flat!), but I'm not getting any performance improvement from muP.
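A minimal sketch of one way to quantify "flat" in a coord check, under stated assumptions: for a given layer and training step, record the mean absolute activation at several widths, then fit the slope of log(activation) against log(width). A slope near zero means the activation scale does not change with width. The helper name `coord_check_slope` and all numbers are hypothetical illustrations, not code or measurements from llm.c or the mup package.

```python
import numpy as np

def coord_check_slope(widths, mean_abs_activations):
    """Return the slope of log(mean |activation|) versus log(width).

    A slope close to 0 means the activation scale is roughly
    width-independent ("flat"); a clearly positive or negative slope
    means activations grow or shrink as the model gets wider.
    """
    log_w = np.log(np.asarray(widths, dtype=float))
    log_a = np.log(np.asarray(mean_abs_activations, dtype=float))
    slope, _intercept = np.polyfit(log_w, log_a, 1)
    return float(slope)

# Made-up numbers for illustration only (assumption, not real data).
widths = [256, 512, 1024, 2048]
flat = [0.81, 0.80, 0.82, 0.80]                # roughly constant across width
growing = [0.05 * np.sqrt(w) for w in widths]  # scales like sqrt(width)

flat_slope = coord_check_slope(widths, flat)
growing_slope = coord_check_slope(widths, growing)
print(f"flat slope:    {flat_slope:+.3f}")
print(f"growing slope: {growing_slope:+.3f}")
```

In practice this would be repeated per layer and per step (here, up to 15 steps), since a parametrization can look flat at initialization and drift after a few updates.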

Could this be due to the smaller scale? We're testing on 1.5B-parameter LLMs. Should we expect different behavior at ~7B?

I wrote up a mini document on what I've done to support muP in llm.c here, under mup.md.

Am I missing something here?

gordicaleksa mentioned this issue Jul 19, 2024

muP (maximum update parametrization) karpathy/llm.c#650

Open
