Switch TinyLlama pretraining back to 16-mixed #882

awaelchli · 2024-01-15T12:23:19Z

It looks like `precision=16-mixed` will perform better. This results in slightly lower MFU (55% -> 52%) and slightly higher memory usage, but training still fits well on 8xA100.
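For a rough sense of what the MFU drop costs in throughput, here is a minimal sketch. It assumes MFU is achieved FLOPs divided by the hardware's peak FLOPs, and that the peak and the model's FLOPs per token are the same in both runs, so throughput scales linearly with MFU. The helper name `relative_throughput_change` is hypothetical and not part of this repository.

```python
def relative_throughput_change(mfu_before: float, mfu_after: float) -> float:
    """Fractional throughput change implied by an MFU change.

    Assumption: same hardware peak FLOPs and same model FLOPs per token
    in both runs, so tokens/sec is proportional to MFU.
    """
    if mfu_before <= 0:
        raise ValueError("mfu_before must be positive")
    return (mfu_after - mfu_before) / mfu_before


# MFU values quoted in the description above: 55% -> 52%
change = relative_throughput_change(0.55, 0.52)
print(f"{change:.1%}")  # -5.5%
```

Under those assumptions, the switch costs roughly 5.5% of training throughput.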

update

e7f65f1

awaelchli marked this pull request as ready for review January 15, 2024 12:25

awaelchli requested review from carmocca and lantiga as code owners January 15, 2024 12:25

carmocca approved these changes Jan 15, 2024


awaelchli merged commit 1e5afd6 into main Jan 15, 2024

awaelchli deleted the tiny-llama-16-mixed branch January 15, 2024 15:36

awaelchli mentioned this pull request Apr 25, 2024

Add precision arg for pretraining #1353

Merged
