Conversation

@awaelchli awaelchli commented Jan 15, 2024

It looks like precision=16-mixed will perform better. This results in a slightly lower MFU (55% -> 52%) and slightly higher memory usage, but the run still fits comfortably on an 8xA100 node.
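
For context, a minimal sketch of what the discussed setting would look like when passed through Lightning Fabric, assuming the precision is set at Fabric construction (the exact script, file, and prior value are not shown in this conversation):

```python
from lightning.fabric import Fabric

# Hypothetical launch mirroring the 8xA100 TinyLlama pretraining setup
# discussed above; "16-mixed" selects float16 mixed precision
# (as opposed to, e.g., "bf16-mixed").
fabric = Fabric(devices=8, precision="16-mixed")
fabric.launch()
```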

@awaelchli awaelchli marked this pull request as ready for review January 15, 2024 12:25
@awaelchli awaelchli merged commit 1e5afd6 into main Jan 15, 2024
@awaelchli awaelchli deleted the tiny-llama-16-mixed branch January 15, 2024 15:36