Update Mixtral-8x7B fp8 hqt example #756
Merged
puneeshkhanna pushed a commit to puneeshkhanna/optimum-habana-fork that referenced this pull request (Mar 11, 2024)
HolyFalafel pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request (Mar 11, 2024)
gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request (Oct 15, 2025)
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>
What does this PR do?
Adds an fp8 HQT (Habana Quantization Toolkit) example for Mixtral-8x7B on a single card (1x) to the README.
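As a rough illustration of what fp8 quantization involves, here is a minimal sketch of generic per-tensor max-abs scaling into the FP8 E4M3 range (largest finite value 448). This is not HQT's implementation: `round_to_e4m3` and `fake_quant_maxabs` are hypothetical helpers, and HQT's own calibration and scale selection (for example, hardware-aligned scales) may differ.

```python
import math

E4M3_MAX = 448.0  # largest finite value of the FP8 E4M3 format


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value (round-half-to-even), saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    _, e = math.frexp(a)        # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)        # unbiased exponent; -6 is the smallest normal exponent
    step = 2.0 ** (exp - 3)     # 3 mantissa bits -> 8 representable steps per binade
    q = round(a / step) * step  # Python's round() rounds half to even
    return sign * min(q, E4M3_MAX)


def fake_quant_maxabs(values):
    """Hypothetical per-tensor max-abs scaling: map max|x| onto E4M3_MAX, round, dequantize."""
    amax = max(abs(v) for v in values)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    quantized = [round_to_e4m3(v / scale) for v in values]
    return scale, [q * scale for q in quantized]


if __name__ == "__main__":
    xs = [0.02, -1.3, 3.7, 12.5]
    scale, deq = fake_quant_maxabs(xs)
    print(f"scale={scale:.6f}")
    for v, d in zip(xs, deq):
        print(f"{v:>8} -> {d:.6f}")
```

The largest magnitude maps exactly onto 448, so every other value in the tensor keeps roughly 3 mantissa bits of relative precision (at most 1/16 relative rounding error for normal values).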
Tested with batch size 16 and 2048 output tokens (bs16_output2048) on 1 card:
Inputs/outputs:
input 1: ('DeepSpeed is a machine learning framework',)
output 1: ('DeepSpeed is a machine learning framework that enables training of large models on a single machine with a single GPU. It is designed to be easy to use and efficient, and it can be used to train models on a variety of tasks.\n\n## ...
input 2: ('He is working on',)
output 1: ("He is working on a new album, which is expected to be released in 2019.\n\n## ...
input 3: ('He has a',)
output 1: ('He has a new book out, and he’s on a book tour.\n\n ...
input 4: ('He got all',)
output 1: ('He got all the way to the top of the mountain, but he didn’t know what to do when he got there.\n\n ...
input 5: ('Everyone is happy and I can',)
output 1: ('Everyone is happy and I can’t stop smiling.\n\n ...
input 6: ('The new movie that got Oscar this year',)
output 1: ('The new movie that got Oscar this year, “The Shape of Water” is a fantasy drama film directed by Guillermo del Toro and written by del Toro and Vanessa Taylor. It stars Sally Hawkins, Michael Shannon, Richard Jenkins, Doug Jones, Michael Stuhlbarg, and Octavia Spencer. Set in Baltimore, Maryland, in 1962, the story follows a mute custodian at a high-security government laboratory who befriends a captured humanoid amphibian creature.\n\n ...
...
input 15: ('In the far far distance from our galaxy,',)
output 1: ('In the far far distance from our galaxy, there is a planet called “Earth”. It is a planet that is full of life and is the home of many different species. One of these species is called “Humans”. Humans are a very intelligent species and are the most advanced species on the planet. They have created many different technologies that have helped them to survive and thrive on the planet.\n\n ...
input 16: ('Peace is the only way',)
output 1: ('Peace is the only way to end the war in Syria.\n\n ...
Throughput (including tokenization) = 645.77 tokens/second
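Assuming `bs16_output2048` means batch size 16 with 2048 new tokens per sequence, and that the reported throughput is total generated tokens divided by the wall-clock time of one generation pass, the figure can be turned back into an approximate latency. This is back-of-envelope arithmetic under those assumptions; because the measurement includes tokenization, the decode-only rate would be slightly higher.

```python
BATCH_SIZE = 16        # "bs16"
MAX_NEW_TOKENS = 2048  # "output2048"
THROUGHPUT = 645.77    # tokens/second, as reported in this PR


def implied_latency(batch_size: int, new_tokens: int, throughput: float) -> float:
    """Seconds per generation pass, assuming throughput = batch_size * new_tokens / latency."""
    return batch_size * new_tokens / throughput


latency = implied_latency(BATCH_SIZE, MAX_NEW_TOKENS, THROUGHPUT)
per_sequence_rate = MAX_NEW_TOKENS / latency  # tokens/second seen by a single sequence

print(f"total new tokens: {BATCH_SIZE * MAX_NEW_TOKENS}")  # 32768
print(f"implied latency: {latency:.1f} s")                 # 50.7 s
print(f"per-sequence rate: {per_sequence_rate:.1f} tokens/s")  # 40.4 tokens/s
```

Under these assumptions, each of the 16 sequences advances at about 40 tokens/second, and the batch as a whole finishes in roughly 51 seconds.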