Hey there!
My goal is to deploy Mixtral-8x7B with a couple of QLoRA fine-tuned adapters using TensorRT-LLM.
In practice, that means porting a dev environment built with bitsandbytes and Hugging Face over to Triton Inference Server.
This is definitely my situation.
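For context, my current dev setup looks roughly like the sketch below (the model ID and adapter path are placeholders for my own):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization via bitsandbytes, as used during QLoRA fine-tuning
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",  # placeholder model ID
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

# Attach one of the QLoRA adapters on top of the quantized base model
model = PeftModel.from_pretrained(base, "my-org/my-qlora-adapter")  # placeholder adapter path
```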
I wonder whether this applies to Mixtral-8x7B as well as LLaMA.
According to this line, it should not. If so, is Mixtral-8x7B support on the roadmap?
Still, according to this section, it seems you can quantize a Mistral model with the default settings of quantize.py, i.e. `gptnext`. Is this correct? Does it apply to Mixtral-8x7B as well? Any further info on the topic is welcome!
Cheers