edit mixtral quantization config file (#114) #1739
|
@regisss let's hold this until I get accuracy results from the team |
|
@libinta, @regisss this PR corrects the quant_config content. The old version contains regular-expression strings that are invalid and not supported, and it uses both an allow-list and a block-list, which is also invalid. The new version quantizes exactly the same layers as the old file did, just with correct syntax and logic, so in practice the two files should produce the same model quantization from the user's perspective. Regarding QA testing of the new config file: it was done with the new fp8 dynamic MoE code, which is not yet upstreamed but should be in this release (1.20). |
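For readers who want to sanity-check a config for the two problems described above, here is a minimal sketch. It is not the quantization toolkit's own validation. The key names `allowlist`, `blocklist`, and `names`, and the helper `check_quant_config`, are assumptions made for illustration, and Python's `re` module is only a stand-in for whatever pattern matching the toolkit actually supports.

```python
import re


def check_quant_config(config):
    """Return a list of problems found in a quantization config dict.

    Hypothetical helper: the "allowlist"/"blocklist"/"names" keys are
    assumed for illustration and may not match the real config schema.
    """
    problems = []
    allow = config.get("allowlist", {}).get("names", [])
    block = config.get("blocklist", {}).get("names", [])

    # Problem 1: populating both lists at once is treated as invalid.
    if allow and block:
        problems.append("both allowlist and blocklist are populated")

    # Problem 2: every name pattern should at least compile as a regex.
    for pattern in allow + block:
        try:
            re.compile(pattern)
        except re.error as exc:
            problems.append(f"invalid pattern {pattern!r}: {exc}")

    return problems
```

Calling `check_quant_config` on a loaded config returns an empty list when neither problem is present, so it can be used as a quick pre-flight check before running quantization.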
|
No description provided.