rebase by pglorio · Pull Request #3 · Zyphra/transformers_zamba

pglorio · 2024-11-05T23:10:27Z

No description provided.

…e#34395) * Skip DeepSpeed ZeRO Stage 3 model initialization when it is intended to be quantized. * Propagate the quantization state using a context manager * make fixup

FIX Broken repr of TorchAoConfig The __repr__ method references a non-existent self.kwargs. This is now fixed. There does not appear to be a uniform way of defining __repr__ for quantization configs. I copied the method as implemented for HQQ: https://github.com/huggingface/transformers/blob/e2ac16b28a0b8b900e136750309ca40c49d975c5/src/transformers/utils/quantization_config.py#L285-L287

* save/load sub-configs * nit forgot these * fix copies * move test to common * use dict for sub-configs * add load-save-laod test * clean up modeling check * oops this are correct keys * fix some tests, missed some composite configs * this model was missed

* DistillBERT is ExecuTorch compatible * [run_slow] distilbert * [run_slow] distilbert --------- Co-authored-by: Guang Yang <guangyang@fb.com>

Revert "Fix Whisper CI (huggingface#34541)" This reverts commit eb81144.

) * Fix assistant tokens when truncated * fix test * fix test * step

…4558) * update * update * update * update * update * update * update * update * update * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

) * Changing __repr__ in torchao * small update * make style * small update * add LinearActivationQuantizedTensor * remove some cases * update imports & handle return None * update

fix-torch-interpolation-ci

…xtraction (huggingface#34450) * fix stablelm qkv_bias * fix stablelm qkv_bias and use_parallel_residual * remove original_model.config for stablelm gguf test

* docs: ko: convbert.md * Update _toctree.yml * feat: nmt draft

* gptqmodel Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update readme Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * gptqmodel need use checkpoint_format (#1) * gptqmodel need use checkpoint_format * fix quantize * Update quantization_config.py * Update quantization_config.py * Update quantization_config.py --------- Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * Revert quantizer_gptq.py (#2) * revert quantizer_gptq.py change * pass **kwargs * limit gptqmodel and optimum version Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix warning Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix version check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * revert unrelated changes Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * enable gptqmodel tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix requires gptq Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * Fix Transformer compat (#3) * revert quantizer_gptq.py change * pass **kwargs * add meta info * cleanup * cleanup * Update quantization_config.py * hf_select_quant_linear pass checkpoint_format and meta * fix GPTQTestCUDA * Update test_gptq.py * gptqmodel.hf_select_quant_linear() now does not select ExllamaV2 * cleanup * add backend * cleanup * cleanup * no need check exllama version * Update quantization_config.py * lower checkpoint_format and backend * check none * cleanup * Update quantization_config.py * fix self.use_exllama == False * spell * fix unittest * fix unittest --------- Co-authored-by: LRL <lrl@lbx.dev> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format again Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update gptqmodel version (#6) * update gptqmodel version * update gptqmodel version * fix unit test (#5) * update gptqmodel version * update gptqmodel version * "not self.use_exllama" is not equivalent to "self.use_exllama==False" * fix unittest * update gptqmodel version * backend is loading_attibutes (huggingface#7) * fix format and tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix memory check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix device mismatch Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix result check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * update tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * review: update docs (huggingface#10) * review: update docs (huggingface#12) * review: update docs * fix typo * update tests for gptqmodel Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update document (huggingface#9) * update overview.md * cleanup * Update overview.md * Update overview.md * Update overview.md * update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md --------- Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * typo * doc note for asymmetric quant * typo with apple silicon(e) * typo for marlin * column name revert: review * doc rocm support * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/overview.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/overview.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com> Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com> Co-authored-by: LRL <lrl@lbx.dev> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

eljandoubi and others added 12 commits November 5, 2024 10:06

Skip DeepSpeed ZeRO Stage 3 model initialization when bnb (huggingfac…

d0b1d8d

…e#34395) * Skip DeepSpeed ZeRO Stage 3 model initialization when it is intended to be quantized. * Propagate the quantization state using a context manager * make fixup

DistilBERT is ExecuTorch compatible (huggingface#34475)

663c851

* DistillBERT is ExecuTorch compatible * [run_slow] distilbert * [run_slow] distilbert --------- Co-authored-by: Guang Yang <guangyang@fb.com>

Remove unused test_dataset (huggingface#34516)

45b0c76

Revert "Fix Whisper CI" (huggingface#34605)

74d3824

Revert "Fix Whisper CI (huggingface#34541)" This reverts commit eb81144.

Fix huggingface#34494 assistant tokens when truncated (huggingface#34531

082e57e

) * Fix assistant tokens when truncated * fix test * fix test * step

Remove @slow for test_eager_matches_sdpa_inference (huggingface#3…

f2d5dfb

…4558) * update * update * update * update * update * update * update * update * update * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Changing __repr__ in torchao to show quantized Linear (huggingface#34202

d2bae7e

) * Changing __repr__ in torchao * small update * make style * small update * add LinearActivationQuantizedTensor * remove some cases * update imports & handle return None * update

Fix torchvision interpolation CI (huggingface#34539)

9f28d0c

fix-torch-interpolation-ci

Fix use_parallel_residual and qkv_bias for StableLM GGUF config e…

e83aaaa

…xtraction (huggingface#34450) * fix stablelm qkv_bias * fix stablelm qkv_bias and use_parallel_residual * remove original_model.config for stablelm gguf test

🌐 [i18n-KO] Translated convbert.md to Korean (huggingface#34599)

7bbc624

* docs: ko: convbert.md * Update _toctree.yml * feat: nmt draft

pglorio merged commit eb6063e into zamba2 Nov 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rebase#3

rebase#3
pglorio merged 12 commits intozamba2from
main

pglorio commented Nov 5, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

12 participants

Conversation

pglorio commented Nov 5, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

12 participants