
convert/quantize script doubles the size of the q8 decoder from model ViT-GPT2 #1051

cristianglezm opened this issue Nov 22, 2024 · 0 comments
System Info

transformers.js v3.0.2
vue v3.5.13
vite v5.4.11

system info:

CPU: Intel(R) Core(TM) i5-8250U
GPU: Intel(R) UHD Graphics 620
RAM: 16GB

Environment/Platform

  - [x] Website/web-app
  - [ ] Browser extension
  - [ ] Server-side (e.g., Node.js, Deno, Bun)
  - [ ] Desktop app (e.g., Electron)
  - [ ] Other (e.g., VSCode extension)

Description

I am trying to start using v3, but there are a few issues:

  • The model gives garbled output when running on GPU (Chrome, Edge); the newly converted model does too.

    Garbled description on GPU (q4f16 encoder, q8 decoder): [screenshot: garbledOutput_q4fp16_q8_gpu]
    Good-enough description on CPU (q4f16 encoder, q8 decoder): [screenshot: goodOutput_q4fp16_q8_cpu]

  • q4f16 throws an exception (decoder only; the encoder works).
  • The decoder_model_merged_quantized size has gone up from 158 MB to 297 MB. Converting from the PyTorch version gives the same (doubled) size as quantizing the old ONNX model converted with v2.17.

Quantized from the old converted ONNX model: [screenshot: converted_model_from_onnx_version]

Converted from PyTorch: [screenshot: converted_model_from_pytorch_version]
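To confirm which artifact grew between the two exports, a stdlib-only sketch that lists every `.onnx` file size under an export directory (the directory path in the usage comment is hypothetical):

```python
from pathlib import Path

def onnx_sizes_mb(model_dir: str) -> dict[str, float]:
    """Map each .onnx file found under model_dir to its size in MB."""
    return {
        p.name: round(p.stat().st_size / (1024 * 1024), 1)
        for p in sorted(Path(model_dir).rglob("*.onnx"))
    }

# Hypothetical usage:
# onnx_sizes_mb("models/cristianglezm/ViT-GPT2-FlowerCaptioner/onnx")
```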

```shell
python convert.py --quantize y --model_id "cristianglezm/ViT-GPT2-FlowerCaptioner" --task 'image-to-text-with-past' --opset 19
```

Shouldn't q4 be smaller than q8?
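For intuition, a back-of-the-envelope estimate for weight-only quantization (the ~150M parameter count below is a hypothetical, roughly GPT-2-sized figure; scale/zero-point tensors and any layers kept in higher precision add some overhead on top):

```python
def expected_size_mb(n_params: int, bits_per_weight: int) -> float:
    """Rough on-disk size of weight-only quantized weights, ignoring
    scale/zero-point tensors and graph metadata."""
    return round(n_params * bits_per_weight / 8 / (1024 * 1024), 1)

# For a hypothetical ~150M-parameter decoder:
# q8 -> ~143.1 MB and q4 -> ~71.5 MB, so q4 should indeed be about half
# the size of q8; a "q8" artifact near 297 MB suggests many weights are
# being stored wider than 8 bits.
```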

I tested xenova/vit-gpt2-image-captioning on GPU and CPU: it gives the same garbled description on GPU and a wrong description on CPU (which is why I fine-tuned it).

Thanks.

Reproduction

If you want to test it, you can clone my project:

```shell
git clone https://github.com/cristianglezm/FlowerEvolver-frontend
cd FlowerEvolver-frontend
git checkout hf-transformers-v3
npm i
npm run dev
```

  • Go to Settings::ModelOptions and change from CPU to GPU.
  • Go to Local and wait for the demo flowers to be imported.
  • Click on flower 42's menu arrow and click "describe".
cristianglezm added the bug label on Nov 22, 2024.
cristianglezm changed the title from "convert/quantize script doubles the size of the q8 model ViT-GPT2" to "convert/quantize script doubles the size of the q8 decoder from model ViT-GPT2" on Nov 22, 2024.