A question about the parameter "--group-size" in qserve_benchmark.py #48

Open
oasis-Linmi opened this issue Dec 16, 2024 · 3 comments

Comments

@oasis-Linmi

Hi, thank you for your excellent work!
I encountered an issue while running qserve_benchmark.py:
I downloaded several models with the W4A8 per-channel quantization type provided in the QServe Model Zoo. When I tried to set --group-size to -1, I consistently ran into a RuntimeError: probability tensor contains either 'inf', 'nan' or element < 0. Interestingly, changing the parameter to 128 allowed the script to run successfully.
My understanding is that for the W4A8 per-channel quantization type, setting this parameter to -1 is the correct choice, while for the W4A8-g128 type, the correct setting should be 128.
Could you please help explain what might be causing this issue?
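
For readers unfamiliar with the two settings, here is a minimal sketch of the convention described above: a group size of -1 means one scale per output channel, and a group size of 128 means one scale per block of 128 input features. The function name `scales_per_output_channel` and the width 4096 are hypothetical and chosen for illustration only; this is not QServe's API, and it does not explain the error itself.

```python
def scales_per_output_channel(in_features: int, group_size: int) -> int:
    """Count the weight scales stored per output channel.

    Assumed convention (illustrative, not taken from QServe's code):
    group_size == -1 -> per-channel quantization, a single scale per row
    group_size  >  0 -> per-group quantization, one scale per group
    """
    if group_size == -1:
        return 1
    if group_size <= 0 or in_features % group_size != 0:
        raise ValueError(
            f"group_size={group_size} must be -1 or a positive divisor "
            f"of in_features={in_features}"
        )
    return in_features // group_size


# Illustrative layer width; real models may differ.
per_channel = scales_per_output_channel(4096, -1)
per_group = scales_per_output_channel(4096, 128)
print(per_channel, per_group)
```

Under this convention the two settings imply differently shaped scale tensors, so the flag would need to match how the checkpoint was actually quantized.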

@GCQi

GCQi commented Jan 17, 2025

Hello, I ran into the same error. Could you tell me how to fix it?

@ys-2020
Contributor

ys-2020 commented Jan 17, 2025

Yes. Setting the group size to -1 should be correct for a per-channel quantized model. Could you please specify which checkpoint you are currently using?

@oasis-Linmi
Author

I haven't fixed this error yet. I downloaded the checkpoint from https://huggingface.co/mit-han-lab/Llama-3-8B-QServe, following the Usage and Examples section of the QServe README.md. I also noticed that similar errors occur with each of the provided per-channel models.
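
The error text matches the check that PyTorch's `torch.multinomial` applies to its input probabilities, which suggests the logits were already non-finite before sampling. As a debugging aid, the following pure-Python sketch shows how a single NaN logit poisons the whole distribution after softmax. The helper names `softmax` and `find_invalid_probs` are hypothetical and are not part of QServe or PyTorch.

```python
import math


def softmax(logits):
    """Numerically stabilised softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def find_invalid_probs(probs):
    """Return indices of entries that are inf, nan, or negative."""
    return [i for i, p in enumerate(probs) if not math.isfinite(p) or p < 0]


healthy = softmax([1.0, 2.0, 3.0])
broken = softmax([0.5, float("nan"), 1.0])

print(find_invalid_probs(healthy))
print(find_invalid_probs(broken))
```

Running an equivalent check on the model's logits just before sampling would show whether the non-finite values come from the forward pass (for example, scales interpreted with the wrong group size) rather than from the sampling step.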


3 participants