Support dequantizing GGUF FP16 format #31783

PenutChen · 2024-07-04T00:54:48Z

What does this PR do?

Fixes #31762: adds support for dequantizing the FP16 format in the GGUF integration.
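For context, FP16 tensors are not block-quantized, so "dequantizing" them should amount to reinterpreting the raw bytes as half-precision floats and upcasting. The sketch below illustrates that idea using only the Python standard library. The function name `dequantize_f16` and the little-endian byte order are assumptions made for illustration; they are not taken from this PR's diff.

```python
import struct


def dequantize_f16(raw: bytes, n_values: int) -> list:
    """Decode n_values little-endian IEEE 754 half-precision floats.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    expected = 2 * n_values  # each FP16 value occupies two bytes
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    # "<" selects little-endian, "e" is the binary16 (half precision) format.
    # Python floats are 64-bit, so the upcast loses no information.
    return list(struct.unpack(f"<{n_values}e", raw))


# Round-trip a few values that are exactly representable in FP16.
values = [0.0, 1.0, -2.5, 0.5]
raw = struct.pack(f"<{len(values)}e", *values)
decoded = dequantize_f16(raw, len(values))
```

In a tensor library the same step would more likely be a typed view over the buffer followed by a cast to the target dtype; the per-value decoding here is only meant to make the byte layout explicit.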

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@SunMarc

SunMarc

Thanks for adding this! Could you add a test to make sure that it works? You can put it in the tests/quantization/ggml/test_ggml.py file.

PenutChen · 2024-07-04T11:57:16Z

> Thanks for adding this! Could you add a test to make sure that it works? You can put it in the tests/quantization/ggml/test_ggml.py file.

@SunMarc Currently, only TinyLlama has reasonable outputs, but I can't find an FP16 GGUF model on the HF Hub. Should I upload one myself?

SunMarc · 2024-07-04T14:11:34Z

Yes, that would be great!

PenutChen · 2024-07-05T00:46:11Z

@SunMarc I've added an f16 test, and the models are now uploaded to the HF Hub. I've also uploaded bf16 and f32 TinyLlama models for future tests. Let me know if you need anything else!

support gguf fp16

5430803

SunMarc reviewed Jul 4, 2024

add gguf f16 test

c1682e5
