Bf16 support for tinychat 2.0 on GEMV and GEMM #255

jason-huang03 · 2025-01-18T08:01:39Z

This PR adds bf16 support for the GEMV and GEMM operations under the awq/kernels/csrc/quantization_new directory. BF16 is crucial for serving some models, such as Qwen2.5 72B, which run into NaN problems in fp16 even with the original (unquantized) model.
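To illustrate why fp16 can produce NaNs even without quantization, here is a minimal Python sketch that emulates fp16 and bf16 rounding with the standard `struct` module. The value 70000.0 is a hypothetical outlier activation, not a measurement from Qwen2.5 72B, and the helpers `to_fp16`/`to_bf16` are illustrative names. The point is only that fp16's largest finite value is 65504, whereas bf16 keeps fp32's 8-bit exponent and trades mantissa precision for range.

```python
import math
import struct

def to_fp16(x: float) -> float:
    """Round x to IEEE fp16; values past the fp16 range become +/-inf (as round-to-nearest does on GPUs)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:  # struct refuses to pack out-of-range halves
        return math.copysign(math.inf, x)

def to_bf16(x: float) -> float:
    """Round x to bfloat16: keep the top 16 bits of its fp32 pattern, round-to-nearest-even.
    NaN inputs are not handled in this sketch."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)         # round-to-nearest-even on the dropped 16 bits
    bits = ((bits >> 16) << 16) & 0xFFFFFFFF    # truncate to the bf16 pattern
    return struct.unpack("<f", struct.pack("<I", bits))[0]

outlier = 70000.0  # hypothetical activation outlier, above fp16's max finite value (65504)
h = to_fp16(outlier)
b = to_bf16(outlier)
print(h)      # inf
print(h - h)  # nan
print(b)      # 70144.0: coarser, but finite
```

Once an `inf` appears in fp16, any later `inf - inf` (for example when subtracting a row maximum in softmax, or a mean in a normalization layer) turns into NaN, while the bf16 value stays finite.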

The main points of this PR are:

Fast int4x8 to bf16x8 dequantization.
Using an fp32 accumulator for bf16 MMA, rather than the fp16 accumulator used for fp16 MMA.
Dispatching logic to select between the fp16 and bf16 kernels.
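The description does not spell out how the fast int4x8 to bf16x8 dequantization works. A common approach is the "magic number" trick known from fp16 kernels: OR the 4-bit value into the mantissa of 1024.0, then subtract 1024. Because bf16 has only 7 explicit mantissa bits, the analogous constant is 128.0 (bit pattern `0x4300`), whose mantissa LSB is worth exactly 1. The sketch below emulates this in Python, assuming nibbles are stored in plain order and a dequantization of the form `(q - zero) * scale`; the PR's kernel may use a different packing order and instruction sequence, and all names here are hypothetical.

```python
import struct

BF16_MAGIC = 0x4300  # bf16 bit pattern of 128.0; at exponent 2^7 the mantissa LSB is worth exactly 1.0

def bf16_bits_to_float(bits: int) -> float:
    """Decode a 16-bit bfloat16 pattern (bf16 is the upper half of an fp32 pattern)."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]

def dequant_int4x8_to_bf16x8(packed: int, scale: float, zero: float) -> list:
    """Dequantize 8 unsigned 4-bit values packed in one 32-bit word (hypothetical helper).

    Assumes nibble i sits at bits [4*i, 4*i + 4) and w = (q - zero) * scale.
    """
    out = []
    for i in range(8):
        nibble = (packed >> (4 * i)) & 0xF
        # OR the nibble into the low mantissa bits of 128.0: the bf16 value becomes 128 + nibble,
        # so no integer-to-float conversion instruction is needed
        biased = bf16_bits_to_float(BF16_MAGIC | nibble)
        q = biased - 128.0  # exact: recovers the integer nibble as a float
        out.append((q - zero) * scale)
    return out

print(dequant_int4x8_to_bf16x8(0x76543210, scale=1.0, zero=0.0))
# [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```

On a GPU the OR step would typically be applied to two bf16 lanes at once within a 32-bit register (e.g. with a single `lop3` instruction), and the subtraction done with packed bf16x2 arithmetic, which is what makes this path cheaper than converting each element individually. Python floats are wider than bf16, so the final scale multiply here is not rounded to bf16 as it would be in a kernel.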
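On the fp32 accumulator: as far as the PTX ISA documents, `mma` with `.bf16` inputs only offers `.f32` accumulation, and a bf16 running sum would in any case lose small contributions quickly. The following Python sketch emulates a dot product with bf16-rounded inputs and either a bf16 or an fp32 accumulator, rounding after every step; the helper names are hypothetical and this is an illustration of the numerics, not the PR's kernel.

```python
import struct

def to_bf16(x: float) -> float:
    """Round to bfloat16 (round-to-nearest-even on the fp32 pattern; NaN not handled)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    return struct.unpack("<f", struct.pack("<I", ((bits >> 16) << 16) & 0xFFFFFFFF))[0]

def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dot(a, b, round_acc):
    """Dot product of bf16-rounded inputs; the running sum is rounded by round_acc after each step."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_bf16(x) * to_bf16(y)  # exact: two 8-bit significands fit easily in fp32's 24 bits
        acc = round_acc(acc + prod)
    return acc

ones = [1.0] * 4096
print(dot(ones, ones, to_bf16))  # 256.0: the bf16 sum stalls
print(dot(ones, ones, to_fp32))  # 4096.0
```

The bf16 sum stalls at 256 because bf16 values near 256 are spaced 2 apart, so 256 + 1 rounds back to 256 under round-to-nearest-even; the fp32 accumulator stays exact for this input.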

ys-2020 · 2025-01-31T02:08:18Z

Hi @jason-huang03. Thank you for the great work! I approve all the changes so far.

Are there any front-end modifications to go with the bf16 support? I think it would be better to merge them into the codebase within the same PR. 😊

jason-huang03 added 5 commits January 18, 2025 15:12

add dispatch macro

6f9d2cc

add int4 to bf16 fast dequantize

ff0d003

support bf16 in gemv

896b92f

remove gemm_cuda_old.cu

bb9d998

support gemm and layernorm in bf16

efa9ac9

jason-huang03 added 3 commits February 4, 2025 14:04

tinychat front end support bfloat16 for qwen

0bc5717

support bf16 pseudo awq evaluate

d1ef6dd

algorithm side front end support bf16

ab7ce62
