Conversation

@ikawrakow (Owner) commented Sep 5, 2024

Haha, llama.cpp does not seem to support bf16 on CUDA?

This PR adds it. It works fine on my RTX-4080, but I have no idea if it will work on older GPUs (if I understood correctly, it should, with reduced performance), on ROCm, etc.
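
For reference on why it should still run on older GPUs: CUDA's `__nv_bfloat16` type (from `<cuda_bf16.h>`) is available on all architectures for storage and conversion; only native bf16 arithmetic requires compute capability 8.0+ (Ampere). A minimal sketch of the idea (a hypothetical helper, not the actual kernel from this PR) that converts a bf16 buffer to f32 on the device:

```cpp
#include <cuda_runtime.h>
#include <cuda_bf16.h>
#include <cstdio>

// Hypothetical helper, not this PR's code: convert a bf16 tensor to f32 on
// the GPU. __bfloat162float just widens the 16-bit value into the top half
// of an f32, so it works on any CUDA GPU; only native bf16 arithmetic (and
// thus full speed) needs compute capability >= 8.0.
__global__ void bf16_to_f32(const __nv_bfloat16 *src, float *dst, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = __bfloat162float(src[i]);
}

int main() {
    const int n = 4;
    __nv_bfloat16 h_src[n];
    float h_dst[n];
    for (int i = 0; i < n; ++i) h_src[i] = __float2bfloat16(1.5f * i);

    __nv_bfloat16 *d_src; float *d_dst;
    cudaMalloc(&d_src, n * sizeof(*d_src));
    cudaMalloc(&d_dst, n * sizeof(*d_dst));
    cudaMemcpy(d_src, h_src, n * sizeof(*d_src), cudaMemcpyHostToDevice);

    bf16_to_f32<<<(n + 255) / 256, 256>>>(d_src, d_dst, n);

    cudaMemcpy(h_dst, d_dst, n * sizeof(*d_dst), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%g\n", h_dst[i]);
    cudaFree(d_src); cudaFree(d_dst);
    return 0;
}
```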

Token generation (TG) performance is the same as f16 (TG-128 = 41.2 t/s for LLaMA-3.1-8B with both types).

Prompt processing (PP) is slower but still quite decent (PP-512 = 5250 t/s for bf16 vs 7250 t/s for f16 with LLaMA-3.1-8B). In any case, much better than running bf16 models on the CPU.
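
As to why PP can differ between the two types: prompt processing is matrix-multiplication bound, and cuBLAS dispatches different kernels for bf16 vs f16 inputs. Purely as an illustration (this is not necessarily the code path this PR takes), a bf16 GEMM with f32 accumulation is typically issued through `cublasGemmEx` like this:

```cpp
// Illustrative sketch only: bf16 inputs, f32 output, f32 accumulation.
// Build with: nvcc gemm_bf16.cu -lcublas
#include <cublas_v2.h>
#include <cuda_bf16.h>
#include <cstdio>
#include <vector>

int main() {
    const int m = 4, n = 4, k = 4;
    std::vector<__nv_bfloat16> hA(m * k), hB(k * n);
    std::vector<float> hC(m * n, 0.0f);
    for (auto &x : hA) x = __float2bfloat16(1.0f);
    for (auto &x : hB) x = __float2bfloat16(2.0f);

    __nv_bfloat16 *dA, *dB; float *dC;
    cudaMalloc(&dA, hA.size() * sizeof(*dA));
    cudaMalloc(&dB, hB.size() * sizeof(*dB));
    cudaMalloc(&dC, hC.size() * sizeof(*dC));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(*dA), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size() * sizeof(*dB), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    // C = A * B with bf16 A/B, f32 C; cuBLAS uses column-major convention.
    cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                 &alpha, dA, CUDA_R_16BF, m,
                         dB, CUDA_R_16BF, k,
                 &beta,  dC, CUDA_R_32F,  m,
                 CUBLAS_COMPUTE_32F, CUBLAS_GEMM_DEFAULT);

    cudaMemcpy(hC.data(), dC, hC.size() * sizeof(float), cudaMemcpyDeviceToHost);
    printf("C[0] = %g (expect %g)\n", hC[0], 2.0f * k);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```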
