ggml: fixed Arm SVE usage bug in vec.h, vec.cpp #22841

Merged
ggerganov merged 1 commit into ggml-org:master from martin-klacer-arm:feature/fix_arm_sve_code on May 28, 2026

@martin-klacer-arm
Contributor

Overview

This pull request fixes the Arm SVE code in the GGML vec.h and vec.cpp files. Previously, the F16 multiply-accumulate functions also used F16 as the accumulation data type, even though the output type is F32. This led to overflows in some larger models, causing random ASCII output. This PR changes the accumulation data type to F32, which resolves the overflow.
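
For illustration only, the overflow can be reproduced with a small portable sketch. This is not the ggml code and uses no SVE intrinsics: `round_to_f16`, `dot_f16_acc` and `dot_f32_acc` are hypothetical names, and FP16 is only approximated in software (normal range and overflow; subnormals and ties-to-even rounding are not modelled). It shows why an accumulator limited to the FP16 range (largest finite value 65504) can saturate to infinity while an F32 accumulator over the same inputs does not.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: approximate rounding of a float to the nearest FP16 value.
// Assumption: only the normal range and overflow are handled.
float round_to_f16(float v) {
    if (v == 0.0f || std::isnan(v) || std::isinf(v)) {
        return v;
    }
    int e = 0;
    float m = std::frexp(v, &e);            // v = m * 2^e with 0.5 <= |m| < 1
    m = std::round(m * 2048.0f) / 2048.0f;  // keep 11 significant bits
    const float r = std::ldexp(m, e);
    if (std::fabs(r) > 65504.0f) {          // beyond the largest finite FP16 value
        return std::copysign(std::numeric_limits<float>::infinity(), r);
    }
    return r;
}

// Dot product whose running sum is rounded to FP16 after every step
// (models the pre-fix behaviour: F16 inputs, F16 accumulator).
float dot_f16_acc(const std::vector<float> & x, const std::vector<float> & y) {
    assert(x.size() == y.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        acc = round_to_f16(acc + round_to_f16(x[i] * y[i]));
    }
    return acc;
}

// Dot product with an F32 running sum
// (models the post-fix behaviour: F16 inputs, F32 accumulator).
float dot_f32_acc(const std::vector<float> & x, const std::vector<float> & y) {
    assert(x.size() == y.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        acc += x[i] * y[i];
    }
    return acc;
}
```

With 1024 elements all equal to 16.0 (exactly representable in FP16), each product is 256 and the true sum is 262144. The FP16-style accumulator passes 65504 partway through and becomes infinite, whereas the F32 accumulator returns the exact sum.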

Additional information

This PR fixes the bug: #21548

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES - used AI guidance to help in understanding SVE intrinsics details.

 * Updated vec.h/vec.cpp code to accumulate to F32 rather than F16

Signed-off-by: Martin Klacer <martin.klacer@arm.com>
Co-authored-by: Milos Puzovic <Milos.Puzovic@arm.com>
Change-Id: I0cb789347f2bf60ffaf9047319f727e788c825f8
@github-actions (bot) added the ggml label (changes relating to the ggml tensor library for machine learning) on May 8, 2026
@martin-klacer-arm
Contributor Author

Hello @ggerganov, I wanted to follow up on this PR and check if you've had a chance to take a look? If you have any questions about the PR or code changes I'm happy to provide more detail. Thank you!

@chaxu01
Collaborator

chaxu01 commented May 28, 2026

Hi @ggerganov — just wanted to highlight this PR again when you have a chance.

This fixes an Arm SVE accumulation issue where FP16 accumulation was being used in F16 MAC paths even though the output type is FP32. On some larger models, this could lead to overflow and random ASCII output generation.

We’ve reviewed and validated the fix internally on our side as well. Thanks!

@ggerganov ggerganov merged commit e31cdaa into ggml-org:master May 28, 2026
43 of 46 checks passed
@ggerganov
Member

Thanks for the reminder!

adrianhoehne pushed a commit to adrianhoehne/llama.cpp that referenced this pull request May 28, 2026
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026
turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 2, 2026