Skip to content

Commit 535afa5

Browse files
sirus20x6Aaron
authored andcommitted
ggml: Correct SVE implementation in ggml_vec_dot_f16_unroll (ggml-org#16518)
The previous SVE implementation for `ggml_vec_dot_f16_unroll` contained a bug due to a copy-paste error. The wrong variable was used in an FMA instruction, leading to incorrect results. This commit corrects the variable usage and improves the clarity of the code by renaming variables to avoid confusion. Co-authored-by: Aaron <[email protected]>
1 parent fe6f07c commit 535afa5

File tree

1 file changed

+3
-3
lines changed

1 file changed

+3
-3
lines changed

ggml/src/ggml-cpu/vec.h

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -144,14 +144,14 @@ inline static void ggml_vec_dot_f16_unroll(const int n, const int xs, float * GG
144144
for (int i = 0; i < np; i += ggml_f16_step) {
145145
ay1 = GGML_F16x_VEC_LOAD(y + i + 0 * ggml_f16_epr, 0); // 8 elements
146146

147-
ax1 = GGML_F16x_VEC_LOAD(x[0] + i + 0*ggml_f16_epr, 0); // 8 elemnst
147+
ax1 = GGML_F16x_VEC_LOAD(x[0] + i + 0*ggml_f16_epr, 0); // 8 elements
148148
sum_00 = GGML_F16x_VEC_FMA(sum_00, ax1, ay1); // sum_00 = sum_00+ax1*ay1
149149
ax1 = GGML_F16x_VEC_LOAD(x[1] + i + 0*ggml_f16_epr, 0); // 8 elements
150150
sum_10 = GGML_F16x_VEC_FMA(sum_10, ax1, ay1);
151151

152152
ay2 = GGML_F16x_VEC_LOAD(y + i + 1 * ggml_f16_epr, 1); // next 8 elements
153153

154-
ax2 = GGML_F16x_VEC_LOAD(x[0] + i + 1*ggml_f16_epr, 1); // next 8 ekements
154+
ax2 = GGML_F16x_VEC_LOAD(x[0] + i + 1*ggml_f16_epr, 1); // next 8 elements
155155
sum_01 = GGML_F16x_VEC_FMA(sum_01, ax2, ay2);
156156
ax2 = GGML_F16x_VEC_LOAD(x[1] + i + 1*ggml_f16_epr, 1);
157157
sum_11 = GGML_F16x_VEC_FMA(sum_11, ax2, ay2);
@@ -160,7 +160,7 @@ inline static void ggml_vec_dot_f16_unroll(const int n, const int xs, float * GG
160160

161161
ax3 = GGML_F16x_VEC_LOAD(x[0] + i + 2*ggml_f16_epr, 2);
162162
sum_02 = GGML_F16x_VEC_FMA(sum_02, ax3, ay3);
163-
ax1 = GGML_F16x_VEC_LOAD(x[1] + i + 2*ggml_f16_epr, 2);
163+
ax3 = GGML_F16x_VEC_LOAD(x[1] + i + 2*ggml_f16_epr, 2);
164164
sum_12 = GGML_F16x_VEC_FMA(sum_12, ax3, ay3);
165165

166166
ay4 = GGML_F16x_VEC_LOAD(y + i + 3 * ggml_f16_epr, 3);

0 commit comments

Comments
 (0)