
Conversation

@ikawrakow (Owner)

There have been reports that Smart Expert Reduction (SER) can produce garbage.

This PR (hopefully and finally) fixes the CPU implementation.

The issue was that when fewer experts are used than specified by the number of active experts, some rows of the expert matrix multiplication result are never written. Normally this is not a problem, as those rows are multiplied by zero before being summed to obtain the final expert output. But if the uncomputed rows happen to contain Inf or NaN values, multiplying them by zero produces NaN, and that leads to garbage output. Whether Inf or NaN values are present is a matter of luck and depends on what was computed earlier, because the same memory is reused by other operations to store their results. This is why the issue does not always manifest itself (but yes, in a long enough conversation the DDDDDDDD or GGGGGGGGG output will eventually show up).
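For illustration, here is a minimal, self-contained C++ sketch of the failure mode (not the actual ik_llama.cpp code; all names and sizes are made up). It shows how stale Inf values in the expert rows that were never computed turn the zero-weighted sum into NaN, and how filling those rows with finite values avoids it:

```cpp
#include <cstdio>
#include <limits>
#include <vector>

int main() {
    const int n_expert_used = 4;   // experts the graph expects per token
    const int n_active      = 2;   // experts actually evaluated under SER
    const int n_cols        = 3;

    // Row i holds the output of expert i. Rows >= n_active are never written,
    // so they keep whatever the reused buffer held before (here: Inf).
    std::vector<std::vector<float>> rows(n_expert_used,
        std::vector<float>(n_cols, std::numeric_limits<float>::infinity()));
    for (int i = 0; i < n_active; ++i) rows[i].assign(n_cols, 1.0f);

    // Routing weights: zero for the experts that were skipped.
    std::vector<float> w = {0.6f, 0.4f, 0.0f, 0.0f};

    // Weighted sum over all expert rows, as in the final combine step.
    auto combine = [&]() {
        std::vector<float> out(n_cols, 0.0f);
        for (int i = 0; i < n_expert_used; ++i)
            for (int j = 0; j < n_cols; ++j)
                out[j] += w[i] * rows[i][j];   // 0 * Inf = NaN for stale rows
        return out;
    };

    std::printf("stale rows left as-is: %g\n", combine()[0]);  // prints nan

    // The fix: make sure the skipped rows hold finite values (e.g. zeros)
    // before the weighted sum.
    for (int i = n_active; i < n_expert_used; ++i) rows[i].assign(n_cols, 0.0f);
    std::printf("skipped rows zeroed:   %g\n", combine()[0]);  // prints 1
    return 0;
}
```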

A similar fix is required for the CUDA implementation. This is left for a follow-up PR.

Iwan Kawrakow added 2 commits May 13, 2025 14:55
@ikawrakow ikawrakow mentioned this pull request May 13, 2025
@ikawrakow (Owner, Author)

I couldn't break it, so I'll just merge it without waiting for feedback.

@ikawrakow ikawrakow merged commit 1374062 into main May 13, 2025
@ubergarm (Contributor)

Hah, our sleep schedules are just off. I just tested this with a CPU-only build, and it indeed fixes the issue when using -ser 6,1.

Without the fix I saw:

/home/w/projects/ik_llama.cpp/ggml/src/iqk/iqk_mul_mat.cpp:16600: GGML_ASSERT(fms.S[j] > 0) failed
/home/w/projects/ik_llama.cpp/ggml/src/iqk/iqk_mul_mat.cpp:16600: GGML_ASSERT(fms.S[j] > 0) failed
/home/w/projects/ik_llama.cpp/ggml/src/iqk/iqk_mul_mat.cpp:16600: GGML_ASSERT(fms.S[j] > 0) failed

With this PR it works fine.

Thanks!

I'll peep the CUDA one next.
