Skip to content

tests : add support for qwen3 SSM archs#24031

Merged
ggerganov merged 3 commits into
masterfrom
gg/test-archs-add-qwen3-ssm
Jun 3, 2026
Merged

tests : add support for qwen3 SSM archs#24031
ggerganov merged 3 commits into
masterfrom
gg/test-archs-add-qwen3-ssm

Conversation

@ggerganov
Copy link
Copy Markdown
Member

@ggerganov ggerganov commented Jun 2, 2026

Overview

Enable test-llama-archs for Qwen3 architectures using SSM.

Additional information

|       qwen3next|Apple M2 Ultra|   MoE|  OK (8.53e-08)|       OK|
|       qwen3next|    Accelerate|   MoE|  OK (1.00e-17)|       OK|
|       qwen3next|Apple M2 Ultra|   MoE|  OK (9.61e-14)|       OK|
|       qwen3next|          Meta|   MoE|  OK (8.53e-08)|     SKIP|
|          qwen35|Apple M2 Ultra| Dense|  OK (8.53e-08)|       OK|
|          qwen35|    Accelerate| Dense|  OK (0.00e+00)|       OK|
|          qwen35|Apple M2 Ultra| Dense|  OK (9.18e-14)|       OK|
|          qwen35|          Meta| Dense|  OK (8.53e-08)|     SKIP|
|       qwen35moe|Apple M2 Ultra|   MoE|  OK (8.53e-08)|       OK|
|       qwen35moe|    Accelerate|   MoE|  OK (0.00e+00)|       OK|
|       qwen35moe|Apple M2 Ultra|   MoE|  OK (9.49e-14)|       OK|
|       qwen35moe|          Meta|   MoE|  OK (8.53e-08)|     SKIP|

Requirements

@ggerganov ggerganov requested a review from CISC as a code owner June 2, 2026 16:00
@github-actions github-actions Bot added the model Model specific label Jun 2, 2026
Comment thread src/llama-model-loader.cpp Outdated
Comment thread src/models/qwen35.cpp Outdated
Comment on lines +22 to +36
bool try_scalar = false;
try {
ml.get_key_or_arr(LLM_KV_FULL_ATTENTION_INTERVAL, hparams.recurrent_layer_arr, hparams.n_layer, false);
} catch (...) {
try_scalar = true;
}

if (try_scalar) {
const uint32_t n_main = hparams.n_layer - hparams.nextn_predict_layers;

uint32_t full_attn_interval = 4;
ml.get_key(LLM_KV_FULL_ATTENTION_INTERVAL, full_attn_interval, false);
for (uint32_t i = 0; i < hparams.n_layer; ++i) {
hparams.recurrent_layer_arr[i] = (i < n_main) && ((i + 1) % full_attn_interval != 0);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool try_scalar = false;
try {
ml.get_key_or_arr(LLM_KV_FULL_ATTENTION_INTERVAL, hparams.recurrent_layer_arr, hparams.n_layer, false);
} catch (...) {
try_scalar = true;
}
if (try_scalar) {
const uint32_t n_main = hparams.n_layer - hparams.nextn_predict_layers;
uint32_t full_attn_interval = 4;
ml.get_key(LLM_KV_FULL_ATTENTION_INTERVAL, full_attn_interval, false);
for (uint32_t i = 0; i < hparams.n_layer; ++i) {
hparams.recurrent_layer_arr[i] = (i < n_main) && ((i + 1) % full_attn_interval != 0);
}
try {
ml.get_key_or_arr(LLM_KV_FULL_ATTENTION_INTERVAL, hparams.recurrent_layer_arr, hparams.n_layer, false);
} catch (...) {
const uint32_t n_main = hparams.n_layer - hparams.nextn_predict_layers;
uint32_t full_attn_interval = 4;
ml.get_key(LLM_KV_FULL_ATTENTION_INTERVAL, full_attn_interval, false);
for (uint32_t i = 0; i < hparams.n_layer; ++i) {
hparams.recurrent_layer_arr[i] = (i < n_main) && ((i + 1) % full_attn_interval != 0);
}

I think that this would be slightly simpler but either way is fine I think.

@JohannesGaessler
Copy link
Copy Markdown
Contributor

What is the intended scope of this PR, will there be more changes?

@ggerganov
Copy link
Copy Markdown
Member Author

It's ready now.

Comment thread src/llama-model-saver.cpp
Comment thread src/llama-hparams.h
// by default, all layers are dense
// note: using uint32_t type for compatibility reason
std::array<uint32_t, LLAMA_MAX_LAYERS> swa_layers;
std::array<uint32_t, LLAMA_MAX_LAYERS> is_swa_impl;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure "is_swa_impl" is a good choice for the variable name. I'm reading it as "is SWA implementation" but then you have the code of the individual models manipulating it which to me would intuitively seem like the models messing with the internals of llama_hparams. Maybe "swa_pattern" to be consistent with set_swa_pattern?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the is_swa part, to match the function, but agreed, it's confusing, maybe is_swa_layer?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll follow-up with more refactoring of the hparams after this to avoid this PR growing. The main goal here is to get recurrent models enrolled in test-llama-archs to be able to generate small dummy models for testing purposes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be aware though that currently there is no implementation for creating a dummy vocab for those models - I have a poor understanding of the related code and did not want to delay the unit tests for TP. But this means that you cannot just use the dummy models for e.g. llama-perplexity or llama-completion.

Copy link
Copy Markdown
Member Author

@ggerganov ggerganov Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I noticed that. I'll be using it with test-save-load-state and I can rework it to not require a vocab.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For dummy models, wouldn't it be fine to just map ASCII characters to int? I would intuitively assume that that would not be too difficult to implement, the problem for me was just that I would have to read up on the vocab code first.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes probably. It would be definitely useful to generate some dummy vocabs too. Will take a look.

@ggerganov ggerganov force-pushed the gg/test-archs-add-qwen3-ssm branch from 3a20879 to 433b106 Compare June 3, 2026 05:36
@ggerganov ggerganov merged commit 06938ac into master Jun 3, 2026
25 of 28 checks passed
@ggerganov ggerganov deleted the gg/test-archs-add-qwen3-ssm branch June 3, 2026 07:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

model Model specific

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants