tests : add support for qwen3 SSM archs #24031
```cpp
bool try_scalar = false;
try {
    ml.get_key_or_arr(LLM_KV_FULL_ATTENTION_INTERVAL, hparams.recurrent_layer_arr, hparams.n_layer, false);
} catch (...) {
    try_scalar = true;
}

if (try_scalar) {
    const uint32_t n_main = hparams.n_layer - hparams.nextn_predict_layers;

    uint32_t full_attn_interval = 4;
    ml.get_key(LLM_KV_FULL_ATTENTION_INTERVAL, full_attn_interval, false);
    for (uint32_t i = 0; i < hparams.n_layer; ++i) {
        hparams.recurrent_layer_arr[i] = (i < n_main) && ((i + 1) % full_attn_interval != 0);
    }
}
```
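To check the layer pattern that the scalar fallback produces, here is a sketch that restates that branch as a standalone function. The name `recurrent_pattern` is hypothetical. Like the snippet, it assumes the trailing `nextn_predict_layers` layers are never recurrent. It additionally rejects an interval of 0, for which the modulo would be undefined.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical standalone version of the scalar fallback: derives per-layer
// "is recurrent" flags from a full-attention interval.
// Assumption (mirroring the snippet): the last n_nextn layers are nextn/MTP
// layers and are never recurrent.
std::vector<bool> recurrent_pattern(uint32_t n_layer, uint32_t n_nextn, uint32_t full_attn_interval) {
    assert(full_attn_interval > 0); // an interval of 0 would make the modulo undefined
    assert(n_nextn <= n_layer);     // avoid unsigned underflow in n_main

    const uint32_t n_main = n_layer - n_nextn;

    std::vector<bool> recurrent(n_layer, false);
    for (uint32_t i = 0; i < n_layer; ++i) {
        // with interval 4, 0-based layers 3, 7, 11, ... use full attention
        recurrent[i] = (i < n_main) && ((i + 1) % full_attn_interval != 0);
    }
    return recurrent;
}
```

With `n_layer = 8`, no nextn layers, and the default interval of 4, layers 3 and 7 (0-based) use full attention and the other six are recurrent.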
Suggested change:
```cpp
try {
    ml.get_key_or_arr(LLM_KV_FULL_ATTENTION_INTERVAL, hparams.recurrent_layer_arr, hparams.n_layer, false);
} catch (...) {
    const uint32_t n_main = hparams.n_layer - hparams.nextn_predict_layers;
    uint32_t full_attn_interval = 4;
    ml.get_key(LLM_KV_FULL_ATTENTION_INTERVAL, full_attn_interval, false);
    for (uint32_t i = 0; i < hparams.n_layer; ++i) {
        hparams.recurrent_layer_arr[i] = (i < n_main) && ((i + 1) % full_attn_interval != 0);
    }
}
```
I think this would be slightly simpler, but either way is fine.
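The suggested change removes the `try_scalar` flag by running the scalar fallback directly inside the `catch` block. The following minimal sketch shows that control flow. It assumes a key can hold either a scalar interval or an explicit per-layer array. `KeyValue`, `get_arr`, and `load_recurrent_layers` are hypothetical stand-ins, not the llama.cpp model-loader API.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <variant>
#include <vector>

// Hypothetical stand-in for a metadata key stored either as a scalar
// interval or as an explicit per-layer array (NOT the llama.cpp loader API).
using KeyValue = std::variant<uint32_t, std::vector<bool>>;

// Mimics a failed array read: throws if the key does not hold an array.
std::vector<bool> get_arr(const KeyValue & kv) {
    if (const auto * arr = std::get_if<std::vector<bool>>(&kv)) {
        return *arr;
    }
    throw std::runtime_error("key is not an array");
}

std::vector<bool> load_recurrent_layers(const KeyValue & kv, uint32_t n_layer, uint32_t n_nextn) {
    try {
        return get_arr(kv);
    } catch (const std::runtime_error &) {
        // Fallback lives in the catch block, so no separate flag is needed.
        const uint32_t interval = std::get<uint32_t>(kv);
        assert(interval > 0 && n_nextn <= n_layer);
        const uint32_t n_main = n_layer - n_nextn;
        std::vector<bool> out(n_layer, false);
        for (uint32_t i = 0; i < n_layer; ++i) {
            out[i] = (i < n_main) && ((i + 1) % interval != 0);
        }
        return out;
    }
}
```

This sketch catches a specific exception type instead of `...`, so unrelated failures are not silently routed into the scalar path. Whether that distinction matters for the real loader depends on what it throws, which is not shown here.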
What is the intended scope of this PR? Will there be more changes?
It's ready now.
```cpp
// by default, all layers are dense
// note: using uint32_t type for compatibility reason
std::array<uint32_t, LLAMA_MAX_LAYERS> swa_layers;
std::array<uint32_t, LLAMA_MAX_LAYERS> is_swa_impl;
```
I'm not sure "is_swa_impl" is a good choice for the variable name. I read it as "is SWA implementation", but the code of the individual models manipulates it, which intuitively looks like the models messing with the internals of llama_hparams. Maybe "swa_pattern", to be consistent with set_swa_pattern?
I get the is_swa part, since it matches the function, but agreed, it's confusing. Maybe is_swa_layer?
I'll follow up with more refactoring of the hparams after this, to keep this PR from growing. The main goal here is to get recurrent models enrolled in test-llama-archs so that it can generate small dummy models for testing purposes.
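To make the naming question concrete, here is a hypothetical sketch of a per-layer SWA flag array with an `is_swa()` accessor. `hparams_sketch`, `MAX_LAYERS`, and the rule used in `set_swa_pattern` (the last layer of every group of `n_pattern` is dense) are illustrative assumptions, not the actual `llama_hparams` definitions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr uint32_t MAX_LAYERS = 512; // hypothetical cap, not LLAMA_MAX_LAYERS

struct hparams_sketch {
    uint32_t n_layer = 0;

    // is_swa_layer[il] != 0 means layer il uses sliding-window attention.
    // Zero-initialized, so by default every layer is dense; uint32_t storage
    // mirrors the field in the diff above.
    std::array<uint32_t, MAX_LAYERS> is_swa_layer{};

    // Assumed rule: within each group of n_pattern layers, the last one is dense.
    void set_swa_pattern(uint32_t n_pattern) {
        assert(n_pattern > 0);
        assert(n_layer <= MAX_LAYERS);
        for (uint32_t il = 0; il < n_layer; ++il) {
            is_swa_layer[il] = (il % n_pattern < n_pattern - 1) ? 1u : 0u;
        }
    }

    bool is_swa(uint32_t il) const {
        assert(il < n_layer);
        return is_swa_layer[il] != 0;
    }
};
```

Writing the flags through a setter like this keeps individual models from touching the array directly, which is the concern raised in the review.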
Be aware, though, that there is currently no implementation for creating a dummy vocab for those models. I have a poor understanding of the related code and did not want to delay the unit tests for TP. This means that you cannot simply use the dummy models with, e.g., llama-perplexity or llama-completion.
Yes, I noticed that. I'll be using it with test-save-load-state, and I can rework it so that it does not require a vocab.
For dummy models, wouldn't it be fine to just map ASCII characters to ints? I would intuitively assume that this would not be too difficult to implement; the problem for me was just that I would have had to read up on the vocab code first.
Yes, probably. It would definitely be useful to generate some dummy vocabs too. Will take a look.
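Here is a minimal sketch of the ASCII idea proposed above. It assumes a dummy vocab only needs a lossless mapping between 7-bit ASCII text and token ids, so each character's code is its token id. `ascii_vocab` and `token_id` are hypothetical names, not the llama.cpp vocab API. The sketch also defines no special tokens (BOS/EOS), which a real dummy vocab would likely need.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

using token_id = int32_t; // hypothetical token type

// Hypothetical dummy vocab: token id == ASCII code, 128 tokens in total.
struct ascii_vocab {
    static constexpr token_id n_tokens = 128;

    std::vector<token_id> tokenize(const std::string & text) const {
        std::vector<token_id> out;
        out.reserve(text.size());
        for (unsigned char c : text) {
            if (c >= n_tokens) {
                throw std::invalid_argument("non-ASCII byte in input");
            }
            out.push_back(static_cast<token_id>(c));
        }
        return out;
    }

    std::string detokenize(const std::vector<token_id> & tokens) const {
        std::string out;
        out.reserve(tokens.size());
        for (token_id t : tokens) {
            if (t < 0 || t >= n_tokens) {
                throw std::out_of_range("token id outside ASCII range");
            }
            out.push_back(static_cast<char>(t));
        }
        return out;
    }
};
```

Because ids equal character codes, round-tripping ASCII input is lossless, which may be enough for smoke tests that only need valid token ids.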
Branch updated from 3a20879 to 433b106.
Overview
Enable test-llama-archs for Qwen3 architectures using SSM.
Additional information
Requirements