Conversation
|
The model is now online. |
Force-pushed from 4d51055 to cdd792c
|
Unless Qwen releases smaller models, I won't be able to test this. If anyone gives it a try, I would appreciate feedback. |
|
I tested this with these two models: The model works on afa6bfe (current master). It crashes at this assert: |
Force-pushed from cdd792c to 3f6ca46
|
@matbrez Thank you for the feedback. Could you try again with the latest version of this branch and confirm that it works? |
|
@ggerganov The model loads successfully with the recent commit, but it generates repeated 0 tokens.
./llama-server
git show --stat
commit 3f6ca46 (HEAD -> pr-19660, origin/gg/qwen35-dedup)
src/models/models.h | 58 +--------------------- |
|
@bunnynode And to make sure - the |
|
Please try with 1f5a7e4 |
|
@ggerganov It works now with 1f5a7e4. On an M3 Ultra, tg improved from 19.8 t/s (master) to 21.7 t/s (this PR), but that is still far slower than mlx-lm (34 t/s). Both runs used q4.
* models : dedup qwen35 graphs
* cont : add missing sigmoid

cont #19597
Use the new struct llm_build_delta_net_base to deduplicate the delta net graphs from Qwen35 models.
TODO: