
Argghh#1397

Merged
ikawrakow merged 1 commit into main from ik/cuda_ctx_mess
Mar 10, 2026

@ikawrakow
Owner

ikawrakow commented Mar 10, 2026

It is a bit of a mess, but hopefully this works.

Basically, split mode graph needs some persistent state in the CUDA context. But the whole ggml/llama thing does not really foresee such a use case, so it all becomes very hacky.

I fixed split mode graph not working when an mmproj file is loaded in #1392. It wasn't working because loading the mmproj file created a new CUDA context, which overwrote the CUDA context of the main model. I removed the overwrite. But loading the main model with --no-mmap (something I never use myself) creates a CUDA context just to load the model, and that is not the context created later that we actually need. So split mode graph fails.

Hopefully this PR covers all use cases.
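
The two failure modes described above can be sketched as a toy model. This is only an illustration under stated assumptions, not the code in this PR: `Policy`, `registered_owner`, and the owner labels are hypothetical names, and the sketch assumes a single slot that remembers which context holds the persistent state. How the real backend tracks its contexts, and how this PR resolves the conflict, is not shown here.

```cpp
#include <string>
#include <vector>

// Hypothetical policies for a single "current context" slot.
// LastWins:  every newly created context overwrites the slot.
// FirstWins: the slot keeps the first context and ignores later ones.
enum class Policy { LastWins, FirstWins };

// Replays a sequence of context creations (each identified by a label)
// and returns the label of the context left in the slot at the end.
// Returns an empty string if no context was created.
std::string registered_owner(Policy policy,
                             const std::vector<std::string>& creation_order) {
    std::string slot;
    bool filled = false;
    for (const auto& owner : creation_order) {
        if (!filled || policy == Policy::LastWins) {
            slot = owner;
            filled = true;
        }
    }
    return slot;
}
```

Under these assumptions, `registered_owner(Policy::LastWins, {"main model", "mmproj"})` leaves the mmproj context in the slot, which mirrors the problem fixed in #1392. Switching to `Policy::FirstWins` keeps the main model's context in that case, but `registered_owner(Policy::FirstWins, {"no-mmap loader", "main model"})` keeps the load-time context instead of the one created later, which is a plausible reading of the --no-mmap failure.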

@ubergarm
Contributor

Thanks a bunch, just tested and it is working again with --no-mmap! Great work pulling everything together!

With this in place, -sm graph is working well and --mmproj is working through the opencode client too, so the Qwen3.5s are actually useful for full local use without requiring a giant server or an expensive API subscription haha... 🥳 🎉

@abc-nix
Contributor

abc-nix commented Mar 10, 2026

This fixed the crashes I was having with SM graph and --no-mmap using Qwen-3.5. Thank you.
