Add support for gemma 3n #1248
base: master
Conversation
Introduces LLamaFlashAttentionType enum and integrates flash attention configuration into LLamaContextParams. Adds support for diffusion-based models in SafeLlamaModelHandle. Updates NativeApi and SafeLLamaContextHandle with new adapter metadata and sequence state methods. Syncs llama.cpp submodule.
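For reference, a minimal sketch of what the new enum could look like on the C# side, assuming it mirrors llama.cpp's `llama_flash_attn_type` values (the names and values here are assumptions, not the final API):

```csharp
/// <summary>
/// Sketch of a flash-attention mode selector, assumed to mirror
/// llama.cpp's llama_flash_attn_type (auto / disabled / enabled).
/// </summary>
public enum LLamaFlashAttentionType
{
    /// <summary>Let llama.cpp decide whether to use flash attention.</summary>
    Auto = -1,

    /// <summary>Never use flash attention.</summary>
    Disabled = 0,

    /// <summary>Always use flash attention.</summary>
    Enabled = 1,
}
```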
Thanks for putting this together. I've started a build here to produce new binaries; once those are ready we can update the csproj and run the tests.
Hi @martindevans, I guess I need to hardcode the binary file URL for now. Once the changes are tested and ready to merge, we can restore the original URL, right?
I have changed it, assuming I had to hardcode it. If anything else is required, do let me know.
Sorry for the delay. I'll create a new release on https://github.com/SciSharp/LLamaSharpBinaries shortly; then you can just put the ID of that release into the csproj file.
OK, I've created https://github.com/SciSharp/LLamaSharpBinaries/releases/tag/86587da. You can put that release ID into the csproj here and it should auto-download on build (probably best to do a clean and rebuild to be sure).
Will do it in some time.
Hi @martindevans, a few tests are failing, but I checked out the last commit before my contribution and even there the tests are failing... am I missing something?
For reference, here's the llama.h diff I'm working from for this review: ggml-org/llama.cpp@11dd5a4...86587da#diff-dd7c3bc82b7d2728f1fe661e7820aebbfd53f558a5fdd85bbd6dd618213a118d
There are a few bits missing from this PR that are in the diff. The most important is the flash_attn parameter, which has been removed on the C++ side but not the C# side; that will cause all the other parameters to be misaligned. Hopefully fixing this might resolve some of the test failures 🤞
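To illustrate the misalignment, here is a schematic sketch (not the actual LLamaSharp struct; the field names and ordering are placeholders) of why keeping the removed field breaks interop:

```csharp
using System.Runtime.InteropServices;

// Schematic only: the native llama_context_params is laid out sequentially, so once
// the C# mirror contains a field the C++ struct no longer has, every later field is
// read and written at the wrong offset.
[StructLayout(LayoutKind.Sequential)]
public struct ContextParamsSketch
{
    public uint n_ctx;                               // fields before the change keep their offsets

    // OLD: a bool the C++ side has removed. Leaving it in shifts everything below.
    // [MarshalAs(UnmanagedType.U1)] public bool flash_attn;

    // NEW: the enum that replaced it (see the LLamaFlashAttentionType sketch above).
    public LLamaFlashAttentionType flash_attn_type;

    // Anything declared after a mismatch is misaligned relative to native code.
    [MarshalAs(UnmanagedType.U1)] public bool embeddings;
}
```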
There are a few functions that are missing:
- llama_adapter_get_alora_n_invocation_tokens
- llama_adapter_get_alora_invocation_tokens
- llama_flash_attn_type_name
- llama_state_seq_flags (and the 3 related functions that take it)
If you're not interested in implementing those right now, please just dump their signatures into a comment in NativeApi.cs; that way we don't get out of sync.
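A hedged sketch of how those entry points might be declared follows. The C signatures in the comments are paraphrased from the linked llama.h diff and should be checked against the pinned submodule; the handle and marshalling types (plain IntPtr, the library name, and the exact names of the flag-taking sequence-state variants) are assumptions, not the final NativeApi surface:

```csharp
using System;
using System.Runtime.InteropServices;

internal static class NativeApiSketch
{
    // Assumed native library name; LLamaSharp resolves this differently in practice.
    private const string Lib = "llama";

    // const char * llama_flash_attn_type_name(enum llama_flash_attn_type type);
    [DllImport(Lib, CallingConvention = CallingConvention.Cdecl)]
    public static extern IntPtr llama_flash_attn_type_name(int flash_attn_type);

    // uint32_t llama_adapter_get_alora_n_invocation_tokens(const struct llama_adapter_lora * adapter);
    [DllImport(Lib, CallingConvention = CallingConvention.Cdecl)]
    public static extern uint llama_adapter_get_alora_n_invocation_tokens(IntPtr adapter);

    // const llama_token * llama_adapter_get_alora_invocation_tokens(const struct llama_adapter_lora * adapter);
    [DllImport(Lib, CallingConvention = CallingConvention.Cdecl)]
    public static extern IntPtr llama_adapter_get_alora_invocation_tokens(IntPtr adapter);

    // The three sequence-state functions gain a trailing llama_state_seq_flags argument
    // (a uint32_t). The name below is assumed from the diff; verify before merging.
    [DllImport(Lib, CallingConvention = CallingConvention.Cdecl)]
    public static extern UIntPtr llama_state_seq_get_size_ext(IntPtr ctx, int seq_id, uint flags);
}
```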
I tried fixing these but the tests still failed. I will try to dig deeper this weekend.
I'm on holiday this week, so I won't be around much to help out. I'll try to keep tabs on this thread though. A lot of the test failures seem to be similar, something to do with invalid batches in the Reranker. e.g.
That seems to be coming from here in llama.cpp; that code looks like it checks that none of the tokens in the batch are being assigned to a … The LLamaReranker is relatively new, so it's very possible there's a bug of some kind in it.
Sure, will go through this.
I just opened a PR on your fork into this branch that fixes 2 of the failing tests. The other seems to be a genuine failure, something wrong with the way batching works. I tried out the BatchedExecutor forking demo as another check and that fails too.
Will check that. Sorry for the delay, I have been busy with other tasks lately.
No worries, I was meaning to check this out myself ages ago too! I'm hoping to get some time to properly look through this and other PRs this weekend 🤞
…ich is why these were failing now. (#1)