feat: Improve llama.cpp argument handling and add device parsing tests #6041
Conversation
This commit refactors how arguments are passed to llama.cpp, specifically by only adding arguments when their values differ from their defaults. This reduces the verbosity of the command and prevents potential conflicts or errors when llama.cpp's default behavior aligns with the desired setting.

Additionally, new tests have been added for parsing device output from llama.cpp, ensuring the accurate extraction of GPU information (ID, name, total memory, and free memory). This improves the robustness of device detection.

The following changes were made:

* **Remove redundant `--ctx-size` argument:** The `--ctx-size` argument is now only explicitly added if `cfg.ctx_size` is greater than 0.
* **Conditional argument adding for default values:**
  * `--split-mode` is only added if `cfg.split_mode` is not empty and not `'layer'`.
  * `--main-gpu` is only added if `cfg.main_gpu` is not undefined and not 0.
  * `--cache-type-k` is only added if `cfg.cache_type_k` is not `'f16'`.
  * `--cache-type-v` is only added if `cfg.cache_type_v` is not `'f16'` (when `flash_attn` is enabled) or not `'f32'` (otherwise). This also corrects the `flash_attn` condition.
  * `--defrag-thold` is only added if `cfg.defrag_thold` is not 0.1.
  * `--rope-scaling` is only added if `cfg.rope_scaling` is not `'none'`.
  * `--rope-scale` is only added if `cfg.rope_scale` is not 1.
  * `--rope-freq-base` is only added if `cfg.rope_freq_base` is not 0.
  * `--rope-freq-scale` is only added if `cfg.rope_freq_scale` is not 1.
* **Add `parse_device_output` tests:** Comprehensive unit tests were added to `src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs` to validate the parsing of llama.cpp device output under various scenarios, including multiple devices, single devices, different backends (CUDA, Vulkan, SYCL), complex GPU names, and error conditions. A hedged sketch of this kind of parsing follows.
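For illustration only, here is a minimal TypeScript sketch of the kind of device-line parsing those tests validate. The actual implementation and tests live in Rust in `server.rs`; the line format shown in the comment and the `DeviceInfo` field names are assumptions for this sketch, not taken from the PR.

```typescript
// Assumed shape of the extracted GPU info; the fields mirror those named in
// the PR description (ID, name, total memory, free memory).
interface DeviceInfo {
  id: string
  name: string
  totalMib: number
  freeMib: number
}

// Parse a single device line, assumed to look roughly like:
//   "CUDA0: NVIDIA GeForce RTX 3090 (24576 MiB, 24000 MiB free)"
// Returns null for non-matching lines, the kind of input the
// error-condition tests would exercise.
function parseDeviceLine(line: string): DeviceInfo | null {
  const m = line
    .trim()
    .match(/^(\S+):\s+(.+)\s+\((\d+)\s+MiB,\s+(\d+)\s+MiB free\)$/)
  if (!m) return null
  return { id: m[1], name: m[2], totalMib: Number(m[3]), freeMib: Number(m[4]) }
}
```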
Caution
Changes requested ❌
Reviewed everything up to 30932c8 in 2 minutes and 16 seconds.
- Reviewed 262 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 1 draft comment; view it below

Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. `src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs:149`
   - Draft comment: Raw string literal used to normalize the library path ends with a backslash, which is invalid in Rust raw strings. Consider using an alternate delimiter (e.g. `r#"\?"#`) or another approach.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_oxlOlD2Dgd18TdgZ
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Barecheck - Code coverage report. Total: 33.11%. Your code coverage diff: 0.02% ▴ ✅ All code changes are covered.
Important
Looks good to me! 👍
Reviewed 3fb3ac9 in 1 minute and 38 seconds.
- Reviewed 15 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 1 draft comment; view it below
1. `extensions/llamacpp-extension/src/index.ts:1281`
   - Draft comment: The new condition for pushing `--cache-type-v` only checks when `flash_attn` is true. According to the PR description, when `flash_attn` is enabled the default is `'f16'`, and when it's disabled the default should be `'f32'`. Consider refactoring the condition to handle both cases, for example: `if (cfg.cache_type_v !== (cfg.flash_attn ? 'f16' : 'f32')) { args.push('--cache-type-v', cfg.cache_type_v) }`. This makes the intent explicit and ensures the flag is added only when the value differs from the appropriate default.
   - Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 20% vs. threshold = 50%). The comment makes a reasonable suggestion about making the default values more explicit, but there is not enough context to verify its assumptions: it claims that `'f16'` is the default when `flash_attn` is true and `'f32'` when it is false, yet neither the code nor the documentation in view confirms this. The current code may be handling edge cases that aren't obvious, and while the suggested change would make the code more explicit, there is no confidence it preserves the same behavior. The comment was therefore deleted, since making changes based on unverified assumptions could introduce bugs.
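For readability, here is the draft comment's suggestion restated as a self-contained TypeScript sketch. The config field names come from the PR description; the surrounding interface and function are hypothetical scaffolding, and whether `'f32'` really is the default when `flash_attn` is disabled is the draft comment's unverified assumption.

```typescript
// Hypothetical config shape; field names follow the PR description.
interface CacheConfig {
  flash_attn: boolean
  cache_type_v: string
}

// Push --cache-type-v only when the value differs from the default the
// draft comment assumes: 'f16' with flash_attn enabled, 'f32' otherwise.
function pushCacheTypeV(args: string[], cfg: CacheConfig): void {
  const assumedDefault = cfg.flash_attn ? 'f16' : 'f32'
  if (cfg.cache_type_v !== assumedDefault) {
    args.push('--cache-type-v', cfg.cache_type_v)
  }
}
```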
Workflow ID: wflow_LfsV51mSX8tjqg4Q
Describe Your Changes
This change refactors how arguments are passed to llama.cpp, specifically by only adding arguments when their values differ from their defaults. This reduces the verbosity of the command and prevents potential conflicts or errors when llama.cpp's default behavior aligns with the desired setting.
Additionally, new tests have been added for parsing device output from llama.cpp, ensuring the accurate extraction of GPU information (ID, name, total memory, and free memory). This improves the robustness of device detection.
The following changes were made (a minimal sketch of the resulting pattern appears after this list):

* **Remove redundant `--ctx-size` argument:** The `--ctx-size` argument is now only explicitly added if `cfg.ctx_size` is greater than 0.
* **Conditional argument adding for default values:**
  * `--split-mode` is only added if `cfg.split_mode` is not empty and not `'layer'`.
  * `--main-gpu` is only added if `cfg.main_gpu` is not undefined and not 0.
  * `--cache-type-k` is only added if `cfg.cache_type_k` is not `'f16'`.
  * `--cache-type-v` is only added if `cfg.cache_type_v` is not `'f16'` (when `flash_attn` is enabled) or not `'f32'` (otherwise). This also corrects the `flash_attn` condition.
  * `--defrag-thold` is only added if `cfg.defrag_thold` is not 0.1.
  * `--rope-scaling` is only added if `cfg.rope_scaling` is not `'none'`.
  * `--rope-scale` is only added if `cfg.rope_scale` is not 1.
  * `--rope-freq-base` is only added if `cfg.rope_freq_base` is not 0.
  * `--rope-freq-scale` is only added if `cfg.rope_freq_scale` is not 1.
* **Add `parse_device_output` tests:** Comprehensive unit tests were added to `src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs` to validate the parsing of llama.cpp device output under various scenarios, including multiple devices, single devices, different backends (CUDA, Vulkan, SYCL), complex GPU names, and error conditions.
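As noted above, here is a minimal TypeScript sketch of the "only add when non-default" pattern. The `cfg` field names and defaults come straight from the list above; the surrounding interface and function are hypothetical scaffolding, not the extension's actual code.

```typescript
// Hypothetical subset of the extension's config; field names and defaults
// follow the PR description.
interface LlamacppArgsConfig {
  ctx_size: number
  split_mode: string
  main_gpu?: number
  cache_type_k: string
  defrag_thold: number
  rope_scaling: string
  rope_scale: number
}

// Build the argument list, pushing a flag only when its value differs from
// llama.cpp's default, per the conditions listed above.
function buildArgs(cfg: LlamacppArgsConfig): string[] {
  const args: string[] = []
  if (cfg.ctx_size > 0) args.push('--ctx-size', String(cfg.ctx_size))
  if (cfg.split_mode && cfg.split_mode !== 'layer')
    args.push('--split-mode', cfg.split_mode)
  if (cfg.main_gpu !== undefined && cfg.main_gpu !== 0)
    args.push('--main-gpu', String(cfg.main_gpu))
  if (cfg.cache_type_k !== 'f16') args.push('--cache-type-k', cfg.cache_type_k)
  if (cfg.defrag_thold !== 0.1)
    args.push('--defrag-thold', String(cfg.defrag_thold))
  if (cfg.rope_scaling !== 'none') args.push('--rope-scaling', cfg.rope_scaling)
  if (cfg.rope_scale !== 1) args.push('--rope-scale', String(cfg.rope_scale))
  return args
}
```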
Fixes Issues
Self Checklist
Important
Refactor `llamacpp_extension` argument handling for efficiency and add comprehensive device parsing tests in `server.rs`.
* Refactors `llamacpp_extension` to add arguments only if they differ from defaults, reducing verbosity and potential conflicts.
* Omits the `--ctx-size` argument unless `cfg.ctx_size` > 0.
* Conditionally adds `--split-mode`, `--main-gpu`, `--cache-type-k`, `--cache-type-v`, `--defrag-thold`, `--rope-scaling`, `--rope-scale`, `--rope-freq-base`, and `--rope-freq-scale` based on specific conditions.
* Adds tests in `server.rs` for `parse_device_output` to validate GPU information extraction under various scenarios.

This description was created by Ellipsis for 3fb3ac9. You can customize this summary. It will automatically update as commits are pushed.