feat: Improve llama.cpp argument handling and add device parsing tests #6041
Conversation
This commit refactors how arguments are passed to llama.cpp, specifically by only adding arguments when their values differ from their defaults. This reduces the verbosity of the command and prevents potential conflicts or errors when llama.cpp's default behavior aligns with the desired setting.

Additionally, new tests have been added for parsing device output from llama.cpp, ensuring the accurate extraction of GPU information (ID, name, total memory, and free memory). This improves the robustness of device detection.

The following changes were made:

* **Remove redundant `--ctx-size` argument:** The `--ctx-size` argument is now only explicitly added if `cfg.ctx_size` is greater than 0.
* **Conditional argument adding for default values:**
  * `--split-mode` is only added if `cfg.split_mode` is not empty and not `'layer'`.
  * `--main-gpu` is only added if `cfg.main_gpu` is not undefined and not 0.
  * `--cache-type-k` is only added if `cfg.cache_type_k` is not `'f16'`.
  * `--cache-type-v` is only added if `cfg.cache_type_v` is not `'f16'` (when `flash_attn` is enabled) or not `'f32'` (otherwise). This also corrects the `flash_attn` condition.
  * `--defrag-thold` is only added if `cfg.defrag_thold` is not 0.1.
  * `--rope-scaling` is only added if `cfg.rope_scaling` is not `'none'`.
  * `--rope-scale` is only added if `cfg.rope_scale` is not 1.
  * `--rope-freq-base` is only added if `cfg.rope_freq_base` is not 0.
  * `--rope-freq-scale` is only added if `cfg.rope_freq_scale` is not 1.
* **Add `parse_device_output` tests:** Comprehensive unit tests were added to `src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs` to validate the parsing of llama.cpp device output under various scenarios, including multiple devices, single devices, different backends (CUDA, Vulkan, SYCL), complex GPU names, and error conditions. A hedged sketch of this kind of parsing follows.
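For illustration only, here is a minimal TypeScript sketch of the kind of device-line parsing those tests validate. The actual implementation and tests live in Rust in `server.rs`; the line format shown in the comment and the `DeviceInfo` field names are assumptions for this sketch, not taken from the PR.

```typescript
// Assumed shape of the extracted GPU info; the fields mirror those named in
// the PR description (ID, name, total memory, free memory).
interface DeviceInfo {
  id: string
  name: string
  totalMib: number
  freeMib: number
}

// Parse a single device line, assumed to look roughly like:
//   "CUDA0: NVIDIA GeForce RTX 3090 (24576 MiB, 24000 MiB free)"
// Returns null for non-matching lines, the kind of input the
// error-condition tests would exercise.
function parseDeviceLine(line: string): DeviceInfo | null {
  const m = line
    .trim()
    .match(/^(\S+):\s+(.+)\s+\((\d+)\s+MiB,\s+(\d+)\s+MiB free\)$/)
  if (!m) return null
  return { id: m[1], name: m[2], totalMib: Number(m[3]), freeMib: Number(m[4]) }
}
```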
Caution
Changes requested ❌
Reviewed everything up to 30932c8 in 2 minutes and 16 seconds.
- Reviewed 262 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 1 draft comment; view it below

Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. `src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs:149`
   - Draft comment: Raw string literal used to normalize the library path ends with a backslash, which is invalid in Rust raw strings. Consider using an alternate delimiter (e.g. `r#"\?"#`) or another approach.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_oxlOlD2Dgd18TdgZ
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Barecheck - Code coverage report. Total: 33.11%. Your code coverage diff: 0.02% ▴ ✅ All code changes are covered.
Important
Looks good to me! 👍
Reviewed 3fb3ac9 in 1 minute and 38 seconds.
- Reviewed 15 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 1 draft comment; view it below
1. `extensions/llamacpp-extension/src/index.ts:1281`
   - Draft comment: The new condition for pushing `--cache-type-v` only checks when `flash_attn` is true. According to the PR description, when `flash_attn` is enabled the default is `'f16'`, and when it's disabled the default should be `'f32'`. Consider refactoring the condition to handle both cases, for example: `if (cfg.cache_type_v !== (cfg.flash_attn ? 'f16' : 'f32')) { args.push('--cache-type-v', cfg.cache_type_v) }`. This makes the intent explicit and ensures the flag is added only when the value differs from the appropriate default.
   - Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 20% vs. threshold = 50%). The comment makes a reasonable suggestion about making the default values more explicit, but there is not enough context to verify its assumptions: it claims that `'f16'` is the default when `flash_attn` is true and `'f32'` when it is false, yet neither the code nor the documentation in view confirms this. The current code may be handling edge cases that aren't obvious, and while the suggested change would make the code more explicit, there is no confidence it preserves the same behavior. The comment was therefore deleted, since making changes based on unverified assumptions could introduce bugs.
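For readability, here is the draft comment's suggestion restated as a self-contained TypeScript sketch. The config field names come from the PR description; the surrounding interface and function are hypothetical scaffolding, and whether `'f32'` really is the default when `flash_attn` is disabled is the draft comment's unverified assumption.

```typescript
// Hypothetical config shape; field names follow the PR description.
interface CacheConfig {
  flash_attn: boolean
  cache_type_v: string
}

// Push --cache-type-v only when the value differs from the default the
// draft comment assumes: 'f16' with flash_attn enabled, 'f32' otherwise.
function pushCacheTypeV(args: string[], cfg: CacheConfig): void {
  const assumedDefault = cfg.flash_attn ? 'f16' : 'f32'
  if (cfg.cache_type_v !== assumedDefault) {
    args.push('--cache-type-v', cfg.cache_type_v)
  }
}
```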
Workflow ID: wflow_LfsV51mSX8tjqg4Q
Describe Your Changes
This change refactors how arguments are passed to llama.cpp, specifically by only adding arguments when their values differ from their defaults. This reduces the verbosity of the command and prevents potential conflicts or errors when llama.cpp's default behavior aligns with the desired setting.
Additionally, new tests have been added for parsing device output from llama.cpp, ensuring the accurate extraction of GPU information (ID, name, total memory, and free memory). This improves the robustness of device detection.
The following changes were made (a minimal sketch of the resulting pattern appears after this list):

* **Remove redundant `--ctx-size` argument:** The `--ctx-size` argument is now only explicitly added if `cfg.ctx_size` is greater than 0.
* **Conditional argument adding for default values:**
  * `--split-mode` is only added if `cfg.split_mode` is not empty and not `'layer'`.
  * `--main-gpu` is only added if `cfg.main_gpu` is not undefined and not 0.
  * `--cache-type-k` is only added if `cfg.cache_type_k` is not `'f16'`.
  * `--cache-type-v` is only added if `cfg.cache_type_v` is not `'f16'` (when `flash_attn` is enabled) or not `'f32'` (otherwise). This also corrects the `flash_attn` condition.
  * `--defrag-thold` is only added if `cfg.defrag_thold` is not 0.1.
  * `--rope-scaling` is only added if `cfg.rope_scaling` is not `'none'`.
  * `--rope-scale` is only added if `cfg.rope_scale` is not 1.
  * `--rope-freq-base` is only added if `cfg.rope_freq_base` is not 0.
  * `--rope-freq-scale` is only added if `cfg.rope_freq_scale` is not 1.
* **Add `parse_device_output` tests:** Comprehensive unit tests were added to `src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs` to validate the parsing of llama.cpp device output under various scenarios, including multiple devices, single devices, different backends (CUDA, Vulkan, SYCL), complex GPU names, and error conditions.
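As noted above, here is a minimal TypeScript sketch of the "only add when non-default" pattern. The `cfg` field names and defaults come straight from the list above; the surrounding interface and function are hypothetical scaffolding, not the extension's actual code.

```typescript
// Hypothetical subset of the extension's config; field names and defaults
// follow the PR description.
interface LlamacppArgsConfig {
  ctx_size: number
  split_mode: string
  main_gpu?: number
  cache_type_k: string
  defrag_thold: number
  rope_scaling: string
  rope_scale: number
}

// Build the argument list, pushing a flag only when its value differs from
// llama.cpp's default, per the conditions listed above.
function buildArgs(cfg: LlamacppArgsConfig): string[] {
  const args: string[] = []
  if (cfg.ctx_size > 0) args.push('--ctx-size', String(cfg.ctx_size))
  if (cfg.split_mode && cfg.split_mode !== 'layer')
    args.push('--split-mode', cfg.split_mode)
  if (cfg.main_gpu !== undefined && cfg.main_gpu !== 0)
    args.push('--main-gpu', String(cfg.main_gpu))
  if (cfg.cache_type_k !== 'f16') args.push('--cache-type-k', cfg.cache_type_k)
  if (cfg.defrag_thold !== 0.1)
    args.push('--defrag-thold', String(cfg.defrag_thold))
  if (cfg.rope_scaling !== 'none') args.push('--rope-scaling', cfg.rope_scaling)
  if (cfg.rope_scale !== 1) args.push('--rope-scale', String(cfg.rope_scale))
  return args
}
```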
Fixes Issues
Self Checklist
Important
Refactor `llamacpp_extension` argument handling for efficiency and add comprehensive device parsing tests in `server.rs`.
* Refactors `llamacpp_extension` to add arguments only if they differ from defaults, reducing verbosity and potential conflicts.
* Omits the `--ctx-size` argument unless `cfg.ctx_size` > 0.
* Conditionally adds `--split-mode`, `--main-gpu`, `--cache-type-k`, `--cache-type-v`, `--defrag-thold`, `--rope-scaling`, `--rope-scale`, `--rope-freq-base`, and `--rope-freq-scale` based on specific conditions.
* Adds tests in `server.rs` for `parse_device_output` to validate GPU information extraction under various scenarios.

This description was created by Ellipsis for 3fb3ac9. You can customize this summary. It will automatically update as commits are pushed.