You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
* feat: Improve llama.cpp argument handling and add device parsing tests
This commit refactors how arguments are passed to llama.cpp,
specifically by only adding arguments when their values differ from
their defaults. This reduces the verbosity of the command and prevents
potential conflicts or errors when llama.cpp's default behavior aligns
with the desired setting.
Additionally, new tests have been added for parsing device output from
llama.cpp, ensuring the accurate extraction of GPU information (ID,
name, total memory, and free memory). This improves the robustness of
device detection.
The following changes were made:
* **Remove redundant `--ctx-size` argument:** The `--ctx-size`
argument is now only explicitly added if `cfg.ctx_size` is greater
than 0.
* **Conditional argument adding for default values:**
* `--split-mode` is only added if `cfg.split_mode` is not empty
and not 'layer'.
* `--main-gpu` is only added if `cfg.main_gpu` is not undefined
and not 0.
* `--cache-type-k` is only added if `cfg.cache_type_k` is not 'f16'.
* `--cache-type-v` is only added if `cfg.cache_type_v` is not 'f16'
(when `flash_attn` is enabled) or not 'f32' (otherwise). This
also corrects the `flash_attn` condition.
* `--defrag-thold` is only added if `cfg.defrag_thold` is not 0.1.
* `--rope-scaling` is only added if `cfg.rope_scaling` is not
'none'.
* `--rope-scale` is only added if `cfg.rope_scale` is not 1.
* `--rope-freq-base` is only added if `cfg.rope_freq_base` is not 0.
* `--rope-freq-scale` is only added if `cfg.rope_freq_scale` is
not 1.
* **Add `parse_device_output` tests:** Comprehensive unit tests were
added to `src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs`
to validate the parsing of llama.cpp device output under various
scenarios, including multiple devices, single devices, different
backends (CUDA, Vulkan, SYCL), complex GPU names, and error
conditions.
* fixup cache_type_v comparision
0 commit comments