Releases · CodeLinaro/llama.cpp
b6775
gguf-py : add support for endian conversion of BF16 data (#16594)
BF16 requires special handling in this script because it is 2-byte data, but the view is 1-byte by default. Switch to the correct view before attempting the byteswap. With this change, correctly byteswapping models like Meta-Llama-3-8B-Instruct-bf16-GGUF should be possible.
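A minimal NumPy sketch of the underlying issue (illustrative only, not the actual gguf-py code): byteswapping a 1-byte view is a no-op, so the buffer must be reinterpreted as 2-byte elements first.

```python
import numpy as np

# Hypothetical 4-byte payload standing in for raw little-endian BF16 tensor data.
raw = np.frombuffer(b"\x80\x3f\x00\x40", dtype=np.uint8)

# With a 1-byte view there is nothing to swap, so byteswapping is a no-op.
assert raw.byteswap().tobytes() == raw.tobytes()

# Correct: view the same buffer as 2-byte elements, then swap within each element.
swapped = raw.view(np.uint16).byteswap()
print(swapped.tobytes())  # b'\x3f\x80\x40\x00'
```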
b6745
metal : add opt_step_adamw and op_sum (#16529)
* scaffold to support opt step adamw on metal (not written so far)
* add opt-step-adamw kernel for metal
* pass op->src[4] as a separate buffer to the pipeline
* add bounds check to opt-step-adamw kernel
* complete scaffold for GGML_OP_SUM
* naive GGML_OP_SUM kernel
* remove unwanted comment
* change OP_SUM capability gate
* add has_simdgroup_reduction to both ops to pass CI
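For reference, the per-parameter update that an AdamW opt-step kernel like the one above computes; this is a plain-Python sketch of the standard decoupled-weight-decay formula, not the Metal code, and all names are illustrative.

```python
import math

def adamw_step(x, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, wd=0.01):
    """One AdamW update for a single parameter x with gradient g at step t."""
    m = beta1 * m + (1.0 - beta1) * g       # first-moment (mean) estimate
    v = beta2 * v + (1.0 - beta2) * g * g   # second-moment (variance) estimate
    m_hat = m / (1.0 - beta1 ** t)          # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    # Decoupled weight decay: the wd term is applied to x directly.
    x = x - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * x)
    return x, m, v

# GGML_OP_SUM, by contrast, is simply a full reduction of a tensor to one scalar.
```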
b6725
webui: updated the chat service to only include max_tokens in the req…
b6713
server : fix cancel pending task (#16467)
Co-authored-by: DevAI <[email protected]>
b6700
llama : add --no-host to disable host buffers (#16310)
* implement --no-host to disable host buffers
* fix equal_mparams
* move no-host enumeration order together with other model params

Co-authored-by: slaren <[email protected]>
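Usage sketch (an assumed but conventional llama.cpp invocation; the model path is a placeholder): `llama-cli -m ./model.gguf --no-host` loads the model with host buffers disabled.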
b6664
CI: reenable cdna in rocm docker builds (#16376)
b6661
ci: Properly install rocwmma for hip builds (#16305)
* CI: properly install rocwmma for hip builds; on windows we now install rocwmma from ubuntu packages
* CI: update linux rocm docker build to use rocm 7.0
b6550
ggml : implement set_rows with i32 index (#16159)
* implement set_rows with i32 index
* template fix
* test quantized path warnings--
* apply suggestions from code review
* forgotten name change
* deduplicate cuda/sycl and test-fix
* indent++
* vulkan: support set_rows with i32 index type (#16162)
* disable i32 index for webgpu for now

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Jeff Bolz <[email protected]>
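The semantics of set_rows, sketched in NumPy (illustrative semantics only, not the ggml API): rows of the source tensor are scattered into the destination at positions given by an index tensor, which after this change may be i32 as well as i64.

```python
import numpy as np

src = np.arange(6, dtype=np.float32).reshape(2, 3)  # 2 rows to write
dst = np.zeros((4, 3), dtype=np.float32)            # destination tensor

# The index tensor may now be 32-bit (previously 64-bit only).
idx = np.array([3, 1], dtype=np.int32)

dst[idx] = src  # dst row 3 <- src row 0, dst row 1 <- src row 1
print(dst)
```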
b6451
ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type (#15797)
* ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type
* ggml-backend : add device id to device props
* llama : only use iGPU devices if there are no GPU devices
* llama : do not use multiple devices from different backends with the same device id
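A rough sketch of the selection policy described above (hypothetical structures; the real logic lives in llama.cpp's device enumeration): discrete GPUs win over iGPUs, and devices that share an id across backends are deduplicated.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Device:
    backend: str    # e.g. "CUDA", "Vulkan"
    device_id: str  # id reported in the device props
    kind: str       # "GPU" or "IGPU"

def pick_devices(devices):
    # Prefer discrete GPUs; fall back to iGPUs only if no GPU exists.
    gpus = [d for d in devices if d.kind == "GPU"]
    pool = gpus if gpus else [d for d in devices if d.kind == "IGPU"]
    # Keep one device per id, so two backends exposing the same
    # physical device are never used together.
    seen, picked = set(), []
    for d in pool:
        if d.device_id not in seen:
            seen.add(d.device_id)
            picked.append(d)
    return picked

devs = [Device("CUDA", "0", "GPU"), Device("Vulkan", "0", "GPU"), Device("Vulkan", "1", "IGPU")]
print(pick_devices(devs))  # only the CUDA device with id "0"; duplicate and iGPU skipped
```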
b6423
json : support `enum` values within `allOf` (#15830)
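For illustration, the kind of schema this enables (a hypothetical example, written as a Python dict): an `enum` nested inside `allOf`. Under JSON Schema semantics every branch must hold, so only the enumerated values are accepted.

```python
# Hypothetical schema: a string constrained by an enum inside allOf.
# With support for this shape in the JSON-schema-to-grammar converter,
# sampling can be restricted to the enumerated values.
schema = {
    "allOf": [
        {"type": "string"},
        {"enum": ["low", "medium", "high"]},
    ]
}
```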