
Releases: CodeLinaro/llama.cpp

b6775

16 Oct 00:04
7adc79c


gguf-py : add support for endian conversion of BF16 data (#16594)

BF16 requires special handling in this script:
it is 2-byte data, but the view is 1-byte by default.
Switch to the correct view before attempting byteswapping.

With this change correctly byteswapping models like
Meta-Llama-3-8B-Instruct-bf16-GGUF
should be possible.
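The fix above can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not the actual gguf-py code: the function name and signature are assumptions for this example.

```python
import numpy as np

def byteswap_bf16(raw: bytes) -> bytes:
    """Byteswap raw BF16 tensor data (illustrative sketch).

    A default 1-byte (uint8) view cannot be byteswapped meaningfully,
    so the buffer is reinterpreted as 2-byte units first.
    """
    data = np.frombuffer(raw, dtype=np.uint8)
    # Reinterpret the 1-byte view as 2-byte BF16 elements...
    as_u16 = data.view(np.uint16)
    # ...then swap the bytes within each element.
    return as_u16.byteswap().tobytes()
```

Swapping within fixed 2-byte units this way gives the same result regardless of the host's native endianness.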

b6745

12 Oct 20:54
a31cf36


metal : add opt_step_adamw and op_sum (#16529)

* scaffold to support opt step adamw on metal (not written so far)

* add opt-step-adamw kernel for metal

* pass op->src[4] as a separate buffer to the pipeline

* add bounds check to opt-step-adamw kernel

* complete scaffold for GGML_OP_SUM

* naive GGML_OP_SUM kernel

* remove unwanted comment

* change OP_SUM capability gate

* Add has_simdgroup_reduction to both ops to pass CI
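For reference, the update that an `opt_step_adamw` kernel computes is the standard AdamW step. The NumPy sketch below shows the math only; hyperparameter names and defaults here are illustrative and do not reflect ggml's exact parameter layout (which is passed via `op->src[4]`).

```python
import numpy as np

def adamw_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, wd=0.01):
    """One AdamW update (reference sketch, not the Metal kernel)."""
    # First- and second-moment running averages of the gradient.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    # Bias correction for step t (1-indexed).
    mhat = m / (1 - beta1 ** t)
    vhat = v / (1 - beta2 ** t)
    # Decoupled weight decay, then the gradient step.
    w = w * (1 - lr * wd)
    w = w - lr * mhat / (np.sqrt(vhat) + eps)
    return w, m, v
```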

b6725

10 Oct 04:31
1faa13a


webui: updated the chat service to only include max_tokens in the req…

b6713

08 Oct 16:39
d2ee056


server : fix cancel pending task (#16467)

Co-authored-by: DevAI <[email protected]>

b6700

06 Oct 22:52
3df2244


llama : add --no-host to disable host buffers (#16310)

* implement --no-host to disable host buffer

* fix equal_mparams

* move no-host enumeration order together with other model params

---------

Co-authored-by: slaren <[email protected]>

b6664

01 Oct 23:06
c8dedc9


CI: reenable cdna in rocm docker builds (#16376)

b6661

01 Oct 20:28
1fe4e38


ci: Properly install rocwmma for hip builds (#16305)

* CI: Properly install rocwmma for hip builds

on Windows we now install rocwmma from Ubuntu packages

* CI: update linux rocm docker build to use rocm 7.0

b6550

22 Sep 18:13
3ecb2f6


ggml : implement set_rows with i32 index (#16159)

* implement set_rows with i32 index

* template fix

* test quantized path

warnings--

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <[email protected]>

* forgotten name change

* deduplicate cuda/sycl and test-fix

* indent++

* vulkan: support set_rows with i32 index type (#16162)

* disable i32 index for webgpu for now

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Jeff Bolz <[email protected]>
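The semantics of `set_rows` can be sketched as follows. This is a simplified reference model, not the ggml implementation; the point of the change is that the index tensor may now be i32 as well as i64.

```python
import numpy as np

def set_rows(dst, src, idx):
    """Write src row r into dst at row idx[r] (reference sketch)."""
    idx = np.asarray(idx)
    # The change tracked above extends index support from i64 to i32.
    assert idx.dtype in (np.int32, np.int64)
    for r, i in enumerate(idx):
        dst[i] = src[r]
    return dst
```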

b6451

11 Sep 23:58
360d653


ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type (#15797)

* ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type

ggml-backend : add device id to device props

llama : only use iGPU devices if there are no GPU devices

llama : do not use multiple devices from different backends with the same device id
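The device-selection policy described above can be sketched as follows. This is a hypothetical illustration of the rules (prefer discrete GPUs, fall back to iGPUs only when no GPU exists, never use two devices sharing a device id), not llama.cpp's actual code; the dict layout is an assumption for this example.

```python
def select_devices(devices):
    """Pick devices per the policy above (illustrative sketch)."""
    gpus = [d for d in devices if d["type"] == "GPU"]
    # Use iGPU devices only if there are no discrete GPU devices.
    pool = gpus if gpus else [d for d in devices if d["type"] == "IGPU"]
    seen_ids, chosen = set(), []
    for d in pool:
        if d["id"] in seen_ids:
            continue  # skip duplicate device ids from other backends
        seen_ids.add(d["id"])
        chosen.append(d)
    return chosen
```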

b6423

09 Sep 00:51
7057faf


json : support `enum` values within `allOf` (#15830)
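An example of the kind of schema this change is about: an `enum` constraint nested inside `allOf`. The schema below is an assumption for illustration, not taken from the PR.

```python
# Illustrative JSON schema (assumed example): an `enum` nested
# inside `allOf`, which the JSON-schema handling can now support.
schema = {
    "type": "object",
    "properties": {
        "status": {
            "allOf": [
                {"type": "string"},
                {"enum": ["ok", "error", "pending"]},
            ]
        }
    },
}
```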