Releases · CodeLinaro/llama.cpp
b6775
gguf-py : add support for endian conversion of BF16 data (#16594)
BF16 requires special handling in this script because it is 2-byte data, but the view is 1-byte by default. Switch to the correct view before attempting the byteswap. With this change, correctly byteswapping models like Meta-Llama-3-8B-Instruct-bf16-GGUF should be possible.
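A minimal NumPy sketch of the underlying issue (illustrative only, not the actual gguf-py code): byteswapping a 1-byte view is a no-op, so the buffer must be reinterpreted as 2-byte elements first.

```python
import numpy as np

# Hypothetical 4-byte payload standing in for raw little-endian BF16 tensor data.
raw = np.frombuffer(b"\x80\x3f\x00\x40", dtype=np.uint8)

# With a 1-byte view there is nothing to swap, so byteswapping is a no-op.
assert raw.byteswap().tobytes() == raw.tobytes()

# Correct: view the same buffer as 2-byte elements, then swap within each element.
swapped = raw.view(np.uint16).byteswap()
print(swapped.tobytes())  # b'\x3f\x80\x40\x00'
```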
b6745
metal : add opt_step_adamw and op_sum (#16529)
* scaffold to support opt step adamw on metal (not written so far)
* add opt-step-adamw kernel for metal
* pass op->src[4] as a separate buffer to the pipeline
* add bounds check to opt-step-adamw kernel
* complete scaffold for GGML_OP_SUM
* naive GGML_OP_SUM kernel
* remove unwanted comment
* change OP_SUM capability gate
* add has_simdgroup_reduction to both ops to pass CI
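For reference, the per-parameter update that an AdamW opt-step kernel like the one above computes; this is a plain-Python sketch of the standard decoupled-weight-decay formula, not the Metal code, and all names are illustrative.

```python
import math

def adamw_step(x, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, wd=0.01):
    """One AdamW update for a single parameter x with gradient g at step t."""
    m = beta1 * m + (1.0 - beta1) * g       # first-moment (mean) estimate
    v = beta2 * v + (1.0 - beta2) * g * g   # second-moment (variance) estimate
    m_hat = m / (1.0 - beta1 ** t)          # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    # Decoupled weight decay: the wd term is applied to x directly.
    x = x - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * x)
    return x, m, v

# GGML_OP_SUM, by contrast, is simply a full reduction of a tensor to one scalar.
```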
b6725
webui: updated the chat service to only include max_tokens in the req…
b6713
server : fix cancel pending task (#16467)
Co-authored-by: DevAI <[email protected]>
b6700
llama : add --no-host to disable host buffers (#16310)
* implement --no-host to disable host buffers
* fix equal_mparams
* move no-host enumeration order together with other model params

Co-authored-by: slaren <[email protected]>
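Usage sketch (an assumed but conventional llama.cpp invocation; the model path is a placeholder): `llama-cli -m ./model.gguf --no-host` loads the model with host buffers disabled.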
b6664
CI: reenable cdna in rocm docker builds (#16376)
b6661
ci: Properly install rocwmma for hip builds (#16305)
* CI: properly install rocwmma for hip builds; on windows we now install rocwmma from ubuntu packages
* CI: update linux rocm docker build to use rocm 7.0
b6550
ggml : implement set_rows with i32 index (#16159)
* implement set_rows with i32 index
* template fix
* test quantized path warnings--
* apply suggestions from code review
* forgotten name change
* deduplicate cuda/sycl and test-fix
* indent++
* vulkan: support set_rows with i32 index type (#16162)
* disable i32 index for webgpu for now

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Jeff Bolz <[email protected]>
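The semantics of set_rows, sketched in NumPy (illustrative semantics only, not the ggml API): rows of the source tensor are scattered into the destination at positions given by an index tensor, which after this change may be i32 as well as i64.

```python
import numpy as np

src = np.arange(6, dtype=np.float32).reshape(2, 3)  # 2 rows to write
dst = np.zeros((4, 3), dtype=np.float32)            # destination tensor

# The index tensor may now be 32-bit (previously 64-bit only).
idx = np.array([3, 1], dtype=np.int32)

dst[idx] = src  # dst row 3 <- src row 0, dst row 1 <- src row 1
print(dst)
```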
b6451
ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type (#15797)
* ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type
* ggml-backend : add device id to device props
* llama : only use iGPU devices if there are no GPU devices
* llama : do not use multiple devices from different backends with the same device id
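A rough sketch of the selection policy described above (hypothetical structures; the real logic lives in llama.cpp's device enumeration): discrete GPUs win over iGPUs, and devices that share an id across backends are deduplicated.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Device:
    backend: str    # e.g. "CUDA", "Vulkan"
    device_id: str  # id reported in the device props
    kind: str       # "GPU" or "IGPU"

def pick_devices(devices):
    # Prefer discrete GPUs; fall back to iGPUs only if no GPU exists.
    gpus = [d for d in devices if d.kind == "GPU"]
    pool = gpus if gpus else [d for d in devices if d.kind == "IGPU"]
    # Keep one device per id, so two backends exposing the same
    # physical device are never used together.
    seen, picked = set(), []
    for d in pool:
        if d.device_id not in seen:
            seen.add(d.device_id)
            picked.append(d)
    return picked

devs = [Device("CUDA", "0", "GPU"), Device("Vulkan", "0", "GPU"), Device("Vulkan", "1", "IGPU")]
print(pick_devices(devs))  # only the CUDA device with id "0"; duplicate and iGPU skipped
```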
b6423
json : support `enum` values within `allOf` (#15830)
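For illustration, the kind of schema this enables (a hypothetical example, written as a Python dict): an `enum` nested inside `allOf`. Under JSON Schema semantics every branch must hold, so only the enumerated values are accepted.

```python
# Hypothetical schema: a string constrained by an enum inside allOf.
# With support for this shape in the JSON-schema-to-grammar converter,
# sampling can be restricted to the enumerated values.
schema = {
    "allOf": [
        {"type": "string"},
        {"enum": ["low", "medium", "high"]},
    ]
}
```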