opencl: allow large buffer for adreno by lhez · Pull Request #20997 · ggml-org/llama.cpp

lhez · 2026-03-25T16:31:35Z

Overview

OpenCL limits the maximum size of a single buffer allocation (the limit can be queried with CL_DEVICE_MAX_MEM_ALLOC_SIZE). Some Adreno GPUs allow allocating buffers beyond this limit through an extension, although the extension does not guarantee that a buffer as large as the entire DRAM can be allocated. This allows larger compute buffers and larger contexts.

This PR adds the env var GGML_OPENCL_ADRENO_USE_LARGE_BUFFER to enable this extension. If the env var exists, the GPU is Adreno, and the extension is supported, the extension is used to allocate buffers beyond the limit defined by CL_DEVICE_MAX_MEM_ALLOC_SIZE.
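The three conditions above can be sketched as a small predicate. This is an illustrative sketch, not the code from this PR: `DeviceInfo`, `has_extension`, and `use_adreno_large_buffer` are hypothetical names, and matching "Adreno" in the device name is an assumption about how the GPU is recognized. The two strings stand in for what `CL_DEVICE_NAME` and `CL_DEVICE_EXTENSIONS` would report.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical container for the two device strings the check needs.
struct DeviceInfo {
    std::string name;        // stand-in for the CL_DEVICE_NAME string
    std::string extensions;  // stand-in for the space-separated CL_DEVICE_EXTENSIONS string
};

// Returns true if `ext` appears as a whole token in the space-separated list.
static bool has_extension(const std::string & extensions, const std::string & ext) {
    std::size_t pos = 0;
    while ((pos = extensions.find(ext, pos)) != std::string::npos) {
        const std::size_t end = pos + ext.size();
        const bool starts = pos == 0 || extensions[pos - 1] == ' ';
        const bool ends   = end == extensions.size() || extensions[end] == ' ';
        if (starts && ends) {
            return true;
        }
        pos = end;  // partial match (e.g. a longer extension name), keep searching
    }
    return false;
}

// Hypothetical predicate mirroring the description: env var exists,
// GPU is Adreno, and cl_qcom_large_buffer is supported.
static bool use_adreno_large_buffer(const DeviceInfo & dev) {
    // Only the existence of the variable is checked; its value is not inspected.
    const bool env_set = std::getenv("GGML_OPENCL_ADRENO_USE_LARGE_BUFFER") != nullptr;
    // Assumption: Adreno devices carry "Adreno" in their device name.
    const bool is_adreno = dev.name.find("Adreno") != std::string::npos;
    const bool has_ext   = has_extension(dev.extensions, "cl_qcom_large_buffer");
    return env_set && is_adreno && has_ext;
}
```

If any one of the three conditions fails, the predicate returns false and allocation would stay within the regular CL_DEVICE_MAX_MEM_ALLOC_SIZE limit.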

Additional information

The extension is cl_qcom_large_buffer. Relevant documentation can be found in the Adreno OpenCL SDK documentation (the SDK can be downloaded from https://softwarecenter.qualcomm.com/catalog/item/Adreno_OpenCL_SDK).

Android platforms with A7x and A8x GPUs should support it. X Elite (Windows) does not support it at the moment. The upcoming X2 Elite (Windows) should also support it.

For example, GGML_OPENCL_ADRENO_USE_LARGE_BUFFER=1 allows Qwen3-0.6B to run on an A740 Android device with a context length of 40960.
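To give a feel for why a long context can run into the allocation limit, here is a rough back-of-the-envelope estimate of the KV cache size. It is a sketch under stated assumptions, not a measurement from this PR: the model shape (28 layers, 8 KV heads, head dimension 128) and the f16 cache type are assumed values for Qwen3-0.6B, `KvShape` and `kv_cache_bytes` are hypothetical names, and the buffers llama.cpp actually allocates may be laid out differently.

```cpp
#include <cstdint>

// Hypothetical description of the attention shape; the values used below are assumptions.
struct KvShape {
    std::uint64_t n_layer;        // number of transformer layers
    std::uint64_t n_kv_head;      // number of key/value heads
    std::uint64_t head_dim;       // dimension of each head
    std::uint64_t bytes_per_elem; // 2 for an f16 cache
};

// Estimated bytes for the K and V caches together at a given context length.
static std::uint64_t kv_cache_bytes(const KvShape & s, std::uint64_t n_ctx) {
    const std::uint64_t per_token_per_layer = 2 /* K and V */ * s.n_kv_head * s.head_dim * s.bytes_per_elem;
    return per_token_per_layer * s.n_layer * n_ctx;
}

// Assumed shape for Qwen3-0.6B with an f16 cache (not taken from this PR).
static const KvShape qwen3_0_6b_assumed = {28, 8, 128, 2};
```

With these assumed values, `kv_cache_bytes(qwen3_0_6b_assumed, 40960)` evaluates to 4,697,620,480 bytes, which is 4.375 GiB. Whether that exceeds the limit on a given device depends on the CL_DEVICE_MAX_MEM_ALLOC_SIZE value the driver reports.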

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: No

Set LM_GGML_OPENCL_ADRENO_USE_LARGE_BUFFER=1 before SoLoader.init so the llama.rn OpenCL backend enables cl_qcom_large_buffer on supported Adreno devices. On non-Adreno devices and on drivers without the extension, this is a no-op. Closes #657. Upstream: ggml-org/llama.cpp#20997

opencl: allow large buffer for adreno

3528185

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and OpenCL (issues specific to the OpenCL backend) labels Mar 25, 2026

max-krasnyansky approved these changes Mar 25, 2026


lhez marked this pull request as ready for review March 26, 2026 06:01

lhez requested a review from a team as a code owner March 26, 2026 06:02

lhez requested review from CISC and ggerganov March 26, 2026 06:26

ggerganov approved these changes Mar 26, 2026


max-krasnyansky merged commit ded446b into ggml-org:master Mar 26, 2026
49 of 50 checks passed

BlindDeveloper mentioned this pull request Mar 29, 2026

[Feat]: Large buffer support for Adreno gpus a-ghorbani/pocketpal-ai#657

Closed

BlindDeveloper mentioned this pull request Apr 10, 2026

Large buffer support for A7X and A8X gpus mybigday/llama.rn#328

Closed

slartibardfast pushed a commit to slartibardfast/llama.cpp that referenced this pull request Apr 12, 2026

opencl: allow large buffer for adreno (ggml-org#20997)

45ad546

a-ghorbani mentioned this pull request Apr 21, 2026

feat(android): enable Adreno large buffer for A7X/A8X GPUs a-ghorbani/pocketpal-ai#699

Merged

4 tasks

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

opencl: allow large buffer for adreno (ggml-org#20997)

afafa3d

rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026

opencl: allow large buffer for adreno (ggml-org#20997)

a11e5c4

ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026

opencl: allow large buffer for adreno (ggml-org#20997)

f39e5e9

my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026

opencl: allow large buffer for adreno (ggml-org#20997)

d4670dd

my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026

opencl: allow large buffer for adreno (ggml-org#20997)

0ae843e

fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

opencl: allow large buffer for adreno (ggml-org#20997)

959af63

max-krasnyansky merged 1 commit into ggml-org:master from qualcomm:lh/adreno-large-buffer


3 participants
