From e67b3aa29796727c979903dd6e1341386942bee0 Mon Sep 17 00:00:00 2001
From: Todd Malsbary <todd.malsbary@intel.com>
Date: Mon, 11 May 2026 14:49:49 -0700
Subject: [PATCH 1/5] Tidy up SYCL doc a bit

- Add explicit links to referenced items
- Fix spelling errors

Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>
---
 docs/backend/SYCL.md | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/backend/SYCL.md b/docs/backend/SYCL.md
index f66facc856a7..6e28f156bbb3 100644
--- a/docs/backend/SYCL.md
+++ b/docs/backend/SYCL.md
@@ -43,11 +43,11 @@ The following releases are verified and recommended:
 
 ### Ubuntu 24.04
 
-The release packages for Ubuntu 24.04 x64 (FP32/FP16) only include the binary files of the llama.cpp SYCL backend. They require the target machine to have pre-installed Intel GPU drivers and oneAPI packages that are the same version as the build package. To get the version and installation info, refer to release.yml: ubuntu-24-sycl -> Download & Install oneAPI.
+The release packages for Ubuntu 24.04 x64 (FP32/FP16) only include the binary files of the llama.cpp SYCL backend. They require the target machine to have pre-installed Intel GPU drivers and oneAPI packages that are the same version as the build package. To get the version and installation info, refer to [.github/workflows/release.yml#L713](../../.github/workflows/release.yml#L713): ubuntu-24-sycl -> Download & Install oneAPI.
 
-It is recommended to use them with Intel Docker.
+It is recommended to use them with [Intel Docker](https://hub.docker.com/r/intel/deep-learning-essentials).
 
-The packages for FP32 and FP16 would have different accuracy and performance on LLMs. Please choose it acording to the test result.
+The packages for FP32 and FP16 would have different accuracy and performance on LLMs. Please choose it according to the test result.
 
 ## News
 
@@ -190,7 +190,7 @@ docker run -it --rm -v "/path/to/models:/models" --device /dev/dri/renderD128:/d
 
   - **Intel GPU**
 
-Intel data center GPUs drivers installation guide and download page can be found here: [Get intel dGPU Drivers](https://dgpu-docs.intel.com/driver/installation.html#ubuntu-install-steps).
+Intel data center GPUs drivers installation guide and download page can be found here: [Get Intel dGPU Drivers](https://dgpu-docs.intel.com/driver/installation.html#ubuntu-install-steps).
 
 *Note*: for client GPUs *(iGPU & Arc A-Series)*, please refer to the [client iGPU driver installation](https://dgpu-docs.intel.com/driver/client/overview.html).
 
@@ -240,7 +240,7 @@ Please follow the instructions for downloading and installing the Toolkit for Li
 
 Following guidelines/code snippets assume the default installation values. Otherwise, please make sure the necessary changes are reflected where applicable.
 
-Upon a successful installation, SYCL is enabled for the available intel devices, along with relevant libraries such as oneAPI oneDNN for Intel GPUs.
+Upon a successful installation, SYCL is enabled for the available Intel devices, along with relevant libraries such as oneAPI oneDNN for Intel GPUs.
 
 |Verified release|
 |-|
@@ -319,7 +319,7 @@ Similar to the native `sycl-ls`, available SYCL devices can be queried as follow
 ./build/bin/llama-ls-sycl-device
 ```
 
-This command will only display the selected backend that is supported by SYCL. The default backend is level_zero. For example, in a system with 2 *intel GPU* it would look like the following:
+This command will only display the selected backend that is supported by SYCL. The default backend is level_zero. For example, in a system with 2 *Intel GPU* it would look like the following:
 ```
 found 2 SYCL devices:
 
@@ -465,7 +465,7 @@ In the oneAPI command line, run the following to print the available SYCL device
 sycl-ls.exe
 ```
 
-There should be one or more *level-zero* GPU devices displayed as **[ext_oneapi_level_zero:gpu]**. Below is example of such output detecting an *intel Iris Xe* GPU as a Level-zero SYCL device:
+There should be one or more *level-zero* GPU devices displayed as **[ext_oneapi_level_zero:gpu]**. Below is example of such output detecting an *Intel Iris Xe* GPU as a Level-zero SYCL device:
 
 Output (example):
 ```
@@ -731,7 +731,7 @@ use 1 SYCL GPUs: [0] with Max compute units:512
 |-------------------|------------------|---------------------------------------------------------------------------------------------------------------------------|
 | GGML_SYCL_DEBUG   | 0 (default) or 1 | Enable log function by macro: GGML_SYCL_DEBUG                                                                             |
 | GGML_SYCL_ENABLE_FLASH_ATTN | 1 (default) or 0| Enable Flash-Attention. It can reduce memory usage. The performance impact depends on the LLM.|
-| GGML_SYCL_DISABLE_OPT | 0 (default) or 1 | Disable optimize features for Intel GPUs. (Recommended to 1 for intel devices older than Gen 10) |
+| GGML_SYCL_DISABLE_OPT | 0 (default) or 1 | Disable optimize features for Intel GPUs. (Recommended to 1 for Intel devices older than Gen 10) |
 | GGML_SYCL_DISABLE_GRAPH | 0 or 1 (default) | Disable running computations through SYCL Graphs feature. Disabled by default because SYCL Graph is still on development, no better performance. |
 | GGML_SYCL_DISABLE_DNN | 0 (default) or 1 | Disable running computations through oneDNN and always use oneMKL. |
 | ZES_ENABLE_SYSMAN | 0 (default) or 1 | Support to get free memory of GPU by sycl::aspect::ext_intel_free_memory.<br>Recommended to use when --split-mode = layer |
@@ -773,8 +773,8 @@ Pass these via `CXXFLAGS` or add a one-off `#define` to enable a flag on the spo
 
 - `Split-mode:[row]` is not supported.
 
-- Missed the AOT (Ahead-of-Time) in buiding.
-  - Good: build quickly, smaller size of binary file.
+- Missed the AOT (Ahead-of-Time) in building.
+  - Good: Builds quickly, smaller size of binary file.
   - Bad: The startup is slow (JIT) in first time, but subsequent performance is unaffected.
 
 ## Q&A

From 4bf20d90e74405967748b43df051a92cd7c09a0c Mon Sep 17 00:00:00 2001
From: Todd Malsbary <todd.malsbary@intel.com>
Date: Mon, 11 May 2026 14:51:16 -0700
Subject: [PATCH 2/5] Correct documented default for GGML_SYCL_GRAPH

The default is ON, not OFF:

  $ cmake -LAH -B build | grep GGML_SYCL_GRAPH
  ...
  GGML_SYCL_GRAPH:BOOL=ON

Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>
---
 docs/backend/SYCL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/backend/SYCL.md b/docs/backend/SYCL.md
index 6e28f156bbb3..1bcb4a65a6fd 100644
--- a/docs/backend/SYCL.md
+++ b/docs/backend/SYCL.md
@@ -717,7 +717,7 @@ use 1 SYCL GPUs: [0] with Max compute units:512
 | GGML_SYCL_TARGET   | INTEL *(default)*                     | Set the SYCL target device type.            |
 | GGML_SYCL_DEVICE_ARCH | Optional                           | Set the SYCL device architecture. Setting the device architecture can improve the performance. See the table [--offload-arch](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OffloadDesign.md#--offload-arch) for a list of valid architectures. |
 | GGML_SYCL_F16      | OFF *(default)* \|ON *(optional)*     | Enable FP16 build with SYCL code path. (1.) |
-| GGML_SYCL_GRAPH    | OFF *(default)* \|ON *(Optional)*     | Enable build with [SYCL Graph extension](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc). |
+| GGML_SYCL_GRAPH    | ON *(default)* \|OFF *(Optional)*     | Enable build with [SYCL Graph extension](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc). |
 | GGML_SYCL_DNN      | ON *(default)* \|OFF *(Optional)*     | Enable build with oneDNN.                   |
 | GGML_SYCL_HOST_MEM_FALLBACK | ON *(default)* \|OFF *(Optional)* | Allow host memory fallback when device memory is full during quantized weight reorder. Enables inference to continue at reduced speed (reading over PCIe) instead of failing. Requires Linux kernel 6.8+. |
 | CMAKE_C_COMPILER   | `icx` *(Linux)*, `icx/cl` *(Windows)* | Set `icx` compiler for SYCL code path.      |

From e25af0b981bd199bb8e8def246c5f88caacc1683 Mon Sep 17 00:00:00 2001
From: Todd Malsbary <todd.malsbary@intel.com>
Date: Tue, 12 May 2026 10:07:17 -0700
Subject: [PATCH 3/5] Move docker instructions from SYCL.md to docker.md

This makes them directly accesible from the Quick Start section
of the top-level README.md.

Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>
---
 docs/backend/SYCL.md | 30 +-----------------------------
 docs/docker.md       | 44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 29 deletions(-)

diff --git a/docs/backend/SYCL.md b/docs/backend/SYCL.md
index 1bcb4a65a6fd..ec6086a995fe 100644
--- a/docs/backend/SYCL.md
+++ b/docs/backend/SYCL.md
@@ -152,35 +152,7 @@ NA
 
 ## Docker
 
-The docker build option is currently limited to *Intel GPU* targets.
-
-### Build image
-
-```sh
-# Using FP32
-docker build -t llama-cpp-sycl --build-arg="GGML_SYCL_F16=OFF" --target light -f .devops/intel.Dockerfile .
-
-# Using FP16
-docker build -t llama-cpp-sycl --build-arg="GGML_SYCL_F16=ON" --target light -f .devops/intel.Dockerfile .
-```
-
-*Notes*:
-
-You can also use the `.devops/llama-server-intel.Dockerfile`, which builds the *"server"* alternative.
-Check the [documentation for Docker](../docker.md) to see the available images.
-
-### Run container
-
-```sh
-# First, find all the DRI cards
-ls -la /dev/dri
-# Then, pick the card that you want to use (here for e.g. /dev/dri/card1).
-docker run -it --rm -v "/path/to/models:/models" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card0:/dev/dri/card0 llama-cpp-sycl -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -c 4096 -s 0
-```
-
-*Notes:*
-- Docker has been tested successfully on native Linux. WSL support has not been verified yet.
-- You may need to install Intel GPU driver on the **host** machine *(Please refer to the [Linux configuration](#linux) for details)*.
+Please refer to [Docker with SYCL](../docker.md#docker-with-sycl) for details.
 
 ## Linux
 
diff --git a/docs/docker.md b/docs/docker.md
index 7f99bfaad628..cd6cd9806008 100644
--- a/docs/docker.md
+++ b/docs/docker.md
@@ -140,3 +140,47 @@ docker run -v /path/to/models:/models local/llama.cpp:full-musa --run -m /models
 docker run -v /path/to/models:/models local/llama.cpp:light-musa -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
 docker run -v /path/to/models:/models local/llama.cpp:server-musa -m /models/7B/ggml-model-q4_0.gguf --port 8080 --host 0.0.0.0 -n 512 --n-gpu-layers 1
 ```
+
+## Docker With SYCL
+
+## Building Docker locally
+
+```bash
+docker build -t local/llama.cpp:full-intel --target full -f .devops/intel.Dockerfile .
+docker build -t local/llama.cpp:light-intel --target light -f .devops/intel.Dockerfile .
+docker build -t local/llama.cpp:server-intel --target server -f .devops/intel.Dockerfile .
+```
+
+You may want to pass in some different `ARGS`, depending on the SYCL environment supported by your container host, as well as the GPU architecture.
+
+The defaults are:
+
+- `GGML_SYCL_F16` set to `OFF`
+- `IGC_VERSION` set to `v2.30.1`
+- `IGC_VERSION_FULL` set to `2_2.30.1+20950`
+- `COMPUTE_RUNTIME_VERSION` set to `26.09.37435.1`
+- `COMPUTE_RUNTIME_VERSION_FULL` set to `26.09.37435.1-0`
+- `IGDGMM_VERSION` set to `22.9.0`
+
+The resulting images, are essentially the same as the non-SYCL images:
+
+1. `local/llama.cpp:full-intel`: This image includes both the `llama-cli` and `llama-completion` executables and the tools to convert LLaMA models into ggml and convert into 4-bit quantization.
+2. `local/llama.cpp:light-intel`: This image only includes the `llama-cli` and `llama-completion` executables.
+3. `local/llama.cpp:server-intel`: This image only includes the `llama-server` executable.
+
+## Usage
+
+After building locally, usage is similar to the non-SYCL examples, but you'll need to add the `--device` flag.
+
+```bash
+# First, find all the DRI cards
+ls -la /dev/dri
+# Then, pick the card that you want to use (here for e.g. /dev/dri/card0).
+docker run --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card0:/dev/dri/card0 -v /path/to/models:/models local/llama.cpp:full-intel -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 99
+docker run --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card0:/dev/dri/card0 -v /path/to/models:/models local/llama.cpp:light-intel -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 99
+docker run --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card0:/dev/dri/card0 -v /path/to/models:/models local/llama.cpp:server-intel -m /models/7B/ggml-model-q4_0.gguf --port 8080 --host 0.0.0.0 -n 512 --n-gpu-layers 99
+```
+
+*Notes:*
+- Docker has been tested successfully on native Linux. WSL support has not been verified yet.
+- You may need to install Intel GPU driver on the **host** machine *(Please refer to the [Linux configuration](./backend/SYCL.md#linux) for details)*.

From 0e8aed12df8c838630525e3d3ae6de82af2dda07 Mon Sep 17 00:00:00 2001
From: Todd Malsbary <todd.malsbary@intel.com>
Date: Mon, 1 Jun 2026 17:21:52 +0000
Subject: [PATCH 4/5] Refer to intel.Dockerfile for ARGs and their defaults

The defaults are always changing; this avoids accuracy errors
from duplicating the information.

Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>
---
 docs/docker.md | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/docs/docker.md b/docs/docker.md
index cd6cd9806008..b1c6c1f6f9f8 100644
--- a/docs/docker.md
+++ b/docs/docker.md
@@ -152,15 +152,7 @@ docker build -t local/llama.cpp:server-intel --target server -f .devops/intel.Do
 ```
 
 You may want to pass in some different `ARGS`, depending on the SYCL environment supported by your container host, as well as the GPU architecture.
-
-The defaults are:
-
-- `GGML_SYCL_F16` set to `OFF`
-- `IGC_VERSION` set to `v2.30.1`
-- `IGC_VERSION_FULL` set to `2_2.30.1+20950`
-- `COMPUTE_RUNTIME_VERSION` set to `26.09.37435.1`
-- `COMPUTE_RUNTIME_VERSION_FULL` set to `26.09.37435.1-0`
-- `IGDGMM_VERSION` set to `22.9.0`
+Refer to [.devops/intel.Dockerfile](../.devops/intel.Dockerfile) for the available `ARGS` and their defaults.
 
 The resulting images, are essentially the same as the non-SYCL images:
 

From 8e0b2f3ed96af7844c6c4f84cdfb71ed2114edcb Mon Sep 17 00:00:00 2001
From: Todd Malsbary <todd.malsbary@intel.com>
Date: Mon, 1 Jun 2026 17:23:07 +0000
Subject: [PATCH 5/5] Remove mention of Nvidia in SYCL row of backend table

This support was removed in 2026.02 - refer to the SYCL.md News.

Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index de93f17eb773..a823c5b18283 100644
--- a/README.md
+++ b/README.md
@@ -279,7 +279,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 | [Metal](docs/build.md#metal-build) | Apple Silicon |
 | [BLAS](docs/build.md#blas-build) | All |
 | [BLIS](docs/backend/BLIS.md) | All |
-| [SYCL](docs/backend/SYCL.md) | Intel and Nvidia GPU |
+| [SYCL](docs/backend/SYCL.md) | Intel GPU |
 | [OpenVINO [In Progress]](docs/backend/OPENVINO.md) | Intel CPUs, GPUs, and NPUs |
 | [MUSA](docs/build.md#musa) | Moore Threads GPU |
 | [CUDA](docs/build.md#cuda) | Nvidia GPU |