Merged
320 commits
fd32436
Update build doc
wine99 May 20, 2025
8ce5cc5
Add cgraph tensor output name to OV op name
wine99 May 22, 2025
3051d5a
Update openvino build instructions
ravi9 May 29, 2025
7fec223
Add initial NPU support
wine99 May 27, 2025
34531ab
draft NPU support version 2: prefill + kvcache
wine99 May 29, 2025
d9ca8f5
NPU support version 2: prefill + kvcache
wine99 Jun 3, 2025
f7ad779
Change due to ggml cgraph changes, not correct yet
wine99 Jun 4, 2025
592d7f8
Change due to ggml cgraph changes, llama-3.2 CPU work
wine99 Jun 16, 2025
e27738a
Add AMD64 to CMakeLists
wine99 Jun 16, 2025
42d4240
Change due to ggml cgraph changes, all device work
wine99 Jun 16, 2025
593484c
Refactor: clean, fix warning
wine99 Jun 20, 2025
8afee79
Update clang-format
wine99 Jun 23, 2025
4c582ac
Statful transformation for CPU GPU
wine99 Jun 26, 2025
73ee84f
Add SwiGLU
wine99 Jul 3, 2025
ebc4fc9
Fuse to SDPA
wine99 Jul 3, 2025
bf5414c
Replace Concat with Broadcast in MulMat for GQA
wine99 Jul 4, 2025
acf358d
Pull out indices creation for kv cache update
wine99 Jul 6, 2025
0fa7a5e
Refactor: remove past_token_len from extra_inputs
wine99 Jul 9, 2025
3533c14
Fix Phi3 SwiGLU and SoftMax
wine99 Jul 9, 2025
a80da69
Pull out sin cos from rope
wine99 Jul 9, 2025
f3c0519
Reduce memory: free ov weights node after graph conversion
wine99 Jul 11, 2025
d61f83c
Fix CPY due to cgraph change
wine99 Jul 17, 2025
ea75772
Added OpenVINO CI/CD. Updated docs
ravi9 Jul 18, 2025
1ed49bb
Fix llama-cli
wine99 Jul 23, 2025
44f4cf3
Fix Phi3 ROPE; Add test-backend-ops
wine99 Jul 21, 2025
6dc4b90
Fix NPU
wine99 Jul 23, 2025
75eec62
Fix llama-bench; Clang-format
wine99 Jul 24, 2025
4e7f04a
Fix llama-perplexity
wine99 Jul 24, 2025
9cf56d6
temp. changes for mark decomp
cavusmustafa Jul 29, 2025
01cdf4a
matmul in fp32
wine99 Jul 29, 2025
e2fdc1b
mulmat input conversion fix
cavusmustafa Jul 30, 2025
93b2d09
mulmat type conversion update
cavusmustafa Jul 30, 2025
1a19566
add mark decomp pass
cavusmustafa Jul 30, 2025
43489bb
Revert changes in fuse_to_sdpa
wine99 Jul 30, 2025
2f99135
Update build.md
ravi9 Jul 31, 2025
fc86534
Fix test-backend-ops
wine99 Jul 31, 2025
1141350
Skip test-thread-safety; Run ctest only in ci/run.sh
wine99 Jul 31, 2025
37ff226
Use CiD for NPU
wine99 Aug 1, 2025
9a91ca6
Optimize tensor conversion, improve TTFT
wine99 Aug 4, 2025
63d000b
Support op SET_ROWS
wine99 Aug 13, 2025
7bda502
Fix NPU
wine99 Aug 14, 2025
839f8c6
Remove CPY
wine99 Aug 14, 2025
f4123be
Fix test-backend-ops
wine99 Aug 14, 2025
a7b611b
Minor updates for raising PR
wine99 Aug 14, 2025
14c8a85
Perf: RMS fused to OV internal RMS op
wine99 Aug 27, 2025
65e1b1a
Fix after rebasing
wine99 Sep 4, 2025
56d5967
Change openvino device_type to GPU; Enable flash_attn
wine99 Sep 5, 2025
3e897df
Update supports_buft and supports_op for quantized models
wine99 Aug 5, 2025
d4ca760
Add quant weight conversion functions from genai gguf reader
wine99 Aug 5, 2025
663a0b8
Quant models run with accuracy issue
wine99 Aug 6, 2025
6ab76ed
Fix accuracy: disable cpu_repack
wine99 Aug 7, 2025
dd80b04
Fix CI; Disable test-backend-ops
wine99 Aug 7, 2025
a1ce428
Fix Q4_1
wine99 Aug 8, 2025
9900245
Fix test-backend-ops: Treat quantized tensors as weights
wine99 Aug 12, 2025
9ca53c7
Add NPU Q4_0 support
wine99 Aug 19, 2025
82c9833
NPU perf: eliminate zp
wine99 Aug 22, 2025
b593428
Dequantize q4_1 q4_k q6_k for NPU
wine99 Aug 29, 2025
6926655
Add custom quant type: q8_1_c, q4_0_128
wine99 Sep 2, 2025
c5231a2
Set m_is_static=false as default in decoder
wine99 Sep 2, 2025
810eb48
Simpilfy translation of get_rows
wine99 Sep 2, 2025
0f7b253
Fix after rebasing
wine99 Sep 8, 2025
2ad1147
Improve debug util; Eliminate nop ReshapeReshape
wine99 Sep 10, 2025
dc77cbb
STYLE: make get_types_to_requant a function
wine99 Sep 10, 2025
bcc343a
Support BF16 model
wine99 Sep 11, 2025
434059a
Fix NPU compile
wine99 Sep 12, 2025
da2cc99
WA for npu 1st token acc issue
wine99 Sep 12, 2025
be07073
Apply EliminateZP only for npu
wine99 Sep 12, 2025
5975612
Add GeGLU
wine99 Sep 15, 2025
7d81861
Fix Hunyuan
wine99 Sep 15, 2025
9de874c
Support iSWA
wine99 Sep 16, 2025
602f9ca
Fix NPU accuracy
wine99 Sep 17, 2025
1a38339
Fix ROPE accuracy when freq_scale != 1
wine99 Sep 17, 2025
67e178a
Minor: not add attention_size_swa for non-swa model
wine99 Sep 17, 2025
2f1d50f
Minor refactor
wine99 Sep 19, 2025
e4bfe5a
Add Q5_K to support phi-3-q4_k_m
wine99 Sep 23, 2025
f3afa7b
Requantize Q6_K (gs16) to gs32 on GPU
wine99 Sep 26, 2025
fdadca1
Fix after rebasing
wine99 Sep 28, 2025
973a80f
Always apply Eliminate_ZP to fix GPU compile issue on some platforms
wine99 Sep 28, 2025
c112bc4
kvcachefusion support
cavusmustafa Oct 1, 2025
e725292
env variable GGML_OPENVINO_DISABLE_SDPA_OPTIMIZATION added
cavusmustafa Oct 1, 2025
05d7aba
Fix for Phi3
cavusmustafa Oct 2, 2025
a9371ea
Fix llama-cli (need to run with --no-warmup)
wine99 Oct 9, 2025
8b82d11
Fix add_sliced_mask; Revert mulmat, softmax; Remove input attention_s…
wine99 Oct 10, 2025
299f492
fix after rebasing
wine99 Oct 11, 2025
2d2f00a
Fix llama-3-8b and phi3-mini q4_0 NPU
wine99 Oct 14, 2025
841d673
Update to OV-2025.3 and CMakeLists.txt
ravi9 Oct 15, 2025
4c8406e
Add OV CI cache
wine99 Oct 15, 2025
38e8a19
Apply CISC review and update CI to OV2025.3
ravi9 Oct 15, 2025
45af912
Update CI to run OV dep install before build
ravi9 Oct 15, 2025
3a1129e
Update OV dockerfile to use OV2025.3 and update build docs
ravi9 Oct 15, 2025
bd3093f
Style: use switch in supports_ops
wine99 Oct 21, 2025
eba8113
Style: middle ptr and ref align, omit optional struct keyword
wine99 Oct 21, 2025
b8690bc
NPU Unify PD (#14)
wine99 Nov 4, 2025
303923a
Clean placeholders in ggml-openvino.cpp
wine99 Oct 21, 2025
ea2c99b
NPU unify PD (handled internally)
wine99 Nov 5, 2025
072dde0
change graph to 4d, support multi sequences
wine99 Nov 20, 2025
ae404f7
Fix llama-bench
wine99 Nov 20, 2025
531941b
Fix NPU
wine99 Nov 24, 2025
047bfb5
Update ggml-decoder.cpp
I-N-T-E-L Nov 20, 2025
11b4cc5
Update ggml-decoder.cpp
I-N-T-E-L Nov 20, 2025
bed4952
Update ggml-decoder.cpp
I-N-T-E-L Nov 20, 2025
4a57b37
Update ggml-decoder.cpp
I-N-T-E-L Nov 20, 2025
98396b2
Update ggml-decoder.cpp
I-N-T-E-L Nov 20, 2025
4400b5c
Update ggml-decoder.cpp
I-N-T-E-L Nov 20, 2025
ae93651
Remove the second decoder for node. Moving the function into the mode…
zhaixuejun1993 Nov 26, 2025
992dea7
Fix error for naive
zhaixuejun1993 Nov 26, 2025
38254cf
NPU prefill chunking
wine99 Dec 1, 2025
59e7e7c
NPU fix llama-bench
wine99 Dec 3, 2025
65348b5
fallback naive run with accuracy issue
wine99 Nov 27, 2025
808619e
NPU support llma-perplexity -b 512 --no-warmup
wine99 Dec 3, 2025
2a9d4ca
Refactor: split ov_graph_compute for dynamic and static
wine99 Dec 4, 2025
0ea8238
remove unused API GgmlOvDecoder::get_output_stride(const std::string …
zhaixuejun1993 Dec 4, 2025
8f4ee4e
minor update due to ov 2025.4
wine99 Dec 4, 2025
497964a
remove unused API GgmlOvDecoder::get_output_names()
zhaixuejun1993 Dec 4, 2025
f516db1
remove unused API get_output_shape(const std::string & name)
zhaixuejun1993 Dec 4, 2025
6d7a0d6
Modified API GgmlOvDecoder::get_output_type(const std::string & name)
zhaixuejun1993 Dec 4, 2025
ba852f2
Removed API GgmlOvDecoder::get_output_op_params(const std::string & n…
zhaixuejun1993 Dec 4, 2025
111c96c
Removed API get_output_ggml_tensor(const std::string & name)
zhaixuejun1993 Dec 4, 2025
8ff73e5
Removed API m_outputs
zhaixuejun1993 Dec 4, 2025
197ed99
Removed m_output_names
zhaixuejun1993 Dec 4, 2025
95c3071
Removed API GgmlOvDecoder::get_input_names()
zhaixuejun1993 Dec 4, 2025
cd61178
Removed API GgmlOvDecoder::get_input_stride(const std::string& name)
zhaixuejun1993 Dec 4, 2025
891a3be
Removed API get_input_type
zhaixuejun1993 Dec 4, 2025
42ca27f
Removed API get_input_type
zhaixuejun1993 Dec 4, 2025
acb8a01
Removed API GgmlOvDecoder::get_input_shape(const std::string & name)
zhaixuejun1993 Dec 4, 2025
47c91db
Removed API GgmlOvDecoder::get_input_op_params(const std::string & name)
zhaixuejun1993 Dec 4, 2025
91a1b20
Fix error for decoder cache
zhaixuejun1993 Dec 5, 2025
28da9a9
Reuse cached decoder
wine99 Dec 5, 2025
469325c
GPU remove Q6_K requantization
wine99 Dec 8, 2025
ae01322
NPU fix wrong model output shape
wine99 Dec 8, 2025
c9234b4
NPU fix q4 perf regression
wine99 Dec 8, 2025
9e3163e
Remove unused variable nodes
zhaixuejun1993 Dec 10, 2025
0ef2e5e
Fix decoder can_reuse for llama-bench
wine99 Dec 11, 2025
ae53363
Update build.md for Windows
I-N-T-E-L Dec 26, 2025
22d9c17
backend buffer: allocate on host
wine99 Dec 18, 2025
72bba82
Use shared_buffer for GPU NPU; Refactor
wine99 Dec 18, 2025
3fdcb6a
Add ov_backend_host_buffer; Use cached remote context
wine99 Dec 19, 2025
d757849
Put kvcache on GPU
wine99 Dec 22, 2025
8273a7c
Use ggml_aligned_malloc
wine99 Dec 24, 2025
88d1d17
only use remote tensor for kvcache
wine99 Dec 25, 2025
a356b44
only use remote tensor for kvcache for GPU
wine99 Dec 25, 2025
cfc4713
FIX: use remote tensor from singleton
wine99 Dec 26, 2025
52a4401
Update build.md to include OpenCL
wine99 Dec 26, 2025
c1142dd
NPU always requant to q4_0_128
wine99 Dec 26, 2025
67c9720
Optimize symmetric quant weight extraction: use single zp
wine99 Dec 29, 2025
4e45177
Use Q8_0_C in token embd, lm_head, and for 5 and 6 bits quant
wine99 Dec 29, 2025
f5c71e3
Update build.md
wine99 Dec 30, 2025
0d6f253
Support -ctk f32
wine99 Jan 7, 2026
5f30eac
Initial stateful graph support
cavusmustafa Jan 8, 2026
d2fc152
Update ggml/src/ggml-openvino/ggml-decoder.cpp
cavusmustafa Jan 9, 2026
981ec65
code cleanup
cavusmustafa Jan 9, 2026
a40a5df
npu perf fix
cavusmustafa Jan 9, 2026
a81b202
requant to f16 for Q6 embed on NPU
cavusmustafa Jan 12, 2026
a92ecee
Update ggml/src/ggml-openvino/ggml-decoder.cpp
cavusmustafa Jan 13, 2026
599335c
Update ggml/src/ggml-openvino/ggml-openvino-extra.cpp
cavusmustafa Jan 13, 2026
416556a
Create OPENVINO.md in llama.cpp backend docs
ynimmaga Jan 13, 2026
25e6525
Update OPENVINO.md
ynimmaga Jan 13, 2026
9ba3247
Update OPENVINO.md
ynimmaga Jan 13, 2026
61552e4
Update OPENVINO.md
ynimmaga Jan 13, 2026
63eed0d
Update build.md
ynimmaga Jan 13, 2026
f44c60e
Update OPENVINO.md
ynimmaga Jan 13, 2026
e9ed5c4
Update OPENVINO.md
ynimmaga Jan 13, 2026
d3649c1
Update OPENVINO.md
ynimmaga Jan 13, 2026
d7dccf8
kq_mask naming fix
cavusmustafa Jan 15, 2026
aa4bc90
Syntax correction for workflows build file
cavusmustafa Jan 16, 2026
9a15c8b
Change ov backend buffer is_host to false
wine99 Jan 21, 2026
8fb20b2
Fix llama-bench -p -n where p<=256
wine99 Jan 22, 2026
1c0a47a
Fix --direct-io 0
wine99 Jan 22, 2026
c840210
Don't put kvcache on GPU in stateful mode
wine99 Jan 24, 2026
d398214
Remove hardcode names
wine99 Jan 23, 2026
26328fe
Fix stateful shapes
wine99 Jan 23, 2026
3259921
Simplification for stateful and update output shape processing
cavusmustafa Jan 21, 2026
18ab0f5
Remove hardcode names
wine99 Feb 3, 2026
b6c0697
Avoid re-compilation in llama-bench
wine99 Feb 4, 2026
0ee7e05
Extract zp directly instead of bias
wine99 Feb 5, 2026
900dd76
Refactor weight tensor processing
wine99 Feb 6, 2026
7b3b65b
Merge branch 'master' into dev_backend_openvino
wine99 Feb 11, 2026
1d4ec1b
create_weight_node accept non-ov backend buffer
wine99 Feb 11, 2026
e059015
remove changes in llama-graph.cpp
wine99 Feb 11, 2026
0d74aba
stateful masking fix (#38)
cavusmustafa Feb 12, 2026
d5d673c
Fix test-backend-ops crash glu, get_rows, scale, rms_norm, add
wine99 Feb 12, 2026
59e7d73
hardcoded name handling for rope_freqs.weight
cavusmustafa Feb 13, 2026
1a54965
Suppress logging and add error handling to allow test-backend-ops to …
wine99 Feb 13, 2026
2a6a95e
Fix MUL_MAT with broadcast; Add unsupported MUL_MAT FLASH_ATTN cases
wine99 Feb 13, 2026
5525bac
Use bias instead of zp in test-backend-ops
wine99 Feb 13, 2026
76775a5
Merge pull request #43 from cavusmustafa/additional_fixes_after_rebase
cavusmustafa Feb 14, 2026
4c1fdd3
Update OV in CI, Add OV CI Tests in GH Actions
ravi9 Feb 18, 2026
ae8a140
Temp fix for multithreading bug
cavusmustafa Feb 18, 2026
20ecf4b
Update OV CI, fix review suggestions.
ravi9 Feb 19, 2026
cb92f77
Merge pull request #45 from cavusmustafa/tmp_fix_multithread
ravi9 Feb 19, 2026
a8e894d
fix editorconfig-checker, update docs
ravi9 Feb 19, 2026
19e4f31
Fix tabs to spaces for editorconfig-checker
ravi9 Feb 19, 2026
c6ee7c5
fix editorconfig-checker
ravi9 Feb 19, 2026
7d4d311
Update docs
ravi9 Feb 19, 2026
ed91be2
updated model link to be GGUF model links
cavusmustafa Feb 21, 2026
016aa26
Remove GGML_CPU_REPACK=OFF
wine99 Feb 25, 2026
21b796b
Merge branch 'master' into dev_backend_openvino
wine99 Feb 26, 2026
214838e
Skip permuted ADD and MUL
wine99 Feb 26, 2026
40d2bb2
Removed static variables from utils.cpp
cavusmustafa Feb 26, 2026
41179c0
Removed initializing non-existing variable
cavusmustafa Feb 27, 2026
252ef84
Remove unused structs
wine99 Feb 27, 2026
56e89f8
Merge pull request #1 from wine99/remove_static_variables
cavusmustafa Feb 27, 2026
240692b
Removed static variables from utils.cpp
wine99 Feb 27, 2026
046669e
Fix test-backend-ops for OV GPU
wine99 Feb 27, 2026
18f0ad7
unify api calling
zhaixuejun1993 Feb 28, 2026
2e025bf
Update utils.cpp
zhaixuejun1993 Feb 28, 2026
8fae1b9
When the dim is dynamic, throw an error, need to is stastic forst
zhaixuejun1993 Feb 28, 2026
fe6a7ed
Add interface compute_model_outputs(), which get the model output thr…
zhaixuejun1993 Mar 2, 2026
603e6d2
No need to return
zhaixuejun1993 Mar 2, 2026
43ca96a
Merge branch 'master' into dev_backend_openvino
wine99 Mar 3, 2026
d25211e
Fix test-backend-ops for OV GPU LNL
wine99 Mar 2, 2026
5f0a68c
Fix test-thread-safety
wine99 Mar 3, 2026
0b32b7f
use the shape from infer request of output tensor create to avoid issue
Mar 3, 2026
183f36f
fix dynamic output shape issue
Mar 3, 2026
bc87902
Merge pull request #49 from zhaixuejun1993/xuejun/unify-api-get_ov_ou…
wine99 Mar 3, 2026
198e932
fix issue for the unused node in tests
Mar 3, 2026
f274f63
Rewrite the logistic about model outputs computer, add new API comput…
wine99 Mar 4, 2026
e8dc98b
Remove unused lock
wine99 Mar 4, 2026
6e9dc50
Merge branch 'dev_backend_openvino' into fix-thread-safety
wine99 Mar 4, 2026
03d03b8
Fix test-thread-safety
wine99 Mar 4, 2026
cfb395d
Add comment
wine99 Mar 4, 2026
415c9b3
Fix test-backend-ops for OV GPU LNL
wine99 Mar 4, 2026
ba61424
Merge pull request #46 from cavusmustafa/fix-readme-model-links
ravi9 Mar 4, 2026
aef9e62
Update openvino docs
ravi9 Mar 4, 2026
9d3d2c4
update to OV release version 2026.0
ravi9 Mar 4, 2026
36ce914
add ci ov-gpu self hosted runner
ravi9 Mar 4, 2026
091c58f
fix editorconfig
ravi9 Mar 4, 2026
a613e8b
Fix perplexity
wine99 Mar 4, 2026
82051a9
Rewrite the model inputs finding mechanism (#54)
zhaixuejun1993 Mar 6, 2026
db97626
Put the iteration logistic in func
zhaixuejun1993 Mar 5, 2026
42a1cb5
Added ggml-ci-intel-openvino-gpu and doc update
ravi9 Mar 6, 2026
6d1f94d
.hpp files converted to .h
cavusmustafa Mar 6, 2026
c29ccc4
Merge pull request #57 from cavusmustafa/hpp_to_h
cavusmustafa Mar 6, 2026
f71cc59
fix ggml-ci-x64-intel-openvino-gpu
ravi9 Mar 7, 2026
eae534e
Fix for stateful execution bug in llama-bench
cavusmustafa Mar 7, 2026
c2c4211
Minor updates after stateful llama-bench fix
cavusmustafa Mar 7, 2026
7b93c50
Update ggml/src/ggml-openvino/utils.cpp
cavusmustafa Mar 7, 2026
29c217a
Remove multiple get_shape calls
cavusmustafa Mar 7, 2026
8616f12
Bring back mutex into compute
cavusmustafa Mar 7, 2026
0480c2c
Fix VIEW op, which slice the input node
zhaixuejun1993 Mar 10, 2026
f5304c6
Added token_len_per_seq existence check before slicing masks and move…
zhaixuejun1993 Mar 10, 2026
481d938
Merge pull request #60 from zhaixuejun1993/xuejun/hot-fix-llama-embed…
cavusmustafa Mar 10, 2026
e646c85
Temp. fix for test requant errors
cavusmustafa Mar 10, 2026
1cf0716
Merge pull request #62 from zhaixuejun1993/xuejun/fix_issue_key_miss
cavusmustafa Mar 10, 2026
409cc8e
Merge pull request #58 from cavusmustafa/fix_stateful_state_sync
cavusmustafa Mar 10, 2026
bb40ee8
Update to OV ggml-ci to low-perf
ravi9 Mar 12, 2026
0aaf8ab
ci : temporary disable "test-llama-archs"
ggerganov Mar 13, 2026
e73b4d4
ci : cache v4 -> v5, checkout v4 -> v6, fix runner tag
ggerganov Mar 13, 2026
5237965
docs : update url
ggerganov Mar 13, 2026
996b739
Fix OV link in docker and Update docs
ravi9 Mar 13, 2026
138 changes: 138 additions & 0 deletions .devops/openvino.Dockerfile
@@ -0,0 +1,138 @@
ARG OPENVINO_VERSION_MAJOR=2026.0
ARG OPENVINO_VERSION_FULL=2026.0.0.20965.c6d6a13a886
ARG UBUNTU_VERSION=24.04

# Optional proxy build arguments - empty by default
ARG http_proxy=
ARG https_proxy=

## Build Image
FROM ubuntu:${UBUNTU_VERSION} AS build

# Pass proxy args to build stage
ARG http_proxy
ARG https_proxy

RUN apt-get update && \
apt-get install -y --no-install-recommends \
ca-certificates \
gnupg \
wget \
git \
cmake \
ninja-build \
build-essential \
libtbb12 \
libssl-dev \
ocl-icd-opencl-dev \
opencl-headers \
opencl-clhpp-headers \
intel-opencl-icd && \
rm -rf /var/lib/apt/lists/*

# Install OpenVINO for Ubuntu 24.04
ARG OPENVINO_VERSION_MAJOR
ARG OPENVINO_VERSION_FULL
RUN mkdir -p /opt/intel && \
wget https://storage.openvinotoolkit.org/repositories/openvino/packages/${OPENVINO_VERSION_MAJOR}/linux/openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64.tgz && \
tar -xf openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64.tgz && \
mv openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64 /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} && \
cd /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} && \
echo "Y" | ./install_dependencies/install_openvino_dependencies.sh && \
cd - && \
ln -s /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} /opt/intel/openvino

ENV OpenVINO_DIR=/opt/intel/openvino

WORKDIR /app

COPY . .

# Build Stage
RUN bash -c "source ${OpenVINO_DIR}/setupvars.sh && \
cmake -B build/ReleaseOV -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DGGML_OPENVINO=ON && \
cmake --build build/ReleaseOV -j$(nproc)"

# Copy all necessary libraries
RUN mkdir -p /app/lib && \
find build/ReleaseOV -name '*.so*' -exec cp {} /app/lib \; && \
find ${OpenVINO_DIR}/runtime/lib/intel64 -name '*.so*' -exec cp -P {} /app/lib \; 2>/dev/null || \
find ${OpenVINO_DIR}/lib/intel64 -name '*.so*' -exec cp -P {} /app/lib \;

# Create runtime directories and copy binaries
RUN mkdir -p /app/full \
&& cp build/ReleaseOV/bin/* /app/full/ \
&& cp *.py /app/full \
&& cp -r gguf-py /app/full \
&& cp -r requirements /app/full \
&& cp requirements.txt /app/full \
&& cp .devops/tools.sh /app/full/tools.sh

## Base Runtime Image
FROM ubuntu:${UBUNTU_VERSION} AS base

# Pass proxy args to runtime stage
ARG http_proxy
ARG https_proxy

RUN apt-get update \
&& apt-get install -y libgomp1 libtbb12 curl\
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
&& find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
&& find /var/cache -type f -delete

COPY --from=build /app/lib/ /app/

### Full (all binaries)
FROM base AS full

ARG http_proxy
ARG https_proxy

COPY --from=build /app/full /app/

WORKDIR /app

RUN apt-get update && \
apt-get install -y --no-install-recommends \
git \
python3 \
python3-venv \
python3-pip && \
python3 -m venv /ov-venv && \
/ov-venv/bin/pip install --no-cache-dir --upgrade pip setuptools wheel && \
/ov-venv/bin/pip install --no-cache-dir -r requirements.txt && \
apt-get autoremove -y && \
apt-get clean && \
rm -rf /tmp/* /var/tmp/* && \
find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete && \
find /var/cache -type f -delete

ENTRYPOINT ["/bin/bash", "-c", "source /ov-venv/bin/activate && exec /app/tools.sh \"$@\"", "--"]


### Light, CLI only
FROM base AS light

COPY --from=build /app/full/llama-cli /app/

WORKDIR /app

ENTRYPOINT [ "/app/llama-cli" ]

### Server, Server only
FROM base AS server

ENV LLAMA_ARG_HOST=0.0.0.0

COPY --from=build /app/full/llama-server /app/

WORKDIR /app

HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]

ENTRYPOINT [ "/app/llama-server" ]
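The toolkit archive fetched in the Dockerfile's build stage is addressed purely by the two pinned version build args. As a sanity check, the URL construction can be reproduced in plain shell with the versions used in this PR:

```shell
# Reconstruct the OpenVINO toolkit download URL from the pinned build args
OPENVINO_VERSION_MAJOR=2026.0
OPENVINO_VERSION_FULL=2026.0.0.20965.c6d6a13a886
URL="https://storage.openvinotoolkit.org/repositories/openvino/packages/${OPENVINO_VERSION_MAJOR}/linux/openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64.tgz"
echo "$URL"
```

Bumping the toolkit therefore means updating both args together, in every place the comment in the workflows calls out (build.yml, release.yml, build-cache.yml, and this Dockerfile).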
25 changes: 25 additions & 0 deletions .github/actions/linux-setup-openvino/action.yml
@@ -0,0 +1,25 @@
name: "Linux - Setup OpenVINO Toolkit"
description: "Setup OpenVINO Toolkit for Linux"
inputs:
path:
description: "Installation path"
required: true
version_major:
description: "OpenVINO major version (e.g., 2025.3)"
required: true
version_full:
description: "OpenVINO full version (e.g., 2025.3.0.19807.44526285f24)"
required: true

runs:
using: "composite"
steps:
- name: Setup OpenVINO Toolkit
id: setup
uses: ./.github/actions/unarchive-tar
with:
url: https://storage.openvinotoolkit.org/repositories/openvino/packages/${{ inputs.version_major }}/linux/openvino_toolkit_ubuntu24_${{ inputs.version_full }}_x86_64.tgz
path: ${{ inputs.path }}
type: z
strip: 1
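The composite action above hands the archive to `unarchive-tar` with `strip: 1`, which presumably maps to tar's `--strip-components=1` so the versioned top-level directory of the toolkit tarball is dropped and its contents land directly in the target path. A small self-contained sketch of that effect (file names are illustrative):

```shell
# Demonstrate --strip-components=1, the behavior "strip: 1" is assumed to map to:
# the versioned top-level directory is removed on extraction
mkdir -p demo/openvino_toolkit_ubuntu24_x86_64
echo ok > demo/openvino_toolkit_ubuntu24_x86_64/setupvars.sh
tar -czf demo.tgz -C demo openvino_toolkit_ubuntu24_x86_64
mkdir -p out
tar -xzf demo.tgz -C out --strip-components=1
ls out    # setupvars.sh sits at the top level, not under the versioned dir
```

This is why later steps can run `source ./openvino_toolkit/setupvars.sh` without knowing the full version string.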

28 changes: 28 additions & 0 deletions .github/workflows/build-cache.yml
@@ -63,6 +63,34 @@ jobs:
path: ./spacemit_toolchain
version: ${{ env.SPACEMIT_IME_TOOLCHAIN_VERSION }}

ubuntu-24-openvino-cache:
runs-on: ubuntu-24.04

env:
# Sync versions in build.yml, release.yml, build-cache.yml, .devops/openvino.Dockerfile
OPENVINO_VERSION_MAJOR: "2026.0"
OPENVINO_VERSION_FULL: "2026.0.0.20965.c6d6a13a886"

steps:
- name: Clone
id: checkout
uses: actions/checkout@v6

- name: Setup Cache
uses: actions/cache@v5
id: cache-openvino
with:
path: ./openvino_toolkit
key: openvino-toolkit-v${{ env.OPENVINO_VERSION_FULL }}-${{ runner.os }}

- name: Setup OpenVINO Toolkit
if: steps.cache-openvino.outputs.cache-hit != 'true'
uses: ./.github/actions/linux-setup-openvino
with:
path: ./openvino_toolkit
version_major: ${{ env.OPENVINO_VERSION_MAJOR }}
version_full: ${{ env.OPENVINO_VERSION_FULL }}

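The cache key in the job above interpolates the full toolkit version and the runner OS. On an ubuntu-24.04 runner, where `runner.os` is `Linux`, the resulting key can be sketched as:

```shell
# Hypothetical reconstruction of the actions/cache key; runner_os mirrors runner.os
OPENVINO_VERSION_FULL=2026.0.0.20965.c6d6a13a886
runner_os=Linux
key="openvino-toolkit-v${OPENVINO_VERSION_FULL}-${runner_os}"
echo "$key"
```

Because the full version is part of the key, bumping OpenVINO automatically invalidates the cached toolkit; the build jobs below use the same key, so a warm cache populated here is shared with them.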
windows-2022-rocm-cache:
runs-on: windows-2022

117 changes: 117 additions & 0 deletions .github/workflows/build.yml
@@ -743,6 +743,83 @@ jobs:
-DGGML_SYCL_F16=ON
cmake --build build --config Release -j $(nproc)
ubuntu-24-cmake-openvino:
name: ubuntu-24-cmake-openvino-${{ matrix.openvino_device }}
strategy:
matrix:
include:
- variant: cpu
runner: '"ubuntu-24.04"'
openvino_device: "CPU"
- variant: gpu
runner: '["self-hosted","Linux","X64","Intel"]'
openvino_device: "GPU"

runs-on: ${{ fromJSON(matrix.runner) }}

Comment on lines +746 to +759
Member:

Please also add a second workflow that runs the ggml-ci set of tests.

Here are sample workflows that you can use as an example to create ggml-ci-intel-openvino-gpu:

ggml-ci-x64-cpu-low-perf:
runs-on: ubuntu-22.04
steps:
- name: Clone
id: checkout
uses: actions/checkout@v6
- name: ccache
uses: ggml-org/ccache-action@v1.2.16
with:
key: ggml-ci-x64-cpu-low-perf
evict-old-files: 1d
save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}
- name: Dependencies
id: depends
run: |
sudo apt-get update
sudo apt-get install build-essential
- name: Test
id: ggml-ci
run: |
LLAMA_ARG_THREADS=$(nproc) GG_BUILD_LOW_PERF=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
ggml-ci-arm64-cpu-low-perf:
runs-on: ubuntu-22.04-arm
steps:
- name: Clone
id: checkout
uses: actions/checkout@v6
- name: ccache
uses: ggml-org/ccache-action@v1.2.16
with:
key: ggml-ci-arm64-cpu-low-perf
evict-old-files: 1d
save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}
- name: Dependencies
id: depends
run: |
sudo apt-get update
sudo apt-get install build-essential
- name: Test
id: ggml-ci
run: |
LLAMA_ARG_THREADS=$(nproc) GG_BUILD_LOW_PERF=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
ggml-ci-x64-cpu-high-perf:
runs-on: ubuntu-22.04
steps:
- name: Clone
id: checkout
uses: actions/checkout@v6
- name: ccache
uses: ggml-org/ccache-action@v1.2.16
with:
key: ggml-ci-x64-cpu-high-perf
evict-old-files: 1d
save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}
- name: Dependencies
id: depends
run: |
sudo apt-get update
sudo apt-get install build-essential
- name: Test
id: ggml-ci
run: |
LLAMA_ARG_THREADS=$(nproc) GG_BUILD_HIGH_PERF=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
ggml-ci-arm64-cpu-high-perf:
runs-on: ubuntu-22.04-arm
steps:
- name: Clone
id: checkout
uses: actions/checkout@v6
- name: ccache
uses: ggml-org/ccache-action@v1.2.16
with:
key: ggml-ci-arm64-cpu-high-perf
evict-old-files: 1d
save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}
- name: Dependencies
id: depends
run: |
sudo apt-get update
sudo apt-get install build-essential
- name: Test
id: ggml-ci
run: |
LLAMA_ARG_THREADS=$(nproc) GG_BUILD_HIGH_PERF=1 GG_BUILD_NO_SVE=1 GG_BUILD_NO_BF16=1 GG_BUILD_EXTRA_TESTS_0=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
ggml-ci-arm64-cpu-high-perf-sve:
runs-on: ubuntu-22.04-arm
steps:
- name: Clone
id: checkout
uses: actions/checkout@v6
- name: ccache
uses: ggml-org/ccache-action@v1.2.16
with:
key: ggml-ci-arm64-cpu-high-perf-sve
evict-old-files: 1d
save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}
- name: Dependencies
id: depends
run: |
sudo apt-get update
sudo apt-get install build-essential
- name: Test
id: ggml-ci
run: |
LLAMA_ARG_THREADS=$(nproc) GG_BUILD_NO_BF16=1 GG_BUILD_EXTRA_TESTS_0=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
ggml-ci-x64-nvidia-cuda:
runs-on: [self-hosted, Linux, X64, NVIDIA]
steps:
- name: Clone
id: checkout
uses: actions/checkout@v6
- name: Test
id: ggml-ci
run: |
nvidia-smi
GG_BUILD_CUDA=1 bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
ggml-ci-x64-nvidia-vulkan-cm:
runs-on: [self-hosted, Linux, X64, NVIDIA]
steps:
- name: Clone
id: checkout
uses: actions/checkout@v6
- name: Test
id: ggml-ci
run: |
vulkaninfo --summary
GG_BUILD_VULKAN=1 GGML_VK_DISABLE_COOPMAT2=1 bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
ggml-ci-x64-nvidia-vulkan-cm2:
runs-on: [self-hosted, Linux, X64, NVIDIA, COOPMAT2]
steps:
- name: Clone
id: checkout
uses: actions/checkout@v6
- name: Test
id: ggml-ci
run: |
vulkaninfo --summary
GG_BUILD_VULKAN=1 bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
ggml-ci-x64-cpu-amx:
runs-on: [self-hosted, Linux, X64, CPU, AMX]
steps:
- name: Clone
id: checkout
uses: actions/checkout@v6
- name: Test
id: ggml-ci
run: |
bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
# ggml-ci-x64-amd-vulkan:
# runs-on: [self-hosted, Linux, X64, AMD]
# steps:
# - name: Clone
# id: checkout
# uses: actions/checkout@v6
# - name: Test
# id: ggml-ci
# run: |
# vulkaninfo --summary
# GG_BUILD_VULKAN=1 bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
# ggml-ci-x64-amd-rocm:
# runs-on: [self-hosted, Linux, X64, AMD]
# steps:
# - name: Clone
# id: checkout
# uses: actions/checkout@v6
# - name: Test
# id: ggml-ci
# run: |
# amd-smi static
# GG_BUILD_ROCM=1 GG_BUILD_AMDGPU_TARGETS="gfx1101" bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
ggml-ci-mac-metal:
runs-on: [self-hosted, macOS, ARM64]
steps:
- name: Clone
id: checkout
uses: actions/checkout@v6
- name: Test
id: ggml-ci
run: |
GG_BUILD_METAL=1 bash ./ci/run.sh ~/results/llama.cpp ~/mnt/llama.cpp
ggml-ci-mac-webgpu:
runs-on: [self-hosted, macOS, ARM64]
steps:
- name: Clone
id: checkout
uses: actions/checkout@v6
- name: Dawn Dependency
id: dawn-depends
run: |
DAWN_VERSION="v2.0.0"
DAWN_OWNER="reeselevine"
DAWN_REPO="dawn"
DAWN_ASSET_NAME="Dawn-5e9a4865b1635796ccc77dd30057f2b4002a1355-macos-latest-Release"
echo "Fetching release asset from https://github.com/${DAWN_OWNER}/${DAWN_REPO}/releases/download/${DAWN_VERSION}/${DAWN_ASSET_NAME}.zip"
curl -L -o artifact.zip \
"https://github.com/${DAWN_OWNER}/${DAWN_REPO}/releases/download/${DAWN_VERSION}/${DAWN_ASSET_NAME}.zip"
mkdir dawn
unzip artifact.zip
tar -xvf ${DAWN_ASSET_NAME}.tar.gz -C dawn --strip-components=1
- name: Test
id: ggml-ci
run: |
GG_BUILD_WEBGPU=1 GG_BUILD_WEBGPU_DAWN_PREFIX="$GITHUB_WORKSPACE/dawn" \
bash ./ci/run.sh ~/results/llama.cpp ~/mnt/llama.cpp
ggml-ci-mac-vulkan:
runs-on: [self-hosted, macOS, ARM64]
steps:
- name: Clone
id: checkout
uses: actions/checkout@v6
- name: Test
id: ggml-ci
run: |
vulkaninfo --summary
GG_BUILD_VULKAN=1 bash ./ci/run.sh ~/results/llama.cpp ~/mnt/llama.cpp
ggml-ci-arm64-cpu-kleidiai:
runs-on: ubuntu-22.04-arm
steps:
- name: Clone
id: checkout
uses: actions/checkout@v6
- name: ccache
uses: ggml-org/ccache-action@v1.2.16
with:
key: ggml-ci-arm64-cpu-kleidiai
evict-old-files: 1d
save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}
- name: Dependencies
id: depends
run: |
sudo apt-get update
sudo apt-get install -y build-essential
- name: Test
id: ggml-ci
run: |
GG_BUILD_KLEIDIAI=1 GG_BUILD_EXTRA_TESTS_0=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt

Basically, the workflow needs to call GG_BUILD_OPENVINO=1 bash ./ci/run.sh with appropriate arguments.

Contributor:
Added ggml-ci-x64-intel-openvino-gpu

Contributor:
We are currently working to fix it.

Contributor:
Hi @ggerganov, We added ggml-ci-x64-intel-openvino-gpu-low-perf. We are currently working on supporting embedding models and other quantization formats, so until then, we can run the ggml-ci with GG_BUILD_LOW_PERF=1.

env:
# Sync versions in build.yml, release.yml, build-cache.yml, .devops/openvino.Dockerfile
OPENVINO_VERSION_MAJOR: "2026.0"
OPENVINO_VERSION_FULL: "2026.0.0.20965.c6d6a13a886"

steps:
- name: Clone
id: checkout
uses: actions/checkout@v6

- name: ccache
uses: ggml-org/ccache-action@v1.2.16
with:
key: ubuntu-24-cmake-openvino-${{ matrix.variant }}-no-preset-v1
evict-old-files: 1d

- name: Dependencies
id: depends
run: |
sudo apt-get update
sudo apt-get install -y build-essential libssl-dev libtbb12 cmake ninja-build python3-pip
sudo apt-get install -y ocl-icd-opencl-dev opencl-headers opencl-clhpp-headers intel-opencl-icd
- name: Use OpenVINO Toolkit Cache
uses: actions/cache@v5
id: cache-openvino
with:
path: ./openvino_toolkit
key: openvino-toolkit-v${{ env.OPENVINO_VERSION_FULL }}-${{ runner.os }}

- name: Setup OpenVINO Toolkit
if: steps.cache-openvino.outputs.cache-hit != 'true'
uses: ./.github/actions/linux-setup-openvino
with:
path: ./openvino_toolkit
version_major: ${{ env.OPENVINO_VERSION_MAJOR }}
version_full: ${{ env.OPENVINO_VERSION_FULL }}

- name: Install OpenVINO dependencies
run: |
cd ./openvino_toolkit
chmod +x ./install_dependencies/install_openvino_dependencies.sh
echo "Y" | sudo -E ./install_dependencies/install_openvino_dependencies.sh
- name: Build
id: cmake_build
run: |
source ./openvino_toolkit/setupvars.sh
cmake -B build/ReleaseOV -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DGGML_OPENVINO=ON
cmake --build build/ReleaseOV --config Release -j $(nproc)
- name: Test
id: cmake_test
# TODO: fix and re-enable the `test-llama-archs` test below
run: |
cd ${{ github.workspace }}
if [ "${{ matrix.openvino_device }}" = "GPU" ]; then
export GGML_OPENVINO_DEVICE=GPU
fi
ctest --test-dir build/ReleaseOV -L main -E "test-llama-archs" --verbose --timeout 2000
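The Test step above exports `GGML_OPENVINO_DEVICE` only for the GPU matrix leg; on the CPU leg the variable stays unset and the backend falls back to its default device. The selection logic, isolated as a shell sketch (the `openvino_device` value here is illustrative):

```shell
# Mirror of the matrix-conditional device export in the Test step
openvino_device=GPU   # would be "CPU" on the ubuntu-24.04 leg, leaving the var unset
if [ "$openvino_device" = "GPU" ]; then
    export GGML_OPENVINO_DEVICE=GPU
fi
# Downstream tools see either GPU or no variable at all (backend default)
echo "${GGML_OPENVINO_DEVICE:-default}"
```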
build-linux-cross:
uses: ./.github/workflows/build-linux-cross.yml

@@ -1752,6 +1829,46 @@ jobs:
run: |
GG_BUILD_KLEIDIAI=1 GG_BUILD_EXTRA_TESTS_0=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
ggml-ci-x64-intel-openvino-gpu-low-perf:
runs-on: [self-hosted, Linux, X64, Intel, OpenVINO]

env:
# Sync versions in build.yml, release.yml, build-cache.yml, .devops/openvino.Dockerfile
OPENVINO_VERSION_MAJOR: "2026.0"
OPENVINO_VERSION_FULL: "2026.0.0.20965.c6d6a13a886"

steps:
- name: Clone
id: checkout
uses: actions/checkout@v6

- name: Use OpenVINO Toolkit Cache
uses: actions/cache@v5
id: cache-openvino
with:
path: ./openvino_toolkit
key: openvino-toolkit-v${{ env.OPENVINO_VERSION_FULL }}-${{ runner.os }}

- name: Setup OpenVINO Toolkit
if: steps.cache-openvino.outputs.cache-hit != 'true'
uses: ./.github/actions/linux-setup-openvino
with:
path: ./openvino_toolkit
version_major: ${{ env.OPENVINO_VERSION_MAJOR }}
version_full: ${{ env.OPENVINO_VERSION_FULL }}

- name: Install OpenVINO dependencies
run: |
cd ./openvino_toolkit
chmod +x ./install_dependencies/install_openvino_dependencies.sh
echo "Y" | sudo -E ./install_dependencies/install_openvino_dependencies.sh
- name: Test
id: ggml-ci
run: |
source ./openvino_toolkit/setupvars.sh
GG_BUILD_OPENVINO=1 GGML_OPENVINO_DEVICE=GPU GG_BUILD_LOW_PERF=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
ubuntu-cpu-cmake-riscv64-native:
runs-on: RISCV64

1 change: 1 addition & 0 deletions .github/workflows/docker.yml
@@ -47,6 +47,7 @@ jobs:
- { tag: "vulkan", dockerfile: ".devops/vulkan.Dockerfile", platforms: "linux/amd64", full: true, light: true, server: true, free_disk_space: false, runs_on: "ubuntu-22.04" }
- { tag: "s390x", dockerfile: ".devops/s390x.Dockerfile", platforms: "linux/s390x", full: true, light: true, server: true, free_disk_space: false, runs_on: "ubuntu-22.04-s390x" }
- { tag: "rocm", dockerfile: ".devops/rocm.Dockerfile", platforms: "linux/amd64", full: true, light: true, server: true, free_disk_space: true, runs_on: "ubuntu-22.04" }
- { tag: "openvino", dockerfile: ".devops/openvino.Dockerfile", platforms: "linux/amd64", full: true, light: true, server: true, free_disk_space: false, runs_on: "ubuntu-22.04" }
steps:
- name: Check out the repo
uses: actions/checkout@v6
Expand Down
Loading
Loading