Add OpenVINO backend #15307
Merged
+8,344
−3
320 commits
fd32436
Update build doc
wine99 8ce5cc5
Add cgraph tensor output name to OV op name
wine99 3051d5a
Update openvino build instructions
ravi9 7fec223
Add initial NPU support
wine99 34531ab
draft NPU support version 2: prefill + kvcache
wine99 d9ca8f5
NPU support version 2: prefill + kvcache
wine99 f7ad779
Change due to ggml cgraph changes, not correct yet
wine99 592d7f8
Change due to ggml cgraph changes, llama-3.2 CPU work
wine99 e27738a
Add AMD64 to CMakeLists
wine99 42d4240
Change due to ggml cgraph changes, all device work
wine99 593484c
Refactor: clean, fix warning
wine99 8afee79
Update clang-format
wine99 4c582ac
Statful transformation for CPU GPU
wine99 73ee84f
Add SwiGLU
wine99 ebc4fc9
Fuse to SDPA
wine99 bf5414c
Replace Concat with Broadcast in MulMat for GQA
wine99 acf358d
Pull out indices creation for kv cache update
wine99 0fa7a5e
Refactor: remove past_token_len from extra_inputs
wine99 3533c14
Fix Phi3 SwiGLU and SoftMax
wine99 a80da69
Pull out sin cos from rope
wine99 f3c0519
Reduce memory: free ov weights node after graph conversion
wine99 d61f83c
Fix CPY due to cgraph change
wine99 ea75772
Added OpenVINO CI/CD. Updated docs
ravi9 1ed49bb
Fix llama-cli
wine99 44f4cf3
Fix Phi3 ROPE; Add test-backend-ops
wine99 6dc4b90
Fix NPU
wine99 75eec62
Fix llama-bench; Clang-format
wine99 4e7f04a
Fix llama-perplexity
wine99 9cf56d6
temp. changes for mark decomp
cavusmustafa 01cdf4a
matmul in fp32
wine99 e2fdc1b
mulmat input conversion fix
cavusmustafa 93b2d09
mulmat type conversion update
cavusmustafa 1a19566
add mark decomp pass
cavusmustafa 43489bb
Revert changes in fuse_to_sdpa
wine99 2f99135
Update build.md
ravi9 fc86534
Fix test-backend-ops
wine99 1141350
Skip test-thread-safety; Run ctest only in ci/run.sh
wine99 37ff226
Use CiD for NPU
wine99 9a91ca6
Optimize tensor conversion, improve TTFT
wine99 63d000b
Support op SET_ROWS
wine99 7bda502
Fix NPU
wine99 839f8c6
Remove CPY
wine99 f4123be
Fix test-backend-ops
wine99 a7b611b
Minor updates for raising PR
wine99 14c8a85
Perf: RMS fused to OV internal RMS op
wine99 65e1b1a
Fix after rebasing
wine99 56d5967
Change openvino device_type to GPU; Enable flash_attn
wine99 3e897df
Update supports_buft and supports_op for quantized models
wine99 d4ca760
Add quant weight conversion functions from genai gguf reader
wine99 663a0b8
Quant models run with accuracy issue
wine99 6ab76ed
Fix accuracy: disable cpu_repack
wine99 dd80b04
Fix CI; Disable test-backend-ops
wine99 a1ce428
Fix Q4_1
wine99 9900245
Fix test-backend-ops: Treat quantized tensors as weights
wine99 9ca53c7
Add NPU Q4_0 support
wine99 82c9833
NPU perf: eliminate zp
wine99 b593428
Dequantize q4_1 q4_k q6_k for NPU
wine99 6926655
Add custom quant type: q8_1_c, q4_0_128
wine99 c5231a2
Set m_is_static=false as default in decoder
wine99 810eb48
Simpilfy translation of get_rows
wine99 0f7b253
Fix after rebasing
wine99 2ad1147
Improve debug util; Eliminate nop ReshapeReshape
wine99 dc77cbb
STYLE: make get_types_to_requant a function
wine99 bcc343a
Support BF16 model
wine99 434059a
Fix NPU compile
wine99 da2cc99
WA for npu 1st token acc issue
wine99 be07073
Apply EliminateZP only for npu
wine99 5975612
Add GeGLU
wine99 7d81861
Fix Hunyuan
wine99 9de874c
Support iSWA
wine99 602f9ca
Fix NPU accuracy
wine99 1a38339
Fix ROPE accuracy when freq_scale != 1
wine99 67e178a
Minor: not add attention_size_swa for non-swa model
wine99 2f1d50f
Minor refactor
wine99 e4bfe5a
Add Q5_K to support phi-3-q4_k_m
wine99 f3afa7b
Requantize Q6_K (gs16) to gs32 on GPU
wine99 fdadca1
Fix after rebasing
wine99 973a80f
Always apply Eliminate_ZP to fix GPU compile issue on some platforms
wine99 c112bc4
kvcachefusion support
cavusmustafa e725292
env variable GGML_OPENVINO_DISABLE_SDPA_OPTIMIZATION added
cavusmustafa 05d7aba
Fix for Phi3
cavusmustafa a9371ea
Fix llama-cli (need to run with --no-warmup)
wine99 8b82d11
Fix add_sliced_mask; Revert mulmat, softmax; Remove input attention_s…
wine99 299f492
fix after rebasing
wine99 2d2f00a
Fix llama-3-8b and phi3-mini q4_0 NPU
wine99 841d673
Update to OV-2025.3 and CMakeLists.txt
ravi9 4c8406e
Add OV CI cache
wine99 38e8a19
Apply CISC review and update CI to OV2025.3
ravi9 45af912
Update CI to run OV dep install before build
ravi9 3a1129e
Update OV dockerfile to use OV2025.3 and update build docs
ravi9 bd3093f
Style: use switch in supports_ops
wine99 eba8113
Style: middle ptr and ref align, omit optional struct keyword
wine99 b8690bc
NPU Unify PD (#14)
wine99 303923a
Clean placeholders in ggml-openvino.cpp
wine99 ea2c99b
NPU unify PD (handled internally)
wine99 072dde0
change graph to 4d, support multi sequences
wine99 ae404f7
Fix llama-bench
wine99 531941b
Fix NPU
wine99 047bfb5
Update ggml-decoder.cpp
I-N-T-E-L 11b4cc5
Update ggml-decoder.cpp
I-N-T-E-L bed4952
Update ggml-decoder.cpp
I-N-T-E-L 4a57b37
Update ggml-decoder.cpp
I-N-T-E-L 98396b2
Update ggml-decoder.cpp
I-N-T-E-L 4400b5c
Update ggml-decoder.cpp
I-N-T-E-L ae93651
Remove the second decoder for node. Moving the function into the mode…
zhaixuejun1993 992dea7
Fix error for naive
zhaixuejun1993 38254cf
NPU prefill chunking
wine99 59e7e7c
NPU fix llama-bench
wine99 65348b5
fallback naive run with accuracy issue
wine99 808619e
NPU support llma-perplexity -b 512 --no-warmup
wine99 2a9d4ca
Refactor: split ov_graph_compute for dynamic and static
wine99 0ea8238
remove unused API GgmlOvDecoder::get_output_stride(const std::string …
zhaixuejun1993 8f4ee4e
minor update due to ov 2025.4
wine99 497964a
remove unused API GgmlOvDecoder::get_output_names()
zhaixuejun1993 f516db1
remove unused API get_output_shape(const std::string & name)
zhaixuejun1993 6d7a0d6
Modified API GgmlOvDecoder::get_output_type(const std::string & name)
zhaixuejun1993 ba852f2
Removed API GgmlOvDecoder::get_output_op_params(const std::string & n…
zhaixuejun1993 111c96c
Removed API get_output_ggml_tensor(const std::string & name)
zhaixuejun1993 8ff73e5
Removed API m_outputs
zhaixuejun1993 197ed99
Removed m_output_names
zhaixuejun1993 95c3071
Removed API GgmlOvDecoder::get_input_names()
zhaixuejun1993 cd61178
Removed API GgmlOvDecoder::get_input_stride(const std::string& name)
zhaixuejun1993 891a3be
Removed API get_input_type
zhaixuejun1993 42ca27f
Removed API get_input_type
zhaixuejun1993 acb8a01
Removed API GgmlOvDecoder::get_input_shape(const std::string & name)
zhaixuejun1993 47c91db
Removed API GgmlOvDecoder::get_input_op_params(const std::string & name)
zhaixuejun1993 91a1b20
Fix error for decoder cache
zhaixuejun1993 28da9a9
Reuse cached decoder
wine99 469325c
GPU remove Q6_K requantization
wine99 ae01322
NPU fix wrong model output shape
wine99 c9234b4
NPU fix q4 perf regression
wine99 9e3163e
Remove unused variable nodes
zhaixuejun1993 0ef2e5e
Fix decoder can_reuse for llama-bench
wine99 ae53363
Update build.md for Windows
I-N-T-E-L 22d9c17
backend buffer: allocate on host
wine99 72bba82
Use shared_buffer for GPU NPU; Refactor
wine99 3fdcb6a
Add ov_backend_host_buffer; Use cached remote context
wine99 d757849
Put kvcache on GPU
wine99 8273a7c
Use ggml_aligned_malloc
wine99 88d1d17
only use remote tensor for kvcache
wine99 a356b44
only use remote tensor for kvcache for GPU
wine99 cfc4713
FIX: use remote tensor from singleton
wine99 52a4401
Update build.md to include OpenCL
wine99 c1142dd
NPU always requant to q4_0_128
wine99 67c9720
Optimize symmetric quant weight extraction: use single zp
wine99 4e45177
Use Q8_0_C in token embd, lm_head, and for 5 and 6 bits quant
wine99 f5c71e3
Update build.md
wine99 0d6f253
Support -ctk f32
wine99 5f30eac
Initial stateful graph support
cavusmustafa d2fc152
Update ggml/src/ggml-openvino/ggml-decoder.cpp
cavusmustafa 981ec65
code cleanup
cavusmustafa a40a5df
npu perf fix
cavusmustafa a81b202
requant to f16 for Q6 embed on NPU
cavusmustafa a92ecee
Update ggml/src/ggml-openvino/ggml-decoder.cpp
cavusmustafa 599335c
Update ggml/src/ggml-openvino/ggml-openvino-extra.cpp
cavusmustafa 416556a
Create OPENVINO.md in llama.cpp backend docs
ynimmaga 25e6525
Update OPENVINO.md
ynimmaga 9ba3247
Update OPENVINO.md
ynimmaga 61552e4
Update OPENVINO.md
ynimmaga 63eed0d
Update build.md
ynimmaga f44c60e
Update OPENVINO.md
ynimmaga e9ed5c4
Update OPENVINO.md
ynimmaga d3649c1
Update OPENVINO.md
ynimmaga d7dccf8
kq_mask naming fix
cavusmustafa aa4bc90
Syntax correction for workflows build file
cavusmustafa 9a15c8b
Change ov backend buffer is_host to false
wine99 8fb20b2
Fix llama-bench -p -n where p<=256
wine99 1c0a47a
Fix --direct-io 0
wine99 c840210
Don't put kvcache on GPU in stateful mode
wine99 d398214
Remove hardcode names
wine99 26328fe
Fix stateful shapes
wine99 3259921
Simplification for stateful and update output shape processing
cavusmustafa 18ab0f5
Remove hardcode names
wine99 b6c0697
Avoid re-compilation in llama-bench
wine99 0ee7e05
Extract zp directly instead of bias
wine99 900dd76
Refactor weight tensor processing
wine99 7b3b65b
Merge branch 'master' into dev_backend_openvino
wine99 1d4ec1b
create_weight_node accept non-ov backend buffer
wine99 e059015
remove changes in llama-graph.cpp
wine99 0d74aba
stateful masking fix (#38)
cavusmustafa d5d673c
Fix test-backend-ops crash glu, get_rows, scale, rms_norm, add
wine99 59e7d73
hardcoded name handling for rope_freqs.weight
cavusmustafa 1a54965
Suppress logging and add error handling to allow test-backend-ops to …
wine99 2a6a95e
Fix MUL_MAT with broadcast; Add unsupported MUL_MAT FLASH_ATTN cases
wine99 5525bac
Use bias instead of zp in test-backend-ops
wine99 76775a5
Merge pull request #43 from cavusmustafa/additional_fixes_after_rebase
cavusmustafa 4c1fdd3
Update OV in CI, Add OV CI Tests in GH Actions
ravi9 ae8a140
Temp fix for multithreading bug
cavusmustafa 20ecf4b
Update OV CI, fix review suggestions.
ravi9 cb92f77
Merge pull request #45 from cavusmustafa/tmp_fix_multithread
ravi9 a8e894d
fix editorconfig-checker, update docs
ravi9 19e4f31
Fix tabs to spaces for editorconfig-checker
ravi9 c6ee7c5
fix editorconfig-checker
ravi9 7d4d311
Update docs
ravi9 ed91be2
updated model link to be GGUF model links
cavusmustafa 016aa26
Remove GGML_CPU_REPACK=OFF
wine99 21b796b
Merge branch 'master' into dev_backend_openvino
wine99 214838e
Skip permuted ADD and MUL
wine99 40d2bb2
Removed static variables from utils.cpp
cavusmustafa 41179c0
Removed initializing non-existing variable
cavusmustafa 252ef84
Remove unused structs
wine99 56e89f8
Merge pull request #1 from wine99/remove_static_variables
cavusmustafa 240692b
Removed static variables from utils.cpp
wine99 046669e
Fix test-backend-ops for OV GPU
wine99 18f0ad7
unify api calling
zhaixuejun1993 2e025bf
Update utils.cpp
zhaixuejun1993 8fae1b9
When the dim is dynamic, throw an error, need to is stastic forst
zhaixuejun1993 fe6a7ed
Add interface compute_model_outputs(), which get the model output thr…
zhaixuejun1993 603e6d2
No need to return
zhaixuejun1993 43ca96a
Merge branch 'master' into dev_backend_openvino
wine99 d25211e
Fix test-backend-ops for OV GPU LNL
wine99 5f0a68c
Fix test-thread-safety
wine99 0b32b7f
use the shape from infer request of output tensor create to avoid issue
183f36f
fix dynamic output shape issue
bc87902
Merge pull request #49 from zhaixuejun1993/xuejun/unify-api-get_ov_ou…
wine99 198e932
fix issue for the unused node in tests
f274f63
Rewrite the logistic about model outputs computer, add new API comput…
wine99 e8dc98b
Remove unused lock
wine99 6e9dc50
Merge branch 'dev_backend_openvino' into fix-thread-safety
wine99 03d03b8
Fix test-thread-safety
wine99 cfb395d
Add comment
wine99 415c9b3
Fix test-backend-ops for OV GPU LNL
wine99 ba61424
Merge pull request #46 from cavusmustafa/fix-readme-model-links
ravi9 aef9e62
Update openvino docs
ravi9 9d3d2c4
update to OV release version 2026.0
ravi9 36ce914
add ci ov-gpu self hosted runner
ravi9 091c58f
fix editorconfig
ravi9 a613e8b
Fix perplexity
wine99 82051a9
Rewrite the model inputs finding mechanism (#54)
zhaixuejun1993 db97626
Put the iteration logistic in func
zhaixuejun1993 42a1cb5
Added ggml-ci-intel-openvino-gpu and doc update
ravi9 6d1f94d
.hpp files converted to .h
cavusmustafa c29ccc4
Merge pull request #57 from cavusmustafa/hpp_to_h
cavusmustafa f71cc59
fix ggml-ci-x64-intel-openvino-gpu
ravi9 eae534e
Fix for stateful execution bug in llama-bench
cavusmustafa c2c4211
Minor updates after stateful llama-bench fix
cavusmustafa 7b93c50
Update ggml/src/ggml-openvino/utils.cpp
cavusmustafa 29c217a
Remove multiple get_shape calls
cavusmustafa 8616f12
Bring back mutex into compute
cavusmustafa 0480c2c
Fix VIEW op, which slice the input node
zhaixuejun1993 f5304c6
Added token_len_per_seq existence check before slicing masks and move…
zhaixuejun1993 481d938
Merge pull request #60 from zhaixuejun1993/xuejun/hot-fix-llama-embed…
cavusmustafa e646c85
Temp. fix for test requant errors
cavusmustafa 1cf0716
Merge pull request #62 from zhaixuejun1993/xuejun/fix_issue_key_miss
cavusmustafa 409cc8e
Merge pull request #58 from cavusmustafa/fix_stateful_state_sync
cavusmustafa bb40ee8
Update to OV ggml-ci to low-perf
ravi9 0aaf8ab
ci : temporary disable "test-llama-archs"
ggerganov e73b4d4
ci : cache v4 -> v5, checkout v4 -> v6, fix runner tag
ggerganov 5237965
docs : update url
ggerganov 996b739
Fix OV link in docker and Update docs
ravi9
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,138 @@ | ||
| ARG OPENVINO_VERSION_MAJOR=2026.0 | ||
| ARG OPENVINO_VERSION_FULL=2026.0.0.20965.c6d6a13a886 | ||
| ARG UBUNTU_VERSION=24.04 | ||
|
|
||
| # Optional proxy build arguments - empty by default | ||
| ARG http_proxy= | ||
| ARG https_proxy= | ||
|
|
||
| ## Build Image | ||
| FROM ubuntu:${UBUNTU_VERSION} AS build | ||
|
|
||
| # Pass proxy args to build stage | ||
| ARG http_proxy | ||
| ARG https_proxy | ||
|
|
||
| RUN apt-get update && \ | ||
| apt-get install -y --no-install-recommends \ | ||
| ca-certificates \ | ||
| gnupg \ | ||
| wget \ | ||
| git \ | ||
| cmake \ | ||
| ninja-build \ | ||
| build-essential \ | ||
| libtbb12 \ | ||
| libssl-dev \ | ||
| ocl-icd-opencl-dev \ | ||
| opencl-headers \ | ||
| opencl-clhpp-headers \ | ||
| intel-opencl-icd && \ | ||
| rm -rf /var/lib/apt/lists/* | ||
|
|
||
| # Install OpenVINO for Ubuntu 24.04 | ||
| ARG OPENVINO_VERSION_MAJOR | ||
| ARG OPENVINO_VERSION_FULL | ||
| RUN mkdir -p /opt/intel && \ | ||
| wget https://storage.openvinotoolkit.org/repositories/openvino/packages/${OPENVINO_VERSION_MAJOR}/linux/openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64.tgz && \ | ||
| tar -xf openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64.tgz && \ | ||
| mv openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64 /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} && \ | ||
| cd /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} && \ | ||
| echo "Y" | ./install_dependencies/install_openvino_dependencies.sh && \ | ||
| cd - && \ | ||
| ln -s /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} /opt/intel/openvino | ||
|
|
||
| ENV OpenVINO_DIR=/opt/intel/openvino | ||
|
|
||
| WORKDIR /app | ||
|
|
||
| COPY . . | ||
|
|
||
| # Build Stage | ||
| RUN bash -c "source ${OpenVINO_DIR}/setupvars.sh && \ | ||
| cmake -B build/ReleaseOV -G Ninja \ | ||
| -DCMAKE_BUILD_TYPE=Release \ | ||
| -DGGML_OPENVINO=ON && \ | ||
| cmake --build build/ReleaseOV -j$(nproc)" | ||
|
|
||
| # Copy all necessary libraries | ||
| RUN mkdir -p /app/lib && \ | ||
| find build/ReleaseOV -name '*.so*' -exec cp {} /app/lib \; && \ | ||
| find ${OpenVINO_DIR}/runtime/lib/intel64 -name '*.so*' -exec cp -P {} /app/lib \; 2>/dev/null || \ | ||
| find ${OpenVINO_DIR}/lib/intel64 -name '*.so*' -exec cp -P {} /app/lib \; | ||
|
|
||
| # Create runtime directories and copy binaries | ||
| RUN mkdir -p /app/full \ | ||
| && cp build/ReleaseOV/bin/* /app/full/ \ | ||
| && cp *.py /app/full \ | ||
| && cp -r gguf-py /app/full \ | ||
| && cp -r requirements /app/full \ | ||
| && cp requirements.txt /app/full \ | ||
| && cp .devops/tools.sh /app/full/tools.sh | ||
|
|
||
| ## Base Runtime Image | ||
| FROM ubuntu:${UBUNTU_VERSION} AS base | ||
|
|
||
| # Pass proxy args to runtime stage | ||
| ARG http_proxy | ||
| ARG https_proxy | ||
|
|
||
| RUN apt-get update \ | ||
| && apt-get install -y libgomp1 libtbb12 curl\ | ||
| && apt autoremove -y \ | ||
| && apt clean -y \ | ||
| && rm -rf /tmp/* /var/tmp/* \ | ||
| && find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \ | ||
| && find /var/cache -type f -delete | ||
|
|
||
| COPY --from=build /app/lib/ /app/ | ||
|
|
||
| ### Full (all binaries) | ||
| FROM base AS full | ||
|
|
||
| ARG http_proxy | ||
| ARG https_proxy | ||
|
|
||
| COPY --from=build /app/full /app/ | ||
|
|
||
| WORKDIR /app | ||
|
|
||
| RUN apt-get update && \ | ||
| apt-get install -y --no-install-recommends \ | ||
| git \ | ||
| python3 \ | ||
| python3-venv \ | ||
| python3-pip && \ | ||
| python3 -m venv /ov-venv && \ | ||
| /ov-venv/bin/pip install --no-cache-dir --upgrade pip setuptools wheel && \ | ||
| /ov-venv/bin/pip install --no-cache-dir -r requirements.txt && \ | ||
| apt-get autoremove -y && \ | ||
| apt-get clean && \ | ||
| rm -rf /tmp/* /var/tmp/* && \ | ||
| find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete && \ | ||
| find /var/cache -type f -delete | ||
|
|
||
| ENTRYPOINT ["/bin/bash", "-c", "source /ov-venv/bin/activate && exec /app/tools.sh \"$@\"", "--"] | ||
|
|
||
|
|
||
| ### Light, CLI only | ||
| FROM base AS light | ||
|
|
||
| COPY --from=build /app/full/llama-cli /app/ | ||
|
|
||
| WORKDIR /app | ||
|
|
||
| ENTRYPOINT [ "/app/llama-cli" ] | ||
|
|
||
| ### Server, Server only | ||
| FROM base AS server | ||
|
|
||
| ENV LLAMA_ARG_HOST=0.0.0.0 | ||
|
|
||
| COPY --from=build /app/full/llama-server /app/ | ||
|
|
||
| WORKDIR /app | ||
|
|
||
| HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ] | ||
|
|
||
| ENTRYPOINT [ "/app/llama-server" ] |
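The Dockerfile above defines three final stages (`full`, `light`, `server`) on top of a shared `base` stage. As a rough sketch of how one of them might be selected at build time, the helper below only prints a `docker build` command and does not run it. The Dockerfile path `.devops/openvino.Dockerfile` and the image tag `llama-openvino` are assumptions for illustration; this page does not show the file name.

```shell
# Hypothetical helper: prints (does not execute) a docker build command
# for one of the stages defined in the Dockerfile: full, light, or server.
docker_build_cmd() {
    target="$1"
    # Assumed path; the diff view above does not show the file name.
    dockerfile="${2:-.devops/openvino.Dockerfile}"
    printf 'docker build --target %s -t llama-openvino:%s -f %s .\n' \
        "$target" "$target" "$dockerfile"
}

# Print one command per final stage.
for target in full light server; do
    docker_build_cmd "$target"
done
```

The optional `http_proxy` and `https_proxy` build arguments declared at the top of the Dockerfile could be passed with `--build-arg` if a proxy is needed.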
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| name: "Linux - Setup OpenVINO Toolkit" | ||
| description: "Setup OpenVINO Toolkit for Linux" | ||
| inputs: | ||
| path: | ||
| description: "Installation path" | ||
| required: true | ||
| version_major: | ||
| description: "OpenVINO major version (e.g., 2025.3)" | ||
| required: true | ||
| version_full: | ||
| description: "OpenVINO full version (e.g., 2025.3.0.19807.44526285f24)" | ||
| required: true | ||
|
|
||
| runs: | ||
| using: "composite" | ||
| steps: | ||
| - name: Setup OpenVINO Toolkit | ||
| id: setup | ||
| uses: ./.github/actions/unarchive-tar | ||
| with: | ||
| url: https://storage.openvinotoolkit.org/repositories/openvino/packages/${{ inputs.version_major }}/linux/openvino_toolkit_ubuntu24_${{ inputs.version_full }}_x86_64.tgz | ||
| path: ${{ inputs.path }} | ||
| type: z | ||
| strip: 1 | ||
|
|
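Both the Dockerfile and the composite action build the same archive URL from a major version and a full version string. The sketch below reproduces that pattern as a small shell function so the resulting URL can be checked before a download. The function name `openvino_url` is hypothetical; the URL layout is copied from the diffs above.

```shell
# Hypothetical helper: composes the OpenVINO toolkit archive URL from the
# same two values the action takes as inputs (version_major, version_full).
openvino_url() {
    version_major="$1"
    version_full="$2"
    base="https://storage.openvinotoolkit.org/repositories/openvino/packages"
    printf '%s/%s/linux/openvino_toolkit_ubuntu24_%s_x86_64.tgz\n' \
        "$base" "$version_major" "$version_full"
}

# Example with the version pinned in the Dockerfile above.
openvino_url 2026.0 2026.0.0.20965.c6d6a13a886
```

Keeping the two version values separate mirrors the action's inputs: the major version selects the package directory, while the full version appears in the archive file name.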
Please also add a second workflow that runs the ggml-ci set of tests.
Here are sample workflows that you can use as an example to create ggml-ci-intel-openvino-gpu: llama.cpp/.github/workflows/build.yml, lines 1537 to 1830 in db97626.
Basically, the workflow needs to call `GG_BUILD_OPENVINO=1 bash ./ci/run.sh` with appropriate arguments.
Added ggml-ci-x64-intel-openvino-gpu
Looks like the `llama-embedding` test is failing: https://github.com/ggml-org/llama.cpp/actions/runs/22787048288/job/66120793212?pr=15307#step:6:3479
We are currently working to fix it.
Hi @ggerganov, we added ggml-ci-x64-intel-openvino-gpu-low-perf. We are still working on support for embedding models and other quantization formats, so until that lands we can run ggml-ci with GG_BUILD_LOW_PERF=1.
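As a rough illustration of the two invocations discussed in this thread, the sketch below composes the `ci/run.sh` command line with and without `GG_BUILD_LOW_PERF=1`. It only prints the command. The function name `ci_run_cmd` is hypothetical, and the two positional arguments (`./tmp/results` and `./tmp/mnt`, an output directory and a mount directory) are assumed; the thread itself only says "appropriate arguments".

```shell
# Hypothetical helper: prints (does not execute) the ggml-ci command line.
# $1 = 1 to add GG_BUILD_LOW_PERF=1, anything else for the full run.
# $2, $3 = assumed output and mount directories passed to ci/run.sh.
ci_run_cmd() {
    low_perf="${1:-0}"
    out_dir="${2:-./tmp/results}"
    mnt_dir="${3:-./tmp/mnt}"
    env_prefix="GG_BUILD_OPENVINO=1"
    if [ "$low_perf" = "1" ]; then
        env_prefix="GG_BUILD_LOW_PERF=1 $env_prefix"
    fi
    printf '%s bash ./ci/run.sh %s %s\n' "$env_prefix" "$out_dir" "$mnt_dir"
}

ci_run_cmd 0   # full ggml-ci run with the OpenVINO backend
ci_run_cmd 1   # low-perf variant described in the comment above
```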