
Add Dockerfile for ARM64 architecture support and update README instructions#827

Merged
alvarobartt merged 2 commits into huggingface:main from z4y4ts:main
Mar 30, 2026

Conversation

@z4y4ts
Contributor

@z4y4ts z4y4ts commented Feb 13, 2026

What does this PR do?

This will make it easier to use TEI on Apple Silicon machines.

The new Dockerfile-arm64 is based on the Dockerfile. The differences are:

  • Stripped down MKL support, as it's Intel-only
  • Use arm64-compatible sccache and protobuf

Fixes #611
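
For local use, a small wrapper can pick the matching Dockerfile based on the host architecture. This is only an illustrative sketch, not part of the PR; the `tei-local` tag and the detection logic are hypothetical:

```shell
#!/bin/sh
# Pick the Dockerfile matching the host architecture.
# The tei-local tag is a hypothetical name, not part of this PR.
ARCH=$(uname -m)
case "$ARCH" in
  arm64|aarch64) DOCKERFILE=Dockerfile-arm64; PLATFORM=linux/arm64 ;;
  *)             DOCKERFILE=Dockerfile;       PLATFORM=linux/amd64 ;;
esac
echo "Building with $DOCKERFILE for $PLATFORM"
# docker build . -f "$DOCKERFILE" --platform="$PLATFORM" -t tei-local
```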

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
  • Did you write any new necessary tests? If applicable, did you include or update the insta snapshots?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Copilot AI review requested due to automatic review settings February 13, 2026 16:20

Copilot AI left a comment


Pull request overview

This PR adds support for building Docker images on ARM64 architecture (Apple Silicon) to enable local development on Apple M-series machines. The PR addresses issue #611 where building the Docker image on Apple M4 chips failed due to Intel-specific dependencies not being available for ARM64.

Changes:

  • Added new Dockerfile-arm64 specifically designed for ARM64 architecture without Intel MKL dependencies
  • Updated README instructions to reference the new ARM64-specific Dockerfile

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
Dockerfile-arm64 New Dockerfile for ARM64 builds, removing Intel MKL support and using ARM64-compatible sccache and protobuf binaries
README.md Updated Docker build command to reference Dockerfile-arm64 for ARM64 platforms


Comment thread README.md

```diff
- docker build . -f Dockerfile --platform=linux/arm64
+ docker build . -f Dockerfile-arm64 --platform=linux/arm64
```

Copilot AI Feb 13, 2026


The CI/CD workflow files (.github/workflows/build.yaml and .github/workflows/test.yaml) currently trigger on changes to "Dockerfile" but not "Dockerfile-arm64". This means that changes to the new ARM64 Dockerfile won't trigger automated builds or tests. Consider updating the workflow paths to include "Dockerfile-arm64" to ensure proper CI/CD coverage for ARM64 builds.
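
A minimal sketch of the suggested change, assuming the workflows use a `paths` trigger filter (the actual structure of build.yaml may differ):

```yaml
# Hypothetical excerpt of .github/workflows/build.yaml — the real trigger
# section may differ; the point is adding Dockerfile-arm64 to the paths filter.
on:
  push:
    paths:
      - "Dockerfile"
      - "Dockerfile-arm64"   # added so ARM64 Dockerfile changes trigger CI
```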

Comment thread README.md

```diff
- docker build . -f Dockerfile --platform=linux/arm64
+ docker build . -f Dockerfile-arm64 --platform=linux/arm64
```

Copilot AI Feb 13, 2026


The matrix.json file used for automated builds doesn't include an entry for the ARM64 architecture. While this PR enables manual ARM64 builds for local development, consider adding an ARM64 entry to matrix.json (similar to the existing "cpu" entry but using "Dockerfile-arm64") to enable automated ARM64 Docker image builds and publishing in the CI/CD pipeline.
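
A sketch of what such an entry might look like, modeled on the existing "cpu" entry as described above (the field names are assumptions; the real matrix.json schema may differ):

```json
{
  "name": "cpu-arm64",
  "dockerfile": "Dockerfile-arm64",
  "platform": "linux/arm64"
}
```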

Comment thread README.md

```diff
- docker build . -f Dockerfile --platform=linux/arm64
+ docker build . -f Dockerfile-arm64 --platform=linux/arm64
```

Copilot AI Feb 13, 2026


Consider adding documentation about the ARM64 Dockerfile in the docs/source/en/custom_container.md file, similar to how CPU and CUDA builds are documented. This would help users understand when and how to use Dockerfile-arm64, especially for Apple Silicon development environments.

@alvarobartt
Member

Hey @z4y4ts thanks for opening the PR! Given that Metal won't work over Docker, what is really the point of having a separate Dockerfile for it? What's the benefit here, other than making it a bit lighter due to the lack of the Intel MKL libs? I'm just not sure about having a separate Dockerfile for it, but happy to discuss further in case there's a real need here, thanks!

@z4y4ts
Contributor Author

z4y4ts commented Feb 16, 2026

Hey @alvarobartt !

The key use case is to be able to do local development on Apple Silicon machines.

We're deploying the service to a Linux cluster. Local development is done using https://tilt.dev/

The key need is to have a dev setup as close to production as possible. The slower performance is an acceptable trade-off, as the load on a dev machine is expected to be low, as long as it provides the same features/models.

I've added a separate Dockerfile because I see I'm not the only one with this need, so it seems beneficial for the community.

Also, the solution described in the README does not actually work, so it's not obvious how to make it run, and it takes quite a few extra steps.

If we agree this change is actually valuable, I'd like to also add it to the matrix.json to simplify macOS setup even further.

Member

@alvarobartt alvarobartt left a comment


Fair enough @z4y4ts thanks for the detailed information! I'll be happy to merge this PR and update the matrix.json to include it and build + push the arm64- image too (which I'll do in a follow up PR, as otherwise the CI won't run) 🤗

@z4y4ts
Contributor Author

z4y4ts commented Feb 19, 2026

@alvarobartt thanks for approving!
Can you merge it? Or shall I do some extra steps?

Asking because there's no merge button for me.

@alvarobartt alvarobartt added this to the v1.10.0 milestone Feb 25, 2026
nazq pushed a commit to nazq/text-embeddings-inference that referenced this pull request Mar 4, 2026
- Make Dockerfile-cuda multi-arch by using TARGETARCH for sccache and
  protoc binary selection (supports amd64 and arm64)
- Add sm_121 compute capability for NVIDIA DGX Spark GB10
  - nvprune support in Dockerfile-cuda
  - compute_cap_matching: (120..=121, 120) and (121, 121) compatibility
  - flash_attn: allow runtime_compute_cap == 121
  - Comprehensive test coverage for sm_121 matching

Builds on PR huggingface#827 (ARM64 CPU Dockerfile) by extending CUDA support
to ARM64 and adding the GB10's sm_121 compute capability.
nazq pushed a commit to nazq/text-embeddings-inference that referenced this pull request Mar 4, 2026
- Make Dockerfile-cuda multi-arch by using TARGETARCH for sccache and
  protoc binary selection (supports amd64 and arm64)
- Add sm_121 compute capability for NVIDIA DGX Spark GB10
  - nvprune support in Dockerfile-cuda
  - compute_cap_matching: (120..=121, 120) and (121, 121) compatibility
  - flash_attn: allow runtime_compute_cap == 121
  - Comprehensive test coverage for sm_121 matching
- Update README: replace Apple-only ARM64 section with comprehensive
  aarch64 docs covering CPU-only and CUDA paths (DGX Spark, Jetson)
- Add sm_121 to CUDA compute cap examples in README

Builds on PR huggingface#827 (ARM64 CPU Dockerfile) by extending CUDA support
to ARM64 and adding the GB10's sm_121 compute capability.
nazq pushed a commit to nazq/text-embeddings-inference that referenced this pull request Mar 4, 2026
Dockerfile-cuda:
- Multi-arch sccache and protoc via TARGETARCH (amd64 + arm64)
- Add sm_121 to nvprune for DGX Spark GB10

compute_cap.rs:
- (120..=121, 120) => true: sm_121 compatible with sm_120 binaries
- (121, 121) => true: native sm_121 exact match
- Comprehensive test coverage for sm_121

flash_attn.rs:
- Allow runtime_compute_cap == 121 for flash attention v2

build.yaml:
- Use matrix.platforms with fallback to linux/amd64
- Enables per-variant platform selection

matrix.json:
- Add blackwell-121 entry (linux/arm64, sm_121)
- Add cpu-arm64 entry (linux/arm64, Dockerfile-arm64)

README.md:
- Add Platform column to Docker Images table
- Add cpu-arm64 and 121 (DGX Spark) image entries
- Replace Apple-only ARM64 section with comprehensive aarch64
  docs covering CPU-only and CUDA paths
- Add sm_121 to CUDA compute cap examples

Builds on PR huggingface#827 (ARM64 CPU Dockerfile) by extending CUDA support
to ARM64 and adding the GB10's sm_121 compute capability.
@nazq
Contributor

nazq commented Mar 4, 2026

@z4y4ts Added #840 as an extension to support CUDA on DGX Spark

@alvarobartt
Member

Hey @z4y4ts thanks again for your great work here. Given that #840 is in place, I can either:

Really up to you, I'm fine with any given that authorship is respected 🤗

@z4y4ts
Contributor Author

z4y4ts commented Mar 6, 2026

Hey @alvarobartt thanks for getting back on this.

Let's merge this PR first and then #840, to separate concerns and keep the log easier to understand in the future.

@lorr1

lorr1 commented Mar 23, 2026

+1 we'd like to see this merged too!

@JCorners68

Hi @alvarobartt — PR #840 already includes the Dockerfile-arm64 from this PR
in its single commit (ad55ed2), with Co-Authored-By: z4y4ts preserving attribution.
Since #840 is a strict superset (multi-arch CUDA Dockerfile, sm_121 compute cap fix,
CI matrix entries, and this Dockerfile), this PR can be closed in favor of #840.

I've independently validated #840 on a DGX Spark (GB10, sm_121) — build succeeds,
embedding and reranking models pass smoke tests with flash attention both on and off.
Full results posted on #840.

@alvarobartt alvarobartt merged commit a2c07dd into huggingface:main Mar 30, 2026
This was linked to issues Mar 30, 2026