
Add Dockerfile for ARM64 architecture support and update README instructions#827

Merged
alvarobartt merged 2 commits into huggingface:main from z4y4ts:main
Mar 30, 2026

Conversation

@z4y4ts
Contributor

@z4y4ts z4y4ts commented Feb 13, 2026

What does this PR do?

This will make it easier to use TEI on Apple Silicon machines.

The new Dockerfile-arm64 is based on the Dockerfile. The differences are:

  • Stripped down MKL support, as it's Intel-only
  • Use arm64-compatible sccache and protobuf

Fixes #611
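
For local use, a small wrapper can pick the matching Dockerfile based on the host architecture. This is only an illustrative sketch, not part of the PR; the `tei-local` tag and the detection logic are hypothetical:

```shell
#!/bin/sh
# Pick the Dockerfile matching the host architecture.
# The tei-local tag is a hypothetical name, not part of this PR.
ARCH=$(uname -m)
case "$ARCH" in
  arm64|aarch64) DOCKERFILE=Dockerfile-arm64; PLATFORM=linux/arm64 ;;
  *)             DOCKERFILE=Dockerfile;       PLATFORM=linux/amd64 ;;
esac
echo "Building with $DOCKERFILE for $PLATFORM"
# docker build . -f "$DOCKERFILE" --platform="$PLATFORM" -t tei-local
```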

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
  • Did you write any new necessary tests? If applicable, did you include or update the insta snapshots?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Copilot AI review requested due to automatic review settings February 13, 2026 16:20

Copilot AI left a comment


Pull request overview

This PR adds support for building Docker images on ARM64 architecture (Apple Silicon) to enable local development on Apple M-series machines. The PR addresses issue #611 where building the Docker image on Apple M4 chips failed due to Intel-specific dependencies not being available for ARM64.

Changes:

  • Added new Dockerfile-arm64 specifically designed for ARM64 architecture without Intel MKL dependencies
  • Updated README instructions to reference the new ARM64-specific Dockerfile

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
Dockerfile-arm64 New Dockerfile for ARM64 builds, removing Intel MKL support and using ARM64-compatible sccache and protobuf binaries
README.md Updated Docker build command to reference Dockerfile-arm64 for ARM64 platforms


Comment thread README.md

```diff
- docker build . -f Dockerfile --platform=linux/arm64
+ docker build . -f Dockerfile-arm64 --platform=linux/arm64
```

Copilot AI Feb 13, 2026


The CI/CD workflow files (.github/workflows/build.yaml and .github/workflows/test.yaml) currently trigger on changes to "Dockerfile" but not "Dockerfile-arm64". This means that changes to the new ARM64 Dockerfile won't trigger automated builds or tests. Consider updating the workflow paths to include "Dockerfile-arm64" to ensure proper CI/CD coverage for ARM64 builds.
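
A minimal sketch of the suggested change, assuming the workflows use a `paths` trigger filter (the actual structure of build.yaml may differ):

```yaml
# Hypothetical excerpt of .github/workflows/build.yaml — the real trigger
# section may differ; the point is adding Dockerfile-arm64 to the paths filter.
on:
  push:
    paths:
      - "Dockerfile"
      - "Dockerfile-arm64"   # added so ARM64 Dockerfile changes trigger CI
```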

Comment thread README.md

```diff
- docker build . -f Dockerfile --platform=linux/arm64
+ docker build . -f Dockerfile-arm64 --platform=linux/arm64
```

Copilot AI Feb 13, 2026


The matrix.json file used for automated builds doesn't include an entry for the ARM64 architecture. While this PR enables manual ARM64 builds for local development, consider adding an ARM64 entry to matrix.json (similar to the existing "cpu" entry but using "Dockerfile-arm64") to enable automated ARM64 Docker image builds and publishing in the CI/CD pipeline.
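
A sketch of what such an entry might look like, modeled on the existing "cpu" entry as described above (the field names are assumptions; the real matrix.json schema may differ):

```json
{
  "name": "cpu-arm64",
  "dockerfile": "Dockerfile-arm64",
  "platform": "linux/arm64"
}
```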

Comment thread README.md

```diff
- docker build . -f Dockerfile --platform=linux/arm64
+ docker build . -f Dockerfile-arm64 --platform=linux/arm64
```

Copilot AI Feb 13, 2026


Consider adding documentation about the ARM64 Dockerfile in the docs/source/en/custom_container.md file, similar to how CPU and CUDA builds are documented. This would help users understand when and how to use Dockerfile-arm64, especially for Apple Silicon development environments.

@alvarobartt
Member

Hey @z4y4ts thanks for opening the PR! Given that Metal won't work over Docker, what is really the point of having a separate Dockerfile for it? What's the benefit here, other than making it a bit lighter due to the lack of the Intel MKL libs? I'm just not sure about having a separate Dockerfile for it, but happy to discuss further in case there's a real need here, thanks!

@z4y4ts
Contributor Author

z4y4ts commented Feb 16, 2026

Hey @alvarobartt !

The key use case is to be able to do local development on Apple Silicon machines.

We're deploying the service to a Linux cluster. Local development is done using https://tilt.dev/

The key need is to have a dev setup as close to production as possible. The slower performance is an acceptable trade-off, as the load on a dev machine is expected to be low, as long as it provides the same features/models.

I've added a separate Dockerfile because I see I'm not the only one with this need, so it seems beneficial for the community.

Also, the solution described in the README does not actually work, so it's not obvious how to make it run, and it takes quite a few extra steps.

If we agree this change is actually valuable, I'd like to also add it to the matrix.json to simplify macOS setup even further.

Member

@alvarobartt alvarobartt left a comment


Fair enough @z4y4ts thanks for the detailed information! I'll be happy to merge this PR and update the matrix.json to include it and build + push the arm64- image too (which I'll do in a follow up PR, as otherwise the CI won't run) 🤗

@z4y4ts
Contributor Author

z4y4ts commented Feb 19, 2026

@alvarobartt thanks for approving!
Can you merge it? Or shall I do some extra steps?

Asking because there's no merge button for me.

@alvarobartt alvarobartt added this to the v1.10.0 milestone Feb 25, 2026
nazq pushed a commit to nazq/text-embeddings-inference that referenced this pull request Mar 4, 2026
- Make Dockerfile-cuda multi-arch by using TARGETARCH for sccache and
  protoc binary selection (supports amd64 and arm64)
- Add sm_121 compute capability for NVIDIA DGX Spark GB10
  - nvprune support in Dockerfile-cuda
  - compute_cap_matching: (120..=121, 120) and (121, 121) compatibility
  - flash_attn: allow runtime_compute_cap == 121
  - Comprehensive test coverage for sm_121 matching

Builds on PR huggingface#827 (ARM64 CPU Dockerfile) by extending CUDA support
to ARM64 and adding the GB10's sm_121 compute capability.
nazq pushed a commit to nazq/text-embeddings-inference that referenced this pull request Mar 4, 2026
- Make Dockerfile-cuda multi-arch by using TARGETARCH for sccache and
  protoc binary selection (supports amd64 and arm64)
- Add sm_121 compute capability for NVIDIA DGX Spark GB10
  - nvprune support in Dockerfile-cuda
  - compute_cap_matching: (120..=121, 120) and (121, 121) compatibility
  - flash_attn: allow runtime_compute_cap == 121
  - Comprehensive test coverage for sm_121 matching
- Update README: replace Apple-only ARM64 section with comprehensive
  aarch64 docs covering CPU-only and CUDA paths (DGX Spark, Jetson)
- Add sm_121 to CUDA compute cap examples in README

Builds on PR huggingface#827 (ARM64 CPU Dockerfile) by extending CUDA support
to ARM64 and adding the GB10's sm_121 compute capability.
nazq pushed a commit to nazq/text-embeddings-inference that referenced this pull request Mar 4, 2026
Dockerfile-cuda:
- Multi-arch sccache and protoc via TARGETARCH (amd64 + arm64)
- Add sm_121 to nvprune for DGX Spark GB10

compute_cap.rs:
- (120..=121, 120) => true: sm_121 compatible with sm_120 binaries
- (121, 121) => true: native sm_121 exact match
- Comprehensive test coverage for sm_121

flash_attn.rs:
- Allow runtime_compute_cap == 121 for flash attention v2

build.yaml:
- Use matrix.platforms with fallback to linux/amd64
- Enables per-variant platform selection

matrix.json:
- Add blackwell-121 entry (linux/arm64, sm_121)
- Add cpu-arm64 entry (linux/arm64, Dockerfile-arm64)

README.md:
- Add Platform column to Docker Images table
- Add cpu-arm64 and 121 (DGX Spark) image entries
- Replace Apple-only ARM64 section with comprehensive aarch64
  docs covering CPU-only and CUDA paths
- Add sm_121 to CUDA compute cap examples

Builds on PR huggingface#827 (ARM64 CPU Dockerfile) by extending CUDA support
to ARM64 and adding the GB10's sm_121 compute capability.
@nazq
Contributor

nazq commented Mar 4, 2026

@z4y4ts Added #840 as an extension to support CUDA on DGX Spark

@alvarobartt
Member

Hey @z4y4ts thanks again for your great work here. Given that #840 is in place, I can either:

Really up to you, I'm fine with any given that authorship is respected 🤗

@z4y4ts
Contributor Author

z4y4ts commented Mar 6, 2026

Hey @alvarobartt thanks for getting back on this.

Let's merge this PR first and then #840, to separate concerns and keep the log easier to understand in the future.

@lorr1

lorr1 commented Mar 23, 2026

+1 we'd like to see this merged too!

@JCorners68

Hi @alvarobartt — PR #840 already includes the Dockerfile-arm64 from this PR
in its single commit (ad55ed2), with Co-Authored-By: z4y4ts preserving attribution.
Since #840 is a strict superset (multi-arch CUDA Dockerfile, sm_121 compute cap fix,
CI matrix entries, and this Dockerfile), this PR can be closed in favor of #840.

I've independently validated #840 on a DGX Spark (GB10, sm_121) — build succeeds,
embedding and reranking models pass smoke tests with flash attention both on and off.
Full results posted on #840.

@alvarobartt alvarobartt merged commit a2c07dd into huggingface:main Mar 30, 2026
This was linked to issues Mar 30, 2026