Qwen/Qwen3-Embedding-0.6B OOM error with Turing Docker image #760

@juanjucm

Description

System Info

OS version

  • Operating System: Amazon Linux 2023.8.20250915
  • CPE OS Name: cpe:2.3:o:amazon:amazon_linux:2023
  • Kernel: Linux 6.1.150-174.273.amzn2023.x86_64
  • Architecture: x86-64
  • Hardware Vendor: Amazon EC2
  • Hardware Model: g4dn.12xlarge

GPU

1x Tesla T4

Model

Qwen/Qwen3-Embedding-0.6B

Information

  • [x] Docker
  • [ ] The CLI directly

Tasks

  • [x] An officially supported command
  • [ ] My own modifications

Reproduction

When running Qwen/Qwen3-Embedding-0.6B with the NVIDIA Turing Docker image (i.e. ghcr.io/huggingface/text-embeddings-inference:turing-1.8.3), I run into a CUDA_ERROR_OUT_OF_MEMORY error. With the generic CUDA Docker image (i.e. ghcr.io/huggingface/text-embeddings-inference:cuda-1.8.3), the same model works correctly.

  • Error with: docker run --gpus all -e MODEL_ID=Qwen/Qwen3-Embedding-0.6B -e AUTO_TRUNCATE=true ghcr.io/huggingface/text-embeddings-inference:turing-1.8.3
  • Works with: docker run --gpus all -e MODEL_ID=Qwen/Qwen3-Embedding-0.6B -e AUTO_TRUNCATE=true ghcr.io/huggingface/text-embeddings-inference:cuda-1.8.3

Error: Model backend is not healthy

Caused by:
    DriverError(CUDA_ERROR_OUT_OF_MEMORY, "out of memory")
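
In case it helps with triage, one workaround worth trying (an assumption on my part, not verified on this exact setup) is lowering the warmup batch with the launcher's --max-batch-tokens flag and pinning --dtype float16, since the T4 only has 16 GB and the Turing kernels may need more workspace at the default of 16384 tokens. The -p 8080:80 mapping is only there for the health check shown further down:

    docker run --gpus all -p 8080:80 \
      -e MODEL_ID=Qwen/Qwen3-Embedding-0.6B \
      -e AUTO_TRUNCATE=true \
      ghcr.io/huggingface/text-embeddings-inference:turing-1.8.3 \
      --max-batch-tokens 4096 --dtype float16

Watching memory in a second terminal with watch -n 1 nvidia-smi during startup also shows whether usage climbs gradually or spikes at warmup.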

Expected behavior

TEI to launch correctly using docker run --gpus all -e MODEL_ID=Qwen/Qwen3-Embedding-0.6B -e AUTO_TRUNCATE=true ghcr.io/huggingface/text-embeddings-inference:turing-1.8.3 on a Tesla T4 GPU.
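
For reference, this is how I verify a healthy launch (assuming the container port is published with -p 8080:80, which the reproduction commands above omit):

    curl 127.0.0.1:8080/embed \
      -X POST \
      -H 'Content-Type: application/json' \
      -d '{"inputs": "What is Deep Learning?"}'

With cuda-1.8.3 this returns an embedding vector; with turing-1.8.3 the launcher never gets past the backend health check.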
