System Info
OS version
- Operating System: Amazon Linux 2023.8.20250915
- CPE OS Name: cpe:2.3:o:amazon:amazon_linux:2023
- Kernel: Linux 6.1.150-174.273.amzn2023.x86_64
- Architecture: x86-64
- Hardware Vendor: Amazon EC2
- Hardware Model: g4dn.12xlarge
GPU
1x Tesla T4
Model
Qwen/Qwen3-Embedding-0.6B
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
When running Qwen/Qwen3-Embedding-0.6B with the NVIDIA Turing Docker image (i.e. ghcr.io/huggingface/text-embeddings-inference:turing-1.8.3), I run into a CUDA_ERROR_OUT_OF_MEMORY error. However, when using the generic CUDA Docker image (i.e. ghcr.io/huggingface/text-embeddings-inference:cuda-1.8.3), it works correctly.
- Error with:
  docker run --gpus all -e MODEL_ID=Qwen/Qwen3-Embedding-0.6B -e AUTO_TRUNCATE=true ghcr.io/huggingface/text-embeddings-inference:turing-1.8.3
- Works with:
  docker run --gpus all -e MODEL_ID=Qwen/Qwen3-Embedding-0.6B -e AUTO_TRUNCATE=true ghcr.io/huggingface/text-embeddings-inference:cuda-1.8.3
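For reference, this is how I confirm the working container actually becomes healthy. The `-p 8080:80` port mapping is an addition for illustration only; it is not part of the commands above.

```sh
# Assumption: container port 80 published to host port 8080 (not in the original command).
docker run --gpus all -p 8080:80 \
  -e MODEL_ID=Qwen/Qwen3-Embedding-0.6B -e AUTO_TRUNCATE=true \
  ghcr.io/huggingface/text-embeddings-inference:cuda-1.8.3

# Once the model has loaded, TEI's health route returns HTTP 200.
curl -i http://localhost:8080/health
```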
Error: Model backend is not healthy
Caused by:
DriverError(CUDA_ERROR_OUT_OF_MEMORY, "out of memory")
Expected behavior
TEI should launch correctly with docker run --gpus all -e MODEL_ID=Qwen/Qwen3-Embedding-0.6B -e AUTO_TRUNCATE=true ghcr.io/huggingface/text-embeddings-inference:turing-1.8.3 on a Tesla T4 GPU, as it does with the cuda-1.8.3 image.
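Once the turing-1.8.3 container launches as expected, a minimal embedding request like the sketch below should return a vector. The port mapping (-p 8080:80) and the input text are illustrative assumptions, not part of the original report.

```sh
# Assumption: the container was started with -p 8080:80 added to the command above.
curl http://localhost:8080/embed \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "What is Deep Learning?"}'
```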