[Bug]: OOM when serving Gemma3-AWQ #16199

### Your current environment

vLLM 0.8.3


### 🐛 Describe the bug

I couldn't serve the Gemma3-AWQ model on a 24 GB GPU.

- command1:
```
VLLM_USE_V1=0  vllm serve gaunernst/gemma-3-27b-it-int4-awq \
 --max-model-len 4096 --gpu-memory-utilization 0.98   --distributed-executor-backend ray  --dtype float16 
```
- error1:
```
INFO 04-07 15:52:21 [model_runner.py:1146] Model loading took 17.6110 GiB and 12.273077 seconds

 ValueError: The model's max seq len (4096) is larger than the maximum number of tokens that can be stored in KV cache (3584). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
```
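
For reference, a rough sketch of what the numbers in this error mean in terms of KV cache blocks. The block size of 16 tokens is an assumption (it is not shown in the log above); the two token counts are copied from the error message.

```python
import math

# Assumption: KV cache block size of 16 tokens (not shown in the log above).
BLOCK_SIZE = 16

# Values copied from the ValueError in error1.
kv_capacity_tokens = 3584
max_model_len = 4096


def blocks_needed(num_tokens: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of KV cache blocks required to hold num_tokens tokens."""
    return math.ceil(num_tokens / block_size)


available_blocks = kv_capacity_tokens // BLOCK_SIZE
required_blocks = blocks_needed(max_model_len)
shortfall_tokens = max_model_len - kv_capacity_tokens

print(f"available blocks: {available_blocks}")
print(f"required blocks:  {required_blocks}")
print(f"shortfall:        {shortfall_tokens} tokens")
```

Under that assumption, a single full-length sequence would not fit in the cache, which matches the error.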

- command2:
```
 VLLM_USE_V1=0  vllm serve gaunernst/gemma-3-27b-it-int4-awq \
 --max-model-len 128 --gpu-memory-utilization 0.98   --distributed-executor-backend ray  --dtype float16 
```
- error2:
```
INFO 04-07 15:54:56 [worker.py:267] the current vLLM instance can use total_gpu_memory (21.99GiB) x gpu_memory_utilization (0.98) = 21.55GiB

ERROR 04-07 15:55:35 [engine.py:448]   File "/python3.9/site-packages/torch/cuda/graphs.py", line 84, in capture_end
ERROR 04-07 15:55:35 [engine.py:448]     super().capture_end()
ERROR 04-07 15:55:35 [engine.py:448] RuntimeError: CUDA error: out of memory
ERROR 04-07 15:55:35 [engine.py:448] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
```
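
A back-of-the-envelope sketch of the memory budget implied by the two log lines above. It assumes the weight size logged for command1 (17.6110 GiB) also applies to command2, since the model is the same. How vLLM splits the remainder between activations, KV cache, and CUDA graphs is not shown in the logs, so the code only computes the totals.

```python
# Values copied from the logs above.
total_gpu_memory_gib = 21.99      # worker.py log line in error2
gpu_memory_utilization = 0.98     # value passed on the command line
model_weights_gib = 17.6110       # model_runner.py log line in error1 (assumed identical here)

# Memory the vLLM instance is allowed to use.
budget_gib = total_gpu_memory_gib * gpu_memory_utilization

# What is left inside the budget after loading the weights.
remaining_gib = budget_gib - model_weights_gib

# What is left on the GPU outside the budget.
outside_budget_gib = total_gpu_memory_gib - budget_gib

print(f"budget:         {budget_gib:.2f} GiB")
print(f"after weights:  {remaining_gib:.2f} GiB")
print(f"outside budget: {outside_budget_gib:.2f} GiB")
```

With these numbers, roughly 3.94 GiB remain inside the budget after the weights are loaded, and about 0.44 GiB of the GPU lies outside the budget. If CUDA graph capture needs memory beyond what was reserved (an assumption on my side), that small headroom could explain the failure in `capture_end`.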



**It only works when I set the context length to 128 and `--max-num-seqs` to 2, but the response to every request is `NULL!`.**
```
 VLLM_USE_V1=0  vllm serve gaunernst/gemma-3-27b-it-int4-awq --max-num-seqs 2  \
 --max-model-len 128 --gpu-memory-utilization 0.9   --distributed-executor-backend ray  --dtype float16 
```

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
