vLLM Health Check (BETA)

Note

The vLLM Health Check support is currently in BETA. Its features and functionality are subject to change as we collect feedback. We are excited to hear any thoughts you have!

The vLLM backend supports checking vLLM engine health each time it receives an inference request. If the health check fails, the model state becomes NOT Ready at the server, which can be queried through the Repository Index or Model Ready APIs.

The Health Check is disabled by default. To enable it, set the following parameter in the model config to true:

parameters: {
  key: "ENABLE_VLLM_HEALTH_CHECK"
  value: { string_value: "true" }
}

and select Model Control Mode EXPLICIT when the server is started.
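
As a minimal sketch of how a client might observe the NOT Ready state described above, the snippet below polls the Model Ready endpoint over HTTP using only the Python standard library. The base URL `http://localhost:8000`, the model name `vllm_model`, and the helper names `model_ready_url` and `is_model_ready` are assumptions for illustration; the sketch also assumes the server exposes the standard `/v2/models/<name>/ready` HTTP route and treats any non-200 response or connection error as not ready.

```python
import urllib.error
import urllib.request


def model_ready_url(base_url: str, model_name: str) -> str:
    """Build the Model Ready URL for a model (hypothetical helper)."""
    return f"{base_url.rstrip('/')}/v2/models/{model_name}/ready"


def is_model_ready(base_url, model_name, opener=urllib.request.urlopen, timeout=5.0):
    """Return True only if the server answers the ready probe with HTTP 200.

    `opener` is injectable so the function can be exercised without a server.
    """
    try:
        with opener(model_ready_url(base_url, model_name), timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.URLError:
        # Covers HTTPError (non-2xx status) as well as connection failures.
        return False
```

For example, `is_model_ready("http://localhost:8000", "vllm_model")` could be called periodically by a monitoring script; once a health check fails and the model is marked NOT Ready, the call would be expected to return `False`.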