Note
The vLLM Health Check support is currently in BETA. Its features and functionality are subject to change as we collect feedback. We are excited to hear any thoughts you have!
The vLLM backend supports checking vLLM Engine Health upon receiving each inference request. If the health check fails, the model state becomes NOT Ready at the server, which can be queried by the Repository Index or Model Ready APIs.
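For example, a client can detect a failed health check through either API. Below is a minimal sketch using the Triton Python client; the server URL and the model name `vllm_model` are placeholders for illustration, not names defined by this backend:

```python
import tritonclient.http as httpclient

# Connect to the Triton server (URL is a placeholder).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Model Ready API: reports False once a vLLM engine health check has failed.
ready = client.is_model_ready("vllm_model")
print(f"vllm_model ready: {ready}")

# Repository Index API: lists each model with its state and, if applicable,
# the reason it is not ready.
for model in client.get_model_repository_index():
    print(model["name"], model.get("state"), model.get("reason"))
```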
The Health Check is disabled by default. To enable it, set the following parameter in the model config to true:
```
parameters: {
  key: "ENABLE_VLLM_HEALTH_CHECK"
  value: { string_value: "true" }
}
```
and start the server with Model Control Mode EXPLICIT.
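With the server started in EXPLICIT mode (e.g. `tritonserver --model-control-mode=explicit`), models are loaded through the model control APIs rather than at startup. A sketch of loading the model and confirming readiness, reusing the placeholder names from the example above:

```python
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# In EXPLICIT mode, the model must be loaded explicitly before use.
client.load_model("vllm_model")

# If a later inference request trips the vLLM engine health check, the
# model flips to NOT Ready and this call returns False.
assert client.is_model_ready("vllm_model")
```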