diff --git a/docs/guides/health_check.md b/docs/guides/health_check.md new file mode 100644 index 0000000000..9ac5c6562b --- /dev/null +++ b/docs/guides/health_check.md @@ -0,0 +1,200 @@ + + +# Dynamo Health Checks + +## Overview + +Dynamo provides health check and liveness HTTP endpoints for each component which +can be used to configure startup, liveness and readiness probes in +orchestration frameworks such as Kubernetes. + +## Frontend Liveness Check + +The frontend liveness endpoint reports a status of `live` as long as +the service is running. + +> **Note**: Frontend liveness doesn't depend on worker health or liveness only on the Frontend service itself. + +#### Example Request + +``` +curl -s localhost:8080/live -q | jq +``` + +#### Example Response + +``` +{ + "message": "Service is live", + "status": "live" +} +``` + +## Frontend Health Check + +The frontend health endpoint reports a status of `healthy` as long as +the service is running. Once workers have been registered, the +`health` endpoint will also list registered endpoints and instances. + +> **Note**: Frontend liveness doesn't depend on worker health or liveness only on the Frontend service itself. + +#### Example Request + +``` +curl -v localhost:8080/health -q | jq +``` + +#### Example Response + +Before workers are registered: + +``` +HTTP/1.1 200 OK +content-type: application/json +content-length: 72 +date: Wed, 03 Sep 2025 13:31:44 GMT + +{ + "instances": [], + "message": "No endpoints available", + "status": "unhealthy" +} +``` + +After workers are registered: + +``` +HTTP/1.1 200 OK +content-type: application/json +content-length: 609 +date: Wed, 03 Sep 2025 13:32:03 GMT + +{ + "endpoints": [ + "dyn://dynamo.backend.generate" + ], + "instances": [ + { + "component": "backend", + "endpoint": "clear_kv_blocks", + "instance_id": 7587888160958628000, + "namespace": "dynamo", + "transport": { + "nats_tcp": "dynamo_backend.clear_kv_blocks-694d98147d54be25" + } + }, + { + "component": "backend", + "endpoint": "generate", + "instance_id": 7587888160958628000, + "namespace": "dynamo", + "transport": { + "nats_tcp": "dynamo_backend.generate-694d98147d54be25" + } + }, + { + "component": "backend", + "endpoint": "load_metrics", + "instance_id": 7587888160958628000, + "namespace": "dynamo", + "transport": { + "nats_tcp": "dynamo_backend.load_metrics-694d98147d54be25" + } + } + ], + "status": "healthy" +} +``` + +## Worker Liveness and Health Check + +Health checks for components other than the frontend are enabled +selectively based on environment variables. If a health check for a +component is enabled the starting status can be set along with the set +of endpoints that are required to be served before the component is +declared `ready`. + +Once all endpoints declared in `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` +are served the component transitions to a `ready` state until the +component is shutdown. The endpoints return HTTP status code of `HTTP/1.1 503 Service Unavailable` +when initializing and HTTP status code `HTTP/1.1 200 OK` once ready. + +> **Note**: Both /live and /ready return the same information + +### Environment Variables for Enabling Health Checks + +| **Environment Variable** | **Description** | **Example Settings** | +| -------------------------| ------------------- | ------------------------------------------------ | +| `DYN_SYSTEM_ENABLED` | Enables the system status server. | `true`, `false` | +| `DYN_SYSTEM_PORT` | Specifies the port for the system status server. | `9090` | +| `DYN_SYSTEM_STARTING_HEALTH_STATUS` | Sets the initial health status of the system (ready/not ready). | `ready`, `notready` | +| `DYN_SYSTEM_HEALTH_PATH` | Custom path for the health endpoint. | `/custom/health` | +| `DYN_SYSTEM_LIVE_PATH` | Custom path for the liveness endpoint. | `/custom/live` | +| `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` | Specifies endpoints to check for determining overall system health status. | `["generate"]` | + +### Example Environment Setting + +``` +export DYN_SYSTEM_ENABLED="true" +export DYN_SYSTEM_STARTING_HEALTH_STATUS="notready" +export DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS="[\"generate\"]" +export DYN_SYSTEM_PORT=9090 +``` + +#### Example Request + +``` +curl -v localhost:9090/health | jq +``` + +#### Example Response +Before endpoints are being served: + +``` +HTTP/1.1 503 Service Unavailable +content-type: text/plain; charset=utf-8 +content-length: 96 +date: Wed, 03 Sep 2025 13:42:39 GMT + +{ + "endpoints": { + "generate": "notready" + }, + "status": "notready", + "uptime": { + "nanos": 313803539, + "secs": 12 + } +} +``` + +After endpoints are being served: + +``` +HTTP/1.1 200 OK +content-type: text/plain; charset=utf-8 +content-length: 139 +date: Wed, 03 Sep 2025 13:42:45 GMT + +{ + "endpoints": { + "clear_kv_blocks": "ready", + "generate": "ready", + "load_metrics": "ready" + }, + "status": "ready", + "uptime": { + "nanos": 356504530, + "secs": 18 + } +} +``` + +## Related Documentation + +- [Distributed Runtime Architecture](../architecture/distributed_runtime.md) +- [Dynamo Architecture Overview](../architecture/architecture.md) +- [Backend Guide](backend.md) diff --git a/docs/guides/logging.md b/docs/guides/logging.md new file mode 100644 index 0000000000..9fc8d97268 --- /dev/null +++ b/docs/guides/logging.md @@ -0,0 +1,149 @@ + + +# Dynamo Logging + +## Overview + +Dynamo provides structured logging in both text as well as JSONL. When +JSONL is enabled logs additionally contain `span` creation and exit +events as well as support for `trace_id` and `span_id` fields for +distributed tracing. + +## Environment Variables for configuring Logging + +| Environment Variable | Description | Example Settings | +| ----------------------------------- | --------------------------------------------| ---------------------------------------------------- | +| `DYN_LOGGING_JSONL` | Enable JSONL logging format (default: READABLE) | `DYN_LOGGING_JSONL=true` | +| `DYN_LOG_USE_LOCAL_TZ` | Use local timezone for logging timestamps (default: UTC) | `DYN_LOG_USE_LOCAL_TZ=1` | +| `DYN_LOG` | Log levels per target `,=,=` | `DYN_LOG=info,dynamo_runtime::system_status_server:trace` | +| `DYN_LOGGING_CONFIG_PATH` | Path to custom TOML logging configuration file | `DYN_LOGGING_CONFIG_PATH=/path/to/config.toml`| + + +## Available Logging Levels + +| **Logging Levels (Least to Most Verbose)** | **Description** | +|-------------------------------------------|---------------------------------------------------------------------------------| +| **ERROR** | Critical errors (e.g., unrecoverable failures, resource exhaustion) | +| **WARN** | Unexpected or degraded situations (e.g., retries, recoverable errors) | +| **INFO** | Operational information (e.g., startup/shutdown, major events) | +| **DEBUG** | General debugging information (e.g., variable values, flow control) | +| **TRACE** | Very low-level, detailed information (e.g., internal algorithm steps) | + +## Example Readable Format + +Environment Setting: + +``` +export DYN_LOG="info,dynamo_runtime::system_status_server:trace" +export DYN_LOGGING_JSONL="false" +``` + +Resulting Log format: + +``` +2025-09-02T15:50:01.770028Z INFO main.init: VllmWorker for Qwen/Qwen3-0.6B has been initialized +2025-09-02T15:50:01.770195Z INFO main.init: Reading Events from tcp://127.0.0.1:21555 +2025-09-02T15:50:01.770265Z INFO main.init: Getting engine runtime configuration metadata from vLLM engine... +2025-09-02T15:50:01.770316Z INFO main.get_engine_cache_info: Cache config values: {'num_gpu_blocks': 24064} +2025-09-02T15:50:01.770358Z INFO main.get_engine_cache_info: Scheduler config values: {'max_num_seqs': 256, 'max_num_batched_tokens': 2048} +``` + +## Example JSONL Format + +Environment Setting: + +``` +export DYN_LOG="info,dynamo_runtime::system_status_server:trace" +export DYN_LOGGING_JSONL="true" +``` + +Resulting Log format: + +```json +{"time":"2025-09-02T15:53:31.943377Z","level":"INFO","target":"log","message":"VllmWorker for Qwen/Qwen3-0.6B has been initialized","log.file":"/opt/dynamo/venv/lib/python3.12/site-packages/dynamo/vllm/main.py","log.line":191,"log.target":"main.init"} +{"time":"2025-09-02T15:53:31.943550Z","level":"INFO","target":"log","message":"Reading Events from tcp://127.0.0.1:26771","log.file":"/opt/dynamo/venv/lib/python3.12/site-packages/dynamo/vllm/main.py","log.line":212,"log.target":"main.init"} +{"time":"2025-09-02T15:53:31.943636Z","level":"INFO","target":"log","message":"Getting engine runtime configuration metadata from vLLM engine...","log.file":"/opt/dynamo/venv/lib/python3.12/site-packages/dynamo/vllm/main.py","log.line":220,"log.target":"main.init"} +{"time":"2025-09-02T15:53:31.943701Z","level":"INFO","target":"log","message":"Cache config values: {'num_gpu_blocks': 24064}","log.file":"/opt/dynamo/venv/lib/python3.12/site-packages/dynamo/vllm/main.py","log.line":267,"log.target":"main.get_engine_cache_info"} +{"time":"2025-09-02T15:53:31.943747Z","level":"INFO","target":"log","message":"Scheduler config values: {'max_num_seqs': 256, 'max_num_batched_tokens': 2048}","log.file":"/opt/dynamo/venv/lib/python3.12/site-packages/dynamo/vllm/main.py","log.line":268,"log.target":"main.get_engine_cache_info"} +``` + +## Trace and Span information + +When `DYN_LOGGING_JSONL` is enabled with `DYN_LOG` set to greater than or equal to +`info` level trace information is added to all logged spans along with +`SPAN_CREATED` and `SPAN_CLOSED` events. + +### Example Request + +``` +curl -d '{"model": "Qwen/Qwen3-0.6B", "max_completion_tokens": 2049, "messages":[{"role":"user", "content": "What is the capital of South Africa?" }]}' -H 'Content-Type: application/json' http://localhost:8080/v1/chat/completions +``` + +### Example Logs + +``` +# Span Created in HTTP Frontend with trace_id and span_id + +{"time":"2025-09-02T16:38:06.656503Z","level":"INFO","file":"/workspace/lib/runtime/src/logging.rs","line":248,"target":"dynamo_runtime::logging","message":"SPAN_CREATED","method":"POST","span_id":"6959a1b2d1ee41a5","span_name":"http-request","trace_id":"425ef761ca5b44c795b4c912f1d84b39","uri":"/v1/chat/completions","version":"HTTP/1.1"} + +# Span Created in Worker with trace_id and parent_id from frontend and new span_id + +{"time":"2025-09-02T16:38:06.666672Z","level":"INFO","file":"/workspace/lib/runtime/src/pipeline/network/ingress/push_endpoint.rs","line":108,"target":"dynamo_runtime::pipeline::network::ingress::push_endpoint","message":"SPAN_CREATED","component":"backend","endpoint":"generate","instance_id":"7587888160958627596","namespace":"dynamo","parent_id":"6959a1b2d1ee41a5","span_id":"b035f33bdd5c4b50","span_name":"handle_payload","trace_id":"425ef761ca5b44c795b4c912f1d84b39"} +{"time":"2025-09-02T16:38:06.685333Z","level":"WARN","target":"log","message":"cudagraph dispatching keys are not initialized. No cudagraph will be used.","log.file":"/opt/vllm/vllm/v1/cudagraph_dispatcher.py","log.line":101,"log.target":"cudagraph_dispatcher.dispatch"} + +# Span Closed in Worker with duration, busy, and idle information + +{"time":"2025-09-02T16:38:08.787232Z","level":"INFO","file":"/workspace/lib/runtime/src/pipeline/network/ingress/push_endpoint.rs","line":108,"target":"dynamo_runtime::pipeline::network::ingress::push_endpoint","message":"SPAN_CLOSED","component":"backend","endpoint":"generate","instance_id":"7587888160958627596","namespace":"dynamo","parent_id":"6959a1b2d1ee41a5","span_id":"b035f33bdd5c4b50","span_name":"handle_payload","time.busy_us":1090,"time.duration_us":2121090,"time.idle_us":2120000,"trace_id":"425ef761ca5b44c795b4c912f1d84b39"} + +# Span Closed in HTTP Frontend with duration, busy and idle information + +{"time":"2025-09-02T16:38:08.788268Z","level":"INFO","file":"/workspace/lib/runtime/src/logging.rs","line":248,"target":"dynamo_runtime::logging","message":"SPAN_CLOSED","method":"POST","span_id":"6959a1b2d1ee41a5","span_name":"http-request","time.busy_us":13000,"time.duration_us":2133000,"time.idle_us":2120000,"trace_id":"425ef761ca5b44c795b4c912f1d84b39","uri":"/v1/chat/completions","version":"HTTP/1.1"} +``` + +### Example Request with User Supplied `x-request-id` + +``` +curl -d '{"model": "Qwen/Qwen3-0.6B", "max_completion_tokens": 2049, "messages":[{"role":"user", "content": "What is the capital of South Africa?" }]}' -H 'Content-Type: application/json' -H 'x-request-id: 8372eac7-5f43-4d76-beca-0a94cfb311d0' http://localhost:8080/v1/chat/completions +``` + +### Example Logs + +``` +# Span Created in HTTP Frontend with x-request-id, trace_id, and span_id + +{"time":"2025-09-02T17:01:46.306801Z","level":"INFO","file":"/workspace/lib/runtime/src/logging.rs","line":248,"target":"dynamo_runtime::logging","message":"SPAN_CREATED","method":"POST","span_id":"906902a4e74b4264","span_name":"http-request","trace_id":"3924188ea88d40febdfa173afd246a3a","uri":"/v1/chat/completions","version":"HTTP/1.1","x_request_id":"8372eac7-5f43-4d76-beca-0a94cfb311d0"} + +# Span Created in Worker with trace_id, parent_id and x_request_id from frontend and new span_id + +{"time":"2025-09-02T17:01:46.307484Z","level":"INFO","file":"/workspace/lib/runtime/src/pipeline/network/ingress/push_endpoint.rs","line":108,"target":"dynamo_runtime::pipeline::network::ingress::push_endpoint","message":"SPAN_CREATED","component":"backend","endpoint":"generate","instance_id":"7587888160958627596","namespace":"dynamo","parent_id":"906902a4e74b4264","span_id":"5a732a3721814f5e","span_name":"handle_payload","trace_id":"3924188ea88d40febdfa173afd246a3a","x_request_id":"8372eac7-5f43-4d76-beca-0a94cfb311d0"} + +# Span Closed in Worker with duration, busy, and idle information +{"time":"2025-09-02T17:01:47.975228Z","level":"INFO","file":"/workspace/lib/runtime/src/pipeline/network/ingress/push_endpoint.rs","line":108,"target":"dynamo_runtime::pipeline::network::ingress::push_endpoint","message":"SPAN_CLOSED","component":"backend","endpoint":"generate","instance_id":"7587888160958627596","namespace":"dynamo","parent_id":"906902a4e74b4264","span_id":"5a732a3721814f5e","span_name":"handle_payload","time.busy_us":646,"time.duration_us":1670646,"time.idle_us":1670000,"trace_id":"3924188ea88d40febdfa173afd246a3a","x_request_id":"8372eac7-5f43-4d76-beca-0a94cfb311d0"} + +# Span Closed in HTTP Frontend with duration, busy and idle information + +{"time":"2025-09-02T17:01:47.975616Z","level":"INFO","file":"/workspace/lib/runtime/src/logging.rs","line":248,"target":"dynamo_runtime::logging","message":"SPAN_CLOSED","method":"POST","span_id":"906902a4e74b4264","span_name":"http-request","time.busy_us":2980,"time.duration_us":1672980,"time.idle_us":1670000,"trace_id":"3924188ea88d40febdfa173afd246a3a","uri":"/v1/chat/completions","version":"HTTP/1.1","x_request_id":"8372eac7-5f43-4d76-beca-0a94cfb311d0"} + +``` + +## Related Documentation + +- [Distributed Runtime Architecture](../architecture/distributed_runtime.md) +- [Dynamo Architecture Overview](../architecture/architecture.md) +- [Backend Guide](backend.md) +- [Log Aggregation in Kubernetes](dynamo_deploy/logging.md) diff --git a/lib/llm/src/http/service/health.rs b/lib/llm/src/http/service/health.rs index c5825d0f87..75707d5577 100644 --- a/lib/llm/src/http/service/health.rs +++ b/lib/llm/src/http/service/health.rs @@ -65,27 +65,16 @@ async fn health_handler( vec![] }; - if model_entries.is_empty() { - ( - StatusCode::SERVICE_UNAVAILABLE, - Json(json!({ - "status": "unhealthy", - "message": "No endpoints available", - "instances": instances - })), - ) - } else { - let endpoints: Vec = model_entries - .iter() - .map(|entry| entry.endpoint_id.as_url()) - .collect(); - ( - StatusCode::OK, - Json(json!({ - "status": "healthy", - "endpoints": endpoints, - "instances": instances - })), - ) - } + let endpoints: Vec = model_entries + .iter() + .map(|entry| entry.endpoint_id.as_url()) + .collect(); + ( + StatusCode::OK, + Json(json!({ + "status": "healthy", + "endpoints": endpoints, + "instances": instances + })), + ) }