docs: health check and structured logs #2805
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,215 @@ | ||
| <!-- | ||
| SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| Licensed under the Apache License, Version 2.0 (the "License"); | ||
| you may not use this file except in compliance with the License. | ||
| You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, software | ||
| distributed under the License is distributed on an "AS IS" BASIS, | ||
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| See the License for the specific language governing permissions and | ||
| limitations under the License. | ||
| --> | ||
|
|
||
| # Dynamo Health Checks | ||
|
|
||
| ## Overview | ||
|
|
||
| Dynamo provides health check and liveness HTTP endpoints for each component, which | ||
| can be used to configure startup, liveness, and readiness probes in | ||
|
||
| orchestration frameworks such as Kubernetes. | ||
|
|
||
| ## Frontend Liveness Check | ||
|
|
||
| The frontend liveness endpoint reports a status of `live` as long as | ||
| the service is running. | ||
|
|
||
| > **Note**: Frontend liveness doesn't depend on worker health or liveness, only on the Frontend service itself. | ||
|
Contributor
"... doesn't depend on worker health or liveness only ..." => "... doesn't depend on worker health or liveness,(comma) only ..." |
||
|
|
||
| #### Example Request | ||
|
|
||
| ``` | ||
| curl -s localhost:8080/live -q | jq | ||
|
Contributor
I think someone (maybe @PeaBrane?) recently switched all the FE to be 8000. May want to check with Rudy. |
||
| ``` | ||
|
|
||
| #### Example Response | ||
|
|
||
| ``` | ||
| { | ||
| "message": "Service is live", | ||
| "status": "live" | ||
| } | ||
| ``` | ||
|
|
||
| ## Frontend Health Check | ||
|
|
||
| The frontend health endpoint reports a status of `healthy` once a | ||
| model has been registered. During initial startup the frontend | ||
| reports `unhealthy` with an HTTP status of `HTTP/1.1 503 Service Unavailable` | ||
| until workers have been initialized and registered | ||
| with the frontend. Once workers have been registered, the `health` | ||
| endpoint also lists the registered endpoints and instances and returns an HTTP status of `HTTP/1.1 200 OK`. | ||
|
||
|
|
||
|
||
| > **Note**: Frontend health depends only on endpoints (workers) being registered. It doesn't depend on worker health or liveness. | ||
|
||
|
|
||
| #### Example Request | ||
|
|
||
| ``` | ||
| curl -v localhost:8080/health -q | jq | ||
| ``` | ||
|
|
||
| #### Example Response | ||
|
|
||
| Before workers are registered: | ||
|
|
||
| ``` | ||
| HTTP/1.1 503 Service Unavailable | ||
| content-type: application/json | ||
| content-length: 72 | ||
| date: Wed, 03 Sep 2025 13:31:44 GMT | ||
|
|
||
| { | ||
| "instances": [], | ||
| "message": "No endpoints available", | ||
| "status": "unhealthy" | ||
| } | ||
| ``` | ||
|
|
||
| After workers are registered: | ||
|
|
||
| ``` | ||
| HTTP/1.1 200 OK | ||
|
||
| content-type: application/json | ||
| content-length: 609 | ||
| date: Wed, 03 Sep 2025 13:32:03 GMT | ||
|
|
||
| { | ||
| "endpoints": [ | ||
| "dyn://dynamo.backend.generate" | ||
| ], | ||
| "instances": [ | ||
| { | ||
| "component": "backend", | ||
| "endpoint": "clear_kv_blocks", | ||
| "instance_id": 7587888160958628000, | ||
| "namespace": "dynamo", | ||
| "transport": { | ||
| "nats_tcp": "dynamo_backend.clear_kv_blocks-694d98147d54be25" | ||
| } | ||
| }, | ||
| { | ||
| "component": "backend", | ||
| "endpoint": "generate", | ||
| "instance_id": 7587888160958628000, | ||
| "namespace": "dynamo", | ||
| "transport": { | ||
| "nats_tcp": "dynamo_backend.generate-694d98147d54be25" | ||
| } | ||
| }, | ||
| { | ||
| "component": "backend", | ||
| "endpoint": "load_metrics", | ||
| "instance_id": 7587888160958628000, | ||
| "namespace": "dynamo", | ||
| "transport": { | ||
| "nats_tcp": "dynamo_backend.load_metrics-694d98147d54be25" | ||
| } | ||
| } | ||
| ], | ||
| "status": "healthy" | ||
| } | ||
| ``` | ||
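The two frontend responses above can be told apart programmatically. The following is a minimal sketch, assuming the response bodies have exactly the shape shown in the examples; the function name `summarize_frontend_health` is hypothetical and not part of Dynamo.

```python
import json


def summarize_frontend_health(status_code: int, body: str) -> dict:
    """Summarize a frontend /health response.

    Assumption: the JSON body matches the example responses above
    ("status", optional "endpoints", "instances", optional "message").
    """
    payload = json.loads(body)
    return {
        # Healthy only when both the HTTP status and the body agree.
        "healthy": status_code == 200 and payload.get("status") == "healthy",
        "endpoints": payload.get("endpoints", []),
        "instance_count": len(payload.get("instances", [])),
        "message": payload.get("message"),
    }


if __name__ == "__main__":
    before = '{"instances": [], "message": "No endpoints available", "status": "unhealthy"}'
    print(summarize_frontend_health(503, before))
```

Checking both the status code and the `status` field is a defensive choice; either one alone may be enough in practice.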
|
||
|
|
||
| ## Worker Liveness and Health Check | ||
|
|
||
| Health checks for components other than the frontend are enabled | ||
| selectively through environment variables. When a health check is | ||
| enabled for a component, you can set its starting status and the set | ||
| of endpoints that must be served before the component is | ||
| declared `ready`. | ||
|
|
||
| Once all endpoints declared in `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` | ||
| are served, the component transitions to the `ready` state and stays there until the | ||
| component is shut down. The endpoints return `HTTP/1.1 503 Service Unavailable` | ||
| while initializing and `HTTP/1.1 200 OK` once ready. | ||
|
|
||
| > **Note**: Both `/live` and `/ready` return the same information. | ||
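The 503-then-200 transition described above lends itself to a simple polling loop, similar in spirit to a startup probe. This is a sketch only: `http_status` and `wait_until_ready` are hypothetical helper names, and the URL in the usage note is taken from the worker example in this document.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; a 503 during startup lands here.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def wait_until_ready(
    fetch: Callable[[], Optional[int]], attempts: int = 30, interval: float = 1.0
) -> bool:
    """Call fetch() until it returns 200 or the attempts are exhausted."""
    for attempt in range(attempts):
        if fetch() == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(interval)
    return False
```

For example, `wait_until_ready(lambda: http_status("http://localhost:9090/health"))` would block until the worker reports ready or roughly 30 seconds pass. Passing the fetch function in keeps the loop testable without a running server.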
|
|
||
| ### Environment Variables for Enabling Health Checks | ||
|
|
||
| | **Environment Variable** | **Description** | **Example Settings** | | ||
| | -------------------------| ------------------- | ------------------------------------------------ | | ||
| | `DYN_SYSTEM_ENABLED` | Enables the system status server. | `true`, `false` | | ||
|
||
| | `DYN_SYSTEM_PORT` | Specifies the port for the system status server. | `9090` | | ||
| | `DYN_SYSTEM_STARTING_HEALTH_STATUS` | Sets the initial health status of the system (ready/not ready). | `ready`, `notready` | | ||
| | `DYN_SYSTEM_HEALTH_PATH` | Custom path for the health endpoint. | `/custom/health` | | ||
| | `DYN_SYSTEM_LIVE_PATH` | Custom path for the liveness endpoint. | `/custom/live` | | ||
| | `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` | Specifies endpoints to check for determining overall system health status. | `["generate"]` | | ||
|
|
||
| ### Example Environment Setting | ||
|
|
||
| ``` | ||
| export DYN_SYSTEM_ENABLED="true" | ||
| export DYN_SYSTEM_STARTING_HEALTH_STATUS="notready" | ||
| export DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS="[\"generate\"]" | ||
| export DYN_SYSTEM_PORT=9090 | ||
| ``` | ||
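A tool that needs to know which endpoints gate readiness can read the same variables. The sketch below assumes that `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` holds a JSON list (as in the example setting) and that only the literal string `true` enables the server; `load_health_settings` is a hypothetical helper, not a Dynamo API.

```python
import json
import os
from typing import Mapping, Optional


def load_health_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the health check variables from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Assumption: the value is a JSON list such as ["generate"].
    endpoints = json.loads(env.get("DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS", "[]"))
    if not isinstance(endpoints, list):
        raise ValueError("DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS must be a JSON list")
    port = env.get("DYN_SYSTEM_PORT")
    return {
        # Assumption: only "true" (case-insensitive) counts as enabled.
        "enabled": env.get("DYN_SYSTEM_ENABLED", "false").strip().lower() == "true",
        "port": int(port) if port else None,
        "starting_status": env.get("DYN_SYSTEM_STARTING_HEALTH_STATUS"),
        "required_endpoints": endpoints,
    }
```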
|
|
||
| #### Example Request | ||
|
|
||
| ``` | ||
| curl -v localhost:9090/health | jq | ||
| ``` | ||
|
|
||
| #### Example Response | ||
| Before endpoints are being served: | ||
|
|
||
| ``` | ||
|
||
| HTTP/1.1 503 Service Unavailable | ||
| content-type: text/plain; charset=utf-8 | ||
| content-length: 96 | ||
| date: Wed, 03 Sep 2025 13:42:39 GMT | ||
|
|
||
| { | ||
| "endpoints": { | ||
| "generate": "notready" | ||
| }, | ||
| "status": "notready", | ||
| "uptime": { | ||
| "nanos": 313803539, | ||
| "secs": 12 | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| After endpoints are being served: | ||
|
|
||
| ``` | ||
| HTTP/1.1 200 OK | ||
| content-type: text/plain; charset=utf-8 | ||
| content-length: 139 | ||
| date: Wed, 03 Sep 2025 13:42:45 GMT | ||
|
|
||
| { | ||
| "endpoints": { | ||
| "clear_kv_blocks": "ready", | ||
| "generate": "ready", | ||
| "load_metrics": "ready" | ||
| }, | ||
| "status": "ready", | ||
| "uptime": { | ||
| "nanos": 356504530, | ||
| "secs": 18 | ||
| } | ||
| } | ||
|
||
| ``` | ||
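The worker payload carries per-endpoint state and an uptime split into seconds and nanoseconds. As a rough sketch, assuming the body shape shown in the two examples above (the helper name `summarize_worker_health` is hypothetical), it can be reduced to a readiness flag, the endpoints still pending, and a single uptime value:

```python
import json


def summarize_worker_health(body: str) -> dict:
    """Summarize a worker health payload shaped like the examples above."""
    payload = json.loads(body)
    endpoints = payload.get("endpoints", {})
    uptime = payload.get("uptime", {})
    return {
        "ready": payload.get("status") == "ready",
        # Endpoints that have not reached the "ready" state yet.
        "pending": sorted(name for name, state in endpoints.items() if state != "ready"),
        # Combine the "secs" and "nanos" fields into fractional seconds.
        "uptime_seconds": uptime.get("secs", 0) + uptime.get("nanos", 0) / 1e9,
    }
```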
|
|
||
| ## Related Documentation | ||
|
|
||
| - [Distributed Runtime Architecture](../architecture/distributed_runtime.md) | ||
| - [Dynamo Architecture Overview](../architecture/architecture.md) | ||
| - [Backend Guide](backend.md) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,139 @@ | ||
| <!-- | ||
| SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| Licensed under the Apache License, Version 2.0 (the "License"); | ||
| you may not use this file except in compliance with the License. | ||
| You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, software | ||
| distributed under the License is distributed on an "AS IS" BASIS, | ||
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| See the License for the specific language governing permissions and | ||
| limitations under the License. | ||
| --> | ||
|
|
||
| # Dynamo Logging | ||
|
|
||
| ## Overview | ||
|
|
||
| Dynamo provides structured logging in both readable text and JSONL formats. When | ||
| JSONL is enabled, logs additionally contain `span` creation and exit | ||
| events, along with `trace_id` and `span_id` fields for | ||
| distributed tracing. | ||
|
|
||
| ## Environment Variables for Configuring Logging | ||
|
|
||
| | Environment Variable | Description | Example Settings | | ||
| | ----------------------------------- | --------------------------------------------| ---------------------------------------------------- | | ||
| | `DYN_LOGGING_JSONL` | Enable JSONL logging format (default: READABLE) | `DYN_LOGGING_JSONL=true` | | ||
| | `DYN_LOG_USE_LOCAL_TZ` | Use local timezone for logging timestamps (default: UTC) | `DYN_LOG_USE_LOCAL_TZ=1` | | ||
| | `DYN_LOG` | Log levels per target (comma-separated key-value pairs) | `DYN_LOG=info,dynamo_runtime::system_status_server:trace` | | ||
|
||
| | `DYN_LOGGING_CONFIG_PATH` | Path to custom TOML logging configuration file | `DYN_LOGGING_CONFIG_PATH=/path/to/config.toml`| | ||
|
|
||
|
|
||
| ## Example Readable Format | ||
|
|
||
| Environment Setting: | ||
|
|
||
| ``` | ||
| export DYN_LOG="info,dynamo_runtime::system_status_server:trace" | ||
| export DYN_LOGGING_JSONL="false" | ||
| ``` | ||
|
|
||
| Resulting Log format: | ||
|
|
||
| ``` | ||
| 2025-09-02T15:50:01.770028Z INFO main.init: VllmWorker for Qwen/Qwen3-0.6B has been initialized | ||
| 2025-09-02T15:50:01.770195Z INFO main.init: Reading Events from tcp://127.0.0.1:21555 | ||
| 2025-09-02T15:50:01.770265Z INFO main.init: Getting engine runtime configuration metadata from vLLM engine... | ||
| 2025-09-02T15:50:01.770316Z INFO main.get_engine_cache_info: Cache config values: {'num_gpu_blocks': 24064} | ||
| 2025-09-02T15:50:01.770358Z INFO main.get_engine_cache_info: Scheduler config values: {'max_num_seqs': 256, 'max_num_batched_tokens': 2048} | ||
| ``` | ||
|
|
||
| ## Example JSONL Format | ||
|
|
||
| Environment Setting: | ||
|
|
||
| ``` | ||
| export DYN_LOG="info,dynamo_runtime::system_status_server:trace" | ||
| export DYN_LOGGING_JSONL="true" | ||
| ``` | ||
|
|
||
| Resulting Log format: | ||
|
|
||
| ```json | ||
| {"time":"2025-09-02T15:53:31.943377Z","level":"INFO","target":"log","message":"VllmWorker for Qwen/Qwen3-0.6B has been initialized","log.file":"/opt/dynamo/venv/lib/python3.12/site-packages/dynamo/vllm/main.py","log.line":191,"log.target":"main.init"} | ||
| {"time":"2025-09-02T15:53:31.943550Z","level":"INFO","target":"log","message":"Reading Events from tcp://127.0.0.1:26771","log.file":"/opt/dynamo/venv/lib/python3.12/site-packages/dynamo/vllm/main.py","log.line":212,"log.target":"main.init"} | ||
| {"time":"2025-09-02T15:53:31.943636Z","level":"INFO","target":"log","message":"Getting engine runtime configuration metadata from vLLM engine...","log.file":"/opt/dynamo/venv/lib/python3.12/site-packages/dynamo/vllm/main.py","log.line":220,"log.target":"main.init"} | ||
| {"time":"2025-09-02T15:53:31.943701Z","level":"INFO","target":"log","message":"Cache config values: {'num_gpu_blocks': 24064}","log.file":"/opt/dynamo/venv/lib/python3.12/site-packages/dynamo/vllm/main.py","log.line":267,"log.target":"main.get_engine_cache_info"} | ||
| {"time":"2025-09-02T15:53:31.943747Z","level":"INFO","target":"log","message":"Scheduler config values: {'max_num_seqs': 256, 'max_num_batched_tokens': 2048}","log.file":"/opt/dynamo/venv/lib/python3.12/site-packages/dynamo/vllm/main.py","log.line":268,"log.target":"main.get_engine_cache_info"} | ||
| ``` | ||
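Because each JSONL record is one JSON object per line, the output can be processed with a few lines of code. The sketch below assumes the `level` field uses the names seen in the examples (`INFO`, `WARN`) plus the usual `TRACE`, `DEBUG`, and `ERROR`; that ordering and the helper names `parse_jsonl` and `at_or_above` are assumptions, not something the source defines.

```python
import json
from typing import Iterable, Iterator, List

# Assumed severity order, least to most severe.
LEVEL_ORDER = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR"]


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per valid JSON object line, skipping anything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. readable-format lines mixed into the stream
        if isinstance(record, dict):
            yield record


def at_or_above(records: Iterable[dict], minimum: str = "WARN") -> List[dict]:
    """Keep records whose level is at least as severe as minimum."""
    threshold = LEVEL_ORDER.index(minimum)
    return [
        r for r in records
        if r.get("level") in LEVEL_ORDER and LEVEL_ORDER.index(r["level"]) >= threshold
    ]
```

Skipping unparsable lines instead of raising is a deliberate choice here, since container logs often interleave output from several sources.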
|
|
||
| ## Trace and Span information | ||
|
|
||
| When `DYN_LOGGING_JSONL` is enabled and `DYN_LOG` is set to at least the | ||
| `info` level, trace information is added to all logged spans, along with | ||
| `SPAN_CREATED` and `SPAN_CLOSED` events. | ||
|
|
||
| ### Example Request | ||
|
|
||
| ``` | ||
| curl -d '{"model": "Qwen/Qwen3-0.6B", "max_completion_tokens": 2049, "messages":[{"role":"user", "content": "What is the capital of South Africa?" }]}' -H 'Content-Type: application/json' http://localhost:8080/v1/chat/completions | ||
| ``` | ||
|
|
||
| ### Example Logs | ||
|
|
||
| ``` | ||
| # Span Created in HTTP Frontend with trace_id and span_id | ||
|
|
||
| {"time":"2025-09-02T16:38:06.656503Z","level":"INFO","file":"/workspace/lib/runtime/src/logging.rs","line":248,"target":"dynamo_runtime::logging","message":"SPAN_CREATED","method":"POST","span_id":"6959a1b2d1ee41a5","span_name":"http-request","trace_id":"425ef761ca5b44c795b4c912f1d84b39","uri":"/v1/chat/completions","version":"HTTP/1.1"} | ||
|
|
||
| # Span Created in Worker with trace_id and parent_id from frontend and new span_id | ||
|
|
||
| {"time":"2025-09-02T16:38:06.666672Z","level":"INFO","file":"/workspace/lib/runtime/src/pipeline/network/ingress/push_endpoint.rs","line":108,"target":"dynamo_runtime::pipeline::network::ingress::push_endpoint","message":"SPAN_CREATED","component":"backend","endpoint":"generate","instance_id":"7587888160958627596","namespace":"dynamo","parent_id":"6959a1b2d1ee41a5","span_id":"b035f33bdd5c4b50","span_name":"handle_payload","trace_id":"425ef761ca5b44c795b4c912f1d84b39"} | ||
| {"time":"2025-09-02T16:38:06.685333Z","level":"WARN","target":"log","message":"cudagraph dispatching keys are not initialized. No cudagraph will be used.","log.file":"/opt/vllm/vllm/v1/cudagraph_dispatcher.py","log.line":101,"log.target":"cudagraph_dispatcher.dispatch"} | ||
|
|
||
| # Span Closed in Worker with duration, busy, and idle information | ||
|
|
||
| {"time":"2025-09-02T16:38:08.787232Z","level":"INFO","file":"/workspace/lib/runtime/src/pipeline/network/ingress/push_endpoint.rs","line":108,"target":"dynamo_runtime::pipeline::network::ingress::push_endpoint","message":"SPAN_CLOSED","component":"backend","endpoint":"generate","instance_id":"7587888160958627596","namespace":"dynamo","parent_id":"6959a1b2d1ee41a5","span_id":"b035f33bdd5c4b50","span_name":"handle_payload","time.busy_us":1090,"time.duration_us":2121090,"time.idle_us":2120000,"trace_id":"425ef761ca5b44c795b4c912f1d84b39"} | ||
|
|
||
| # Span Closed in HTTP Frontend with duration, busy and idle information | ||
|
|
||
| {"time":"2025-09-02T16:38:08.788268Z","level":"INFO","file":"/workspace/lib/runtime/src/logging.rs","line":248,"target":"dynamo_runtime::logging","message":"SPAN_CLOSED","method":"POST","span_id":"6959a1b2d1ee41a5","span_name":"http-request","time.busy_us":13000,"time.duration_us":2133000,"time.idle_us":2120000,"trace_id":"425ef761ca5b44c795b4c912f1d84b39","uri":"/v1/chat/completions","version":"HTTP/1.1"} | ||
| ``` | ||
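The `trace_id`, `span_id`, and `parent_id` fields in the logs above make it possible to reassemble a request across the frontend and worker. A minimal sketch, assuming the field names shown in the example records (`closed_spans_by_trace` is a hypothetical helper):

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def closed_spans_by_trace(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group SPAN_CLOSED records by trace_id, with durations in milliseconds."""
    traces = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and "# ..." annotations
        record = json.loads(line)
        if record.get("message") != "SPAN_CLOSED" or "trace_id" not in record:
            continue
        traces[record["trace_id"]].append({
            "span_name": record.get("span_name"),
            "span_id": record.get("span_id"),
            "parent_id": record.get("parent_id"),  # None for the root span
            "duration_ms": record.get("time.duration_us", 0) / 1000,
        })
    return dict(traces)
```

Applied to the example logs, each trace would contain the worker's `handle_payload` span and the frontend's `http-request` span, with the worker span's `parent_id` equal to the frontend `span_id`.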
|
|
||
| ### Example Request with User Supplied `x-request-id` | ||
|
|
||
| ``` | ||
| curl -d '{"model": "Qwen/Qwen3-0.6B", "max_completion_tokens": 2049, "messages":[{"role":"user", "content": "What is the capital of South Africa?" }]}' -H 'Content-Type: application/json' -H 'x-request-id: 8372eac7-5f43-4d76-beca-0a94cfb311d0' http://localhost:8080/v1/chat/completions | ||
| ``` | ||
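The same request can be built from a client that generates its own request ID, which is then easy to search for in the logs. This is a sketch under assumptions: `build_chat_request` is a hypothetical helper, the base URL and body fields are copied from the curl example above, and using a UUID simply mirrors the ID format in that example.

```python
import json
import urllib.request
import uuid
from typing import Optional, Tuple


def build_chat_request(
    prompt: str,
    request_id: Optional[str] = None,
    base_url: str = "http://localhost:8080",  # taken from the curl example
) -> Tuple[urllib.request.Request, str]:
    """Build (but do not send) a chat completion request carrying x-request-id."""
    request_id = request_id or str(uuid.uuid4())
    body = {
        "model": "Qwen/Qwen3-0.6B",
        "max_completion_tokens": 2049,
        "messages": [{"role": "user", "content": prompt}],
    }
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-request-id": request_id},
        method="POST",
    )
    return request, request_id
```

Sending it with `urllib.request.urlopen(request)` and then filtering the logs on the returned `request_id` ties a client call to its spans.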
|
|
||
| ### Example Logs | ||
|
|
||
| ``` | ||
| # Span Created in HTTP Frontend with x-request-id, trace_id, and span_id | ||
|
|
||
|
||
| {"time":"2025-09-02T17:01:46.306801Z","level":"INFO","file":"/workspace/lib/runtime/src/logging.rs","line":248,"target":"dynamo_runtime::logging","message":"SPAN_CREATED","method":"POST","span_id":"906902a4e74b4264","span_name":"http-request","trace_id":"3924188ea88d40febdfa173afd246a3a","uri":"/v1/chat/completions","version":"HTTP/1.1","x_request_id":"8372eac7-5f43-4d76-beca-0a94cfb311d0"} | ||
|
|
||
| # Span Created in Worker with trace_id, parent_id and x_request_id from frontend and new span_id | ||
|
|
||
| {"time":"2025-09-02T17:01:46.307484Z","level":"INFO","file":"/workspace/lib/runtime/src/pipeline/network/ingress/push_endpoint.rs","line":108,"target":"dynamo_runtime::pipeline::network::ingress::push_endpoint","message":"SPAN_CREATED","component":"backend","endpoint":"generate","instance_id":"7587888160958627596","namespace":"dynamo","parent_id":"906902a4e74b4264","span_id":"5a732a3721814f5e","span_name":"handle_payload","trace_id":"3924188ea88d40febdfa173afd246a3a","x_request_id":"8372eac7-5f43-4d76-beca-0a94cfb311d0"} | ||
|
|
||
| # Span Closed in Worker with duration, busy, and idle information | ||
| {"time":"2025-09-02T17:01:47.975228Z","level":"INFO","file":"/workspace/lib/runtime/src/pipeline/network/ingress/push_endpoint.rs","line":108,"target":"dynamo_runtime::pipeline::network::ingress::push_endpoint","message":"SPAN_CLOSED","component":"backend","endpoint":"generate","instance_id":"7587888160958627596","namespace":"dynamo","parent_id":"906902a4e74b4264","span_id":"5a732a3721814f5e","span_name":"handle_payload","time.busy_us":646,"time.duration_us":1670646,"time.idle_us":1670000,"trace_id":"3924188ea88d40febdfa173afd246a3a","x_request_id":"8372eac7-5f43-4d76-beca-0a94cfb311d0"} | ||
|
|
||
| # Span Closed in HTTP Frontend with duration, busy and idle information | ||
|
|
||
| {"time":"2025-09-02T17:01:47.975616Z","level":"INFO","file":"/workspace/lib/runtime/src/logging.rs","line":248,"target":"dynamo_runtime::logging","message":"SPAN_CLOSED","method":"POST","span_id":"906902a4e74b4264","span_name":"http-request","time.busy_us":2980,"time.duration_us":1672980,"time.idle_us":1670000,"trace_id":"3924188ea88d40febdfa173afd246a3a","uri":"/v1/chat/completions","version":"HTTP/1.1","x_request_id":"8372eac7-5f43-4d76-beca-0a94cfb311d0"} | ||
|
|
||
| ``` | ||
|
|
||
| ## Related Documentation | ||
|
|
||
| - [Distributed Runtime Architecture](../architecture/distributed_runtime.md) | ||
| - [Dynamo Architecture Overview](../architecture/architecture.md) | ||
| - [Backend Guide](backend.md) | ||
| - [Log Aggregation in Kubernetes](dynamo_deploy/logging.md) | ||