docs: health check and structured logs #2805
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,188 @@ | ||
| <!-- | ||
| SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| Licensed under the Apache License, Version 2.0 (the "License"); | ||
| you may not use this file except in compliance with the License. | ||
| You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, software | ||
| distributed under the License is distributed on an "AS IS" BASIS, | ||
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| See the License for the specific language governing permissions and | ||
| limitations under the License. | ||
| --> | ||
|
|
||
| # Dynamo Health Checks | ||
|
|
||
| ## Overview | ||
|
|
||
| Dynamo provides health check and liveness HTTP endpoints for each component. These | ||
| endpoints can be used to configure startup, liveness, and readiness probes in | ||
| orchestration frameworks such as Kubernetes. | ||
|
|
||
| ## Frontend Liveness Check | ||
|
|
||
| The frontend liveness endpoint reports a status of `live` as long as | ||
| the service is running. | ||
|
|
||
| #### Example Request | ||
|
|
||
| ``` | ||
| curl -s localhost:8080/live -q | jq | ||
| ``` | ||
|
|
||
| #### Example Response | ||
|
|
||
| ``` | ||
| { | ||
| "message": "Service is live", | ||
| "status": "live" | ||
| } | ||
| ``` | ||
|
|
||
| ## Frontend Health Check | ||
|
|
||
| The frontend health endpoint reports a status of `healthy` once a | ||
| model has been registered. During initial startup, the frontend | ||
| reports `unhealthy` until workers have been initialized and registered | ||
| with the frontend. Once workers have been registered, the `/health` | ||
| endpoint also lists the registered endpoints and instances. | ||
|
|
||
| #### Example Request | ||
|
|
||
| ``` | ||
| curl -s localhost:8080/health -q | jq | ||
| ``` | ||
|
|
||
| #### Example Response | ||
|
|
||
| Before workers are registered: | ||
|
|
||
| ``` | ||
| { | ||
| "instances": [], | ||
| "message": "No endpoints available", | ||
| "status": "unhealthy" | ||
| } | ||
| ``` | ||
|
|
||
| After workers are registered: | ||
|
|
||
| ``` | ||
| { | ||
| "endpoints": [ | ||
| "dyn://dynamo.backend.generate" | ||
| ], | ||
| "instances": [ | ||
| { | ||
| "component": "backend", | ||
| "endpoint": "clear_kv_blocks", | ||
| "instance_id": 7587888160958628000, | ||
| "namespace": "dynamo", | ||
| "transport": { | ||
| "nats_tcp": "dynamo_backend.clear_kv_blocks-694d98147d54be25" | ||
| } | ||
| }, | ||
| { | ||
| "component": "backend", | ||
| "endpoint": "generate", | ||
| "instance_id": 7587888160958628000, | ||
| "namespace": "dynamo", | ||
| "transport": { | ||
| "nats_tcp": "dynamo_backend.generate-694d98147d54be25" | ||
| } | ||
| }, | ||
| { | ||
| "component": "backend", | ||
| "endpoint": "load_metrics", | ||
| "instance_id": 7587888160958628000, | ||
| "namespace": "dynamo", | ||
| "transport": { | ||
| "nats_tcp": "dynamo_backend.load_metrics-694d98147d54be25" | ||
| } | ||
| } | ||
| ], | ||
| "status": "healthy" | ||
| } | ||
| ``` | ||
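The response above can be condensed programmatically. The following is a minimal sketch, assuming the response shape shown in this section; the function name `summarize_frontend_health` and the abbreviated `sample` payload are illustrative and not part of Dynamo.

```python
from collections import defaultdict


def summarize_frontend_health(payload: dict) -> dict:
    """Summarize a parsed frontend /health response body.

    Groups the endpoints of each registered instance under
    "<namespace>.<component>" and reports whether status is "healthy".
    """
    endpoints_by_component = defaultdict(set)
    for inst in payload.get("instances", []):
        key = f'{inst["namespace"]}.{inst["component"]}'
        endpoints_by_component[key].add(inst["endpoint"])
    return {
        "healthy": payload.get("status") == "healthy",
        "components": {k: sorted(v) for k, v in endpoints_by_component.items()},
    }


# Abbreviated copy of the "after workers are registered" response
# (instance_id and transport fields omitted for brevity).
sample = {
    "endpoints": ["dyn://dynamo.backend.generate"],
    "instances": [
        {"component": "backend", "endpoint": "clear_kv_blocks", "namespace": "dynamo"},
        {"component": "backend", "endpoint": "generate", "namespace": "dynamo"},
        {"component": "backend", "endpoint": "load_metrics", "namespace": "dynamo"},
    ],
    "status": "healthy",
}

summary = summarize_frontend_health(sample)
print(summary)
```

The same function returns an empty `components` mapping for the "before workers are registered" response, because its `instances` list is empty.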
|
|
||
| ## Worker Liveness and Health Check | ||
|
|
||
| Health checks for components other than the frontend are enabled | ||
| selectively through environment variables. When a component's health | ||
| check is enabled, you can set its starting status and the set of | ||
| endpoints that must be served before the component is declared | ||
| `ready`. | ||
|
|
||
||
| Once all endpoints declared in `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` | ||
| are served, the component transitions to the `ready` state and stays | ||
| there until the component is shut down. | ||
|
|
||
| > **Note**: Both /live and /ready return the same information | ||
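The transition described above can be pictured with a small model. This is a simplified sketch of the rule as stated in this section, not Dynamo's implementation; `component_status` and its parameters are hypothetical names, and the behavior when no endpoints are declared is an assumption.

```python
def component_status(required, served, starting_status="notready"):
    """Toy model of the readiness rule described above.

    required: endpoint names declared in DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS
    served:   endpoint names the component is currently serving
    Returns "ready" once every required endpoint is served; until then
    (or if nothing is declared, which is an assumption) returns the
    starting status.
    """
    if required and all(name in served for name in required):
        return "ready"
    return starting_status


print(component_status(["generate"], set()))
print(component_status(["generate"], {"generate", "load_metrics"}))
```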
|
|
||
| ### Environment Variables for Enabling Health Checks | ||
|
|
||
| | **Environment Variable** | **Description** | **Example Settings** | | ||
| | -------------------------| ------------------- | ------------------------------------------------ | | ||
| | `DYN_SYSTEM_ENABLED` | Enables the system status server. | `true`, `false` | | ||
| | `DYN_SYSTEM_PORT` | Specifies the port for the system status server. | `9090` | | ||
| | `DYN_SYSTEM_STARTING_HEALTH_STATUS` | Sets the initial health status of the system (ready/not ready). | `ready`, `notready` | | ||
| | `DYN_SYSTEM_HEALTH_PATH` | Custom path for the health endpoint. | `/custom/health` | | ||
| | `DYN_SYSTEM_LIVE_PATH` | Custom path for the liveness endpoint. | `/custom/live` | | ||
| | `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` | Specifies endpoints to check for determining overall system health status. | `["generate"]` | | ||
|
|
||
| ### Example Environment Setting | ||
|
|
||
| ``` | ||
| export DYN_SYSTEM_ENABLED="true" | ||
| export DYN_SYSTEM_STARTING_HEALTH_STATUS="notready" | ||
| export DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS="[\"generate\"]" | ||
| export DYN_SYSTEM_PORT=9090 | ||
| ``` | ||
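Note that after shell unescaping, `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` holds the JSON text `["generate"]`. The sketch below shows one way a script could read these settings back, for example to decide which port to probe. It assumes only the variable names and example values from the table above; `load_health_settings` is a hypothetical helper, and it does not guess Dynamo's defaults (missing values are returned as `None` or an empty list).

```python
import json


def load_health_settings(env) -> dict:
    """Read the health-check variables from a mapping such as os.environ."""
    raw_port = env.get("DYN_SYSTEM_PORT")
    raw_endpoints = env.get("DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS")
    return {
        # Only the literal "true" from the table is treated as enabled (assumption).
        "enabled": env.get("DYN_SYSTEM_ENABLED") == "true",
        "port": int(raw_port) if raw_port is not None else None,
        "starting_status": env.get("DYN_SYSTEM_STARTING_HEALTH_STATUS"),
        # The value is a JSON list encoded as a string.
        "required_endpoints": json.loads(raw_endpoints) if raw_endpoints else [],
    }


# Same values as the export statements above.
example_env = {
    "DYN_SYSTEM_ENABLED": "true",
    "DYN_SYSTEM_STARTING_HEALTH_STATUS": "notready",
    "DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS": '["generate"]',
    "DYN_SYSTEM_PORT": "9090",
}

settings = load_health_settings(example_env)
print(settings)
```

In a real script, pass `os.environ` instead of `example_env`.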
|
|
||
| #### Example Request | ||
|
|
||
| ``` | ||
| curl -s localhost:9090/health -q | jq | ||
| ``` | ||
|
|
||
||
| #### Example Response | ||
|
|
||
||
| Before endpoints are served: | ||
|
|
||
||
| ``` | ||
| { | ||
| "endpoints": { | ||
| "generate": "notready" | ||
| }, | ||
| "status": "notready", | ||
| "uptime": { | ||
| "nanos": 775381996, | ||
| "secs": 2 | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| After endpoints are being served: | ||
|
|
||
| ``` | ||
| { | ||
| "endpoints": { | ||
| "clear_kv_blocks": "ready", | ||
| "generate": "ready", | ||
| "load_metrics": "ready" | ||
| }, | ||
| "status": "ready", | ||
| "uptime": { | ||
| "nanos": 435707697, | ||
| "secs": 55 | ||
| } | ||
| } | ||
| ``` | ||
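A script that waits for a worker to become ready could poll this endpoint. The following is a minimal sketch using only the Python standard library. The helper names (`fetch_health`, `wait_until_ready`, `uptime_seconds`) are illustrative, and the handling of non-2xx replies is an assumption, since this section does not state which HTTP status code accompanies `notready`.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_health(url: str) -> dict:
    """GET the health endpoint and parse the JSON body."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Assumption: a non-2xx reply may still carry the JSON status body.
        return json.load(err)


def uptime_seconds(payload: dict) -> float:
    """Combine the "secs" and "nanos" uptime fields into seconds."""
    up = payload["uptime"]
    return up["secs"] + up["nanos"] / 1e9


def wait_until_ready(fetch, attempts=30, interval_s=1.0, sleep=time.sleep):
    """Call fetch() until it reports status "ready"; return that payload or None."""
    for _ in range(attempts):
        payload = fetch()
        if payload.get("status") == "ready":
            return payload
        sleep(interval_s)
    return None


# Demonstration with the two example responses above instead of a live server.
canned = iter([
    {"endpoints": {"generate": "notready"}, "status": "notready",
     "uptime": {"nanos": 775381996, "secs": 2}},
    {"endpoints": {"clear_kv_blocks": "ready", "generate": "ready", "load_metrics": "ready"},
     "status": "ready", "uptime": {"nanos": 435707697, "secs": 55}},
])
result = wait_until_ready(lambda: next(canned), sleep=lambda _seconds: None)
print(result["status"], uptime_seconds(result))
```

Against a running worker, the call would look like `wait_until_ready(lambda: fetch_health("http://localhost:9090/health"))`.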
|
|
||
| ## Related Documentation | ||
|
|
||
| - [Distributed Runtime Architecture](../architecture/distributed_runtime.md) | ||
| - [Dynamo Architecture Overview](../architecture/architecture.md) | ||
| - [Backend Guide](backend.md) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,134 @@ | ||
| <!-- | ||
| SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| Licensed under the Apache License, Version 2.0 (the "License"); | ||
| you may not use this file except in compliance with the License. | ||
| You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, software | ||
| distributed under the License is distributed on an "AS IS" BASIS, | ||
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| See the License for the specific language governing permissions and | ||
| limitations under the License. | ||
| --> | ||
|
|
||
| # Dynamo Logging | ||
|
|
||
| ## Overview | ||
|
|
||
| Dynamo provides structured logging in two formats: readable text and JSONL. When | ||
| JSONL is enabled, logs additionally contain `span` creation and exit | ||
| events, and support `trace_id` and `span_id` fields for | ||
| distributed tracing. | ||
|
|
||
| ## Environment Variables for Configuring Logging | ||
|
|
||
| | Environment Variable | Description | Example Settings | | ||
| | ----------------------------------- | --------------------------------------------| ---------------------------------------------------- | | ||
| | `DYN_LOGGING_JSONL` | Enable JSONL logging format (default: READABLE) | `DYN_LOGGING_JSONL=1` | | ||
| | `DYN_LOG_USE_LOCAL_TZ` | Use local timezone for logging timestamps (default: UTC) | `DYN_LOG_USE_LOCAL_TZ=1` | | ||
| | `DYN_LOG` | Log levels per target (comma-separated key-value pairs) | `DYN_LOG=info,dynamo_runtime::system_status_server:trace` | | ||
| | `DYN_LOGGING_CONFIG_PATH` | Path to custom TOML logging configuration file | `DYN_LOGGING_CONFIG_PATH=/path/to/config.toml`| | ||
|
|
||
|
|
||
| ## Example Readable Format | ||
|
|
||
| Environment Setting: | ||
|
|
||
| ``` | ||
| export DYN_LOG="info,dynamo_runtime::system_status_server:trace" | ||
| export DYN_LOGGING_JSONL="0" | ||
| ``` | ||
|
|
||
| Resulting Log format: | ||
|
|
||
| ``` | ||
| 2025-09-02T15:50:01.770028Z INFO main.init: VllmWorker for Qwen/Qwen3-0.6B has been initialized | ||
| 2025-09-02T15:50:01.770195Z INFO main.init: Reading Events from tcp://127.0.0.1:21555 | ||
| 2025-09-02T15:50:01.770265Z INFO main.init: Getting engine runtime configuration metadata from vLLM engine... | ||
| 2025-09-02T15:50:01.770316Z INFO main.get_engine_cache_info: Cache config values: {'num_gpu_blocks': 24064} | ||
| 2025-09-02T15:50:01.770358Z INFO main.get_engine_cache_info: Scheduler config values: {'max_num_seqs': 256, 'max_num_batched_tokens': 2048} | ||
| ``` | ||
|
|
||
| ## Example JSONL Format | ||
|
|
||
| Environment Setting: | ||
|
|
||
| ``` | ||
| export DYN_LOG="info,dynamo_runtime::system_status_server:trace" | ||
| export DYN_LOGGING_JSONL="1" | ||
| ``` | ||
|
|
||
| Resulting Log format: | ||
|
|
||
| ```json | ||
| {"time":"2025-09-02T15:53:31.943377Z","level":"INFO","target":"log","message":"VllmWorker for Qwen/Qwen3-0.6B has been initialized","log.file":"/opt/dynamo/venv/lib/python3.12/site-packages/dynamo/vllm/main.py","log.line":191,"log.target":"main.init"} | ||
| {"time":"2025-09-02T15:53:31.943550Z","level":"INFO","target":"log","message":"Reading Events from tcp://127.0.0.1:26771","log.file":"/opt/dynamo/venv/lib/python3.12/site-packages/dynamo/vllm/main.py","log.line":212,"log.target":"main.init"} | ||
| {"time":"2025-09-02T15:53:31.943636Z","level":"INFO","target":"log","message":"Getting engine runtime configuration metadata from vLLM engine...","log.file":"/opt/dynamo/venv/lib/python3.12/site-packages/dynamo/vllm/main.py","log.line":220,"log.target":"main.init"} | ||
| {"time":"2025-09-02T15:53:31.943701Z","level":"INFO","target":"log","message":"Cache config values: {'num_gpu_blocks': 24064}","log.file":"/opt/dynamo/venv/lib/python3.12/site-packages/dynamo/vllm/main.py","log.line":267,"log.target":"main.get_engine_cache_info"} | ||
| {"time":"2025-09-02T15:53:31.943747Z","level":"INFO","target":"log","message":"Scheduler config values: {'max_num_seqs': 256, 'max_num_batched_tokens': 2048}","log.file":"/opt/dynamo/venv/lib/python3.12/site-packages/dynamo/vllm/main.py","log.line":268,"log.target":"main.get_engine_cache_info"} | ||
| ``` | ||
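Because each JSONL line is a standalone JSON object, the logs can be processed with any JSON parser. The sketch below assumes only the field names visible in the example output above (`level`, `message`, `log.target`, `log.line`); `parse_jsonl` is a hypothetical helper, and skipping non-JSON lines is a design choice for mixed output streams.

```python
import json


def parse_jsonl(lines):
    """Yield one dict per JSONL line, skipping blank or non-JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield record


# First two lines of the example output above, plus one non-JSON line.
sample_lines = [
    '{"time":"2025-09-02T15:53:31.943377Z","level":"INFO","target":"log","message":"VllmWorker for Qwen/Qwen3-0.6B has been initialized","log.file":"/opt/dynamo/venv/lib/python3.12/site-packages/dynamo/vllm/main.py","log.line":191,"log.target":"main.init"}',
    '{"time":"2025-09-02T15:53:31.943550Z","level":"INFO","target":"log","message":"Reading Events from tcp://127.0.0.1:26771","log.file":"/opt/dynamo/venv/lib/python3.12/site-packages/dynamo/vllm/main.py","log.line":212,"log.target":"main.init"}',
    "a plain text line that is not JSON",
]

records = list(parse_jsonl(sample_lines))
for rec in records:
    print(rec["time"], rec["level"], rec["log.target"], rec["message"])
```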
|
|
||
| ## Trace and Span information | ||
|
|
||
| When `DYN_LOGGING_JSONL` is enabled and `DYN_LOG` is set to the `info` | ||
| level or a more verbose level, trace information is added to all spans, along with | ||
| `SPAN_CREATED` and `SPAN_CLOSED` events. | ||
|
|
||
| ### Example Request | ||
|
|
||
| ``` | ||
| curl -d '{"model": "Qwen/Qwen3-0.6B", "max_completion_tokens": 2049, "messages":[{"role":"user", "content": "What is the capital of South Africa?" }]}' -H 'Content-Type: application/json' http://localhost:8080/v1/chat/completions | ||
| ``` | ||
|
|
||
| ### Example Logs | ||
|
|
||
| ``` | ||
| # Span Created in HTTP Frontend | ||
|
|
||
| {"time":"2025-09-02T16:38:06.656503Z","level":"INFO","file":"/workspace/lib/runtime/src/logging.rs","line":248,"target":"dynamo_runtime::logging","message":"SPAN_CREATED","method":"POST","span_id":"6959a1b2d1ee41a5","span_name":"http-request","trace_id":"425ef761ca5b44c795b4c912f1d84b39","uri":"/v1/chat/completions","version":"HTTP/1.1"} | ||
|
|
||
| # Span Created and Closed in Worker with parent_id from frontend | ||
|
|
||
| {"time":"2025-09-02T16:38:06.666672Z","level":"INFO","file":"/workspace/lib/runtime/src/pipeline/network/ingress/push_endpoint.rs","line":108,"target":"dynamo_runtime::pipeline::network::ingress::push_endpoint","message":"SPAN_CREATED","component":"backend","endpoint":"generate","instance_id":"7587888160958627596","namespace":"dynamo","parent_id":"6959a1b2d1ee41a5","span_id":"b035f33bdd5c4b50","span_name":"handle_payload","trace_id":"425ef761ca5b44c795b4c912f1d84b39"} | ||
| {"time":"2025-09-02T16:38:06.685333Z","level":"WARN","target":"log","message":"cudagraph dispatching keys are not initialized. No cudagraph will be used.","log.file":"/opt/vllm/vllm/v1/cudagraph_dispatcher.py","log.line":101,"log.target":"cudagraph_dispatcher.dispatch"} | ||
| {"time":"2025-09-02T16:38:08.787232Z","level":"INFO","file":"/workspace/lib/runtime/src/pipeline/network/ingress/push_endpoint.rs","line":108,"target":"dynamo_runtime::pipeline::network::ingress::push_endpoint","message":"SPAN_CLOSED","component":"backend","endpoint":"generate","instance_id":"7587888160958627596","namespace":"dynamo","parent_id":"6959a1b2d1ee41a5","span_id":"b035f33bdd5c4b50","span_name":"handle_payload","time.busy_us":1090,"time.duration_us":2121090,"time.idle_us":2120000,"trace_id":"425ef761ca5b44c795b4c912f1d84b39"} | ||
|
|
||
| # Span Closed in HTTP Frontend | ||
|
|
||
| {"time":"2025-09-02T16:38:08.788268Z","level":"INFO","file":"/workspace/lib/runtime/src/logging.rs","line":248,"target":"dynamo_runtime::logging","message":"SPAN_CLOSED","method":"POST","span_id":"6959a1b2d1ee41a5","span_name":"http-request","time.busy_us":13000,"time.duration_us":2133000,"time.idle_us":2120000,"trace_id":"425ef761ca5b44c795b4c912f1d84b39","uri":"/v1/chat/completions","version":"HTTP/1.1"} | ||
| ``` | ||
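The `parent_id` and `time.duration_us` fields make it possible to relate the worker span to the frontend span. The following is a rough sketch, assuming the field names shown in the logs above. The helper names are hypothetical, the records are abbreviated copies of the two `SPAN_CLOSED` events, and the subtraction is only a meaningful estimate when child spans do not overlap in time.

```python
def closed_spans(records):
    """Index SPAN_CLOSED records by span_id."""
    return {
        rec["span_id"]: rec
        for rec in records
        if rec.get("message") == "SPAN_CLOSED"
    }


def time_outside_children_us(spans, span_id):
    """Duration of a span minus the summed duration of its direct children."""
    parent = spans[span_id]
    child_total = sum(
        s["time.duration_us"]
        for s in spans.values()
        if s.get("parent_id") == span_id
    )
    return parent["time.duration_us"] - child_total


# Abbreviated SPAN_CLOSED records from the example logs above.
records = [
    {"message": "SPAN_CLOSED", "span_name": "handle_payload",
     "span_id": "b035f33bdd5c4b50", "parent_id": "6959a1b2d1ee41a5",
     "trace_id": "425ef761ca5b44c795b4c912f1d84b39", "time.duration_us": 2121090},
    {"message": "SPAN_CLOSED", "span_name": "http-request",
     "span_id": "6959a1b2d1ee41a5",
     "trace_id": "425ef761ca5b44c795b4c912f1d84b39", "time.duration_us": 2133000},
]

spans = closed_spans(records)
frontend_overhead_us = time_outside_children_us(spans, "6959a1b2d1ee41a5")
print(frontend_overhead_us)
```

For the example request, the `http-request` span lasted 2133000 µs and its `handle_payload` child lasted 2121090 µs, so roughly 11910 µs were spent outside the worker.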
|
|
||
| ### Example Request with User Supplied `x-request-id` | ||
|
|
||
| ``` | ||
| curl -d '{"model": "Qwen/Qwen3-0.6B", "max_completion_tokens": 2049, "messages":[{"role":"user", "content": "What is the capital of South Africa?" }]}' -H 'Content-Type: application/json' -H 'x-request-id: 8372eac7-5f43-4d76-beca-0a94cfb311d0' http://localhost:8080/v1/chat/completions | ||
| ``` | ||
|
|
||
| ### Example Logs | ||
|
|
||
| ``` | ||
| # Span Created in HTTP Frontend | ||
|
|
||
| {"time":"2025-09-02T17:01:46.306801Z","level":"INFO","file":"/workspace/lib/runtime/src/logging.rs","line":248,"target":"dynamo_runtime::logging","message":"SPAN_CREATED","method":"POST","span_id":"906902a4e74b4264","span_name":"http-request","trace_id":"3924188ea88d40febdfa173afd246a3a","uri":"/v1/chat/completions","version":"HTTP/1.1","x_request_id":"8372eac7-5f43-4d76-beca-0a94cfb311d0"} | ||
|
|
||
| # Span Created and Closed in Worker with parent_id and x_request_id from frontend | ||
|
|
||
| {"time":"2025-09-02T17:01:46.307484Z","level":"INFO","file":"/workspace/lib/runtime/src/pipeline/network/ingress/push_endpoint.rs","line":108,"target":"dynamo_runtime::pipeline::network::ingress::push_endpoint","message":"SPAN_CREATED","component":"backend","endpoint":"generate","instance_id":"7587888160958627596","namespace":"dynamo","parent_id":"906902a4e74b4264","span_id":"5a732a3721814f5e","span_name":"handle_payload","trace_id":"3924188ea88d40febdfa173afd246a3a","x_request_id":"8372eac7-5f43-4d76-beca-0a94cfb311d0"} | ||
| {"time":"2025-09-02T17:01:47.975228Z","level":"INFO","file":"/workspace/lib/runtime/src/pipeline/network/ingress/push_endpoint.rs","line":108,"target":"dynamo_runtime::pipeline::network::ingress::push_endpoint","message":"SPAN_CLOSED","component":"backend","endpoint":"generate","instance_id":"7587888160958627596","namespace":"dynamo","parent_id":"906902a4e74b4264","span_id":"5a732a3721814f5e","span_name":"handle_payload","time.busy_us":646,"time.duration_us":1670646,"time.idle_us":1670000,"trace_id":"3924188ea88d40febdfa173afd246a3a","x_request_id":"8372eac7-5f43-4d76-beca-0a94cfb311d0"} | ||
|
|
||
| # Span Closed in HTTP Frontend | ||
|
|
||
| {"time":"2025-09-02T17:01:47.975616Z","level":"INFO","file":"/workspace/lib/runtime/src/logging.rs","line":248,"target":"dynamo_runtime::logging","message":"SPAN_CLOSED","method":"POST","span_id":"906902a4e74b4264","span_name":"http-request","time.busy_us":2980,"time.duration_us":1672980,"time.idle_us":1670000,"trace_id":"3924188ea88d40febdfa173afd246a3a","uri":"/v1/chat/completions","version":"HTTP/1.1","x_request_id":"8372eac7-5f43-4d76-beca-0a94cfb311d0"} | ||
|
|
||
| ``` | ||
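A client can attach its own `x-request-id` and later use it to find the matching log lines. The sketch below builds the same request as the `curl` example with the Python standard library and filters parsed log records by `x_request_id`. The helper names are hypothetical, and generating the ID with `uuid.uuid4()` is an assumption based on the UUID-shaped value in the example.

```python
import json
import urllib.request
import uuid


def build_chat_request(base_url, model, prompt, request_id=None):
    """Build (but do not send) a chat completion request with an x-request-id header."""
    request_id = request_id or str(uuid.uuid4())
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json", "x-request-id": request_id},
        method="POST",
    )
    return req, request_id


def records_for_request(records, request_id):
    """Keep only parsed log records tagged with the given x_request_id."""
    return [rec for rec in records if rec.get("x_request_id") == request_id]


req, rid = build_chat_request(
    "http://localhost:8080",
    "Qwen/Qwen3-0.6B",
    "What is the capital of South Africa?",
    request_id="8372eac7-5f43-4d76-beca-0a94cfb311d0",
)
print(req.get_method(), req.full_url, rid)

# Abbreviated log records; only the second carries the request ID.
logs = [
    {"message": "SPAN_CREATED", "span_name": "http-request"},
    {"message": "SPAN_CLOSED", "span_name": "handle_payload", "x_request_id": rid},
]
print(records_for_request(logs, rid))
```

Sending the request requires a running frontend, for example with `urllib.request.urlopen(req)`.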
|
|
||
| ## Related Documentation | ||
|
|
||
| - [Distributed Runtime Architecture](../architecture/distributed_runtime.md) | ||
| - [Dynamo Architecture Overview](../architecture/architecture.md) | ||
| - [Backend Guide](backend.md) | ||
| - [Log Aggregation in Kubernetes](dynamo_deploy/logging.md) | ||