-
Notifications
You must be signed in to change notification settings - Fork 689
docs: health check and structured logs #2805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds two new documentation guides: health_check and logging. They describe frontend/worker health endpoints, liveness/readiness behavior, JSON responses, and environment configuration; and logging formats (text/JSONL), tracing fields, span events, config via env vars/TOML, with examples. No code or public API changes. Changes
Sequence Diagram(s)Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes Possibly related PRs
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
Signed-off-by: nnshah1 <[email protected]>
Signed-off-by: nnshah1 <[email protected]>
Signed-off-by: nnshah1 <[email protected]>
Signed-off-by: nnshah1 <[email protected]>
a37f5bd to
89b3f1a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (21)
docs/guides/logging.md (11)
22-26: Tighten wording and fix grammar.Replace awkward phrasing and clarify feature behavior.
-Dynamo provides structured logging in both text as well as JSONL. When -JSONL is enabled logs additionally contain `span` creation and exit -events as well as support for `trace_id` and `span_id` fields for -distributed tracing. +Dynamo provides structured logging in two formats: readable text and JSONL. +When JSONL is enabled, logs can include span creation/exit events and emit +trace context fields (`trace_id`, `span_id`) for distributed tracing.
41-44: Add language to fenced code block (markdownlint MD040).-``` +```bash export DYN_LOG="info,dynamo_runtime::system_status_server:trace" export DYN_LOGGING_JSONL="0"--- `48-54`: **Add language to fenced code block (markdownlint MD040).** ```diff -``` +```text 2025-09-02T15:50:01.770028Z INFO main.init: VllmWorker for Qwen/Qwen3-0.6B has been initialized 2025-09-02T15:50:01.770195Z INFO main.init: Reading Events from tcp://127.0.0.1:21555 2025-09-02T15:50:01.770265Z INFO main.init: Getting engine runtime configuration metadata from vLLM engine... 2025-09-02T15:50:01.770316Z INFO main.get_engine_cache_info: Cache config values: {'num_gpu_blocks': 24064} 2025-09-02T15:50:01.770358Z INFO main.get_engine_cache_info: Scheduler config values: {'max_num_seqs': 256, 'max_num_batched_tokens': 2048}--- `60-63`: **Add language to fenced code block (markdownlint MD040).** ```diff -``` +```bash export DYN_LOG="info,dynamo_runtime::system_status_server:trace" export DYN_LOGGING_JSONL="1"--- `77-80`: **Fix typo and clarify condition.** “greate” → “greater”; make the condition explicit. ```diff -When `DYN_LOGGING_JSONL` with `DYN_LOG` set to greate than or equal to -`info` level trace information is added to all spans along with -`SPAN_CREATED` and `SPAN_CLOSED` events. +When `DYN_LOGGING_JSONL=1` and `DYN_LOG` is set to `info` or higher, the logger +emits span metadata, including `SPAN_CREATED` and `SPAN_CLOSED` events.
83-86: Add language to fenced code block (markdownlint MD040).-``` +```bash curl -d '{"model": "Qwen/Qwen3-0.6B", "max_completion_tokens": 2049, "messages":[{"role":"user", "content": "What is the capital of South Africa?" }]}' -H 'Content-Type: application/json' http://localhost:8080/v1/chat/completions--- `89-103`: **Add language to fenced code block; these are JSON logs.** ```diff -``` +```json # Span Created in HTTP Frontend {"time":"2025-09-02T16:38:06.656503Z","level":"INFO","file":"/workspace/lib/runtime/src/logging.rs","line":248,"target":"dynamo_runtime::logging","message":"SPAN_CREATED","method":"POST","span_id":"6959a1b2d1ee41a5","span_name":"http-request","trace_id":"425ef761ca5b44c795b4c912f1d84b39","uri":"/v1/chat/completions","version":"HTTP/1.1"} ...--- `105-109`: **Add language to fenced code block (markdownlint MD040).** ```diff -``` +```bash curl -d '{"model": "Qwen/Qwen3-0.6B", "max_completion_tokens": 2049, "messages":[{"role":"user", "content": "What is the capital of South Africa?" }]}' -H 'Content-Type: application/json' -H 'x-request-id: 8372eac7-5f43-4d76-beca-0a94cfb311d0' http://localhost:8080/v1/chat/completions--- `111-111`: **Resolve duplicate heading (markdownlint MD024).** Rename to differentiate from the earlier “Example Logs”. ```diff -### Example Logs +### Example Logs (with x-request-id)
113-127: Add language to fenced code block; these are JSON logs.-``` +```json # Span Created in HTTP Frontend ...--- `29-35`: **Consider adding a minimal TOML example since a config path is documented.** Helps users bootstrap a file quickly. ```diff | `DYN_LOGGING_CONFIG_PATH` | Path to custom TOML logging configuration file | `DYN_LOGGING_CONFIG_PATH=/path/to/config.toml`| +### Example TOML configuration + +```toml +[log] +level = "info" +use_local_tz = false +jsonl = true + +[targets] +# per-target overrides +"dynamo_runtime::system_status_server" = "trace" +```docs/guides/health_check.md (10)
22-25: Fix grammar.-Dynamo provides health check and liveness HTTP endpoints for each component which -can be used configure startup, liveness and readiness probes in -orchestration frameworks such as Kubernetes. +Dynamo exposes health and liveness HTTP endpoints for each component, which +can be used to configure startup, liveness, and readiness probes in +orchestration frameworks such as Kubernetes.
33-35: Add language to fenced code block (markdownlint MD040).-``` +```bash curl -s localhost:8080/live -q | jq--- `39-44`: **Add language to fenced code block (markdownlint MD040).** ```diff -``` +```json { "message": "Service is live", "status": "live" }--- `59-61`: **Add language to fenced code block (markdownlint MD040).** ```diff -``` +```bash curl -s localhost:8080/health -q | jq--- `66-72`: **Add language to fenced code block (markdownlint MD040).** ```diff -``` +```json { "instances": [], "message": "No endpoints available", "status": "unhealthy" }--- `77-112`: **Add language to fenced code block (markdownlint MD040).** ```diff -``` +```json { "endpoints": [ "dyn://dynamo.backend.generate" ], "instances": [ { "component": "backend", "endpoint": "clear_kv_blocks", "instance_id": 7587888160958628000, "namespace": "dynamo", "transport": { "nats_tcp": "dynamo_backend.clear_kv_blocks-694d98147d54be25" } }, ... ], "status": "healthy" }--- `116-121`: **Polish wording.** ```diff -Health checks for components other than the frontend are enabled -selectively based on environment variables. If a health check for a -component is enabled the starting status can be set along with the set -of endpoints that are required to be served before the component is -declared `ready`. +Health checks for non-frontend components are enabled selectively via environment variables. +When enabled, you can set the initial status and the set of endpoints that must be served +before the component is declared `ready`.
141-146: Add language to fenced code block (markdownlint MD040).-``` +```bash export DYN_SYSTEM_ENABLED="true" export DYN_SYSTEM_STARTING_HEALTH_STATUS="notready" export DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS="[\"generate\"]" export DYN_SYSTEM_PORT=9090--- `158-168`: **Add language to fenced code block (markdownlint MD040).** ```diff -``` +```json { "endpoints": { "generate": "notready" }, "status": "notready", "uptime": { "nanos": 775381996, "secs": 2 } }--- `173-185`: **Add language to fenced code block (markdownlint MD040).** ```diff -``` +```json { "endpoints": { "clear_kv_blocks": "ready", "generate": "ready", "load_metrics": "ready" }, "status": "ready", "uptime": { "nanos": 435707697, "secs": 55 } }</blockquote></details> </blockquote></details> <details> <summary>📜 Review details</summary> **Configuration used**: Path: .coderabbit.yaml **Review profile**: CHILL **Plan**: Pro **💡 Knowledge Base configuration:** - MCP integration is disabled by default for public repositories - Jira integration is disabled by default for public repositories - Linear integration is disabled by default for public repositories You can enable these sources in your CodeRabbit configuration. <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 5bbbeae3596eb6b0babe66be352497dd8cd88cf8 and 89b3f1af439d328512c3c98ceacf91f7ed671e71. </details> <details> <summary>📒 Files selected for processing (2)</summary> * `docs/guides/health_check.md` (1 hunks) * `docs/guides/logging.md` (1 hunks) </details> <details> <summary>🧰 Additional context used</summary> <details> <summary>🪛 LanguageTool</summary> <details> <summary>docs/guides/health_check.md</summary> [grammar] ~22-~22: There might be a mistake here. Context: ... HTTP endpoints for each component which can be used configure startup, liveness ... (QB_NEW_EN) --- [grammar] ~23-~23: There might be a mistake here. Context: ...tartup, liveness and readiness probes in orchestration frameworks such as Kuberne... (QB_NEW_EN) --- [grammar] ~28-~28: There might be a mistake here. Context: ...nt reports a status of `live` as long as the service is running. #### Example Re... (QB_NEW_EN) --- [grammar] ~49-~49: There might be a mistake here. Context: ...During initial startup the frontend will report `unhealthy` until workers have be... (QB_NEW_EN) --- [grammar] ~50-~50: Ensure spelling is correct Context: ...until workers have been initialized and registred with the frontend. Once workers have bee... (QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1) --- [grammar] ~51-~51: There might be a mistake here. Context: ...orkers have been registered the `health` endpoint additionally will list registre... (QB_NEW_EN) --- [grammar] ~52-~52: Ensure spelling is correct Context: ...health` endpoint additionally will list registred endpoints and instances. #### Exampl... (QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1) --- [grammar] ~117-~117: There might be a mistake here. Context: ...nment variables. If a health check for a component is enabled the starting status... (QB_NEW_EN) --- [grammar] ~118-~118: There might be a mistake here. Context: ...ing status can be set along with the set of endpoints that are required to be ser... (QB_NEW_EN) --- [grammar] ~119-~119: There might be a mistake here. Context: ...red to be served before the component is declared `ready`. Once all endpoints de... (QB_NEW_EN) --- [grammar] ~122-~122: There might be a mistake here. Context: ... `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` are served the component transitions to ... (QB_NEW_EN) --- [grammar] ~123-~123: There might be a mistake here. Context: ...transitions to a `ready` state until the component is shutdown. > **Note**: Both... (QB_NEW_EN) --- [grammar] ~132-~132: There might be a mistake here. Context: ... | `true`, `false` | | `DYN_SYSTEM_PORT` | Specifies th... (QB_NEW_EN) --- [grammar] ~133-~133: There might be a mistake here. Context: ...090` | | `DYN_SYSTEM_STARTING_HEALTH_STATUS` ... (QB_NEW_EN) --- [grammar] ~134-~134: There might be a mistake here. Context: ...tem (ready/not ready). | `ready`, `notready` | | `DYN_SYSTEM_HEALTH_PATH` ... (QB_NEW_EN) --- [grammar] ~135-~135: There might be a mistake here. Context: ... | `/custom/health` | | `DYN_SYSTEM_LIVE_PATH` ... (QB_NEW_EN) --- [grammar] ~136-~136: There might be a mistake here. Context: ... | `/custom/live` | | `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS... (QB_NEW_EN) --- [grammar] ~154-~154: There might be a mistake here. Context: ...alth -q | jq ``` #### Example Response Before endpoints are being served: ``` ... (QB_NEW_EN) </details> <details> <summary>docs/guides/logging.md</summary> [grammar] ~22-~22: There might be a mistake here. Context: ...ging in both text as well as JSONL. When JSONL is enabled logs additionally conta... (QB_NEW_EN) --- [grammar] ~23-~23: There might be a mistake here. Context: ...ionally contain `span` creation and exit events as well as support for `trace_id`... (QB_NEW_EN) --- [grammar] ~24-~24: There might be a mistake here. Context: ... for `trace_id` and `span_id` fields for distributed tracing. ## Environment Var... (QB_NEW_EN) --- [grammar] ~77-~77: Ensure spelling is correct Context: ...YN_LOGGING_JSONL` with `DYN_LOG` set to greate than or equal to `info` level trace inf... (QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1) --- [grammar] ~77-~77: There might be a mistake here. Context: ...`DYN_LOG` set to greate than or equal to `info` level trace information is added ... (QB_NEW_EN) --- [grammar] ~78-~78: There might be a mistake here. Context: ...rmation is added to all spans along with `SPAN_CREATED` and `SPAN_CLOSED` events.... (QB_NEW_EN) </details> </details> <details> <summary>🪛 markdownlint-cli2 (0.17.2)</summary> <details> <summary>docs/guides/health_check.md</summary> 41-41: Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 48-48: Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 60-60: Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 83-83: Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 89-89: Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 107-107: Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 111-111: Multiple headings with the same content (MD024, no-duplicate-heading) --- 113-113: Fenced code blocks should have a language specified (MD040, fenced-code-language) </details> <details> <summary>docs/guides/logging.md</summary> 41-41: Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 48-48: Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 60-60: Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 83-83: Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 89-89: Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 107-107: Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 111-111: Multiple headings with the same content (MD024, no-duplicate-heading) --- 113-113: Fenced code blocks should have a language specified (MD040, fenced-code-language) </details> </details> </details> <details> <summary>⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)</summary> * GitHub Check: Build and Test - dynamo </details> <details> <summary>🔇 Additional comments (3)</summary><blockquote> <details> <summary>docs/guides/logging.md (1)</summary><blockquote> `131-134`: **All cross-doc links verified** All linked targets in `docs/guides/logging.md` resolve correctly and no broken paths were found. </blockquote></details> <details> <summary>docs/guides/health_check.md (2)</summary><blockquote> `126-126`: **Confirm behavior: /live and /ready returning identical payloads.** If intentional, call it out earlier; if not, clarify differences to avoid misconfiguring probes. Would you like me to add Kubernetes probe examples once confirmed? --- `189-191`: **Cross-doc links resolve.** All referenced files exist at their respective paths. </blockquote></details> </blockquote></details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Neelay Shah <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Neelay Shah <[email protected]>
Signed-off-by: Neelay Shah <[email protected]>
Signed-off-by: Neelay Shah <[email protected]>
Signed-off-by: Neelay Shah <[email protected]>
Signed-off-by: Neelay Shah <[email protected]>
Signed-off-by: Neelay Shah <[email protected]>
Signed-off-by: Neelay Shah <[email protected]> Signed-off-by: nnshah1 <[email protected]>
Signed-off-by: Neelay Shah <[email protected]> Signed-off-by: nnshah1 <[email protected]>
Signed-off-by: Neelay Shah <[email protected]> Signed-off-by: nnshah1 <[email protected]>
Signed-off-by: nnshah1 <[email protected]>
b84b018 to
09f25a1
Compare
Signed-off-by: Neelay Shah <[email protected]>
Signed-off-by: Neelay Shah <[email protected]>
Signed-off-by: Neelay Shah <[email protected]>
Signed-off-by: nnshah1 <[email protected]>
Signed-off-by: Neelay Shah <[email protected]>
|
For future me:
|
|
/ok to test |
@nnshah1, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/ |
|
/ok to test 9715dab |
|
/ok to test 9715dab |
| #### Example Request | ||
|
|
||
| ``` | ||
| curl -s localhost:8080/live -q | jq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think someone (maybe @PeaBrane?) recently switched all the FE to be 8000. May want to check with Rudy.
| The frontend liveness endpoint reports a status of `live` as long as | ||
| the service is running. | ||
|
|
||
| > **Note**: Frontend liveness doesn't depend on worker health or liveness only on the Frontend service itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"... doesn't depend on worker health or liveness only ..."
=>
"... doesn't depend on worker health or liveness,(comma) only ..."
Signed-off-by: nnshah1 <[email protected]>
Signed-off-by: nnshah1 <[email protected]> Signed-off-by: ayushag <[email protected]>
Signed-off-by: nnshah1 <[email protected]> Signed-off-by: zhongdaor <[email protected]>
Signed-off-by: nnshah1 <[email protected]> Co-authored-by: mohammedabdulwahhab <[email protected]>
Overview:
Adds basic docs for health checks and logging with examples.
Where should the reviewer start?
logging.md and health_check.md
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit