Skip to content

Conversation

nnshah1
Copy link
Contributor

@nnshah1 nnshah1 commented Oct 10, 2025

Overview:

Changes to make error messages for etcd connection issues more consistent (feedback from UX work)

Details:

Consistent messages with additional information

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

https://linear.app/nvidia/issue/DYN-1109/improve-error-messaging-etcd-registration-failures

Summary by CodeRabbit

  • Refactor

    • Standardized and clarified error messages and logs for etcd connection, key operations, leases, and service discovery registration.
    • Added contextual error wrapping for clearer diagnostics.
  • Bug Fixes

    • Cancels registration on etcd failure to prevent stale state.
    • Improves failure propagation during key creation/validation and lease refresh/expiry handling.
  • Chores

    • Minor inline comment typo corrected.

@nnshah1 nnshah1 requested a review from a team as a code owner October 10, 2025 15:48
Copy link
Contributor

coderabbitai bot commented Oct 10, 2025

Walkthrough

Introduces internal error message constants across endpoint and etcd transport modules and replaces generic errors/logs with contextualized messages. Adds standardized error wrapping for etcd connection, lease, and key operations. Cancels the endpoint’s cancellation token on etcd registration failure. No public APIs or control flow are changed.

Changes

Cohort / File(s) Summary
Endpoint discovery
lib/runtime/src/component/endpoint.rs
Added COMPONENT_DISCOVERY_REGISTRATION_ERROR constant; updated error/log messages to include component/endpoint; on etcd kv_create failure, returns the specific error and cancels the related cancellation token.
ETCD transport (core)
lib/runtime/src/transports/etcd.rs
Introduced ETCD error message templates; wrapped connection, lease, key-create, and txn paths with contextual errors (ETCD_CONNECTION_ERROR, ETCD_LEASE_ERROR, ETCD_KEY_CREATE_ERROR, ETCD_KEY_VALIDATE_ERROR, ETCD_KEY_OPERATION_ERROR); minor connect call tweak using cloned URL.
ETCD transport (lease)
lib/runtime/src/transports/etcd/lease.rs
Added lease error constants (ETCD_LEASE_MAINTAIN_ERROR, ETCD_LEASE_REFRESH_ERROR, ETCD_LEASE_EXPIRED_ERROR, ETCD_LEASE_HEARTBEAT_ERROR); replaced generic logs/errors with structured messages; minor comment correction.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I thump my paw at logs made clear,
New error tunes are crisp to hear.
In burrows of etcd we hop,
With leases beating—never stop.
If keys misbehave, we name the fright—
Then nibble carrots by lamplight. 🥕

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request description includes Overview, Details, and Related Issues sections but omits the “Where should the reviewer start?” section required by the repository template and the Related Issues section does not use an action keyword with a GitHub issue reference. Please add the missing “Where should the reviewer start?” section to specify key files for review and format the Related Issues entry using an action keyword and a GitHub issue number (for example “Closes #123”).
✅ Passed checks (2 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Title Check ✅ Passed The title clearly labels this PR as a chore focused on updating error messages, which accurately reflects the main change of introducing and standardizing error message constants across the endpoint and etcd modules.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@nnshah1 nnshah1 changed the title update for error messages chore: update for error messages Oct 10, 2025
@github-actions github-actions bot added the chore label Oct 10, 2025
@nnshah1 nnshah1 marked this pull request as draft October 10, 2025 17:32
- Implement consistent error message pattern: "Unable to {action}. Check etcd server status"
- Add specific context for ETCD connection failures with server URLs
- Separate concerns: ETCD modules mention etcd, component modules use generic "discovery service"
- Use error message constants for maintainability
- Include component/endpoint names in service registration failures
- Convert internal terms like "lease" to user-friendly "connect to etcd"
- Fix Rust compilation issues with proper cloning and format! macro usage

Error message improvements:
* ETCD connection: "Unable to connect to etcd server at {urls}. Check etcd server status"
* ETCD lease: "Unable to create lease. Check etcd server status at {urls}"
* Component registration: "Unable to register service for discovery. Check discovery service status"
* Lease maintenance: "Unable to maintain lease - expired or revoked. Check etcd server status"
…sistency

- Remove ETCD_KEY_CREATE_ERROR, ETCD_KEY_VALIDATE_ERROR, ETCD_KEY_OPERATION_ERROR constants
- Remove COMPONENT_DISCOVERY_REGISTRATION_ERROR constant
- Replace all constant references with inline strings to maintain consistency
- Ensures all error messages use the same approach throughout the codebase
@nnshah1 nnshah1 marked this pull request as ready for review October 10, 2025 20:28
@nnshah1 nnshah1 force-pushed the neelays/error-messages branch from 9a9361e to a19306f Compare October 10, 2025 20:35
@nnshah1 nnshah1 requested a review from grahamking October 13, 2025 03:26
@nnshah1 nnshah1 enabled auto-merge (squash) October 13, 2025 16:09
@nnshah1 nnshah1 merged commit 6337afe into main Oct 13, 2025
28 of 29 checks passed
@nnshah1 nnshah1 deleted the neelays/error-messages branch October 13, 2025 16:12
shpgy-shpgy pushed a commit to shpgy-shpgy/dynamo that referenced this pull request Oct 15, 2025
ziqifan617 pushed a commit that referenced this pull request Oct 20, 2025
nv-tusharma pushed a commit that referenced this pull request Oct 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants