feat: fix graceful shutdown, add config validation#41
Conversation
- Await HTTP server Shutdown synchronously instead of fire-and-forget goroutine, ensuring connections drain before Stop returns. - Stop both servers when one fails in Run(), preventing resource leaks on partial startup failures. - Add Config.Validate() that checks port ranges, sampling ratios, TLS cert/key pairing, and shutdown timing, called from New() with warning logs. - Add runtime deprecation warning when Jaeger client is initialized.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdded a new Config.Validate() that returns validation warnings; New() logs those warnings. Adjusted Run()/Stop() shutdown sequencing to stop the other server on unexpected errors and perform HTTP Shutdown inline. Added Jaeger deprecation warning and updated go.mod dependencies. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant New as New()
participant ConfigValidate as Config.Validate
participant Logger
Client->>New: New(config)
New->>ConfigValidate: c.Validate()
ConfigValidate-->>New: []warnings
alt warnings present
loop for each warning
New->>Logger: log.Warn(warning)
end
end
New-->>Client: return Core
sequenceDiagram
participant Runner as Run()
participant gRPC as gRPC Server
participant HTTP as HTTP Server
participant Graceful as gracefulWait
gRPC->>Runner: error
alt err == http.ErrServerClosed or grpc.ErrServerStopped
Runner->>Graceful: wait
else unexpected error
Runner->>gRPC: grpcServer.Stop()
Runner->>HTTP: httpServer.Close() (if non-nil) / or Shutdown inline
Runner->>Graceful: wait
Graceful-->>Runner: done
Runner->>Runner: close()
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR improves shutdown semantics for the core server runtime and adds non-fatal config validation to surface common misconfigurations early, while also warning at runtime about deprecated Jaeger usage.
Changes:
- Make HTTP shutdown synchronous (wait for
httpServer.Shutdown()instead of launching it in a goroutine). - Add cleanup in
Run()to stop the peer server when one of gRPC/HTTP exits with an error. - Introduce
Config.Validate()and call it fromNew()to emit startup warnings for common config issues.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
initializers.go |
Adds a runtime warning when the deprecated Jaeger client setup is used. |
core.go |
Adjusts shutdown flow (HTTP shutdown sync) and adds peer-server cleanup + config validation warnings at construction. |
config/config.go |
Adds Config.Validate() returning warning strings for ports, sampling ratios, TLS pairing, and shutdown timing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
config/config.go (1)
142-169: Solid validation method with one edge-case inconsistency.The validation logic is well-structured and covers important misconfigurations. However, there's a subtle inconsistency:
SetupOpenTelemetryininitializers.go(lines 262-265) treatsratio <= 0as invalid and defaults it to0.2- This validation only warns when
< 0, allowing0to pass without warning- A user setting
OTLPSamplingRatio=0(orNewRelicOpentelemetrySample=0) to disable sampling would pass validation but silently get 20% samplingConsider either:
- Aligning the validation to warn when
<= 0(if 0 should use default), or- Updating
SetupOpenTelemetryto honor0as "no sampling" (if that's a valid use case)Option 1: Align validation to warn on zero
- if c.NewRelicOpentelemetrySample < 0 || c.NewRelicOpentelemetrySample > 1.0 { - warnings = append(warnings, "NewRelicOpentelemetrySample should be between 0.0 and 1.0") + if c.NewRelicOpentelemetrySample <= 0 || c.NewRelicOpentelemetrySample > 1.0 { + warnings = append(warnings, "NewRelicOpentelemetrySample should be between 0.0 (exclusive) and 1.0") } - if c.OTLPSamplingRatio < 0 || c.OTLPSamplingRatio > 1.0 { - warnings = append(warnings, "OTLPSamplingRatio should be between 0.0 and 1.0") + if c.OTLPSamplingRatio <= 0 || c.OTLPSamplingRatio > 1.0 { + warnings = append(warnings, "OTLPSamplingRatio should be between 0.0 (exclusive) and 1.0") }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/config.go` around lines 142 - 169, The validation in Config.Validate is inconsistent with SetupOpenTelemetry: change the checks for OTLPSamplingRatio and NewRelicOpentelemetrySample to treat zero as invalid by replacing the current "< 0" checks with "<= 0" so both OTLPSamplingRatio and NewRelicOpentelemetrySample append warnings when <= 0, ensuring users are warned when they set 0 which SetupOpenTelemetry currently overrides to the 0.2 default; update the comparisons in Validate for OTLPSamplingRatio and NewRelicOpentelemetrySample accordingly (refer to Config.Validate, OTLPSamplingRatio, NewRelicOpentelemetrySample and SetupOpenTelemetry).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@config/config.go`:
- Around line 142-169: The validation in Config.Validate is inconsistent with
SetupOpenTelemetry: change the checks for OTLPSamplingRatio and
NewRelicOpentelemetrySample to treat zero as invalid by replacing the current "<
0" checks with "<= 0" so both OTLPSamplingRatio and NewRelicOpentelemetrySample
append warnings when <= 0, ensuring users are warned when they set 0 which
SetupOpenTelemetry currently overrides to the 0.2 default; update the
comparisons in Validate for OTLPSamplingRatio and NewRelicOpentelemetrySample
accordingly (refer to Config.Validate, OTLPSamplingRatio,
NewRelicOpentelemetrySample and SetupOpenTelemetry).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4d910746-2d37-4303-a6ec-b0d663893bd0
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (4)
config/config.gocore.gogo.modinitializers.go
…ort 0, add tests - Only force-stop peer server on unexpected errors (not http.ErrServerClosed) to avoid racing with in-progress graceful shutdown - Defer validation warning logs until after processConfig so logger is configured with correct level and format - Allow port 0 (ephemeral) in validation — only negative or >65535 warns - Add table-driven tests for Config.Validate covering defaults, port 0, port conflict, sampling ratios, TLS mismatch, and shutdown timing
SetupOpenTelemetry was silently overriding 0 to 0.2. Now only negative values trigger the default — 0 is respected as a valid explicit choice to disable sampling.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
config/config_test.go (1)
8-19:TestValidateDefaultsis currently misleading and may miss default regressions.Line 9 constructs a hand-picked config literal, not the runtime/default-populated config path, so default-tag changes can drift without failing this test. Either build the config through the same initialization/defaulting path or rename this test to reflect that it validates a known-good config, not defaults.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/config_test.go` around lines 8 - 19, TestValidateDefaults constructs a hand-written Config literal instead of exercising the real defaulting/initialization path, so change the test to instantiate the config via the actual default factory (e.g., call the package's constructor like NewConfig() or DefaultConfig() or whichever function populates default-tag values) and then call c.Validate() to assert no warnings; alternatively, if you intentionally want to validate the hand-crafted values, rename TestValidateDefaults to TestValidateKnownGoodConfig to accurately reflect its behavior and avoid future drift. Ensure you reference the Config type and the Validate() method in the updated test so it clearly uses the runtime/default population path or is renamed to match intent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/config_test.go`:
- Around line 21-30: TestValidatePortZero currently only rejects "range"
warnings but doesn't assert there is no "conflict" warning when GRPCPort and
HTTPPort are both 0; update the test to fail if any warning contains
conflict-related text (e.g. "conflict", "in use", or "already") to ensure
Config.Validate (method Validate on type Config using fields GRPCPort and
HTTPPort) does not emit a port conflict for the ephemeral 0/0 case—locate
TestValidatePortZero and add a check iterating warnings that calls t.Errorf if
any warning indicates a conflict between ports.
- Around line 46-57: Update the TestValidateSamplingRatio test to assert the
identity of warnings rather than only count: call c.Validate() (uses
Config.Validate) and verify the returned warnings include entries that reference
both NewRelicOpentelemetrySample and OTLPSamplingRatio (the fields on Config) —
e.g. check each warning string contains the respective field name or expected
message for NewRelicOpentelemetrySample and OTLPSamplingRatio; fail the test if
either specific warning is missing. This ensures TestValidateSamplingRatio (the
test function) fails if either sampling validation stops producing its expected
warning.
---
Nitpick comments:
In `@config/config_test.go`:
- Around line 8-19: TestValidateDefaults constructs a hand-written Config
literal instead of exercising the real defaulting/initialization path, so change
the test to instantiate the config via the actual default factory (e.g., call
the package's constructor like NewConfig() or DefaultConfig() or whichever
function populates default-tag values) and then call c.Validate() to assert no
warnings; alternatively, if you intentionally want to validate the hand-crafted
values, rename TestValidateDefaults to TestValidateKnownGoodConfig to accurately
reflect its behavior and avoid future drift. Ensure you reference the Config
type and the Validate() method in the updated test so it clearly uses the
runtime/default population path or is renamed to match intent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3c7d32af-6afc-42f4-92b5-e030ab3c533c
📒 Files selected for processing (3)
config/config.goconfig/config_test.gocore.go
🚧 Files skipped from review as they are similar to previous changes (2)
- core.go
- config/config.go
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 6 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ng, use impl.config - Handle grpc.ErrServerStopped and nil error as expected shutdown signals in Run() to avoid force-stopping peer server during graceful shutdown - Gate Jaeger deprecation warning on actual Jaeger configuration being present, avoiding log noise on every startup - Fix sampling ratio comment to mention >1 is also defaulted - Validate impl.config instead of the input param c for robustness
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
core.go:501
Stop()now blocks onhttpServer.Shutdown(ctx)before starting gRPC graceful shutdown. Because both use the same timeout context, time spent draining HTTP directly reduces the remaining time available forGracefulStop(), which can cause gRPC to be force-stopped earlier than the configured shutdown duration. To preserve the full grace window for both servers, consider running HTTP shutdown and gRPC graceful stop concurrently (still waiting for both to finish before returning), or using separate child contexts with the same deadline.
if c.httpServer != nil {
if err := c.httpServer.Shutdown(ctx); err != nil {
log.Error(context.Background(), "msg", "http server shutdown error", "err", err)
}
}
if c.grpcServer != nil {
timedCall(ctx, c.grpcServer.GracefulStop)
c.grpcServer.Stop()
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Summary
httpServer.Shutdown()synchronously instead of fire-and-forget goroutine, ensuring in-flight HTTP connections drain beforeStop()returnsRun(), explicitly stop the other to prevent resource leaksConfig.Validate()checking port ranges, sampling ratios, TLS cert/key pairing, and shutdown timing. Called fromNew()with warning logs — non-fatal to avoid breaking existing servicesSetupOpenTelemetryDependencies
go-coldbrew/errorsandgo-coldbrew/interceptorsupdates (forNotifyAsync)Test plan
make test -race)go.modto reference new versions🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Deprecations
Tests
Chores