feat: fix graceful shutdown, add config validation #41

Merged: ankurs merged 6 commits into main from feat/improvements on Mar 23, 2026
Conversation

@ankurs (Member) commented Mar 22, 2026

Summary

  • HTTP shutdown fix: Await httpServer.Shutdown() synchronously instead of fire-and-forget goroutine, ensuring in-flight HTTP connections drain before Stop() returns
  • Partial failure cleanup: When one server (gRPC/HTTP) fails in Run(), explicitly stop the other to prevent resource leaks
  • Config validation: Add Config.Validate() checking port ranges, sampling ratios, TLS cert/key pairing, and shutdown timing. Called from New() with warning logs — non-fatal to avoid breaking existing services
  • Jaeger deprecation: Log runtime warning when deprecated Jaeger client is initialized, directing users to SetupOpenTelemetry

Dependencies

  • Requires go-coldbrew/errors and go-coldbrew/interceptors updates (for NotifyAsync)

Test plan

  • Existing tests pass (make test -race)
  • Lint clean
  • After dependency packages are published, update go.mod to reference new versions

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added runtime configuration validation that emits warnings for port ranges/conflicts, sampling ratios, TLS key/cert pairing, and shutdown/healthcheck timing.
  • Bug Fixes

    • Improved shutdown sequencing and made HTTP server shutdown handling inline for more reliable termination.
  • Deprecations

    • Jaeger client marked EOL; migrate to OpenTelemetry or New Relic OpenTelemetry.
  • Tests

    • Added unit tests covering the new configuration validation.
  • Chores

    • Updated project dependencies.

- Await HTTP server Shutdown synchronously instead of fire-and-forget
  goroutine, ensuring connections drain before Stop returns.
- Stop both servers when one fails in Run(), preventing resource leaks
  on partial startup failures.
- Add Config.Validate() that checks port ranges, sampling ratios,
  TLS cert/key pairing, and shutdown timing, called from New() with
  warning logs.
- Add runtime deprecation warning when Jaeger client is initialized.

coderabbitai Bot commented Mar 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f79a02c5-49d3-474e-aec5-419c2a86aa02

📥 Commits

Reviewing files that changed from the base of the PR and between b7eb75f and 197a1ae.

📒 Files selected for processing (3)
  • config/config_test.go
  • core.go
  • initializers.go
✅ Files skipped from review due to trivial changes (1)
  • config/config_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • core.go
  • initializers.go

Walkthrough

Added a new Config.Validate() that returns validation warnings; New() logs those warnings. Adjusted Run()/Stop() shutdown sequencing to stop the other server on unexpected errors and perform HTTP Shutdown inline. Added Jaeger deprecation warning and updated go.mod dependencies.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Configuration Validation | config/config.go, config/config_test.go | Added Config.Validate() []string emitting warnings for invalid port ranges, port conflicts when both non-zero and equal, NewRelic/OTLP sampling ratios outside 0.0–1.0, TLS cert/key pairing mismatch, and healthcheck vs shutdown timing; tests added covering these scenarios. |
| Runtime Init & Shutdown | core.go | New() calls config.Validate() and logs warnings; Run() stops the alternate server when the first error is an unexpected server error (not normal shutdown); Stop() now calls httpServer.Shutdown(ctx) inline and logs returned errors immediately. |
| Initializers / Telemetry | initializers.go | setupJaeger() early-returns when both service names are empty and logs a deprecation/EOL warning recommending OTLP/OpenTelemetry migration; sampling clamp changed to allow exactly 0.0. |
| Dependencies | go.mod | Bumped several direct dependencies and adjusted indirect modules (added several telemetry/infra-related modules; removed github.com/newrelic/csec-go-agent). |
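
A rough sketch of what a warning-returning validator along these lines might look like follows. The struct is a trimmed stand-in, not the real Config: `GRPCPort`, `HTTPPort`, `NewRelicOpentelemetrySample` and `OTLPSamplingRatio` appear in this PR, while `TLSCertFile` and `TLSKeyFile` are hypothetical field names, and the healthcheck/shutdown timing check is left out because its fields are not shown here. The wording of the port and TLS warnings is also invented for illustration.

```go
package main

import "fmt"

// Config is a trimmed stand-in for the real config struct.
type Config struct {
	GRPCPort                    int
	HTTPPort                    int
	NewRelicOpentelemetrySample float64
	OTLPSamplingRatio           float64
	TLSCertFile                 string // hypothetical field name
	TLSKeyFile                  string // hypothetical field name
}

// Validate returns human-readable warnings and never fails hard, so callers
// can log the result without breaking existing services.
func (c Config) Validate() []string {
	var warnings []string

	// Port 0 (ephemeral) is allowed; only negative or >65535 values warn.
	for name, port := range map[string]int{"GRPCPort": c.GRPCPort, "HTTPPort": c.HTTPPort} {
		if port < 0 || port > 65535 {
			warnings = append(warnings, fmt.Sprintf("%s %d is outside 0-65535", name, port))
		}
	}
	// A conflict only counts when both ports are non-zero and equal.
	if c.GRPCPort != 0 && c.GRPCPort == c.HTTPPort {
		warnings = append(warnings, "GRPCPort and HTTPPort are the same")
	}
	if c.NewRelicOpentelemetrySample < 0 || c.NewRelicOpentelemetrySample > 1.0 {
		warnings = append(warnings, "NewRelicOpentelemetrySample should be between 0.0 and 1.0")
	}
	if c.OTLPSamplingRatio < 0 || c.OTLPSamplingRatio > 1.0 {
		warnings = append(warnings, "OTLPSamplingRatio should be between 0.0 and 1.0")
	}
	// Cert and key must be set together or not at all.
	if (c.TLSCertFile == "") != (c.TLSKeyFile == "") {
		warnings = append(warnings, "TLS cert and key must both be set or both be empty")
	}
	return warnings
}

func main() {
	c := Config{GRPCPort: 9090, HTTPPort: 9090, OTLPSamplingRatio: 1.5}
	for _, w := range c.Validate() {
		fmt.Println("WARN:", w)
	}
}
```

Returning a slice of strings instead of an error keeps validation non-fatal: the caller decides whether to log, count, or ignore the warnings.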

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant New as New()
    participant ConfigValidate as Config.Validate
    participant Logger
    Client->>New: New(config)
    New->>ConfigValidate: c.Validate()
    ConfigValidate-->>New: []warnings
    alt warnings present
        loop for each warning
            New->>Logger: log.Warn(warning)
        end
    end
    New-->>Client: return Core
sequenceDiagram
    participant Runner as Run()
    participant gRPC as gRPC Server
    participant HTTP as HTTP Server
    participant Graceful as gracefulWait
    gRPC->>Runner: error
    alt err == http.ErrServerClosed or grpc.ErrServerStopped
        Runner->>Graceful: wait
    else unexpected error
        Runner->>gRPC: grpcServer.Stop()
        Runner->>HTTP: httpServer.Close() (if non-nil) / or Shutdown inline
        Runner->>Graceful: wait
        Graceful-->>Runner: done
        Runner->>Runner: close()
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested reviewers

  • vestor
  • fajran
  • kevinjom
  • svetha-cvl

Poem

🐰 I hopped through config fields with a careful glance,
Ports and TLS and samplings got a second chance,
Servers bow politely before they part,
Jaeger tips its hat as new traces start,
A little rabbit cheers the code's smart dance.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the two main changes: graceful shutdown improvements and the addition of config validation. |


Copilot AI (Contributor) left a comment

Pull request overview

This PR improves shutdown semantics for the core server runtime and adds non-fatal config validation to surface common misconfigurations early, while also warning at runtime about deprecated Jaeger usage.

Changes:

  • Make HTTP shutdown synchronous (wait for httpServer.Shutdown() instead of launching it in a goroutine).
  • Add cleanup in Run() to stop the peer server when one of gRPC/HTTP exits with an error.
  • Introduce Config.Validate() and call it from New() to emit startup warnings for common config issues.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| initializers.go | Adds a runtime warning when the deprecated Jaeger client setup is used. |
| core.go | Adjusts shutdown flow (HTTP shutdown sync) and adds peer-server cleanup + config validation warnings at construction. |
| config/config.go | Adds Config.Validate() returning warning strings for ports, sampling ratios, TLS pairing, and shutdown timing. |


Comment thread core.go Outdated
Comment thread core.go Outdated
Comment thread config/config.go Outdated
Comment thread config/config.go
coderabbitai Bot left a comment

🧹 Nitpick comments (1)
config/config.go (1)

142-169: Solid validation method with one edge-case inconsistency.

The validation logic is well-structured and covers important misconfigurations. However, there's a subtle inconsistency:

  • SetupOpenTelemetry in initializers.go (lines 262-265) treats ratio <= 0 as invalid and defaults it to 0.2
  • This validation only warns when < 0, allowing 0 to pass without warning
  • A user setting OTLPSamplingRatio=0 (or NewRelicOpentelemetrySample=0) to disable sampling would pass validation but silently get 20% sampling

Consider either:

  1. Aligning the validation to warn when <= 0 (if 0 should use default), or
  2. Updating SetupOpenTelemetry to honor 0 as "no sampling" (if that's a valid use case)
Option 1: Align validation to warn on zero
-	if c.NewRelicOpentelemetrySample < 0 || c.NewRelicOpentelemetrySample > 1.0 {
-		warnings = append(warnings, "NewRelicOpentelemetrySample should be between 0.0 and 1.0")
+	if c.NewRelicOpentelemetrySample <= 0 || c.NewRelicOpentelemetrySample > 1.0 {
+		warnings = append(warnings, "NewRelicOpentelemetrySample should be between 0.0 (exclusive) and 1.0")
 	}
-	if c.OTLPSamplingRatio < 0 || c.OTLPSamplingRatio > 1.0 {
-		warnings = append(warnings, "OTLPSamplingRatio should be between 0.0 and 1.0")
+	if c.OTLPSamplingRatio <= 0 || c.OTLPSamplingRatio > 1.0 {
+		warnings = append(warnings, "OTLPSamplingRatio should be between 0.0 (exclusive) and 1.0")
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/config.go` around lines 142 - 169, The validation in Config.Validate
is inconsistent with SetupOpenTelemetry: change the checks for OTLPSamplingRatio
and NewRelicOpentelemetrySample to treat zero as invalid by replacing the
current "< 0" checks with "<= 0" so both OTLPSamplingRatio and
NewRelicOpentelemetrySample append warnings when <= 0, ensuring users are warned
when they set 0 which SetupOpenTelemetry currently overrides to the 0.2 default;
update the comparisons in Validate for OTLPSamplingRatio and
NewRelicOpentelemetrySample accordingly (refer to Config.Validate,
OTLPSamplingRatio, NewRelicOpentelemetrySample and SetupOpenTelemetry).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4d910746-2d37-4303-a6ec-b0d663893bd0

📥 Commits

Reviewing files that changed from the base of the PR and between a755c27 and 03f6a28.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (4)
  • config/config.go
  • core.go
  • go.mod
  • initializers.go

ankurs added 2 commits March 23, 2026 09:13
…ort 0, add tests

- Only force-stop peer server on unexpected errors (not http.ErrServerClosed)
  to avoid racing with in-progress graceful shutdown
- Defer validation warning logs until after processConfig so logger is
  configured with correct level and format
- Allow port 0 (ephemeral) in validation — only negative or >65535 warns
- Add table-driven tests for Config.Validate covering defaults, port 0,
  port conflict, sampling ratios, TLS mismatch, and shutdown timing
SetupOpenTelemetry was silently overriding 0 to 0.2. Now only
negative values trigger the default — 0 is respected as a valid
explicit choice to disable sampling.
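
The behavior described in this commit message can be sketched as a small normalizing function. `effectiveSamplingRatio` is a hypothetical name, not the helper used in initializers.go. The 0.2 default comes from the review discussion above, and the treatment of values above 1 follows a later commit note in this PR that says ratios greater than 1 are also defaulted.

```go
package main

import "fmt"

// defaultSamplingRatio mirrors the 0.2 default mentioned in the review.
const defaultSamplingRatio = 0.2

// effectiveSamplingRatio (hypothetical helper) falls back to the default only
// for out-of-range input. Exactly 0 is kept as an explicit "no sampling" choice.
func effectiveSamplingRatio(ratio float64) float64 {
	if ratio < 0 || ratio > 1 {
		return defaultSamplingRatio
	}
	return ratio
}

func main() {
	for _, r := range []float64{-1, 0, 0.5, 1, 1.5} {
		fmt.Printf("configured=%v effective=%v\n", r, effectiveSamplingRatio(r))
	}
}
```

Keeping this rule identical to the bounds checked in Config.Validate avoids the mismatch the earlier nitpick pointed out, where a value could pass validation and still be silently replaced.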
@ankurs ankurs requested a review from Copilot March 23, 2026 01:17
coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
config/config_test.go (1)

8-19: TestValidateDefaults is currently misleading and may miss default regressions.

Line 9 constructs a hand-picked config literal, not the runtime/default-populated config path, so default-tag changes can drift without failing this test. Either build the config through the same initialization/defaulting path or rename this test to reflect that it validates a known-good config, not defaults.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/config_test.go` around lines 8 - 19, TestValidateDefaults constructs a
hand-written Config literal instead of exercising the real
defaulting/initialization path, so change the test to instantiate the config via
the actual default factory (e.g., call the package's constructor like
NewConfig() or DefaultConfig() or whichever function populates default-tag
values) and then call c.Validate() to assert no warnings; alternatively, if you
intentionally want to validate the hand-crafted values, rename
TestValidateDefaults to TestValidateKnownGoodConfig to accurately reflect its
behavior and avoid future drift. Ensure you reference the Config type and the
Validate() method in the updated test so it clearly uses the runtime/default
population path or is renamed to match intent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@config/config_test.go`:
- Around line 21-30: TestValidatePortZero currently only rejects "range"
warnings but doesn't assert there is no "conflict" warning when GRPCPort and
HTTPPort are both 0; update the test to fail if any warning contains
conflict-related text (e.g. "conflict", "in use", or "already") to ensure
Config.Validate (method Validate on type Config using fields GRPCPort and
HTTPPort) does not emit a port conflict for the ephemeral 0/0 case—locate
TestValidatePortZero and add a check iterating warnings that calls t.Errorf if
any warning indicates a conflict between ports.
- Around line 46-57: Update the TestValidateSamplingRatio test to assert the
identity of warnings rather than only count: call c.Validate() (uses
Config.Validate) and verify the returned warnings include entries that reference
both NewRelicOpentelemetrySample and OTLPSamplingRatio (the fields on Config) —
e.g. check each warning string contains the respective field name or expected
message for NewRelicOpentelemetrySample and OTLPSamplingRatio; fail the test if
either specific warning is missing. This ensures TestValidateSamplingRatio (the
test function) fails if either sampling validation stops producing its expected
warning.
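
One way to check warning identity rather than count is a small helper that reports which expected substrings are absent. This is only a sketch: `missingWarnings` is a hypothetical helper, not part of config_test.go, and the sample warning strings are the ones quoted in the earlier review diff.

```go
package main

import (
	"fmt"
	"strings"
)

// missingWarnings (hypothetical helper) returns every wanted substring that
// does not appear in any of the warnings.
func missingWarnings(warnings []string, wanted ...string) []string {
	var missing []string
	for _, want := range wanted {
		found := false
		for _, got := range warnings {
			if strings.Contains(got, want) {
				found = true
				break
			}
		}
		if !found {
			missing = append(missing, want)
		}
	}
	return missing
}

func main() {
	warnings := []string{
		"NewRelicOpentelemetrySample should be between 0.0 and 1.0",
		"OTLPSamplingRatio should be between 0.0 and 1.0",
	}
	// In a test this would become: if len(missing) > 0 { t.Errorf(...) }.
	missing := missingWarnings(warnings, "NewRelicOpentelemetrySample", "OTLPSamplingRatio")
	fmt.Println("missing:", missing)
}
```

The same helper could back the port-zero case by asserting that no warning contains conflict-related text.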


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3c7d32af-6afc-42f4-92b5-e030ab3c533c

📥 Commits

Reviewing files that changed from the base of the PR and between 03f6a28 and b7eb75f.

📒 Files selected for processing (3)
  • config/config.go
  • config/config_test.go
  • core.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • core.go
  • config/config.go

Comment thread config/config_test.go
Comment thread config/config_test.go
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 5 comments.



Comment thread initializers.go Outdated
Comment thread initializers.go Outdated
Comment thread go.mod
Comment thread core.go
Comment thread core.go Outdated
…ng, use impl.config

- Handle grpc.ErrServerStopped and nil error as expected shutdown signals
  in Run() to avoid force-stopping peer server during graceful shutdown
- Gate Jaeger deprecation warning on actual Jaeger configuration being
  present, avoiding log noise on every startup
- Fix sampling ratio comment to mention >1 is also defaulted
- Validate impl.config instead of the input param c for robustness
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

core.go:501

  • Stop() now blocks on httpServer.Shutdown(ctx) before starting gRPC graceful shutdown. Because both use the same timeout context, time spent draining HTTP directly reduces the remaining time available for GracefulStop(), which can cause gRPC to be force-stopped earlier than the configured shutdown duration. To preserve the full grace window for both servers, consider running HTTP shutdown and gRPC graceful stop concurrently (still waiting for both to finish before returning), or using separate child contexts with the same deadline.
	if c.httpServer != nil {
		if err := c.httpServer.Shutdown(ctx); err != nil {
			log.Error(context.Background(), "msg", "http server shutdown error", "err", err)
		}
	}
	if c.grpcServer != nil {
		timedCall(ctx, c.grpcServer.GracefulStop)
		c.grpcServer.Stop()


Comment thread initializers.go
Comment thread initializers.go
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.



Comment thread initializers.go
Comment thread core.go
Comment thread config/config.go
@ankurs ankurs merged commit 6afc999 into main Mar 23, 2026
16 checks passed
@ankurs ankurs deleted the feat/improvements branch March 23, 2026 06:07
