Skip to content

feat: jitter support and composable middleware chain#5

Merged
ankurs merged 28 commits into
mainfrom
feat/jitter-middleware-chain
Apr 24, 2026
Merged

feat: jitter support and composable middleware chain#5
ankurs merged 28 commits into
mainfrom
feat/jitter-middleware-chain

Conversation

@ankurs
Copy link
Copy Markdown
Member

@ankurs ankurs commented Apr 23, 2026

Summary

  • Replace WorkerContext interface with *WorkerInfo struct (private fields + getters) — middleware can't accidentally mutate worker metadata
  • Add CycleHandler, CycleFunc, Middleware types following gRPC interceptor convention
  • Add jitter support: WithJitter(percent), WithDefaultJitter(percent), WithInitialDelay(d) — desynchronizes periodic workers to prevent thundering herd
  • Add middleware/ sub-package with built-in interceptors: Recover, Tracing, Duration, Timeout, Slog, LogContext, DistributedLock, DefaultInterceptors
  • Slim Serve() — tracing, logging, and duration metrics move to middleware; core package drops go-coldbrew/log and go-coldbrew/tracing imports
  • Restart enabled by default; ErrDoNotRestart for explicit permanent completion
  • closingSupervisor wrapper ensures handler.Close() fires exactly once
  • WithTestChildren(ctx) option for testable WorkerInfo

Breaking changes (v0.0.x)

  • NewWorker(name, fn)NewWorker(name).HandlerFunc(fn)
  • func(WorkerContext) errorfunc(context.Context, *WorkerInfo) error
  • ctx.Name() / ctx.Attempt()info.GetName() / info.GetAttempt()
  • ctx.Add(w) / ctx.Remove(n)info.Add(w) / info.Remove(n)
  • ctx.Children()info.GetChildren()
  • ChannelWorker / BatchChannelWorker callback signatures updated
  • Restart is now the default (WithRestart(false) to disable)

Test plan

Replace WorkerContext with WorkerInfo (private fields + getters).
Add CycleHandler, CycleFunc, and Middleware types following
the gRPC interceptor convention.

Every() now stores interval as data — wrapping deferred to startup
where jitter config (WithJitter, WithDefaultJitter, WithInitialDelay)
is resolved. Uses time.NewTimer + Reset for variable intervals.

Slim Serve() — tracing, logging, and duration metrics move out of
the core package into the middleware/ sub-package:
  Recover, Tracing, Duration, Timeout, Slog, LogContext,
  DistributedLock, DefaultInterceptors

Core workers package drops go-coldbrew/log and go-coldbrew/tracing
imports; supervisor events use log/slog directly.

Builder API: NewWorker(name).HandlerFunc(fn).Every(d).WithJitter(10)
  .Interceptors(middleware.DefaultInterceptors()...)

Run options: WithInterceptors, AddInterceptors, WithDefaultJitter
Chain ordering: run-level → worker-level → handler.RunCycle
Copilot AI review requested due to automatic review settings April 23, 2026 07:39
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

The worker model was refactored from a tracing/WorkerContext approach to a cycle-oriented pipeline: handlers implement CycleHandler/CycleFunc, per-cycle Middleware interceptors compose around cycles, WorkerInfo replaces WorkerContext, and builder-style worker configuration (HandlerFunc, Every, jitter, Interceptors) plus a middleware package were added.

Changes

Cohort / File(s) Summary
Docs & README
AGENTS.md, README.md, middleware/README.md, middleware/doc.go
Rewrote docs to reflect WorkerContext → WorkerInfo change, new CycleHandler/CycleFunc/Middleware APIs, builder-style Worker construction (HandlerFunc, Every, WithJitter, WithInitialDelay, Interceptors), Run signatures now accept ...RunOption, and added middleware package docs.
Core types & Worker API
worker.go, worker_test.go
Removed WorkerContext; added WorkerInfo with child-management and getters; added CycleHandler, CycleFunc, Middleware; NewWorker is name-only and builder methods added (Handler, HandlerFunc, Interceptors, AddInterceptors, Every, WithJitter, WithInitialDelay, GetHandler, GetName).
Run orchestration
run.go, run_test.go
Pre-compose run-level + worker-level middleware chains into a CycleFunc per worker, apply timer/jitter wrapping, treat ErrDoNotRestart as non-restart sentinel, atomic attempt tracking, ensure handler.Close() is called exactly once, and switch to slog-based logging; added run-level interceptor options.
Helpers & Channel workers
helpers.go, helpers_test.go
EveryInterval reimplemented as timer+reset with jitter/initialDelay; function signatures changed to CycleFunc style; ChannelWorker/BatchChannelWorker now accept (context.Context, *WorkerInfo, ...), return ErrDoNotRestart on close, clamp sizes/durations, and adjust flush/stop semantics; added jitter-focused tests.
Middleware implementations
middleware/... (recover.go, logcontext.go, tracing.go, slog.go, duration.go, timeout.go, lock.go, defaults.go)
Added middleware: Recover, LogContext, Tracing, Slog, Duration, Timeout, DistributedLock (with Locker and LockOption helpers) and DefaultInterceptors() helper.
Middleware tests
middleware/*_test.go
Added unit tests for each middleware, DistributedLock behaviors, and DefaultInterceptors length.
Examples & integration tests
example_test.go, metrics_test.go, run_test.go, helpers_test.go
Updated examples/tests to new handler signature (context.Context, *WorkerInfo), use info.Add/Remove/GetChildren, added interceptor example, and adjusted metrics/test expectations for new behaviors (including ErrDoNotRestart handling).
Metrics API
metrics.go
Added ObserveRunDuration(name string, duration time.Duration) to Metrics interface and updated histogram help text to reflect attempt-lifetime semantics.
Build
go.mod
Bumped Go version directive from 1.26.1 to 1.26.2.

Sequence Diagram(s)

sequenceDiagram
    participant Supervisor as Supervisor
    participant Chain as Middleware Chain
    participant Handler as CycleHandler
    participant Locker as Locker (optional)

    Supervisor->>Supervisor: construct *WorkerInfo(name, attempt) and ctx
    Supervisor->>Chain: invoke composed Middleware(ctx, info, next)

    alt Recover present
        Chain->>Chain: Recover -> defer recover wrapper
    end
    alt LogContext present
        Chain->>Chain: LogContext -> inject worker metadata into ctx
    end
    alt Tracing present
        Chain->>Chain: Tracing -> start span, ctx'
    end
    alt DistributedLock present
        Chain->>Locker: Acquire(ctx', key, ttl)
        Locker-->>Chain: (acquired?, err)
        alt acquired
            Chain->>Handler: next.RunCycle(ctx', info)
            Handler-->>Chain: err
            Chain->>Locker: Release(ctx with TTL)
        else not acquired
            Chain-->>Supervisor: return (nil or onNotAcquired error)
        end
    else no lock
        Chain->>Handler: next.RunCycle(ctx', info)
        Handler-->>Chain: err
    end

    Chain-->>Supervisor: return err (ErrDoNotRestart treated as permanent-stop)
    Supervisor->>Handler: Close() once on shutdown
    Handler-->>Supervisor: closed ack
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

I hop through cycles, one by one,
Middleware chains now gleam like sun.
WorkerInfo holds names and young,
Interceptors hum — the work's begun! 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 38.10% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main changes: introducing jitter support and a composable middleware chain architecture, which are the primary objectives of this PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/jitter-middleware-chain

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the workers execution model to support a composable per-cycle middleware/interceptor chain (gRPC-style), adds configurable jitter/initial-delay for periodic workers to reduce synchronized load spikes, and replaces the mutable WorkerContext interface with an immutable-ish *WorkerInfo metadata struct plus child-management methods.

Changes:

  • Introduce WorkerInfo, CycleHandler/CycleFunc, and Middleware types; migrate worker APIs to NewWorker(name).Handler/HandlerFunc(...).
  • Add periodic jitter support via per-worker WithJitter/WithInitialDelay and run-level WithDefaultJitter.
  • Add middleware/ subpackage with built-in interceptors (Recover/Tracing/Timeout/Duration/LogContext/Slog/DistributedLock) and update docs/examples/tests accordingly.

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
worker.go Replaces WorkerContext with *WorkerInfo; adds handler + middleware types and builder-style worker configuration (interval/jitter/interceptors).
run.go Builds middleware chains at startup, resolves jitter, and simplifies Serve() responsibilities; adds run-level interceptors + default jitter options.
helpers.go Refactors helper workers to new CycleFunc signature and introduces timer-based jittered interval implementation.
worker_test.go Updates worker tests to new handler signature/API and adds unit tests for jitter/initial delay/interceptors config.
run_test.go Updates lifecycle tests for new signature and adds middleware ordering/close behavior tests plus default jitter smoke test.
helpers_test.go Updates helper tests to new signatures and adds jitter behavior tests.
metrics_test.go Updates metrics tests to new worker signature; adjusts expectations after duration observation moved out of core.
example_test.go Updates all examples to new NewWorker().HandlerFunc(...) pattern and adds an interceptor example.
README.md Regenerates/updates public docs to reflect new API, middleware concept, and WorkerInfo.
DESIGN.md Adds design document describing the jitter + middleware architecture and rationale.
AGENTS.md Updates repo overview/docs to reflect new key types and middleware subpackage.
middleware/doc.go Adds package docs for middleware subpackage and how to wire interceptors.
middleware/defaults.go Adds DefaultInterceptors() convenience bundle.
middleware/defaults_test.go Tests default interceptor bundle size.
middleware/recover.go Adds per-cycle panic recovery middleware.
middleware/recover_test.go Tests Recover behavior (panic capture, nil callback, passthrough).
middleware/tracing.go Adds per-cycle tracing span middleware.
middleware/tracing_test.go Tests Tracing passthrough/error propagation.
middleware/timeout.go Adds per-cycle timeout/deadline middleware.
middleware/timeout_test.go Tests timeout cancellation and fast-path completion.
middleware/duration.go Adds per-cycle duration measurement middleware hook.
middleware/duration_test.go Tests duration observer invocation on success and error.
middleware/logcontext.go Adds middleware to inject worker metadata into logging context.
middleware/logcontext_test.go Tests LogContext passthrough/error propagation.
middleware/slog.go Adds cycle logging middleware via go-coldbrew/log.
middleware/slog_test.go Tests Slog passthrough/error propagation.
middleware/lock.go Adds distributed lock middleware + options.
middleware/lock_test.go Tests lock acquire/release behavior and option hooks.
middleware/README.md Adds generated API reference for middleware subpackage.
Comments suppressed due to low confidence (1)

helpers.go:90

  • BatchChannelWorker creates a time.Timer but never stops it on return paths (ctx cancellation, channel close, error). Adding defer timer.Stop() (and draining when needed) avoids keeping an active timer around after the worker exits and matches the careful stop/drain handling already present in the maxSize flush path.
		batch := make([]T, 0, maxSize)
		timer := time.NewTimer(maxDelay)
		timer.Stop() // don't start until first item


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread run.go
Comment thread run.go
Comment thread helpers.go
Comment thread worker.go Outdated
Comment thread README.md Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
README.md (1)

10-10: ⚠️ Potential issue | 🟡 Minor

Update the generated README tagline for the middleware model.

The package no longer has tracing as a core dependency/default; tracing is now optional middleware. Update the README source/template and rerun make doc.

📝 Proposed wording
-**A worker lifecycle library for Go — manage background goroutines with panic recovery, configurable restart, tracing, and structured shutdown.**
+**A worker lifecycle library for Go — manage background goroutines with panic recovery, configurable restart, composable middleware, jitter, and structured shutdown.**

Based on learnings, Always run make doc after changing exported APIs or docstrings.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` at line 10, Update the generated README tagline to remove tracing
as a core feature and present it as optional middleware: edit the README
source/template (the README.md generation template) to change the line "**A
worker lifecycle library for Go — manage background goroutines with panic
recovery, configurable restart, tracing, and structured shutdown.**" to wording
that lists tracing as optional middleware (e.g., mention "middleware for
tracing" or "optional tracing middleware") and then rerun `make doc` to
regenerate docs so the README reflects the new middleware model.
🧹 Nitpick comments (2)
middleware/slog.go (1)

14-19: Keep attempt on all cycle log records.

cycle error and cycle end omit attempt, so Slog() used without LogContext() loses retry/cycle correlation after the start log.

Proposed logging consistency tweak
 	log.Info(ctx, "msg", "cycle start", "worker", info.Name(), "attempt", info.Attempt())
 	err := next(ctx, info)
 	if err != nil {
-		log.Error(ctx, "msg", "cycle error", "worker", info.Name(), "error", err)
+		log.Error(ctx, "msg", "cycle error", "worker", info.Name(), "attempt", info.Attempt(), "error", err)
 	} else {
-		log.Info(ctx, "msg", "cycle end", "worker", info.Name())
+		log.Info(ctx, "msg", "cycle end", "worker", info.Name(), "attempt", info.Attempt())
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@middleware/slog.go` around lines 14 - 19, The cycle log records omit the
"attempt" field on the "cycle error" and "cycle end" logs, breaking correlation
with the initial "cycle start" entry; update the calls to log.Error and the
final log.Info in middleware/slog.go to include the "attempt" key and
info.Attempt() value (same as the "cycle start" call), i.e., change the
log.Error(ctx, "msg", "cycle error", "worker", info.Name(), "error", err) and
log.Info(ctx, "msg", "cycle end", "worker", info.Name()) to include "attempt",
info.Attempt() so all three cycle logs carry the attempt id (references:
log.Info, log.Error, info.Attempt(), next(ctx, info)).
middleware/defaults_test.go (1)

9-12: Strengthen this beyond a length check.

This would still pass if one interceptor is replaced, duplicated, or reordered. Since the order is part of the API contract, consider adding a behavior smoke test around the returned chain, especially that Recover catches downstream panics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@middleware/defaults_test.go` around lines 9 - 12, Update
TestDefaultInterceptors_ReturnsFour to assert the exact interceptor sequence and
behavior: call DefaultInterceptors() and verify the returned slice length and
that elements are the expected interceptors in order (e.g., Recover, LogContext,
Tracing, Slog) by comparing either concrete types or sentinel identifiers rather
than just length, then add a small smoke test that composes the chain and
asserts that a downstream handler which panics is recovered by the Recover
interceptor (i.e., the request does not crash and an error/response is
produced), referencing DefaultInterceptors and Recover to locate where to add
these assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@AGENTS.md`:
- Around line 40-46: The docs currently imply WorkerInfo exposes exported fields
`Name` and `Attempt`; update the wording to state that WorkerInfo has private
fields and provides accessor methods `Name()` and `Attempt()`, and clarify that
child management is via methods `Add`, `Remove`, and `Children()` (not fields);
also adjust the Key Types list to describe `WorkerInfo` as "struct with private
fields and methods `Name()`, `Attempt()`, and child-management methods `Add`,
`Remove`, `Children()`" so readers understand these are methods rather than
exported fields.

In `@DESIGN.md`:
- Around line 45-60: The doc examples show exported fields Name and Attempt but
the real API exposes getters Name() and Attempt(); update the snippets to either
(a) remove the exported fields from the WorkerInfo example and show only the
unexported backing fields, or (b) keep the example struct but change all usage
sites to call info.Name() and info.Attempt() instead of info.Name and
info.Attempt; apply this same change to the other occurrences of WorkerInfo
examples and usages (replace direct field access with the Name()/Attempt()
accessor calls and keep references to methods like Add, Remove, Children
unchanged).

In `@helpers.go`:
- Around line 26-35: Normalize and clamp the base interval once before starting
the timer loop to avoid a zero-duration hot loop: compute a normalizedBase :=
max(base, time.Millisecond) at the top of everyIntervalWithJitter and use
normalizedBase inside computeInterval (instead of raw base), and apply the same
normalization to the sibling interval helper (the other
everyInterval/EveryInterval function in the same file) so no timer is ever reset
with a non-positive duration.
- Around line 85-89: The BatchChannelWorker function can panic when called with
maxSize <= 0 due to make([]T, 0, maxSize); before returning the CycleFunc,
validate maxSize (e.g., if maxSize <= 0 then set to a sensible minimum like 1 or
return an error/ panic with a clear message) and only allocate the batch buffer
after that check; update the BatchChannelWorker signature body to guard/clamp
maxSize prior to calling make([]T, 0, maxSize) so batch is never created with a
non-positive capacity.

In `@middleware/lock_test.go`:
- Around line 14-22: The test TestDistributedLock_CustomKeyAndTTL currently only
asserts the release key and may miss regressions that ignore WithTTLFunc; modify
the mockLocker.Acquire implementation to capture the ttl argument passed in
(e.g., store it on mockLocker as acquiredTTL) when Acquire(ctx, key, ttl) is
called, then in TestDistributedLock_CustomKeyAndTTL assert that
mockLocker.acquiredTTL equals the expected custom TTL returned by WithTTLFunc;
apply the same TTL-capture-and-assert pattern to the other affected test block
(lines ~103-119) so tests fail if the custom TTL is not forwarded to Acquire.

In `@middleware/lock.go`:
- Around line 71-73: The defer currently swallows errors from locker.Release;
change the defer to capture the release error and propagate it (either return it
or merge it with the function's existing error): replace the anonymous discard
with something like err := locker.Release(context.WithoutCancel(ctx), key) and
if err != nil then set/append it to the function's named return error (or call a
provided join/WrapError helper) so Release failures on
locker.Release(context.WithoutCancel(ctx), key) are not silently dropped.

In `@middleware/README.md`:
- Around line 98-105: Update the Slog docstring in the Slog function's source
(the exported func Slog()) to state that it logs via the standard log/slog
package instead of go-coldbrew/log, then regenerate the markdown docs by running
make doc so middleware/README.md reflects the corrected text; ensure the updated
comment mentions log/slog and retains the behavior note ("Logs at Info on
success, Error on failure").
- Around line 56-60: The fenced code block in the generated README lacks a
language tag; update the source/generator input that produces the example (the
generator/template that emits the snippet referencing workers.Run and
middleware.DefaultInterceptors) to use a "go" code fence, then regenerate docs
(run make doc) so the README gets updated; do this in the generator/exported
docstring changes rather than editing middleware/README.md directly.

In `@middleware/recover.go`:
- Around line 14-20: The recover block should set retErr first and then call
onPanic inside a protected wrapper so a panic inside the callback cannot
re-propagate; specifically, when v := recover() != nil, assign retErr =
fmt.Errorf("panic in worker %s: %v", info.Name(), v) before invoking onPanic,
and invoke onPanic(info.Name(), v) inside its own small func with a
defer-recover (or equivalent try/catch) so any panic from onPanic is caught and
does not escape the middleware.

In `@README.md`:
- Around line 578-583: The README example still says "ticker loop" but
EveryInterval now uses time.NewTimer/Reset; update the example wording in the
EveryInterval docstring and README to use "timer loop" (or "uses time.NewTimer
and Reset") and adjust any example description that mentions "ticker" to "timer"
or "timer loop" so it matches the implemented behavior (search for EveryInterval
and its example block in the docs), then regenerate docs by running make doc.

In `@run.go`:
- Around line 103-111: WorkerInfo.Add currently calls info.sup.Add(w) and thus
bypasses run-level wiring; change it to call addWorkerToSupervisor(info.sup, w,
info.cfg, info.active) and store the returned token on the WorkerInfo (instead
of the raw worker), so dynamically added children go through the
interceptor/jitter/metrics/active-count/subtree pipeline; ensure you null-check
info.cfg and info.active (tests may construct WorkerInfo manually) before
passing them to addWorkerToSupervisor and apply the same replacement in the
other Add implementation referenced (the block covering lines 149-187) so all
dynamic additions use the helper and its returned token.

In `@worker.go`:
- Around line 41-44: The package comment references a non-existent exported
helper EveryIntervalWithJitter; update the comment to remove that broken link
and point users to the supported API by replacing the line mentioning
EveryIntervalWithJitter with guidance to use Worker.Every(...).WithJitter(...),
or simply drop the mention and keep the list as: EveryInterval,
Worker.Every(...).WithJitter(...), and ChannelWorker so generated docs no longer
contain a dead link.
- Around line 70-79: The Add/Remove sequence on WorkerInfo.children is not
atomic and can race; serialize per-child operations by introducing a per-child
lock mechanism (e.g., a WorkerInfo.childLocks *sync.Map mapping child name →
*sync.Mutex) and acquire the lock for the given name inside Add (around
LoadAndDelete → Remove → addWorkerToSupervisor → Store) and Remove (around
Load/Delete/stop) so only one goroutine modifies that child's supervisor
membership at a time; update functions Add, Remove, and any other places
touching WorkerInfo.children to use this per-name lock when manipulating
suture.ServiceToken entries.
- Around line 222-230: The Interceptors and AddInterceptors methods currently
assign/append the caller's mw slice directly, allowing external mutation to
affect w.interceptors; fix by copying the incoming slice(s) before
storing/appending: in Interceptors set w.interceptors =
append([]Middleware(nil), mw...) and in AddInterceptors set w.interceptors =
append(w.interceptors, append([]Middleware(nil), mw...)...), referencing the
Interceptors and AddInterceptors methods and the w.interceptors field.

---

Outside diff comments:
In `@README.md`:
- Line 10: Update the generated README tagline to remove tracing as a core
feature and present it as optional middleware: edit the README source/template
(the README.md generation template) to change the line "**A worker lifecycle
library for Go — manage background goroutines with panic recovery, configurable
restart, tracing, and structured shutdown.**" to wording that lists tracing as
optional middleware (e.g., mention "middleware for tracing" or "optional tracing
middleware") and then rerun `make doc` to regenerate docs so the README reflects
the new middleware model.

---

Nitpick comments:
In `@middleware/defaults_test.go`:
- Around line 9-12: Update TestDefaultInterceptors_ReturnsFour to assert the
exact interceptor sequence and behavior: call DefaultInterceptors() and verify
the returned slice length and that elements are the expected interceptors in
order (e.g., Recover, LogContext, Tracing, Slog) by comparing either concrete
types or sentinel identifiers rather than just length, then add a small smoke
test that composes the chain and asserts that a downstream handler which panics
is recovered by the Recover interceptor (i.e., the request does not crash and an
error/response is produced), referencing DefaultInterceptors and Recover to
locate where to add these assertions.

In `@middleware/slog.go`:
- Around line 14-19: The cycle log records omit the "attempt" field on the
"cycle error" and "cycle end" logs, breaking correlation with the initial "cycle
start" entry; update the calls to log.Error and the final log.Info in
middleware/slog.go to include the "attempt" key and info.Attempt() value (same
as the "cycle start" call), i.e., change the log.Error(ctx, "msg", "cycle
error", "worker", info.Name(), "error", err) and log.Info(ctx, "msg", "cycle
end", "worker", info.Name()) to include "attempt", info.Attempt() so all three
cycle logs carry the attempt id (references: log.Info, log.Error,
info.Attempt(), next(ctx, info)).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 72f697d3-5aa3-4e46-9ae8-94bbba776839

📥 Commits

Reviewing files that changed from the base of the PR and between 0c62e84 and 20fbfc1.

📒 Files selected for processing (29)
  • AGENTS.md
  • DESIGN.md
  • README.md
  • example_test.go
  • helpers.go
  • helpers_test.go
  • metrics_test.go
  • middleware/README.md
  • middleware/defaults.go
  • middleware/defaults_test.go
  • middleware/doc.go
  • middleware/duration.go
  • middleware/duration_test.go
  • middleware/lock.go
  • middleware/lock_test.go
  • middleware/logcontext.go
  • middleware/logcontext_test.go
  • middleware/recover.go
  • middleware/recover_test.go
  • middleware/slog.go
  • middleware/slog_test.go
  • middleware/timeout.go
  • middleware/timeout_test.go
  • middleware/tracing.go
  • middleware/tracing_test.go
  • run.go
  • run_test.go
  • worker.go
  • worker_test.go

Comment thread AGENTS.md
Comment thread DESIGN.md Outdated
Comment thread helpers.go
Comment thread helpers.go
Comment thread middleware/lock_test.go Outdated
Comment thread README.md Outdated
Comment thread run.go
Comment thread worker.go
Comment thread worker.go Outdated
Comment thread worker.go
ankurs added 5 commits April 23, 2026 16:18
- Guard non-positive base interval (clamp to 1ms) in everyIntervalWithJitter
- Guard negative maxSize in BatchChannelWorker (clamp to 1)
- Protect onPanic callback in Recover middleware (nested recover)
- Propagate lock release errors via errors.Join instead of silently dropping
- Copy interceptor slices in Worker.Interceptors to prevent caller mutation
- Serialize Add/Remove with mutex (replace sync.Map with plain map)
- Fix handler.Close() lifecycle: called once on shutdown via closerService,
  not per-restart attempt
- Remove broken EveryIntervalWithJitter doc link from package comment
- Fix AGENTS.md: Name/Attempt are methods not fields
- Verify TTL in lock_test.go custom key/TTL test
- Remove DESIGN.md (implementation complete, spec was outdated)
Store *Worker alongside suture.ServiceToken in children map via
childEntry struct. Add WorkerInfo.Child(name) to look up a running
child's Worker. Add Worker.GetName() and Worker.GetHandler() for
inspecting child workers.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
run.go (1)

134-139: ⚠️ Potential issue | 🟠 Major

Honor WithRestart(false) for all failure cases.

The current condition only suppresses restart when a worker explicitly disables restarts and exits cleanly or is cancelled. When a worker with WithRestart(false) returns a non-nil error, the function returns err, causing the supervisor to restart the worker despite the explicit opt-out.

🛠️ Proposed fix
-	// Suppress restart when the worker doesn't want it and either exited
-	// cleanly or the context was cancelled (graceful shutdown).
-	if !ws.w.restartOnFail && (err == nil || ctx.Err() != nil) {
+	// Suppress restart on clean exits, graceful shutdown, or when restart
+	// on failure is disabled.
+	if err == nil || ctx.Err() != nil || !ws.w.restartOnFail {
 		return suture.ErrDoNotRestart
 	}
 	return err
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@run.go` around lines 134 - 139, The restart suppression should honor
WithRestart(false) for all exit cases: change the conditional that checks
ws.w.restartOnFail so that if ws.w.restartOnFail is false the function returns
suture.ErrDoNotRestart unconditionally (regardless of err or ctx.Err()), instead
of only when err == nil || ctx.Err() != nil; update the conditional around the
return in the run.go block handling worker exit to always return
suture.ErrDoNotRestart when ws.w.restartOnFail is false.
♻️ Duplicate comments (3)
README.md (1)

582-582: ⚠️ Potential issue | 🟡 Minor

Use timer wording consistently.

This example still says “ticker loop,” but the helper is now timer-based. Update the example source and regenerate the README. As per coding guidelines, **/*.go: Always run make doc after changing exported APIs or docstrings.

📝 Expected generated output
-EveryInterval wraps a function in a ticker loop.
+EveryInterval wraps a function in a timer loop.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` at line 582, Update the README example language and source to use
"timer" instead of "ticker" for the EveryInterval helper: locate the
EveryInterval documentation/example and replace "ticker loop" wording and any
example code/comments that reference a ticker with "timer" (or timer-based)
equivalents; then regenerate docs by running make doc for **/*.go changes so the
README is rebuilt. Ensure references to the EveryInterval symbol and its example
reflect the timer-based behavior consistently.
middleware/README.md (2)

56-60: ⚠️ Potential issue | 🟡 Minor

Add the generated fence language.

This generated usage block still trips MD040. Update the source/template to emit a go fence, then regenerate the README instead of editing this file directly. As per coding guidelines, **/*.go: Always run make doc after changing exported APIs or docstrings.

📝 Expected generated output
-```
+```go
 workers.Run(ctx, myWorkers,
     workers.WithInterceptors(middleware.DefaultInterceptors()...),
 )
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @middleware/README.md around lines 56 - 60, The README usage code block is
missing a language fence and tripping MD040; update the doc generation template
that emits the example for workers.Run / workers.WithInterceptors /
middleware.DefaultInterceptors so it emits a go fence (not plain ) and
then regenerate the docs instead of editing middleware/README.md directly; after
updating the template or exported docstring, run the recommended docs generation
(make doc) to produce the updated README with the go-fenced code block.


</details>

---

`105-105`: _⚠️ Potential issue_ | _🟡 Minor_

**Fix the stale `Slog` package reference.**

`Slog` is documented as logging via `go-coldbrew/log`, which conflicts with the standard `log/slog` middleware direction. Update the source doc comment and regenerate. As per coding guidelines, `**/*.go`: Always run `make doc` after changing exported APIs or docstrings.



<details>
<summary>📝 Expected generated output</summary>

```diff
-Slog logs each cycle via go\-coldbrew/log. Logs at Info on success, Error on failure.
+Slog logs each cycle via log/slog. Logs at Info on success, Error on failure.
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@middleware/README.md` at line 105, The README mentions Slog logging via
"go-coldbrew/log" which is stale; update the Slog package references and doc
comments to point to the standard "log/slog" middleware direction (search for
"Slog" and the phrase "go-coldbrew/log" in middleware/README.md and any exported
docstrings in the middleware package), then regenerate docs by running make doc
for the changed Go files (per guideline: run make doc after changing exported
APIs/docstrings).
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @middleware/lock.go:

  • Around line 73-76: The defer currently calls locker.Release with
    context.WithoutCancel(ctx) which removes deadlines and can hang; change it to
    create a bounded context (e.g., ctx, cancel :=
    context.WithTimeout(context.Background(), cfg.LockReleaseTimeout) or
    context.WithTimeout(ctx, cfg.LockReleaseTimeout) then defer cancel) and pass
    that to locker.Release so release has a short, configurable timeout; update the
    surrounding deferred block referencing locker.Release, key and retErr to use the
    new timeout context and ensure cancel is called to avoid leaks.

In @README.md:

  • Around line 52-59: The markdown code fence for the middleware example (the
    myMiddleware snippet) is missing a language tag; update the gomarkdoc
    source/template that generates that fence so it emits ```go instead of an
    untyped fence, ensuring the middleware example (func myMiddleware(ctx
    context.Context, info *workers.WorkerInfo, next workers.CycleFunc) error { ...
    }) is rendered as Go; after updating the template regenerate docs by running
    make doc as per the repo guidelines.

In @run.go:

  • Around line 33-36: WithInterceptors currently assigns the caller's slice
    directly to runConfig.interceptors which can lead to accidental mutation; update
    the WithInterceptors function so it creates a copy of the provided mw slice
    (e.g., allocate a new slice with make(len(mw)) and copy(mw, dst)) and store that
    copy into c.interceptors to avoid sharing the caller's backing array with future
    children that read info.cfg.interceptors.
  • Around line 118-128: The attempt creates a new WorkerInfo (name, attempt,
    children map) but children added via info.Add are attached to the long-lived
    ws.childSup, so when ws.runFn returns you must stop all attempt-scoped children
    to avoid them persisting across restarts: after err := ws.runFn(ctx, info)
    iterate over info.children (map[string]suture.ServiceToken) and call Stop() (or
    the appropriate shutdown method on suture.ServiceToken) for each token, wait for
    their termination if available, then clear the map before returning so
    previous-attempt children are not left running when the parent restarts.

Outside diff comments:
In @run.go:

  • Around line 134-139: The restart suppression should honor WithRestart(false)
    for all exit cases: change the conditional that checks ws.w.restartOnFail so
    that if ws.w.restartOnFail is false the function returns suture.ErrDoNotRestart
    unconditionally (regardless of err or ctx.Err()), instead of only when err ==
    nil || ctx.Err() != nil; update the conditional around the return in the run.go
    block handling worker exit to always return suture.ErrDoNotRestart when
    ws.w.restartOnFail is false.

Duplicate comments:
In @middleware/README.md:

  • Around line 56-60: The README usage code block is missing a language fence and
    tripping MD040; update the doc generation template that emits the example for
    workers.Run / workers.WithInterceptors / middleware.DefaultInterceptors so it
    emits a go fence (not plain ) and then regenerate the docs instead of
    editing middleware/README.md directly; after updating the template or exported
    docstring, run the recommended docs generation (make doc) to produce the updated
    README with the go-fenced code block.
  • Line 105: The README mentions Slog logging via "go-coldbrew/log" which is
    stale; update the Slog package references and doc comments to point to the
    standard "log/slog" middleware direction (search for "Slog" and the phrase
    "go-coldbrew/log" in middleware/README.md and any exported docstrings in the
    middleware package), then regenerate docs by running make doc for the changed Go
    files (per guideline: run make doc after changing exported APIs/docstrings).

In @README.md:

  • Line 582: Update the README example language and source to use "timer" instead
    of "ticker" for the EveryInterval helper: locate the EveryInterval
    documentation/example and replace "ticker loop" wording and any example
    code/comments that reference a ticker with "timer" (or timer-based) equivalents;
    then regenerate docs by running make doc for **/*.go changes so the README is
    rebuilt. Ensure references to the EveryInterval symbol and its example reflect
    the timer-based behavior consistently.

</details>

<details>
<summary>🪄 Autofix (Beta)</summary>

Fix all unresolved CodeRabbit comments on this PR:

- [ ] <!-- {"checkboxId": "4b0d0e0a-96d7-4f10-b296-3a18ea78f0b9"} --> Push a commit to this branch (recommended)
- [ ] <!-- {"checkboxId": "ff5b1114-7d8c-49e6-8ac1-43f82af23a33"} --> Create a new PR with the fixes

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: defaults

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `27c78dda-49d6-443d-971d-2fc144957f7d`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 20fbfc1505ffad20b142eed9eecb1ec3b969b1b8 and 4f9f6b6282ff7ad2fec9221d77698a866e5a8643.

</details>

<details>
<summary>📒 Files selected for processing (11)</summary>

* `AGENTS.md`
* `Makefile`
* `README.md`
* `helpers.go`
* `middleware/README.md`
* `middleware/lock.go`
* `middleware/lock_test.go`
* `middleware/recover.go`
* `run.go`
* `run_test.go`
* `worker.go`

</details>

<details>
<summary>🚧 Files skipped from review as they are similar to previous changes (2)</summary>

* middleware/lock_test.go
* run_test.go

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Comment thread middleware/lock.go
Comment thread README.md
Comment thread run.go
Comment thread run.go
ankurs added 2 commits April 23, 2026 17:24
- Clean up children on restart: defer removal of all attempt-scoped
  children from childSup so they don't leak across suture restarts
- Lock release timeout: wrap release context with WithTimeout(ttl)
  to prevent stuck Release from blocking shutdown indefinitely
- Copy run-level interceptors in WithInterceptors to prevent
  caller-side slice mutation
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

helpers.go:92

  • BatchChannelWorker stops the newly created timer without draining timer.C if Stop() returns false. If maxDelay <= 0 (or the timer fires before Stop), a value may remain in the channel and cause an unexpected immediate <-timer.C case (flush with an empty batch). Consider draining timer.C when the initial Stop() fails, similar to the later Stop() handling.
	return func(ctx context.Context, info *WorkerInfo) error {
		batch := make([]T, 0, maxSize)
		timer := time.NewTimer(maxDelay)
		timer.Stop() // don't start until first item


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread middleware/lock.go
Comment thread run.go
Comment thread run.go Outdated
Comment thread helpers.go
Comment thread middleware/duration.go
Comment thread middleware/lock.go
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
helpers.go (1)

102-107: ⚠️ Potential issue | 🟠 Major

Return partial-flush errors on cancellation.

Line 106 drops flush() failures, so a cancelled worker can silently lose the final batch even though the helper promises to flush it.

🐛 Proposed fix
 import (
 	"context"
+	"errors"
 	"math/rand/v2"
 	"time"
 )
@@
 			case <-ctx.Done():
 				// Flush remaining items before exit.
-				_ = flush()
+				if err := flush(); err != nil {
+					return errors.Join(ctx.Err(), err)
+				}
 				return ctx.Err()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helpers.go` around lines 102 - 107, The select branch handling ctx.Done()
currently discards errors from flush(), so update that branch to call err :=
flush() and if err != nil return that error (or wrap it with context info),
otherwise return ctx.Err(); i.e., replace the "_ = flush()" line in the loop's
ctx.Done() case with logic that returns the flush error when present
(referencing the flush() call and the ctx.Done()/ctx.Err() handling).
run.go (1)

151-156: ⚠️ Potential issue | 🔴 Critical

Suppress restarts whenever WithRestart(false) is configured.

Line 153 returns ordinary handler errors to suture when restartOnFail is false, allowing a worker to restart despite the no-restart configuration. Additionally, clean exits should never be restartable regardless of the restartOnFail setting—the flag should only control whether errors trigger restarts.

Proposed fix
-	// Suppress restart when the worker doesn't want it and either exited
-	// cleanly or the context was cancelled (graceful shutdown).
-	if !ws.w.restartOnFail && (err == nil || ctx.Err() != nil) {
+	// Suppress restart on clean exits, graceful shutdown, or when restarts are disabled.
+	if err == nil || ctx.Err() != nil || !ws.w.restartOnFail {
 		return suture.ErrDoNotRestart
 	}
 	return err
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@run.go` around lines 151 - 156, The current logic only suppresses restarts
for clean exits or cancelled contexts when ws.w.restartOnFail is false, but
still returns ordinary handler errors which allows restarts despite
WithRestart(false); change the condition so that if ws.w.restartOnFail is false
OR the handler exited cleanly (err == nil) OR the context was cancelled
(ctx.Err() != nil) you always return suture.ErrDoNotRestart. In other words,
update the return logic around ws.w.restartOnFail, err, and ctx.Err() so that
restartOnFail only enables restarts for non-nil errors when the context wasn't
cancelled; reference the symbols ws.w.restartOnFail, err, ctx.Err(), and
suture.ErrDoNotRestart when making this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@middleware/lock.go`:
- Around line 62-76: Validate the TTL returned by cfg.ttlFunc(info.Name())
before calling locker.Acquire and before using context.WithTimeout: compute ttl
:= cfg.ttlFunc(...), if ttl <= 0 return a clear error (e.g. fmt.Errorf("invalid
ttl %v", ttl)) or normalize it to a safe positive duration, then pass that
validated/normalized ttl to locker.Acquire and reuse it in the defer release
block when creating releaseCtx with context.WithTimeout; ensure the code paths
referencing cfg.ttlFunc, locker.Acquire, locker.Release, and context.WithTimeout
all use the validated value.

In `@run.go`:
- Around line 84-98: closerService currently calls handler.Close() as soon as
the supervisor ctx is cancelled which can race with handler.RunCycle; add
explicit coordination so Close cannot run concurrently with an active RunCycle:
introduce a shared synchronization primitive (e.g., a sync.WaitGroup field or a
doneCycle chan struct{} on the same struct that holds closerService and
workerRunService) and have workerRunService signal when handler.RunCycle returns
(e.g., defer wg.Done() or close(doneCycle)) and have closerService.Serve block
waiting for that signal (wg.Wait() or <-doneCycle) before calling
handler.Close(); update references to handler.Close and handler.RunCycle to use
this new coordination mechanism so Close always waits for the active RunCycle to
finish.

In `@worker.go`:
- Around line 113-122: The current code mutates the passed-in worker by setting
w.metrics = info.metrics which causes the same *Worker to retain parent metrics
across uses; instead avoid changing w.metrics — either pass info.metrics into
addWorkerToSupervisor (or make addWorkerToSupervisor accept an explicit metrics
param) and keep the parent metrics in the childEntry (or a new wrapper) rather
than writing back to w, or clone the metrics into a new metrics object used by
the supervisor before calling addWorkerToSupervisor; update references to
w.metrics usage accordingly so the original Worker instance is not mutated.

---

Outside diff comments:
In `@helpers.go`:
- Around line 102-107: The select branch handling ctx.Done() currently discards
errors from flush(), so update that branch to call err := flush() and if err !=
nil return that error (or wrap it with context info), otherwise return
ctx.Err(); i.e., replace the "_ = flush()" line in the loop's ctx.Done() case
with logic that returns the flush error when present (referencing the flush()
call and the ctx.Done()/ctx.Err() handling).

In `@run.go`:
- Around line 151-156: The current logic only suppresses restarts for clean
exits or cancelled contexts when ws.w.restartOnFail is false, but still returns
ordinary handler errors which allows restarts despite WithRestart(false); change
the condition so that if ws.w.restartOnFail is false OR the handler exited
cleanly (err == nil) OR the context was cancelled (ctx.Err() != nil) you always
return suture.ErrDoNotRestart. In other words, update the return logic around
ws.w.restartOnFail, err, and ctx.Err() so that restartOnFail only enables
restarts for non-nil errors when the context wasn't cancelled; reference the
symbols ws.w.restartOnFail, err, ctx.Err(), and suture.ErrDoNotRestart when
making this change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9ea8b1f1-a5c0-44f9-a9bf-b5a7ddcb1328

📥 Commits

Reviewing files that changed from the base of the PR and between 4f9f6b6 and e1ca449.

📒 Files selected for processing (6)
  • README.md
  • go.mod
  • helpers.go
  • middleware/lock.go
  • run.go
  • worker.go
✅ Files skipped from review due to trivial changes (1)
  • go.mod

Comment thread middleware/lock.go
Comment thread run.go Outdated
Comment thread worker.go Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

run.go:156

  • restartOnFail is currently ignored when the handler returns a non-nil error: workerRunService.Serve will return that error, causing suture to restart the worker even when WithRestart(false) (default) is in effect. This contradicts the WithRestart doc comment (“restarted on failure”) and can lead to unexpected restart loops for workers that intentionally return an error to stop.

Adjust the suppression logic so that when restartOnFail is false, the service returns suture.ErrDoNotRestart even on non-context errors (while still recording WorkerFailed). If you still want an error surfaced, log it or expose it via metrics rather than triggering a restart.

	// Suppress restart when the worker doesn't want it and either exited
	// cleanly or the context was cancelled (graceful shutdown).
	if !ws.w.restartOnFail && (err == nil || ctx.Err() != nil) {
		return suture.ErrDoNotRestart
	}
	return err

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread helpers.go
ankurs added 3 commits April 23, 2026 19:05
Workers now restart on failure by default. Use WithRestart(false)
for one-shot workers.

ChannelWorker and BatchChannelWorker return ErrDoNotRestart when
the channel is closed — prevents infinite restart loop on exhausted
channels.

ErrDoNotRestart is exported so handlers can signal permanent
completion (e.g., work exhausted, channel closed).
- Remove redundant worker/attempt fields from Slog (LogContext injects them)
- Extract removeLocked helper to deduplicate Add/Remove/cleanup
- Rename GetName → Name (Go convention, no Get prefix)
- Add defensive defer timer.Stop in BatchChannelWorker
- Add WHY comment on dropped flush error during shutdown
@ankurs ankurs requested a review from Copilot April 23, 2026 11:25
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread worker.go
Comment thread helpers.go
Comment thread helpers_test.go
Comment thread README.md Outdated
ankurs added 2 commits April 23, 2026 22:41
closerService now waits for the active RunCycle to finish via a
shared sync.WaitGroup before calling handler.Close(). Prevents
concurrent access to the handler when the supervisor cancels
both services simultaneously.

Also fix "ticker loop" → "timer loop" in EveryInterval example.
@ankurs ankurs requested a review from Copilot April 23, 2026 16:17
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread helpers.go
Comment thread worker.go
Comment thread run.go Outdated
Close() was only called from Serve() when permanentStop was detected.
But when suture exhausts the failure threshold (e.g. 5 failures), it
stops restarting without cancelling the context — Serve() returned
with permanentStop=false and Close() never fired.

Fix: share a sync.Once-wrapped close func between the workerRunService
and the suture event hook. The event hook fires onPermanentStop when
EventServiceTerminate has Restarting=false, covering the suture
exhaustion case. Both paths call the same func — sync.Once guarantees
exactly-once execution.

Also add worker name to Close error log.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
run.go (1)

14-17: Run make doc before merging — exported API surface has expanded.

This file adds new exported symbols (ErrDoNotRestart, AddInterceptors, WithDefaultJitter) and new docstrings on WithMetrics / WithInterceptors. Since README.md is auto-generated via gomarkdoc, please regenerate it so the published docs reflect the new run-level options and the sentinel semantics. The PR description already notes "CI pipeline and docs update pending" — flagging for traceability.

As per coding guidelines: "Always run make doc after changing exported APIs or docstrings".

Also applies to: 39-62

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@run.go` around lines 14 - 17, The exported API surface changed (new symbols
ErrDoNotRestart, AddInterceptors, WithDefaultJitter and updated docstrings on
WithMetrics/WithInterceptors), so regenerate the public docs and README produced
by gomarkdoc: run make doc to rebuild documentation (updating README.md and any
generated docs), review the diff to ensure the new docstrings and exported
symbols are captured, and commit the regenerated files alongside the code
changes so the published docs reflect the new run-level options and sentinel
semantics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Line 1113: The README example references a non-existent method name
"Children"; update the Go docstring/example that generates this section to use
the correct API method GetChildren() instead of Children (e.g., change any
example or comment using WorkerInfo.Children to WorkerInfo.GetChildren()). After
editing the Go source comment (the docstring used by gomarkdoc), re-run make doc
to regenerate README.md so the example aligns with the actual API
(WorkerInfo.Add, WorkerInfo.Remove, WorkerInfo.GetChildren).
- Line 336: The README line uses the old/internal term RestartOnFail; update the
Go doc comment for the RunWorker function to use the public API term WithRestart
(e.g., “Blocks until ctx is cancelled or the worker exits without
WithRestart(true/false)” or “...without WithRestart”), then run the doc
generation command (make doc) to regenerate README.md; locate the doc comment
adjacent to the RunWorker function definition to make this change.

---

Nitpick comments:
In `@run.go`:
- Around line 14-17: The exported API surface changed (new symbols
ErrDoNotRestart, AddInterceptors, WithDefaultJitter and updated docstrings on
WithMetrics/WithInterceptors), so regenerate the public docs and README produced
by gomarkdoc: run make doc to rebuild documentation (updating README.md and any
generated docs), review the diff to ensure the new docstrings and exported
symbols are captured, and commit the regenerated files alongside the code
changes so the published docs reflect the new run-level options and sentinel
semantics.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b083e25a-4f14-4b1a-bfc2-80962aa6dec1

📥 Commits

Reviewing files that changed from the base of the PR and between fcb0647 and 4a28abf.

📒 Files selected for processing (4)
  • README.md
  • example_test.go
  • helpers.go
  • run.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • example_test.go

Comment thread README.md Outdated
Comment thread README.md Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread run.go
Comment thread worker.go
- Add WorkerInfoOption pattern: NewWorkerInfo now accepts options
- Add WithTestChildren(ctx) — creates a live supervisor so
  Add/Remove/GetChildren work in tests without calling Run
- Add Handler Contract section to package doc: long-running vs periodic
  handler behavior, return nil = permanent stop
- WithRestart godoc: document return nil = permanent stop
- RunWorker godoc: note it discards error unlike Run
- Slog godoc: clarify two log lines per cycle (start + end/error)
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread run.go Outdated
ankurs added 4 commits April 24, 2026 16:06
Suture research revealed the event hook approach was broken:
- ErrDoNotRestart skips EventHook entirely (suture deletes the
  service before firing events)
- Restarting:false means "in backoff" not "permanently removed"
- Child terminations could trigger parent Close

Replace with closingSupervisor — wraps the child supervisor and
calls closeFn after Supervisor.Serve returns. Combined with the
permanentStop path in Serve() and sync.Once, handler.Close()
fires exactly once on permanent stop.

Revert makeEventHook to simple form (logging + metrics only).
- Worker getters: GetName, GetHandler
- Builder methods: WithFailureDecay, WithFailureThreshold, WithFailureBackoff,
  WithBackoffJitter, WithTimeout
- WorkerInfo: GetChild with value copy verification, WithTestChildren,
  NewWorkerInfo minimal (nil sup safety), interceptor slice copy
- Run options: AddInterceptors ordering
- closingSupervisor: Close called exactly once on shutdown
- ErrDoNotRestart not counted as WorkerFailed
- ResolveMetrics default fallback
Add no longer silently replaces running workers — returns false
if a worker with the same name is already running. To replace,
call Remove then Add explicitly (not atomic, brief gap).

Also:
- Update example_test.go replace example to use Remove+Add
- Add tests for skip-on-duplicate and add-after-remove
- Report coverage in make test (-cover flag)
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread worker_test.go
Comment thread worker.go
Comment thread worker.go
Replace suture.Jitter interface with a plain function in the public
API. Users no longer need to import suture. The jitterFunc adapter
wraps the function into suture's Jitter interface internally.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@ankurs ankurs merged commit d124a58 into main Apr 24, 2026
8 checks passed
@ankurs ankurs deleted the feat/jitter-middleware-chain branch April 24, 2026 14:22
ankurs added a commit to go-coldbrew/docs.coldbrew.cloud that referenced this pull request Apr 24, 2026
* docs: update workers howto for jitter and middleware chain

Add Jitter, Middleware, and Built-in Middleware sections.
Update all code examples for new API:
  NewWorker(name).HandlerFunc(fn), info.Name(), info.Attempt()
Update WorkerContext → WorkerInfo with private fields + getters.
Add Playwright content test for middleware/jitter keywords.
Update Packages.md description.

Ref: go-coldbrew/workers#5

* docs: add Child() and Worker getters to WorkerInfo section

* docs: clarify attempt lifetime vs per-cycle duration in metrics table

* docs: update for restart default, remove redundant WithRestart(true), add ErrDoNotRestart

* fix: address PR review comments — consistent log imports, clarify slog, fix WorkerInfo listing

* refactor: consistent getter naming in docs — all getters use Get prefix

* fix: broken code fences and stale GetChildren reference

* docs: clarify run-level vs worker-level AddInterceptors distinction

* fix: add language tag to jitter formula code fence (MD040)

* docs: comprehensive workers howto update

- Add "Why Workers" section with plain goroutines comparison + distributed locking
- Add "Handler Return Values" section documenting return nil/error/ErrDoNotRestart behavior
- Add "Graceful Shutdown" section
- Add "Testing" section with NewWorkerInfo, WithTestChildren, RunWorker patterns
- Add "Best Practices" section
- Fix Builder Methods table: add WithMetrics, fix parameter types, fix Slog description
- Fix log import ambiguity in examples
- Fix RunWorker docs (discards error)
- Clarify two logging layers (supervisor slog vs middleware Slog)
- Label ColdBrew Integration as planned

* docs: Add is now skip-if-exists, document Remove+Add for replace

* docs: expand key properties — two-layer panic recovery, metrics, handler cleanup

* docs: final cleanup — 10 fixes for workers.md

- Fix WithBackoffJitter type (interface, not function)
- Add go-coldbrew/log import comments to disambiguate examples
- Fix Slog section to say "log lines" (plural) with LogContext pairing
- Move ErrDoNotRestart section to after Handler Return Values
- Remove misleading errors.Is example (Run doesn't propagate ErrDoNotRestart)
- Remove redundant WithRestart(true) in per-tenant example
- Fix log.Fatal → slog.Error in Prometheus example
- Update overview diagram to show middleware → handler
- Add complete CycleHandler struct example with Close()

* docs: make Quick Start a complete, runnable Go program

* docs: fix 5 code example issues in workers.md

1. batchProcessor one-liner uses NewBatchProcessor constructor
2. "Before" goroutine example shows ctx creation
3. EveryInterval section: builder form first, manual form second
4. EveryInterval manual form uses named function (not inline)
5. Per-worker metrics example uses named handler not undefined fn

* docs: fix batchProcessor example — remove advisory locks mention, use conn consistently

* docs: remove struct handler from HandlerFunc example — covered below in CycleHandler section

* docs: remove suture type from WithBackoffJitter table entry

* docs: remove 'tracing' from Packages.md workers overview — tracing is middleware not core
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants