fix(authx): prevent concurrent dynamic fetch race#6946
fix(authx): prevent concurrent dynamic fetch race#6946Godzilla675 wants to merge 3 commits intoprojectdiscovery:devfrom
Conversation
Use sync.Once-backed fetch synchronization for dynamic secrets, ensure strategy lookup waits for fetch completion, and add concurrency regression tests for authx and file auth provider paths.
WalkthroughReplaces atomic flag-based lazy-fetch state with a per-instance Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR updates authx.Dynamic’s lazy-fetch synchronization to prevent concurrent fetch races, and adds regression tests to validate correct blocking/once-only behavior under concurrency.
Changes:
- Replace the previous atomic-flag fetch gating with
sync.Once-based coordination inDynamic.Fetch. - Ensure
Dynamic.GetStrategieswaits for fetch completion before returning strategies. - Add concurrency regression tests for both
authx.DynamicandFileAuthProviderdynamic-secret strategy application.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/authprovider/authx/dynamic.go | Switches dynamic secret fetch coordination to sync.Once and forces GetStrategies to wait for fetch completion. |
| pkg/authprovider/authx/dynamic_test.go | Adds concurrent fetch tests validating waiters block and callback executes only once. |
| pkg/authprovider/file_test.go | Adds a concurrent-access regression test for file-based dynamic secrets applying strategies safely. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pkg/authprovider/authx/dynamic.go
Outdated
| d.fetchOnce.Do(func() { | ||
| if d.fetchCallback == nil { | ||
| d.error = errkit.New("dynamic secret fetch callback not set: call SetLazyFetchCallback() before Fetch()") | ||
| return | ||
| } | ||
| d.error = d.fetchCallback(d) | ||
| }) |
There was a problem hiding this comment.
Dynamic is copied by value in DynamicAuthStrategy (authx/strategy.go:20-23) and when strategies are built in FileAuthProvider.init (file.go:91+). With fetchOnce shared across those copies but d.error stored per-copy, only the copy that actually executes the Do closure will capture the callback error; other copies will skip the closure and may incorrectly return nil (and GetStrategies may proceed) even if the fetch failed. To make the fetch result consistent for all callers, keep the fetch state (once + error, and ideally extracted values) in a shared pointer owned by Dynamic, or avoid copying Dynamic by storing/passing it as *Dynamic in strategies/providers.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/authprovider/authx/dynamic.go (1)
186-202:⚠️ Potential issue | 🟡 Minor
GetStrategiescallsFetch(isFatal: true)— any fetch error terminates the process.The
Fetch()method explicitly documents (line 205) thatisFatal: truewill "stop the execution if the secret could not be fetched" by callinggologger.Fatal().Msgf(). This means a transient network error during dynamic secret resolution will terminate the entire application rather than returning an empty strategy list.The
Apply()method instrategy.go(line 27) gracefully handles a nil return with an explicit check, butApplyOnRR()(line 41) does not. If this termination behavior is unintended, consider usingisFatal: falseand letting thed.error != nilcheck at line 191 returnnilnaturally.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/authprovider/authx/dynamic.go` around lines 186 - 202, GetStrategies currently calls d.Fetch(true) which will terminate the process on fetch errors; change the call to d.Fetch(false) so fetch failures are non-fatal and let the existing d.error != nil check return nil strategies; reference Dynamic.GetStrategies and Fetch(isFatal bool). Also verify consumers like ApplyOnRR handle a nil strategies return (or add a nil-check there) to avoid panics when no strategies are returned.
🧹 Nitpick comments (2)
pkg/authprovider/authx/dynamic_test.go (1)
189-222: Consider adding a barrier for stronger concurrent overlap guarantee.The "fetch-callback-runs-once" subtest relies on the 20ms sleep in the callback to provide a window for concurrent goroutines to arrive. Without a barrier (like the first subtest's
fetchStartedchannel), there's no guarantee that all 20 goroutines enterFetchbefore the first one completes — especially on fast machines or under low load. This is fine as a regression test and the assertion is still valid, but aclose(barrier)pattern (as used infile_test.go) would make the timing-independence explicit.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/authprovider/authx/dynamic_test.go` around lines 189 - 222, The test "fetch-callback-runs-once" can race because it relies solely on a sleep to allow other goroutines to enter Fetch; add a barrier channel (e.g., fetchStarted := make(chan struct{})) and change the SetLazyFetchCallback closure to block on that channel (e.g., <-fetchStarted) before doing the sleep and setting dynamic.Extracted, then start all goroutines, close(fetchStarted) to release the callback once all goroutines are waiting, and proceed with wg.Wait/err collection; reference the test name ("fetch-callback-runs-once") and the Dynamic.SetLazyFetchCallback closure to locate and modify the code.pkg/authprovider/authx/dynamic.go (1)
17-18: Document the re-validation behavior ofValidate(): CallingValidate()after a successfulFetch()replacesfetchOncewith a freshsync.Once, making the Dynamic instance re-fetchable. If this is intentional (e.g., for credential refresh), add a comment clarifying the design. If not, consider either preventing re-initialization or resettingfetchOnceonly if it'snil.Additionally, while the current call-sites safely separate
Validate()(during initialization inNewFileAuthProvider) fromFetch()(during runtime), the API doesn't enforce this constraint. IfValidate()andFetch()were called concurrently on the same instance, there would be a data race on thefetchOncepointer. Consider documenting thatValidate()must not be called concurrently withFetch().🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/authprovider/authx/dynamic.go` around lines 17 - 18, Document that calling Dynamic.Validate() after a successful Dynamic.Fetch() resets the internal fetchOnce (*sync.Once) making the instance re-fetchable (used for credential refresh), and explicitly state that Validate() must not be called concurrently with Fetch() because resetting fetchOnce races with Fetch()—add this comment to the Dynamic type and to the Validate() method (reference: Dynamic, Validate(), Fetch(), fetchOnce *sync.Once); alternatively, if you prefer to prevent re-initialization, adjust Validate() to only reset fetchOnce when it is nil (instead of always replacing it) and document that choice and its concurrency implications.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@pkg/authprovider/authx/dynamic.go`:
- Around line 186-202: GetStrategies currently calls d.Fetch(true) which will
terminate the process on fetch errors; change the call to d.Fetch(false) so
fetch failures are non-fatal and let the existing d.error != nil check return
nil strategies; reference Dynamic.GetStrategies and Fetch(isFatal bool). Also
verify consumers like ApplyOnRR handle a nil strategies return (or add a
nil-check there) to avoid panics when no strategies are returned.
---
Nitpick comments:
In `@pkg/authprovider/authx/dynamic_test.go`:
- Around line 189-222: The test "fetch-callback-runs-once" can race because it
relies solely on a sleep to allow other goroutines to enter Fetch; add a barrier
channel (e.g., fetchStarted := make(chan struct{})) and change the
SetLazyFetchCallback closure to block on that channel (e.g., <-fetchStarted)
before doing the sleep and setting dynamic.Extracted, then start all goroutines,
close(fetchStarted) to release the callback once all goroutines are waiting, and
proceed with wg.Wait/err collection; reference the test name
("fetch-callback-runs-once") and the Dynamic.SetLazyFetchCallback closure to
locate and modify the code.
In `@pkg/authprovider/authx/dynamic.go`:
- Around line 17-18: Document that calling Dynamic.Validate() after a successful
Dynamic.Fetch() resets the internal fetchOnce (*sync.Once) making the instance
re-fetchable (used for credential refresh), and explicitly state that Validate()
must not be called concurrently with Fetch() because resetting fetchOnce races
with Fetch()—add this comment to the Dynamic type and to the Validate() method
(reference: Dynamic, Validate(), Fetch(), fetchOnce *sync.Once); alternatively,
if you prefer to prevent re-initialization, adjust Validate() to only reset
fetchOnce when it is nil (instead of always replacing it) and document that
choice and its concurrency implications.
Ensure concurrent Dynamic value copies observe the same fetch error by storing sync.Once and error in shared state, and strengthen concurrency test with a start barrier.
|
Addressed review feedback: Dynamic is copied by value in DynamicAuthStrategy, so Fetch error state must be shared across copies. Reworked Dynamic fetch synchronization to store sync.Once+error in a shared *fetchState and updated the concurrency test to start goroutines behind a barrier. PR head updated to a3591da. |
Call Fetch(false) from Dynamic.GetStrategies so dynamic auth fetch errors do not terminate the scan; strategies are skipped when fetch fails.
|
Addressed remaining review notes: Dynamic.GetStrategies now calls Fetch(false) (non-fatal) and returns nil strategies when fetch fails, matching previous non-fatal behavior after a failed prefetch. PR head updated to 5b126df. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
pkg/authprovider/authx/dynamic.go (1)
213-232: Captured.fetchStatein a local variable to make the single-acquisition semantics explicit.
Fetch()currently dereferencesd.fetchStateat least five times (nil check,once.Doselector, two writes inside the closure, two reads afterDo). Under the documented "no concurrentValidate()" contract this is safe, but if the captureddand the raw field access inside theonce.Doclosure diverge (e.g.,Validate()races despite the warning), the closure ends up callingonce.Doon the oldfetchState.oncewhile writingfetchState.errto whateverd.fetchStatehappens to point to at execution time. Snapshotting into a local removes all five re-reads and aligns theOncecall site with its error target.♻️ Proposed refactor
func (d *Dynamic) Fetch(isFatal bool) error { - if d.fetchState == nil { + state := d.fetchState + if state == nil { if isFatal { gologger.Fatal().Msgf("Could not fetch dynamic secret: %s\n", errNotValidated) } return errNotValidated } - d.fetchState.once.Do(func() { + state.once.Do(func() { if d.fetchCallback == nil { - d.fetchState.err = errkit.New("dynamic secret fetch callback not set: call SetLazyFetchCallback() before Fetch()") + state.err = errkit.New("dynamic secret fetch callback not set: call SetLazyFetchCallback() before Fetch()") return } - d.fetchState.err = d.fetchCallback(d) + state.err = d.fetchCallback(d) }) - if d.fetchState.err != nil && isFatal { - gologger.Fatal().Msgf("Could not fetch dynamic secret: %s\n", d.fetchState.err) + if state.err != nil && isFatal { + gologger.Fatal().Msgf("Could not fetch dynamic secret: %s\n", state.err) } - return d.fetchState.err + return state.err }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/authprovider/authx/dynamic.go` around lines 213 - 232, The Fetch method should snapshot d.fetchState into a local variable to avoid multiple field reads and ensure the once.Do and err writes/read operate on the same fetchState instance; modify Dynamic.Fetch to assign s := d.fetchState at the top, use s for the nil check, call s.once.Do(...) and set/read s.fetchState.err via s.fetchState.err (i.e., s.err) and leave existing checks for s.fetchCallback (and the log/error paths calling gologger and errkit) unchanged; ensure references to SetLazyFetchCallback() remain valid and the behavior for isFatal is preserved.pkg/authprovider/authx/dynamic_test.go (1)
150-158: Optional: add a start barrier to maximize concurrent overlap inall-waiters-block-until-done.Unlike
fetch-callback-runs-once, this subtest launches goroutines without a synchronizing barrier before they callFetch(). The 25 ms non-completion check remains logically correct (a goroutine that hasn't calledwg.Done()keepswg.Wait()live regardless), but without a barrier some goroutines may not have enteredonce.Dowhen the 25 ms window opens, reducing the test's ability to detect a regression where concurrent callers escape before the callback completes.♻️ Proposed refactor
+ barrier := make(chan struct{}) for i := 0; i < numGoroutines; i++ { go func(idx int) { defer wg.Done() + <-barrier results[idx] = d.Fetch(false) }(i) } + close(barrier)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/authprovider/authx/dynamic_test.go` around lines 150 - 158, The test should introduce a start barrier so all goroutines call d.Fetch() at the same time: create a start channel or a second sync.WaitGroup (e.g., startWG or startCh), have each goroutine wait on that barrier before calling d.Fetch(), launch all goroutines with wg.Add(numGoroutines) as before, then release the barrier (close startCh or call startWG.Done()/startWG.Wait()) so they concurrently enter d.Fetch() and the once.Do path; this maximizes overlap for the all-waiters-block-until-done subtest while keeping existing wg and results usage unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/authprovider/authx/dynamic.go`:
- Around line 17-18: The sentinel error errNotValidated is unexported but
returned by the exported Fetch(ctx, ...) method, preventing callers from using
errors.Is; rename and export it (e.g., ErrNotValidated) and update its
declaration (var ErrNotValidated = errkit.New(...)) and all internal references
(any usages in Validate, Fetch, tests) so external packages can perform
errors.Is(err, authx.ErrNotValidated) to detect the condition.
---
Nitpick comments:
In `@pkg/authprovider/authx/dynamic_test.go`:
- Around line 150-158: The test should introduce a start barrier so all
goroutines call d.Fetch() at the same time: create a start channel or a second
sync.WaitGroup (e.g., startWG or startCh), have each goroutine wait on that
barrier before calling d.Fetch(), launch all goroutines with
wg.Add(numGoroutines) as before, then release the barrier (close startCh or call
startWG.Done()/startWG.Wait()) so they concurrently enter d.Fetch() and the
once.Do path; this maximizes overlap for the all-waiters-block-until-done
subtest while keeping existing wg and results usage unchanged.
In `@pkg/authprovider/authx/dynamic.go`:
- Around line 213-232: The Fetch method should snapshot d.fetchState into a
local variable to avoid multiple field reads and ensure the once.Do and err
writes/read operate on the same fetchState instance; modify Dynamic.Fetch to
assign s := d.fetchState at the top, use s for the nil check, call
s.once.Do(...) and set/read s.fetchState.err via s.fetchState.err (i.e., s.err)
and leave existing checks for s.fetchCallback (and the log/error paths calling
gologger and errkit) unchanged; ensure references to SetLazyFetchCallback()
remain valid and the behavior for isFatal is preserved.
| // errNotValidated is returned when Fetch is called before Validate. | ||
| var errNotValidated = errkit.New("dynamic secret not validated: call Validate() before Fetch()") |
There was a problem hiding this comment.
errNotValidated is unexported but surfaced through the exported Fetch() method.
When Fetch(false) is called before Validate(), it returns errNotValidated. Because the sentinel is unexported, callers outside the authx package cannot perform a typed check with errors.Is(err, errNotValidated) and must either inspect the error message or treat it as a generic non-nil error.
If callers in other packages need to programmatically distinguish this condition, consider exporting it as ErrNotValidated.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/authprovider/authx/dynamic.go` around lines 17 - 18, The sentinel error
errNotValidated is unexported but returned by the exported Fetch(ctx, ...)
method, preventing callers from using errors.Is; rename and export it (e.g.,
ErrNotValidated) and update its declaration (var ErrNotValidated =
errkit.New(...)) and all internal references (any usages in Validate, Fetch,
tests) so external packages can perform errors.Is(err, authx.ErrNotValidated) to
detect the condition.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/authprovider/authx/dynamic.go (1)
238-243:Error()readsfetchState.errwithoutonce.Do— potential data race under-race.
Fetch()writesfetchState.errexclusively insideonce.Do. Per the Go memory model, only goroutines that callonce.Doafter the first execution observe a happens-before guarantee with those writes.Error()bypassesonce.Doentirely, so a goroutine that callsError()concurrently with the firstFetch()(or without having previously calledFetch()) has no synchronization guarantee on thefetchState.errread — the Go race detector would flag this.Using
once.Do(func(){})insideError()to establish happens-before would break the fetch: ifError()fires beforeFetch(), it consumes theonceandfetchCallbacknever runs.The lowest-friction remediation is to document the constraint explicitly, or switch
fetchState.errto anatomic.Pointer:♻️ Option A — atomic storage (race-free without extra docs)
type fetchState struct { - once sync.Once - err error + once sync.Once + err atomic.Pointer[error] // nil pointer means no error }In
Fetch(), replaced.fetchState.err = ...with a helper that stores a non-nil pointer on error:// inside once.Do: if err := d.fetchCallback(d); err != nil { d.fetchState.err.Store(&err) }In
Error():if d.fetchState == nil { return nil } if p := d.fetchState.err.Load(); p != nil { return *p } return nil♻️ Option B — document the constraint (minimal change)
-// Error returns the error if any +// Error returns the fetch error, if any. +// It is only safe to call after Fetch() has returned in the same goroutine +// or with external happens-before synchronization. func (d *Dynamic) Error() error {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/authprovider/authx/dynamic.go` around lines 238 - 243, Error() currently reads d.fetchState.err without synchronization and can race with Fetch() which writes it inside once.Do; fix by making fetchState.err access race-free: either convert fetchState.err to an atomic.Pointer to an error and use atomic Store/Load in Fetch() and Error() (referencing Dynamic.fetchState.err, Dynamic.Fetch(), and Dynamic.Error()), or document the caller constraint that Error() must only be called after a successful Fetch() completes; implement the atomic-pointer approach if you want no behavioral change and no extra caller documentation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/authprovider/authx/dynamic.go`:
- Around line 196-201: The current code calls d.Fetch(false) but ignores its
return value, so when Validate() wasn't called d.fetchState stays nil and the
subsequent check (d.fetchState != nil && d.fetchState.err != nil) is bypassed,
causing raw templated secrets to be used; change the call to capture and check
the returned error from d.Fetch(false) (i.e., err := d.Fetch(false); if err !=
nil { return nil } ) so all error paths including errNotValidated are handled
instead of relying on d.fetchState being non-nil; update references around
Fetch(false), fetchState, and Validate() accordingly.
---
Duplicate comments:
In `@pkg/authprovider/authx/dynamic.go`:
- Around line 17-18: errNotValidated is currently unexported but returned from
the exported Fetch() method; change it to an exported symbol (ErrNotValidated)
so callers can use errors.Is(err, authx.ErrNotValidated). Update the declaration
of errNotValidated to ErrNotValidated (same errkit.New message) and update any
internal references (e.g., where Fetch() returns it and any tests) to use the
new ErrNotValidated identifier, ensuring package authx exports the error for
external callers.
---
Nitpick comments:
In `@pkg/authprovider/authx/dynamic.go`:
- Around line 238-243: Error() currently reads d.fetchState.err without
synchronization and can race with Fetch() which writes it inside once.Do; fix by
making fetchState.err access race-free: either convert fetchState.err to an
atomic.Pointer to an error and use atomic Store/Load in Fetch() and Error()
(referencing Dynamic.fetchState.err, Dynamic.Fetch(), and Dynamic.Error()), or
document the caller constraint that Error() must only be called after a
successful Fetch() completes; implement the atomic-pointer approach if you want
no behavioral change and no extra caller documentation.
| // Fetch errors are treated as non-fatal here so a failed dynamic auth fetch | ||
| // does not terminate the entire scan process. | ||
| _ = d.Fetch(false) | ||
|
|
||
| if d.error != nil { | ||
| if d.fetchState != nil && d.fetchState.err != nil { | ||
| return nil |
There was a problem hiding this comment.
nil fetchState bypasses the error guard — raw, uninterpolated strategies are returned.
When Validate() was never called, d.fetchState is nil. Fetch(false) returns errNotValidated but does not set d.fetchState, leaving it nil. The subsequent condition:
if d.fetchState != nil && d.fetchState.err != nilshort-circuits to false because d.fetchState == nil, so execution falls through and builds AuthStrategy values from the raw, uninterpolated secrets (still containing {{...}} template markers). These strategies are then applied to outbound requests, resulting in exactly the unauthenticated-request scenario this PR is meant to eliminate.
The fix is to use the return value of Fetch(false) directly, which is both idiomatic and covers all error paths including the nil fetchState case:
🐛 Proposed fix
- _ = d.Fetch(false)
-
- if d.fetchState != nil && d.fetchState.err != nil {
+ if err := d.Fetch(false); err != nil {
return nil
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/authprovider/authx/dynamic.go` around lines 196 - 201, The current code
calls d.Fetch(false) but ignores its return value, so when Validate() wasn't
called d.fetchState stays nil and the subsequent check (d.fetchState != nil &&
d.fetchState.err != nil) is bypassed, causing raw templated secrets to be used;
change the call to capture and check the returned error from d.Fetch(false)
(i.e., err := d.Fetch(false); if err != nil { return nil } ) so all error paths
including errNotValidated are handled instead of relying on d.fetchState being
non-nil; update references around Fetch(false), fetchState, and Validate()
accordingly.
|
Thank you for your contribution @Godzilla675! Your mutex-based approach to preventing concurrent fetch races was a solid attempt at solving the problem. We merged #6976 which uses Closing this as the issue has been resolved. Thank you for contributing to nuclei! |
Proposed Changes
Proof
closes [BUG] Authenticated scanning starts executing templates before the secret-file template finishes #6592
/claim [BUG] Authenticated scanning starts executing templates before the secret-file template finishes #6592
Summary by CodeRabbit
Tests
Bug Fixes
Refactor