reverseproxy: Track dynamic upstreams, enable passive healthchecking #7539
Merged
francislavoie merged 2 commits into master on Mar 4, 2026
Conversation
c5649f7 to 71f5131
mholt (Member) approved these changes on Mar 4, 2026:

Thanks Francis, this is cool. Looks a little complicated but is probably a pretty decent way to do things. Let's try it?
I was reviewing #7517 and thinking about how it would handle dynamic upstreams. I suspected there might be a leak, since dynamic upstreams aren't cleaned up in the proxy handler's Cleanup(). But that's not the case, because we actually drop the upstreams immediately after every request. It also means we don't support passive health checks for dynamic upstreams.

This adds separate tracking for dynamic upstreams, which enables passive health checks to work (though still not active health checks, for obvious reasons), and lets the admin API endpoint show their health as well, with inactive ones removed after an hour. The one-hour lifetime is somewhat arbitrary, but I think it should serve most users well: it ensures long-term leaks can't happen, while keeping most real-time use cases (checking health and such) working consistently.

Also added tests for dynamic upstream tracking, the admin endpoint, and passive health checks.
AI's Summary of changes

Problem

Dynamic upstreams (from DynamicUpstreams.GetUpstreams) were provisioned into the same UsagePool as static upstreams, but with a defer hosts.Delete(...) at the end of each proxy loop iteration. This caused two problems:
* Each request dropped its *Host entry, and the next request created a fresh one, wiping passive fail counts. A backend could fail repeatedly and never be marked down.
* Because dynamic hosts were never retained, the admin API had no health information to report for them.

Solution: separate tracking for dynamic hosts
hosts.go

* A new fillDynamicHost() method on *Upstream stores the host in a plain map[string]dynamicHostEntry (protected by a sync.RWMutex) instead of the reference-counted UsagePool. Each call refreshes the lastSeen timestamp.
* A cleanup goroutine (started once, via sync.Once) sweeps the map every 5 minutes and evicts entries idle for more than an hour.

reverseproxy.go
* provisionUpstream gains a dynamic bool parameter, routing to fillDynamicHost() vs fillHost() accordingly.
* At provision time: provisionUpstream(u, false), unchanged behaviour.
* At request time for dynamic upstreams: provisionUpstream(dUp, true), with no more defer hosts.Delete(...).

admin.go
The /reverse_proxy/upstreams API endpoint now ranges over both hosts (static) and dynamicHosts (dynamic), so the admin view is complete.

Why passive health checks now work
countFailure increments Host.countFail(1) and schedules a goroutine to decrement it after FailDuration. Since *Host now persists across requests in dynamicHosts, those counts survive between sequential requests. Healthy() reads Host.Fails() < healthCheckPolicy.MaxFails, both of which are correctly set on each fresh *Upstream struct from provisionUpstream, so a dynamic backend that has accumulated enough failures will correctly be skipped by the load balancer.

Assistance Disclosure
Used GitHub Copilot + Claude Sonnet 4.6