[management] monitoring updates #4937
Walkthrough
Removed per-request sync latency tracking and its metric; lowered blocked/login log levels and moved a Login log later in the flow; ensured accountID is injected into context before GetAccountUsers; added HTTP middleware propagation of AccountId/UserId into request context for downstream logging/metrics.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🧰 Additional context used
🧬 Code graph analysis (1)
management/server/telemetry/http_api_metrics.go (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (22)
Actionable comments posted: 1
🧹 Nitpick comments (1)
management/server/account.go (1)
790-792: Context enrichment for AccountID looks good; consider using the string form for consistency
Setting `nbcontext.AccountIDKey` on the context before `GetAccountUsers` is consistent with the rest of the codebase and helps downstream logging/telemetry.
Since you already compute `accountIDString := fmt.Sprintf("%v", accountID)`, you might want to store that string value in the context instead of the raw `any` to keep the value type uniform across all AccountID usages:
- // nolint:staticcheck
- ctx = context.WithValue(ctx, nbcontext.AccountIDKey, accountID)
+ // nolint:staticcheck
+ ctx = context.WithValue(ctx, nbcontext.AccountIDKey, accountIDString)
Not mandatory, but it can avoid surprises if something later assumes a `string` AccountID in the context.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- management/internals/shared/grpc/server.go (1 hunks)
- management/server/account.go (1 hunks)
- management/server/telemetry/grpc_metrics.go (0 hunks)
- management/server/telemetry/http_api_metrics.go (2 hunks)
💤 Files with no reviewable changes (1)
- management/server/telemetry/grpc_metrics.go
🧰 Additional context used
🧬 Code graph analysis (2)
management/server/account.go (2)
- shared/context/keys.go (1): AccountIDKey (5-5)
- management/server/context/keys.go (1): AccountIDKey (7-7)
management/server/telemetry/http_api_metrics.go (2)
- management/server/context/auth.go (1): GetUserAuthFromContext (25-30)
- shared/management/http/util/util.go (1): WriteError (84-120)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
- GitHub Check: Client (Docker) / Unit
- GitHub Check: Management / Benchmark (API) (amd64, postgres)
- GitHub Check: Management / Unit (amd64, sqlite)
- GitHub Check: Management / Benchmark (API) (amd64, sqlite)
- GitHub Check: Relay / Unit (amd64, -race)
- GitHub Check: Management / Integration (amd64, postgres)
- GitHub Check: Client / Unit (amd64)
- GitHub Check: Signal / Unit (386)
- GitHub Check: Relay / Unit (386)
- GitHub Check: Management / Unit (amd64, mysql)
- GitHub Check: Management / Benchmark (amd64, postgres)
- GitHub Check: Management / Benchmark (amd64, sqlite)
- GitHub Check: Client / Unit (386)
- GitHub Check: Management / Unit (amd64, postgres)
- GitHub Check: Management / Integration (amd64, sqlite)
- GitHub Check: Linux
- GitHub Check: Windows
- GitHub Check: Darwin
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
- GitHub Check: JS / Lint
- GitHub Check: iOS / Build
- GitHub Check: release_ui_darwin
- GitHub Check: Android / Build
- GitHub Check: release
- GitHub Check: release_ui
🔇 Additional comments (1)
management/internals/shared/grpc/server.go (1)
550-550: Logging after AccountID context setup is appropriate
Moving the login debug log to after `PeerIDKey` and `AccountIDKey` are set on `ctx` ensures logs carry richer context without changing request handling. Looks good.
| "github.com/netbirdio/netbird/shared/management/http/util" | ||
| ) |
🧩 Analysis chain
🏁 Script executed:
# First, find the actual implementation in http_api_metrics.go
cat management/server/telemetry/http_api_metrics.go | head -250
Repository: netbirdio/netbird
Length of output: 7416
🏁 Script executed:
# Now let's verify the context issue and how auth middleware works
# 1. Check GetUserAuthFromContext implementation
rg -n "GetUserAuthFromContext" -A 10 management | head -50
# 2. Find where UserAuth is set in context
rg -n "UserAuthContextKey|WithValue.*UserAuth" -B 2 -A 2 management | head -100
Repository: netbirdio/netbird
Length of output: 4923
🏁 Script executed:
# Find where auth middleware sets UserAuth in context
rg -n "SetUserAuthInContext|UserAuthContextKey.*WithValue" -B 3 -A 3 management | head -80
# Find auth middleware implementations
rg -n "type.*Auth.*Middleware|AuthenticationMiddleware|JWTMiddleware" -B 2 -A 5 management | head -100Repository: netbirdio/netbird
Length of output: 3122
🏁 Script executed:
# Check AuthMiddleware Handler implementation
rg -n "func.*Handler|type.*Handler" management/server/http/middleware/auth_middleware.go -A 20 | head -100
# Look for the actual Handler function that wraps h http.Handler
cat management/server/http/middleware/auth_middleware.go | sed -n '1,150p'Repository: netbirdio/netbird
Length of output: 5776
🏁 Script executed:
# Check checkJWTFromRequest to confirm SetUserAuthInRequest is called
cat management/server/http/middleware/auth_middleware.go | sed -n '135,250p'
# Find router setup to see middleware order
rg -n "NewAuthMiddleware|NewMetricsMiddleware|mux.NewRouter" management/server/http -B 2 -A 5 | head -150Repository: netbirdio/netbird
Length of output: 15636
🏁 Script executed:
# Find how middlewares are chained/composed with the router
cat management/server/http/handler.go | sed -n '100,180p'Repository: netbirdio/netbird
Length of output: 1628
🏁 Script executed:
# Check WriteError implementation to confirm it writes headers and body
rg -n "func WriteError" -A 15 shared/management/http/utilRepository: netbirdio/netbird
Length of output: 2385
🏁 Script executed:
# Verify that context.WithValue creates a NEW context and doesn't mutate the original
rg -n "func WithValue" -B 2 -A 5 --type go | head -30
# Also double-check that the modified request is not accessible after h.ServeHTTP returns
rg -n "SetUserAuthInRequest" -B 3 -A 3 management/server/context/auth.goRepository: netbirdio/netbird
Length of output: 281
HTTP metrics middleware should not write a new error response after the handler or depend critically on UserAuth
Two problems here can break or distort HTTP responses:
1. Reading UserAuth from the wrong context source
The middleware chain order is: metrics → cors → auth → handler. When auth runs inside `h.ServeHTTP`, it sets `UserAuth` on a modified request via `SetUserAuthInRequest(r, userAuth)`, which returns a new `*http.Request` with enriched context. This modified request is only visible to downstream handlers within the `h.ServeHTTP` call. After `h.ServeHTTP` returns, the metrics middleware's original outer `r` still holds its original context without `UserAuth`. Calling `GetUserAuthFromContext(r.Context())` here will always fail because the context changes made inside `h.ServeHTTP` are not visible to the original request object.
2. Writing an error response after the wrapped handler already responded
At this point `h.ServeHTTP` has already run and may have:
- Written headers and a body (successful or error response), or
- Written its own error via `util.WriteError`.
Calling `WriteError` here will:
- Write headers again (the `WrappedResponseWriter` will silently ignore this due to the `wroteHeader` flag).
- Append a JSON error body to an already-sent response, creating a malformed response.
- Log the same error twice (once in the handler, once in metrics).
- Change behavior for endpoints that legitimately don't attach `UserAuth` (e.g. health checks, login, or any unauthenticated route), even though the underlying handler completed successfully.
Middleware responsible for metrics/logging should be observational, not alter HTTP semantics when it can't find optional context like `UserAuth`.
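Both problems can be reproduced with the standard library alone. The sketch below is illustrative and does not use NetBird code: `userKey`, `auth`, `metrics`, and `run` are hypothetical names, `http.Error` stands in for `util.WriteError`, and `httptest.ResponseRecorder` stands in for the real response writer (it also ignores a second `WriteHeader` call).

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type ctxKey string

// userKey is a hypothetical stand-in for the UserAuth context key.
const userKey ctxKey = "user"

// okBody is what the wrapped handler writes on success.
const okBody = `{"ok":true}`

// auth mimics the inner auth middleware: it attaches a value to a NEW
// request (r.WithContext returns a copy) and passes that copy downstream.
func auth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := context.WithValue(r.Context(), userKey, "alice")
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// metrics mimics the outer middleware: after the chain returns it looks
// for the value on its own request and writes an error if it is missing.
func metrics(next http.Handler, seen *bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		next.ServeHTTP(w, r)
		_, ok := r.Context().Value(userKey).(string)
		*seen = ok
		if !ok {
			// Late write: the status line is already sent, so only the
			// body of this error ends up appended to the response.
			http.Error(w, `{"message":"no user auth"}`, http.StatusUnauthorized)
		}
	})
}

// run sends one request through metrics -> auth -> handler.
func run() (seen bool, status int, body string) {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, okBody)
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/api/peers", nil)
	metrics(auth(handler), &seen).ServeHTTP(rec, req)
	return seen, rec.Code, rec.Body.String()
}

func main() {
	seen, status, body := run()
	fmt.Println("outer middleware saw auth value:", seen)
	fmt.Println("status:", status)
	fmt.Printf("body: %q\n", body)
}
```

In this sketch the outer middleware never sees the value set by `auth`, the recorded status stays 200, and the error JSON is appended after the success body, which is the malformed-response case described above.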
Suggested fix
Make UserAuth enrichment best-effort and non-fatal, and avoid writing a new error response here:
- h.ServeHTTP(w, r.WithContext(ctx))
-
- userAuth, err := nbContext.GetUserAuthFromContext(r.Context())
- if err != nil {
- util.WriteError(r.Context(), err, w)
- return
- }
-
- //nolint
- ctx = context.WithValue(ctx, nbContext.AccountIDKey, userAuth.AccountId)
- //nolint
- ctx = context.WithValue(ctx, nbContext.UserIDKey, userAuth.UserId)
+ h.ServeHTTP(w, r.WithContext(ctx))
+
+ // Best-effort enrichment: if user auth is present, add account/user IDs to the logging context.
+ // If not present (e.g. unauthenticated endpoints or failed auth), just skip enrichment.
+ if userAuth, err := nbContext.GetUserAuthFromContext(ctx); err == nil {
+ //nolint
+ ctx = context.WithValue(ctx, nbContext.AccountIDKey, userAuth.AccountId)
+ //nolint
+ ctx = context.WithValue(ctx, nbContext.UserIDKey, userAuth.UserId)
+ }
Key points:
- No `WriteError` here; errors from auth/handlers should be handled in their own middleware/handlers.
- Check `ctx` (the enriched context passed into the handler) instead of `r.Context()`. While this doesn't fully solve the context visibility problem, it's more correct and keeps the logic aligned. The real fix requires verifying how the HTTP auth middleware is wired to ensure `UserAuth` is actually reachable at the appropriate point.
This keeps metrics/logging behavior non-invasive while still enriching logs when user/account information is available.
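One possible way to make the auth information reachable from the outer middleware, offered only as a sketch and not as how NetBird wires its middleware: the outer middleware places a mutable holder in the context before calling the chain, and the inner auth step fills it in. All names here (`authInfo`, `holderKey`, `metrics`, `auth`, `serve`, and the sample IDs) are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// holderKey is an unexported key type for the hypothetical holder.
type holderKey struct{}

// authInfo is a mutable holder allocated by the outer middleware and
// filled in by the inner auth middleware.
type authInfo struct {
	AccountID string
	UserID    string
}

// metrics allocates the holder, runs the chain, then logs best-effort.
// It never writes to the response.
func metrics(next http.Handler, log func(string)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		info := &authInfo{}
		ctx := context.WithValue(r.Context(), holderKey{}, info)
		next.ServeHTTP(w, r.WithContext(ctx))
		if info.AccountID != "" {
			log(fmt.Sprintf("request done account=%s user=%s", info.AccountID, info.UserID))
		} else {
			log("request done (no user auth)")
		}
	})
}

// auth fills the holder when the request is authenticated; the pointer
// is shared, so the outer middleware observes the update.
func auth(next http.Handler, authenticated bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if info, ok := r.Context().Value(holderKey{}).(*authInfo); ok && authenticated {
			info.AccountID = "acc-1"
			info.UserID = "user-1"
		}
		next.ServeHTTP(w, r)
	})
}

// serve sends one request through the chain and returns the log line
// and the recorded status.
func serve(authenticated bool) (logLine string, status int) {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/api/peers", nil)
	metrics(auth(handler, authenticated), func(s string) { logLine = s }).ServeHTTP(rec, req)
	return logLine, rec.Code
}

func main() {
	fmt.Println(serve(true))
	fmt.Println(serve(false))
}
```

The holder is only touched from the request's own goroutine in this sketch; if handlers spawn goroutines that outlive the request, the holder would need synchronization.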
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In management/server/telemetry/http_api_metrics.go around lines 18–19, the
metrics middleware currently reads UserAuth from the original request context
after calling h.ServeHTTP and may call util.WriteError if auth info is missing;
instead make UserAuth enrichment best-effort by reading from the
handler-provided ctx (the context used inside ServeHTTP) rather than
r.Context(), do not call util.WriteError or otherwise mutate the response after
h.ServeHTTP returns, and only add user/account fields to logs/metrics when the
UserAuth lookup succeeds; keep the middleware observational—log presence/absence
of UserAuth but do not alter HTTP semantics or write responses.



Describe your changes
Issue ticket number and link
Stack
Checklist
Documentation
Select exactly one:
Docs PR URL (required if "docs added" is checked)
Paste the PR link from https://github.com/netbirdio/docs here:
https://github.com/netbirdio/docs/pull/__
Summary by CodeRabbit
Chores
Bug Fixes