[Management/Client] Trigger debug bundle runs from API/Dashboard (#4592)#4832
Conversation
📝 Walkthrough

Adds a per-peer Job subsystem (streaming RPC, manager, persistence, HTTP API), refactors client debug bundle generation/upload (removes `DebugBundleRequest.status`, renames `LogFile` to `LogPath`, centralizes upload), threads `logPath`/profile config through client/engine, and updates status conversion to use `proto.FullStatus` plus the daemon version.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    rect rgba(240,248,255,0.5)
        participant MServer as Management gRPC Server
        participant JobMgr as Job Manager
        participant Store as DB/Store
    end
    rect rgba(245,255,240,0.5)
        participant Client as Peer Client (mgmt stream)
        participant Engine as Client Engine / Job Executor
        participant DebugGen as Debug Bundle Generator
        participant Upload as Upload service
    end
    MServer->>JobMgr: CreatePeerJob(accountID, peerID, JobRequest)
    JobMgr->>Store: Persist job (pending)
    MServer->>Client: Stream JobRequest via Job RPC
    Client->>Engine: Executor.BundleJob(ctx, deps, params, waitFor)
    Engine->>DebugGen: Generate bundle (uses LogPath, StatusRecorder, ProfileConfig)
    DebugGen-->>Engine: bundle file path
    Engine->>Upload: UploadDebugBundle(mgmURL, bundlePath)
    Upload-->>Engine: uploadKey / error
    Engine->>Client: Send JobResponse(uploadKey/status)
    Client->>MServer: Stream JobResponse
    MServer->>JobMgr: HandleResponse(JobResponse)
    JobMgr->>Store: CompletePeerJob (succeeded/failed)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
client/status/status.go (1)
234-315: `ToProtoFullStatus` mapping is good; add nil-safety when reading latency/handshake

The new `ToProtoFullStatus` helper correctly maps:

- Management and signal state (URL, connected, error).
- Local peer state (IP, pub key, kernel, FQDN, Rosenpass flags, networks from route keys).
- Per-peer fields (status, timestamps, transfer stats, Rosenpass, networks, latency, SSH host key).
- Relays and DNS group state, including error strings.

Downstream, `mapPeers` assumes `Latency` and `LastWireguardHandshake` are always set:

```go
lastHandshake := pbPeerState.GetLastWireguardHandshake().AsTime().Local()
...
Latency: pbPeerState.GetLatency().AsDuration(),
```

That's safe for `FullStatus` objects produced by `ToProtoFullStatus`, but will panic if a `PeerState` arrives with these fields unset (e.g., older daemon or other producer). To harden this:

```go
var lastHandshake time.Time
if ts := pbPeerState.GetLastWireguardHandshake(); ts != nil {
	lastHandshake = ts.AsTime().Local()
}

var latency time.Duration
if d := pbPeerState.GetLatency(); d != nil {
	latency = d.AsDuration()
}
```

and then use those locals when building `PeerStateDetailOutput`.

Also applies to: 549-635
client/internal/engine.go (2)
34-53: Engine config/struct wiring for jobs is fine, but `c *profilemanager.Config` is unused

The new debug/job additions to `EngineConfig` (`ProfileConfig`, `LogPath`) and `Engine` (`jobExecutor`, `jobExecutorWG`) are wired in cleanly, and creating `jobExecutor` in `NewEngine` is reasonable.

However, the extra parameter `c *profilemanager.Config` on `NewEngine` is never used in the function body. Unlike an unused local variable, an unused parameter does compile in Go, but it is misleading dead weight. Either remove the parameter or actually thread it into the engine configuration, e.g.:

```go
config.ProfileConfig = c // or engine.config.ProfileConfig = c
```

depending on where you intend to own that reference.

Also applies to: 83-142, 145-222, 235-276
278-366: Job stream consumption and bundle handling are generally solid; watch for restart behavior and nil profile config

The new job flow on the client side looks good overall:

- `receiveJobEvents` hooks into `mgmClient.Job` with a per-message handler that defaults responses to `JobStatus_failed` and branches on `WorkloadParameters`; currently only `Bundle` is handled, everything else returns `ErrJobNotImplemented`.
- `jobExecutorWG` ensures `Stop()` waits for the Job stream goroutine to exit before tearing down the engine.
- `handleBundle` builds `debug.GeneratorDependencies` from engine state, calls `jobExecutor.BundleJob`, and wraps the resulting upload key in a `JobResponse_Bundle`.

Two points to be aware of:

1. When `mgmClient.Job` returns any error (including when `e.ctx` is canceled during `Stop()`), you treat it as a hard failure and call `CtxGetState(e.ctx).Wrap(ErrResetConnection)` plus `e.clientCancel()`, which triggers a full client restart. That mirrors how Sync/Signal errors are handled, but it also means a graceful "down" or engine stop will trigger a restart path instead of a quiet shutdown. If that's not intended, you may want to distinguish context cancellation from other errors (see the sketch below).
2. `handleBundle` assumes `e.config.ProfileConfig` and `e.config.ProfileConfig.ManagementURL` are non-nil:

```go
InternalConfig: e.config.ProfileConfig,
...
uploadKey, err := e.jobExecutor.BundleJob(..., e.config.ProfileConfig.ManagementURL.String())
```

If `ProfileConfig` or its `ManagementURL` can be unset (e.g., in tests or some platforms), this will panic. A defensive check that returns a failed `JobResponse` with a clear reason would make the behavior safer.

Also applies to: 942-1012
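For the first point, a minimal sketch of distinguishing local cancellation from genuine stream failures; `receiveJobStream` is a hypothetical wrapper around the `mgmClient.Job` call, everything else follows the names in this PR:

```go
// Sketch: only trigger the restart path for unexpected errors.
if err := e.receiveJobStream(); err != nil {
	if errors.Is(err, context.Canceled) || status.Code(err) == codes.Canceled {
		log.Debugf("job stream closed during shutdown: %v", err)
		return // quiet shutdown, no restart
	}
	log.Errorf("job stream failed: %v", err)
	CtxGetState(e.ctx).Wrap(ErrResetConnection)
	e.clientCancel()
}
```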
🧹 Nitpick comments (38)
shared/management/client/grpc.go (5)
112-160: Centralized `withMgmtStream` helper looks good; double-check backoff choice and final logging

The `withMgmtStream` wrapper nicely unifies readiness checks, server key fetching, and retry for both `Sync` and `Job`. One thing to double-check is whether you explicitly want streaming calls to use `defaultBackoff(ctx)` while other RPCs still use `nbgrpc.Backoff(...)`; if not, consider reusing the same backoff helper for consistency, or documenting the intentional difference.

Also, `withMgmtStream` logs "unrecoverable error" for any non-nil `err` after `backoff.Retry`, including `context.Canceled`/`DeadlineExceeded`. If you expect normal shutdowns via context cancellation, you may want to special-case those errors before logging at warn to avoid noisy logs.
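A sketch of that special-casing, assuming the `backoff.Retry(operation, defaultBackoff(ctx))` shape described above:

```go
err := backoff.Retry(operation, defaultBackoff(ctx))
switch {
case err == nil:
	return nil
case errors.Is(err, context.Canceled), errors.Is(err, context.DeadlineExceeded):
	// Expected during shutdown: log quietly instead of as unrecoverable.
	log.Debugf("management stream stopped: %v", err)
	return err
default:
	log.Warnf("unrecoverable error: %v", err)
	return err
}
```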
162-216: Job stream error handling is solid; refine connection notifications for better state reporting

The overall `handleJobStream` flow (open stream → handshake → recv/process/respond loop) and error classification by gRPC code look good.

A couple of refinements around connection state notifications:

- `notifyDisconnected(err)` is called for every receive error, including `codes.Unimplemented`. In that case, the server is reachable but doesn't support `Job`; marking management as "disconnected" can mislead consumers of `ConnStateNotifier`. Consider moving `notifyDisconnected` into the `switch` and skipping it for `codes.Unimplemented` (and possibly `codes.Canceled`, which usually indicates local shutdown); see the sketch after this list.
- The Job path never calls `notifyConnected()`, so this stream can only move the state toward "disconnected". If `ConnStateNotifier` is used for user-visible connectivity, you might want to call `notifyConnected()` once the stream and handshake succeed (or when the first job is successfully received) to keep the state transitions balanced.
- On `sendJobResponse` failure you return the error but don't notify disconnection; if this error typically indicates a broken stream, it may be worth also invoking `notifyDisconnected(err)` there.

These are behavioral tweaks rather than correctness issues but would make connection state reporting more accurate.
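An illustrative shape for the receive-error handling; the exact set of skipped codes is a judgment call:

```go
switch s, _ := status.FromError(err); s.Code() {
case codes.Unimplemented:
	// Server is reachable but lacks the Job RPC: don't mark mgmt disconnected.
	log.Debugf("management server does not implement the Job RPC")
case codes.Canceled:
	// Local shutdown; no state change needed.
default:
	c.notifyDisconnected(err)
}
```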
218-252: Job helpers mirror existing encryption patterns; consider cleaning up unused ctx parameters

`sendHandshake` and `receiveJobRequest` correctly follow the existing `EncryptMessage`/`DecryptMessage` pattern and use the WireGuard public key in the `EncryptedMessage` as elsewhere in this file.

Right now the `ctx` parameter passed into these helpers isn't used inside them; the only effective cancellation point is the stream's own context from `realClient.Job(ctx)`. That's fine behaviorally, but for clarity you could either:

- Drop `ctx` from these helper signatures, or
- Start using it explicitly (e.g., check `ctx.Err()` before/after blocking operations, or for future timeouts / per-job cancellation).

Given this is internal code, I'd treat it as a readability/maintenance cleanup to do when convenient.
254-295: Ensure JobResponse is always well-formed for correlation and logging

`processJobRequest` nicely guards against a nil handler result by synthesizing a `JobResponse` with `Status: JobStatus_failed` and a reason. Two minor robustness tweaks you might consider:

- If the handler returns a non-nil `JobResponse` but forgets to populate `ID`, you could default it to `jobReq.ID` so the management server can always correlate responses:

```go
jobResp := msgHandler(jobReq)
if jobResp == nil {
	jobResp = &proto.JobResponse{
		ID:     jobReq.ID,
		Status: proto.JobStatus_failed,
		Reason: []byte("handler returned nil response"),
	}
} else if len(jobResp.ID) == 0 {
	jobResp.ID = jobReq.ID
}
```

- For logging, `string(jobReq.ID)`/`string(resp.ID)` assumes the IDs are valid UTF-8. If these are ever changed to non-textual bytes, consider logging them as `%x` or via `hex.EncodeToString` to avoid odd output.

Not blockers, but they make the job channel a bit more defensive and easier to debug.
297-395: Sync refactor via `connectToSyncStream`/`receiveUpdatesEvents` looks consistent with existing patterns

The split into `connectToSyncStream` and `receiveUpdatesEvents` reads clean and matches the encryption/decryption contract used elsewhere (`EncryptMessage(serverPubKey, c.key, req)` and `DecryptMessage(serverPubKey, c.key, update.Body, resp)`).

A small optional improvement: `GetNetworkMap` now reimplements the "single `Recv` + decrypt `SyncResponse`" logic that's very similar to what `receiveUpdatesEvents` does in a loop. If you find yourself touching this code again, you might consider a tiny shared helper (e.g., `decryptSyncUpdate(serverPubKey, update *proto.EncryptedMessage) (*proto.SyncResponse, error)`) to keep the decryption path fully DRY.

Functionally, the new sync connection flow looks correct.
client/internal/debug/upload_test.go (1)
1-1: Test now exercises the exported UploadDebugBundle correctly

Switching to `package debug` and calling `UploadDebugBundle(context.Background(), testURL+types.GetURLPath, testURL, file)` matches the new API and keeps the expectations (`getURLHash(testURL)` prefix, stored file content) intact.

You might consider using a context with timeout here (and/or a simple readiness wait on `srv.Start`) to make this integration-style test more robust under slow CI environments.

Also applies to: 41-48
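A sketch of the timeout variant, assuming `UploadDebugBundle` returns `(key string, err error)` as used elsewhere in this PR:

```go
// Bound the test so a hung upload fails fast instead of stalling CI.
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

key, err := UploadDebugBundle(ctx, testURL+types.GetURLPath, testURL, file)
require.NoError(t, err)
require.NotEmpty(t, key)
```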
client/cmd/debug.go (1)
338-360: Debug bundle generator now uses LogPath and shared default upload URL

Passing `LogPath: logFilePath` into `debug.NewBundleGenerator` matches the updated `GeneratorDependencies` and ensures the bundle generator knows where to look for logs. Reusing `types.DefaultBundleURL` as the default for `--upload-bundle-url` in both `debug bundle` and `debug for` keeps the CLI consistent with the upload-server configuration.

Consider clarifying in the flag help text that the default is the NetBird-hosted debug upload service, so self-hosters know they should override it if needed.

Also applies to: 369-379
client/server/server_test.go (1)
21-21: JobManager wiring into test management server is correct

Creating a `jobManager := job.NewJobManager(nil, store)` and passing it into both `server.BuildManager(...)` and `nbgrpc.NewServer(...)` matches the updated constructor signatures and ensures job flows are exercised in this test setup.

The `if err != nil { return nil, "", err }` right after `eventStore := &activity.InMemoryEventStore{}` still refers to the earlier `err` from `NewTestStoreFromSQL`, which has already been handled and is guaranteed nil here. It can be removed to avoid confusion.

Also applies to: 298-329
client/internal/debug/debug.go (1)
30-34: Status seeding via FullStatus/nbstatus is coherent with anonymization strategy

Using `statusRecorder.GetFullStatus()` → `nbstatus.ToProtoFullStatus` → `ConvertToStatusOutputOverview` → `ParseToFullDetailSummary` aligns the `status.txt` contents with the regular `netbird status` output and correctly injects event history and profile name. Seeding the bundle anonymizer via `seedFromStatus(g.anonymizer, &fullStatus)` leverages this richer status snapshot to anonymize later artifacts more consistently.

One minor nit: `seedFromStatus` runs even when `g.anonymize` is false, which is harmless but unnecessary work; you could optionally guard it with `if g.anonymize { ... }` for clarity.

Also applies to: 382-405, 852-880
shared/management/http/api/generate.sh (1)
14-16: oapi-codegen v2 install path looks correct; consider pinning a version

The switch to `github.com/oapi-codegen/oapi-codegen/v2/cmd/oapi-codegen@latest` matches the v2 module layout and should work fine with recent Go toolchains. For more reproducible codegen in CI, you might eventually want to pin a specific version instead of `@latest`.

client/embed/embed.go (1)
175-185: Recorder integration and updated Run call look consistent

Wiring a `peer.NewRecorder` into `NewConnectClient` and updating `Run` to `Run(run, "")` matches the new client API and should keep embedded startup behavior intact. If you later surface a log/debug path in `Options`, this is the right place to thread it through instead of an empty string.
19-39: JobManager wiring into BuildManager is correct; reuse metrics instead of nil

Injecting a concrete `job.Manager` into `BuildManager` here is a nice improvement in test realism. Given you already construct `metrics` in this helper, you can pass it into the job manager instead of nil so job-related metrics behavior is also covered in tests:

```diff
- peersUpdateManager := update_channel.NewPeersUpdateManager(nil)
- jobManager := job.NewJobManager(nil, store)
+ peersUpdateManager := update_channel.NewPeersUpdateManager(nil)
+ jobManager := job.NewJobManager(metrics, store)
  ...
- am, err := server.BuildManager(ctx, nil, store, networkMapController, jobManager, nil, "", &activity.InMemoryEventStore{}, geoMock, false, validatorMock, metrics, proxyController, settingsManager, permissionsManager, false)
+ am, err := server.BuildManager(ctx, nil, store, networkMapController, jobManager, nil, "", &activity.InMemoryEventStore{}, geoMock, false, validatorMock, metrics, proxyController, settingsManager, permissionsManager, false)
```
management/server/account_test.go (1)
35-50: Account test manager now wires a JobManager; prefer passing metrics instead of nil

Importing `job` and passing a concrete `JobManager` into `BuildManager` in `createManager` matches the new account manager wiring and lets tests exercise job-related paths. Since you already compute `metrics` in this helper, you can avoid a nil metrics field on the JobManager and better mirror production setup by doing:

```diff
- manager, err := BuildManager(ctx, nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+ manager, err := BuildManager(ctx, nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
```

Also applies to: 2931-2969
management/server/route_test.go (1)
22-34: Route tests correctly inject a JobManager; reuse metrics for consistency

The additional `job` import and passing `job.NewJobManager(nil, store)` into `BuildManager` in `createRouterManager` are aligned with the new constructor and ensure route tests run with job support enabled. Since this helper already initializes `metrics`, you can tighten the fidelity of the test setup by wiring it into the JobManager too:

```diff
- am, err := BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+ am, err := BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
```

Also applies to: 1258-1302
management/server/peer_test.go (1)
35-35: Prefer initializing `job.Manager` with real metrics in tests

The wiring of `BuildManager` looks correct (argument order matches the updated signature), but all these test setups construct the job manager with `job.NewJobManager(nil, s)` while `metrics` is already available. This leaves `job.Manager.metrics` nil in tests, which can (a) hide metric-related bugs and (b) potentially panic later if the manager starts using metrics without nil checks.

Consider passing the same metrics instance you already create:

```diff
- am, err := BuildManager(context.Background(), nil, s, networkMapController, job.NewJobManager(nil, s), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+ am, err := BuildManager(context.Background(), nil, s, networkMapController, job.NewJobManager(metrics, s), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
```

Apply the same pattern to the other `BuildManager` invocations in this file for consistency.

Also applies to: 1296-1296, 1381-1381, 1534-1534, 1614-1614
management/server/nameserver_test.go (1)
19-19: Use the existing metrics instance when creating `JobManager`

`createNSManager` correctly wires the new `jobManager` dependency into `BuildManager`, but you're passing `job.NewJobManager(nil, store)` even though a real `metrics` instance is already created and used for `update_channel.NewPeersUpdateManager`.

To avoid a nil `metrics` inside `job.Manager` in tests (and potential surprises if metrics are used later), prefer:

```diff
- return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+ return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
```

Also applies to: 798-799
management/server/dns_test.go (1)
17-17: Align test `JobManager` construction with real metrics

`createDNSManager` correctly injects a `JobManager` into `BuildManager`, but uses `job.NewJobManager(nil, store)` despite having a `metrics` instance already.

For more realistic tests and to avoid a nil metrics field inside `job.Manager`, consider:

```diff
- return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+ return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
```

Also applies to: 229-230
management/server/management_proto_test.go (1)
32-32: JobManager wiring in tests looks correct; consider reusing metrics

The test correctly instantiates `job.Manager` and threads it through `BuildManager` and `nbgrpc.NewServer`, matching the new signatures. To exercise metrics inside the job manager during tests, you might consider passing the `metrics` instance instead of nil when creating `jobManager`, but that's optional.

Also applies to: 342-343, 369-371, 380-381
client/cmd/testutil_test.go (1)
19-19: Consistent JobManager setup in test helper; minor polish possible

The JobManager is correctly created and passed into `BuildManager` and `nbgrpc.NewServer`, aligning with the new signatures. Two small nits you could optionally address later:

- Pass the `metrics` instance to `job.NewJobManager` instead of nil if you want job metrics in tests.
- The `if err != nil { return nil, nil }` check after `eventStore := &activity.InMemoryEventStore{}` is dead code and can be removed.

Also applies to: 92-96, 123-124, 129-130
management/server/account.go (1)
18-18: JobManager injection is clean; clarify non-nil contract

Adding `jobManager *job.Manager` to `DefaultAccountManager` and threading it via `BuildManager` is consistent and keeps wiring explicit. However, methods like `CreatePeerJob` dereference `am.jobManager` without nil checks, so `BuildManager` effectively requires a non-nil JobManager at all call sites. Consider either:

- Documenting that `jobManager` must be non-nil (and enforcing via tests), or
- Adding a defensive nil check that returns a clear `status.Internal`/`status.PreconditionFailed` error instead of panicking if it's miswired (see the sketch below).

Would you like a small `ast-grep`/`rg` script to verify that all `BuildManager` call sites now pass a non-nil JobManager?

Also applies to: 68-76, 179-197, 203-224
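A one-guard sketch of the defensive option inside `BuildManager`; the error code is an assumption based on this package's `status` helpers:

```go
if jobManager == nil {
	return nil, status.Errorf(status.Internal, "BuildManager requires a non-nil job manager")
}
```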
management/server/management_test.go (1)
31-31: Test server wiring with JobManager is correct

The test server now correctly constructs a JobManager and passes it through to `BuildManager` and `nbgrpc.NewServer`, matching the new APIs. As with other tests, you could optionally pass `metrics` instead of nil into `job.NewJobManager` if you want job metrics exercised, but the current setup is functionally fine.

Also applies to: 183-185, 212-228, 235-247
management/server/http/handlers/peers/peers_handler.go (1)
31-41: New peer job HTTP endpoints are well-structured; tighten validation and rely on fixed ownership check

The new `/peers/{peerId}/jobs` and `/peers/{peerId}/jobs/{jobId}` handlers are consistent with existing patterns (auth from context, `util.WriteError`/`WriteJSONObject`, and conversion helpers). A few small points:

- Once `GetPeerJobByID` is fixed to assert `job.PeerID == peerID` (see account manager comment), these handlers will correctly enforce per-peer job ownership.
- For consistency with other handlers in this file (e.g. `HandlePeer`, `GetAccessiblePeers`), you may want to add an explicit empty-`peerId` check in `CreateJob`, `ListJobs`, and `GetJob` to return a clear `InvalidArgument` instead of silently passing `""` through.
- `toSingleJobResponse` cleanly maps the domain `Job` to the HTTP `JobResponse`, including optional `FailedReason` and typed `Workload`; that looks good.

Overall, the HTTP surface for jobs is in good shape once the backend ownership check is tightened.

Also applies to: 51-142, 618-638
shared/management/proto/management.proto (1)
52-53: Refine Job proto types for clarity and forward compatibility

The Job RPC and related messages look structurally sound and align with the existing EncryptedMessage pattern. A few tweaks would make them easier to use:

- `JobResponse.Reason` as `bytes` is unusual for a human-readable explanation; a `string` would be more ergonomic unless you explicitly plan to ship arbitrary binary blobs.
- `JobStatus` only has `unknown_status`, `succeeded`, `failed`. If you intend to reflect persistent job lifecycle (pending/running vs completed), it may be worth adding an explicit in-progress state instead of overloading `unknown_status` as a placeholder.
- Consider documenting the expected encoding/format for `JobRequest.ID`/`JobResponse.ID` (stringified UUID vs raw bytes) so store/`types.Job` integration remains consistent.

Functionally this is fine; these are just protocol polish suggestions.

Also applies to: 66-89
client/internal/debug/upload.go (1)
15-101: Upload helper is solid; consider small robustness improvements

The end-to-end flow (get presigned URL → size check → PUT upload) looks good and side-effect free. A few minor refinements:

- In `upload`, prefer `%w` when wrapping the HTTP error to preserve the error chain: `return fmt.Errorf("upload failed: %w", err)`.
- `getUploadURL` builds the query as `url+"?id="+id`. If `url` might ever include its own query string, using `net/url` to append `id` would be safer (see the sketch below).
- Both `getUploadURL` and `upload` strictly require `StatusOK`. If the upload service ever returns another 2xx (e.g. 201/204), this will be treated as failure. If you control that service and guarantee 200, the current code is fine; otherwise consider `if putResp.StatusCode/100 != 2`.

None of these are blockers; the current implementation should work as intended.
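A self-contained sketch of the `net/url` variant that stays correct even when the base URL already carries a query; `baseURL` and `id` stand in for the function's existing inputs:

```go
u, err := url.Parse(baseURL)
if err != nil {
	return "", fmt.Errorf("parse upload URL: %w", err)
}
q := u.Query()
q.Set("id", id) // appended safely, and percent-encoded if needed
u.RawQuery = q.Encode()
req, err := http.NewRequestWithContext(ctx, http.MethodGet, u.String(), nil)
```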
client/server/debug.go (1)
27-39: Use the incoming context (and possibly avoid holding the mutex) during upload

The switch to `debug.UploadDebugBundle` and `LogPath: s.logFile` looks correct.

Two follow-up improvements worth considering:

- `DebugBundle` ignores the incoming context and calls `UploadDebugBundle` with `context.Background()`. That means client cancellation/timeouts won't stop the upload. Renaming the parameter to `ctx context.Context` and passing `ctx` through would make this RPC better behaved under cancellation.
- The upload is done while `s.mutex` is held. For large bundles or slow networks this can block other server operations behind this lock. If state safety allows it, you might generate the bundle under the lock, then release the mutex before starting the network upload.

Behavior is fine for a rarely used debug path, but these tweaks would make it more responsive.

Also applies to: 49-57
management/server/account/manager.go (1)
127-129: Job APIs on Manager look good; consider pagination/filters

The new job methods are consistent with the rest of the Manager interface (accountID/userID/peerID ordering, `types.Job` usage).

Two design points to keep in mind for implementations:

- `GetAllPeerJobs` can grow unbounded over time; if jobs are expected to accumulate, consider adding pagination and/or server-side filtering (e.g. by status or time range) at some point; a hypothetical shape is sketched below.
- Ensure implementations consistently enforce `userID` authorization, since these methods expose per-peer operational history.

No changes required here, just points to watch in the concrete manager.
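A hypothetical (not part of this PR) evolution of the interface if pagination lands later; `JobFilter`, `JobReader`, and the page arguments are illustrative only:

```go
type JobFilter struct {
	Status types.JobStatus // zero value: no status filter
	Since  time.Time       // zero value: no time filter
}

type JobReader interface {
	// Returns one page of jobs plus the total count for the filter.
	GetPeerJobs(ctx context.Context, accountID, userID, peerID string, filter JobFilter, page, pageSize int) ([]*types.Job, int, error)
}
```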
client/internal/engine_test.go (1)
1591-1592: JobManager wiring in startManagement test mirrors production; consider reusing metrics

The new `jobManager := job.NewJobManager(nil, store)` and its injection into `BuildManager` and `nbgrpc.NewServer` properly mirror the production wiring, so tests now exercise the job pipeline end-to-end.

Since you already create `metrics, err := telemetry.NewDefaultAppMetrics(...)` a few lines later, you may want to:

- Construct `metrics` first, then
- Call `job.NewJobManager(metrics, store)`

This would keep metrics behavior consistent between the JobManager and the rest of the management stack even in tests. Not required for correctness, but a bit cleaner.

Also applies to: 1622-1623, 1628-1629
shared/management/http/api/openapi.yml (2)
41-121: Align bundle/workload schema with actual bundle generator config

The Bundle/Workload schemas look reasonable, but there are a couple of potential contract mismatches to double-check:

- `BundleParameters` exposes `bundle_for` and `bundle_for_time` while the current `debug.BundleConfig` only has `Anonymize`, `IncludeSystemInfo`, and `LogFileCount`. There is no way for callers to control `IncludeSystemInfo`, and the semantics of `bundle_for`/`bundle_for_time` vs. what the client actually does (currently just a wait + regular bundle) should be clarified or adjusted so the API is not misleading.
- You're using a discriminator on `WorkloadRequest`/`WorkloadResponse` with `propertyName: type`. Make sure the generated clients you care about support this pattern, where the discriminator property lives only in the concrete types (`BundleWorkloadRequest`/`Response`) and not also declared at the base level; some generators are picky here.

I'd suggest either:

- Extending `BundleConfig` + implementation to truly honor all declared parameters, or
- Tightening the OpenAPI description and fields to only what is actually used today, to avoid surprising API consumers.

35-38: Clarify experimental nature and pagination/filtering for Jobs endpoints

The `Jobs` tag is marked `x-experimental: true`, but the individual operations are not. If you rely on tooling that inspects operation/vendor extensions rather than tags, you may want to duplicate `x-experimental: true` on the new `/api/peers/{peerId}/jobs` and `/api/peers/{peerId}/jobs/{jobId}` operations.

Additionally, listing jobs currently returns an unbounded array with no filtering or pagination parameters. If job counts can become large per peer, it's worth at least documenting ordering (e.g. newest first) and planning for `page`/`page_size`/`status` filters, even if you defer implementing them now.

Also applies to: 2353-2456
shared/management/client/client_test.go (1)
25-26: JobManager wiring in tests looks correct; consider reusing metrics if you need job metrics

The new `jobManager := job.NewJobManager(nil, store)` and its injection into both `BuildManager` and `nbgrpc.NewServer` align with the updated constructor signatures and keep the tests realistic.

If you ever want to exercise job-related metrics in tests, you could pass the already-created `metrics` instead of nil to `NewJobManager`; otherwise this setup is fine as-is for functional tests.

Also applies to: 76-77, 123-124, 130-132
management/server/store/sql_store.go (1)
136-207: Job CRUD helpers mostly fine; consider tightening error handling and update scope

Functionality looks correct overall (ID scoping, account/peer filters, status transitions), but a few details could be improved:

- `CompletePeerJob` updates by `id` only; since `ID` is the primary key this is safe, but for consistency with `GetPeerJobByID` you may want to include `account_id` in the `WHERE` as well.
- `CreatePeerJob`, `CompletePeerJob`, and `MarkPendingJobsAsFailed` correctly wrap errors with `status.Internal`, but `GetPeerJobByID`/`GetPeerJobs` return raw GORM errors on internal failure. Consider wrapping those in `status.Errorf(status.Internal, ...)` as done elsewhere in this store so callers don't depend on storage-specific errors (see the sketch below).
- If `types.Job.ApplyResponse` only populates a subset of fields, `CompletePeerJob`'s `Updates(job)` call is fine; if it ever starts zeroing other fields, you may want to switch to an explicit field list/map to avoid unintended overwrites.
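A sketch of the wrapping suggestion for the read paths; `NewJobNotFoundError` is a hypothetical helper in the style of this package's other not-found constructors:

```go
if err := result.Error; err != nil {
	if errors.Is(err, gorm.ErrRecordNotFound) {
		return nil, status.NewJobNotFoundError(jobID) // hypothetical helper
	}
	return nil, status.Errorf(status.Internal, "failed to get job from store: %v", err)
}
```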
management/internals/shared/grpc/server.go (1)

184-219: Job stream handling is generally sound; consider surfacing send failures

The new `Job` RPC and helpers hold up well:

- Handshake reads a single encrypted `JobRequest` and reuses the existing `parseRequest` path, so message framing and crypto stay consistent with Sync/Login.
- Account/peer lookup mirrors the Sync flow and correctly maps missing peers to `PermissionDenied`/`Unauthenticated`.
- `CreateJobChannel` + deferred `CloseChannel` provide a clear per-peer lifecycle, and `startResponseReceiver` cleanly demultiplexes `JobResponse` messages into the job manager.

One point to consider tightening: in `sendJobsLoop`, any failure in `sendJob` is logged and then the method returns nil, so the gRPC stream terminates with an OK status even if job delivery to the client failed. For parity with `handleUpdates`/`sendUpdate`, you may want to return the error (or a mapped status error) so callers can observe transport failures on the Job stream.

Also applies to: 333-397, 450-467
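A sketch of surfacing the error; `ch.Events()` and `s.sendJob` are assumed accessors standing in for whatever the real loop reads from and writes to:

```go
func (s *Server) sendJobsLoop(ctx context.Context, ch *job.Channel, srv proto.ManagementService_JobServer) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case event, ok := <-ch.Events():
			if !ok {
				return nil // channel closed cleanly
			}
			if err := s.sendJob(srv, event); err != nil {
				log.WithContext(ctx).Warnf("failed to send job: %v", err)
				return err // end the stream with a non-OK status
			}
		}
	}
}
```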
management/server/job/manager.go (2)
44-151: Pending-job cleanup semantics are coarse; minor concurrency detail worth double-checking

Overall behavior is coherent (fail stuck jobs, enqueue new ones, complete on response), but a few aspects could be sharpened:

- `CreateJobChannel` calls `MarkPendingJobsAsFailed` for `(accountID, peerID)` before taking the lock. That's fine functionally, but means "stuck too long" in the DB is actually "any pending job when a fresh channel is created"; if you later add explicit TTL semantics, you may want a more targeted query.
- In `SendJob`, you look up the `Channel` under `RLock` and then call `ch.AddEvent` after releasing the lock. If `CreateJobChannel` or `CloseChannel` can call `ch.Close()` concurrently, make sure `Channel.AddEvent` is written to safely detect a closed channel and return `ErrJobChannelClosed` rather than panic on send to a closed chan (a closed-safe sketch follows below).
- Both `cleanup` and `CloseChannel` use `MarkPendingJobsAsFailed(accountID, peerID, reason)`, which marks all pending DB jobs for that peer, not just the specific `jobID`. That's acceptable if you only ever have a single pending job per peer, but if you later support multiple in-flight jobs, you'll likely want a job-scoped failure path in the store.
- In `CloseChannel`, the loop over `jm.pending` calls `MarkPendingJobsAsFailed` once per event for the same peer; a minor optimization is to call it once per peerID, then delete all relevant `pending` entries.

These are mostly behavioral clarifications and small optimizations rather than blockers.
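A closed-safe sketch of the channel wrapper; the `mu`, `closed`, and `events` fields are illustrative and the real `Channel` layout may differ:

```go
func (c *Channel) AddEvent(ev *Event) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		return ErrJobChannelClosed
	}
	select {
	case c.events <- ev:
		return nil
	default:
		return fmt.Errorf("job channel full")
	}
}

func (c *Channel) Close() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.closed {
		c.closed = true
		close(c.events)
	}
}
```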
90-117: HandleResponse flow is correct but could simplify error-response coupling

`HandleResponse` correctly:

- Looks up the pending event by `jobID`.
- Builds a `types.Job` from the response via `ApplyResponse`.
- Calls `Store.CompletePeerJob` and always removes the entry from `pending`.

Given that the `event.Response` field is only set when `CompletePeerJob` succeeds and then immediately dropped from `pending`, you could omit `event.Response` entirely or set it before calling the store to simplify reasoning. Current behavior is valid; this is mainly an internal API cleanup opportunity.

client/status/status.go (1)
122-169: Overview mapping from FullStatus looks correct; consider a nil guard

Switching `ConvertToStatusOutputOverview` to use `*proto.FullStatus` directly and threading in `daemonVersion` matches how the proto is structured and keeps the output fields aligned with the underlying status (management, signal, local peer, relays, DNS, events, SSH).

One small robustness improvement: if `pbFullStatus` can ever be nil (e.g., from an older daemon or error path), `pbFullStatus.Get...()` calls will panic. A quick upfront check like `if pbFullStatus == nil { return OutputOverview{} }` would make the function safer against unexpected inputs.
management/server/types/job.go (2)
105-138: Minor: preserve wrapped error instead of using `err.Error()`

In `BuildWorkloadResponse`, you wrap bundle build errors as:

```go
if err := j.buildBundleResponse(&wl); err != nil {
	return nil, status.Errorf(status.InvalidArgument, err.Error())
}
```

Using `err.Error()` as the format string loses the original error as a distinct argument and makes further wrapping/debugging slightly harder. Consider:

```diff
- return nil, status.Errorf(status.InvalidArgument, err.Error())
+ return nil, status.Errorf(status.InvalidArgument, "%v", err)
```

to keep the original error value intact and consistent with the rest of the file.
140-162: Bundle parameter validation is clear; consider tightening semantics only if needed

The validation for `BundleForTime` (1-5 minutes when `BundleFor` is true) and `LogFileCount` (1-1000) is straightforward and matches the API docs. You also normalize stored parameters to the `BundleParameters` JSON and initialize `Result` to `{}`, which keeps `BuildWorkloadResponse` happy for pending/failed jobs.

If you ever want to hard-fail obviously nonsensical inputs when `BundleFor` is false (e.g., negative `BundleForTime`), you could extend the check here, but that's optional and not strictly required by the current contract.

shared/management/http/api/types.gen.go (1)
2112-2169: WorkloadRequest helpers match server usage; consider using the WorkloadType constant

The `As*`/`From*`/`Merge*` helpers and `Discriminator`/`ValueByDiscriminator` for `WorkloadRequest` are consistent with how `NewJob` and `validateAndBuildBundleParams` consume the workload. One small optional improvement would be to use `WorkloadTypeBundle` instead of the literal `"bundle"` to avoid drift if the constant ever changes:

```diff
-func (t *WorkloadRequest) FromBundleWorkloadRequest(v BundleWorkloadRequest) error {
-	v.Type = "bundle"
+func (t *WorkloadRequest) FromBundleWorkloadRequest(v BundleWorkloadRequest) error {
+	v.Type = WorkloadTypeBundle
```

Same for `MergeBundleWorkloadRequest` and the switch in `ValueByDiscriminator`.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
- `client/proto/daemon.pb.go` is excluded by `!**/*.pb.go`
- `go.sum` is excluded by `!**/*.sum`
- `shared/management/proto/management.pb.go` is excluded by `!**/*.pb.go`
- `shared/management/proto/management_grpc.pb.go` is excluded by `!**/*.pb.go`
📒 Files selected for processing (52)
- .github/workflows/wasm-build-validation.yml (1 hunks)
- client/cmd/debug.go (1 hunks)
- client/cmd/status.go (1 hunks)
- client/cmd/testutil_test.go (3 hunks)
- client/cmd/up.go (1 hunks)
- client/embed/embed.go (1 hunks)
- client/internal/connect.go (7 hunks)
- client/internal/debug/debug.go (7 hunks)
- client/internal/debug/upload.go (1 hunks)
- client/internal/debug/upload_test.go (2 hunks)
- client/internal/engine.go (13 hunks)
- client/internal/engine_test.go (9 hunks)
- client/jobexec/executor.go (1 hunks)
- client/proto/daemon.proto (0 hunks)
- client/server/debug.go (2 hunks)
- client/server/server.go (4 hunks)
- client/server/server_test.go (3 hunks)
- client/status/status.go (4 hunks)
- client/status/status_test.go (1 hunks)
- client/ui/debug.go (2 hunks)
- go.mod (2 hunks)
- management/internals/server/boot.go (1 hunks)
- management/internals/server/controllers.go (3 hunks)
- management/internals/server/modules.go (1 hunks)
- management/internals/shared/grpc/server.go (11 hunks)
- management/server/account.go (4 hunks)
- management/server/account/manager.go (1 hunks)
- management/server/account_test.go (2 hunks)
- management/server/activity/codes.go (2 hunks)
- management/server/dns_test.go (2 hunks)
- management/server/http/handlers/peers/peers_handler.go (3 hunks)
- management/server/http/testing/testing_tools/channel/channel.go (3 hunks)
- management/server/job/channel.go (1 hunks)
- management/server/job/manager.go (1 hunks)
- management/server/management_proto_test.go (4 hunks)
- management/server/management_test.go (4 hunks)
- management/server/mock_server/account_mock.go (2 hunks)
- management/server/nameserver_test.go (2 hunks)
- management/server/peer.go (1 hunks)
- management/server/peer_test.go (5 hunks)
- management/server/route_test.go (2 hunks)
- management/server/store/sql_store.go (3 hunks)
- management/server/store/store.go (1 hunks)
- management/server/types/job.go (1 hunks)
- shared/management/client/client.go (1 hunks)
- shared/management/client/client_test.go (3 hunks)
- shared/management/client/grpc.go (8 hunks)
- shared/management/client/mock.go (2 hunks)
- shared/management/http/api/generate.sh (1 hunks)
- shared/management/http/api/openapi.yml (2 hunks)
- shared/management/http/api/types.gen.go (8 hunks)
- shared/management/proto/management.proto (2 hunks)
💤 Files with no reviewable changes (1)
- client/proto/daemon.proto
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-13T00:29:53.247Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/cmd/ssh_exec_unix.go:53-74
Timestamp: 2025-11-13T00:29:53.247Z
Learning: In client/ssh/server/executor_unix.go, the method ExecuteWithPrivilegeDrop(ctx context.Context, config ExecutorConfig) has a void return type (no error return). It handles failures by exiting the process directly with appropriate exit codes rather than returning errors to the caller.
Applied to files:
client/jobexec/executor.go
📚 Learning: 2025-11-14T13:05:31.729Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/ssh/server/userswitching_windows.go:89-139
Timestamp: 2025-11-14T13:05:31.729Z
Learning: In client/ssh/server/executor_windows.go, the WindowsExecutorConfig struct's Pty, PtyWidth, and PtyHeight fields are intentionally left unused for now and will be implemented in a future update.
Applied to files:
client/jobexec/executor.go
🧬 Code graph analysis (42)
shared/management/client/client.go (3)
- management/server/types/job.go (1): Job (34-58)
- shared/management/http/api/types.gen.go (2): JobRequest (708-710), JobResponse (713-721)
- shared/management/proto/management.pb.go (6): JobRequest (388-398, 413, 428-430), JobResponse (463-475, 490, 505-507)

management/internals/server/controllers.go (3)
- management/internals/server/server.go (1): BaseServer (45-68)
- management/server/job/manager.go (2): Manager (23-30), NewJobManager (32-42)
- management/internals/server/container.go (1): Create (6-10)

client/status/status_test.go (1)
- client/status/status.go (1): ConvertToStatusOutputOverview (122-169)

client/internal/debug/upload_test.go (2)
- client/internal/debug/upload.go (1): UploadDebugBundle (17-28)
- upload-server/types/upload.go (1): GetURLPath (9)

management/server/store/store.go (1)
- management/server/types/job.go (1): Job (34-58)

management/internals/server/boot.go (1)
- management/internals/shared/grpc/server.go (1): NewServer (84-147)

client/internal/debug/upload.go (1)
- upload-server/types/upload.go (3): GetURLResponse (15-18), ClientHeader (5), ClientHeaderValue (7)

client/cmd/up.go (1)
- util/log.go (1): FindFirstLogPath (77-84)

management/server/account_test.go (2)
- management/server/account.go (1): BuildManager (180-268)
- management/server/job/manager.go (1): NewJobManager (32-42)

management/server/peer.go (7)
- management/server/account.go (1): DefaultAccountManager (68-113)
- management/server/types/job.go (2): Job (34-58), Workload (60-64)
- management/server/permissions/modules/module.go (1): Peers (7)
- management/server/permissions/operations/operation.go (1): Delete (9)
- shared/management/status/error.go (5): NewPermissionValidationError (213-215), NewPermissionDeniedError (209-211), NewPeerNotPartOfAccountError (105-107), Errorf (70-75), Type (46)
- management/server/store/store.go (3): Store (50-211), LockingStrengthNone (47), LockingStrengthUpdate (43)
- management/server/activity/codes.go (1): JobCreatedByUser (183)

management/server/http/testing/testing_tools/channel/channel.go (2)
- management/server/job/manager.go (1): NewJobManager (32-42)
- management/server/account.go (1): BuildManager (180-268)

management/internals/server/modules.go (1)
- management/server/account.go (1): BuildManager (180-268)

shared/management/client/mock.go (3)
- shared/management/http/api/types.gen.go (2): JobRequest (708-710), JobResponse (713-721)
- shared/management/proto/management.pb.go (6): JobRequest (388-398, 413, 428-430), JobResponse (463-475, 490, 505-507)
- management/server/types/job.go (1): Job (34-58)

management/server/account/manager.go (1)
- management/server/types/job.go (1): Job (34-58)

client/embed/embed.go (1)
- client/internal/connect.go (1): NewConnectClient (51-62)

client/cmd/status.go (1)
- client/status/status.go (1): ConvertToStatusOutputOverview (122-169)

management/server/management_test.go (1)
- management/server/job/manager.go (1): NewJobManager (32-42)

management/server/peer_test.go (2)
- management/server/account.go (1): BuildManager (180-268)
- management/server/job/manager.go (1): NewJobManager (32-42)

client/server/server.go (1)
- client/status/status.go (1): ToProtoFullStatus (549-635)

management/server/mock_server/account_mock.go (1)
- management/server/types/job.go (1): Job (34-58)

management/server/nameserver_test.go (2)
- management/server/account.go (1): BuildManager (180-268)
- management/server/job/manager.go (1): NewJobManager (32-42)

management/server/job/channel.go (1)
- management/server/job/manager.go (1): Event (17-21)

management/server/http/handlers/peers/peers_handler.go (4)
- management/server/context/auth.go (1): GetUserAuthFromContext (25-30)
- shared/management/http/util/util.go (3): WriteError (84-120), WriteErrorResponse (70-80), WriteJSONObject (27-35)
- shared/management/http/api/types.gen.go (3): JobRequest (708-710), JobResponse (713-721), JobResponseStatus (724)
- management/server/types/job.go (3): NewJob (67-103), Job (34-58), Workload (60-64)

client/jobexec/executor.go (3)
- client/internal/debug/debug.go (3): GeneratorDependencies (238-243), BundleConfig (232-236), NewBundleGenerator (245-264)
- client/internal/debug/upload.go (1): UploadDebugBundle (17-28)
- upload-server/types/upload.go (1): DefaultBundleURL (11)

client/server/debug.go (1)
- client/internal/debug/upload.go (1): UploadDebugBundle (17-28)

client/internal/engine.go (6)
- client/internal/profilemanager/config.go (1): Config (89-160)
- client/jobexec/executor.go (3): Executor (23-24), NewExecutor (26-28), ErrJobNotImplemented (20)
- shared/management/client/client.go (1): Client (14-27)
- shared/management/proto/management.pb.go (21): JobRequest (388-398, 413, 428-430), JobResponse (463-475, 490, 505-507), JobStatus_failed (30), JobRequest_Bundle (457-459, 461), JobStatus_succeeded (29), BundleParameters (554-563, 578, 593-595), JobResponse_Bundle (548-550, 552), SyncResponse (721-738, 753, 768-770), BundleResult (625-631, 646, 661-663)
- client/internal/state.go (1): CtxGetState (31-33)
- client/internal/debug/debug.go (2): GeneratorDependencies (238-243), BundleConfig (232-236)

management/internals/shared/grpc/server.go (6)
- management/server/account/manager.go (1): Manager (27-131)
- management/server/job/manager.go (2): Manager (23-30), Event (17-21)
- shared/management/proto/management_grpc.pb.go (1): ManagementService_JobServer (427-431)
- shared/management/proto/management.pb.go (9): JobRequest (388-398, 413, 428-430), JobResponse (463-475, 490, 505-507), EncryptedMessage (322-333, 348, 363-365)
- management/server/job/channel.go (2): Channel (18-21), ErrJobChannelClosed (15)
- encryption/message.go (1): EncryptMessage (10-24)

management/server/store/sql_store.go (1)
- management/server/types/job.go (3): Job (34-58), JobStatusPending (18), JobStatusFailed (20)

management/server/job/manager.go (3)
- shared/management/proto/management.pb.go (6): JobRequest (388-398, 413, 428-430), JobResponse (463-475, 490, 505-507)
- management/server/job/channel.go (2): Channel (18-21), NewChannel (23-29)
- management/server/types/job.go (1): Job (34-58)

shared/management/client/client_test.go (3)
- management/server/job/manager.go (1): NewJobManager (32-42)
- management/server/account.go (1): BuildManager (180-268)
- management/internals/shared/grpc/server.go (1): NewServer (84-147)

client/internal/connect.go (2)
- client/internal/engine.go (2): NewEngine (235-276), EngineConfig (83-142)
- client/internal/profilemanager/config.go (1): Config (89-160)

management/server/dns_test.go (2)
- management/server/account.go (1): BuildManager (180-268)
- management/server/job/manager.go (1): NewJobManager (32-42)

shared/management/client/grpc.go (3)
- shared/management/proto/management.pb.go (13): JobRequest (388-398, 413, 428-430), JobResponse (463-475, 490, 505-507), EncryptedMessage (322-333, 348, 363-365), JobStatus_failed (30), SyncResponse (721-738, 753, 768-770)
- shared/management/proto/management_grpc.pb.go (2): ManagementService_JobClient (169-173), ManagementService_SyncClient (89-92)
- encryption/message.go (2): EncryptMessage (10-24), DecryptMessage (27-40)

management/server/types/job.go (3)
- shared/management/proto/management.pb.go (22): JobStatus (25, 57-59, 61-63, 70-72), JobRequest (388-398, 413, 428-430), BundleParameters (554-563, 578, 593-595), BundleResult (625-631, 646, 661-663), JobResponse (463-475, 490, 505-507), JobStatus_succeeded (29), JobStatus_failed (30), JobResponse_Bundle (548-550, 552), JobRequest_Bundle (457-459, 461)
- shared/management/http/api/types.gen.go (8): JobRequest (708-710), WorkloadResponse (1948-1950), BundleParameters (361-373), BundleResult (376-378), BundleWorkloadResponse (391-399), WorkloadTypeBundle (195), WorkloadRequest (1943-1945), JobResponse (713-721)
- shared/management/status/error.go (4): Errorf (70-75), BadRequest (36), InvalidArgument (27), Error (54-57)

client/server/server_test.go (3)
- management/server/job/manager.go (1): NewJobManager (32-42)
- management/server/account.go (1): BuildManager (180-268)
- management/internals/shared/grpc/server.go (1): NewServer (84-147)

client/internal/debug/debug.go (4)
- util/log.go (1): SpecialLogs (25-28)
- client/internal/profilemanager/profilemanager.go (1): NewProfileManager (56-58)
- client/status/status.go (3): ToProtoFullStatus (549-635), ConvertToStatusOutputOverview (122-169), ParseToFullDetailSummary (532-547)
- version/version.go (1): NetbirdVersion (18-20)

client/internal/engine_test.go (5)
- client/internal/engine.go (2): NewEngine (235-276), EngineConfig (83-142)
- shared/management/client/mock.go (1): MockClient (13-24)
- management/server/job/manager.go (1): NewJobManager (32-42)
- management/server/account.go (1): BuildManager (180-268)
- management/internals/shared/grpc/server.go (1): NewServer (84-147)

shared/management/http/api/types.gen.go (2)
- shared/management/proto/management.pb.go (9): BundleParameters (554-563, 578, 593-595), BundleResult (625-631, 646, 661-663), JobRequest (388-398, 413, 428-430)
- management/server/types/job.go (1): Workload (60-64)

management/server/route_test.go (2)
- management/server/account.go (1): BuildManager (180-268)
- management/server/job/manager.go (1): NewJobManager (32-42)

client/status/status.go (2)
- client/proto/daemon.pb.go (21): FullStatus (1994-2008, 2021, 2036-2038), ManagementState (1682-1689, 1702, 1717-1719), SignalState (1621-1628, 1641, 1656-1658), LocalPeerState (1528-1539, 1552, 1567-1569), PeerState (1347-1369, 1382, 1397-1399), RelayState (1743-1750, 1763, 1778-1780), NSGroupState (1803-1811, 1824, 1839-1841)
- client/internal/peer/status.go (6): FullStatus (151-161), ManagementState (128-132), SignalState (121-125), LocalPeerState (106-112), RosenpassState (135-138), NSGroupState (142-148)

management/server/management_proto_test.go (3)
- management/server/job/manager.go (1): NewJobManager (32-42)
- management/server/account.go (1): BuildManager (180-268)
- management/internals/shared/grpc/server.go (1): NewServer (84-147)

client/cmd/testutil_test.go (3)
- management/server/job/manager.go (1): NewJobManager (32-42)
- management/server/account.go (1): BuildManager (180-268)
- management/internals/shared/grpc/server.go (1): NewServer (84-147)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: release_ui
- GitHub Check: release_ui_darwin
- GitHub Check: release
- GitHub Check: iOS / Build
- GitHub Check: Android / Build
- GitHub Check: JS / Lint
- GitHub Check: Windows
- GitHub Check: Linux
- GitHub Check: Darwin
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
- GitHub Check: Build Cache
- GitHub Check: Check External GPL/AGPL Licenses
- GitHub Check: Client / Unit
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- client/internal/connect.go (7 hunks)
- client/internal/engine.go (13 hunks)
- client/internal/engine_test.go (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
client/internal/engine_test.go (5)
- client/internal/engine.go (2): NewEngine (235-276), EngineConfig (83-142)
- shared/management/client/mock.go (1): MockClient (13-24)
- management/server/job/manager.go (1): NewJobManager (32-42)
- management/server/account.go (1): BuildManager (180-268)
- management/internals/shared/grpc/server.go (1): NewServer (84-147)

client/internal/engine.go (7)
- client/internal/profilemanager/config.go (1): Config (89-160)
- client/jobexec/executor.go (3): Executor (23-24), NewExecutor (26-28), ErrJobNotImplemented (20)
- shared/management/client/client.go (1): Client (14-27)
- shared/management/proto/management.pb.go (19): JobRequest (388-398, 413, 428-430), JobResponse (463-475, 490, 505-507), JobStatus_failed (30), JobRequest_Bundle (457-459, 461), JobStatus_succeeded (29), BundleParameters (554-563, 578, 593-595), SyncResponse (721-738, 753, 768-770), BundleResult (625-631, 646, 661-663)
- client/internal/state.go (1): CtxGetState (31-33)
- client/internal/debug/debug.go (2): GeneratorDependencies (238-243), BundleConfig (232-236)
- client/server/server.go (1): New (96-107)

client/internal/connect.go (3)
- client/internal/engine.go (2): NewEngine (235-276), EngineConfig (83-142)
- client/internal/profilemanager/config.go (1): Config (89-160)
- shared/management/proto/management.pb.go (3): PeerConfig (2082-2098, 2113, 2128-2130)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
- GitHub Check: Relay / Unit (amd64, -race)
- GitHub Check: Management / Unit (amd64, mysql)
- GitHub Check: Management / Unit (amd64, postgres)
- GitHub Check: Management / Unit (amd64, sqlite)
- GitHub Check: Relay / Unit (386)
- GitHub Check: Management / Integration (amd64, postgres)
- GitHub Check: Management / Benchmark (API) (amd64, postgres)
- GitHub Check: Client (Docker) / Unit
- GitHub Check: Management / Benchmark (API) (amd64, sqlite)
- GitHub Check: Client / Unit (amd64)
- GitHub Check: Management / Benchmark (amd64, sqlite)
- GitHub Check: Management / Benchmark (amd64, postgres)
- GitHub Check: Client / Unit (386)
- GitHub Check: Client / Unit
- GitHub Check: release
- GitHub Check: release_ui_darwin
- GitHub Check: Android / Build
- GitHub Check: Linux
- GitHub Check: Client / Unit
- GitHub Check: Darwin
- GitHub Check: Windows
- GitHub Check: JS / Lint
- GitHub Check: Client / Unit
🔇 Additional comments (16)
client/internal/connect.go (5)
65-66: LGTM! logPath parameter threading is clean.

The public Run signature now accepts logPath and forwards it correctly. Mobile variants appropriately default to an empty string, which is sensible given platform-specific logging constraints.
Also applies to: 87-87, 105-105
108-108: LGTM! Internal run signature correctly updated to receive logPath.
251-251: LGTM! Call to createEngineConfig correctly passes the logPath parameter.
275-275: LGTM! NewEngine call correctly updated with the profile config parameter, enabling debug bundle generation support.
414-414: LGTM! EngineConfig enriched for debug bundle support.

The function signature and field assignments correctly thread logPath and ProfileConfig through to EngineConfig. Note that the entire profilemanager.Config is stored in ProfileConfig; this appears intentional for debug bundle generation, which may need access to ManagementURL and other config details.
Also applies to: 449-452
client/internal/engine_test.go (3)
28-28: LGTM! Import added to support JobManager usage in test setup.
256-256: LGTM! Test updates are consistent.

All NewEngine calls are correctly updated with two additional nil parameters (checks, profileConfig). Using nil is appropriate for tests that don't exercise debug bundle or checks functionality.
Also applies to: 418-424, 637-643, 802-808, 1004-1010, 1536-1536
1595-1595: LGTM! Test infrastructure correctly wired.

JobManager created and threaded through BuildManager and NewServer, enabling job subsystem testing. Using nil for metrics is appropriate in a test context.
Also applies to: 1626-1626, 1632-1632
client/internal/engine.go (8)
34-34: LGTM! Imports added to support debug bundle generation and job execution.
Also applies to: 52-52
138-141: LGTM! EngineConfig fields added to support debug bundle generation. A clear comment documents the purpose.
206-207: LGTM! Proper concurrency primitives added.

syncRespMux provides dedicated locking for sync response persistence, separate from syncMsgMux. jobExecutorWG ensures clean shutdown of job-related goroutines.
Also applies to: 220-221
234-234: LGTM!NewEngine signature correctly extended with profile config parameter. jobExecutor initialized during construction for handling management-service job requests.
Also applies to: 254-254
335-335: LGTM! Proper shutdown ordering.jobExecutorWG.Wait() ensures job-related goroutines complete before resource cleanup. The preceding e.cancel() will signal job goroutines to exit via context cancellation.
523-523: LGTM! receiveJobEvents follows established patterns.The function correctly:
- Increments jobExecutorWG before spawning goroutine
- Handles job requests synchronously via callback
- Triggers client restart on Job stream errors (consistent with Management/Signal event handlers)
- Returns proper error codes for unimplemented job types
Also applies to: 942-977
803-814: LGTM! Proper lock separation for sync response persistence. The code correctly uses syncRespMux (not syncMsgMux) to protect sync response persistence state: RLock for reading the flag, then a separate write lock for storing the response (no lock upgrade is attempted while holding the RLock).
1874-1886: LGTM! Thread-safe sync response persistence. Both functions correctly use syncRespMux for protection. GetLatestSyncResponse properly:
- Reads state under RLock
- Releases lock before expensive clone operation
- Returns a clone to prevent external mutation
Also applies to: 1889-1911
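To make the read-then-clone flow concrete, here is a minimal sketch; the field and method names mirror the review text above, and the exact types used in the PR are assumptions:

func (e *Engine) GetLatestSyncResponse() (*mgmProto.SyncResponse, error) {
	e.syncRespMux.RLock()
	resp := e.latestSyncResponse
	e.syncRespMux.RUnlock() // release before the potentially expensive clone

	if resp == nil {
		return nil, fmt.Errorf("no sync response stored yet")
	}
	// Cloning prevents callers from mutating the stored message.
	return proto.Clone(resp).(*mgmProto.SyncResponse), nil
}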
// GetJobByID fetches job by ID
func (s *SqlStore) GetPeerJobByID(ctx context.Context, accountID, jobID string) (*types.Job, error) {
This method should also verify the peer this is coming from
func (s *Server) startResponseReceiver(ctx context.Context, srv proto.ManagementService_JobServer) {
	go func() {
		for {
Ideally we should also check if the ctx is done here or Err() != nil
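For illustration, a minimal sketch of that check at the top of the receiver loop (the surrounding code is paraphrased from the snippet above):

go func() {
	for {
		// Stop promptly when the stream context is cancelled instead of
		// waiting for the next Recv call to surface the error.
		if err := ctx.Err(); err != nil {
			log.WithContext(ctx).Debugf("job response receiver stopping: %v", err)
			return
		}
		// ... srv.Recv() and response handling as in the PR ...
	}
}()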
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
client/internal/engine.go (1)
1295-1326: Fix compile error: zone.GetSearchDomainDisabled is not defined on mgmProto.CustomZone
toDNSConfig currently does:

dnsZone := nbdns.CustomZone{
	Domain:               zone.GetDomain(),
	SearchDomainDisabled: zone.GetSearchDomainDisabled(),
	SkipPTRProcess:       zone.GetSkipPTRProcess(),
}

The pipeline error indicates GetSearchDomainDisabled does not exist on *mgmProto.CustomZone, so this code does not compile. You need to either:
- Add a corresponding search_domain_disabled (or similarly named) field to the CustomZone message in shared/management/proto/management.proto and regenerate the Go bindings, or
- Drop this mapping for now and rely on the default value in nbdns.CustomZone.

If you just want the current code to build without changing the proto, adjust as:

- dnsZone := nbdns.CustomZone{
- 	Domain:               zone.GetDomain(),
- 	SearchDomainDisabled: zone.GetSearchDomainDisabled(),
- 	SkipPTRProcess:       zone.GetSkipPTRProcess(),
- }
+ dnsZone := nbdns.CustomZone{
+ 	Domain:         zone.GetDomain(),
+ 	SkipPTRProcess: zone.GetSkipPTRProcess(),
+ }

and reintroduce the mapping once the proto field exists.
♻️ Duplicate comments (1)
client/internal/engine.go (1)
958-993: Nil pointer risk when using ProfileConfig.ManagementURL in bundle job handler
handleBundle calls:

uploadKey, err := e.jobExecutor.BundleJob(
	e.ctx,
	bundleDeps,
	bundleJobParams,
	waitFor,
	e.config.ProfileConfig.ManagementURL.String(),
)

without checking whether e.config.ProfileConfig or ProfileConfig.ManagementURL are nil. Either being nil will panic when a bundle job arrives. Add a defensive check before constructing bundleDeps and the URL, and return a failed job response if the configuration is missing. For example:

func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobResponse_Bundle, error) {
@@
-	bundleDeps := debug.GeneratorDependencies{
-		InternalConfig: e.config.ProfileConfig,
+	if e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
+		return nil, errors.New("profile config or management URL not available for bundle generation")
+	}
+
+	bundleDeps := debug.GeneratorDependencies{
+		InternalConfig: e.config.ProfileConfig,
@@
-	waitFor := time.Duration(params.BundleForTime) * time.Minute
-
-	uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
+	waitFor := time.Duration(params.BundleForTime) * time.Minute
+	managementURL := e.config.ProfileConfig.ManagementURL.String()
+
+	uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, managementURL)

This ensures the job handler fails gracefully instead of panicking the client.
Also applies to: 995-1028
🧹 Nitpick comments (7)
shared/management/client/client_test.go (1)
76-76: Consider passing metrics to JobManager instead of nil.
The JobManager is created with nil for the metrics parameter, but a valid metrics object is created just below on lines 100-101. Consider reordering the initialization to create metrics first, then pass it to NewJobManager, unless nil metrics is intentionally acceptable for test scenarios. Apply this diff to use the actual metrics object:

+ metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
+ require.NoError(t, err)
+
- jobManager := job.NewJobManager(nil, store)
+ jobManager := job.NewJobManager(metrics, store)
  eventStore := &activity.InMemoryEventStore{}
  ctrl := gomock.NewController(t)
  t.Cleanup(ctrl.Finish)
  permissionsManagerMock := permissions.NewMockManager(ctrl)
  permissionsManagerMock.
  	EXPECT().
  	ValidateUserPermissions(gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any()).
  	Return(true, nil).
  	AnyTimes()
  peersManger := peers.NewManager(store, permissionsManagerMock)
  settingsManagerMock := settings.NewMockManager(ctrl)
  ia, _ := integrations.NewIntegratedValidator(context.Background(), peersManger, settingsManagerMock, eventStore)
- metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
- require.NoError(t, err)

shared/management/proto/management.proto (1)
91-96: Consider adding documentation for BundleParameters fields.
The field names could benefit from proto comments for clarity:
- bundle_for is ambiguous—it's unclear what this flag enables without context from the HTTP API docs
- bundle_for_time should document its unit (minutes, based on the HTTP API)

message BundleParameters {
-	bool bundle_for = 1;
+	// Whether to generate a bundle for a specific timeframe.
+	bool bundle_for = 1;
+	// Time period in minutes for which to generate the bundle.
+	int64 bundle_for_time = 2;
	int32 log_file_count = 3;
+	// Whether sensitive data should be anonymized in the bundle.
	bool anonymize = 4;
}

management/server/account/manager.go (1)
126-128: Normalize parameter ordering for peer job Manager methods
CreatePeerJob takes (accountID, peerID, userID string, ...), while GetAllPeerJobs and GetPeerJobByID use (accountID, userID, peerID string, ...). This asymmetry is easy to trip over in implementations and mocks. Consider standardizing on one ordering (e.g. ctx, accountID, userID, peerID, ...) for all three methods and updating implementations accordingly, as sketched below.
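A minimal sketch of the normalized ordering, assuming the method set named above (the request and return types are illustrative placeholders, not the PR's actual signatures):

// Hypothetical normalized interface; types are illustrative only.
type PeerJobManager interface {
	CreatePeerJob(ctx context.Context, accountID, userID, peerID string, req *api.JobRequest) (*types.Job, error)
	GetAllPeerJobs(ctx context.Context, accountID, userID, peerID string) ([]*types.Job, error)
	GetPeerJobByID(ctx context.Context, accountID, userID, peerID, jobID string) (*types.Job, error)
}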
ctx, accountID, userID, peerID, ...) for all three methods and updating implementations accordingly.client/internal/engine_test.go (1)
239-257: Consider centralizing NewEngine test construction to avoid duplicated nil, nil arguments
All the NewEngine(...) calls now end with ..., peer.NewRecorder("https://mgm"), nil, nil). This is correct for the current signature, but the repeated trailing nil, nil across multiple tests makes future signature changes easy to miss. Consider a small helper (e.g. newTestEngine(t, ctx, cfg, mgmtClientOpts...)) that sets up the common arguments (including the last two) so only that helper needs updating if NewEngine evolves again.
Also applies to: 417-425, 637-644, 801-809, 1003-1011, 1536-1537
management/internals/shared/grpc/server.go (2)
55-77: Consider guarding against a nil jobManager in server construction
Server now depends on jobManager (CreateJobChannel, CloseChannel, HandleResponse), but NewServer accepts a *job.Manager without validating it. If any caller passes nil, the Job RPC will panic. Consider either:
- Requiring non-nil and returning an error from NewServer if jobManager == nil, or
- Making the Job handler explicitly reject calls when jobManager is nil with a gRPC Internal/Unavailable error.
This keeps failures explicit instead of panicking; a sketch of the first option follows below.
Also applies to: 79-132
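A sketch of the fail-fast option, with the constructor's other dependencies elided (the real NewServer takes many more parameters):

// Hypothetical trimmed-down constructor illustrating the guard.
func NewServer(jobManager *job.Manager) (*Server, error) {
	if jobManager == nil {
		return nil, errors.New("grpc server: job manager must not be nil")
	}
	return &Server{jobManager: jobManager}, nil
}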
175-210: Job streaming flow is coherent; error handling strategy is acceptable but could be tightened
The Job RPC handshake (decrypt first message, resolve account/peer, set context keys), background response receiver (startResponseReceiver), and sendJobsLoop/sendJob pipeline all look logically consistent and reuse existing encryption and context patterns from Sync. The goroutine cleanly exits on io.EOF/context.Canceled, and the job channel is closed via defer CloseChannel. The only notable tradeoff is that sendJobsLoop logs and returns nil on sendJob failures, so the client sees a successful stream close rather than an error status; you already left a TODO to refine this. Given that, the current implementation is reasonable for an initial version.
Also applies to: 327-390, 450-473
client/internal/engine.go (1)
235-255: Unused c *profilemanager.Config parameter in NewEngine
NewEngine accepts a c *profilemanager.Config argument but doesn't use it; all configuration is read from config *EngineConfig instead. Unless future work will use c, consider removing this parameter to avoid confusion or, alternatively, wiring c into config.ProfileConfig here.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
client/proto/daemon.pb.go is excluded by !**/*.pb.go
go.sum is excluded by !**/*.sum
📒 Files selected for processing (29)
client/cmd/testutil_test.go (4 hunks)
client/internal/engine.go (13 hunks)
client/internal/engine_test.go (10 hunks)
client/proto/daemon.proto (0 hunks)
client/server/server.go (4 hunks)
client/server/server_test.go (4 hunks)
go.mod (2 hunks)
management/internals/server/boot.go (1 hunks)
management/internals/server/controllers.go (3 hunks)
management/internals/server/modules.go (1 hunks)
management/internals/shared/grpc/server.go (10 hunks)
management/server/account.go (4 hunks)
management/server/account/manager.go (1 hunks)
management/server/account_test.go (2 hunks)
management/server/activity/codes.go (2 hunks)
management/server/dns_test.go (2 hunks)
management/server/http/handlers/peers/peers_handler.go (3 hunks)
management/server/http/testing/testing_tools/channel/channel.go (3 hunks)
management/server/management_proto_test.go (4 hunks)
management/server/management_test.go (3 hunks)
management/server/mock_server/account_mock.go (2 hunks)
management/server/nameserver_test.go (2 hunks)
management/server/peer.go (1 hunks)
management/server/peer_test.go (5 hunks)
management/server/route_test.go (2 hunks)
shared/management/client/client_test.go (4 hunks)
shared/management/http/api/openapi.yml (2 hunks)
shared/management/http/api/types.gen.go (8 hunks)
shared/management/proto/management.proto (2 hunks)
💤 Files with no reviewable changes (1)
- client/proto/daemon.proto
🚧 Files skipped from review as they are similar to previous changes (12)
- management/server/account_test.go
- management/server/dns_test.go
- management/server/account.go
- management/internals/server/modules.go
- management/server/route_test.go
- management/server/http/testing/testing_tools/channel/channel.go
- management/server/peer_test.go
- client/cmd/testutil_test.go
- management/server/activity/codes.go
- management/server/management_test.go
- management/server/peer.go
- management/server/nameserver_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.
Applied to files:
shared/management/client/client_test.go
🧬 Code graph analysis (10)
management/server/account/manager.go (1)
management/server/types/job.go (1)
Job(34-58)
client/server/server_test.go (3)
management/server/job/manager.go (1)
NewJobManager(32-42)
management/server/account.go (1)
BuildManager(178-266)
management/internals/shared/grpc/server.go (1)
NewServer(80-132)
management/internals/server/boot.go (1)
management/internals/shared/grpc/server.go (1)
NewServer(80-132)
management/server/mock_server/account_mock.go (1)
management/server/types/job.go (1)
Job(34-58)
management/server/management_proto_test.go (3)
management/server/job/manager.go (1)
NewJobManager(32-42)
management/server/account.go (1)
BuildManager(178-266)
management/internals/shared/grpc/server.go (1)
NewServer(80-132)
shared/management/client/client_test.go (3)
management/server/job/manager.go (1)
NewJobManager(32-42)
management/server/account.go (1)
BuildManager(178-266)
management/internals/shared/grpc/server.go (1)
NewServer(80-132)
client/server/server.go (1)
client/status/status.go (1)
ToProtoFullStatus(549-635)
management/internals/server/controllers.go (3)
management/internals/server/server.go (1)
BaseServer(45-68)
management/server/job/manager.go (2)
Manager(23-30), NewJobManager(32-42)
management/internals/server/container.go (1)
Create(6-10)
management/server/http/handlers/peers/peers_handler.go (4)
management/server/context/auth.go (1)
GetUserAuthFromContext(25-30)
shared/management/http/util/util.go (3)
WriteError(84-120), WriteErrorResponse(70-80), WriteJSONObject(27-35)
shared/management/http/api/types.gen.go (3)
JobRequest(708-710), JobResponse(713-721), JobResponseStatus(724-724)
management/server/types/job.go (3)
NewJob(67-103)Job(34-58)Workload(60-64)
client/internal/engine_test.go (3)
client/internal/engine.go (2)
NewEngine(235-276), EngineConfig(83-142)
management/server/job/manager.go (1)
NewJobManager(32-42)
management/server/account.go (1)
BuildManager(178-266)
🪛 GitHub Actions: Linux
client/internal/engine.go
[error] 1312-1312: zone.GetSearchDomainDisabled undefined (type *"github.com/netbirdio/netbird/shared/management/proto".CustomZone has no field or method GetSearchDomainDisabled)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Android / Build
- GitHub Check: release_ui_darwin
- GitHub Check: release
- GitHub Check: release_ui
- GitHub Check: JS / Lint
- GitHub Check: iOS / Build
- GitHub Check: Linux
- GitHub Check: Darwin
- GitHub Check: Windows
- GitHub Check: Client / Unit
- GitHub Check: Check External GPL/AGPL Licenses
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
🔇 Additional comments (21)
shared/management/client/client_test.go (2)
27-27: LGTM! The import addition is necessary to support JobManager instantiation in the test setup.
123-123: LGTM! The BuildManager call correctly includes the jobManager parameter in the expected position (5th parameter).
management/internals/server/controllers.go (1)
29-33: LGTM! The JobManager() accessor correctly follows the established lazy initialization pattern using the Create helper, consistent with other accessors like PeersUpdateManager() and IntegratedValidator().
shared/management/proto/management.proto (2)
51-53: LGTM! The bidirectional streaming RPC correctly follows the encrypted message pattern used by other management service endpoints, maintaining consistency in the API design.
75-89: Potential inconsistency between proto and HTTP API status values.
The JobStatus enum here defines unknown_status, succeeded, and failed, but the HTTP API types (JobResponseStatus in types.gen.go) include an additional pending status. If jobs can be in a pending state before completion, consider adding it to the proto enum for consistency:

enum JobStatus {
	unknown_status = 0; // placeholder
-	succeeded = 1;
-	failed = 2;
+	pending = 1;
+	succeeded = 2;
+	failed = 3;
}

Also, Reason on line 84 uses PascalCase while proto convention typically uses snake_case (reason).
shared/management/http/api/types.gen.go (1)
1-4: This is auto-generated code. This file is generated by oapi-codegen. Any issues with the types or structure should be addressed in the source OpenAPI specification rather than this file.
management/internals/server/boot.go (1)
148-148: LGTM! The JobManager is correctly wired into the gRPC server construction, following the established dependency injection pattern. The parameter position matches the NewServer function signature in management/internals/shared/grpc/server.go.
management/server/management_proto_test.go (2)
343-343: LGTM! The JobManager instantiation with nil metrics is acceptable for test purposes—the implementation handles nil metrics gracefully.
372-373: LGTM! The test correctly wires jobManager into both BuildManager and nbgrpc.NewServer, matching the updated function signatures and maintaining consistency with the production code in boot.go.
Also applies to: 386-386
client/server/server.go (2)
31-33: Status conversion now correctly uses shared helper
Switching to nbstatus.ToProtoFullStatus(fullStatus) centralizes the in-memory→proto mapping and keeps this server aligned with the status package; the subsequent mutation of Events and SshServerState still fits the previous behaviour.
Also applies to: 1080-1088
1537-1545: ConnectClient.Run wiring matches new signature
Passing s.logFile into s.connectClient.Run(runningChan, s.logFile) correctly adapts to the new Run signature and ensures the engine has access to the daemon's log path without altering existing retry semantics.
35-38: Jobs tag fits existing API categorisationAdding the
Jobstag (markedx-experimental) matches the tagging style of other feature areas and clearly scopes the new endpoints.
41-156: Workload and Job schemas are consistent and extensibleThe
WorkloadType/Bundle*schemas plusWorkloadRequest/ResponseandJobRequest/JobResponseare structured cleanly:
- Discriminator on
typewithbundlemapping is valid and forward‑extensible.JobResponserequired fields avoid nullable properties (completed_at,failed_reason) while guaranteeingid,status,triggered_by, andworkload.- Bundle parameter/result fields match the described bundle use case.
No structural issues from an OpenAPI standpoint.
2353-2456: Peer Job endpoints integrate cleanly into peers API
/api/peers/{peerId}/jobs(GET/POST) and/api/peers/{peerId}/jobs/{jobId}(GET):
- Use the new
JobRequest/JobResponseschemas correctly.- Reuse
BearerAuthandTokenAuthsecurity consistent with other peer operations.- Return an array for list, 201 with a single
JobResponsefor create, and 200 for get.The shape and placement of these endpoints look correct and consistent with the rest of the spec.
client/server/server_test.go (1)
23-24: JobManager wiring in management test server matches production setupCreating a
jobManager := job.NewJobManager(nil, store)and threading it intoserver.BuildManager(...)andnbgrpc.NewServer(...)keeps the test server aligned with the new job‑enabled construction path without altering existing behaviour of this test.Also applies to: 298-332
client/internal/engine_test.go (1)
27-29: Engine tests correctly integrate JobManager into management server startupImporting
management/server/job, constructingjobManager := job.NewJobManager(nil, store), and passing it through toserver.BuildManager(...)andnbgrpc.NewServer(...)keeps this test’s management server in sync with the new job‑aware construction path and should let engine tests run against the same wiring as production.Also applies to: 1595-1636
management/server/http/handlers/peers/peers_handler.go (2)
31-41: Peer job HTTP endpoints and handlers look consistent with existing patternsThe new
/peers/{peerId}/jobsand/peers/{peerId}/jobs/{jobId}routes plusCreateJob,ListJobs, andGetJobfollow the existing handler conventions: user auth from context,mux.Varsfor IDs, JSON decoding with proper 400 on parse failure, and error propagation viautil.WriteError. Delegation toaccountManagerfor authorization and data access is appropriate. I don’t see correctness or security issues in this flow.Also applies to: 51-142
594-614: Job→API response mapping is correct and defensive
toSingleJobResponsebuildsapi.JobResponsefromtypes.Jobcleanly, including building the workload viaBuildWorkloadResponse, handling optionalFailedReason, and preserving timestamps and status. This matches the shared API types and is safe as long asBuildWorkloadResponsereturns a non-nil workload whenerr == nil, which it’s already checked for.management/server/mock_server/account_mock.go (1)
132-155: Job-related mock hooks are wired correctlyThe new
CreatePeerJobFunc,GetAllPeerJobsFunc, andGetPeerJobByIDFuncfields and their methods mirror the existing mock style (delegate if set, otherwise returncodes.Unimplemented). This should integrate cleanly with the expandedaccount.Managerinterface and make job behavior easily configurable in tests.client/internal/engine.go (2)
205-212: Sync response persistence refactor withsyncRespMuxlooks correctThe move from using
syncMsgMuxto a dedicatedsyncRespMuxforpersistSyncResponse/latestSyncResponseremoves the race between sync handling and debug bundle reads:
handleSyncnow checkspersistSyncResponseunderRLockand conditionally writeslatestSyncResponseunderLock.SetSyncResponsePersistenceandGetLatestSyncResponseboth operate entirely undersyncRespMux, withGetLatestSyncResponsecloning before returning.The lock usage avoids deadlocks (no upgrade while holding
RLock) and keeps sync message processing serialized viasyncMsgMuxas before.Also applies to: 770-841, 1891-1929
958-993: Job event client wiring matches existing management/signal patterns
receiveJobEventsmirrors the existingreceiveManagementEvents/receiveSignalEventspatterns: it uses a dedicated wait group, runsmgmClient.Jobwith the engine context, maps terminal errors to a global client restart (ErrResetConnection+clientCancel), and defaults job status to failed unless the handler succeeds. That’s a good, conservative default and makes job-stream failures consistent with other control-plane failures.
Actionable comments posted: 0
🧹 Nitpick comments (1)
shared/management/client/client_test.go (1)
76-135: Construct jobManager with real metrics instead of nil to match production wiring.
Right now the job manager is created with a nil metrics handle even though metrics is initialized later via telemetry.NewDefaultAppMetrics. To avoid potential nil dereferences inside the job manager and to keep the test server wiring closer to production, it's better to build the job manager with the real metrics instance after it's created. The rest of the wiring (NewController, BuildManager, nbgrpc.NewServer) matches the signatures shown in the referenced snippets. Based on relevant_code_snippets from management/server/job/manager.go, management/server/account.go, and management/internals/shared/grpc/server.go.
You could adjust as follows:

- jobManager := job.NewJobManager(nil, store)
- eventStore := &activity.InMemoryEventStore{}
+ eventStore := &activity.InMemoryEventStore{}
@@
- metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
- require.NoError(t, err)
+ metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
+ require.NoError(t, err)
+ jobManager := job.NewJobManager(metrics, store)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
shared/management/proto/management.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (1)
shared/management/client/client_test.go (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.
Applied to files:
shared/management/client/client_test.go
🧬 Code graph analysis (1)
shared/management/client/client_test.go (4)
management/server/job/manager.go (1)
NewJobManager(32-42)
management/internals/controllers/network_map/controller/controller.go (1)
NewController(75-111)
management/internals/modules/peers/ephemeral/manager/ephemeral.go (1)
NewEphemeralManager(53-61)
management/server/account.go (1)
BuildManager(178-266)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: release_ui
- GitHub Check: release
- GitHub Check: release_ui_darwin
- GitHub Check: Build Cache
- GitHub Check: JS / Lint
- GitHub Check: Client / Unit
- GitHub Check: Darwin
- GitHub Check: Linux
- GitHub Check: Windows
- GitHub Check: iOS / Build
- GitHub Check: Android / Build
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
🔇 Additional comments (1)
shared/management/client/client_test.go (1)
20-28: New imports for ephemeral manager and job manager look correct and are properly used.
The ephemeral_manager alias and job package are both referenced later in startManagement (for NewEphemeralManager and NewJobManager respectively), so the import changes are consistent and compile-time safe.
Actionable comments posted: 7
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
management/server/job/manager.go (1 hunks)
management/server/management_test.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- management/server/management_test.go
🧰 Additional context used
🧬 Code graph analysis (1)
management/server/job/manager.go (3)
management/server/job/channel.go (2)
Channel(18-21), NewChannel(23-29)
management/server/store/store.go (1)
Store(50-211)
management/server/types/job.go (1)
Job(34-58)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
- GitHub Check: Relay / Unit (386)
- GitHub Check: Management / Benchmark (amd64, postgres)
- GitHub Check: Relay / Unit (amd64, -race)
- GitHub Check: Management / Integration (amd64, sqlite)
- GitHub Check: Management / Benchmark (API) (amd64, postgres)
- GitHub Check: Management / Unit (amd64, postgres)
- GitHub Check: Management / Integration (amd64, postgres)
- GitHub Check: Management / Unit (amd64, sqlite)
- GitHub Check: Signal / Unit (386)
- GitHub Check: Client / Unit (amd64)
- GitHub Check: Client / Unit (386)
- GitHub Check: Management / Unit (amd64, mysql)
- GitHub Check: Management / Benchmark (API) (amd64, sqlite)
- GitHub Check: Management / Benchmark (amd64, sqlite)
- GitHub Check: Client (Docker) / Unit
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
- GitHub Check: Windows
- GitHub Check: Darwin
- GitHub Check: Linux
- GitHub Check: Android / Build
- GitHub Check: Client / Unit
- GitHub Check: release_ui_darwin
- GitHub Check: release
- GitHub Check: release_ui
- GitHub Check: JS / Lint
🔇 Additional comments (5)
management/server/job/manager.go (5)
17-21: LGTM! The Event struct is a clean container for tracking job request/response pairs.
23-30: LGTM! The Manager struct is properly designed with appropriate fields for concurrent job management.
32-42: LGTM! The constructor properly initializes all Manager fields with sensible defaults.
153-159: LGTM! The method correctly checks peer connection status with appropriate read lock usage.
161-171: LGTM! The method correctly checks for pending jobs with appropriate read lock usage. The O(n) scan is acceptable for typical workloads.
Actionable comments posted: 2
♻️ Duplicate comments (2)
management/server/types/job.go (1)
120-138: Potential JSON shape mismatch between stored proto.BundleResult and api.BundleResult
ApplyResponse stores resp.WorkloadResults.(*proto.JobResponse_Bundle).Bundle by JSON-marshalling the proto struct, while buildBundleResponse later unmarshals that blob into api.BundleResult. If proto.BundleResult.UploadKey is still a wrapped type (e.g. a protobuf String message) rather than a plain string, the JSON will be { "upload_key": { "value": "..." } }, which will fail to unmarshal into api.BundleResult's UploadKey *string. Consider mapping the proto result into an api.BundleResult and marshalling that instead, or updating the proto field to a plain string, so the stored JSON shape matches the HTTP type.
To confirm the field type and JSON shape, please run:

#!/bin/bash
# Inspect BundleResult in the proto package and compare to HTTP BundleResult
rg -n "message BundleResult" shared/management/proto -n -C3 || true
rg -n "type BundleResult struct" shared/management/http/api -n -C5 || true

Also applies to: 164-201
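If the proto field does turn out to be wrapped, a minimal mapping sketch along the lines suggested above (the accessor and field names are assumptions until the verification script confirms them):

// Hypothetical: normalize the proto result into the HTTP type before storing,
// so the persisted JSON always matches api.BundleResult.
func bundleResultJSON(res *proto.BundleResult) ([]byte, error) {
	key := res.GetUploadKey() // assumed to yield the plain string value
	return json.Marshal(api.BundleResult{UploadKey: &key})
}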
management/server/job/manager.go (1)
24-32: Race: SendJob can panic sending on a channel closed by CloseChannel/CreateJobChannel, and old streams can close new channels
SendJob grabs a *Channel under RLock, releases the mutex, then calls ch.AddEvent, while CloseChannel and CreateJobChannel can concurrently call ch.Close() on the same channel under Lock. This leaves a window where AddEvent writes into a channel that has just been closed, which will panic with "send on closed channel." Additionally, because CloseChannel looks up the channel only by peerID, a previous Job stream's deferred CloseChannel can close and delete a new channel created for the same peer by a later Job stream. You should either (a) ensure channels are not closed while AddEvent may be sending (e.g. by coordinating via the same mutex or a per-channel state flag), or (b) make AddEvent robust to a concurrently closed channel (e.g. recovering from the panic and returning a typed ErrJobChannelClosed, which SendJob already routes into cleanup).
Also applies to: 34-45, 47-65, 68-91, 130-149, 151-162
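Option (b) sketched, assuming Channel wraps a Go channel of events (names follow the review text; the real struct layout may differ):

var ErrJobChannelClosed = errors.New("job channel closed")

// AddEvent survives a concurrent Close by recovering from the send panic
// and turning it into a typed error the caller can route into cleanup.
func (c *Channel) AddEvent(ev *Event) (err error) {
	defer func() {
		if recover() != nil {
			err = ErrJobChannelClosed
		}
	}()
	c.events <- ev
	return nil
}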
🧹 Nitpick comments (2)
management/internals/shared/grpc/server.go (1)
446-449: Minor: comment still refers to “update message” instead of “job message”
The comment above sendJob mentions “update message” and “sync server”, which is copy-pasted from the Sync path; updating it to refer to “job message”/“Job server” would avoid confusion.
management/server/store/sql_store.go (1)
109-118: Job persistence methods are scoped correctly; consider checking RowsAffected in CompletePeerJob
CreatePeerJob, MarkPendingJobsAsFailed, MarkAllPendingJobsAsFailed, GetPeerJobByID, and GetPeerJobs all use appropriate account/peer scoping and statuses, which should keep job rows well-bounded. In CompletePeerJob, however, a missing job ID currently yields a successful return even if no rows were updated; consider checking result.RowsAffected and returning a NotFound-style error when it is zero to avoid silently losing job completions.
Also applies to: 136-157, 159-193, 195-225
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
management/internals/modules/peers/manager.go (2 hunks)
management/internals/server/controllers.go (3 hunks)
management/internals/shared/grpc/server.go (10 hunks)
management/server/job/manager.go (1 hunks)
management/server/store/sql_store.go (4 hunks)
management/server/store/store.go (1 hunks)
management/server/types/job.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
management/internals/modules/peers/manager.go (1)
management/server/store/store.go (1)
LockingStrengthNone(47-47)
management/server/types/job.go (1)
shared/management/http/api/types.gen.go (7)
JobRequest(708-710), WorkloadResponse(1948-1950), BundleParameters(361-373), BundleResult(376-378), BundleWorkloadResponse(391-399), WorkloadTypeBundle(195-195), JobResponse(713-721)
management/server/job/manager.go (3)
management/server/job/channel.go (2)
Channel(18-21), NewChannel(23-29)
management/server/store/store.go (1)
Store(50-213)
management/internals/modules/peers/manager.go (1)
Manager(23-33)
management/server/store/store.go (1)
management/server/types/job.go (1)
Job(34-58)
management/server/store/sql_store.go (2)
management/server/types/job.go (3)
Job(34-58), JobStatusPending(18-18), JobStatusFailed(20-20)
management/server/store/store.go (2)
LockingStrength(40-40), LockingStrengthNone(47-47)
🔇 Additional comments (6)
management/server/types/job.go (1)
66-103: Job creation and bundle parameter validation look consistent
NewJob and validateAndBuildBundleParams correctly derive the workload type, enforce tight bounds on BundleForTime and LogFileCount, and persist a normalized JSON parameters blob; this should interoperate cleanly with buildStreamBundleResponse and the proto BundleParameters.
Also applies to: 140-162
management/internals/server/controllers.go (1)
29-33: JobManager wiring via BaseServer accessor looks correct
The JobManager() accessor follows the existing Create pattern and passes the expected dependencies (Metrics, Store, PeersManager) into job.NewJobManager, so the wiring is coherent with the rest of the server.
management/internals/modules/peers/manager.go (1)
23-33: GetPeerID helper cleanly delegates to store
Extending Manager with GetPeerID and delegating to store.GetPeerIDByKey with LockingStrengthNone is straightforward and keeps peer-ID resolution centralized in the store.
Also applies to: 165-167
management/server/job/manager.go (1)
93-128: HandleResponse flow (pending lookup, peer validation, DB completion) looks sound
Within the manager lock, HandleResponse now looks up the pending event by jobID, resolves the responding peer ID via peersManager.GetPeerID, enforces equality with event.PeerID, applies the proto response onto a types.Job, persists with CompletePeerJob, and only then deletes the pending entry; this fixes earlier concerns about peer spoofing and premature pending cleanup.
79-132: Job streaming wiring and handshake/response receiver are coherent
The Job handler's handshake, account/peer lookup, response receiver startup, and per-peer CreateJobChannel/sendJobsLoop integration follow the existing Sync patterns and correctly tie into job.Manager and the store; the response receiver also exits cleanly on EOF/context cancellation while validating and dispatching decrypted JobResponses.
Also applies to: 175-207, 324-337, 339-362, 364-386, 446-469
management/server/store/store.go (1)
40-48: Store interface extensions for jobs and GetPeerIDByKey look appropriate
Adding job lifecycle methods and GetPeerIDByKey to Store matches the SqlStore implementation and the new job/peers flows; just ensure any non-SQL Store implementations are updated accordingly to avoid build breakage.
If you have other Store implementations (e.g. a FileStore), please confirm they now implement these methods as well.
Storeimplementations (e.g. a FileStore), please confirm they now implement these methods as well.Also applies to: 206-213
Actionable comments posted: 1
♻️ Duplicate comments (1)
management/server/store/sql_store.go (1)
146-157: Address past review feedback: verify peer ownership in CompletePeerJob.
As noted in past reviews, this method should verify that the job being completed actually belongs to the peer making the request. Currently, it only checks the job ID without validating the peer_id, which could allow a peer to complete another peer's job.
Apply this diff to add peer verification:
func (s *SqlStore) CompletePeerJob(ctx context.Context, job *types.Job) error {
	result := s.db.
		Model(&types.Job{}).
-		Where(idQueryCondition, job.ID).
+		Where(accountAndIDQueryCondition+" AND id = ?", job.AccountID, job.PeerID, job.ID).
		Updates(job)
	if result.Error != nil {
		log.WithContext(ctx).Errorf("failed to update job in store: %s", result.Error)
		return status.Errorf(status.Internal, "failed to update job in store")
	}
+
+	if result.RowsAffected == 0 {
+		return status.Errorf(status.NotFound, "job %s not found or does not belong to peer %s", job.ID, job.PeerID)
+	}
+
	return nil
}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
management/server/store/sql_store.go (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
management/server/store/sql_store.go (2)
management/server/types/job.go (3)
Job(34-58), JobStatusPending(18-18), JobStatusFailed(20-20)
management/server/store/store.go (2)
LockingStrength(40-40), LockingStrengthNone(47-47)
🔇 Additional comments (3)
management/server/store/sql_store.go (3)
43-52: LGTM: Constants follow existing patterns.
The new accountAndPeerIDQueryCondition constant is consistent with the existing query condition constants and will be used for job filtering operations.
117-117: LGTM: Job entity added to auto-migration.
The addition of &types.Job{} to the AutoMigrate call ensures the jobs table schema is properly created and maintained.
4221-4239: LGTM: GetPeerIDByKey correctly uses the GetKeyQueryCondition helper.
The method properly uses GetKeyQueryCondition(s) at line 4230 to handle MySQL's reserved key identifier, addressing the concern raised in past reviews about MySQL compatibility.
Actionable comments posted: 0
♻️ Duplicate comments (1)
client/jobexec/executor.go (1)
63-63: Fix grammatical error in log message.
The message "generated well" is grammatically incorrect and misleading, since at this point both generation and upload have succeeded.
- log.Infof("debug bundle has been generated well")
+ log.Infof("debug bundle has been uploaded successfully")
🧹 Nitpick comments (1)
client/jobexec/executor.go (1)
51-61: Consider retry-friendly cleanup strategy on upload failure.
Currently, the deferred os.Remove deletes the bundle file regardless of upload outcome. If the upload fails, the generated bundle is lost, requiring full regeneration for any retry attempt.
Consider keeping the file on upload failure to enable retry, or implementing retry logic within this method before cleanup.

-	defer func() {
-		if err := os.Remove(path); err != nil {
-			log.Errorf("failed to remove debug bundle file: %v", err)
-		}
-	}()
-
	key, err := debug.UploadDebugBundle(ctx, types.DefaultBundleURL, mgmURL, path)
	if err != nil {
		log.Errorf("failed to upload debug bundle: %v", err)
		return "", fmt.Errorf("upload debug bundle: %w", err)
	}
+
+	if err := os.Remove(path); err != nil {
+		log.Errorf("failed to remove debug bundle file: %v", err)
+	}

Alternatively, implement retry logic before cleanup if transient upload failures are expected.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
client/jobexec/executor.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-13T00:29:53.247Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/cmd/ssh_exec_unix.go:53-74
Timestamp: 2025-11-13T00:29:53.247Z
Learning: In client/ssh/server/executor_unix.go, the method ExecuteWithPrivilegeDrop(ctx context.Context, config ExecutorConfig) has a void return type (no error return). It handles failures by exiting the process directly with appropriate exit codes rather than returning errors to the caller.
Applied to files:
client/jobexec/executor.go
📚 Learning: 2025-11-14T13:05:31.729Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/ssh/server/userswitching_windows.go:89-139
Timestamp: 2025-11-14T13:05:31.729Z
Learning: In client/ssh/server/executor_windows.go, the WindowsExecutorConfig struct's Pty, PtyWidth, and PtyHeight fields are intentionally left unused for now and will be implemented in a future update.
Applied to files:
client/jobexec/executor.go
🧬 Code graph analysis (1)
client/jobexec/executor.go (3)
client/internal/debug/debug.go (3)
GeneratorDependencies(238-243), BundleConfig(232-236), NewBundleGenerator(245-264)
client/internal/debug/upload.go (1)
UploadDebugBundle(17-28)
upload-server/types/upload.go (1)
DefaultBundleURL(11-11)
🔇 Additional comments (5)
client/jobexec/executor.go (5)
1-14: LGTM! Package declaration and imports are correctly structured. All imports are utilized within the file.
16-18: LGTM! The 60-minute cap on bundle wait time is a reasonable safeguard against excessively long waits.
24-29: LGTM! Empty struct with constructor is an acceptable pattern for future extensibility.
67-76: LGTM! The waitFor implementation correctly handles context cancellation and returns the appropriate error, addressing prior review feedback. The logging provides good observability.
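For reference, a context-aware wait along the lines described above (a sketch only; the function's actual signature and log wording in the PR are assumptions):

func waitFor(ctx context.Context, d time.Duration) error {
	log.Infof("waiting %s before generating the debug bundle", d)
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		// Surface cancellation so the caller can fail the job cleanly.
		return ctx.Err()
	}
}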
20-22: Remove this comment — ErrJobNotImplemented is actively used.
The error is referenced in client/internal/engine.go:979, where it's accessed via the .Error() method. It's appropriately exported for use across packages.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
client/internal/engine.go (1)
235-255: Remove unused c *profilemanager.Config parameter in NewEngine
NewEngine's last parameter c *profilemanager.Config is never used in the function body. This is dead API surface and (depending on tooling) can trigger compile or lint failures and unnecessary churn at call sites. Consider dropping this parameter from the signature and corresponding call sites, since EngineConfig.ProfileConfig already carries the same information.

-func NewEngine(..., config *EngineConfig, mobileDep MobileDependency, statusRecorder *peer.Status, checks []*mgmProto.Checks, c *profilemanager.Config) *Engine {
+func NewEngine(..., config *EngineConfig, mobileDep MobileDependency, statusRecorder *peer.Status, checks []*mgmProto.Checks) *Engine {

…and update call sites accordingly.
EngineConfig.ProfileConfigalready carries the same information.-func NewEngine(..., config *EngineConfig, mobileDep MobileDependency, statusRecorder *peer.Status, checks []*mgmProto.Checks, c *profilemanager.Config) *Engine { +func NewEngine(..., config *EngineConfig, mobileDep MobileDependency, statusRecorder *peer.Status, checks []*mgmProto.Checks) *Engine {…and update call sites accordingly.
♻️ Duplicate comments (4)
management/server/peer.go (3)
348-358: Potential nil-pointer on am.jobManager.
The code calls am.jobManager.IsPeerConnected(peerID) and am.jobManager.IsPeerHasPendingJobs(peerID) without checking if am.jobManager is non-nil. If the JobManager is ever nil (e.g., in certain test configurations or misconfigured environments), this will panic.
Consider adding a defensive check at the start of the function:

+	if am.jobManager == nil {
+		return status.Errorf(status.Internal, "job manager is not configured")
+	}
+
	if !am.jobManager.IsPeerConnected(peerID) {

Additionally, the TODO comment on lines 353-355 correctly identifies a race condition where concurrent calls could overwrite pending job entries. This should be addressed in a follow-up to ensure job queue integrity.
Based on past review comments, this issue was previously flagged but remains unaddressed.
365-398: Send-before-persist ordering can cause inconsistency between runtime and persistent state.
The current implementation sends the job to the peer (line 366) before persisting it to the database (line 379). If the database write fails after the job has been dispatched, the job will execute on the client but will be absent from persistent storage and from /jobs API listings.
This ordering can lead to:
- Jobs executing on peers with no database record
- Inability to track or query these jobs via the API
- Potential audit and compliance issues
Consider reversing the order:
- Persist the job first with status = pending
- Send the job to the peer
- If sending fails, update the job status to failed with an appropriate FailedReason
427-451: Missing job ownership validation - authorization bypass.After fetching the job by
jobID(line 445), the function does not verify thatjob.PeerIDmatches the requestedpeerID. This allows a caller to access any job within the account by manipulating the URL path:Example:
GET /peers/{peerA}/jobs/{jobOfPeerB}will returnjobOfPeerBeven though the path specifiespeerA.Add an ownership check after retrieving the job:
job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID) if err != nil { return nil, err } + + if job.PeerID != peerID { + return nil, status.NewPeerNotPartOfAccountError() + } return job, nilBased on past review comments, this issue was previously flagged but remains unaddressed.
client/internal/engine.go (1)
987-1010: Guard against nil ProfileConfig and ManagementURL in handleBundle
handleBundle dereferences e.config.ProfileConfig and e.config.ProfileConfig.ManagementURL without checking for nil:

InternalConfig: e.config.ProfileConfig
e.config.ProfileConfig.ManagementURL.String()

If ProfileConfig or its ManagementURL is nil (e.g. in tests or alternative EngineConfig construction), this will panic during remote debug bundle generation. A previous review already flagged this; it's still present.

func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobResponse_Bundle, error) {
@@
-	bundleDeps := debug.GeneratorDependencies{
-		InternalConfig: e.config.ProfileConfig,
+	if e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
+		return nil, fmt.Errorf("profile config or management URL not available for bundle generation")
+	}
+
+	bundleDeps := debug.GeneratorDependencies{
+		InternalConfig: e.config.ProfileConfig,
@@
-	waitFor := time.Duration(params.BundleForTime) * time.Minute
-
-	uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
+	waitFor := time.Duration(params.BundleForTime) * time.Minute
+
+	managementURL := e.config.ProfileConfig.ManagementURL.String()
+	uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, managementURL)

This keeps the job API unchanged while ensuring the engine can't panic on missing profile data.
🧹 Nitpick comments (3)
client/server/server.go (1)
31-33: Using nbstatus.ToProtoFullStatus simplifies and centralizes status conversion
Replacing the local toProtoFullStatus helper with nbstatus.ToProtoFullStatus(fullStatus) is a good consolidation. The returned pbFullStatus is still enriched locally with events and SSH server state before being attached to the response, so behavior remains consistent while reducing duplication.
Also applies to: 1078-1091
client/internal/connect.go (2)
64-67: Log path propagation through ConnectClient.Run and mobile variants is consistent
The new Run(runningChan, logPath) signature, plus forwarding of logPath into run(...), cleanly introduces log-path awareness without disturbing existing control flow. Android/iOS wrappers deliberately pass an empty log path, which keeps their behavior unchanged while still benefiting from other refactors.
251-256: EngineConfig enrichment withLogPathandProfileConfigis correctly wired
createEngineConfignow:
- Accepts
logPathand assigns it toEngineConfig.LogPath.- Sets
EngineConfig.ProfileConfig = config.This, combined with the updated call
createEngineConfig(myPrivateKey, c.config, peerConfig, logPath)and the use of these fields in the engine’s debug bundle path, ensures the engine has both the profile metadata and log location needed for remote debug bundles.Aside from the unused extra parameter on
NewEngine(flagged separately inengine.go), the data flow looks sound.Also applies to: 417-456
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
client/internal/connect.go (7 hunks)
client/internal/engine.go (13 hunks)
client/server/server.go (4 hunks)
management/server/peer.go (2 hunks)
management/server/permissions/modules/module.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
client/internal/engine.go (3)
client/internal/profilemanager/config.go (1)
Config(89-160)
client/jobexec/executor.go (3)
Executor(24-25), NewExecutor(27-29), ErrJobNotImplemented(21-21)
client/internal/debug/debug.go (2)
GeneratorDependencies(238-243)BundleConfig(232-236)
client/server/server.go (1)
client/status/status.go (1)
ToProtoFullStatus(549-635)
client/internal/connect.go (3)
client/internal/mobile_dependency.go (1)
MobileDependency(13-25)
client/internal/engine.go (2)
NewEngine(235-276), EngineConfig(83-142)
client/internal/profilemanager/config.go (1)
Config(89-160)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
- GitHub Check: Management / Unit (amd64, mysql)
- GitHub Check: Client / Unit (amd64)
- GitHub Check: Management / Benchmark (amd64, sqlite)
- GitHub Check: Management / Benchmark (amd64, postgres)
- GitHub Check: Relay / Unit (386)
- GitHub Check: Client (Docker) / Unit
- GitHub Check: Management / Unit (amd64, postgres)
- GitHub Check: Management / Unit (amd64, sqlite)
- GitHub Check: Relay / Unit (amd64, -race)
- GitHub Check: Client / Unit (386)
- GitHub Check: release_ui_darwin
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
- GitHub Check: release
- GitHub Check: JS / Lint
- GitHub Check: Client / Unit
- GitHub Check: Darwin
- GitHub Check: Linux
- GitHub Check: Windows
- GitHub Check: iOS / Build
- GitHub Check: Android / Build
🔇 Additional comments (6)
management/server/permissions/modules/module.go (1)
8-8: LGTM! The new RemoteJobs module constant and its registration in the All map follow the established pattern correctly.
Also applies to: 25-25
management/server/peer.go (2)
34-34: LGTM! The minimum version constant for remote jobs is clear and will enable proper feature gating.
400-425: Function looks good; note the TODO for dedicated permissions.
The permission checks and peer ownership validation are correctly implemented. The TODO comment on line 401 indicates planned work to create more granular job permissions, which is a reasonable future enhancement.
client/internal/engine.go (2)
205-222: Job executor lifecycle wiring looks correct
The addition of jobExecutor, its dedicated jobExecutorWG, the receiveJobEvents() call in Start(), and e.jobExecutorWG.Wait() in Stop() form a coherent lifecycle: the job goroutine is tracked and properly drained on shutdown, without sharing shutdownWg with other engine goroutines. I don't see deadlock or ordering issues given current usage.
Also applies to: 278-337, 524-527, 950-985
809-822: Sync response persistence refactor is thread-safe
The move to a dedicated syncRespMux for persistSyncResponse and latestSyncResponse (with R/W separation and cloning in GetLatestSyncResponse) removes contention with syncMsgMux and avoids data races on the stored message. Lock ordering is one-way (syncMsgMux → syncRespMux) and I don't see any reverse order, so deadlock risk is low.
Also applies to: 1889-1925
client/server/server.go (1)
1539-1547: Passing s.logFile into ConnectClient.Run matches new signature
Updating connect() to call s.connectClient.Run(runningChan, s.logFile) correctly propagates the daemon's log file path into the client, aligning with the new Run signature and the debug bundle/log-path plumbing in ConnectClient and EngineConfig.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
client/cmd/testutil_test.go (4 hunks)
client/internal/engine_test.go (10 hunks)
client/server/server_test.go (4 hunks)
management/internals/modules/peers/manager_mock.go (1 hunks)
management/server/account_test.go (2 hunks)
management/server/dns_test.go (2 hunks)
management/server/http/testing/testing_tools/channel/channel.go (2 hunks)
management/server/management_proto_test.go (3 hunks)
management/server/management_test.go (4 hunks)
management/server/nameserver_test.go (2 hunks)
management/server/peer_test.go (5 hunks)
management/server/route_test.go (2 hunks)
shared/management/client/client_test.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- management/server/dns_test.go
- management/server/management_proto_test.go
- management/server/peer_test.go
- management/server/http/testing/testing_tools/channel/channel.go
- management/server/account_test.go
- client/cmd/testutil_test.go
- management/server/route_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.
Applied to files:
shared/management/client/client_test.go
🧬 Code graph analysis (2)
management/server/management_test.go (3)
management/server/networks/routers/manager.go (1)
NewManager(40-46)
management/server/settings/manager.go (1)
NewManager(37-44)
management/server/job/manager.go (1)
NewJobManager(34-45)
management/server/nameserver_test.go (3)
management/internals/modules/peers/manager.go (1)
NewManager(44-49)
management/server/account.go (1)
BuildManager(178-266)
management/server/job/manager.go (1)
NewJobManager(34-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: Build Cache
- GitHub Check: Linux
- GitHub Check: JS / Lint
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
- GitHub Check: release_ui
- GitHub Check: release_ui_darwin
- GitHub Check: release
- GitHub Check: Darwin
- GitHub Check: iOS / Build
- GitHub Check: Windows
- GitHub Check: Android / Build
- GitHub Check: Check External GPL/AGPL Licenses
🔇 Additional comments (10)
management/internals/modules/peers/manager_mock.go (1)
100-113: LGTM! Auto-generated mock looks correct.
The mock method and recorder for GetPeerID follow standard gomock patterns and are consistent with the existing mock methods in this file. The signatures and implementations are correct.
client/server/server_test.go (2)
310-310: LGTM! JobManager correctly instantiated for tests.
The JobManager is created with nil metrics, which is appropriate for test setup. The store and peersManager dependencies are properly provided.
323-323: No action needed—BuildManager parameter positions are correct.
The jobManager is correctly positioned as the 5th parameter in the BuildManager call at line 323, matching the function signature where it follows networkMapController. All parameter alignments are accurate.
shared/management/client/client_test.go (2)
96-96: LGTM! JobManager wiring integrated correctly.
The JobManager is properly instantiated and will be threaded through the test setup.
122-122: Ephemeral manager import alias improves readability.
Using the ephemeral_manager alias avoids conflicts and makes the code clearer.
management/server/nameserver_test.go (1)
794-801: LGTM! Clean inline JobManager creation.
Creating the JobManager inline within the BuildManager call is concise and appropriate for test code. The peersManager dependency is correctly instantiated on the preceding line.
client/internal/engine_test.go (3)
1602-1635: LGTM! Management server test wiring is consistent.
The JobManager is correctly created and passed through to BuildManager and nbgrpc.NewServer, following the same pattern established in other test files.
1536-1536: No changes needed—NewEngine signature is correct.
NewEngine returns only *Engine (not (*Engine, error)), making the assignment e, err := NewEngine(...), nil valid Go syntax. This is not unconventional; Go functions can return single values, and assigning nil to err here is a straightforward tuple assignment with no issues.
256-256: The two trailing nil parameters passed to NewEngine correspond to the checks and c parameters in the function signature (line 235 of client/internal/engine.go), which are []*mgmProto.Checks and *profilemanager.Config respectively. These parameters are already documented in the function signature itself through their names and type annotations, so no additional documentation is needed. The speculation about "jobManager and logPath" does not match the actual implementation.
Likely an incorrect or invalid review comment.
management/server/management_test.go (1)
206-244: LGTM! JobManager integration is clean and consistent. The test correctly:
- Extracts peersManager into a variable (line 206)
- Creates jobManager with nil metrics (line 207)
- Passes jobManager to BuildManager (line 219)
- Passes jobManager to nbgrpc.NewServer (line 244)
This follows the established pattern across all test files.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
shared/management/client/grpc.go (1)
305-427: LGTM: Consistent refactoring of internal methods. The changes thread `serverPubKey` as a parameter instead of fetching it within each method. This eliminates redundant `GetServerPublicKey` calls and aligns with the centralized key retrieval in `withMgmtStream`.
🧹 Nitpick comments (1)
shared/management/client/grpc.go (1)
218-232: Consider documenting the handshake protocol. The handshake sends a `JobRequest` with only an ID field populated (UUID string as bytes). While functional, this implicit protocol could benefit from documentation or a dedicated handshake message type if available in the proto definitions.
Optional: If proto supports it, consider:

```diff
-handshakeReq := &proto.JobRequest{
-	ID: []byte(uuid.New().String()),
-}
+// Use dedicated handshake message if available
+handshakeReq := &proto.JobHandshake{
+	SessionID: uuid.New().String(),
+}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
shared/management/client/grpc.go (8 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
shared/management/client/grpc.go (2)
- shared/management/proto/management.pb.go (6): EncryptedMessage (322-333), EncryptedMessage (348-348), EncryptedMessage (363-365), SyncResponse (721-738), SyncResponse (753-753), SyncResponse (768-770)
- encryption/message.go (2): EncryptMessage (10-24), DecryptMessage (27-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: Darwin
- GitHub Check: Android / Build
- GitHub Check: iOS / Build
- GitHub Check: Client / Unit
- GitHub Check: Windows
- GitHub Check: Linux
- GitHub Check: JS / Lint
- GitHub Check: Build Cache
- GitHub Check: release_ui
- GitHub Check: Client / Unit
- GitHub Check: release_ui_darwin
- GitHub Check: release
- GitHub Check: Check External GPL/AGPL Licenses
- GitHub Check: Client / Unit
🔇 Additional comments (6)
shared/management/client/grpc.go (6)
15-15: LGTM: Well-established library for UUID generation. The addition of `github.com/google/uuid` is appropriate for generating unique handshake identifiers in the Job streaming flow.
234-260: LGTM: Proper message verification and decryption. The peer ID verification at lines 246-251 is a good security practice, ensuring messages are intended for this specific peer before processing.
263-278: LGTM: Defensive nil-response handling. The fallback at lines 269-276 ensures that even if the handler returns nil, a valid failure response is constructed. This prevents the send operation from failing unexpectedly.
280-303: LGTM: Comprehensive logging and error handling. The job ID is included in all log messages, providing excellent traceability for debugging. Error handling properly propagates failures for retry logic.
120-126: Job RPC implementation is complete and follows established patterns. The new `Job` method provides a clean public API for bidirectional job streaming, consistent with the existing `Sync` method. The server-side implementation in management/internals/shared/grpc/server.go (line 175) properly handles the handshake protocol and delegates to `sendJobsLoop` for stream management. The proto definition in shared/management/proto/management.proto (line 53) documents the RPC as a bidirectional stream with `JobRequest` and `JobResponse` message types.
205-208: This check is correct and requires no changes. The code properly skips processing when `jobReq` is nil or has an empty ID. Since the ID is essential for correlating the response back at the server (used in `sendJobResponse` and `processJobRequest`), a request without an ID cannot be processed or responded to. Skipping silently is the correct behavior; sending a response with an empty ID would be orphaned at the server and unrecoverable.
@coderabbitai close all your reviews and start again
Co-authored-by: Pascal Fischer <32096965+pascal-fischer@users.noreply.github.com>
Actionable comments posted: 1
♻️ Duplicate comments (5)
management/server/peer.go (2)
34-36: Remote job creation: version gating OK, but Send-before-persist and jobManager assumptions are risky
The remote job entrypoint is nicely scoped (new `RemoteJobs` module + `Create` op, explicit min version check via `posture.MeetsMinVersion`, and an account/peer ownership check), but a few non-trivial risks remain:
Send‑before‑persist can desync runtime and DB state
The flow is:
- Check pending jobs (`IsPeerHasPendingJobs`).
- Build stream request.
- `SendJob(...)`.
- Only if send succeeds, persist via `CreatePeerJob` in a transaction.

If the DB write fails after `SendJob` succeeds (e.g., constraint, connection issue), the client can still be executing the job while `/jobs` and `/jobs/{id}` never show it. That makes observability and recovery hard. Persisting first (status = `pending`) and then marking it `failed` on send error would keep state consistent, even if that means occasionally recording “failed to send” jobs.
Implicit non-nil `jobManager` may panic in misconfigured environments
`am.jobManager` is used unconditionally (`IsPeerConnected`, `IsPeerHasPendingJobs`, `SendJob`). If any construction path builds a `DefaultAccountManager` with a nil jobManager (older tests, custom deployments), the first use of this API will panic. Either:
- Enforce non-nil in constructor/tests, or
- Add an early guard here returning an internal error like “job manager is not configured”; see the sketch after this comment.

Pending-job check is acknowledged as racy
The TODO correctly notes the window between `IsPeerHasPendingJobs` and `SendJob` across concurrent callers. That’s acceptable for now given it’s documented, but if you later tighten semantics (“at most one job per peer”), you’ll likely want to move that invariant into the job manager or DB layer.
The version gate and permission checks themselves look good; the main suggestion is to re-order and harden the send/persist and jobManager assumptions.
Also applies to: 325-398
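A minimal, self-contained sketch of the guard-plus-persist-first ordering suggested above. The interfaces and helper names (`Store`, `JobSender`, `MarkJobFailed`) are illustrative stand-ins, not the real NetBird types:

```go
package jobs

import (
	"context"
	"errors"
	"fmt"
)

// Job is a simplified stand-in for the real types.Job.
type Job struct{ ID, PeerID, Status string }

type Store interface {
	CreatePeerJob(ctx context.Context, job *Job) error
	MarkJobFailed(ctx context.Context, jobID, reason string) error
}

type JobSender interface {
	SendJob(ctx context.Context, accountID, peerID string, job *Job) error
}

var errJobManagerNotConfigured = errors.New("job manager is not configured")

// createPeerJob persists first, then sends; a send failure downgrades the
// stored job to failed instead of leaving runtime and DB state out of sync.
func createPeerJob(ctx context.Context, store Store, sender JobSender, accountID, peerID string, job *Job) error {
	if sender == nil {
		return errJobManagerNotConfigured // early guard instead of a nil-pointer panic
	}
	job.Status = "pending"
	if err := store.CreatePeerJob(ctx, job); err != nil {
		return fmt.Errorf("persist job: %w", err)
	}
	if err := sender.SendJob(ctx, accountID, peerID, job); err != nil {
		// Best effort: keep the persisted record consistent with reality.
		_ = store.MarkJobFailed(ctx, job.ID, err.Error())
		return fmt.Errorf("send job to peer: %w", err)
	}
	return nil
}
```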
427-451: GetPeerJobByID: missing peer/job association check lets callers see jobs for other peers
After verifying that `peerID` belongs to `accountID`, the method loads the job by `(accountID, jobID)` only:

```go
job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID)
```

but never asserts that `job.PeerID == peerID`. This allows a caller to hit `/peers/{peerA}/jobs/{jobOfPeerB}` (within the same account) and successfully retrieve a job that actually belongs to another peer.
You’ll likely want a post-load check here:

```go
job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID)
if err != nil {
	return nil, err
}
if job.PeerID != peerID {
	return nil, status.NewPeerNotPartOfAccountError()
}
```

so that the peer path parameter and the underlying job record stay consistent.
client/internal/engine.go (1)
137-143: Guard against nil `ProfileConfig`/`ManagementURL` before bundle job
`handleBundle` dereferences `e.config.ProfileConfig` and `e.config.ProfileConfig.ManagementURL` without checks. If either is nil (e.g. tests, misconfigured profiles, or older configs), this will panic when handling a job. This was flagged earlier and is still present; please add a defensive check and fail the job gracefully instead of panicking.
Proposed defensive check in `handleBundle`:

```diff
 func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobResponse_Bundle, error) {
 	log.Infof("handle remote debug bundle request: %s", params.String())
 	syncResponse, err := e.GetLatestSyncResponse()
@@
-	bundleDeps := debug.GeneratorDependencies{
-		InternalConfig: e.config.ProfileConfig,
-		StatusRecorder: e.statusRecorder,
-		SyncResponse:   syncResponse,
-		LogPath:        e.config.LogPath,
-	}
+	if e.config == nil || e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
+		return nil, fmt.Errorf("profile config or management URL not available for bundle generation")
+	}
+
+	bundleDeps := debug.GeneratorDependencies{
+		InternalConfig: e.config.ProfileConfig,
+		StatusRecorder: e.statusRecorder,
+		SyncResponse:   syncResponse,
+		LogPath:        e.config.LogPath,
+	}
@@
-	waitFor := time.Duration(params.BundleForTime) * time.Minute
-
-	uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
+	waitFor := time.Duration(params.BundleForTime) * time.Minute
+
+	uploadKey, err := e.jobExecutor.BundleJob(
+		e.ctx,
+		bundleDeps,
+		bundleJobParams,
+		waitFor,
+		e.config.ProfileConfig.ManagementURL.String(),
+	)
```
management/internals/shared/grpc/server.go (1)
171-203: Propagate job send failures instead of returning OK gRPC statusThe overall Job stream wiring (encrypted handshake, per‑peer channel, background response receiver) looks solid. However,
sendJobsLoopstill swallowssendJobfailures:if err := s.sendJob(ctx, peerKey, event, srv); err != nil { log.WithContext(ctx).Warnf("send job failed: %v", err) return nil }This causes the
JobRPC to end with an OK status even when we fail to deliver a job to the client, which was already pointed out in prior review feedback. The same applies if you later add more failure modes insendJob.Suggested change to surface send errors to the client
func (s *Server) sendJobsLoop(ctx context.Context, accountID string, peerKey wgtypes.Key, peer *nbpeer.Peer, updates *job.Channel, srv proto.ManagementService_JobServer) error { // todo figure out better error handling strategy defer s.jobManager.CloseChannel(ctx, accountID, peer.ID) @@ - if err := s.sendJob(ctx, peerKey, event, srv); err != nil { - log.WithContext(ctx).Warnf("send job failed: %v", err) - return nil - } + if err := s.sendJob(ctx, peerKey, event, srv); err != nil { + log.WithContext(ctx).Warnf("send job failed: %v", err) + // Propagate a gRPC status error so callers and observability can see failure + return err + } } }If you prefer to distinguish transport vs. logical errors, you can map specific failures to gRPC codes (e.g.
codes.Unavailablefor transport issues,codes.Internalfor encryption problems) insidesendJoband keepsendJobsLoopreturning them as‑is.Also applies to: 323-361, 363-385, 445-468
management/server/store/sql_store.go (1)
194-224: GetPeerJobByID still doesn’t verify peer ownership
`GetPeerJobByID` currently filters only by `account_id` and `id`, so any caller with account-level access can retrieve another peer’s job just by knowing the job ID. Earlier feedback suggested adding a `peerID` constraint (or clearly marking this as an admin-only accessor); that hasn’t been applied yet.
Example: tighten `GetPeerJobByID` to a specific peer:

```diff
-// GetJobByID fetches job by ID
-func (s *SqlStore) GetPeerJobByID(ctx context.Context, accountID, jobID string) (*types.Job, error) {
+// GetPeerJobByID fetches a job by ID scoped to a specific peer within an account.
+func (s *SqlStore) GetPeerJobByID(ctx context.Context, accountID, peerID, jobID string) (*types.Job, error) {
 	var job types.Job
-	err := s.db.
-		Where(accountAndIDQueryCondition, accountID, jobID).
+	err := s.db.
+		Where(accountAndPeerIDQueryCondition+" AND id = ?", accountID, peerID, jobID).
 		First(&job).Error
```

This would require updating the store interface and call sites, but makes the API unambiguous and avoids accidental cross-peer access.
🧹 Nitpick comments (17)
client/server/server_test.go (2)
310-323: Consider passing metrics to JobManager for consistency. The `jobManager` is created with `nil` for the metrics parameter (line 310), but a valid `metrics` instance is created on line 314 and passed to `BuildManager` (line 323). For consistency and better test coverage, consider reordering the code to create `metrics` first, then pass it to `NewJobManager`.
🔎 Proposed refactor

```diff
 permissionsManagerMock := permissions.NewMockManager(ctrl)
 peersManager := peers.NewManager(store, permissionsManagerMock)
 settingsManagerMock := settings.NewMockManager(ctrl)
-
-jobManager := job.NewJobManager(nil, store, peersManager)
 ia, _ := integrations.NewIntegratedValidator(context.Background(), peersManager, settingsManagerMock, eventStore)
 metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
 require.NoError(t, err)
+
+jobManager := job.NewJobManager(metrics, store, peersManager)
 settingsMockManager := settings.NewMockManager(ctrl)
 groupsManager := groups.NewManagerMock()
```
332-332: JobManager integration looks correct; consider using available metrics instance. The `jobManager` parameter is correctly positioned and passed to `nbgrpc.NewServer`. However, the `metrics` instance created on line 314 is available but not used in this call (currently passing `nil` for `appMetrics`). While tests can function with `nil` dependencies, passing the actual `metrics` instance would provide more thorough test coverage.
🔎 Optional improvement
If you address the metrics usage suggestion from line 310, you can also apply it here:

```diff
-	mgmtServer, err := nbgrpc.NewServer(config, accountManager, settingsMockManager, jobManager, secretsManager, nil, nil, &server.MockIntegratedValidator{}, networkMapController)
+	mgmtServer, err := nbgrpc.NewServer(config, accountManager, settingsMockManager, jobManager, secretsManager, metrics, nil, &server.MockIntegratedValidator{}, networkMapController)
```

Note: A past review comment also raised concerns about the `nil` parameters for `appMetrics` and `authManager`.
management/server/activity/codes.go (1)
JobCreatedByUserconstant and itsactivityMapentry are correctly wired and preserve existing enum values.As optional polish:
- To match surrounding messages like "Peer added"/"Route created", consider changing the message to something like
"Job created for peer".- If these activities are surfaced via the HTTP API, consider extending
EventActivityCodeinshared/management/http/api/types.gen.gowith"peer.job.create"so API enums stay in sync.Also applies to: 301-302
client/cmd/up.go (1)
200-205: Propagate logPath into ConnectClient.Run in foreground mode
Passing `util.FindFirstLogPath(logFiles)` into `connectClient.Run(nil, …)` correctly wires the effective log file path into the client/engine for debug bundle support and matches the new Run signature.
Optionally, if `FindFirstLogPath(logFiles)` is used multiple times along this path, consider computing it once into a local `logPath` variable for clarity.
shared/management/proto/management.proto (1)
52-54: Job streaming RPC and job messages look structurally correct
The new `Job` bidi-stream RPC and the associated `JobRequest`/`JobResponse`/`BundleParameters`/`BundleResult` definitions are well-formed and consistent with the existing encryption pattern (wrapping concrete messages in `EncryptedMessage`).
As a small documentation improvement, consider extending the `Job` comment to note that the request/response bodies carry `JobRequest` and `JobResponse` messages, similar to how `Login`/`Sync` document their payloads.
Also applies to: 66-101
shared/management/http/api/types.gen.go (1)
111-117: HTTP Job and bundle workload models match the intended API
The additions for jobs and bundle workloads (`JobRequest`, `JobResponse`, `JobResponseStatus*`, `BundleParameters`, `BundleResult`, `BundleWorkloadRequest`, `BundleWorkloadResponse`, and `WorkloadTypeBundle`) give a clear and strongly-typed shape to the new job API and align with the underlying proto `BundleParameters`/`BundleResult`.
Exposing `PostApiPeersPeerIdJobsJSONRequestBody` as a `JobRequest` keeps the per-peer jobs endpoint straightforward for clients.
Minor generated-code nit: where you assign `"bundle"` as a discriminator string, using `WorkloadTypeBundle` instead would add a bit of compile-time safety; if you want that, it’s better addressed in the OpenAPI/oapi-codegen setup rather than by hand-editing this file.
Also applies to: 196-200, 366-405, 713-731
management/server/account/manager.go (1)
127-129: Parameter order inconsistency across Job methods. The parameter ordering differs between these methods:
- `CreatePeerJob`: accountID, peerID, userID
- `GetAllPeerJobs`: accountID, userID, peerID
- `GetPeerJobByID`: accountID, userID, peerID, jobID

Consider aligning the order for consistency (e.g., always `accountID, userID, peerID` or `accountID, peerID, userID`). This will reduce cognitive load and prevent parameter swapping mistakes by callers.
🔎 Suggested consistent ordering

```diff
-	CreatePeerJob(ctx context.Context, accountID, peerID, userID string, job *types.Job) error
-	GetAllPeerJobs(ctx context.Context, accountID, userID, peerID string) ([]*types.Job, error)
-	GetPeerJobByID(ctx context.Context, accountID, userID, peerID, jobID string) (*types.Job, error)
+	CreatePeerJob(ctx context.Context, accountID, userID, peerID string, job *types.Job) error
+	GetAllPeerJobs(ctx context.Context, accountID, userID, peerID string) ([]*types.Job, error)
+	GetPeerJobByID(ctx context.Context, accountID, userID, peerID, jobID string) (*types.Job, error)
```

shared/management/client/grpc.go (2)
235-304: Unused `ctx` parameters in helper functions. The `ctx` parameter is passed to `receiveJobRequest`, `processJobRequest`, and `sendJobResponse` but never used. If these are placeholders for future context-aware operations (timeouts, cancellation), consider adding a `//nolint:revive` comment or a TODO. Otherwise, remove them to avoid misleading readers.
196-199: Graceful degradation on `Unimplemented` may hide feature unavailability. Returning `nil` when the server returns `codes.Unimplemented` makes the `Job()` method appear successful, but no jobs will ever be processed. Consider returning a sentinel error or exposing this state to callers so they can take appropriate action (e.g., disable job-related UI, log at startup).
🔎 Alternative: Return a typed error

```diff
+var ErrJobsNotSupported = errors.New("job feature not supported by management server")
+
 case codes.Unimplemented:
 	log.Warn("Job feature is not supported by the current management server version. " +
 		"Please update the management service to use this feature.")
-	return nil
+	return backoff.Permanent(ErrJobsNotSupported)
```

Callers can then check for this specific error and handle accordingly.
shared/management/http/api/openapi.yml (3)
35-37: Jobs tag looks consistent with existing tagging; mark experimental in docs too
The new `Jobs` tag with `x-experimental: true` fits the pattern of other tags (e.g., lazy connection, traffic events). Just ensure UI/docs also surface this as experimental so consumers don’t treat it as fully stable yet.
2410-2477: List/Create peer jobs: path and shapes align; HTTP status code may diverge from implementation
The list/create `/api/peers/{peerId}/jobs` operations line up with the Go handlers in peers_handler.go (`ListJobs` and `CreateJob`) and the `JobRequest`/`JobResponse` shapes, so wiring looks correct.
One mismatch to be aware of: the OpenAPI declares `201` for the POST, but `util.WriteJSONObject` typically responds with `200 OK`. Either adjust the handler to write `201` on success or update the spec to `200` to avoid contract drift; a sketch of the handler-side fix follows.
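A minimal sketch of writing an explicit 201 before the JSON body, using only the standard library. The handler and response type names are illustrative, and the project's `util.WriteJSONObject` helper may behave differently:

```go
package handlers

import (
	"encoding/json"
	"net/http"
)

// jobResponse is a stand-in for the generated api.JobResponse type.
type jobResponse struct {
	ID     string `json:"id"`
	Status string `json:"status"`
}

// createJobHandler writes 201 Created explicitly before encoding the body,
// matching an OpenAPI spec that declares 201 for the POST operation.
func createJobHandler(w http.ResponseWriter, r *http.Request) {
	resp := jobResponse{ID: "job-1", Status: "pending"}

	w.Header().Set("Content-Type", "application/json")
	// WriteHeader must be called before any body bytes are written;
	// otherwise net/http defaults the status to 200 OK.
	w.WriteHeader(http.StatusCreated)
	_ = json.NewEncoder(w).Encode(resp)
}
```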
2479-2513: Get single peer job: contract looks fine; consider explicit 404 in the spec
The `/api/peers/{peerId}/jobs/{jobId}` endpoint matches the handler (`GetJob`) and `JobResponse` shape. Given the code can return “not found” when the job/peer combo doesn’t exist, you may want to add a `404` response here for client clarity, similar to other APIs that expose explicit not-found states.
51-85: Job handlers: good reuse of infra; consider refining error codes and partial failure behaviorThe Create/List/Get job handlers follow established patterns (auth via
nbcontext, path params viamux.Vars, error writing viautil), and delegate all authorization/account scoping toaccountManager, which keeps HTTP thin. That’s solid.A couple of small refinements you might consider:
- For
CreateJob, if you want to honor the OpenAPI201contract, set the status code explicitly before writing the body (assumingutil.WriteJSONObjectdoesn’t override it).- In
ListJobs, if a single malformed job causesBuildWorkloadResponseto fail, the handler returns an error for the entire list. If corruption is possible, you might prefer to skip/best‑effort those entries or surface which job failed, instead of bailing mid‑loop.Otherwise, the handlers look straightforward and consistent with the rest of this package.
Also applies to: 87-115, 117-142
620-640: toSingleJobResponse: mapping is correct; add light nil-safety around workload. The translation from
types.Jobtoapi.JobResponselooks accurate: timestamps,TriggeredBy, status cast, optionalFailedReason, and workload wiring are all aligned.One defensive tweak to consider:
BuildWorkloadResponsereturns a pointer, and you immediately dereference it. If that method ever returns(nil, nil)(e.g., for an unsupported workload type), this will panic. A short guard like:workload, err := job.BuildWorkloadResponse() if err != nil { return nil, err } if workload == nil { return nil, fmt.Errorf("nil workload for job %s", job.ID) }would make the failure mode explicit and easier to diagnose.
management/server/store/store.go (1)
208-214: Job persistence API surface looks reasonable; consider clarifying locking and invariants
The new job methods on `Store` match how the account manager uses them (e.g., `CreatePeerJob` inside a transaction, `GetPeerJobByID` scoped by `accountID`, `GetPeerJobs` by `(accountID, peerID)`, and failure-marking helpers).
Two small design considerations for later:
- Other store methods often carry an explicit `LockingStrength`; job methods don’t. If you ever need stronger consistency guarantees (e.g., no concurrent updates to a pending job), adding a lock strength or documenting expected usage (always within a transaction) would make the contract clearer.
- With both `MarkPendingJobsAsFailed` and `MarkAllPendingJobsAsFailed`, it’s worth documenting whether they only touch `Status=pending` records and how they behave if the job is already `succeeded`/`failed`, so higher-level code can rely on idempotency.

No blockers here; interface looks aligned with the job lifecycle logic in peer.go and the job manager.
224-226: Job executor lifecycle wiring looks good; consider bounded wait on shutdownThe addition of
jobExecutor,jobExecutorWG,receiveJobEvents, and theWait()call inStop()correctly tracks the job-stream goroutine and ties it to engine lifecycle. However, unlikeshutdownWg,jobExecutorWG.Wait()has no timeout, so a stuckmgmClient.Jobimplementation would block engine shutdown indefinitely.Not urgent, but consider using the same
waitWithContext+ timeout pattern used forshutdownWg(or folding this intoshutdownWg) to keep shutdown bounded.Also applies to: 277-341, 997-1034
management/server/store/sql_store.go (1)
135-193: Job persistence helpers are fine; be explicit about “no-op” semantics on missing rows
`CreatePeerJob`, `CompletePeerJob`, `MarkPendingJobsAsFailed`, and `MarkAllPendingJobsAsFailed` follow existing store patterns and look correct for basic CRUD.
Note, though, that the `Mark*PendingJobsAsFailed` methods treat “no matching rows” as success (no error, no logging), which is ok if they’re intended as best-effort cleanups. If callers need to distinguish “no such pending job(s)” from “updated”, you may want to check `RowsAffected` and optionally log/return a status in that case; see the sketch below.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
- client/proto/daemon.pb.go is excluded by !**/*.pb.go
- go.sum is excluded by !**/*.sum
- shared/management/proto/management.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (26)
- client/cmd/up.go
- client/embed/embed.go
- client/internal/connect.go
- client/internal/debug/debug.go
- client/internal/engine.go
- client/internal/engine_test.go
- client/proto/daemon.proto
- client/server/server.go
- client/server/server_test.go
- go.mod
- management/internals/modules/peers/manager.go
- management/internals/server/boot.go
- management/internals/shared/grpc/server.go
- management/server/account.go
- management/server/account/manager.go
- management/server/account_test.go
- management/server/activity/codes.go
- management/server/http/handlers/peers/peers_handler.go
- management/server/mock_server/account_mock.go
- management/server/peer.go
- management/server/store/sql_store.go
- management/server/store/store.go
- shared/management/client/grpc.go
- shared/management/http/api/openapi.yml
- shared/management/http/api/types.gen.go
- shared/management/proto/management.proto
💤 Files with no reviewable changes (1)
- client/proto/daemon.proto
🚧 Files skipped from review as they are similar to previous changes (4)
- management/internals/modules/peers/manager.go
- management/server/mock_server/account_mock.go
- management/server/account.go
- client/embed/embed.go
🧰 Additional context used
🧬 Code graph analysis (15)
management/internals/server/boot.go (1)
- management/internals/shared/grpc/server.go (1): NewServer (80-132)
client/cmd/up.go (1)
- util/log.go (1): FindFirstLogPath (77-84)
management/server/account/manager.go (1)
- management/server/types/job.go (1): Job (34-58)
client/server/server_test.go (3)
- management/server/job/manager.go (1): NewJobManager (34-45)
- management/server/account.go (1): BuildManager (178-266)
- management/internals/shared/grpc/server.go (1): NewServer (80-132)
shared/management/client/grpc.go (4)
- shared/management/proto/management.pb.go (6): EncryptedMessage (322-333), EncryptedMessage (348-348), EncryptedMessage (363-365), SyncResponse (721-738), SyncResponse (753-753), SyncResponse (768-770)
- shared/management/status/error.go (3): Errorf (70-75), FromError (78-87), PermissionDenied (18-18)
- shared/management/proto/management_grpc.pb.go (2): ManagementService_JobClient (169-173), ManagementService_SyncClient (89-92)
- encryption/message.go (2): EncryptMessage (10-24), DecryptMessage (27-40)
management/server/store/store.go (1)
- management/server/types/job.go (1): Job (34-58)
client/internal/connect.go (1)
- client/internal/engine.go (1): EngineConfig (84-143)
client/internal/debug/debug.go (4)
- util/log.go (1): SpecialLogs (25-28)
- client/internal/profilemanager/profilemanager.go (1): NewProfileManager (56-58)
- client/status/status.go (3): ToProtoFullStatus (549-635), ConvertToStatusOutputOverview (122-169), ParseToFullDetailSummary (532-547)
- version/version.go (1): NetbirdVersion (18-20)
client/server/server.go (1)
- client/status/status.go (1): ToProtoFullStatus (549-635)
management/server/store/sql_store.go (2)
- management/server/types/job.go (3): Job (34-58), JobStatusPending (18-18), JobStatusFailed (20-20)
- management/server/store/store.go (2): LockingStrength (40-40), LockingStrengthNone (47-47)
management/server/peer.go (8)
- management/server/types/job.go (2): Job (34-58), Workload (60-64)
- management/server/permissions/modules/module.go (1): RemoteJobs (8-8)
- management/server/permissions/operations/operation.go (2): Create (6-6), Read (7-7)
- shared/management/status/error.go (4): NewPermissionValidationError (213-215), NewPermissionDeniedError (209-211), NewPeerNotPartOfAccountError (105-107), Type (46-46)
- management/server/store/store.go (1): Store (50-215)
- management/server/posture/nb_version.go (1): MeetsMinVersion (57-72)
- route/route.go (1): ID (48-48)
- management/server/activity/codes.go (1): JobCreatedByUser (186-186)
shared/management/http/api/types.gen.go (1)
- shared/management/proto/management.pb.go (6): BundleParameters (554-563), BundleParameters (578-578), BundleParameters (593-595), BundleResult (625-631), BundleResult (646-646), BundleResult (661-663)
client/internal/engine_test.go (3)
- management/server/job/manager.go (1): NewJobManager (34-45)
- management/server/account.go (1): BuildManager (178-266)
- management/internals/shared/grpc/server.go (1): NewServer (80-132)
management/server/account_test.go (1)
- management/server/job/manager.go (1): NewJobManager (34-45)
client/internal/engine.go (3)
- client/internal/profilemanager/config.go (1): Config (90-161)
- client/jobexec/executor.go (3): Executor (24-25), NewExecutor (27-29), ErrJobNotImplemented (21-21)
- client/internal/debug/debug.go (1): GeneratorDependencies (243-248)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: Windows
- GitHub Check: Client / Unit
- GitHub Check: iOS / Build
- GitHub Check: Android / Build
- GitHub Check: Darwin
- GitHub Check: Linux
- GitHub Check: Build Cache
- GitHub Check: release_ui_darwin
- GitHub Check: Client / Unit
- GitHub Check: release
- GitHub Check: release_ui
- GitHub Check: FreeBSD Port / Build & Test
- GitHub Check: JS / Lint
- GitHub Check: Client / Unit
- GitHub Check: Check External GPL/AGPL Licenses
🔇 Additional comments (20)
client/server/server_test.go (1)
23-23: LGTM: Import correctly added for JobManager. The import is necessary for the new JobManager integration and follows the correct package path.
go.mod (2)
69-69: Appropriate dependency for API code generation support. The `oapi-codegen/runtime` package provides runtime-specific code that oapi-codegen generates, aligning with the PR's addition of API endpoints for debug bundle operations. The package is widely imported (1,291+ projects) and carries an Apache-2.0 license.
136-136: This dependency is legitimate and necessary. The `apapsch/go-jsonmerge/v2` package is a real, published library pulled in as an indirect dependency by `oapi-codegen/runtime` (a direct dependency in this project). It is a stable package (v2.0.0 released Nov 2021) with no known security vulnerabilities. While the upstream author is not actively developing it, the library remains production-ready and is packaged by distributions like Fedora through 2025.
client/server/server.go (1)
31-33: Use shared status conversion and wire logPath into ConnectClient.RunSwitching to
nbstatus.ToProtoFullStatus(fullStatus)centralizes the protobuf conversion in the status package instead of a local helper, which reduces duplication and keeps the serialization logic in one place.Passing
s.logFileintos.connectClient.Run(runningChan, s.logFile)correctly forwards the daemon’s log path into the connection/engine stack so debug bundle generation can find the right log files.Also applies to: 1088-1097, 1545-1553
client/internal/debug/debug.go (2)
221-269: logPath-based log collection and fallback behavior are coherent
Using `logPath` (and skipping entries present in `util.SpecialLogs`) ensures the bundle generator only tries to read actual log files, falling back to `trySystemdLogFallback()` when running with console/syslog logging or no path. Deriving `logDir` from `g.logPath` and reusing it for rotated logs plus stderr/stdout files preserves the previous layout while keeping the Darwin-specific overrides.
This wiring looks consistent with the new `EngineConfig.LogPath`/`ConnectClient.Run` APIs and should behave correctly across platforms.
Also applies to: 347-357, 719-749
395-418: addStatus now reuses status helpers and properly seeds anonymizer
The updated `addStatus` implementation:
- Uses `nbstatus.ToProtoFullStatus`, `ConvertToStatusOutputOverview`, and `ParseToFullDetailSummary` to generate status.txt, keeping the bundle’s status output aligned with the CLI `status` command.
- Looks up the active profile via `profilemanager.NewProfileManager().GetActiveProfile()` and passes its name into the overview, which is helpful in multi-profile scenarios and degrades gracefully if the lookup fails.
- Calls `seedFromStatus` after generating the status file, ensuring the anonymizer is seeded from the full status before other anonymized artifacts (config, sync response, state, etc.) are added.

No changes needed here.
client/internal/connect.go (1)
72-75: Plumb logPath through ConnectClient and EngineConfig
Extending `ConnectClient.Run`/`run` to accept `logPath` and threading it into `createEngineConfig` (which in turn sets `EngineConfig.LogPath` and `EngineConfig.ProfileConfig`) cleanly exposes the log location and profile config to the engine and debug layers.
The desktop path now passes a real log path, while the Android/iOS entrypoints correctly pass an empty string, preserving existing behavior on mobile where file-based logs are not used.
Also applies to: 95-96, 113-114, 116-376, 461-500
shared/management/http/api/types.gen.go (1)
1992-2005: WorkloadRequest/WorkloadResponse union helpers are consistent
The `WorkloadRequest` and `WorkloadResponse` wrappers around `json.RawMessage` with discriminator-based helpers (`AsBundle…`, `FromBundle…`, `MergeBundle…`, `Discriminator`, `ValueByDiscriminator`) follow the usual oapi-codegen union pattern:
- `From…` and `Merge…` force `Type` to `"bundle"` before marshalling, so clients can’t accidentally send a mismatched discriminator.
- `ValueByDiscriminator` switches on `"type"` and currently supports the `"bundle"` case with a clear error for unknown types.
- `MarshalJSON`/`UnmarshalJSON` delegate to the underlying `json.RawMessage`, which is idiomatic for this style of union.

This should make it straightforward for API consumers to both construct and inspect workload payloads.
Also applies to: 2168-2284
management/internals/server/boot.go (1)
147-152: Wire JobManager into gRPC management server construction
Updating the `nbgrpc.NewServer` call to pass `s.JobManager()` in the expected position (after `SettingsManager()` and before `SecretsManager()`) matches the new server signature and ensures the gRPC layer has access to the job subsystem for the new `Job` RPC.
This keeps the bootstrapping pattern consistent with other managers.
management/server/account_test.go (1)
2997-3004: LGTM: JobManager wiring correctly integrated into test setup. The test setup properly creates the `peersManager` and passes it along with the store to `job.NewJobManager`. Passing `nil` for the metrics parameter is acceptable in test contexts where telemetry isn't being verified.
1603-1636: LGTM: JobManager properly wired into management server test setup. The test correctly creates the `JobManager` and passes it through both `BuildManager` and `NewServer`, aligning with the new constructor signatures. The integration enables Job-related functionality testing in the engine tests.
163-217: Missing connection state notification in Job stream.Unlike
handleSyncStreamwhich callsc.notifyConnected()after successfully opening the stream (line 320),handleJobStreamonly callsc.notifyDisconnected(err)on errors. If the Job stream should also report connection state for monitoring/status purposes, consider addingnotifyConnected()after the successful handshake.log.Debug("job stream handshake sent successfully") + c.notifyConnected() // Main loop: receive, process, respond
114-161: Well-structured refactoring of streaming logic.The extraction of
withMgmtStreameffectively centralizes connection readiness checks, server key retrieval, and retry backoff. BothSyncandJobnow share this common infrastructure while maintaining their distinct streaming behaviors. This improves maintainability and reduces code duplication.
219-233: Handshake sends minimal JobRequest with only ID.The handshake message contains only a generated UUID with no additional metadata. Verify this aligns with the server's expectations for the Job handshake protocol. If the server expects additional fields (e.g., client capabilities, version), they should be included here.
management/server/http/handlers/peers/peers_handler.go (1)
31-40: New job routes correctly wired to peers handler
The three new routes under `/peers/{peerId}/jobs` cleanly map to their handlers and follow the existing router style (same base path prefix, same `OPTIONS` coverage). No issues from a routing perspective.
400-425: GetAllPeerJobs: access control and scoping look correct
`GetAllPeerJobs` validates `RemoteJobs`/`Read` permission, ensures the peer belongs to the account via `GetAccountIDByPeerID`, and then fetches jobs per `(accountID, peerID)`. That gives you clean account-scoped job listings. No functional issues spotted here.
41-156: Schema constraints are properly synchronized with server-side validation; no action needed
Verified that `BundleParameters` constraints match the server-side validation in management/server/types/job.go:
- bundle_for_time: schema specifies 1–5, server enforces 1–5 ✓
- log_file_count: schema specifies 1–1000, server enforces 1–1000 ✓

The validation is already implemented in the `validateAndBuildBundleParams` function, so there is no risk of 500 errors for values the schema claims are valid.
206-210: Sync-response persistence locking & cloning look correct
The new `syncRespMux` and the changes in `handleSync`, `SetSyncResponsePersistence`, and `GetLatestSyncResponse` avoid races with `latestSyncResponse`/`persistSyncResponse`, and cloning via `proto.Clone` ensures callers can’t mutate internal state. This is a solid concurrency/encapsulation improvement.
management/internals/shared/grpc/server.go (1)
323-361: startResponseReceiver Recv loop is straightforward and non-blocking
The `startResponseReceiver` goroutine cleanly handles EOF/ctx cancellation and logs invalid responses without tearing down the whole stream. Given the bidirectional nature of `Job`, this “one goroutine recv, one goroutine send” pattern is appropriate and doesn’t contend with the handshake (which consumes only the first message before spawning the goroutine).
42-51: Job schema wiring and query constants are consistent
Adding `accountAndPeerIDQueryCondition` and including `&types.Job{}` in `AutoMigrate` aligns the new Job model with existing patterns (account-scoped queries, standard migrations). This should create the required table/indexes without special handling.
Also applies to: 108-117
@pappz, I'll resolve all my existing review comments and start a fresh review of the PR.
✅ Actions performed: Comments resolved. Full review triggered.
Actionable comments posted: 0
🧹 Nitpick comments (2)
client/jobexec/executor.go (2)
57-61: Consider adding retry logic for upload failures. Transient network issues could cause upload failures. A simple retry with backoff would improve reliability.
🔎 Example retry pattern
```diff
+const maxUploadRetries = 3
+
 func (e *Executor) BundleJob(ctx context.Context, debugBundleDependencies debug.GeneratorDependencies, params debug.BundleConfig, waitForDuration time.Duration, mgmURL string) (string, error) {
 	// ... existing code ...
-	key, err := debug.UploadDebugBundle(ctx, types.DefaultBundleURL, mgmURL, path)
-	if err != nil {
-		log.Errorf("failed to upload debug bundle: %v", err)
-		return "", fmt.Errorf("upload debug bundle: %w", err)
-	}
+	var key string
+	var err error
+	for attempt := 1; attempt <= maxUploadRetries; attempt++ {
+		key, err = debug.UploadDebugBundle(ctx, types.DefaultBundleURL, mgmURL, path)
+		if err == nil {
+			break
+		}
+		if ctx.Err() != nil {
+			return "", fmt.Errorf("upload debug bundle: %w", ctx.Err())
+		}
+		log.Warnf("upload attempt %d/%d failed: %v", attempt, maxUploadRetries, err)
+		if attempt < maxUploadRetries {
+			time.Sleep(time.Duration(attempt) * time.Second)
+		}
+	}
+	if err != nil {
+		return "", fmt.Errorf("upload debug bundle after %d attempts: %w", maxUploadRetries, err)
+	}
```
63-63: Clarify log message to reflect upload completion. At this point, both generation and upload succeeded. The message could be more accurate.
🔎 Suggested fix
```diff
-	log.Infof("debug bundle has been generated successfully")
+	log.Infof("debug bundle has been generated and uploaded successfully")
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
client/jobexec/executor.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-13T00:29:53.247Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/cmd/ssh_exec_unix.go:53-74
Timestamp: 2025-11-13T00:29:53.247Z
Learning: In client/ssh/server/executor_unix.go, the method ExecuteWithPrivilegeDrop(ctx context.Context, config ExecutorConfig) has a void return type (no error return). It handles failures by exiting the process directly with appropriate exit codes rather than returning errors to the caller.
Applied to files:
client/jobexec/executor.go
📚 Learning: 2025-11-14T13:05:31.729Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/ssh/server/userswitching_windows.go:89-139
Timestamp: 2025-11-14T13:05:31.729Z
Learning: In client/ssh/server/executor_windows.go, the WindowsExecutorConfig struct's Pty, PtyWidth, and PtyHeight fields are intentionally left unused for now and will be implemented in a future update.
Applied to files:
client/jobexec/executor.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: iOS / Build
- GitHub Check: Android / Build
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
- GitHub Check: Build Cache
- GitHub Check: release
- GitHub Check: FreeBSD Port / Build & Test
- GitHub Check: Linux
- GitHub Check: Windows
- GitHub Check: Darwin
- GitHub Check: Client / Unit
- GitHub Check: JS / Lint
🔇 Additional comments (4)
client/jobexec/executor.go (4)
1-14: LGTM! Package declaration and imports are clean and appropriate for the executor functionality.
24-29: LGTM! The stateless executor pattern is appropriate. The struct can be extended later if state (e.g., retry policies, shared dependencies) is needed.
67-76: LGTM! The `waitFor` function correctly handles context cancellation and returns appropriate errors, addressing the previous review feedback.
20-22: No action required. `ErrJobNotImplemented` is actively used in client/internal/engine.go to indicate unsupported job types during job dispatch.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
shared/management/client/mock.go (1)
79-84: Fix infinite recursion in GetPKCEAuthorizationFlow
`GetPKCEAuthorizationFlow` currently calls itself instead of the function field, leading to infinite recursion (stack overflow) whenever the mock is used for PKCE:

```go
func (m *MockClient) GetPKCEAuthorizationFlow(serverKey wgtypes.Key) (*proto.PKCEAuthorizationFlow, error) {
	if m.GetPKCEAuthorizationFlowFunc == nil {
		return nil, nil
	}
	return m.GetPKCEAuthorizationFlow(serverKey) // recursive call
}
```

This should delegate to `GetPKCEAuthorizationFlowFunc`, like the other mock methods do.
Proposed fix:

```diff
 func (m *MockClient) GetPKCEAuthorizationFlow(serverKey wgtypes.Key) (*proto.PKCEAuthorizationFlow, error) {
 	if m.GetPKCEAuthorizationFlowFunc == nil {
 		return nil, nil
 	}
-	return m.GetPKCEAuthorizationFlow(serverKey)
+	return m.GetPKCEAuthorizationFlowFunc(serverKey)
 }
```

management/internals/shared/grpc/server.go (1)
55-77: Guard Server.Job against nil jobManager
`Server` now carries a `jobManager *job.Manager`, and `Job` unconditionally uses it:

```go
updates := s.jobManager.CreateJobChannel(ctx, accountID, peer.ID)
...
s.startResponseReceiver(ctx, srv)
...
return s.sendJobsLoop(ctx, accountID, peerKey, peer, updates, srv)
```

If `NewServer` is ever called with a nil `jobManager` (e.g., misconfigured wiring or feature disabled), any `Job` RPC will panic. Either:
- Enforce non-nil `jobManager` in `NewServer` (panic or return error when nil), or
- Treat a nil `jobManager` as “feature disabled” and have `Job` return `codes.Unimplemented` up front (see the sketch after this comment).

This will make the behavior explicit and avoid runtime panics.
♻️ Duplicate comments (10)
management/server/types/job.go (1)
164-201: JSON shape mismatch between proto.BundleResult and api.BundleResult. The issue flagged in the past review remains unresolved. When serializing `r.Bundle` (type `*proto.BundleResult`) directly with `json.Marshal` at line 194, the protobuf `UploadKey` field (which appears to be a protobuf wrapper type) will produce a different JSON structure than expected by `api.BundleResult` (which has `UploadKey *string`).
This will cause `json.Unmarshal` in `buildBundleResponse` (line 126) to fail when trying to deserialize succeeded jobs.
Consider mapping the proto result to the API result type before marshaling:
🔎 Recommended fix
```diff
 case *proto.JobResponse_Bundle:
-	if j.Workload.Result, err = json.Marshal(r.Bundle); err != nil {
-		return fmt.Errorf("failed to marshal workload results: %w", err)
-	}
+	apiResult := api.BundleResult{}
+	if r.Bundle != nil && r.Bundle.UploadKey != nil {
+		key := r.Bundle.UploadKey.GetValue()
+		apiResult.UploadKey = &key
+	}
+	if j.Workload.Result, err = json.Marshal(apiResult); err != nil {
+		return fmt.Errorf("failed to marshal workload results: %w", err)
+	}
```

management/server/store/sql_store.go (2)
195-224: GetPeerJobByID still ignores peer ownership (matches prior review concern)
GetPeerJobByIDonly filters onaccount_idandid, so any caller with an account-level context can read any peer’s job, regardless of which peer it belongs to. Earlier review feedback suggested verifying the requesting peer as well.If this method is used from a peer-authenticated path (not just admin/management), consider either:
- Extending the signature to
GetPeerJobByID(ctx, accountID, peerID, jobID)and usingaccountAndPeerIDQueryCondition+" AND id = ?"; or- Restricting this method’s usage to privileged management flows and introducing a peer-scoped variant for peer access.
4197-4215: Handle “peer not found” explicitly in GetPeerIDByKey (prior feedback still applies)
`GetPeerIDByKey` uses `.Scan(&peerID)` and returns `peerID` without checking whether any row matched. On a missing peer, GORM leaves `peerID` empty and `result.Error == nil`, so callers see `("", nil)` and can easily misinterpret that as “found a peer with empty ID”.
To match the patterns used by other getters here (`GetPeerIdByLabel`, `GetPeerByPeerPubKey`, etc.), consider:
- Treating `peerID == ""` (or `result.RowsAffected == 0`) as “not found”, and
- Returning a `NotFound` error (e.g. `status.NewPeerNotFoundError(key)` or similar); a sketch follows.

This avoids silent misses and makes error handling clearer at the call site.
client/internal/engine.go (1)
1036-1069: Guard against nil ProfileConfig / ManagementURL in handleBundle (same concern as prior review)
`handleBundle` currently does:

```go
uploadKey, err := e.jobExecutor.BundleJob(
	e.ctx,
	bundleDeps,
	bundleJobParams,
	waitFor,
	e.config.ProfileConfig.ManagementURL.String(),
)
```

If `e.config.ProfileConfig` or `ProfileConfig.ManagementURL` is nil (e.g. certain test setups or non-profile-based runs), this will panic when a remote bundle job arrives.
Consider adding a defensive check and returning a failed job response instead of crashing the client, e.g.:

```go
if e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
	return nil, fmt.Errorf("profile config or management URL not available for bundle generation")
}
mgmtURL := e.config.ProfileConfig.ManagementURL.String()
uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, mgmtURL)
```
47-65: Race condition: Store operation before lock acquisition.The call to
MarkAllPendingJobsAsFailedat lines 50-52 runs before acquiring the lock at line 54. This creates a race with methods likeSendJoborHandleResponsethat may concurrently modify thependingmap or peer state. The store operation and in-memory cleanup should be coordinated under the same lock.🔎 Suggested fix
func (jm *Manager) CreateJobChannel(ctx context.Context, accountID, peerID string) *Channel { + jm.mu.Lock() + defer jm.mu.Unlock() + // all pending jobs stored in db for this peer should be failed if err := jm.Store.MarkAllPendingJobsAsFailed(ctx, accountID, peerID, "Pending job cleanup: marked as failed automatically due to being stuck too long"); err != nil { log.WithContext(ctx).Error(err.Error()) } - jm.mu.Lock() - defer jm.mu.Unlock() - if ch, ok := jm.jobChannels[peerID]; ok { ch.Close() delete(jm.jobChannels, peerID) } ch := NewChannel() jm.jobChannels[peerID] = ch return ch }
67-91: Race condition: Channel reference may become stale after releasing read lock.After releasing the read lock at line 71, another goroutine can call
CloseChannel, which closes and deletes the channel from the map. Thechreference then points to a closed channel, and callingAddEventat line 85 will panic with "send on closed channel".This can occur when:
SendJobacquireschand releases the read lock- The gRPC stream ends, triggering
CloseChannelfrom another goroutineSendJobcallsch.AddEventon the now-closed channel🔎 Suggested fix: hold write lock across the operation
func (jm *Manager) SendJob(ctx context.Context, accountID, peerID string, req *proto.JobRequest) error { - jm.mu.RLock() + jm.mu.Lock() + defer jm.mu.Unlock() + ch, ok := jm.jobChannels[peerID] - jm.mu.RUnlock() if !ok { return fmt.Errorf("peer %s has no channel", peerID) } event := &Event{ PeerID: peerID, Request: req, } - jm.mu.Lock() jm.pending[string(req.ID)] = event - jm.mu.Unlock() if err := ch.AddEvent(ctx, jm.responseWait, event); err != nil { jm.cleanup(ctx, accountID, string(req.ID), err.Error()) return err } return nil }Note: If holding the lock during
AddEvent(which has a 5-minute timeout) is unacceptable, consider a different synchronization strategy such as checking channel validity after re-acquiring the lock or using a channel-level closed flag.management/server/job/channel.go (1)
18-21: Channel.AddEvent can panic on send after Close – guard against closed channel
Close()closesjc.events, butAddEventcan still executejc.events <- eventconcurrently, which will panic if the channel has just been closed.sync.Onceprevents double-close, not send-on-closed.Consider adding internal closed state with locking, or otherwise guaranteeing that:
AddEventreturnsErrJobChannelClosedwithout attempting a send when the channel is closed, andClosecannot close the channel while anAddEventis in the middle of a blocking send.For example, introduce a mutex +
closedflag onChannel, hold a read-lock around the send, and setclosedunder a write-lock before closing the channel, so sends cannot proceed after closure.You may also want to move the timeout responsibility out of
AddEvent(as per the TODO) into the job lifecycle manager, so enqueueing isn’t rejected solely because the buffer is full while a peer is slow or unresponsive.Also applies to: 31-41, 43-47
management/server/peer.go (2)
325-398: Peer job creation: guard against nil jobManager and consider send-vs-persist ordering
A couple of points in `CreatePeerJob`:
Nil jobManager will panic
`am.jobManager` is used unconditionally:

```go
if !am.jobManager.IsPeerConnected(peerID) { ... }
if am.jobManager.IsPeerHasPendingJobs(peerID) { ... }
if err := am.jobManager.SendJob(...); err != nil { ... }
```

If `DefaultAccountManager` is constructed with a nil `jobManager` (e.g., in older tests or miswired setups), this will panic the server on first job creation. Either:
- Enforce a non-nil `jobManager` at construction and in all call sites, or
- Add a defensive check at the top of `CreatePeerJob` returning an internal error or `codes.Unimplemented` when jobs are disabled.
- Check for existing pending jobs.
- Convert to stream request and call
jobManager.SendJob(...).- Only then create the job in the store inside a transaction.
If the DB write fails, the peer may already be executing a job that doesn’t exist in persistent storage or APIs. Consider persisting the job first (status = pending), then sending it to the peer, and, if sending fails, updating the job status to failed with an appropriate reason. This keeps runtime and storage in sync even on failures.
400-451: Ensure job ownership is enforced in GetPeerJobByID
`GetPeerJobByID` validates that the peer belongs to the account, but doesn’t ensure that the returned job actually belongs to the requested peer:

```go
job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID)
if err != nil {
	return nil, err
}
return job, nil
```

A caller can request `/peers/{peerA}/jobs/{jobOfPeerB}` (same account) and still see `jobOfPeerB`. After loading the job, you should verify that `job.PeerID == peerID` and return a suitable error (e.g. `status.NewPeerNotPartOfAccountError()`) otherwise.
This is important to keep the per-peer job API consistent with the URL path and to avoid leaking jobs between peers within an account.
management/internals/shared/grpc/server.go (1)
363-385: Don’t swallow sendJob errors in sendJobsLoop
In `sendJobsLoop`, if `sendJob` fails you only log and return `nil`:

```go
if err := s.sendJob(ctx, peerKey, event, srv); err != nil {
	log.WithContext(ctx).Warnf("send job failed: %v", err)
	return nil
}
```

This causes the `Job` RPC to complete with an OK status even when encryption, key retrieval, or `Send` fails, making it impossible for clients or observability to distinguish success from failure.
Consider:
- Propagating the error (possibly via `status.Errorf(codes.Internal, ...)`) so the RPC status reflects job delivery failure; see the sketch after this comment.
- Optionally marking the job as failed in the job manager/store when `sendJob` returns an error.

This will give callers accurate feedback and improve debuggability of job delivery issues.
Also applies to: 445-468
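A small sketch of mapping send failures onto gRPC status codes, as suggested above. The sentinel errors and their classification are illustrative, not the real failure modes of `sendJob`:

```go
package grpcserver

import (
	"errors"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// Illustrative sentinel errors a sendJob implementation might return.
var (
	errEncrypt   = errors.New("encrypt job payload")
	errTransport = errors.New("stream send")
)

// toGRPCStatus maps internal send failures onto gRPC codes so the Job RPC
// terminates with a meaningful status instead of OK.
func toGRPCStatus(err error) error {
	switch {
	case err == nil:
		return nil
	case errors.Is(err, errTransport):
		return status.Errorf(codes.Unavailable, "job delivery failed: %v", err)
	case errors.Is(err, errEncrypt):
		return status.Errorf(codes.Internal, "job encryption failed: %v", err)
	default:
		return status.Errorf(codes.Internal, "job send failed: %v", err)
	}
}
```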
🧹 Nitpick comments (12)
client/internal/debug/upload.go (1)
61-64: Consider limiting error response body size. The code reads the entire error response body without a size limit. While this is in error paths and less critical, a malicious or misconfigured server could return very large bodies that consume excessive memory.
🔎 Optional: limit error body reads

```diff
 if putResp.StatusCode != http.StatusOK {
-	body, _ := io.ReadAll(putResp.Body)
+	body, _ := io.ReadAll(io.LimitReader(putResp.Body, 1024*1024)) // 1 MB limit
 	return fmt.Errorf("upload status %d: %s", putResp.StatusCode, string(body))
 }
```

Apply similar change at line 84 for the GET request error handling.
Also applies to: 83-86
management/server/account_test.go (1)
37-37: JobManager wiring into tests looks correct; consider reusing peersManager instance
The new `job` import and `job.NewJobManager(nil, store, peersManager)` usage in `createManager` align with the updated `BuildManager` signature and keep tests close to production wiring. One minor improvement would be to pass the existing `peersManager` into `ephemeral_manager.NewEphemeralManager` instead of constructing a second `peers.NewManager(store, permissionsManager)` instance, to avoid duplicate managers over the same store in this test setup.
Also applies to: 2996-3005
management/server/route_test.go (1)
24-24: Router tests now wire JobManager consistently with account tests
Using `peersManager := peers.NewManager(store, permissionsManager)` and passing `job.NewJobManager(nil, store, peersManager)` into `BuildManager` aligns route tests with the new job-aware account manager constructor and keeps test wiring close to production. As a minor clean-up, you could reuse `peersManager` when creating the ephemeral manager instead of instantiating a second peers manager, but this is non-blocking for tests.
Also applies to: 1292-1301
client/internal/engine_test.go (1)
28-28: JobManager wiring in test management server looks good; reuse `metrics` instead of `nil`
The new `job.Manager` integration in `startManagement` is consistent with the extended `BuildManager` and `nbgrpc.NewServer` signatures. To keep metrics usage consistent, you can construct the JobManager with the real `metrics` instance instead of `nil`.
Proposed tweak to construct JobManager with metrics:
```diff
-	permissionsManager := permissions.NewManager(store)
-	peersManager := peers.NewManager(store, permissionsManager)
-	jobManager := job.NewJobManager(nil, store, peersManager)
-
-	ia, _ := integrations.NewIntegratedValidator(context.Background(), peersManager, nil, eventStore)
-
-	metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
+	permissionsManager := permissions.NewManager(store)
+	peersManager := peers.NewManager(store, permissionsManager)
+
+	ia, _ := integrations.NewIntegratedValidator(context.Background(), peersManager, nil, eventStore)
+
+	metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
 	require.NoError(t, err)
+
+	jobManager := job.NewJobManager(metrics, store, peersManager)
```

Also applies to: 1601-1638
management/server/nameserver_test.go (1)
21-21: Use the existing `metrics` when constructing `JobManager` in tests
The new JobManager wiring in `createNSManager` is correct. Since you already have a `metrics` instance, you can pass it into `job.NewJobManager` instead of `nil` to keep instrumentation consistent and avoid having a partially wired JobManager in tests.
Proposed change:
```diff
 permissionsManager := permissions.NewManager(store)
 peersManager := peers.NewManager(store, permissionsManager)
@@
-	return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(nil, store, peersManager), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+	return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(metrics, store, peersManager), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
```

Also applies to: 793-802
management/server/peer_test.go (1)
37-38: JobManager wiring across peer tests is correct; consider reusing `metrics` and sharing setup

The updated tests correctly construct a `peersManager` and pass a concrete `JobManager` into `BuildManager`, which is required after the signature change. Two small cleanups would make this more consistent and DRY:

- Pass `metrics` into `job.NewJobManager` instead of `nil` in each test, since you already create `metrics` and JobManager supports it.
- Factor a small helper that builds `(accountManager, peersManager, jobManager)` so `Test_RegisterPeerByUser`, `Test_RegisterPeerBySetupKey`, `Test_RegisterPeerRollbackOnFailure`, and `Test_LoginPeer` don't repeat the same wiring.

Illustrative change for one test (`Test_RegisterPeerByUser`)

```diff
-	permissionsManager := permissions.NewManager(s)
-	peersManager := peers.NewManager(s, permissionsManager)
-
-	ctx := context.Background()
-	updateManager := update_channel.NewPeersUpdateManager(metrics)
+	permissionsManager := permissions.NewManager(s)
+	peersManager := peers.NewManager(s, permissionsManager)
+
+	ctx := context.Background()
+	updateManager := update_channel.NewPeersUpdateManager(metrics)
@@
-	networkMapController := controller.NewController(ctx, s, metrics, updateManager, requestBuffer, MockIntegratedValidator{}, settingsMockManager, "netbird.cloud", port_forwarding.NewControllerMock(), ephemeral_manager.NewEphemeralManager(s, peers.NewManager(s, permissionsManager)), &config.Config{})
-
-	am, err := BuildManager(context.Background(), nil, s, networkMapController, job.NewJobManager(nil, s, peersManager), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+	networkMapController := controller.NewController(ctx, s, metrics, updateManager, requestBuffer, MockIntegratedValidator{}, settingsMockManager, "netbird.cloud", port_forwarding.NewControllerMock(), ephemeral_manager.NewEphemeralManager(s, peers.NewManager(s, permissionsManager)), &config.Config{})
+
+	jobManager := job.NewJobManager(metrics, s, peersManager)
+	am, err := BuildManager(context.Background(), nil, s, networkMapController, jobManager, nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
```

You can apply the same pattern to the other three tests or extract a shared helper that returns `am`.

Also applies to: 1289-1301, 1370-1387, 1528-1541, 1605-1622
shared/management/proto/management.proto (1)
52-54: Job RPC and job messages are well-shaped; consider using `Duration` and `string` for clarity

The new `Job` streaming RPC and its associated messages (`JobRequest`, `JobResponse`, `BundleParameters`, `BundleResult`) look structurally sound and fit the existing pattern of wrapping concrete payloads inside `EncryptedMessage`.

Two non-blocking API tweaks that could improve clarity:

- Model job duration explicitly

  You already import `google.protobuf.Duration` and use it elsewhere. If `bundle_for_time` represents a duration, using `Duration` makes units explicit and avoids integer-unit confusion:

```diff
 message BundleParameters {
   bool bundle_for = 1;
-  int64 bundle_for_time = 2;
+  google.protobuf.Duration bundle_for_time = 2;
   int32 log_file_count = 3;
   bool anonymize = 4;
 }
```

- Use `string` for human-readable reasons

  If `JobResponse.Reason` is meant to carry a textual failure/success reason rather than opaque binary data, a `string` field is more idiomatic and avoids encoding ambiguity:

```diff
 message JobResponse {
   bytes ID = 1;
   JobStatus status = 2;
-  bytes Reason = 3;
+  string reason = 3;
   oneof workload_results {
     BundleResult bundle = 10;
   }
 }
```

If you deliberately intend to carry serialized or encrypted blobs in `Reason`, keeping it as `bytes` is fine, but consider a more explicit name (e.g., `reason_blob`) to signal that.

Also applies to: 66-101
client/cmd/testutil_test.go (1)
21-21: Test management server correctly wires JobManager; reuse `metrics` instead of `nil`

The updated `startManagement` helper now correctly creates a `peers` manager, a `jobManager`, and threads `jobManager` into both `BuildManager` and `nbgrpc.NewServer`, keeping this test utility aligned with the new APIs.

To keep metrics usage consistent with the rest of the setup, you can pass the existing `metrics` into `job.NewJobManager` instead of `nil`:

Proposed adjustment

```diff
-	permissionsManagerMock := permissions.NewMockManager(ctrl)
-	peersmanager := peers.NewManager(store, permissionsManagerMock)
-	settingsManagerMock := settings.NewMockManager(ctrl)
-
-	jobManager := job.NewJobManager(nil, store, peersmanager)
+	permissionsManagerMock := permissions.NewMockManager(ctrl)
+	peersmanager := peers.NewManager(store, permissionsManagerMock)
+	settingsManagerMock := settings.NewMockManager(ctrl)
@@
-	accountManager, err := mgmt.BuildManager(context.Background(), config, store, networkMapController, jobManager, nil, "", eventStore, nil, false, iv, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManagerMock, false)
+	jobManager := job.NewJobManager(metrics, store, peersmanager)
+
+	accountManager, err := mgmt.BuildManager(context.Background(), config, store, networkMapController, jobManager, nil, "", eventStore, nil, false, iv, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManagerMock, false)
@@
-	mgmtServer, err := nbgrpc.NewServer(config, accountManager, settingsMockManager, jobManager, secretsManager, nil, nil, &mgmt.MockIntegratedValidator{}, networkMapController)
+	mgmtServer, err := nbgrpc.NewServer(config, accountManager, settingsMockManager, jobManager, secretsManager, nil, nil, &mgmt.MockIntegratedValidator{}, networkMapController)
```

Also applies to: 100-105, 124-125, 133-134
management/server/http/handlers/peers/peers_handler.go (1)
31-41: Job handlers are correctly wired but consider aligning create response status with OpenAPI

Auth extraction, peer scoping, and delegation to `types.NewJob` and `accountManager.{CreatePeerJob,GetAllPeerJobs,GetPeerJobByID}` are consistent with existing patterns, and error handling via `util.WriteError`/`WriteErrorResponse` looks fine.

One divergence: `CreateJob` uses `util.WriteJSONObject`, which always returns HTTP 200, while the OpenAPI spec declares `201` for successful creates. Consider either:

- Switching to a custom encoder here that sets `w.WriteHeader(http.StatusCreated)`, or
- Adjusting the spec to 200, if you prefer that convention.
Also applies to: 51-142
shared/management/http/api/openapi.yml (1)
2410-2513: Job paths look correct; consider tightening response contract

The new peer-scoped job endpoints are consistent with existing peer routes in terms of auth, path params, and payload types.

Two contract nits:

- `POST /api/peers/{peerId}/jobs` documents `201` but the handler currently responds with `200` via `WriteJSONObject`; either adjust the handler or change the spec to 200.
- For `GET` operations, you may want to add a documented `404` response (reusing `components/responses/not_found`) for missing peers/jobs, as other resources often do.

client/internal/engine.go (1)
224-226: Job stream integration is sound, but not tracked in `shutdownWg`

Wiring `jobExecutor` via `NewEngine` and starting the `mgmClient.Job` loop in `receiveJobEvents()` with its own `jobExecutorWG` is reasonable: `Stop()` waits on `jobExecutorWG` before tearing down the engine, so job goroutines won't outlive the context or DB clients.

If you want full symmetry with other long-running loops, you could also account for this goroutine in `shutdownWg`; functionally it's already safely terminated, so this is just an observability/consistency tweak.

Also applies to: 239-271, 997-1034
management/server/job/manager.go (1)
172-182: Consider using an index for peer-to-jobs lookup.
`IsPeerHasPendingJobs` iterates over all pending jobs to find any belonging to the given peer. If the pending map grows large, this O(n) scan could become a bottleneck. Consider maintaining a secondary index like `peerJobs map[string]map[string]struct{}` (peerID → set of jobIDs) for O(1) lookup, as sketched below.
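A minimal sketch of such an index, assuming the manager already serializes access with a mutex; the type, field, and method names here are illustrative, not the actual implementation:

```go
package job

import "sync"

type indexedManager struct {
	mu       sync.Mutex
	pending  map[string]struct{}            // jobID -> pending marker
	peerJobs map[string]map[string]struct{} // peerID -> set of pending jobIDs
}

func (m *indexedManager) addPending(peerID, jobID string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.pending[jobID] = struct{}{}
	if m.peerJobs[peerID] == nil {
		m.peerJobs[peerID] = make(map[string]struct{})
	}
	m.peerJobs[peerID][jobID] = struct{}{}
}

func (m *indexedManager) removePending(peerID, jobID string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.pending, jobID)
	if jobs := m.peerJobs[peerID]; jobs != nil {
		delete(jobs, jobID)
		if len(jobs) == 0 {
			delete(m.peerJobs, peerID) // keep the index compact
		}
	}
}

// peerHasPendingJobs becomes an O(1) map lookup instead of a full scan.
func (m *indexedManager) peerHasPendingJobs(peerID string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.peerJobs[peerID]) > 0
}
```

The index must be updated in the same critical sections that mutate the pending map, otherwise the two structures can drift apart.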
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
- client/proto/daemon.pb.go is excluded by !**/*.pb.go
- go.sum is excluded by !**/*.sum
- shared/management/proto/management.pb.go is excluded by !**/*.pb.go
- shared/management/proto/management_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (54)
- client/cmd/debug.go
- client/cmd/status.go
- client/cmd/testutil_test.go
- client/cmd/up.go
- client/embed/embed.go
- client/internal/connect.go
- client/internal/debug/debug.go
- client/internal/debug/upload.go
- client/internal/debug/upload_test.go
- client/internal/engine.go
- client/internal/engine_test.go
- client/jobexec/executor.go
- client/proto/daemon.proto
- client/server/debug.go
- client/server/server.go
- client/server/server_test.go
- client/status/status.go
- client/status/status_test.go
- client/ui/debug.go
- go.mod
- management/internals/modules/peers/manager.go
- management/internals/modules/peers/manager_mock.go
- management/internals/server/boot.go
- management/internals/server/controllers.go
- management/internals/server/modules.go
- management/internals/shared/grpc/server.go
- management/server/account.go
- management/server/account/manager.go
- management/server/account_test.go
- management/server/activity/codes.go
- management/server/dns_test.go
- management/server/http/handlers/peers/peers_handler.go
- management/server/http/testing/testing_tools/channel/channel.go
- management/server/job/channel.go
- management/server/job/manager.go
- management/server/management_proto_test.go
- management/server/management_test.go
- management/server/mock_server/account_mock.go
- management/server/nameserver_test.go
- management/server/peer.go
- management/server/peer_test.go
- management/server/permissions/modules/module.go
- management/server/route_test.go
- management/server/store/sql_store.go
- management/server/store/store.go
- management/server/types/job.go
- shared/management/client/client.go
- shared/management/client/client_test.go
- shared/management/client/grpc.go
- shared/management/client/mock.go
- shared/management/http/api/generate.sh
- shared/management/http/api/openapi.yml
- shared/management/http/api/types.gen.go
- shared/management/proto/management.proto
💤 Files with no reviewable changes (1)
- client/proto/daemon.proto
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-13T00:29:53.247Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/cmd/ssh_exec_unix.go:53-74
Timestamp: 2025-11-13T00:29:53.247Z
Learning: In client/ssh/server/executor_unix.go, the method ExecuteWithPrivilegeDrop(ctx context.Context, config ExecutorConfig) has a void return type (no error return). It handles failures by exiting the process directly with appropriate exit codes rather than returning errors to the caller.
Applied to files:
- client/embed/embed.go
- client/jobexec/executor.go
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.
Applied to files:
shared/management/client/client_test.go
🧬 Code graph analysis (31)
management/internals/server/controllers.go (2)
- management/server/job/manager.go (2): Manager (24-32), NewJobManager (34-45)
- management/internals/server/container.go (1): Create (6-10)

client/server/server_test.go (3)
- management/server/job/manager.go (1): NewJobManager (34-45)
- management/server/account.go (1): BuildManager (178-266)
- management/internals/shared/grpc/server.go (1): NewServer (80-132)

shared/management/client/client.go (3)
- management/server/types/job.go (1): Job (34-58)
- shared/management/http/api/types.gen.go (2): JobRequest (714-716), JobResponse (719-727)
- shared/management/proto/management.pb.go (6): JobRequest (388-398), JobRequest (413-413), JobRequest (428-430), JobResponse (463-475), JobResponse (490-490), JobResponse (505-507)

client/cmd/testutil_test.go (3)
- management/server/job/manager.go (1): NewJobManager (34-45)
- management/server/account.go (1): BuildManager (178-266)
- management/internals/shared/grpc/server.go (1): NewServer (80-132)

management/server/route_test.go (3)
- management/internals/modules/peers/manager.go (1): NewManager (44-49)
- management/server/account.go (1): BuildManager (178-266)
- management/server/job/manager.go (1): NewJobManager (34-45)

management/server/management_proto_test.go (1)
- management/server/job/manager.go (1): NewJobManager (34-45)

management/server/http/testing/testing_tools/channel/channel.go (3)
- management/server/job/manager.go (1): NewJobManager (34-45)
- management/internals/modules/peers/ephemeral/manager/ephemeral.go (1): NewEphemeralManager (53-61)
- management/server/account.go (1): BuildManager (178-266)

management/server/types/job.go (2)
- shared/management/proto/management.pb.go (20): JobStatus (25-25), JobStatus (57-59), JobStatus (61-63), JobStatus (70-72), JobRequest (388-398), JobRequest (413-413), JobRequest (428-430), BundleParameters (554-563), BundleParameters (578-578), BundleParameters (593-595), BundleResult (625-631), BundleResult (646-646), BundleResult (661-663), JobResponse (463-475), JobResponse (490-490), JobResponse (505-507), JobResponse_Bundle (548-550), JobResponse_Bundle (552-552), JobRequest_Bundle (457-459), JobRequest_Bundle (461-461)
- shared/management/http/api/types.gen.go (8): JobRequest (714-716), WorkloadResponse (1998-2000), BundleParameters (367-379), BundleResult (382-384), BundleWorkloadResponse (397-405), WorkloadTypeBundle (198-198), WorkloadRequest (1993-1995), JobResponse (719-727)

client/internal/debug/upload.go (1)
- upload-server/types/upload.go (3): GetURLResponse (15-18), ClientHeader (5-5), ClientHeaderValue (7-7)

client/cmd/status.go (1)
- client/status/status.go (1): ConvertToStatusOutputOverview (122-169)

management/server/mock_server/account_mock.go (1)
- management/server/types/job.go (1): Job (34-58)

shared/management/client/grpc.go (3)
- shared/management/proto/management.pb.go (6): EncryptedMessage (322-333), EncryptedMessage (348-348), EncryptedMessage (363-365), SyncResponse (721-738), SyncResponse (753-753), SyncResponse (768-770)
- shared/management/proto/management_grpc.pb.go (2): ManagementService_JobClient (169-173), ManagementService_SyncClient (89-92)
- encryption/message.go (2): EncryptMessage (10-24), DecryptMessage (27-40)

client/cmd/up.go (1)
- util/log.go (1): FindFirstLogPath (77-84)

management/server/store/store.go (1)
- management/server/types/job.go (1): Job (34-58)

management/internals/shared/grpc/server.go (4)
- management/server/job/manager.go (2): Manager (24-32), Event (18-22)
- shared/management/proto/management.pb.go (3): EncryptedMessage (322-333), EncryptedMessage (348-348), EncryptedMessage (363-365)
- management/server/job/channel.go (2): Channel (18-21), ErrJobChannelClosed (15-15)
- encryption/message.go (1): EncryptMessage (10-24)

client/ui/debug.go (1)
- shared/management/status/error.go (1): Error (54-57)

management/internals/server/boot.go (1)
- management/internals/shared/grpc/server.go (1): NewServer (80-132)

management/server/nameserver_test.go (3)
- management/internals/modules/peers/manager.go (1): NewManager (44-49)
- management/server/account.go (1): BuildManager (178-266)
- management/server/job/manager.go (1): NewJobManager (34-45)

shared/management/client/client_test.go (4)
- management/server/job/manager.go (1): NewJobManager (34-45)
- management/internals/controllers/network_map/controller/controller.go (1): NewController (75-111)
- management/internals/modules/peers/ephemeral/manager/ephemeral.go (1): NewEphemeralManager (53-61)
- management/server/account.go (1): BuildManager (178-266)

client/internal/engine.go (5)
- client/internal/profilemanager/config.go (1): Config (90-161)
- client/jobexec/executor.go (2): Executor (24-25), NewExecutor (27-29)
- shared/management/proto/management.pb.go (19): JobRequest (388-398), JobRequest (413-413), JobRequest (428-430), JobResponse (463-475), JobResponse (490-490), JobResponse (505-507), JobRequest_Bundle (457-459), JobRequest_Bundle (461-461), BundleParameters (554-563), BundleParameters (578-578), BundleParameters (593-595), JobResponse_Bundle (548-550), JobResponse_Bundle (552-552), SyncResponse (721-738), SyncResponse (753-753), SyncResponse (768-770), BundleResult (625-631), BundleResult (646-646), BundleResult (661-663)
- shared/management/status/error.go (2): Errorf (70-75), Error (54-57)
- client/internal/debug/debug.go (2): GeneratorDependencies (243-248), BundleConfig (237-241)

management/server/peer_test.go (3)
- management/server/permissions/manager.go (1): NewManager (33-37)
- management/server/account.go (1): BuildManager (178-266)
- management/server/job/manager.go (1): NewJobManager (34-45)

management/server/account/manager.go (1)
- management/server/types/job.go (1): Job (34-58)

client/internal/connect.go (1)
- client/internal/engine.go (1): EngineConfig (84-143)

client/internal/debug/upload_test.go (2)
- client/internal/debug/upload.go (1): UploadDebugBundle (17-28)
- upload-server/types/upload.go (1): GetURLPath (9-9)

management/server/dns_test.go (2)
- management/server/account.go (1): BuildManager (178-266)
- management/server/job/manager.go (1): NewJobManager (34-45)

client/status/status_test.go (1)
- client/status/status.go (1): ConvertToStatusOutputOverview (122-169)

management/server/http/handlers/peers/peers_handler.go (4)
- management/server/context/auth.go (1): GetUserAuthFromContext (25-30)
- shared/management/http/util/util.go (3): WriteError (84-120), WriteErrorResponse (70-80), WriteJSONObject (27-35)
- shared/management/http/api/types.gen.go (3): JobRequest (714-716), JobResponse (719-727), JobResponseStatus (730-730)
- management/server/types/job.go (3): NewJob (67-103), Job (34-58), Workload (60-64)

management/server/store/sql_store.go (2)
- management/server/types/job.go (3): Job (34-58), JobStatusPending (18-18), JobStatusFailed (20-20)
- management/server/store/store.go (2): LockingStrength (40-40), LockingStrengthNone (47-47)

client/server/server.go (1)
- client/status/status.go (1): ToProtoFullStatus (549-635)

management/server/job/manager.go (2)
- management/server/job/channel.go (2): Channel (18-21), NewChannel (23-29)
- management/server/types/job.go (1): Job (34-58)

management/internals/server/modules.go (1)
- management/server/account.go (1): BuildManager (178-266)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
- GitHub Check: Management / Unit (amd64, mysql)
- GitHub Check: Relay / Unit (amd64, -race)
- GitHub Check: Management / Benchmark (API) (amd64, sqlite)
- GitHub Check: Management / Benchmark (API) (amd64, postgres)
- GitHub Check: Client / Unit (386)
- GitHub Check: Management / Benchmark (amd64, sqlite)
- GitHub Check: Management / Unit (amd64, sqlite)
- GitHub Check: Client / Unit (amd64)
- GitHub Check: Management / Unit (amd64, postgres)
- GitHub Check: Management / Benchmark (amd64, postgres)
- GitHub Check: Client (Docker) / Unit
- GitHub Check: Android / Build
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
- GitHub Check: release
- GitHub Check: Windows
- GitHub Check: Client / Unit
```go
req, err := http.NewRequestWithContext(ctx, "PUT", response.URL, fileData)
if err != nil {
	return fmt.Errorf("create PUT request: %w", err)
}

req.ContentLength = stat.Size()
req.Header.Set("Content-Type", "application/octet-stream")

putResp, err := http.DefaultClient.Do(req)
if err != nil {
	return fmt.Errorf("upload failed: %v", err)
}
defer putResp.Body.Close()
```
Configure HTTP client timeout to prevent indefinite hangs.
The http.DefaultClient has no timeout configured. If the upload service is unresponsive, the request could hang indefinitely, blocking the debug bundle generation.
🔎 Recommended fix: use context-aware client with timeout
```diff
-	req, err := http.NewRequestWithContext(ctx, "PUT", response.URL, fileData)
-	if err != nil {
-		return fmt.Errorf("create PUT request: %w", err)
-	}
-
-	req.ContentLength = stat.Size()
-	req.Header.Set("Content-Type", "application/octet-stream")
-
-	putResp, err := http.DefaultClient.Do(req)
+	req, err := http.NewRequestWithContext(ctx, "PUT", response.URL, fileData)
+	if err != nil {
+		return fmt.Errorf("create PUT request: %w", err)
+	}
+
+	req.ContentLength = stat.Size()
+	req.Header.Set("Content-Type", "application/octet-stream")
+
+	client := &http.Client{Timeout: 5 * time.Minute}
+	putResp, err := client.Do(req)
```

Note: A 5-minute timeout is reasonable for a 50 MB upload over typical network conditions. Adjust based on your expected network performance.
🤖 Prompt for AI Agents
In client/internal/debug/upload.go around lines 47 to 59, the code uses
http.DefaultClient which has no timeout and can hang indefinitely; replace use
of http.DefaultClient.Do with a dedicated http.Client configured with a
reasonable Timeout (e.g., 5 * time.Minute) and use the existing ctx on the
request so the request is cancellable; instantiate the client locally (or reuse
a package-level client) with Timeout set, call client.Do(req), and keep the
existing defer putResp.Body.Close() handling.
```go
getReq, err := http.NewRequestWithContext(ctx, "GET", url+"?id="+id, nil)
if err != nil {
	return nil, fmt.Errorf("create GET request: %w", err)
}

getReq.Header.Set(types.ClientHeader, types.ClientHeaderValue)

resp, err := http.DefaultClient.Do(getReq)
if err != nil {
	return nil, fmt.Errorf("get presigned URL: %w", err)
}
defer resp.Body.Close()
```
Configure HTTP client timeout for presigned URL request.
Similar to the upload request, the http.DefaultClient has no timeout configured for fetching the presigned URL. This could hang indefinitely if the service is unresponsive.
🔎 Recommended fix: use client with timeout
```diff
 getReq, err := http.NewRequestWithContext(ctx, "GET", url+"?id="+id, nil)
 if err != nil {
 	return nil, fmt.Errorf("create GET request: %w", err)
 }
 getReq.Header.Set(types.ClientHeader, types.ClientHeaderValue)
-resp, err := http.DefaultClient.Do(getReq)
+client := &http.Client{Timeout: 30 * time.Second}
+resp, err := client.Do(getReq)
```

Note: A 30-second timeout is reasonable for fetching a presigned URL.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
getReq, err := http.NewRequestWithContext(ctx, "GET", url+"?id="+id, nil)
if err != nil {
	return nil, fmt.Errorf("create GET request: %w", err)
}

getReq.Header.Set(types.ClientHeader, types.ClientHeaderValue)

client := &http.Client{Timeout: 30 * time.Second}
resp, err := client.Do(getReq)
if err != nil {
	return nil, fmt.Errorf("get presigned URL: %w", err)
}
defer resp.Body.Close()
```
🤖 Prompt for AI Agents
In client/internal/debug/upload.go around lines 70 to 81, the code uses
http.DefaultClient to fetch the presigned URL which has no timeout and can hang;
replace that call with a dedicated http.Client configured with a reasonable
timeout (e.g., 30s) and use that client's Do method (or reuse the existing
upload client if available) to perform the request, propagating context as
before and keeping the same header and error handling.
```go
AccountAutoUpdateVersionUpdated: {"Account AutoUpdate Version updated", "account.settings.auto.version.update"},

JobCreatedByUser: {"Create Job for peer", "peer.job.create"},
```
Use past tense for consistency with existing messages.
The message format should follow the established pattern of using simple past tense. All existing activity messages use past tense (e.g., "Peer added", "Route created", "User joined"), while "Create Job for peer" uses imperative form.
🔎 Suggested fix
```diff
-	JobCreatedByUser: {"Create Job for peer", "peer.job.create"},
+	JobCreatedByUser: {"Job created for peer", "peer.job.create"},
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
JobCreatedByUser: {"Job created for peer", "peer.job.create"},
```
🤖 Prompt for AI Agents
In management/server/activity/codes.go around line 301, the activity message for
JobCreatedByUser uses imperative tense ("Create Job for peer") instead of past
tense like the other entries; change the message string to past tense — e.g.,
"Job created for peer" — keeping the same activity key ("peer.job.create") so
the pair remains {"Job created for peer", "peer.job.create"}.
```go
// SaveJob persists a job in DB
func (s *SqlStore) CreatePeerJob(ctx context.Context, job *types.Job) error {
	result := s.db.Create(job)
	if result.Error != nil {
		log.WithContext(ctx).Errorf("failed to create job in store: %s", result.Error)
		return status.Errorf(status.Internal, "failed to create job in store")
	}
	return nil
}

func (s *SqlStore) CompletePeerJob(ctx context.Context, job *types.Job) error {
	result := s.db.
		Model(&types.Job{}).
		Where(idQueryCondition, job.ID).
		Updates(job)

	if result.Error != nil {
		log.WithContext(ctx).Errorf("failed to update job in store: %s", result.Error)
		return status.Errorf(status.Internal, "failed to update job in store")
	}
	return nil
}

// job was pending for too long and has been cancelled
func (s *SqlStore) MarkPendingJobsAsFailed(ctx context.Context, accountID, peerID, jobID, reason string) error {
	now := time.Now().UTC()
	result := s.db.
		Model(&types.Job{}).
		Where(accountAndPeerIDQueryCondition+" AND id = ?"+" AND status = ?", accountID, peerID, jobID, types.JobStatusPending).
		Updates(types.Job{
			Status:       types.JobStatusFailed,
			FailedReason: reason,
			CompletedAt:  &now,
		})
	if result.Error != nil {
		log.WithContext(ctx).Errorf("failed to mark pending jobs as Failed job in store: %s", result.Error)
		return status.Errorf(status.Internal, "failed to mark pending job as Failed in store")
	}
	return nil
}

// job was pending for too long and has been cancelled
func (s *SqlStore) MarkAllPendingJobsAsFailed(ctx context.Context, accountID, peerID, reason string) error {
	now := time.Now().UTC()
	result := s.db.
		Model(&types.Job{}).
		Where(accountAndPeerIDQueryCondition+" AND status = ?", accountID, peerID, types.JobStatusPending).
		Updates(types.Job{
			Status:       types.JobStatusFailed,
			FailedReason: reason,
			CompletedAt:  &now,
		})
	if result.Error != nil {
		log.WithContext(ctx).Errorf("failed to mark pending jobs as Failed job in store: %s", result.Error)
		return status.Errorf(status.Internal, "failed to mark pending job as Failed in store")
	}
	return nil
}
```
Consider checking RowsAffected in job update helpers
CreatePeerJob, CompletePeerJob, MarkPendingJobsAsFailed, and MarkAllPendingJobsAsFailed log DB errors but don’t distinguish between “no matching row” and a successful update. That makes silent no-ops hard to detect, especially for idempotent job transitions.
If you care about detecting missing jobs, consider checking result.RowsAffected and returning a NotFound status error when it’s zero, similar to patterns used elsewhere in this file (e.g. UpdateAccountDomainAttributes, SavePeerStatus).
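A compact sketch of that check applied to `CompletePeerJob`, mirroring the GORM and `status` usage in the surrounding code; treat it as illustrative rather than a drop-in change:

```go
func (s *SqlStore) CompletePeerJob(ctx context.Context, job *types.Job) error {
	result := s.db.
		Model(&types.Job{}).
		Where(idQueryCondition, job.ID).
		Updates(job)
	if result.Error != nil {
		log.WithContext(ctx).Errorf("failed to update job in store: %s", result.Error)
		return status.Errorf(status.Internal, "failed to update job in store")
	}
	// RowsAffected reports how many rows the UPDATE matched; zero means the
	// job ID did not exist, which callers likely want surfaced explicitly.
	if result.RowsAffected == 0 {
		return status.Errorf(status.NotFound, "job %s not found", job.ID)
	}
	return nil
}
```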
```go
// Verify that the message is intended for this peer
currentPeerID := c.key.PublicKey().String()
if encryptedMsg.WgPubKey != currentPeerID {
	err := fmt.Errorf("job request peer ID mismatch: expected %s, got %s", currentPeerID, encryptedMsg.WgPubKey)
	log.Warnf("%v", err)
	return nil, err
}
```
🧩 Analysis chain
🏁 Script executed:
```sh
# First, let's look at the actual code in the file around the mentioned lines
fd -t f grpc.go | grep -E "shared/management/client"
```

Repository: netbirdio/netbird
Length of output: 93
🏁 Script executed:
```sh
# Get the actual implementation of receiveJobRequest and surrounding code
cat -n shared/management/client/grpc.go | sed -n '240,270p'
```

Repository: netbirdio/netbird
Length of output: 1207
🏁 Script executed:
```sh
# Search for the sendJob implementation on the server side
rg -t go "func.*sendJob" -A 15
```

Repository: netbirdio/netbird
Length of output: 3863
🏁 Script executed:
```sh
# Look for how WgPubKey is set in the server implementation
rg -t go "WgPubKey.*=" -B 3 -A 3
```

Repository: netbirdio/netbird
Length of output: 9479
🏁 Script executed:
```sh
# Check how other encrypted management messages validate WgPubKey
rg -t go "encryptedMsg.WgPubKey" -B 5 -A 5
```

Repository: netbirdio/netbird
Length of output: 809
Job request peer-ID check is inverted – causes all job messages to be rejected
The client's receiveJobRequest validates that incoming messages have WgPubKey equal to the client's own public key:
```go
currentPeerID := c.key.PublicKey().String()
if encryptedMsg.WgPubKey != currentPeerID {
	return nil, fmt.Errorf("job request peer ID mismatch: expected %s, got %s", currentPeerID, encryptedMsg.WgPubKey)
}
```

However, on the server side, sendJob sets WgPubKey to the server's WireGuard public key. This causes the client to reject every job request as a peer ID mismatch.
Validate against the expected serverPubKey instead (which is available in the function signature), matching the message semantics where the sender's key is embedded.
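A minimal sketch of the corrected check, assuming `serverPubKey` is the server key parameter already available to `receiveJobRequest` (as the verification above indicates):

```go
// Compare the embedded sender key against the management server's key we
// performed the handshake with, rather than against our own public key.
serverKey := serverPubKey.String()
if encryptedMsg.WgPubKey != serverKey {
	err := fmt.Errorf("job request sender mismatch: expected server key %s, got %s", serverKey, encryptedMsg.WgPubKey)
	log.Warnf("%v", err)
	return nil, err
}
```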
🤖 Prompt for AI Agents
In shared/management/client/grpc.go around lines 246 to 252, the peer-ID
comparison is inverted: the code compares the incoming message's WgPubKey to the
client's own public key, but the message contains the sender (server) WireGuard
key. Change the check to validate encryptedMsg.WgPubKey against the serverPubKey
parameter (i.e., if encryptedMsg.WgPubKey != serverPubKey then log and return an
error), and update the error/log message to state the expected serverPubKey and
the received WgPubKey for clarity.
# Conflicts:
#	client/cmd/debug.go
#	client/cmd/testutil_test.go
#	client/internal/engine_test.go
#	client/server/server.go
#	client/server/server_test.go
#	client/ui/debug.go
#	go.sum
#	management/internals/server/boot.go
#	management/internals/shared/grpc/server.go
#	management/server/account/manager.go
#	management/server/activity/codes.go
#	management/server/http/testing/testing_tools/channel/channel.go
#	management/server/management_proto_test.go
#	management/server/mock_server/account_mock.go
#	management/server/permissions/modules/module.go
#	management/server/store/sql_store.go
#	management/server/store/store.go
#	shared/management/client/client_test.go
#	shared/management/http/api/openapi.yml
#	shared/management/http/api/types.gen.go
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
client/server/server.go (1)
15-20: Remove duplicate `wgtypes` import on line 19.

The same package is imported twice at lines 16 and 19, which will cause a Go compilation error.
🛠️ Proposed fix
- "golang.zx2c4.com/wireguard/wgctrl/wgtypes"
🤖 Fix all issues with AI agents
In `@management/internals/shared/grpc/server.go`:
- Around line 177-208: The Server.Job handler must guard against a nil job
manager to avoid a panic; before calling s.jobManager.CreateJobChannel (and
before s.startResponseReceiver/sendJobsLoop), check if s.jobManager == nil and
return a gRPC error (e.g., status.Errorf(codes.Unavailable, "job manager not
available") or codes.Internal with an explanatory message). Update the nil-check
in the Job method around the CreateJobChannel usage and ensure you return the
gRPC status error instead of continuing when s.jobManager is nil.
♻️ Duplicate comments (9)
management/internals/shared/grpc/server.go (1)
371-391: `sendJobsLoop` still swallows send errors.

Same issue as previously flagged: returning `nil` on `sendJob` failure hides stream errors from callers.

client/internal/engine.go (1)
1036-1059: Potential nil dereference in `handleBundle` still present.

`e.config.ProfileConfig` or `ManagementURL` can be nil, causing a panic.

🛠️ Suggested fix

```diff
 func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobResponse_Bundle, error) {
 	log.Infof("handle remote debug bundle request: %s", params.String())
 	syncResponse, err := e.GetLatestSyncResponse()
 	if err != nil {
 		log.Warnf("get latest sync response: %v", err)
 	}
@@
 	waitFor := time.Duration(params.BundleForTime) * time.Minute
-
-	uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
+	if e.config == nil || e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
+		return nil, errors.New("profile config or management URL not available for bundle generation")
+	}
+	uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
 	if err != nil {
 		return nil, err
 	}
```

shared/management/http/api/types.gen.go (1)
387-400: Type mismatch with proto definition persists.

As noted in a previous review, `BundleForTime` uses `int` here but `int64` in the proto definition, and `LogFileCount` uses `int` here but `int32` in proto. Since this is generated code, the fix should be applied to the OpenAPI specification (openapi.yml) by explicitly specifying `format: int64` for `BundleForTime` and `format: int32` for `LogFileCount`.

management/server/peer.go (3)
329-355: Guard against nil `jobManager` before use.

`CreatePeerJob` dereferences `am.jobManager` directly; if it's nil in any deployment/test wiring, this will panic.

🛠️ Suggested guard

```diff
 func (am *DefaultAccountManager) CreatePeerJob(ctx context.Context, accountID, peerID, userID string, job *types.Job) error {
+	if am.jobManager == nil {
+		return status.Errorf(status.Internal, "job manager is not configured")
+	}
 	allowed, err := am.permissionsManager.ValidateUserPermissions(ctx, accountID, userID, modules.RemoteJobs, operations.Create)
 	if err != nil {
 		return status.NewPermissionValidationError(err)
 	}
```
369-399: Persist before dispatch to avoid orphaned jobs.

Sending the job prior to DB persistence can leave a running job with no record if the transaction fails. Consider storing first (pending), then sending, and marking failed if dispatch fails, as in the sketch below.
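A minimal sketch of that ordering, reusing the store methods added in this PR; the `dispatch` callback and the `Job` field names stand in for whatever the job manager's send path actually looks like:

```go
// jobStore narrows the Store interface to the two calls used here; both
// CreatePeerJob and MarkPendingJobsAsFailed appear in sql_store.go in this PR.
type jobStore interface {
	CreatePeerJob(ctx context.Context, job *types.Job) error
	MarkPendingJobsAsFailed(ctx context.Context, accountID, peerID, jobID, reason string) error
}

func createThenDispatch(ctx context.Context, store jobStore, dispatch func(context.Context, *types.Job) error, j *types.Job) error {
	// Persist first: a crash or send failure always leaves a pending record.
	if err := store.CreatePeerJob(ctx, j); err != nil {
		return err
	}
	// Only then hand the job to the peer's stream.
	if err := dispatch(ctx, j); err != nil {
		// Mark the stored job failed so it does not stay pending forever.
		if mErr := store.MarkPendingJobsAsFailed(ctx, j.AccountID, j.PeerID, j.ID, err.Error()); mErr != nil {
			log.WithContext(ctx).Errorf("mark job %s as failed: %v", j.ID, mErr)
		}
		return err
	}
	return nil
}
```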
431-454: Verify job ownership in `GetPeerJobByID`.

The method returns any job by `jobID` within the account without confirming it belongs to `peerID`. That allows cross-peer access within the same account.

🔒 Suggested ownership check

```diff
 job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID)
 if err != nil {
 	return nil, err
 }
+if job.PeerID != peerID {
+	return nil, status.NewPeerNotPartOfAccountError()
+}
 return job, nil
```

management/server/store/sql_store.go (3)
158-205: Return NotFound when no job rows are updated.
`CompletePeerJob` and the "mark failed" helpers treat zero-row updates as success, which can silently mask missing jobs. Consider checking `RowsAffected` and returning a NotFound error.

✅ Example pattern (apply similarly to other job updates)

```diff
 result := s.db.
 	Model(&types.Job{}).
 	Where(accountAndPeerIDQueryCondition+" AND id = ?"+" AND status = ?", accountID, peerID, jobID, types.JobStatusPending).
 	Updates(types.Job{
 		Status:       types.JobStatusFailed,
 		FailedReason: reason,
 		CompletedAt:  &now,
 	})
 if result.Error != nil {
 	log.WithContext(ctx).Errorf("failed to mark pending jobs as Failed job in store: %s", result.Error)
 	return status.Errorf(status.Internal, "failed to mark pending job as Failed in store")
 }
+if result.RowsAffected == 0 {
+	return status.Errorf(status.NotFound, "job %s not found", jobID)
+}
 return nil
```
207-221: Scope job fetches to peer ownership.
`GetPeerJobByID` filters by `account_id` + `id` only. If peer scoping is required, add `peer_id` to the query as sketched below, or make this method explicitly "account-wide only".
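A hypothetical peer-scoped variant, reusing the `accountAndPeerIDQueryCondition` helper visible in the other job queries in this file:

```go
func (s *SqlStore) getJobForPeer(ctx context.Context, accountID, peerID, jobID string) (*types.Job, error) {
	var job types.Job
	result := s.db.
		Where(accountAndPeerIDQueryCondition+" AND id = ?", accountID, peerID, jobID).
		First(&job)
	// First returns gorm.ErrRecordNotFound when no row matches all three filters.
	if errors.Is(result.Error, gorm.ErrRecordNotFound) {
		return nil, status.Errorf(status.NotFound, "job %s not found", jobID)
	}
	if result.Error != nil {
		log.WithContext(ctx).Errorf("failed to get job from store: %s", result.Error)
		return nil, status.Errorf(status.Internal, "failed to get job from store")
	}
	return &job, nil
}
```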
4459-4477: Return NotFound when key lookup yields no rows.
`Scan(&peerID)` can return `nil` even when no rows match, resulting in `("", nil)` and ambiguous callers.

✅ Suggested handling

```diff
 result := tx.Model(&nbpeer.Peer{}).
 	Select("id").
 	Where(GetKeyQueryCondition(s), key).
 	Limit(1).
 	Scan(&peerID)
 if result.Error != nil {
 	log.WithContext(ctx).Errorf("failed to get peer ID by key: %s", result.Error)
 	return "", status.Errorf(status.Internal, "failed to get peer ID by key")
 }
+if peerID == "" {
+	return "", status.Errorf(status.NotFound, "peer not found for key %s", key)
+}
 return peerID, nil
```
🧹 Nitpick comments (5)
client/status/status.go (1)
584-584: Non-deterministic ordering of Networks field.
`maps.Keys` returns keys in non-deterministic order. If consistent ordering matters for display, debugging, or test stability, consider sorting the result.

🔧 Suggested fix to ensure deterministic ordering

```diff
-	pbFullStatus.LocalPeerState.Networks = maps.Keys(fullStatus.LocalPeerState.Routes)
+	networks := maps.Keys(fullStatus.LocalPeerState.Routes)
+	sort.Strings(networks)
+	pbFullStatus.LocalPeerState.Networks = networks
```

Similarly for line 605:

```diff
-	Networks: maps.Keys(peerState.GetRoutes()),
+	Networks: func() []string {
+		keys := maps.Keys(peerState.GetRoutes())
+		sort.Strings(keys)
+		return keys
+	}(),
```

Also applies to: 605-605
client/server/debug.go (1)
49-49: Consider propagating the request context.

The function receives a context parameter (unused `_`) but uses `context.Background()` for the upload. If the upload should respect cancellation signals, consider using the request context instead:

```diff
-	key, err := debug.UploadDebugBundle(context.Background(), req.GetUploadURL(), s.config.ManagementURL.String(), path)
+	key, err := debug.UploadDebugBundle(ctx, req.GetUploadURL(), s.config.ManagementURL.String(), path)
```

If using `context.Background()` is intentional (e.g., to ensure uploads complete regardless of client disconnection), a brief comment explaining this would clarify the intent.

shared/management/proto/management.proto (2)
66-89: Inconsistent field naming and consider `string` for `Reason`.

Two observations on these message definitions:

- Naming inconsistency: `ID` and `Reason` use PascalCase while `workload_parameters` and `workload_results` use snake_case. Proto3 convention typically uses snake_case for field names.
- `Reason` as bytes: Using `bytes` for a failure reason is unusual. Unless this field needs to carry binary/encrypted data, `string` would be more appropriate for human-readable error messages.

Suggested fix

```diff
 message JobRequest {
-  bytes ID = 1;
+  bytes id = 1;
   oneof workload_parameters {
     BundleParameters bundle = 10;
   }
 }

 message JobResponse{
-  bytes ID = 1;
+  bytes id = 1;
   JobStatus status=2;
-  bytes Reason=3;
+  string reason=3;
   oneof workload_results {
     BundleResult bundle = 10;
   }
 }
```
91-96: Clarify `bundle_for` field purpose.

The field name `bundle_for` as a `bool` is ambiguous. The name suggests a target/destination ("bundle for whom/what"), but a boolean doesn't convey that meaning. Consider renaming to clarify its purpose, such as:

- `enabled` if it controls whether bundling is active
- `include_<something>` if it's a toggle for including specific content

management/server/http/handlers/peers/peers_handler.go (1)
52-86: Consider returning HTTP 201 Created for resource creation.

The `CreateJob` handler uses `util.WriteJSONObject` which returns HTTP 200 OK. For RESTful APIs, resource creation via POST should typically return HTTP 201 Created.

Suggested approach

Either create a new utility function or inline the response:

```go
w.Header().Set("Content-Type", "application/json; charset=UTF-8")
w.WriteHeader(http.StatusCreated)
if err := json.NewEncoder(w).Encode(resp); err != nil {
	util.WriteError(ctx, err, w)
	return
}
```

Alternatively, if `util.WriteJSONObject` is used project-wide for POST creation endpoints and returning 200 is an established pattern, this can remain as-is for consistency.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
- client/proto/daemon.pb.go is excluded by !**/*.pb.go
- go.sum is excluded by !**/*.sum
- shared/management/proto/management.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (41)
- client/cmd/debug.go
- client/cmd/status.go
- client/cmd/testutil_test.go
- client/cmd/up.go
- client/embed/embed.go
- client/internal/connect.go
- client/internal/engine.go
- client/internal/engine_test.go
- client/proto/daemon.proto
- client/server/debug.go
- client/server/server.go
- client/server/server_test.go
- client/status/status.go
- client/status/status_test.go
- go.mod
- management/internals/modules/peers/manager.go
- management/internals/server/boot.go
- management/internals/server/controllers.go
- management/internals/server/modules.go
- management/internals/shared/grpc/server.go
- management/server/account.go
- management/server/account/manager.go
- management/server/account_test.go
- management/server/activity/codes.go
- management/server/dns_test.go
- management/server/http/handlers/peers/peers_handler.go
- management/server/http/testing/testing_tools/channel/channel.go
- management/server/management_proto_test.go
- management/server/management_test.go
- management/server/mock_server/account_mock.go
- management/server/nameserver_test.go
- management/server/peer.go
- management/server/peer_test.go
- management/server/permissions/modules/module.go
- management/server/route_test.go
- management/server/store/sql_store.go
- management/server/store/store.go
- shared/management/client/client_test.go
- shared/management/http/api/openapi.yml
- shared/management/http/api/types.gen.go
- shared/management/proto/management.proto
💤 Files with no reviewable changes (1)
- client/proto/daemon.proto
🚧 Files skipped from review as they are similar to previous changes (14)
- management/internals/modules/peers/manager.go
- management/server/account_test.go
- client/embed/embed.go
- go.mod
- management/internals/server/controllers.go
- management/server/activity/codes.go
- management/server/account/manager.go
- management/server/http/testing/testing_tools/channel/channel.go
- management/server/mock_server/account_mock.go
- client/cmd/testutil_test.go
- management/server/route_test.go
- management/server/dns_test.go
- client/status/status_test.go
- management/internals/server/boot.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.
Applied to files:
- management/server/store/store.go
- shared/management/client/client_test.go
🧬 Code graph analysis (20)
management/server/http/handlers/peers/peers_handler.go (4)
- management/server/context/auth.go (1): GetUserAuthFromContext (25-30)
- shared/management/http/util/util.go (3): WriteError (84-120), WriteErrorResponse (70-80), WriteJSONObject (27-35)
- shared/management/http/api/types.gen.go (3): JobRequest (816-818), JobResponse (821-829), JobResponseStatus (832-832)
- management/server/types/job.go (3): NewJob (67-103), Job (34-58), Workload (60-64)

client/cmd/up.go (1)
- util/log.go (1): FindFirstLogPath (77-84)

client/internal/engine_test.go (3)
- management/server/job/manager.go (1): NewJobManager (34-45)
- management/server/account.go (1): BuildManager (178-266)
- management/internals/server/server.go (1): NewServer (71-83)

management/server/peer_test.go (4)
- management/server/permissions/manager.go (1): NewManager (33-37)
- management/server/users/manager.go (1): NewManager (22-26)
- management/server/account.go (1): BuildManager (178-266)
- management/server/job/manager.go (1): NewJobManager (34-45)

client/internal/engine.go (3)
- client/internal/profilemanager/config.go (1): Config (91-162)
- client/jobexec/executor.go (2): Executor (24-25), NewExecutor (27-29)
- client/internal/debug/debug.go (1): GeneratorDependencies (243-248)

management/server/management_test.go (7)
- management/internals/modules/peers/manager.go (1): NewManager (46-51)
- management/server/settings/manager.go (1): NewManager (34-41)
- management/server/networks/routers/manager.go (1): NewManager (40-46)
- management/server/networks/resources/manager.go (1): NewManager (42-49)
- management/server/permissions/manager.go (1): NewManager (33-37)
- management/server/groups/manager.go (1): NewManager (36-42)
- management/server/job/manager.go (1): NewJobManager (34-45)

management/server/nameserver_test.go (2)
- management/server/account.go (1): BuildManager (178-266)
- management/server/job/manager.go (1): NewJobManager (34-45)

client/status/status.go (2)
- client/internal/peer/status.go (6): FullStatus (152-163), ManagementState (129-133), SignalState (122-126), LocalPeerState (107-113), RosenpassState (136-139), NSGroupState (143-149)
- client/proto/daemon.pb.go (21): FullStatus (2140-2154), FullStatus (2167-2167), FullStatus (2182-2184), ManagementState (1820-1827), ManagementState (1840-1840), ManagementState (1855-1857), SignalState (1759-1766), SignalState (1779-1779), SignalState (1794-1796), LocalPeerState (1666-1677), LocalPeerState (1690-1690), LocalPeerState (1705-1707), PeerState (1485-1507), PeerState (1520-1520), PeerState (1535-1537), RelayState (1881-1888), RelayState (1901-1901), RelayState (1916-1918), NSGroupState (1941-1949), NSGroupState (1962-1962), NSGroupState (1977-1979)

client/cmd/status.go (1)
- client/status/status.go (1): ConvertToStatusOutputOverview (123-170)

management/internals/shared/grpc/server.go (5)
- management/server/job/manager.go (2): Manager (24-32), Event (18-22)
- shared/management/status/error.go (7): FromError (78-87), NotFound (21-21), Errorf (70-75), PermissionDenied (18-18), Unauthenticated (39-39), InvalidArgument (27-27), Internal (24-24)
- shared/management/proto/management.pb.go (9): JobRequest (388-398), JobRequest (413-413), JobRequest (428-430), JobResponse (463-475), JobResponse (490-490), JobResponse (505-507), EncryptedMessage (322-333), EncryptedMessage (348-348), EncryptedMessage (363-365)
- management/server/job/channel.go (2): Channel (18-21), ErrJobChannelClosed (15-15)
- encryption/message.go (1): EncryptMessage (10-24)

client/internal/connect.go (3)
- client/internal/profilemanager/config.go (1): Config (91-162)
- shared/management/proto/management.pb.go (3): PeerConfig (2082-2100), PeerConfig (2115-2115), PeerConfig (2130-2132)
- client/internal/engine.go (1): EngineConfig (84-143)

management/server/peer.go (4)
- management/server/types/job.go (2): Job (34-58), Workload (60-64)
- management/server/permissions/modules/module.go (1): RemoteJobs (8-8)
- shared/management/status/error.go (5): NewPermissionValidationError (213-215), NewPermissionDeniedError (209-211), NewPeerNotPartOfAccountError (105-107), Errorf (70-75), Type (46-46)
- management/server/posture/nb_version.go (1): MeetsMinVersion (57-72)

management/server/store/sql_store.go (2)
- management/server/types/job.go (3): Job (34-58), JobStatusPending (18-18), JobStatusFailed (20-20)
- management/server/store/store.go (2): LockingStrength (43-43), LockingStrengthNone (50-50)

client/server/debug.go (1)
- client/internal/debug/upload.go (1): UploadDebugBundle (17-28)

management/internals/server/modules.go (1)
- management/server/account.go (1): BuildManager (178-266)

management/server/store/store.go (1)
- management/server/types/job.go (1): Job (34-58)

shared/management/http/api/types.gen.go (1)
- shared/management/proto/management.pb.go (6): BundleParameters (554-563), BundleParameters (578-578), BundleParameters (593-595), JobRequest (388-398), JobRequest (413-413), JobRequest (428-430)

shared/management/client/client_test.go (4)
- management/server/job/manager.go (1): NewJobManager (34-45)
- management/internals/controllers/network_map/controller/controller.go (1): NewController (76-112)
- management/internals/modules/peers/ephemeral/manager/ephemeral.go (1): NewEphemeralManager (53-61)
- management/server/account.go (1): BuildManager (178-266)

client/server/server_test.go (3)
- management/server/job/manager.go (1): NewJobManager (34-45)
- management/server/account.go (1): BuildManager (178-266)
- management/internals/shared/grpc/server.go (1): NewServer (84-138)

management/server/management_proto_test.go (2)
- management/server/job/manager.go (1): NewJobManager (34-45)
- management/internals/shared/grpc/server.go (1): NewServer (84-138)
🔇 Additional comments (43)
shared/management/client/client_test.go (4)
21-27: LGTM! Import additions are correctly placed and necessary to support the new job management functionality in the test setup.
96-96: LGTM! The `jobManager` instantiation correctly matches the `NewJobManager` signature. Passing `nil` for metrics is appropriate in test context.
122-123: LGTM! Both calls correctly match their respective function signatures:

- `ephemeral_manager.NewEphemeralManager` uses the aliased import correctly
- `BuildManager` includes `jobManager` in the correct position (5th parameter) per the updated signature
134-134: No issues found. The `nbgrpc.NewServer` call matches the function signature exactly. All 10 arguments are in the correct order and type:

- `config` → `*nbconfig.Config`
- `accountManager` → `account.Manager`
- `settingsMockManager` → `settings.Manager`
- `jobManager` → `*job.Manager`
- `secretsManager` → `SecretsManager`
- `nil` → `telemetry.AppMetrics`
- `nil` → `auth.Manager`
- `mgmt.MockIntegratedValidator{}` → `integrated_validator.IntegratedValidator`
- `networkMapController` → `network_map.Controller`
- `nil` → `idp.OAuthConfigProvider`

management/server/permissions/modules/module.go (1)
5-38: LGTM: RemoteJobs is fully registered in permissions. Constant and `All` map entry are consistent.
management/internals/server/modules.go (1)
88-91: LGTM: JobManager is wired into BuildManager. Call site aligns with the updated signature.
client/server/server.go (1)
1524-1527: LGTM: log path is now passed into ConnectClient.Run. No issues with the updated call.
shared/management/http/api/openapi.yml (3)
41-43: LGTM: Jobs tag added and marked experimental.
47-162: LGTM: Job/workload schemas are well-structured and consistent.
2677-2780: LGTM: Job endpoints are clearly defined with auth and responses.

client/internal/connect.go (2)
72-75: LGTM: logPath is threaded through Run and mobile entry points.

Also applies to: 95-96, 113-116
286-286: LGTM: EngineConfig now carries LogPath and ProfileConfig.

Also applies to: 473-513
client/cmd/up.go (1)
200-204: Log path propagation into `Run` looks correct. This keeps foreground mode aligned with the new log-path-aware debug bundle flow.

management/internals/shared/grpc/server.go (3)
3-31: JobManager wiring in the server constructor is consistent.
The field + constructor parameter threading looks clean.

Also applies to: 57-122
331-369: Handshake + response receiver flow looks solid.
The sequence and error handling are coherent for the Job stream.
436-448: Encrypted send paths look consistent.
`sendUpdate` and `sendJob` follow the same encryption + send pattern.

Also applies to: 453-475
client/server/server_test.go (1)
22-24: JobManager wiring in tests is updated correctly.
The constructor call chain now matches the new signatures.

Also applies to: 310-333
client/internal/engine.go (4)
34-54: Config/engine struct extensions look good.
LogPath/ProfileConfig and job executor fields integrate cleanly.

Also applies to: 137-143, 206-225
239-271: Engine init + shutdown coordination for job executor looks solid.
New executor creation and `Wait()` on shutdown are appropriate.

Also applies to: 339-341
527-530: Job stream hookup is clear and consistent with the existing streams.
Nice to see it aligned with the Signal/Sync error handling pattern.

Also applies to: 999-1034
858-869: Sync response persistence is now properly guarded.
The dedicated lock and cloning behavior are good improvements.

Also applies to: 1957-1994
client/internal/engine_test.go (1)
28-29: JobManager wiring in engine tests is aligned with new signatures.
The setup looks consistent across server construction paths.

Also applies to: 1603-1637
client/status/status.go (3)
14-18: LGTM on new imports. The new imports for `durationpb`, `timestamppb`, and `maps` are appropriate for the proto conversion functionality being added.
123-145: LGTM on signature change. The updated `ConvertToStatusOutputOverview` signature correctly accepts a fully constructed `*proto.FullStatus` and a separate `daemonVersion` string parameter, aligning with the proto structure refactoring.
558-644: Well-structured conversion function. The `ToProtoFullStatus` function correctly converts all fields from the internal `peer.FullStatus` to `*proto.FullStatus`. Error handling for `ManagementState.Error`, `SignalState.Error`, relay errors, and DNS state errors is properly implemented with nil checks before calling `.Error()`.

shared/management/http/api/types.gen.go (1)
1-4: Generated file - changes should be made upstream. This file is auto-generated by `oapi-codegen`. Any structural changes (like the type mismatch noted in past reviews for `BundleParameters.BundleForTime` and `LogFileCount`) should be addressed in the OpenAPI specification rather than in this file directly.

client/cmd/debug.go (1)
350-360: LGTM on field rename. The change from `LogFile` to `LogPath` aligns with the broader refactoring in the debug package that renames log file references to path-based fields.

client/cmd/status.go (1)
102-102: LGTM on updated function call. The call to `ConvertToStatusOutputOverview` correctly passes `resp.GetFullStatus()` as the first argument and `resp.GetDaemonVersion()` as the third argument, matching the updated function signature.

management/server/nameserver_test.go (2)
21-21: LGTM on new import. The `job` package import is required for the new `JobManager` wiring.
794-801: LGTM on JobManager wiring. The test setup correctly:

- Creates a `peersManager` using `peers.NewManager(store, permissionsManager)`
- Passes `job.NewJobManager(nil, store, peersManager)` to `BuildManager`

This follows the established pattern for wiring `JobManager` into test infrastructure, consistent with the broader PR changes.
JobManagerinto test infrastructure, consistent with the broader PR changes.management/server/account.go (1)
18-18: LGTM! The JobManager dependency is properly wired into `DefaultAccountManager` following the existing patterns for other managers. The import, field declaration, parameter addition, and assignment are all consistent and correctly implemented.

Also applies to: 74-74, 183-183, 206-206
management/server/peer_test.go (1)
1293-1300: LGTM! The test wiring consistently creates `peersManager` via `peers.NewManager(s, permissionsManager)` and passes `job.NewJobManager(nil, s, peersManager)` to `BuildManager` across all four test functions. Using `nil` for metrics is appropriate in test contexts.

Also applies to: 1379-1386, 1533-1540, 1614-1621
client/server/debug.go (1)
27-57: LGTM on the refactoring. The field rename from `LogFile` to `LogPath` improves clarity, and delegating upload logic to `debug.UploadDebugBundle` properly centralizes the functionality.
51-53: LGTM on the new Job RPC.The bidirectional streaming RPC for job execution is well-suited for long-running operations where status updates need to flow in both directions.
management/server/http/handlers/peers/peers_handler.go (1)
39-41: LGTM on the job endpoints and helper.The new routes,
ListJobs,GetJobhandlers, andtoSingleJobResponsehelper follow existing patterns in the codebase. Error handling is consistent, and the response transformation correctly handles the optionalFailedReasonfield.Also applies to: 88-143, 620-640
management/server/management_test.go (2)
33-34: No review note.
205-245: JobManager wiring in tests looks consistent. The updated setup threads the new dependency through BuildManager and NewServer as expected.
management/server/management_proto_test.go (2)
34-34: No review note.
365-387: JobManager wiring in test setup looks good. The dependency is passed through BuildManager and NewServer consistently with the updated API.
management/server/peer.go (1)
34-34: No review note.

management/server/store/store.go (1)
229-235: Store interface extension matches new job lifecycle needs.

management/server/store/sql_store.go (2)
46-55: No review note.
124-130: No review note.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
```go
func (s *Server) Job(srv proto.ManagementService_JobServer) error {
	reqStart := time.Now()
	ctx := srv.Context()

	peerKey, err := s.handleHandshake(ctx, srv)
	if err != nil {
		return err
	}

	accountID, err := s.accountManager.GetAccountIDForPeerKey(ctx, peerKey.String())
	if err != nil {
		// nolint:staticcheck
		ctx = context.WithValue(ctx, nbContext.AccountIDKey, "UNKNOWN")
		log.WithContext(ctx).Tracef("peer %s is not registered", peerKey.String())
		if errStatus, ok := internalStatus.FromError(err); ok && errStatus.Type() == internalStatus.NotFound {
			return status.Errorf(codes.PermissionDenied, "peer is not registered")
		}
		return err
	}
	// nolint:staticcheck
	ctx = context.WithValue(ctx, nbContext.AccountIDKey, accountID)
	peer, err := s.accountManager.GetStore().GetPeerByPeerPubKey(ctx, store.LockingStrengthNone, peerKey.String())
	if err != nil {
		return status.Errorf(codes.Unauthenticated, "peer is not registered")
	}

	s.startResponseReceiver(ctx, srv)

	updates := s.jobManager.CreateJobChannel(ctx, accountID, peer.ID)
	log.WithContext(ctx).Debugf("Job: took %v", time.Since(reqStart))

	return s.sendJobsLoop(ctx, accountID, peerKey, peer, updates, srv)
}
```
Guard Job() when jobManager is nil.
If a server is instantiated without a JobManager, calling Job() will panic on CreateJobChannel. Return a gRPC error instead.
🛠️ Suggested fix
```diff
 func (s *Server) Job(srv proto.ManagementService_JobServer) error {
 	reqStart := time.Now()
 	ctx := srv.Context()
+
+	if s.jobManager == nil {
+		return status.Error(codes.Unimplemented, "job manager not configured")
+	}
 	peerKey, err := s.handleHandshake(ctx, srv)
 	if err != nil {
 		return err
 	}
```

🤖 Prompt for AI Agents
In `@management/internals/shared/grpc/server.go` around lines 177-208: the
Server.Job handler must guard against a nil job manager to avoid a panic; before
calling s.jobManager.CreateJobChannel (and before
s.startResponseReceiver/sendJobsLoop), check if s.jobManager == nil and return a
gRPC error (e.g., status.Errorf(codes.Unavailable, "job manager not available")
or codes.Internal with an explanatory message). Update the nil-check in the Job
method around the CreateJobChannel usage and ensure you return the gRPC status
error instead of continuing when s.jobManager is nil.
…omError` to avoid panic
# Conflicts:
#	shared/management/proto/management.pb.go
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@management/server/peer.go`:
- Around lines 356-362: the current IsPeerHasPendingJobs check is racy; change
the logic in JobManager to provide a single atomic operation (e.g.,
AddPendingJobIfNone or TryEnqueuePendingJob) that both checks for an existing
pending job for peerID and enqueues/records the new pending job under the same
lock, returning a conflict/error if a job already exists; update the call site
in peer.go to call this new JobManager method instead of IsPeerHasPendingJobs so
the check-and-insert is protected by the JobManager's mutex and cannot be
interleaved by concurrent threads.
♻️ Duplicate comments (3)
management/server/peer.go (3)
329-354: Add a nil guard for `am.jobManager` before first use.

`am.jobManager` is dereferenced on line 352. If it's ever unset (tests/misconfig), this will panic.

🛠️ Suggested guard

```diff
 func (am *DefaultAccountManager) CreatePeerJob(ctx context.Context, accountID, peerID, userID string, job *types.Job) error {
+	if am.jobManager == nil {
+		return status.Errorf(status.Internal, "job manager is not configured")
+	}
 	allowed, err := am.permissionsManager.ValidateUserPermissions(ctx, accountID, userID, modules.RemoteJobs, operations.Create)
 	if err != nil {
 		return status.NewPermissionValidationError(err)
 	}
```
369-399: Avoid "send before persist" to prevent orphaned jobs.

If the DB transaction fails after `SendJob`, the peer may execute a job that doesn't exist in storage. Prefer persisting first (status=pending), then sending, and marking the job failed if the send errors, as in the sketch below.
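A minimal sketch of that persist-first ordering; `Job`, `JobStore`, and `JobSender` here are hypothetical stand-ins for the real store and stream APIs, not this PR's types:

```go
import "context"

type Job struct {
    ID     string
    Status string
}

type JobStore interface {
    SaveJob(ctx context.Context, j *Job) error
}

type JobSender interface {
    SendJob(ctx context.Context, j *Job) error
}

// dispatchJob persists the job before sending it, so a failed transaction can
// never leave the peer executing work that storage knows nothing about.
func dispatchJob(ctx context.Context, store JobStore, sender JobSender, j *Job) error {
    j.Status = "pending"
    if err := store.SaveJob(ctx, j); err != nil {
        return err // nothing was sent yet, so no orphaned job on the peer
    }
    if err := sender.SendJob(ctx, j); err != nil {
        j.Status = "failed" // reconcile storage instead of leaving it pending forever
        _ = store.SaveJob(ctx, j)
        return err
    }
    return nil
}
```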
449-454: Verify the job belongs to the requested peer.

`GetPeerJobByID` doesn't check that `job.PeerID == peerID`, so a caller can fetch another peer's job within the same account.

🛠️ Suggested check

```diff
 job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID)
 if err != nil {
 	return nil, err
 }
+if job.PeerID != peerID {
+	return nil, status.NewPeerNotPartOfAccountError()
+}
 return job, nil
```
```go
// check if already has pending jobs
// todo: The job checks here are not protected. The user can run this function from multiple threads,
// and each thread can think there is no job yet. This means entries in the pending job map will be overwritten,
// and only one will be kept, but potentially another one will overwrite it in the queue.
if am.jobManager.IsPeerHasPendingJobs(peerID) {
    return status.Errorf(status.BadRequest, "peer already has pending job")
}
```
Make the pending-job check atomic (current check is racy).
The TODO already notes the race: concurrent calls can both pass the check and then overwrite each other's pending jobs. This can silently drop or misroute work. Consider a single JobManager method that atomically checks and enqueues, returning a conflict if a job already exists; a sketch of that shape follows the agent prompt below.
🤖 Prompt for AI Agents
In `@management/server/peer.go` around lines 356-362: the current
IsPeerHasPendingJobs check is racy; change the logic in JobManager to provide a
single atomic operation (e.g., AddPendingJobIfNone or TryEnqueuePendingJob) that
both checks for an existing pending job for peerID and enqueues/records the new
pending job under the same lock, returning a conflict/error if a job already
exists; update the call site in peer.go to call this new JobManager method
instead of IsPeerHasPendingJobs so the check-and-insert is protected by the
JobManager's mutex and cannot be interleaved by concurrent threads.



Describe your changes
This PR adds the ability to trigger debug bundle generation remotely from the Management API/Dashboard.
Core Features
Implementation Details
Client-Side:
- `jobexec` package to handle remote job execution

Management-Side:

- `POST /api/peers/{peerId}/debug-bundle`
- `Job()` stream

Issue ticket number and link
Stack
Checklist
Documentation
Select exactly one:
Docs PR URL (required if "docs added" is checked)
Paste the PR link from https://github.com/netbirdio/docs here:
https://github.com/netbirdio/docs/pull/__
Summary by CodeRabbit
New Features
Bug Fixes / Improvements
Other
✏️ Tip: You can customize this high-level summary in your review settings.