
[Management/Client] Trigger debug bundle runs from API/Dashboard (#4592) #4832

Merged
pappz merged 28 commits into main from feature/remote-debug-release on Jan 19, 2026

Conversation

Collaborator

@pappz pappz commented Nov 21, 2025

Describe your changes

This PR adds the ability to trigger debug bundle generation remotely from the Management API/Dashboard.

Core Features

  • Remote Debug Bundle Triggering: Request debug bundles from peers via Management API/Dashboard
  • Job Management System: New job dispatch system enabling management server to request work from clients
  • Automatic Upload: Debug bundles are automatically uploaded to a centralized location

Implementation Details

Client-Side:

  • Added jobexec package to handle remote job execution
  • Implemented debug bundle upload functionality
  • Added job event stream handling in engine
  • Modified bundle generation to support remote triggers

Management-Side:

  • Created job manager and channel system for coordinating job lifecycle
  • Added new API endpoint: POST /api/peers/{peerId}/debug-bundle
  • Extended gRPC protocol with bidirectional Job() stream
  • Added job persistence to SQL store

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is it a typo/documentation fix
  • Is it a feature enhancement
  • Is it a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • New Features

    • Jobs system: create/list/get per-peer jobs (bundle workloads) via API and streaming; background executor to generate/upload bundles; client/server job streaming support.
  • Bug Fixes / Improvements

    • Status output rebuilt from full status plus daemon version for improved summaries.
    • Debug bundle uploads now use presigned streamed uploads and honor a propagated log path.
    • Connect/run flows accept a log path parameter.
  • Other

    • Debug bundles no longer include the previous standalone status string.


@coderabbitai
Contributor

coderabbitai Bot commented Nov 21, 2025

📝 Walkthrough

Walkthrough

Adds a per-peer Job subsystem (streaming RPC, manager, persistence, HTTP API), refactors client debug bundle generation/upload (remove DebugBundleRequest.status, LogFile→LogPath, centralized upload), threads logPath/profile through client/engine, and updates status conversion to use proto.FullStatus plus daemon version.

Changes

  • Job subsystem (proto, server, store, types)
    Files: shared/management/proto/management.proto, management/server/job/*, management/server/types/job.go, management/server/store/*, management/server/peer.go, management/server/account.go
    Adds Job RPC/messages, Job types, channel/manager, persistence (Store + SqlStore migrations/queries), account manager methods, and peer-level job operations with permission/version checks.

  • gRPC server wiring & handlers
    Files: management/internals/shared/grpc/server.go, management/internals/server/controllers.go, management/internals/server/boot.go, management/internals/server/modules.go
    Threads JobManager into Server/BaseServer; extends the NewServer signature; implements the Job stream (handshake, send/receive loops, response receiver) and per-peer job lifecycle integration.

  • HTTP API & OpenAPI + generated client types
    Files: management/server/http/handlers/peers/peers_handler.go, shared/management/http/api/openapi.yml, shared/management/http/api/types.gen.go
    Adds peer-scoped Job endpoints (Create/List/Get) and OpenAPI + generated types for Job/Workload (bundle), including request/response schemas and discriminated unions.

  • Client-side job streaming & client API
    Files: shared/management/client/grpc.go, shared/management/client/client.go, shared/management/client/mock.go, shared/management/client/client_test.go
    Adds a client Job(...) streaming API, centralizes stream handling (withMgmtStream), refactors Sync to reuse the wrapper, and adds a MockClient job hook.

  • Job executor (client)
    Files: client/jobexec/executor.go
    New lightweight Executor with BundleJob to wait (capped), generate a debug bundle, upload it via UploadDebugBundle, and return the upload key; adds MaxBundleWaitTime and ErrJobNotImplemented.

  • Debug bundle refactor (client)
    Files: client/proto/daemon.proto, client/internal/debug/*, client/internal/debug/upload.go, client/internal/debug/upload_test.go, client/cmd/debug.go, client/server/debug.go, client/ui/debug.go
    Removes status from DebugBundleRequest; renames GeneratorDependencies.LogFile → LogPath; centralizes upload into UploadDebugBundle; removes UI/CLI status-string generation; the generator seeds the anonymizer from full status and uses LogPath.

  • Client connect / engine changes
    Files: client/internal/connect.go, client/internal/engine.go, client/internal/engine_test.go, client/cmd/up.go, client/embed/embed.go, client/server/server.go
    ConnectClient.Run gains a logPath parameter; EngineConfig adds LogPath and ProfileConfig; the engine integrates the job executor and job event handling (receiveJobEvents/handleBundle); sync-response locking is adjusted; callers/tests updated.

  • Status conversion changes
    Files: client/status/status.go, client/cmd/status.go, client/status/status_test.go, client/wasm/cmd/main.go
    ConvertToStatusOutputOverview now accepts *proto.FullStatus and daemonVersion; adds ToProtoFullStatus conversion; call sites updated to pass full status + version.

  • Tests and wiring updates
    Files: assorted *_test.go across client/*, management/server/*, shared/*
    Tests/scaffolding updated to construct and pass job.NewJobManager(...) into BuildManager/NewServer; mocks extended and imports adjusted.

  • Auxiliary / tooling / small changes
    Files: go.mod, shared/management/http/api/generate.sh, activity codes, peers manager mocks
    Adds the github.com/oapi-codegen/runtime dependency and go-jsonmerge (indirect); updates the generator script; adds activity code JobCreatedByUser; adds GetPeerID to the peers manager and mocks; minor import/format tweaks.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    rect rgba(240,248,255,0.5)
    participant MServer as Management gRPC Server
    participant JobMgr as Job Manager
    participant Store as DB/Store
    end
    rect rgba(245,255,240,0.5)
    participant Client as Peer Client (mgmt stream)
    participant Engine as Client Engine / Job Executor
    participant DebugGen as Debug Bundle Generator
    participant Upload as Upload service
    end

    MServer->>JobMgr: CreatePeerJob(accountID, peerID, JobRequest)
    JobMgr->>Store: Persist job (pending)
    MServer->>Client: Stream JobRequest via Job RPC
    Client->>Engine: Executor.BundleJob(ctx, deps, params, waitFor)
    Engine->>DebugGen: Generate bundle (uses LogPath, StatusRecorder, ProfileConfig)
    DebugGen-->>Engine: bundle file path
    Engine->>Upload: UploadDebugBundle(mgmURL, bundlePath)
    Upload-->>Engine: uploadKey / error
    Engine->>Client: Send JobResponse(uploadKey/status)
    Client->>MServer: Stream JobResponse
    MServer->>JobMgr: HandleResponse(JobResponse)
    JobMgr->>Store: CompletePeerJob (succeeded/failed)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • pascal-fischer
  • mlsmaycon

Poem

🐰

I hopped through patches, neat and spry,
Opened job tunnels beneath the sky,
Logs found a path and bundles took flight,
Upload keys gleam like stars at night,
My whiskers twitch — new jobs feel right.

🚥 Pre-merge checks: 1 passed, 2 failed (1 warning, 1 inconclusive)

❌ Failed checks

  • Docstring Coverage ⚠️ Warning
    Explanation: Docstring coverage is 18.64%, which is below the required threshold of 80.00%.
    Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check ❓ Inconclusive
    Explanation: The description includes detailed explanations of changes, core features, implementation details on both client and server sides, and addresses the required checklist. However, the Documentation section is incomplete: no selection was made between the two required options, and the Docs PR URL remains a placeholder.
    Resolution: Complete the Documentation section by selecting exactly one option and providing the actual Docs PR URL if documentation was added, or clarifying why documentation is not needed for this change.

✅ Passed checks

  • Title check ✅ Passed
    The title clearly and concisely describes the main change: enabling remote debug bundle triggering from the API/Dashboard, which accurately reflects the PR's primary objective.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
client/status/status.go (1)

234-315: ToProtoFullStatus mapping is good; add nil‑safety when reading latency/handshake

The new ToProtoFullStatus helper correctly maps:

  • Management and signal state (URL, connected, error).
  • Local peer state (IP, pub key, kernel, FQDN, Rosenpass flags, networks from route keys).
  • Per‑peer fields (status, timestamps, transfer stats, Rosenpass, networks, latency, SSH host key).
  • Relays and DNS group state, including error strings.

Downstream, mapPeers assumes Latency and LastWireguardHandshake are always set:

lastHandshake := pbPeerState.GetLastWireguardHandshake().AsTime().Local()
...
Latency: pbPeerState.GetLatency().AsDuration(),

That’s safe for FullStatus objects produced by ToProtoFullStatus, but will panic if a PeerState arrives with these fields unset (e.g., older daemon or other producer). To harden this:

var lastHandshake time.Time
if ts := pbPeerState.GetLastWireguardHandshake(); ts != nil {
    lastHandshake = ts.AsTime().Local()
}

var latency time.Duration
if d := pbPeerState.GetLatency(); d != nil {
    latency = d.AsDuration()
}

and then use those locals when building PeerStateDetailOutput.

Also applies to: 549-635

client/internal/engine.go (2)

34-53: Engine config/struct wiring for jobs is fine, but c *profilemanager.Config is unused

The new debug/job additions to EngineConfig (ProfileConfig, LogPath) and Engine (jobExecutor, jobExecutorWG) are wired in cleanly, and creating jobExecutor in NewEngine is reasonable.

However, the extra parameter c *profilemanager.Config on NewEngine is never used in the function body. Unlike unused local variables, unused parameters do compile in Go, but a dead parameter is misleading to callers. Either remove it or actually thread it into the engine configuration, e.g.:

config.ProfileConfig = c
// or
engine.config.ProfileConfig = c

depending on where you intend to own that reference.

Also applies to: 83-142, 145-222, 235-276


278-366: Job stream consumption and bundle handling are generally solid; watch for restart behavior and nil profile config

The new job flow on the client side looks good overall:

  • receiveJobEvents hooks into mgmClient.Job with a per‑message handler that defaults responses to JobStatus_failed and branches on WorkloadParameters – currently only Bundle is handled, everything else returns ErrJobNotImplemented.
  • jobExecutorWG ensures Stop() waits for the Job stream goroutine to exit before tearing down the engine.
  • handleBundle builds debug.GeneratorDependencies from engine state, calls jobExecutor.BundleJob, and wraps the resulting upload key in a JobResponse_Bundle.

Two points to be aware of:

  • When mgmClient.Job returns any error (including when e.ctx is canceled during Stop()), you treat it as a hard failure and call CtxGetState(e.ctx).Wrap(ErrResetConnection) + e.clientCancel(), which triggers a full client restart. That mirrors how Sync/Signal errors are handled but also means a graceful “down” or engine stop will trigger a restart path instead of a quiet shutdown. If that’s not intended, you may want to distinguish context cancellation from other errors.

  • handleBundle assumes e.config.ProfileConfig and e.config.ProfileConfig.ManagementURL are non‑nil:

    InternalConfig: e.config.ProfileConfig,
    ...
    uploadKey, err := e.jobExecutor.BundleJob(..., e.config.ProfileConfig.ManagementURL.String())

    If ProfileConfig or its ManagementURL can be unset (e.g., in tests or some platforms), this will panic. A defensive check that returns a failed JobResponse with a clear reason would make the behavior safer.

Also applies to: 942-1012

🧹 Nitpick comments (38)
shared/management/client/grpc.go (5)

112-160: Centralized withMgmtStream helper looks good; double‑check backoff choice and final logging

The withMgmtStream wrapper nicely unifies readiness checks, server key fetching, and retry for both Sync and Job. One thing to double‑check is whether you explicitly want streaming calls to use defaultBackoff(ctx) while other RPCs still use nbgrpc.Backoff(...); if not, consider reusing the same backoff helper for consistency, or documenting the intentional difference.

Also, withMgmtStream logs "unrecoverable error" for any non‑nil err after backoff.Retry, including context.Canceled / DeadlineExceeded. If you expect normal shutdowns via context cancellation, you may want to special‑case those errors before logging at warn to avoid noisy logs.


162-216: Job stream error handling is solid; refine connection notifications for better state reporting

The overall handleJobStream flow (open stream → handshake → recv/process/respond loop) and error classification by gRPC code look good.

A couple of refinements around connection state notifications:

  • notifyDisconnected(err) is called for every receive error, including codes.Unimplemented. In that case, the server is reachable but doesn’t support Job; marking management as “disconnected” can mislead consumers of ConnStateNotifier. Consider moving notifyDisconnected into the switch and skipping it for codes.Unimplemented (and possibly codes.Canceled, which usually indicates local shutdown).
  • The Job path never calls notifyConnected(), so this stream can only move the state toward “disconnected”. If ConnStateNotifier is used for user‑visible connectivity, you might want to call notifyConnected() once the stream and handshake succeed (or when the first job is successfully received) to keep the state transitions balanced.
  • On sendJobResponse failure you return the error but don’t notify disconnection; if this error typically indicates a broken stream, it may be worth also invoking notifyDisconnected(err) there.

These are behavioral tweaks rather than correctness issues but would make connection state reporting more accurate.


218-252: Job helpers mirror existing encryption patterns; consider cleaning up unused ctx parameters

sendHandshake and receiveJobRequest correctly follow the existing EncryptMessage / DecryptMessage pattern and use the WireGuard public key in the EncryptedMessage as elsewhere in this file.

Right now the ctx parameter passed into these helpers isn’t used inside them; the only effective cancellation point is the stream’s own context from realClient.Job(ctx). That’s fine behaviorally, but for clarity you could either:

  • Drop ctx from these helper signatures, or
  • Start using it explicitly (e.g., check ctx.Err() before/after blocking operations, or for future timeouts / per‑job cancellation).

Given this is internal code, I’d treat it as a readability/maintenance cleanup to do when convenient.


254-295: Ensure JobResponse is always well‑formed for correlation and logging

processJobRequest nicely guards against a nil handler result by synthesizing a JobResponse with Status: JobStatus_failed and a reason. Two minor robustness tweaks you might consider:

  • If the handler returns a non‑nil JobResponse but forgets to populate ID, you could default it to jobReq.ID so the management server can always correlate responses:
jobResp := msgHandler(jobReq)
if jobResp == nil {
    jobResp = &proto.JobResponse{
        ID:     jobReq.ID,
        Status: proto.JobStatus_failed,
        Reason: []byte("handler returned nil response"),
    }
} else if len(jobResp.ID) == 0 {
    jobResp.ID = jobReq.ID
}
  • For logging, string(jobReq.ID) / string(resp.ID) assumes the IDs are valid UTF‑8. If these are ever changed to non‑textual bytes, consider logging them as %x or via hex.EncodeToString to avoid odd output.

Not blockers, but they make the job channel a bit more defensive and easier to debug.


297-395: Sync refactor via connectToSyncStream/receiveUpdatesEvents looks consistent with existing patterns

The split into connectToSyncStream and receiveUpdatesEvents reads clean and matches the encryption/decryption contract used elsewhere (EncryptMessage(serverPubKey, c.key, req) and DecryptMessage(serverPubKey, c.key, update.Body, resp)).

A small optional improvement: GetNetworkMap now reimplements the “single Recv + decrypt SyncResponse” logic that’s very similar to what receiveUpdatesEvents does in a loop. If you find yourself touching this code again, you might consider a tiny shared helper (e.g., decryptSyncUpdate(serverPubKey, update *proto.EncryptedMessage) (*proto.SyncResponse, error)) to keep the decryption path fully DRY.

Functionally, the new sync connection flow looks correct.

client/internal/debug/upload_test.go (1)

1-1: Test now exercises the exported UploadDebugBundle correctly

Switching to package debug and calling UploadDebugBundle(context.Background(), testURL+types.GetURLPath, testURL, file) matches the new API and keeps the expectations (getURLHash(testURL) prefix, stored file content) intact.

You might consider using a context with timeout here (and/or a simple readiness wait on srv.Start) to make this integration-style test more robust under slow CI environments.

Also applies to: 41-48

client/cmd/debug.go (1)

338-360: Debug bundle generator now uses LogPath and shared default upload URL

Passing LogPath: logFilePath into debug.NewBundleGenerator matches the updated GeneratorDependencies and ensures the bundle generator knows where to look for logs. Reusing types.DefaultBundleURL as the default for --upload-bundle-url in both debug bundle and debug for keeps the CLI consistent with the upload-server configuration.

Consider clarifying in the flag help text that the default is the NetBird-hosted debug upload service, so self‑hosters know they should override it if needed.

Also applies to: 369-379

client/server/server_test.go (1)

21-21: JobManager wiring into test management server is correct

Creating a jobManager := job.NewJobManager(nil, store) and passing it into both server.BuildManager(...) and nbgrpc.NewServer(...) matches the updated constructor signatures and ensures job flows are exercised in this test setup.

The if err != nil { return nil, "", err } right after eventStore := &activity.InMemoryEventStore{} still refers to the earlier err from NewTestStoreFromSQL, which has already been handled and is guaranteed nil here. It can be removed to avoid confusion.

Also applies to: 298-329

client/internal/debug/debug.go (1)

30-34: Status seeding via FullStatus/nbstatus is coherent with anonymization strategy

Using the chain statusRecorder.GetFullStatus() → nbstatus.ToProtoFullStatus → ConvertToStatusOutputOverview → ParseToFullDetailSummary aligns the status.txt contents with the regular netbird status output and correctly injects event history and profile name. Seeding the bundle anonymizer via seedFromStatus(g.anonymizer, &fullStatus) leverages this richer status snapshot to anonymize later artifacts more consistently.

One minor nit: seedFromStatus runs even when g.anonymize is false, which is harmless but unnecessary work; you could optionally guard it with if g.anonymize { ... } for clarity.

Also applies to: 382-405, 852-880

shared/management/http/api/generate.sh (1)

14-16: oapi-codegen v2 install path looks correct; consider pinning a version

The switch to github.com/oapi-codegen/oapi-codegen/v2/cmd/oapi-codegen@latest matches the v2 module layout and should work fine with recent Go toolchains. For more reproducible codegen in CI, you might eventually want to pin a specific version instead of @latest.

client/embed/embed.go (1)

175-185: Recorder integration and updated Run call look consistent

Wiring a peer.NewRecorder into NewConnectClient and updating Run to Run(run, "") matches the new client API and should keep embedded startup behavior intact. If you later surface a log/debug path in Options, this is the right place to thread it through instead of an empty string.

management/server/http/testing/testing_tools/channel/channel.go (1)

19-39: JobManager wiring into BuildManager is correct; reuse metrics instead of nil

Injecting a concrete job.Manager into BuildManager here is a nice improvement in test realism. Given you already construct metrics in this helper, you can pass it into the job manager instead of nil so job‑related metrics behavior is also covered in tests:

-	jobManager := job.NewJobManager(nil, store)
+	jobManager := job.NewJobManager(metrics, store)

Also applies to: 53-79

management/server/account_test.go (1)

35-50: Account test manager now wires a JobManager; prefer passing metrics instead of nil

Importing job and passing a concrete JobManager into BuildManager in createManager matches the new account manager wiring and lets tests exercise job‑related paths. Since you already compute metrics in this helper, you can avoid a nil metrics field on the JobManager and better mirror production setup by doing:

-	manager, err := BuildManager(ctx, nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+	manager, err := BuildManager(ctx, nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Also applies to: 2931-2969

management/server/route_test.go (1)

22-34: Route tests correctly inject a JobManager; reuse metrics for consistency

The additional job import and passing job.NewJobManager(nil, store) into BuildManager in createRouterManager are aligned with the new constructor and ensure route tests run with job support enabled. Since this helper already initializes metrics, you can tighten the fidelity of the test setup by wiring it into the JobManager too:

-	am, err := BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+	am, err := BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Also applies to: 1258-1302

management/server/peer_test.go (1)

35-35: Prefer initializing job.Manager with real metrics in tests

The wiring of BuildManager looks correct (argument order matches the updated signature), but all these test setups construct the job manager with job.NewJobManager(nil, s) while metrics is already available. This leaves job.Manager.metrics nil in tests, which can (a) hide metric-related bugs and (b) potentially panic later if the manager starts using metrics without nil checks.

Consider passing the same metrics instance you already create:

-	am, err := BuildManager(context.Background(), nil, s, networkMapController, job.NewJobManager(nil, s), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+	am, err := BuildManager(context.Background(), nil, s, networkMapController, job.NewJobManager(metrics, s), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Apply the same pattern to the other BuildManager invocations in this file for consistency.

Also applies to: 1296-1296, 1381-1381, 1534-1534, 1614-1614

management/server/nameserver_test.go (1)

19-19: Use the existing metrics instance when creating JobManager

createNSManager correctly wires the new jobManager dependency into BuildManager, but you’re passing job.NewJobManager(nil, store) even though a real metrics instance is already created and used for update_channel.NewPeersUpdateManager.

To avoid a nil metrics inside job.Manager in tests (and potential surprises if metrics are used later), prefer:

-	return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+	return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Also applies to: 798-799

management/server/dns_test.go (1)

17-17: Align test JobManager construction with real metrics

createDNSManager correctly injects a JobManager into BuildManager, but uses job.NewJobManager(nil, store) despite having a metrics instance already.

For more realistic tests and to avoid a nil metrics field inside job.Manager, consider:

-	return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+	return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Also applies to: 229-230

management/server/management_proto_test.go (1)

32-32: JobManager wiring in tests looks correct; consider reusing metrics

The test correctly instantiates job.Manager and threads it through BuildManager and nbgrpc.NewServer, matching the new signatures. To exercise metrics inside the job manager during tests, you might consider passing the metrics instance instead of nil when creating jobManager, but that's optional.

Also applies to: 342-343, 369-371, 380-381

client/cmd/testutil_test.go (1)

19-19: Consistent JobManager setup in test helper; minor polish possible

The JobManager is correctly created and passed into BuildManager and nbgrpc.NewServer, aligning with the new signatures. Two small nits you could optionally address later:

  • Pass the metrics instance to job.NewJobManager instead of nil if you want job metrics in tests.
  • The if err != nil { return nil, nil } check after eventStore := &activity.InMemoryEventStore{} is dead code and can be removed.

Also applies to: 92-96, 123-124, 129-130

management/server/account.go (1)

18-18: JobManager injection is clean; clarify non‑nil contract

Adding jobManager *job.Manager to DefaultAccountManager and threading it via BuildManager is consistent and keeps wiring explicit. However, methods like CreatePeerJob dereference am.jobManager without nil checks, so BuildManager effectively requires a non‑nil JobManager in all call sites. Consider either:

  • Documenting that jobManager must be non‑nil (and enforcing via tests), or
  • Adding a defensive nil check that returns a clear status.Internal/status.PreconditionFailed error instead of panicking if it’s miswired.

Would you like a small ast-grep/rg script to verify that all BuildManager call sites now pass a non‑nil JobManager?

Also applies to: 68-76, 179-197, 203-224

management/server/management_test.go (1)

31-31: Test server wiring with JobManager is correct

The test server now correctly constructs a JobManager and passes it through to BuildManager and nbgrpc.NewServer, matching the new APIs. As with other tests, you could optionally pass metrics instead of nil into job.NewJobManager if you want job metrics exercised, but the current setup is functionally fine.

Also applies to: 183-185, 212-228, 235-247

management/server/http/handlers/peers/peers_handler.go (1)

31-41: New peer job HTTP endpoints are well‑structured; tighten validation and rely on fixed ownership check

The new /peers/{peerId}/jobs and /peers/{peerId}/jobs/{jobId} handlers are consistent with existing patterns (auth from context, util.WriteError/WriteJSONObject, and conversion helpers). A few small points:

  • Once GetPeerJobByID is fixed to assert job.PeerID == peerID (see account manager comment), these handlers will correctly enforce per‑peer job ownership.
  • For consistency with other handlers in this file (e.g. HandlePeer, GetAccessiblePeers), you may want to add an explicit empty‑peerId check in CreateJob, ListJobs, and GetJob to return a clear InvalidArgument instead of silently passing "" through.
  • toSingleJobResponse cleanly maps the domain Job to the HTTP JobResponse, including optional FailedReason and typed Workload; that looks good.

Overall, the HTTP surface for jobs is in good shape once the backend ownership check is tightened.

Also applies to: 51-142, 618-638

shared/management/proto/management.proto (1)

52-53: Refine Job proto types for clarity and forward compatibility

The Job RPC and related messages look structurally sound and align with the existing EncryptedMessage pattern. A few tweaks would make them easier to use:

  • JobResponse.Reason as bytes is unusual for a human-readable explanation; a string would be more ergonomic unless you explicitly plan to ship arbitrary binary blobs.
  • JobStatus only has unknown_status, succeeded, failed. If you intend to reflect persistent job lifecycle (pending/running vs completed) it may be worth adding an explicit in‑progress state instead of overloading unknown_status as a placeholder.
  • Consider documenting expected encoding/format for JobRequest.ID / JobResponse.ID (stringified UUID vs raw bytes) so store/types.Job integration remains consistent.

Functionally this is fine; these are just protocol polish suggestions.

Also applies to: 66-89

client/internal/debug/upload.go (1)

15-101: Upload helper is solid; consider small robustness improvements

The end-to-end flow (get presigned URL → size check → PUT upload) looks good and side‑effect free. A few minor refinements:

  • In upload, prefer %w when wrapping the HTTP error to preserve the error chain:
    return fmt.Errorf("upload failed: %w", err)
  • getUploadURL builds the query as url+"?id="+id. If url might ever include its own query string, using net/url to append id would be safer.
  • Both getUploadURL and upload strictly require StatusOK. If the upload service ever returns another 2xx (e.g. 201/204), this will be treated as failure. If you control that service and guarantee 200, the current code is fine; otherwise consider treating putResp.StatusCode/100 != 2 as the failure condition instead.

None of these are blockers; the current implementation should work as intended.

client/server/debug.go (1)

27-39: Use the incoming context (and possibly avoid holding the mutex) during upload

The switch to debug.UploadDebugBundle and LogPath: s.logFile looks correct.

Two follow‑up improvements worth considering:

  • DebugBundle ignores the incoming context and calls UploadDebugBundle with context.Background(). That means client cancellation/timeouts won’t stop the upload. Renaming the parameter to ctx context.Context and passing ctx through would make this RPC better behaved under cancellation.
  • The upload is done while s.mutex is held. For large bundles or slow networks this can block other server operations behind this lock. If state safety allows it, you might generate the bundle under the lock, then release the mutex before starting the network upload.

Behavior is fine for a rarely used debug path, but these tweaks would make it more responsive.

Also applies to: 49-57

management/server/account/manager.go (1)

127-129: Job APIs on Manager look good; consider pagination/filters

The new job methods are consistent with the rest of the Manager interface (accountID/userID/peerID ordering, types.Job usage).

Two design points to keep in mind for implementations:

  • GetAllPeerJobs can grow unbounded over time; if jobs are expected to accumulate, consider adding pagination and/or server‑side filtering (e.g. by status or time range) at some point.
  • Ensure implementations consistently enforce userID authorization, since these methods expose per‑peer operational history.

No changes required here, just points to watch in the concrete manager.

client/internal/engine_test.go (1)

1591-1592: JobManager wiring in startManagement test mirrors production; consider reusing metrics

The new jobManager := job.NewJobManager(nil, store) and its injection into BuildManager and nbgrpc.NewServer properly mirror the production wiring, so tests now exercise the job pipeline end‑to‑end.

Since you already create metrics, err := telemetry.NewDefaultAppMetrics(...) a few lines later, you may want to:

  • Construct metrics first, then
  • Call job.NewJobManager(metrics, store)

This would keep metrics behavior consistent between the JobManager and the rest of the management stack even in tests. Not required for correctness, but a bit cleaner.

Also applies to: 1622-1623, 1628-1629

shared/management/http/api/openapi.yml (2)

41-121: Align bundle/workload schema with actual bundle generator config

The Bundle/Workload schemas look reasonable, but there are a couple of potential contract mismatches to double‑check:

  • BundleParameters exposes bundle_for and bundle_for_time while the current debug.BundleConfig only has Anonymize, IncludeSystemInfo, and LogFileCount. There is no way for callers to control IncludeSystemInfo, and the semantics of bundle_for/bundle_for_time vs. what the client actually does (currently just a wait + regular bundle) should be clarified or adjusted so the API is not misleading.
  • You’re using a discriminator on WorkloadRequest/WorkloadResponse with propertyName: type. Make sure the generated clients you care about support this pattern where the discriminator property lives only in the concrete types (BundleWorkloadRequest/Response) and not also declared at the base level; some generators are picky here.

I’d suggest either:

  • Extending BundleConfig + implementation to truly honor all declared parameters, or
  • Tightening the OpenAPI description and fields to only what is actually used today, to avoid surprising API consumers.

35-38: Clarify experimental nature and pagination/filtering for Jobs endpoints

The Jobs tag is marked x-experimental: true, but the individual operations are not. If you rely on tooling that inspects operation/vendor extensions rather than tags, you may want to duplicate x-experimental: true on the new /api/peers/{peerId}/jobs and /api/peers/{peerId}/jobs/{jobId} operations.

Additionally, listing jobs currently returns an unbounded array with no filtering or pagination parameters. If job counts can become large per peer, it’s worth at least documenting ordering (e.g. newest first) and planning for page/page_size/status filters, even if you defer implementing them now.

Also applies to: 2353-2456

shared/management/client/client_test.go (1)

25-26: JobManager wiring in tests looks correct; consider reusing metrics if you need job metrics

The new jobManager := job.NewJobManager(nil, store) and its injection into both BuildManager and nbgrpc.NewServer align with the updated constructor signatures and keep the tests realistic.

If you ever want to exercise job‑related metrics in tests, you could pass the already‑created metrics instead of nil to NewJobManager; otherwise this setup is fine as‑is for functional tests.

Also applies to: 76-77, 123-124, 130-132

management/server/store/sql_store.go (1)

136-207: Job CRUD helpers mostly fine; consider tightening error handling and update scope

Functionality looks correct overall (ID scoping, account/peer filters, status transitions), but a few details could be improved:

  • CompletePeerJob updates by id only; since ID is the primary key this is safe, but for consistency with GetPeerJobByID you may want to include account_id in the WHERE as well.
  • CreatePeerJob, CompletePeerJob, and MarkPendingJobsAsFailed correctly wrap errors with status.Internal, but GetPeerJobByID / GetPeerJobs return raw GORM errors on internal failure. Consider wrapping those in status.Errorf(status.Internal, ...) as done elsewhere in this store so callers don’t depend on storage-specific errors.
  • If types.Job.ApplyResponse only populates a subset of fields, CompletePeerJob’s Updates(job) call is fine; if it ever starts zeroing other fields, you may want to switch to an explicit field list/map to avoid unintended overwrites.
management/internals/shared/grpc/server.go (1)

184-219: Job stream handling is generally sound; consider surfacing send failures

The new Job RPC and helpers hold up well:

  • Handshake reads a single encrypted JobRequest and reuses the existing parseRequest path, so message framing and crypto stay consistent with Sync/Login.
  • Account/peer lookup mirrors the Sync flow and correctly maps missing peers to PermissionDenied / Unauthenticated.
  • CreateJobChannel + deferred CloseChannel provide a clear per-peer lifecycle, and startResponseReceiver cleanly demultiplexes JobResponse messages into the job manager.

One point to consider tightening:

  • In sendJobsLoop, any failure in sendJob is logged and then the method returns nil, so the gRPC stream terminates with an OK status even if job delivery to the client failed. For parity with handleUpdates/sendUpdate, you may want to return the error (or a mapped status error) so callers can observe transport failures on the Job stream.

Also applies to: 333-397, 450-467

management/server/job/manager.go (2)

44-151: Pending‑job cleanup semantics are coarse; minor concurrency detail worth double‑checking

Overall behavior is coherent (fail stuck jobs, enqueue new ones, complete on response), but a few aspects could be sharpened:

  • CreateJobChannel calls MarkPendingJobsAsFailed for (accountID, peerID) before taking the lock. That's fine functionally, but it means "stuck too long" in the DB is actually "any pending job when a fresh channel is created"; if you later add explicit TTL semantics, you may want a more targeted query.
  • In SendJob, you look up the Channel under RLock and then call ch.AddEvent after releasing the lock. If CreateJobChannel or CloseChannel can call ch.Close() concurrently, make sure Channel.AddEvent is written to safely detect a closed channel and return ErrJobChannelClosed rather than panic on send to a closed chan.
  • Both cleanup and CloseChannel use MarkPendingJobsAsFailed(accountID, peerID, reason), which marks all pending DB jobs for that peer, not just the specific jobID. That’s acceptable if you only ever have a single pending job per peer, but if you later support multiple in‑flight jobs, you’ll likely want a job‑scoped failure path in the store.
  • In CloseChannel, the loop over jm.pending calls MarkPendingJobsAsFailed once per event for the same peer; a minor optimization is to call it once per peerID, then delete all relevant pending entries.

These are mostly behavioral clarifications and small optimizations rather than blockers.


90-117: HandleResponse flow is correct but could simplify error‑response coupling

HandleResponse correctly:

  • Looks up the pending event by jobID.
  • Builds a types.Job from the response via ApplyResponse.
  • Calls Store.CompletePeerJob and always removes the entry from pending.

Given that the event.Response field is only set when CompletePeerJob succeeds and then immediately dropped from pending, you could omit event.Response entirely or set it before calling the store to simplify reasoning. Current behavior is valid; this is mainly an internal API cleanup opportunity.

client/status/status.go (1)

122-169: Overview mapping from FullStatus looks correct; consider a nil guard

Switching ConvertToStatusOutputOverview to use *proto.FullStatus directly and threading in daemonVersion matches how the proto is structured and keeps the output fields aligned with the underlying status (management, signal, local peer, relays, DNS, events, SSH).

One small robustness improvement: if pbFullStatus can ever be nil (e.g., from an older daemon or error path), pbFullStatus.Get...() calls will panic. A quick upfront check like

if pbFullStatus == nil {
    return OutputOverview{}
}

would make the function safer against unexpected inputs.

management/server/types/job.go (2)

105-138: Minor: preserve wrapped error instead of using err.Error()

In BuildWorkloadResponse, you wrap bundle build errors as:

if err := j.buildBundleResponse(&wl); err != nil {
    return nil, status.Errorf(status.InvalidArgument, err.Error())
}

Using err.Error() as the format string loses the original error as a distinct argument and makes further wrapping/debugging slightly harder. Consider:

- return nil, status.Errorf(status.InvalidArgument, err.Error())
+ return nil, status.Errorf(status.InvalidArgument, "%v", err)

to keep the original error value intact and consistent with the rest of the file.


140-162: Bundle parameter validation is clear; consider tightening semantics only if needed

The validation for BundleForTime (1–5 minutes when BundleFor is true) and LogFileCount (1–1000) is straightforward and matches the API docs. You also normalize stored parameters to the BundleParameters JSON and initialize Result to {}, which keeps BuildWorkloadResponse happy for pending/failed jobs.

If you ever want to hard‑fail obviously nonsensical inputs when BundleFor is false (e.g., negative BundleForTime), you could extend the check here, but that’s optional and not strictly required by the current contract.

shared/management/http/api/types.gen.go (1)

2112-2169: WorkloadRequest helpers match server usage; consider using the WorkloadType constant

The As*/From*/Merge* helpers and Discriminator/ValueByDiscriminator for WorkloadRequest are consistent with how NewJob and validateAndBuildBundleParams consume the workload. One small optional improvement would be to use WorkloadTypeBundle instead of the literal "bundle" to avoid drift if the constant ever changes:

-func (t *WorkloadRequest) FromBundleWorkloadRequest(v BundleWorkloadRequest) error {
-    v.Type = "bundle"
+func (t *WorkloadRequest) FromBundleWorkloadRequest(v BundleWorkloadRequest) error {
+    v.Type = WorkloadTypeBundle

Same for MergeBundleWorkloadRequest and the switch in ValueByDiscriminator.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 32146e5 and 62ae8d8.

⛔ Files ignored due to path filters (4)
  • client/proto/daemon.pb.go is excluded by !**/*.pb.go
  • go.sum is excluded by !**/*.sum
  • shared/management/proto/management.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/management_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (52)
  • .github/workflows/wasm-build-validation.yml (1 hunks)
  • client/cmd/debug.go (1 hunks)
  • client/cmd/status.go (1 hunks)
  • client/cmd/testutil_test.go (3 hunks)
  • client/cmd/up.go (1 hunks)
  • client/embed/embed.go (1 hunks)
  • client/internal/connect.go (7 hunks)
  • client/internal/debug/debug.go (7 hunks)
  • client/internal/debug/upload.go (1 hunks)
  • client/internal/debug/upload_test.go (2 hunks)
  • client/internal/engine.go (13 hunks)
  • client/internal/engine_test.go (9 hunks)
  • client/jobexec/executor.go (1 hunks)
  • client/proto/daemon.proto (0 hunks)
  • client/server/debug.go (2 hunks)
  • client/server/server.go (4 hunks)
  • client/server/server_test.go (3 hunks)
  • client/status/status.go (4 hunks)
  • client/status/status_test.go (1 hunks)
  • client/ui/debug.go (2 hunks)
  • go.mod (2 hunks)
  • management/internals/server/boot.go (1 hunks)
  • management/internals/server/controllers.go (3 hunks)
  • management/internals/server/modules.go (1 hunks)
  • management/internals/shared/grpc/server.go (11 hunks)
  • management/server/account.go (4 hunks)
  • management/server/account/manager.go (1 hunks)
  • management/server/account_test.go (2 hunks)
  • management/server/activity/codes.go (2 hunks)
  • management/server/dns_test.go (2 hunks)
  • management/server/http/handlers/peers/peers_handler.go (3 hunks)
  • management/server/http/testing/testing_tools/channel/channel.go (3 hunks)
  • management/server/job/channel.go (1 hunks)
  • management/server/job/manager.go (1 hunks)
  • management/server/management_proto_test.go (4 hunks)
  • management/server/management_test.go (4 hunks)
  • management/server/mock_server/account_mock.go (2 hunks)
  • management/server/nameserver_test.go (2 hunks)
  • management/server/peer.go (1 hunks)
  • management/server/peer_test.go (5 hunks)
  • management/server/route_test.go (2 hunks)
  • management/server/store/sql_store.go (3 hunks)
  • management/server/store/store.go (1 hunks)
  • management/server/types/job.go (1 hunks)
  • shared/management/client/client.go (1 hunks)
  • shared/management/client/client_test.go (3 hunks)
  • shared/management/client/grpc.go (8 hunks)
  • shared/management/client/mock.go (2 hunks)
  • shared/management/http/api/generate.sh (1 hunks)
  • shared/management/http/api/openapi.yml (2 hunks)
  • shared/management/http/api/types.gen.go (8 hunks)
  • shared/management/proto/management.proto (2 hunks)
💤 Files with no reviewable changes (1)
  • client/proto/daemon.proto
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-13T00:29:53.247Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/cmd/ssh_exec_unix.go:53-74
Timestamp: 2025-11-13T00:29:53.247Z
Learning: In client/ssh/server/executor_unix.go, the method ExecuteWithPrivilegeDrop(ctx context.Context, config ExecutorConfig) has a void return type (no error return). It handles failures by exiting the process directly with appropriate exit codes rather than returning errors to the caller.

Applied to files:

  • client/jobexec/executor.go
📚 Learning: 2025-11-14T13:05:31.729Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/ssh/server/userswitching_windows.go:89-139
Timestamp: 2025-11-14T13:05:31.729Z
Learning: In client/ssh/server/executor_windows.go, the WindowsExecutorConfig struct's Pty, PtyWidth, and PtyHeight fields are intentionally left unused for now and will be implemented in a future update.

Applied to files:

  • client/jobexec/executor.go
🧬 Code graph analysis (42)
shared/management/client/client.go (3)
management/server/types/job.go (1)
  • Job (34-58)
shared/management/http/api/types.gen.go (2)
  • JobRequest (708-710)
  • JobResponse (713-721)
shared/management/proto/management.pb.go (6)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
management/internals/server/controllers.go (3)
management/internals/server/server.go (1)
  • BaseServer (45-68)
management/server/job/manager.go (2)
  • Manager (23-30)
  • NewJobManager (32-42)
management/internals/server/container.go (1)
  • Create (6-10)
client/status/status_test.go (1)
client/status/status.go (1)
  • ConvertToStatusOutputOverview (122-169)
client/internal/debug/upload_test.go (2)
client/internal/debug/upload.go (1)
  • UploadDebugBundle (17-28)
upload-server/types/upload.go (1)
  • GetURLPath (9-9)
management/server/store/store.go (1)
management/server/types/job.go (1)
  • Job (34-58)
management/internals/server/boot.go (1)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
client/internal/debug/upload.go (1)
upload-server/types/upload.go (3)
  • GetURLResponse (15-18)
  • ClientHeader (5-5)
  • ClientHeaderValue (7-7)
client/cmd/up.go (1)
util/log.go (1)
  • FindFirstLogPath (77-84)
management/server/account_test.go (2)
management/server/account.go (1)
  • BuildManager (180-268)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/peer.go (7)
management/server/account.go (1)
  • DefaultAccountManager (68-113)
management/server/types/job.go (2)
  • Job (34-58)
  • Workload (60-64)
management/server/permissions/modules/module.go (1)
  • Peers (7-7)
management/server/permissions/operations/operation.go (1)
  • Delete (9-9)
shared/management/status/error.go (5)
  • NewPermissionValidationError (213-215)
  • NewPermissionDeniedError (209-211)
  • NewPeerNotPartOfAccountError (105-107)
  • Errorf (70-75)
  • Type (46-46)
management/server/store/store.go (3)
  • Store (50-211)
  • LockingStrengthNone (47-47)
  • LockingStrengthUpdate (43-43)
management/server/activity/codes.go (1)
  • JobCreatedByUser (183-183)
management/server/http/testing/testing_tools/channel/channel.go (2)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/server/modules.go (1)
management/server/account.go (1)
  • BuildManager (180-268)
shared/management/client/mock.go (3)
shared/management/http/api/types.gen.go (2)
  • JobRequest (708-710)
  • JobResponse (713-721)
shared/management/proto/management.pb.go (6)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
management/server/types/job.go (1)
  • Job (34-58)
management/server/account/manager.go (1)
management/server/types/job.go (1)
  • Job (34-58)
client/embed/embed.go (1)
client/internal/connect.go (1)
  • NewConnectClient (51-62)
client/cmd/status.go (1)
client/status/status.go (1)
  • ConvertToStatusOutputOverview (122-169)
management/server/management_test.go (1)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/peer_test.go (2)
management/server/account.go (1)
  • BuildManager (180-268)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
client/server/server.go (1)
client/status/status.go (1)
  • ToProtoFullStatus (549-635)
management/server/mock_server/account_mock.go (1)
management/server/types/job.go (1)
  • Job (34-58)
management/server/nameserver_test.go (2)
management/server/account.go (1)
  • BuildManager (180-268)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/job/channel.go (1)
management/server/job/manager.go (1)
  • Event (17-21)
management/server/http/handlers/peers/peers_handler.go (4)
management/server/context/auth.go (1)
  • GetUserAuthFromContext (25-30)
shared/management/http/util/util.go (3)
  • WriteError (84-120)
  • WriteErrorResponse (70-80)
  • WriteJSONObject (27-35)
shared/management/http/api/types.gen.go (3)
  • JobRequest (708-710)
  • JobResponse (713-721)
  • JobResponseStatus (724-724)
management/server/types/job.go (3)
  • NewJob (67-103)
  • Job (34-58)
  • Workload (60-64)
client/jobexec/executor.go (3)
client/internal/debug/debug.go (3)
  • GeneratorDependencies (238-243)
  • BundleConfig (232-236)
  • NewBundleGenerator (245-264)
client/internal/debug/upload.go (1)
  • UploadDebugBundle (17-28)
upload-server/types/upload.go (1)
  • DefaultBundleURL (11-11)
client/server/debug.go (1)
client/internal/debug/upload.go (1)
  • UploadDebugBundle (17-28)
client/internal/engine.go (6)
client/internal/profilemanager/config.go (1)
  • Config (89-160)
client/jobexec/executor.go (3)
  • Executor (23-24)
  • NewExecutor (26-28)
  • ErrJobNotImplemented (20-20)
shared/management/client/client.go (1)
  • Client (14-27)
shared/management/proto/management.pb.go (21)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
  • JobStatus_failed (30-30)
  • JobRequest_Bundle (457-459)
  • JobRequest_Bundle (461-461)
  • JobStatus_succeeded (29-29)
  • BundleParameters (554-563)
  • BundleParameters (578-578)
  • BundleParameters (593-595)
  • JobResponse_Bundle (548-550)
  • JobResponse_Bundle (552-552)
  • SyncResponse (721-738)
  • SyncResponse (753-753)
  • SyncResponse (768-770)
  • BundleResult (625-631)
  • BundleResult (646-646)
  • BundleResult (661-663)
client/internal/state.go (1)
  • CtxGetState (31-33)
client/internal/debug/debug.go (2)
  • GeneratorDependencies (238-243)
  • BundleConfig (232-236)
management/internals/shared/grpc/server.go (6)
management/server/account/manager.go (1)
  • Manager (27-131)
management/server/job/manager.go (2)
  • Manager (23-30)
  • Event (17-21)
shared/management/proto/management_grpc.pb.go (1)
  • ManagementService_JobServer (427-431)
shared/management/proto/management.pb.go (9)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
  • EncryptedMessage (322-333)
  • EncryptedMessage (348-348)
  • EncryptedMessage (363-365)
management/server/job/channel.go (2)
  • Channel (18-21)
  • ErrJobChannelClosed (15-15)
encryption/message.go (1)
  • EncryptMessage (10-24)
management/server/store/sql_store.go (1)
management/server/types/job.go (3)
  • Job (34-58)
  • JobStatusPending (18-18)
  • JobStatusFailed (20-20)
management/server/job/manager.go (3)
shared/management/proto/management.pb.go (6)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
management/server/job/channel.go (2)
  • Channel (18-21)
  • NewChannel (23-29)
management/server/types/job.go (1)
  • Job (34-58)
shared/management/client/client_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
client/internal/connect.go (2)
client/internal/engine.go (2)
  • NewEngine (235-276)
  • EngineConfig (83-142)
client/internal/profilemanager/config.go (1)
  • Config (89-160)
management/server/dns_test.go (2)
management/server/account.go (1)
  • BuildManager (180-268)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
shared/management/client/grpc.go (3)
shared/management/proto/management.pb.go (13)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
  • EncryptedMessage (322-333)
  • EncryptedMessage (348-348)
  • EncryptedMessage (363-365)
  • JobStatus_failed (30-30)
  • SyncResponse (721-738)
  • SyncResponse (753-753)
  • SyncResponse (768-770)
shared/management/proto/management_grpc.pb.go (2)
  • ManagementService_JobClient (169-173)
  • ManagementService_SyncClient (89-92)
encryption/message.go (2)
  • EncryptMessage (10-24)
  • DecryptMessage (27-40)
management/server/types/job.go (3)
shared/management/proto/management.pb.go (22)
  • JobStatus (25-25)
  • JobStatus (57-59)
  • JobStatus (61-63)
  • JobStatus (70-72)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • BundleParameters (554-563)
  • BundleParameters (578-578)
  • BundleParameters (593-595)
  • BundleResult (625-631)
  • BundleResult (646-646)
  • BundleResult (661-663)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
  • JobStatus_succeeded (29-29)
  • JobStatus_failed (30-30)
  • JobResponse_Bundle (548-550)
  • JobResponse_Bundle (552-552)
  • JobRequest_Bundle (457-459)
  • JobRequest_Bundle (461-461)
shared/management/http/api/types.gen.go (8)
  • JobRequest (708-710)
  • WorkloadResponse (1948-1950)
  • BundleParameters (361-373)
  • BundleResult (376-378)
  • BundleWorkloadResponse (391-399)
  • WorkloadTypeBundle (195-195)
  • WorkloadRequest (1943-1945)
  • JobResponse (713-721)
shared/management/status/error.go (4)
  • Errorf (70-75)
  • BadRequest (36-36)
  • InvalidArgument (27-27)
  • Error (54-57)
client/server/server_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
client/internal/debug/debug.go (4)
util/log.go (1)
  • SpecialLogs (25-28)
client/internal/profilemanager/profilemanager.go (1)
  • NewProfileManager (56-58)
client/status/status.go (3)
  • ToProtoFullStatus (549-635)
  • ConvertToStatusOutputOverview (122-169)
  • ParseToFullDetailSummary (532-547)
version/version.go (1)
  • NetbirdVersion (18-20)
client/internal/engine_test.go (5)
client/internal/engine.go (2)
  • NewEngine (235-276)
  • EngineConfig (83-142)
shared/management/client/mock.go (1)
  • MockClient (13-24)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
shared/management/http/api/types.gen.go (2)
shared/management/proto/management.pb.go (9)
  • BundleParameters (554-563)
  • BundleParameters (578-578)
  • BundleParameters (593-595)
  • BundleResult (625-631)
  • BundleResult (646-646)
  • BundleResult (661-663)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
management/server/types/job.go (1)
  • Workload (60-64)
management/server/route_test.go (2)
management/server/account.go (1)
  • BuildManager (180-268)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
client/status/status.go (2)
client/proto/daemon.pb.go (21)
  • FullStatus (1994-2008)
  • FullStatus (2021-2021)
  • FullStatus (2036-2038)
  • ManagementState (1682-1689)
  • ManagementState (1702-1702)
  • ManagementState (1717-1719)
  • SignalState (1621-1628)
  • SignalState (1641-1641)
  • SignalState (1656-1658)
  • LocalPeerState (1528-1539)
  • LocalPeerState (1552-1552)
  • LocalPeerState (1567-1569)
  • PeerState (1347-1369)
  • PeerState (1382-1382)
  • PeerState (1397-1399)
  • RelayState (1743-1750)
  • RelayState (1763-1763)
  • RelayState (1778-1780)
  • NSGroupState (1803-1811)
  • NSGroupState (1824-1824)
  • NSGroupState (1839-1841)
client/internal/peer/status.go (6)
  • FullStatus (151-161)
  • ManagementState (128-132)
  • SignalState (121-125)
  • LocalPeerState (106-112)
  • RosenpassState (135-138)
  • NSGroupState (142-148)
management/server/management_proto_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
client/cmd/testutil_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: release_ui
  • GitHub Check: release_ui_darwin
  • GitHub Check: release
  • GitHub Check: iOS / Build
  • GitHub Check: Android / Build
  • GitHub Check: JS / Lint
  • GitHub Check: Windows
  • GitHub Check: Linux
  • GitHub Check: Darwin
  • GitHub Check: Client / Unit
  • GitHub Check: Client / Unit
  • GitHub Check: Build Cache
  • GitHub Check: Check External GPL/AGPL Licenses
  • GitHub Check: Client / Unit

Comment thread client/jobexec/executor.go
Comment thread management/server/job/channel.go
Comment thread management/server/peer.go
Comment thread management/server/types/job.go

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62ae8d8 and de78255.

📒 Files selected for processing (3)
  • client/internal/connect.go (7 hunks)
  • client/internal/engine.go (13 hunks)
  • client/internal/engine_test.go (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
client/internal/engine_test.go (5)
client/internal/engine.go (2)
  • NewEngine (235-276)
  • EngineConfig (83-142)
shared/management/client/mock.go (1)
  • MockClient (13-24)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
client/internal/engine.go (7)
client/internal/profilemanager/config.go (1)
  • Config (89-160)
client/jobexec/executor.go (3)
  • Executor (23-24)
  • NewExecutor (26-28)
  • ErrJobNotImplemented (20-20)
shared/management/client/client.go (1)
  • Client (14-27)
shared/management/proto/management.pb.go (19)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
  • JobStatus_failed (30-30)
  • JobRequest_Bundle (457-459)
  • JobRequest_Bundle (461-461)
  • JobStatus_succeeded (29-29)
  • BundleParameters (554-563)
  • BundleParameters (578-578)
  • BundleParameters (593-595)
  • SyncResponse (721-738)
  • SyncResponse (753-753)
  • SyncResponse (768-770)
  • BundleResult (625-631)
  • BundleResult (646-646)
  • BundleResult (661-663)
client/internal/state.go (1)
  • CtxGetState (31-33)
client/internal/debug/debug.go (2)
  • GeneratorDependencies (238-243)
  • BundleConfig (232-236)
client/server/server.go (1)
  • New (96-107)
client/internal/connect.go (3)
client/internal/engine.go (2)
  • NewEngine (235-276)
  • EngineConfig (83-142)
client/internal/profilemanager/config.go (1)
  • Config (89-160)
shared/management/proto/management.pb.go (3)
  • PeerConfig (2082-2098)
  • PeerConfig (2113-2113)
  • PeerConfig (2128-2130)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: Relay / Unit (amd64, -race)
  • GitHub Check: Management / Unit (amd64, mysql)
  • GitHub Check: Management / Unit (amd64, postgres)
  • GitHub Check: Management / Unit (amd64, sqlite)
  • GitHub Check: Relay / Unit (386)
  • GitHub Check: Management / Integration (amd64, postgres)
  • GitHub Check: Management / Benchmark (API) (amd64, postgres)
  • GitHub Check: Client (Docker) / Unit
  • GitHub Check: Management / Benchmark (API) (amd64, sqlite)
  • GitHub Check: Client / Unit (amd64)
  • GitHub Check: Management / Benchmark (amd64, sqlite)
  • GitHub Check: Management / Benchmark (amd64, postgres)
  • GitHub Check: Client / Unit (386)
  • GitHub Check: Client / Unit
  • GitHub Check: release
  • GitHub Check: release_ui_darwin
  • GitHub Check: Android / Build
  • GitHub Check: Linux
  • GitHub Check: Client / Unit
  • GitHub Check: Darwin
  • GitHub Check: Windows
  • GitHub Check: JS / Lint
  • GitHub Check: Client / Unit
🔇 Additional comments (16)
client/internal/connect.go (5)

65-66: LGTM! logPath parameter threading is clean.

The public Run signature now accepts logPath and forwards it correctly. Mobile variants appropriately default to empty string, which is sensible given platform-specific logging constraints.

Also applies to: 87-87, 105-105


108-108: LGTM!

Internal run signature correctly updated to receive logPath.


251-251: LGTM!

Call to createEngineConfig correctly passes logPath parameter.


275-275: LGTM!

NewEngine call correctly updated with profile config parameter, enabling debug bundle generation support.


414-414: LGTM! EngineConfig enriched for debug bundle support.

The function signature and field assignments correctly thread logPath and ProfileConfig through to EngineConfig. Note that the entire profilemanager.Config is stored in ProfileConfig—this appears intentional for debug bundle generation, which may need access to ManagementURL and other config details.

Also applies to: 449-452

client/internal/engine_test.go (3)

28-28: LGTM!

Import added to support JobManager usage in test setup.


256-256: LGTM! Test updates are consistent.

All NewEngine calls correctly updated with two additional nil parameters (checks, profileConfig). Using nil is appropriate for tests that don't exercise debug bundle or checks functionality.

Also applies to: 418-424, 637-643, 802-808, 1004-1010, 1536-1536


1595-1595: LGTM! Test infrastructure correctly wired.

JobManager created and threaded through BuildManager and NewServer, enabling job subsystem testing. Using nil for metrics is appropriate for test context.

Also applies to: 1626-1626, 1632-1632

client/internal/engine.go (8)

34-34: LGTM!

Imports added to support debug bundle generation and job execution.

Also applies to: 52-52


138-141: LGTM!

EngineConfig fields added to support debug bundle generation. Clear comment documents the purpose.


206-207: LGTM! Proper concurrency primitives added.

syncRespMux provides dedicated locking for sync response persistence, separate from syncMsgMux. jobExecutorWG ensures clean shutdown of job-related goroutines.

Also applies to: 220-221


234-234: LGTM!

NewEngine signature correctly extended with profile config parameter. jobExecutor initialized during construction for handling management-service job requests.

Also applies to: 254-254


335-335: LGTM! Proper shutdown ordering.

jobExecutorWG.Wait() ensures job-related goroutines complete before resource cleanup. The preceding e.cancel() will signal job goroutines to exit via context cancellation.


523-523: LGTM! receiveJobEvents follows established patterns.

The function correctly:

  • Increments jobExecutorWG before spawning goroutine
  • Handles job requests synchronously via callback
  • Triggers client restart on Job stream errors (consistent with Management/Signal event handlers)
  • Returns proper error codes for unimplemented job types

Also applies to: 942-977


803-814: LGTM! Proper lock separation for sync response persistence.

The code correctly uses syncRespMux (not syncMsgMux) to protect sync response persistence state: the flag is read under RLock, the read lock is released, and a separate write lock is taken to store the response. Since Go's sync.RWMutex cannot be upgraded in place, this release-then-reacquire sequence is the correct pattern.


1874-1886: LGTM! Thread-safe sync response persistence.

Both functions correctly use syncRespMux for protection. GetLatestSyncResponse properly:

  • Reads state under RLock
  • Releases lock before expensive clone operation
  • Returns a clone to prevent external mutation

Also applies to: 1889-1911

Comment thread client/internal/engine.go
}

// GetJobByID fetches job by ID
func (s *SqlStore) GetPeerJobByID(ctx context.Context, accountID, jobID string) (*types.Job, error) {
Collaborator


This method should also verify the peer this is coming from

Comment thread client/jobexec/executor.go Outdated

func (s *Server) startResponseReceiver(ctx context.Context, srv proto.ManagementService_JobServer) {
go func() {
for {
Collaborator


Ideally we should also check if the ctx is done here or Err() != nil

Comment thread client/jobexec/executor.go
Comment thread management/server/store/sql_store.go
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
client/internal/engine.go (1)

1295-1326: Fix compile error: zone.GetSearchDomainDisabled is not defined on mgmProto.CustomZone

toDNSConfig currently does:

dnsZone := nbdns.CustomZone{
    Domain:               zone.GetDomain(),
    SearchDomainDisabled: zone.GetSearchDomainDisabled(),
    SkipPTRProcess:       zone.GetSkipPTRProcess(),
}

The pipeline error indicates GetSearchDomainDisabled does not exist on *mgmProto.CustomZone, so this code does not compile. You need to either:

  • Add a corresponding search_domain_disabled (or similarly named) field to the CustomZone message in shared/management/proto/management.proto and regenerate the Go bindings, or
  • Drop this mapping for now and rely on the default value in nbdns.CustomZone.

If you just want the current code to build without changing the proto, adjust as:

-        dnsZone := nbdns.CustomZone{
-            Domain:               zone.GetDomain(),
-            SearchDomainDisabled: zone.GetSearchDomainDisabled(),
-            SkipPTRProcess:       zone.GetSkipPTRProcess(),
-        }
+        dnsZone := nbdns.CustomZone{
+            Domain:         zone.GetDomain(),
+            SkipPTRProcess: zone.GetSkipPTRProcess(),
+        }

and reintroduce the mapping once the proto field exists.

♻️ Duplicate comments (1)
client/internal/engine.go (1)

958-993: Nil pointer risk when using ProfileConfig.ManagementURL in bundle job handler

handleBundle calls:

uploadKey, err := e.jobExecutor.BundleJob(
    e.ctx, bundleDeps, bundleJobParams, waitFor,
    e.config.ProfileConfig.ManagementURL.String(),
)

without checking whether e.config.ProfileConfig or ProfileConfig.ManagementURL are nil. Either being nil will panic when a bundle job arrives.

Add a defensive check before constructing bundleDeps and the URL, and return a failed job response if the configuration is missing. For example:

 func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobResponse_Bundle, error) {
@@
-    syncResponse, err := e.GetLatestSyncResponse()
+    syncResponse, err := e.GetLatestSyncResponse()
@@
-    bundleDeps := debug.GeneratorDependencies{
-        InternalConfig: e.config.ProfileConfig,
+    if e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
+        return nil, errors.New("profile config or management URL not available for bundle generation")
+    }
+
+    bundleDeps := debug.GeneratorDependencies{
+        InternalConfig: e.config.ProfileConfig,
@@
-    waitFor := time.Duration(params.BundleForTime) * time.Minute
-
-    uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
+    waitFor := time.Duration(params.BundleForTime) * time.Minute
+    managementURL := e.config.ProfileConfig.ManagementURL.String()
+
+    uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, managementURL)

This ensures the job handler fails gracefully instead of panicking the client.

Also applies to: 995-1028

🧹 Nitpick comments (7)
shared/management/client/client_test.go (1)

76-76: Consider passing metrics to JobManager instead of nil.

The JobManager is created with nil for the metrics parameter, but a valid metrics object is created just below on lines 100-101. Consider reordering the initialization to create metrics first, then pass it to NewJobManager, unless nil metrics is intentionally acceptable for test scenarios.

Apply this diff to use the actual metrics object:

+	metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
+	require.NoError(t, err)
+
-	jobManager := job.NewJobManager(nil, store)
+	jobManager := job.NewJobManager(metrics, store)
 	eventStore := &activity.InMemoryEventStore{}
 
 	ctrl := gomock.NewController(t)
 	t.Cleanup(ctrl.Finish)
 
 	permissionsManagerMock := permissions.NewMockManager(ctrl)
 	permissionsManagerMock.
 		EXPECT().
 		ValidateUserPermissions(
 			gomock.Any(),
 			gomock.Any(),
 			gomock.Any(),
 			gomock.Any(),
 			gomock.Any(),
 		).
 		Return(true, nil).
 		AnyTimes()
 
 	peersManger := peers.NewManager(store, permissionsManagerMock)
 	settingsManagerMock := settings.NewMockManager(ctrl)
 
 	ia, _ := integrations.NewIntegratedValidator(context.Background(), peersManger, settingsManagerMock, eventStore)
 
-	metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
-	require.NoError(t, err)
-
shared/management/proto/management.proto (1)

91-96: Consider adding documentation for BundleParameters fields.

The field names could benefit from proto comments for clarity:

  • bundle_for is ambiguous—it's unclear what this flag enables without context from the HTTP API docs
  • bundle_for_time should document its unit (minutes, based on the HTTP API)
 message BundleParameters {
-  bool   bundle_for       = 1;
-  int64  bundle_for_time  = 2;
+  // Whether to generate a bundle for a specific timeframe.
+  bool   bundle_for       = 1;
+  // Time period in minutes for which to generate the bundle.
+  int64  bundle_for_time  = 2;
   int32  log_file_count   = 3;
+  // Whether sensitive data should be anonymized in the bundle.
   bool   anonymize        = 4;
 }
management/server/account/manager.go (1)

126-128: Normalize parameter ordering for peer job Manager methods

CreatePeerJob takes (accountID, peerID, userID string, ...), while GetAllPeerJobs and GetPeerJobByID use (accountID, userID, peerID string, ...). This asymmetry is easy to trip over in implementations and mocks.

Consider standardizing on one ordering (e.g. ctx, accountID, userID, peerID, ...) for all three methods and updating implementations accordingly.

client/internal/engine_test.go (1)

239-257: Consider centralizing NewEngine test construction to avoid duplicated nil, nil arguments

All the NewEngine(...) calls now end with ..., peer.NewRecorder("https://mgm"), nil, nil). This is correct for the current signature, but the repeated trailing nil, nil across multiple tests makes future signature changes easy to miss.

Consider a small helper (e.g. newTestEngine(t, ctx, cfg, mgmtClientOpts...)) that sets up the common arguments (including the last two) so only that helper needs updating if NewEngine evolves again.

Also applies to: 417-425, 637-644, 801-809, 1003-1011, 1536-1537

management/internals/shared/grpc/server.go (2)

55-77: Consider guarding against a nil jobManager in server construction

Server now depends on jobManager (CreateJobChannel, CloseChannel, HandleResponse), but NewServer accepts a *job.Manager without validating it. If any caller passes nil, the Job RPC will panic. Consider either:

  • Requiring non-nil and returning an error from NewServer if jobManager == nil, or
  • Making the Job handler explicitly reject calls when jobManager is nil with a gRPC Internal/Unavailable error.

This keeps failures explicit instead of panicking.

Also applies to: 79-132


175-210: Job streaming flow is coherent; error handling strategy is acceptable but could be tightened

The Job RPC handshake (decrypt first message, resolve account/peer, set context keys), background response receiver (startResponseReceiver), and sendJobsLoop/sendJob pipeline all look logically consistent and reuse existing encryption and context patterns from Sync. The goroutine cleanly exits on io.EOF/context.Canceled, and the job channel is closed via defer CloseChannel.

The only notable tradeoff is that sendJobsLoop logs and returns nil on sendJob failures, so the client sees a successful stream close rather than an error status; you already left a TODO to refine this. Given that, the current implementation is reasonable for an initial version.

Also applies to: 327-390, 450-473

client/internal/engine.go (1)

235-255: Unused c *profilemanager.Config parameter in NewEngine

NewEngine accepts a c *profilemanager.Config argument but doesn’t use it; all configuration is read from config *EngineConfig instead. Unless future work will use c, consider removing this parameter to avoid confusion or, alternatively, wiring c into config.ProfileConfig here.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de78255 and 3e53c01.

⛔ Files ignored due to path filters (2)
  • client/proto/daemon.pb.go is excluded by !**/*.pb.go
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (29)
  • client/cmd/testutil_test.go (4 hunks)
  • client/internal/engine.go (13 hunks)
  • client/internal/engine_test.go (10 hunks)
  • client/proto/daemon.proto (0 hunks)
  • client/server/server.go (4 hunks)
  • client/server/server_test.go (4 hunks)
  • go.mod (2 hunks)
  • management/internals/server/boot.go (1 hunks)
  • management/internals/server/controllers.go (3 hunks)
  • management/internals/server/modules.go (1 hunks)
  • management/internals/shared/grpc/server.go (10 hunks)
  • management/server/account.go (4 hunks)
  • management/server/account/manager.go (1 hunks)
  • management/server/account_test.go (2 hunks)
  • management/server/activity/codes.go (2 hunks)
  • management/server/dns_test.go (2 hunks)
  • management/server/http/handlers/peers/peers_handler.go (3 hunks)
  • management/server/http/testing/testing_tools/channel/channel.go (3 hunks)
  • management/server/management_proto_test.go (4 hunks)
  • management/server/management_test.go (3 hunks)
  • management/server/mock_server/account_mock.go (2 hunks)
  • management/server/nameserver_test.go (2 hunks)
  • management/server/peer.go (1 hunks)
  • management/server/peer_test.go (5 hunks)
  • management/server/route_test.go (2 hunks)
  • shared/management/client/client_test.go (4 hunks)
  • shared/management/http/api/openapi.yml (2 hunks)
  • shared/management/http/api/types.gen.go (8 hunks)
  • shared/management/proto/management.proto (2 hunks)
💤 Files with no reviewable changes (1)
  • client/proto/daemon.proto
🚧 Files skipped from review as they are similar to previous changes (12)
  • management/server/account_test.go
  • management/server/dns_test.go
  • management/server/account.go
  • management/internals/server/modules.go
  • management/server/route_test.go
  • management/server/http/testing/testing_tools/channel/channel.go
  • management/server/peer_test.go
  • client/cmd/testutil_test.go
  • management/server/activity/codes.go
  • management/server/management_test.go
  • management/server/peer.go
  • management/server/nameserver_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.

Applied to files:

  • shared/management/client/client_test.go
🧬 Code graph analysis (10)
management/server/account/manager.go (1)
management/server/types/job.go (1)
  • Job (34-58)
client/server/server_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (178-266)
management/internals/shared/grpc/server.go (1)
  • NewServer (80-132)
management/internals/server/boot.go (1)
management/internals/shared/grpc/server.go (1)
  • NewServer (80-132)
management/server/mock_server/account_mock.go (1)
management/server/types/job.go (1)
  • Job (34-58)
management/server/management_proto_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (178-266)
management/internals/shared/grpc/server.go (1)
  • NewServer (80-132)
shared/management/client/client_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (178-266)
management/internals/shared/grpc/server.go (1)
  • NewServer (80-132)
client/server/server.go (1)
client/status/status.go (1)
  • ToProtoFullStatus (549-635)
management/internals/server/controllers.go (3)
management/internals/server/server.go (1)
  • BaseServer (45-68)
management/server/job/manager.go (2)
  • Manager (23-30)
  • NewJobManager (32-42)
management/internals/server/container.go (1)
  • Create (6-10)
management/server/http/handlers/peers/peers_handler.go (4)
management/server/context/auth.go (1)
  • GetUserAuthFromContext (25-30)
shared/management/http/util/util.go (3)
  • WriteError (84-120)
  • WriteErrorResponse (70-80)
  • WriteJSONObject (27-35)
shared/management/http/api/types.gen.go (3)
  • JobRequest (708-710)
  • JobResponse (713-721)
  • JobResponseStatus (724-724)
management/server/types/job.go (3)
  • NewJob (67-103)
  • Job (34-58)
  • Workload (60-64)
client/internal/engine_test.go (3)
client/internal/engine.go (2)
  • NewEngine (235-276)
  • EngineConfig (83-142)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (178-266)
🪛 GitHub Actions: Linux
client/internal/engine.go

[error] 1312-1312: zone.GetSearchDomainDisabled undefined (type *"github.com/netbirdio/netbird/shared/management/proto".CustomZone has no field or method GetSearchDomainDisabled)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Android / Build
  • GitHub Check: release_ui_darwin
  • GitHub Check: release
  • GitHub Check: release_ui
  • GitHub Check: JS / Lint
  • GitHub Check: iOS / Build
  • GitHub Check: Linux
  • GitHub Check: Darwin
  • GitHub Check: Windows
  • GitHub Check: Client / Unit
  • GitHub Check: Check External GPL/AGPL Licenses
  • GitHub Check: Client / Unit
  • GitHub Check: Client / Unit
🔇 Additional comments (21)
shared/management/client/client_test.go (2)

27-27: LGTM!

The import addition is necessary to support JobManager instantiation in the test setup.


123-123: LGTM!

The BuildManager call correctly includes the jobManager parameter in the expected position (5th parameter).

management/internals/server/controllers.go (1)

29-33: LGTM!

The JobManager() accessor correctly follows the established lazy initialization pattern using the Create helper, consistent with other accessors like PeersUpdateManager() and IntegratedValidator().

shared/management/proto/management.proto (2)

51-53: LGTM!

The bidirectional streaming RPC correctly follows the encrypted message pattern used by other management service endpoints, maintaining consistency in the API design.


75-89: Potential inconsistency between proto and HTTP API status values.

The JobStatus enum here defines unknown_status, succeeded, and failed, but the HTTP API types (JobResponseStatus in types.gen.go) include an additional pending status. If jobs can be in a pending state before completion, consider adding it to the proto enum for consistency:

 enum JobStatus {
   unknown_status = 0; //placeholder
-  succeeded = 1;
-  failed = 2;
+  pending = 1;
+  succeeded = 2;
+  failed = 3;
 }

Also, Reason on line 84 uses PascalCase while proto convention typically uses snake_case (reason).

shared/management/http/api/types.gen.go (1)

1-4: This is auto-generated code.

This file is generated by oapi-codegen. Any issues with the types or structure should be addressed in the source OpenAPI specification rather than this file.

management/internals/server/boot.go (1)

148-148: LGTM!

The JobManager is correctly wired into the gRPC server construction, following the established dependency injection pattern. The parameter position matches the NewServer function signature in management/internals/shared/grpc/server.go.

management/server/management_proto_test.go (2)

343-343: LGTM!

The JobManager instantiation with nil metrics is acceptable for test purposes—the implementation handles nil metrics gracefully.


372-373: LGTM!

The test correctly wires jobManager into both BuildManager and nbgrpc.NewServer, matching the updated function signatures and maintaining consistency with the production code in boot.go.

Also applies to: 386-386

client/server/server.go (2)

31-33: Status conversion now correctly uses shared helper

Switching to nbstatus.ToProtoFullStatus(fullStatus) centralizes the in‑memory→proto mapping and keeps this server aligned with the status package; the subsequent mutation of Events and SshServerState still fits the previous behaviour.

Also applies to: 1080-1088


1537-1545: ConnectClient.Run wiring matches new signature

Passing s.logFile into s.connectClient.Run(runningChan, s.logFile) correctly adapts to the new Run signature and ensures the engine has access to the daemon’s log path without altering existing retry semantics.

shared/management/http/api/openapi.yml (3)

35-38: Jobs tag fits existing API categorisation

Adding the Jobs tag (marked x-experimental) matches the tagging style of other feature areas and clearly scopes the new endpoints.


41-156: Workload and Job schemas are consistent and extensible

The WorkloadType/Bundle* schemas plus WorkloadRequest/Response and JobRequest/JobResponse are structured cleanly:

  • Discriminator on type with bundle mapping is valid and forward‑extensible.
  • JobResponse required fields avoid nullable properties (completed_at, failed_reason) while guaranteeing id, status, triggered_by, and workload.
  • Bundle parameter/result fields match the described bundle use case.

No structural issues from an OpenAPI standpoint.


2353-2456: Peer Job endpoints integrate cleanly into peers API

/api/peers/{peerId}/jobs (GET/POST) and /api/peers/{peerId}/jobs/{jobId} (GET):

  • Use the new JobRequest/JobResponse schemas correctly.
  • Reuse BearerAuth and TokenAuth security consistent with other peer operations.
  • Return an array for list, 201 with a single JobResponse for create, and 200 for get.

The shape and placement of these endpoints look correct and consistent with the rest of the spec.

client/server/server_test.go (1)

23-24: JobManager wiring in management test server matches production setup

Creating a jobManager := job.NewJobManager(nil, store) and threading it into server.BuildManager(...) and nbgrpc.NewServer(...) keeps the test server aligned with the new job‑enabled construction path without altering existing behaviour of this test.

Also applies to: 298-332

client/internal/engine_test.go (1)

27-29: Engine tests correctly integrate JobManager into management server startup

Importing management/server/job, constructing jobManager := job.NewJobManager(nil, store), and passing it through to server.BuildManager(...) and nbgrpc.NewServer(...) keeps this test’s management server in sync with the new job‑aware construction path and should let engine tests run against the same wiring as production.

Also applies to: 1595-1636

management/server/http/handlers/peers/peers_handler.go (2)

31-41: Peer job HTTP endpoints and handlers look consistent with existing patterns

The new /peers/{peerId}/jobs and /peers/{peerId}/jobs/{jobId} routes plus CreateJob, ListJobs, and GetJob follow the existing handler conventions: user auth from context, mux.Vars for IDs, JSON decoding with proper 400 on parse failure, and error propagation via util.WriteError. Delegation to accountManager for authorization and data access is appropriate. I don’t see correctness or security issues in this flow.

Also applies to: 51-142


594-614: Job→API response mapping is correct and defensive

toSingleJobResponse builds api.JobResponse from types.Job cleanly, including building the workload via BuildWorkloadResponse, handling optional FailedReason, and preserving timestamps and status. This matches the shared API types and is safe as long as BuildWorkloadResponse returns a non-nil workload when err == nil, which the handler already checks.

management/server/mock_server/account_mock.go (1)

132-155: Job-related mock hooks are wired correctly

The new CreatePeerJobFunc, GetAllPeerJobsFunc, and GetPeerJobByIDFunc fields and their methods mirror the existing mock style (delegate if set, otherwise return codes.Unimplemented). This should integrate cleanly with the expanded account.Manager interface and make job behavior easily configurable in tests.

client/internal/engine.go (2)

205-212: Sync response persistence refactor with syncRespMux looks correct

The move from using syncMsgMux to a dedicated syncRespMux for persistSyncResponse/latestSyncResponse removes the race between sync handling and debug bundle reads:

  • handleSync now checks persistSyncResponse under RLock and conditionally writes latestSyncResponse under Lock.
  • SetSyncResponsePersistence and GetLatestSyncResponse both operate entirely under syncRespMux, with GetLatestSyncResponse cloning before returning.

The lock usage avoids deadlocks (no upgrade while holding RLock) and keeps sync message processing serialized via syncMsgMux as before.

Also applies to: 770-841, 1891-1929


958-993: Job event client wiring matches existing management/signal patterns

receiveJobEvents mirrors the existing receiveManagementEvents/receiveSignalEvents patterns: it uses a dedicated wait group, runs mgmClient.Job with the engine context, maps terminal errors to a global client restart (ErrResetConnection + clientCancel), and defaults job status to failed unless the handler succeeds. That’s a good, conservative default and makes job-stream failures consistent with other control-plane failures.

Comment thread shared/management/client/client_test.go Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
shared/management/client/client_test.go (1)

76-135: Construct jobManager with real metrics instead of nil to match production wiring.

Right now the job manager is created with a nil metrics handle even though metrics is initialized later via telemetry.NewDefaultAppMetrics. To avoid potential nil dereferences inside the job manager and to keep the test server wiring closer to production, it’s better to build the job manager with the real metrics instance after it’s created. The rest of the wiring (NewController, BuildManager, nbgrpc.NewServer) matches the signatures shown in the referenced snippets. Based on relevant_code_snippets from management/server/job/manager.go, management/server/account.go, and management/internals/shared/grpc/server.go.

You could adjust as follows:

-	jobManager := job.NewJobManager(nil, store)
-	eventStore := &activity.InMemoryEventStore{}
+	eventStore := &activity.InMemoryEventStore{}
@@
-	metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
-	require.NoError(t, err)
+	metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
+	require.NoError(t, err)
+	jobManager := job.NewJobManager(metrics, store)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3e53c01 and 80b6854.

⛔ Files ignored due to path filters (1)
  • shared/management/proto/management.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (1)
  • shared/management/client/client_test.go (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.

Applied to files:

  • shared/management/client/client_test.go
🧬 Code graph analysis (1)
shared/management/client/client_test.go (4)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/internals/controllers/network_map/controller/controller.go (1)
  • NewController (75-111)
management/internals/modules/peers/ephemeral/manager/ephemeral.go (1)
  • NewEphemeralManager (53-61)
management/server/account.go (1)
  • BuildManager (178-266)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: release_ui
  • GitHub Check: release
  • GitHub Check: release_ui_darwin
  • GitHub Check: Build Cache
  • GitHub Check: JS / Lint
  • GitHub Check: Client / Unit
  • GitHub Check: Darwin
  • GitHub Check: Linux
  • GitHub Check: Windows
  • GitHub Check: iOS / Build
  • GitHub Check: Android / Build
  • GitHub Check: Client / Unit
  • GitHub Check: Client / Unit
🔇 Additional comments (1)
shared/management/client/client_test.go (1)

20-28: New imports for ephemeral manager and job manager look correct and are properly used.

The ephemeral_manager alias and job package are both referenced later in startManagement (for NewEphemeralManager and NewJobManager respectively), so the import changes are consistent and compile-time safe.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e1d594b and 469e147.

📒 Files selected for processing (2)
  • management/server/job/manager.go (1 hunks)
  • management/server/management_test.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • management/server/management_test.go
🧰 Additional context used
🧬 Code graph analysis (1)
management/server/job/manager.go (3)
management/server/job/channel.go (2)
  • Channel (18-21)
  • NewChannel (23-29)
management/server/store/store.go (1)
  • Store (50-211)
management/server/types/job.go (1)
  • Job (34-58)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
  • GitHub Check: Relay / Unit (386)
  • GitHub Check: Management / Benchmark (amd64, postgres)
  • GitHub Check: Relay / Unit (amd64, -race)
  • GitHub Check: Management / Integration (amd64, sqlite)
  • GitHub Check: Management / Benchmark (API) (amd64, postgres)
  • GitHub Check: Management / Unit (amd64, postgres)
  • GitHub Check: Management / Integration (amd64, postgres)
  • GitHub Check: Management / Unit (amd64, sqlite)
  • GitHub Check: Signal / Unit (386)
  • GitHub Check: Client / Unit (amd64)
  • GitHub Check: Client / Unit (386)
  • GitHub Check: Management / Unit (amd64, mysql)
  • GitHub Check: Management / Benchmark (API) (amd64, sqlite)
  • GitHub Check: Management / Benchmark (amd64, sqlite)
  • GitHub Check: Client (Docker) / Unit
  • GitHub Check: Client / Unit
  • GitHub Check: Client / Unit
  • GitHub Check: Windows
  • GitHub Check: Darwin
  • GitHub Check: Linux
  • GitHub Check: Android / Build
  • GitHub Check: Client / Unit
  • GitHub Check: release_ui_darwin
  • GitHub Check: release
  • GitHub Check: release_ui
  • GitHub Check: JS / Lint
🔇 Additional comments (5)
management/server/job/manager.go (5)

17-21: LGTM!

The Event struct is a clean container for tracking job request/response pairs.


23-30: LGTM!

The Manager struct is properly designed with appropriate fields for concurrent job management.


32-42: LGTM!

The constructor properly initializes all Manager fields with sensible defaults.


153-159: LGTM!

The method correctly checks peer connection status with appropriate read lock usage.


161-171: LGTM!

The method correctly checks for pending jobs with appropriate read lock usage. The O(n) scan is acceptable for typical workloads.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
management/server/types/job.go (1)

120-138: Potential JSON shape mismatch between stored proto.BundleResult and api.BundleResult

ApplyResponse stores resp.WorkloadResults.(*proto.JobResponse_Bundle).Bundle by JSON‑marshalling the proto struct, while buildBundleResponse later unmarshals that blob into api.BundleResult. If proto.BundleResult.UploadKey is still a wrapped type (e.g. a protobuf String message) rather than a plain string, the JSON will be { "upload_key": { "value": "..." } }, which will fail to unmarshal into api.BundleResult’s UploadKey *string. Consider mapping the proto result into an api.BundleResult and marshalling that instead, or updating the proto field to a plain string, so the stored JSON shape matches the HTTP type.

To confirm the field type and JSON shape, please run:

#!/bin/bash
# Inspect BundleResult in the proto package and compare to HTTP BundleResult
rg -n "message BundleResult" shared/management/proto -n -C3 || true
rg -n "type BundleResult struct" shared/management/http/api -n -C5 || true

Also applies to: 164-201

management/server/job/manager.go (1)

24-32: Race: SendJob can panic sending on a channel closed by CloseChannel/CreateJobChannel, and old streams can close new channels

SendJob grabs a *Channel under RLock, releases the mutex, then calls ch.AddEvent, while CloseChannel and CreateJobChannel can concurrently call ch.Close() on the same channel under Lock. This leaves a window where AddEvent writes into a channel that has just been closed, which will panic with “send on closed channel.” Additionally, because CloseChannel looks up the channel only by peerID, a previous Job stream’s deferred CloseChannel can close and delete a new channel created for the same peer by a later Job stream. You should either (a) ensure channels are not closed while AddEvent may be sending (e.g. by coordinating via the same mutex or a per‑channel state flag), or (b) make AddEvent robust to a concurrently closed channel (e.g. recovering from the panic and returning a typed ErrJobChannelClosed, which SendJob already routes into cleanup).

Also applies to: 34-45, 47-65, 68-91, 130-149, 151-162

🧹 Nitpick comments (2)
management/internals/shared/grpc/server.go (1)

446-449: Minor: comment still refers to “update message” instead of “job message”

The comment above sendJob mentions “update message” and “sync server”, which is copy‑pasted from the Sync path; updating it to refer to “job message”/“Job server” would avoid confusion.

management/server/store/sql_store.go (1)

109-118: Job persistence methods are scoped correctly; consider checking RowsAffected in CompletePeerJob

CreatePeerJob, MarkPendingJobsAsFailed, MarkAllPendingJobsAsFailed, GetPeerJobByID, and GetPeerJobs all use appropriate account/peer scoping and statuses, which should keep job rows well‑bounded. In CompletePeerJob, however, a missing job ID currently yields a successful return even if no rows were updated; consider checking result.RowsAffected and returning a NotFound‑style error when it is zero to avoid silently losing job completions.

Also applies to: 136-157, 159-193, 195-225

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 469e147 and b1cbddd.

📒 Files selected for processing (7)
  • management/internals/modules/peers/manager.go (2 hunks)
  • management/internals/server/controllers.go (3 hunks)
  • management/internals/shared/grpc/server.go (10 hunks)
  • management/server/job/manager.go (1 hunks)
  • management/server/store/sql_store.go (4 hunks)
  • management/server/store/store.go (1 hunks)
  • management/server/types/job.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
management/internals/modules/peers/manager.go (1)
management/server/store/store.go (1)
  • LockingStrengthNone (47-47)
management/server/types/job.go (1)
shared/management/http/api/types.gen.go (7)
  • JobRequest (708-710)
  • WorkloadResponse (1948-1950)
  • BundleParameters (361-373)
  • BundleResult (376-378)
  • BundleWorkloadResponse (391-399)
  • WorkloadTypeBundle (195-195)
  • JobResponse (713-721)
management/server/job/manager.go (3)
management/server/job/channel.go (2)
  • Channel (18-21)
  • NewChannel (23-29)
management/server/store/store.go (1)
  • Store (50-213)
management/internals/modules/peers/manager.go (1)
  • Manager (23-33)
management/server/store/store.go (1)
management/server/types/job.go (1)
  • Job (34-58)
management/server/store/sql_store.go (2)
management/server/types/job.go (3)
  • Job (34-58)
  • JobStatusPending (18-18)
  • JobStatusFailed (20-20)
management/server/store/store.go (2)
  • LockingStrength (40-40)
  • LockingStrengthNone (47-47)
🔇 Additional comments (6)
management/server/types/job.go (1)

66-103: Job creation and bundle parameter validation look consistent

NewJob and validateAndBuildBundleParams correctly derive the workload type, enforce tight bounds on BundleForTime and LogFileCount, and persist a normalized JSON parameters blob; this should interoperate cleanly with buildStreamBundleResponse and the proto BundleParameters.

Also applies to: 140-162

management/internals/server/controllers.go (1)

29-33: JobManager wiring via BaseServer accessor looks correct

The JobManager() accessor follows the existing Create pattern and passes the expected dependencies (Metrics, Store, PeersManager) into job.NewJobManager, so the wiring is coherent with the rest of the server.

management/internals/modules/peers/manager.go (1)

23-33: GetPeerID helper cleanly delegates to store

Extending Manager with GetPeerID and delegating to store.GetPeerIDByKey with LockingStrengthNone is straightforward and keeps peer‑ID resolution centralized in the store.

Also applies to: 165-167

management/server/job/manager.go (1)

93-128: HandleResponse flow (pending lookup, peer validation, DB completion) looks sound

Within the manager lock, HandleResponse now looks up the pending event by jobID, resolves the responding peer ID via peersManager.GetPeerID, enforces equality with event.PeerID, applies the proto response onto a types.Job, persists with CompletePeerJob, and only then deletes the pending entry; this fixes earlier concerns about peer spoofing and premature pending cleanup.

management/internals/shared/grpc/server.go (1)

79-132: Job streaming wiring and handshake/response receiver are coherent

The Job handler’s handshake, account/peer lookup, response receiver startup, and per‑peer CreateJobChannel/sendJobsLoop integration follow the existing Sync patterns and correctly tie into job.Manager and the store; the response receiver also exits cleanly on EOF/context cancellation while validating and dispatching decrypted JobResponses.

Also applies to: 175-207, 324-337, 339-362, 364-386, 446-469

management/server/store/store.go (1)

40-48: Store interface extensions for jobs and GetPeerIDByKey look appropriate

Adding job lifecycle methods and GetPeerIDByKey to Store matches the SqlStore implementation and the new job/peers flows; just ensure any non‑SQL Store implementations are updated accordingly to avoid build breakage.

If you have other Store implementations (e.g. a FileStore), please confirm they now implement these methods as well.

Also applies to: 206-213


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
management/server/store/sql_store.go (1)

146-157: Address past review feedback: Verify peer ownership in CompletePeerJob.

As noted in past reviews, this method should verify that the job being completed actually belongs to the peer making the request. Currently, it only checks the job ID without validating the peer_id, which could allow a peer to complete another peer's job.

Apply this diff to add peer verification:

 func (s *SqlStore) CompletePeerJob(ctx context.Context, job *types.Job) error {
 	result := s.db.
 		Model(&types.Job{}).
-		Where(idQueryCondition, job.ID).
+		Where(accountAndIDQueryCondition+" AND id = ?", job.AccountID, job.PeerID, job.ID).
 		Updates(job)

 	if result.Error != nil {
 		log.WithContext(ctx).Errorf("failed to update job in store: %s", result.Error)
 		return status.Errorf(status.Internal, "failed to update job in store")
 	}
+
+	if result.RowsAffected == 0 {
+		return status.Errorf(status.NotFound, "job %s not found or does not belong to peer %s", job.ID, job.PeerID)
+	}
+
 	return nil
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b1cbddd and 26eaac0.

📒 Files selected for processing (1)
  • management/server/store/sql_store.go (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
management/server/store/sql_store.go (2)
management/server/types/job.go (3)
  • Job (34-58)
  • JobStatusPending (18-18)
  • JobStatusFailed (20-20)
management/server/store/store.go (2)
  • LockingStrength (40-40)
  • LockingStrengthNone (47-47)
🔇 Additional comments (3)
management/server/store/sql_store.go (3)

43-52: LGTM: Constants follow existing patterns.

The new accountAndPeerIDQueryCondition constant is consistent with the existing query condition constants and will be used for job filtering operations.


117-117: LGTM: Job entity added to auto-migration.

The addition of &types.Job{} to the AutoMigrate call ensures the jobs table schema is properly created and maintained.


4221-4239: LGTM: GetPeerIDByKey correctly uses GetKeyQueryCondition helper.

The method properly uses GetKeyQueryCondition(s) at line 4230 to handle MySQL's reserved key identifier, addressing the concern raised in past reviews about MySQL compatibility.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
client/jobexec/executor.go (1)

63-63: Fix grammatical error in log message.

The message "generated well" is grammatically incorrect and misleading since at this point both generation and upload have succeeded.

-	log.Infof("debug bundle has been generated well")
+	log.Infof("debug bundle has been uploaded successfully")
🧹 Nitpick comments (1)
client/jobexec/executor.go (1)

51-61: Consider retry-friendly cleanup strategy on upload failure.

Currently, the deferred os.Remove deletes the bundle file regardless of upload outcome. If the upload fails, the generated bundle is lost, requiring full regeneration for any retry attempt.

Consider keeping the file on upload failure to enable retry, or implementing retry logic within this method before cleanup.

-	defer func() {
-		if err := os.Remove(path); err != nil {
-			log.Errorf("failed to remove debug bundle file: %v", err)
-		}
-	}()
-
 	key, err := debug.UploadDebugBundle(ctx, types.DefaultBundleURL, mgmURL, path)
 	if err != nil {
 		log.Errorf("failed to upload debug bundle: %v", err)
 		return "", fmt.Errorf("upload debug bundle: %w", err)
 	}
+
+	if err := os.Remove(path); err != nil {
+		log.Errorf("failed to remove debug bundle file: %v", err)
+	}

Alternatively, implement retry logic before cleanup if transient upload failures are expected.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26eaac0 and c71c0b6.

📒 Files selected for processing (1)
  • client/jobexec/executor.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-13T00:29:53.247Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/cmd/ssh_exec_unix.go:53-74
Timestamp: 2025-11-13T00:29:53.247Z
Learning: In client/ssh/server/executor_unix.go, the method ExecuteWithPrivilegeDrop(ctx context.Context, config ExecutorConfig) has a void return type (no error return). It handles failures by exiting the process directly with appropriate exit codes rather than returning errors to the caller.

Applied to files:

  • client/jobexec/executor.go
📚 Learning: 2025-11-14T13:05:31.729Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/ssh/server/userswitching_windows.go:89-139
Timestamp: 2025-11-14T13:05:31.729Z
Learning: In client/ssh/server/executor_windows.go, the WindowsExecutorConfig struct's Pty, PtyWidth, and PtyHeight fields are intentionally left unused for now and will be implemented in a future update.

Applied to files:

  • client/jobexec/executor.go
🧬 Code graph analysis (1)
client/jobexec/executor.go (3)
client/internal/debug/debug.go (3)
  • GeneratorDependencies (238-243)
  • BundleConfig (232-236)
  • NewBundleGenerator (245-264)
client/internal/debug/upload.go (1)
  • UploadDebugBundle (17-28)
upload-server/types/upload.go (1)
  • DefaultBundleURL (11-11)
🔇 Additional comments (5)
client/jobexec/executor.go (5)

1-14: LGTM!

Package declaration and imports are correctly structured. All imports are utilized within the file.


16-18: LGTM!

The 60-minute cap on bundle wait time is a reasonable safeguard against excessively long waits.


24-29: LGTM!

Empty struct with constructor is an acceptable pattern for future extensibility.


67-76: LGTM!

The waitFor implementation correctly handles context cancellation and returns the appropriate error, addressing prior review feedback. The logging provides good observability.


20-22: Remove this comment — ErrJobNotImplemented is actively used.

The error is referenced in client/internal/engine.go:979, where it is accessed via its .Error() method. It is appropriately exported for use across packages.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
client/internal/engine.go (1)

235-255: Remove unused c *profilemanager.Config parameter in NewEngine

NewEngine’s last parameter c *profilemanager.Config is never used in the function body. This is dead API surface and (depending on tooling) can trigger compile or lint failures and unnecessary churn at call sites.

Consider dropping this parameter from the signature and corresponding call sites, since EngineConfig.ProfileConfig already carries the same information.

-func NewEngine(..., config *EngineConfig, mobileDep MobileDependency, statusRecorder *peer.Status, checks []*mgmProto.Checks, c *profilemanager.Config) *Engine {
+func NewEngine(..., config *EngineConfig, mobileDep MobileDependency, statusRecorder *peer.Status, checks []*mgmProto.Checks) *Engine {

…and update call sites accordingly.

♻️ Duplicate comments (4)
management/server/peer.go (3)

348-358: Potential nil-pointer on am.jobManager.

The code calls am.jobManager.IsPeerConnected(peerID) and am.jobManager.IsPeerHasPendingJobs(peerID) without checking if am.jobManager is non-nil. If the JobManager is ever nil (e.g., in certain test configurations or misconfigured environments), this will panic.

Consider adding a defensive check at the start of the function:

+	if am.jobManager == nil {
+		return status.Errorf(status.Internal, "job manager is not configured")
+	}
+
 	if !am.jobManager.IsPeerConnected(peerID) {

Additionally, the TODO comment on lines 353-355 correctly identifies a race condition where concurrent calls could overwrite pending job entries. This should be addressed in a follow-up to ensure job queue integrity.

Based on past review comments, this issue was previously flagged but remains unaddressed.


365-398: Send-before-persist ordering can cause inconsistency between runtime and persistent state.

The current implementation sends the job to the peer (line 366) before persisting it to the database (line 379). If the database write fails after the job has been dispatched, the job will execute on the client but will be absent from persistent storage and from /jobs API listings.

This ordering can lead to:

  • Jobs executing on peers with no database record
  • Inability to track or query these jobs via the API
  • Potential audit and compliance issues

Consider reversing the order:

  1. Persist the job first with status = pending
  2. Send the job to the peer
  3. If sending fails, update the job status to failed with an appropriate FailedReason

This ensures the database remains the source of truth and runtime state stays consistent.

Based on past review comments, this issue was previously flagged but remains unaddressed.


427-451: Missing job ownership validation - authorization bypass.

After fetching the job by jobID (line 445), the function does not verify that job.PeerID matches the requested peerID. This allows a caller to access any job within the account by manipulating the URL path:

Example: GET /peers/{peerA}/jobs/{jobOfPeerB} will return jobOfPeerB even though the path specifies peerA.

Add an ownership check after retrieving the job:

 	job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID)
 	if err != nil {
 		return nil, err
 	}
+
+	if job.PeerID != peerID {
+		return nil, status.NewPeerNotPartOfAccountError()
+	}

 	return job, nil

Based on past review comments, this issue was previously flagged but remains unaddressed.

client/internal/engine.go (1)

987-1010: Guard against nil ProfileConfig and ManagementURL in handleBundle

handleBundle dereferences e.config.ProfileConfig and e.config.ProfileConfig.ManagementURL without checking for nil:

  • InternalConfig: e.config.ProfileConfig
  • e.config.ProfileConfig.ManagementURL.String()

If ProfileConfig or its ManagementURL is nil (e.g. in tests or alternative EngineConfig construction), this will panic during remote debug bundle generation. A previous review already flagged this; it’s still present.

 func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobResponse_Bundle, error) {
 	log.Infof("handle remote debug bundle request: %s", params.String())
@@
-	bundleDeps := debug.GeneratorDependencies{
-		InternalConfig: e.config.ProfileConfig,
+	if e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
+		return nil, fmt.Errorf("profile config or management URL not available for bundle generation")
+	}
+
+	bundleDeps := debug.GeneratorDependencies{
+		InternalConfig: e.config.ProfileConfig,
@@
-	waitFor := time.Duration(params.BundleForTime) * time.Minute
-
-	uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
+	waitFor := time.Duration(params.BundleForTime) * time.Minute
+
+	managementURL := e.config.ProfileConfig.ManagementURL.String()
+	uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, managementURL)

This keeps the job API unchanged while ensuring the engine can’t panic on missing profile data.

🧹 Nitpick comments (3)
client/server/server.go (1)

31-33: Using nbstatus.ToProtoFullStatus simplifies and centralizes status conversion

Replacing the local toProtoFullStatus helper with nbstatus.ToProtoFullStatus(fullStatus) is a good consolidation. The returned pbFullStatus is still enriched locally with events and SSH server state before being attached to the response, so behavior remains consistent while reducing duplication.

Also applies to: 1078-1091

client/internal/connect.go (2)

64-67: Log path propagation through ConnectClient.Run and mobile variants is consistent

The new Run(runningChan, logPath) signature, plus forwarding of logPath into run(...), cleanly introduces log‑path awareness without disturbing existing control flow. Android/iOS wrappers deliberately pass an empty log path, which keeps their behavior unchanged while still benefiting from other refactors.

Also applies to: 70-88, 90-106, 108-108


251-256: EngineConfig enrichment with LogPath and ProfileConfig is correctly wired

createEngineConfig now:

  • Accepts logPath and assigns it to EngineConfig.LogPath.
  • Sets EngineConfig.ProfileConfig = config.

This, combined with the updated call createEngineConfig(myPrivateKey, c.config, peerConfig, logPath) and the use of these fields in the engine’s debug bundle path, ensures the engine has both the profile metadata and log location needed for remote debug bundles.

Aside from the unused extra parameter on NewEngine (flagged separately in engine.go), the data flow looks sound.

Also applies to: 417-456

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c71c0b6 and 3b60fc8.

📒 Files selected for processing (5)
  • client/internal/connect.go (7 hunks)
  • client/internal/engine.go (13 hunks)
  • client/server/server.go (4 hunks)
  • management/server/peer.go (2 hunks)
  • management/server/permissions/modules/module.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
client/internal/engine.go (3)
client/internal/profilemanager/config.go (1)
  • Config (89-160)
client/jobexec/executor.go (3)
  • Executor (24-25)
  • NewExecutor (27-29)
  • ErrJobNotImplemented (21-21)
client/internal/debug/debug.go (2)
  • GeneratorDependencies (238-243)
  • BundleConfig (232-236)
client/server/server.go (1)
client/status/status.go (1)
  • ToProtoFullStatus (549-635)
client/internal/connect.go (3)
client/internal/mobile_dependency.go (1)
  • MobileDependency (13-25)
client/internal/engine.go (2)
  • NewEngine (235-276)
  • EngineConfig (83-142)
client/internal/profilemanager/config.go (1)
  • Config (89-160)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: Management / Unit (amd64, mysql)
  • GitHub Check: Client / Unit (amd64)
  • GitHub Check: Management / Benchmark (amd64, sqlite)
  • GitHub Check: Management / Benchmark (amd64, postgres)
  • GitHub Check: Relay / Unit (386)
  • GitHub Check: Client (Docker) / Unit
  • GitHub Check: Management / Unit (amd64, postgres)
  • GitHub Check: Management / Unit (amd64, sqlite)
  • GitHub Check: Relay / Unit (amd64, -race)
  • GitHub Check: Client / Unit (386)
  • GitHub Check: release_ui_darwin
  • GitHub Check: Client / Unit
  • GitHub Check: Client / Unit
  • GitHub Check: release
  • GitHub Check: JS / Lint
  • GitHub Check: Client / Unit
  • GitHub Check: Darwin
  • GitHub Check: Linux
  • GitHub Check: Windows
  • GitHub Check: iOS / Build
  • GitHub Check: Android / Build
🔇 Additional comments (6)
management/server/permissions/modules/module.go (1)

8-8: LGTM!

The new RemoteJobs module constant and its registration in the All map follow the established pattern correctly.

Also applies to: 25-25

management/server/peer.go (2)

34-34: LGTM!

The minimum version constant for remote jobs is clear and will enable proper feature gating.


400-425: Function looks good; note the TODO for dedicated permissions.

The permission checks and peer ownership validation are correctly implemented. The TODO comment on line 401 indicates planned work to create more granular job permissions, which is a reasonable future enhancement.

client/internal/engine.go (2)

205-222: Job executor lifecycle wiring looks correct

The addition of jobExecutor, its dedicated jobExecutorWG, the receiveJobEvents() call in Start(), and e.jobExecutorWG.Wait() in Stop() form a coherent lifecycle: job goroutine is tracked and properly drained on shutdown, without sharing shutdownWg with other engine goroutines. I don’t see deadlock or ordering issues given current usage.

Also applies to: 278-337, 524-527, 950-985


809-822: Sync response persistence refactor is thread‑safe

The move to a dedicated syncRespMux for persistSyncResponse and latestSyncResponse (with R/W separation and cloning in GetLatestSyncResponse) removes contention with syncMsgMux and avoids data races on the stored message. Lock ordering is one‑way (syncMsgMuxsyncRespMux) and I don’t see any reverse order, so deadlock risk is low.

Also applies to: 1889-1925

client/server/server.go (1)

1539-1547: Passing s.logFile into ConnectClient.Run matches new signature

Updating connect() to call s.connectClient.Run(runningChan, s.logFile) correctly propagates the daemon’s log file path into the client, aligning with the new Run signature and the debug bundle/log‑path plumbing in ConnectClient and EngineConfig.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b60fc8 and 9ce25b9.

📒 Files selected for processing (13)
  • client/cmd/testutil_test.go (4 hunks)
  • client/internal/engine_test.go (10 hunks)
  • client/server/server_test.go (4 hunks)
  • management/internals/modules/peers/manager_mock.go (1 hunks)
  • management/server/account_test.go (2 hunks)
  • management/server/dns_test.go (2 hunks)
  • management/server/http/testing/testing_tools/channel/channel.go (2 hunks)
  • management/server/management_proto_test.go (3 hunks)
  • management/server/management_test.go (4 hunks)
  • management/server/nameserver_test.go (2 hunks)
  • management/server/peer_test.go (5 hunks)
  • management/server/route_test.go (2 hunks)
  • shared/management/client/client_test.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • management/server/dns_test.go
  • management/server/management_proto_test.go
  • management/server/peer_test.go
  • management/server/http/testing/testing_tools/channel/channel.go
  • management/server/account_test.go
  • client/cmd/testutil_test.go
  • management/server/route_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.

Applied to files:

  • shared/management/client/client_test.go
🧬 Code graph analysis (2)
management/server/management_test.go (3)
management/server/networks/routers/manager.go (1)
  • NewManager (40-46)
management/server/settings/manager.go (1)
  • NewManager (37-44)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
management/server/nameserver_test.go (3)
management/internals/modules/peers/manager.go (1)
  • NewManager (44-49)
management/server/account.go (1)
  • BuildManager (178-266)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: Build Cache
  • GitHub Check: Linux
  • GitHub Check: JS / Lint
  • GitHub Check: Client / Unit
  • GitHub Check: Client / Unit
  • GitHub Check: Client / Unit
  • GitHub Check: release_ui
  • GitHub Check: release_ui_darwin
  • GitHub Check: release
  • GitHub Check: Darwin
  • GitHub Check: iOS / Build
  • GitHub Check: Windows
  • GitHub Check: Android / Build
  • GitHub Check: Check External GPL/AGPL Licenses
🔇 Additional comments (10)
management/internals/modules/peers/manager_mock.go (1)

100-113: LGTM! Auto-generated mock looks correct.

The mock method and recorder for GetPeerID follow standard gomock patterns and are consistent with the existing mock methods in this file. The signatures and implementations are correct.

client/server/server_test.go (2)

310-310: LGTM! JobManager correctly instantiated for tests.

The JobManager is created with nil metrics, which is appropriate for test setup. The store and peersManager dependencies are properly provided.


323-323: No action needed—BuildManager parameter positions are correct.

The jobManager is correctly positioned as the 5th parameter in the BuildManager call at line 323, matching the function signature where it follows networkMapController. All parameter alignments are accurate.

shared/management/client/client_test.go (2)

96-96: LGTM! JobManager wiring integrated correctly.

The JobManager is properly instantiated and will be threaded through the test setup.


122-122: Ephemeral manager import alias improves readability.

Using the ephemeral_manager alias avoids conflicts and makes the code clearer.

management/server/nameserver_test.go (1)

794-801: LGTM! Clean inline JobManager creation.

Creating the JobManager inline within the BuildManager call is concise and appropriate for test code. The peersManager dependency is correctly instantiated on the preceding line.

client/internal/engine_test.go (3)

1602-1635: LGTM! Management server test wiring is consistent.

The JobManager is correctly created and passed through to BuildManager and nbgrpc.NewServer, following the same pattern established in other test files.


1536-1536: No changes needed—NewEngine signature is correct.

NewEngine returns only *Engine (not (*Engine, error)), making the assignment e, err := NewEngine(...), nil valid Go syntax. This is not unconventional; Go functions can return single values, and assigning nil to err here is a straightforward tuple assignment with no issues.


256-256: The two trailing nil parameters passed to NewEngine correspond to the checks and c parameters in the function signature (line 235 of client/internal/engine.go), which are []*mgmProto.Checks and *profilemanager.Config respectively. These parameters are already documented in the function signature itself through their names and type annotations, so no additional documentation is needed. The speculation about "jobManager and logPath" does not match the actual implementation.

Likely an incorrect or invalid review comment.

management/server/management_test.go (1)

206-244: LGTM! JobManager integration is clean and consistent.

The test correctly:

  • Extracts peersManager into a variable (line 206)
  • Creates jobManager with nil metrics (line 207)
  • Passes jobManager to BuildManager (line 219)
  • Passes jobManager to nbgrpc.NewServer (line 244)

This follows the established pattern across all test files.

Comment thread client/server/server_test.go Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
shared/management/client/grpc.go (1)

305-427: LGTM: Consistent refactoring of internal methods.

The changes thread serverPubKey as a parameter instead of fetching it within each method. This eliminates redundant GetServerPublicKey calls and aligns with the centralized key retrieval in withMgmtStream.

🧹 Nitpick comments (1)
shared/management/client/grpc.go (1)

218-232: Consider documenting the handshake protocol.

The handshake sends a JobRequest with only an ID field populated (UUID string as bytes). While functional, this implicit protocol could benefit from documentation or a dedicated handshake message type if available in the proto definitions.

Optional: If proto supports it, consider:

-handshakeReq := &proto.JobRequest{
-  ID: []byte(uuid.New().String()),
-}
+// Use dedicated handshake message if available
+handshakeReq := &proto.JobHandshake{
+  SessionID: uuid.New().String(),
+}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9ce25b9 and e01d797.

📒 Files selected for processing (1)
  • shared/management/client/grpc.go (8 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
shared/management/client/grpc.go (2)
shared/management/proto/management.pb.go (6)
  • EncryptedMessage (322-333)
  • EncryptedMessage (348-348)
  • EncryptedMessage (363-365)
  • SyncResponse (721-738)
  • SyncResponse (753-753)
  • SyncResponse (768-770)
encryption/message.go (2)
  • EncryptMessage (10-24)
  • DecryptMessage (27-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: Darwin
  • GitHub Check: Android / Build
  • GitHub Check: iOS / Build
  • GitHub Check: Client / Unit
  • GitHub Check: Windows
  • GitHub Check: Linux
  • GitHub Check: JS / Lint
  • GitHub Check: Build Cache
  • GitHub Check: release_ui
  • GitHub Check: Client / Unit
  • GitHub Check: release_ui_darwin
  • GitHub Check: release
  • GitHub Check: Check External GPL/AGPL Licenses
  • GitHub Check: Client / Unit
🔇 Additional comments (6)
shared/management/client/grpc.go (6)

15-15: LGTM: Well-established library for UUID generation.

The addition of github.com/google/uuid is appropriate for generating unique handshake identifiers in the Job streaming flow.


234-260: LGTM: Proper message verification and decryption.

The peer ID verification at lines 246-251 is a good security practice, ensuring messages are intended for this specific peer before processing.


263-278: LGTM: Defensive nil-response handling.

The fallback at lines 269-276 ensures that even if the handler returns nil, a valid failure response is constructed. This prevents the send operation from failing unexpectedly.


280-303: LGTM: Comprehensive logging and error handling.

The job ID is included in all log messages, providing excellent traceability for debugging. Error handling properly propagates failures for retry logic.


120-126: Job RPC implementation is complete and follows established patterns.

The new Job method provides a clean public API for bidirectional job streaming, consistent with the existing Sync method. The server-side implementation in management/internals/shared/grpc/server.go (line 175) properly handles the handshake protocol and delegates to sendJobsLoop for stream management. The proto definition in shared/management/proto/management.proto (line 53) documents the RPC as a bidirectional stream with JobRequest and JobResponse message types.


205-208: This check is correct and requires no changes.

The code properly skips processing when jobReq is nil or has an empty ID. Since the ID is essential for correlating the response back at the server (used in sendJobResponse and processJobRequest), a request without an ID cannot be processed or responded to. Skipping silently is the correct behavior—sending a response with an empty ID would be orphaned at the server and unrecoverable.

Comment thread shared/management/client/grpc.go
@pappz
Collaborator Author

pappz commented Dec 30, 2025

@coderabbitai close all your reviews and start again

Co-authored-by: Pascal Fischer <32096965+pascal-fischer@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (5)
management/server/peer.go (2)

34-36: Remote job creation: version gating OK, but Send‑before‑persist and jobManager assumptions are risky

The remote job entrypoint is nicely scoped (new RemoteJobs module + Create op, explicit min version check via posture.MeetsMinVersion, and an account/peer ownership check), but a few non‑trivial risks remain:

  1. Send‑before‑persist can desync runtime and DB state
    The flow is:

    • Check pending jobs (IsPeerHasPendingJobs).
    • Build stream request.
    • SendJob(...).
    • Only if send succeeds, persist via CreatePeerJob in a transaction.

    If the DB write fails after SendJob succeeds (e.g., constraint, connection issue), the client can still be executing the job while /jobs and /jobs/{id} never show it. That makes observability and recovery hard. Persisting first (status = pending) and then marking it failed on send error would keep state consistent, even if that means occasionally recording “failed to send” jobs.

  2. Implicit non‑nil jobManager may panic in misconfigured environments
    am.jobManager is used unconditionally (IsPeerConnected, IsPeerHasPendingJobs, SendJob). If any construction path builds a DefaultAccountManager with a nil jobManager (older tests, custom deployments), the first use of this API will panic. Either:

    • Enforce non‑nil in constructor/tests, or
    • Add an early guard here returning an internal error like “job manager is not configured”.
  3. Pending‑job check is acknowledged as racy
    The TODO correctly notes the window between IsPeerHasPendingJobs and SendJob across concurrent callers. That’s acceptable for now given it’s documented, but if you later tighten semantics (“at most one job per peer”), you’ll likely want to move that invariant into the job manager or DB layer.

The version gate and permission checks themselves look good; the main suggestion is to re‑order and harden the send/persist and jobManager assumptions.

Also applies to: 325-398
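A minimal sketch of the persist-first ordering described above, using hypothetical `jobStore`/`jobSender` interfaces (the real code uses `Store.CreatePeerJob` and `jobManager.SendJob` with different signatures):

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical minimal interfaces for illustration only.
type jobStore interface {
	CreatePending(jobID string) error
	MarkFailed(jobID, reason string) error
}

type jobSender interface {
	Send(jobID string) error
}

// createPeerJob persists the job as pending first, then sends it.
// If the send fails, the job is marked failed instead of silently
// disappearing from /jobs while the client may still be running it.
func createPeerJob(store jobStore, sender jobSender, jobID string) error {
	if err := store.CreatePending(jobID); err != nil {
		return fmt.Errorf("persist job: %w", err)
	}
	if err := sender.Send(jobID); err != nil {
		// Best effort: record why the job never reached the peer.
		if markErr := store.MarkFailed(jobID, err.Error()); markErr != nil {
			return errors.Join(err, markErr)
		}
		return fmt.Errorf("send job: %w", err)
	}
	return nil
}

// In-memory fakes for demonstration.
type memStore struct{ status map[string]string }

func (m *memStore) CreatePending(id string) error { m.status[id] = "pending"; return nil }
func (m *memStore) MarkFailed(id, reason string) error {
	m.status[id] = "failed: " + reason
	return nil
}

type failingSender struct{}

func (failingSender) Send(string) error { return errors.New("peer stream closed") }

func main() {
	st := &memStore{status: map[string]string{}}
	_ = createPeerJob(st, failingSender{}, "job-1")
	// Even though the send failed, the job is visible with a terminal state.
	fmt.Println(st.status["job-1"])
}
```

With this ordering, a DB failure before the send means no job runs at all, and a send failure after persisting leaves an observable "failed" record, so runtime and DB state cannot diverge.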


427-451: GetPeerJobByID: missing peer/job association check lets callers see jobs for other peers

After verifying that peerID belongs to accountID, the method loads the job by (accountID, jobID) only:

job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID)

but never asserts that job.PeerID == peerID. This allows a caller to hit /peers/{peerA}/jobs/{jobOfPeerB} (within the same account) and successfully retrieve a job that actually belongs to another peer.

You’ll likely want a post‑load check here:

job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID)
if err != nil {
    return nil, err
}
if job.PeerID != peerID {
    return nil, status.NewPeerNotPartOfAccountError()
}

so that the peer path parameter and the underlying job record stay consistent.

client/internal/engine.go (1)

137-143: Guard against nil ProfileConfig / ManagementURL before bundle job

handleBundle dereferences e.config.ProfileConfig and e.config.ProfileConfig.ManagementURL without checks. If either is nil (e.g. tests, misconfigured profiles, or older configs), this will panic when handling a job.

This was flagged earlier and is still present; please add a defensive check and fail the job gracefully instead of panicking.

Proposed defensive check in handleBundle
 func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobResponse_Bundle, error) {
 	log.Infof("handle remote debug bundle request: %s", params.String())
 	syncResponse, err := e.GetLatestSyncResponse()
@@
-	bundleDeps := debug.GeneratorDependencies{
-		InternalConfig: e.config.ProfileConfig,
-		StatusRecorder: e.statusRecorder,
-		SyncResponse:   syncResponse,
-		LogPath:        e.config.LogPath,
-	}
+	if e.config == nil || e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
+		return nil, fmt.Errorf("profile config or management URL not available for bundle generation")
+	}
+
+	bundleDeps := debug.GeneratorDependencies{
+		InternalConfig: e.config.ProfileConfig,
+		StatusRecorder: e.statusRecorder,
+		SyncResponse:   syncResponse,
+		LogPath:        e.config.LogPath,
+	}
@@
-	waitFor := time.Duration(params.BundleForTime) * time.Minute
-
-	uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
+	waitFor := time.Duration(params.BundleForTime) * time.Minute
+
+	uploadKey, err := e.jobExecutor.BundleJob(
+		e.ctx,
+		bundleDeps,
+		bundleJobParams,
+		waitFor,
+		e.config.ProfileConfig.ManagementURL.String(),
+	)

Also applies to: 1036-1058

management/internals/shared/grpc/server.go (1)

171-203: Propagate job send failures instead of returning OK gRPC status

The overall Job stream wiring (encrypted handshake, per‑peer channel, background response receiver) looks solid. However, sendJobsLoop still swallows sendJob failures:

if err := s.sendJob(ctx, peerKey, event, srv); err != nil {
    log.WithContext(ctx).Warnf("send job failed: %v", err)
    return nil
}

This causes the Job RPC to end with an OK status even when we fail to deliver a job to the client, which was already pointed out in prior review feedback. The same applies if you later add more failure modes in sendJob.

Suggested change to surface send errors to the client
 func (s *Server) sendJobsLoop(ctx context.Context, accountID string, peerKey wgtypes.Key, peer *nbpeer.Peer, updates *job.Channel, srv proto.ManagementService_JobServer) error {
 	// todo figure out better error handling strategy
 	defer s.jobManager.CloseChannel(ctx, accountID, peer.ID)
@@
-		if err := s.sendJob(ctx, peerKey, event, srv); err != nil {
-			log.WithContext(ctx).Warnf("send job failed: %v", err)
-			return nil
-		}
+		if err := s.sendJob(ctx, peerKey, event, srv); err != nil {
+			log.WithContext(ctx).Warnf("send job failed: %v", err)
+			// Propagate a gRPC status error so callers and observability can see failure
+			return err
+		}
 	}
 }

If you prefer to distinguish transport vs. logical errors, you can map specific failures to gRPC codes (e.g. codes.Unavailable for transport issues, codes.Internal for encryption problems) inside sendJob and keep sendJobsLoop returning them as‑is.

Also applies to: 323-361, 363-385, 445-468

management/server/store/sql_store.go (1)

194-224: GetPeerJobByID still doesn’t verify peer ownership

GetPeerJobByID currently filters only by account_id and id, so any caller with account‑level access can retrieve another peer’s job just by knowing the job ID. Earlier feedback suggested adding a peerID constraint (or clearly marking this as an admin‑only accessor); that hasn’t been applied yet.

Example: tighten GetPeerJobByID to a specific peer
-// GetJobByID fetches job by ID
-func (s *SqlStore) GetPeerJobByID(ctx context.Context, accountID, jobID string) (*types.Job, error) {
+// GetPeerJobByID fetches a job by ID scoped to a specific peer within an account.
+func (s *SqlStore) GetPeerJobByID(ctx context.Context, accountID, peerID, jobID string) (*types.Job, error) {
 	var job types.Job
-	err := s.db.
-		Where(accountAndIDQueryCondition, accountID, jobID).
+	err := s.db.
+		Where(accountAndPeerIDQueryCondition+" AND id = ?", accountID, peerID, jobID).
 		First(&job).Error

This would require updating the store interface and call sites, but makes the API unambiguous and avoids accidental cross‑peer access.

🧹 Nitpick comments (17)
client/server/server_test.go (2)

310-323: Consider passing metrics to JobManager for consistency.

The jobManager is created with nil for the metrics parameter (line 310), but a valid metrics instance is created on line 314 and passed to BuildManager (line 323). For consistency and better test coverage, consider reordering the code to create metrics first, then pass it to NewJobManager.

🔎 Proposed refactor
 	permissionsManagerMock := permissions.NewMockManager(ctrl)
 	peersManager := peers.NewManager(store, permissionsManagerMock)
 	settingsManagerMock := settings.NewMockManager(ctrl)
-
-	jobManager := job.NewJobManager(nil, store, peersManager)

 	ia, _ := integrations.NewIntegratedValidator(context.Background(), peersManager, settingsManagerMock, eventStore)

 	metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
 	require.NoError(t, err)
+
+	jobManager := job.NewJobManager(metrics, store, peersManager)

 	settingsMockManager := settings.NewMockManager(ctrl)
 	groupsManager := groups.NewManagerMock()

332-332: JobManager integration looks correct; consider using available metrics instance.

The jobManager parameter is correctly positioned and passed to nbgrpc.NewServer. However, the metrics instance created on line 314 is available but not used in this call (currently passing nil for appMetrics). While tests can function with nil dependencies, passing the actual metrics instance would provide more thorough test coverage.

🔎 Optional improvement

If you address the metrics usage suggestion from line 310, you can also apply it here:

-	mgmtServer, err := nbgrpc.NewServer(config, accountManager, settingsMockManager, jobManager, secretsManager, nil, nil, &server.MockIntegratedValidator{}, networkMapController)
+	mgmtServer, err := nbgrpc.NewServer(config, accountManager, settingsMockManager, jobManager, secretsManager, metrics, nil, &server.MockIntegratedValidator{}, networkMapController)

Note: A past review comment also raised concerns about the nil parameters for appMetrics and authManager.

management/server/activity/codes.go (1)

186-187: Add JobCreatedByUser activity; consider minor naming/API enum follow-up

The new JobCreatedByUser constant and its activityMap entry are correctly wired and preserve existing enum values.

As optional polish:

  • To match surrounding messages like "Peer added"/"Route created", consider changing the message to something like "Job created for peer".
  • If these activities are surfaced via the HTTP API, consider extending EventActivityCode in shared/management/http/api/types.gen.go with "peer.job.create" so API enums stay in sync.

Also applies to: 301-302

client/cmd/up.go (1)

200-205: Propagate logPath into ConnectClient.Run in foreground mode

Passing util.FindFirstLogPath(logFiles) into connectClient.Run(nil, …) correctly wires the effective log file path into the client/engine for debug bundle support and matches the new Run signature.

Optionally, if FindFirstLogPath(logFiles) is used multiple times along this path, consider computing it once into a local logPath variable for clarity.

shared/management/proto/management.proto (1)

52-54: Job streaming RPC and job messages look structurally correct

The new Job bidi-stream RPC and the associated JobRequest/JobResponse/BundleParameters/BundleResult definitions are well-formed and consistent with the existing encryption pattern (wrapping concrete messages in EncryptedMessage).

As a small documentation improvement, consider extending the Job comment to note that the request/response bodies carry JobRequest and JobResponse messages, similar to how Login/Sync document their payloads.

Also applies to: 66-101

shared/management/http/api/types.gen.go (1)

111-117: HTTP Job and bundle workload models match the intended API

The additions for jobs and bundle workloads (JobRequest, JobResponse, JobResponseStatus*, BundleParameters, BundleResult, BundleWorkloadRequest, BundleWorkloadResponse, and WorkloadTypeBundle) give a clear and strongly-typed shape to the new job API and align with the underlying proto BundleParameters/BundleResult.

Exposing PostApiPeersPeerIdJobsJSONRequestBody as a JobRequest keeps the per-peer jobs endpoint straightforward for clients.

Minor generated-code nit: where you assign "bundle" as a discriminator string, using WorkloadTypeBundle instead would add a bit of compile-time safety; if you want that, it’s better addressed in the OpenAPI/oapi-codegen setup rather than by hand-editing this file.

Also applies to: 196-200, 366-405, 713-731

management/server/account/manager.go (1)

127-129: Parameter order inconsistency across Job methods.

The parameter ordering differs between these methods:

  • CreatePeerJob: accountID, peerID, userID
  • GetAllPeerJobs: accountID, userID, peerID
  • GetPeerJobByID: accountID, userID, peerID, jobID

Consider aligning the order for consistency (e.g., always accountID, userID, peerID or accountID, peerID, userID). This will reduce cognitive load and prevent parameter swapping mistakes by callers.

🔎 Suggested consistent ordering
-	CreatePeerJob(ctx context.Context, accountID, peerID, userID string, job *types.Job) error
-	GetAllPeerJobs(ctx context.Context, accountID, userID, peerID string) ([]*types.Job, error)
-	GetPeerJobByID(ctx context.Context, accountID, userID, peerID, jobID string) (*types.Job, error)
+	CreatePeerJob(ctx context.Context, accountID, userID, peerID string, job *types.Job) error
+	GetAllPeerJobs(ctx context.Context, accountID, userID, peerID string) ([]*types.Job, error)
+	GetPeerJobByID(ctx context.Context, accountID, userID, peerID, jobID string) (*types.Job, error)
shared/management/client/grpc.go (2)

235-304: Unused ctx parameters in helper functions.

The ctx parameter is passed to receiveJobRequest, processJobRequest, and sendJobResponse but never used. If these are placeholders for future context-aware operations (timeouts, cancellation), consider adding a //nolint:revive comment or a TODO. Otherwise, remove them to avoid misleading readers.


196-199: Graceful degradation on Unimplemented may hide feature unavailability.

Returning nil when the server returns codes.Unimplemented makes the Job() method appear successful, but no jobs will ever be processed. Consider returning a sentinel error or exposing this state to callers so they can take appropriate action (e.g., disable job-related UI, log at startup).

🔎 Alternative: Return a typed error
+var ErrJobsNotSupported = errors.New("job feature not supported by management server")
+
 case codes.Unimplemented:
 	log.Warn("Job feature is not supported by the current management server version. " +
 		"Please update the management service to use this feature.")
-	return nil
+	return backoff.Permanent(ErrJobsNotSupported)

Callers can then check for this specific error and handle accordingly.

shared/management/http/api/openapi.yml (3)

35-37: Jobs tag looks consistent with existing tagging; mark experimental in docs too

The new Jobs tag with x-experimental: true fits the pattern of other tags (e.g., lazy connection, traffic events). Just ensure UI/docs also surface this as experimental so consumers don’t treat it as fully stable yet.


2410-2477: List/Create peer jobs: path and shapes align; HTTP status code may diverge from implementation

The list/create /api/peers/{peerId}/jobs operations line up with the Go handlers in peers_handler.go (ListJobs and CreateJob) and the JobRequest/JobResponse shapes, so wiring looks correct.

One mismatch to be aware of: the OpenAPI declares 201 for the POST, but util.WriteJSONObject typically responds with 200 OK. Either adjust the handler to write 201 on success or update the spec to 200 to avoid contract drift.


2479-2513: Get single peer job: contract looks fine; consider explicit 404 in the spec

The /api/peers/{peerId}/jobs/{jobId} endpoint matches the handler (GetJob) and JobResponse shape. Given the code can return “not found” when the job/peer combo doesn’t exist, you may want to add a 404 response here for client clarity, similar to other APIs that expose explicit not‑found states.

management/server/http/handlers/peers/peers_handler.go (2)

51-85: Job handlers: good reuse of infra; consider refining error codes and partial failure behavior

The Create/List/Get job handlers follow established patterns (auth via nbcontext, path params via mux.Vars, error writing via util), and delegate all authorization/account scoping to accountManager, which keeps HTTP thin. That’s solid.

A couple of small refinements you might consider:

  • For CreateJob, if you want to honor the OpenAPI 201 contract, set the status code explicitly before writing the body (assuming util.WriteJSONObject doesn’t override it).
  • In ListJobs, if a single malformed job causes BuildWorkloadResponse to fail, the handler returns an error for the entire list. If corruption is possible, you might prefer to skip/best‑effort those entries or surface which job failed, instead of bailing mid‑loop.

Otherwise, the handlers look straightforward and consistent with the rest of this package.

Also applies to: 87-115, 117-142


620-640: toSingleJobResponse: mapping is correct; add light nil‑safety around workload

The translation from types.Job to api.JobResponse looks accurate: timestamps, TriggeredBy, status cast, optional FailedReason, and workload wiring are all aligned.

One defensive tweak to consider: BuildWorkloadResponse returns a pointer, and you immediately dereference it. If that method ever returns (nil, nil) (e.g., for an unsupported workload type), this will panic. A short guard like:

workload, err := job.BuildWorkloadResponse()
if err != nil {
    return nil, err
}
if workload == nil {
    return nil, fmt.Errorf("nil workload for job %s", job.ID)
}

would make the failure mode explicit and easier to diagnose.

management/server/store/store.go (1)

208-214: Job persistence API surface looks reasonable; consider clarifying locking and invariants

The new job methods on Store match how the account manager uses them (e.g., CreatePeerJob inside a transaction, GetPeerJobByID scoped by accountID, GetPeerJobs by (accountID, peerID), and failure‑marking helpers).

Two small design considerations for later:

  • Other store methods often carry an explicit LockingStrength; job methods don’t. If you ever need stronger consistency guarantees (e.g., no concurrent updates to a pending job), adding a lock strength or documenting expected usage (always within a transaction) would make the contract clearer.
  • With both MarkPendingJobsAsFailed and MarkAllPendingJobsAsFailed, it’s worth documenting whether they only touch Status=pending records and how they behave if the job is already succeeded/failed, so higher‑level code can rely on idempotency.

No blockers here; interface looks aligned with the job lifecycle logic in peer.go and the job manager.

client/internal/engine.go (1)

224-226: Job executor lifecycle wiring looks good; consider bounded wait on shutdown

The addition of jobExecutor, jobExecutorWG, receiveJobEvents, and the Wait() call in Stop() correctly tracks the job-stream goroutine and ties it to engine lifecycle. However, unlike shutdownWg, jobExecutorWG.Wait() has no timeout, so a stuck mgmClient.Job implementation would block engine shutdown indefinitely.

Not urgent, but consider using the same waitWithContext + timeout pattern used for shutdownWg (or folding this into shutdownWg) to keep shutdown bounded.

Also applies to: 277-341, 997-1034

management/server/store/sql_store.go (1)

135-193: Job persistence helpers are fine; be explicit about “no-op” semantics on missing rows

CreatePeerJob, CompletePeerJob, MarkPendingJobsAsFailed, and MarkAllPendingJobsAsFailed follow existing store patterns and look correct for basic CRUD.

Note, though, that the Mark*PendingJobsAsFailed methods treat “no matching rows” as success (no error, no logging), which is ok if they’re intended as best‑effort cleanups. If callers need to distinguish “no such pending job(s)” from “updated”, you may want to check RowsAffected and optionally log/return a status in that case.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e01d797 and 3757968.

⛔ Files ignored due to path filters (3)
  • client/proto/daemon.pb.go is excluded by !**/*.pb.go
  • go.sum is excluded by !**/*.sum
  • shared/management/proto/management.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (26)
  • client/cmd/up.go
  • client/embed/embed.go
  • client/internal/connect.go
  • client/internal/debug/debug.go
  • client/internal/engine.go
  • client/internal/engine_test.go
  • client/proto/daemon.proto
  • client/server/server.go
  • client/server/server_test.go
  • go.mod
  • management/internals/modules/peers/manager.go
  • management/internals/server/boot.go
  • management/internals/shared/grpc/server.go
  • management/server/account.go
  • management/server/account/manager.go
  • management/server/account_test.go
  • management/server/activity/codes.go
  • management/server/http/handlers/peers/peers_handler.go
  • management/server/mock_server/account_mock.go
  • management/server/peer.go
  • management/server/store/sql_store.go
  • management/server/store/store.go
  • shared/management/client/grpc.go
  • shared/management/http/api/openapi.yml
  • shared/management/http/api/types.gen.go
  • shared/management/proto/management.proto
💤 Files with no reviewable changes (1)
  • client/proto/daemon.proto
🚧 Files skipped from review as they are similar to previous changes (4)
  • management/internals/modules/peers/manager.go
  • management/server/mock_server/account_mock.go
  • management/server/account.go
  • client/embed/embed.go
🧰 Additional context used
🧬 Code graph analysis (15)
management/internals/server/boot.go (1)
management/internals/shared/grpc/server.go (1)
  • NewServer (80-132)
client/cmd/up.go (1)
util/log.go (1)
  • FindFirstLogPath (77-84)
management/server/account/manager.go (1)
management/server/types/job.go (1)
  • Job (34-58)
client/server/server_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
management/server/account.go (1)
  • BuildManager (178-266)
management/internals/shared/grpc/server.go (1)
  • NewServer (80-132)
shared/management/client/grpc.go (4)
shared/management/proto/management.pb.go (6)
  • EncryptedMessage (322-333)
  • EncryptedMessage (348-348)
  • EncryptedMessage (363-365)
  • SyncResponse (721-738)
  • SyncResponse (753-753)
  • SyncResponse (768-770)
shared/management/status/error.go (3)
  • Errorf (70-75)
  • FromError (78-87)
  • PermissionDenied (18-18)
shared/management/proto/management_grpc.pb.go (2)
  • ManagementService_JobClient (169-173)
  • ManagementService_SyncClient (89-92)
encryption/message.go (2)
  • EncryptMessage (10-24)
  • DecryptMessage (27-40)
management/server/store/store.go (1)
management/server/types/job.go (1)
  • Job (34-58)
client/internal/connect.go (1)
client/internal/engine.go (1)
  • EngineConfig (84-143)
client/internal/debug/debug.go (4)
util/log.go (1)
  • SpecialLogs (25-28)
client/internal/profilemanager/profilemanager.go (1)
  • NewProfileManager (56-58)
client/status/status.go (3)
  • ToProtoFullStatus (549-635)
  • ConvertToStatusOutputOverview (122-169)
  • ParseToFullDetailSummary (532-547)
version/version.go (1)
  • NetbirdVersion (18-20)
client/server/server.go (1)
client/status/status.go (1)
  • ToProtoFullStatus (549-635)
management/server/store/sql_store.go (2)
management/server/types/job.go (3)
  • Job (34-58)
  • JobStatusPending (18-18)
  • JobStatusFailed (20-20)
management/server/store/store.go (2)
  • LockingStrength (40-40)
  • LockingStrengthNone (47-47)
management/server/peer.go (8)
management/server/types/job.go (2)
  • Job (34-58)
  • Workload (60-64)
management/server/permissions/modules/module.go (1)
  • RemoteJobs (8-8)
management/server/permissions/operations/operation.go (2)
  • Create (6-6)
  • Read (7-7)
shared/management/status/error.go (4)
  • NewPermissionValidationError (213-215)
  • NewPermissionDeniedError (209-211)
  • NewPeerNotPartOfAccountError (105-107)
  • Type (46-46)
management/server/store/store.go (1)
  • Store (50-215)
management/server/posture/nb_version.go (1)
  • MeetsMinVersion (57-72)
route/route.go (1)
  • ID (48-48)
management/server/activity/codes.go (1)
  • JobCreatedByUser (186-186)
shared/management/http/api/types.gen.go (1)
shared/management/proto/management.pb.go (6)
  • BundleParameters (554-563)
  • BundleParameters (578-578)
  • BundleParameters (593-595)
  • BundleResult (625-631)
  • BundleResult (646-646)
  • BundleResult (661-663)
client/internal/engine_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
management/server/account.go (1)
  • BuildManager (178-266)
management/internals/shared/grpc/server.go (1)
  • NewServer (80-132)
management/server/account_test.go (1)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
client/internal/engine.go (3)
client/internal/profilemanager/config.go (1)
  • Config (90-161)
client/jobexec/executor.go (3)
  • Executor (24-25)
  • NewExecutor (27-29)
  • ErrJobNotImplemented (21-21)
client/internal/debug/debug.go (1)
  • GeneratorDependencies (243-248)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: Windows
  • GitHub Check: Client / Unit
  • GitHub Check: iOS / Build
  • GitHub Check: Android / Build
  • GitHub Check: Darwin
  • GitHub Check: Linux
  • GitHub Check: Build Cache
  • GitHub Check: release_ui_darwin
  • GitHub Check: Client / Unit
  • GitHub Check: release
  • GitHub Check: release_ui
  • GitHub Check: FreeBSD Port / Build & Test
  • GitHub Check: JS / Lint
  • GitHub Check: Client / Unit
  • GitHub Check: Check External GPL/AGPL Licenses
🔇 Additional comments (20)
client/server/server_test.go (1)

23-23: LGTM: Import correctly added for JobManager.

The import is necessary for the new JobManager integration and follows the correct package path.

go.mod (2)

69-69: Appropriate dependency for API code generation support.

The oapi-codegen/runtime package provides the runtime helpers that oapi-codegen-generated code depends on, aligning with the PR's addition of API endpoints for debug bundle operations. The package is widely imported (1,291+ projects) and carries an Apache-2.0 license.


136-136: This dependency is legitimate and necessary. The apapsch/go-jsonmerge/v2 package is a real, published library pulled in as an indirect dependency by oapi-codegen/runtime (a direct dependency in this project). It is a stable package (v2.0.0 released Nov 2021) with no known security vulnerabilities. While the upstream author is not actively developing it, the library remains production-ready and is packaged by distributions like Fedora through 2025.

client/server/server.go (1)

31-33: Use shared status conversion and wire logPath into ConnectClient.Run

Switching to nbstatus.ToProtoFullStatus(fullStatus) centralizes the protobuf conversion in the status package instead of a local helper, which reduces duplication and keeps the serialization logic in one place.

Passing s.logFile into s.connectClient.Run(runningChan, s.logFile) correctly forwards the daemon’s log path into the connection/engine stack so debug bundle generation can find the right log files.

Also applies to: 1088-1097, 1545-1553

client/internal/debug/debug.go (2)

221-269: logPath-based log collection and fallback behavior are coherent

Using logPath (and skipping entries present in util.SpecialLogs) ensures the bundle generator only tries to read actual log files, falling back to trySystemdLogFallback() when running with console/syslog logging or no path. Deriving logDir from g.logPath and reusing it for rotated logs plus stderr/stdout files preserves the previous layout while keeping the Darwin-specific overrides.

This wiring looks consistent with the new EngineConfig.LogPath/ConnectClient.Run APIs and should behave correctly across platforms.

Also applies to: 347-357, 719-749


395-418: addStatus now reuses status helpers and properly seeds anonymizer

The updated addStatus implementation:

  • Uses nbstatus.ToProtoFullStatus, ConvertToStatusOutputOverview, and ParseToFullDetailSummary to generate status.txt, keeping the bundle’s status output aligned with the CLI status command.
  • Looks up the active profile via profilemanager.NewProfileManager().GetActiveProfile() and passes its name into the overview, which is helpful in multi-profile scenarios and degrades gracefully if the lookup fails.
  • Calls seedFromStatus after generating the status file, ensuring the anonymizer is seeded from the full status before other anonymized artifacts (config, sync response, state, etc.) are added.

No changes needed here.

client/internal/connect.go (1)

72-75: Plumb logPath through ConnectClient and EngineConfig

Extending ConnectClient.Run/run to accept logPath and threading it into createEngineConfig (which in turn sets EngineConfig.LogPath and EngineConfig.ProfileConfig) cleanly exposes the log location and profile config to the engine and debug layers.

The desktop path now passes a real log path, while the Android/iOS entrypoints correctly pass an empty string, preserving existing behavior on mobile where file-based logs are not used.

Also applies to: 95-96, 113-114, 116-376, 461-500

shared/management/http/api/types.gen.go (1)

1992-2005: WorkloadRequest/WorkloadResponse union helpers are consistent

The WorkloadRequest and WorkloadResponse wrappers around json.RawMessage with discriminator-based helpers (AsBundle…, FromBundle…, MergeBundle…, Discriminator, ValueByDiscriminator) follow the usual oapi-codegen union pattern:

  • From… and Merge… force Type to "bundle" before marshalling, so clients can’t accidentally send a mismatched discriminator.
  • ValueByDiscriminator switches on "type" and currently supports the "bundle" case with a clear error for unknown types.
  • MarshalJSON/UnmarshalJSON delegate to the underlying json.RawMessage, which is idiomatic for this style of union.

This should make it straightforward for API consumers to both construct and inspect workload payloads.

Also applies to: 2168-2284
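
For readers unfamiliar with this oapi-codegen pattern, here is a minimal hand-written sketch of the discriminator mechanics. The type and field names below are illustrative stand-ins, not the generated code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WorkloadRequest wraps a raw JSON payload, mirroring the generated
// union wrapper around json.RawMessage.
type WorkloadRequest struct {
	union json.RawMessage
}

// BundleParams is a stand-in for the generated bundle parameter type.
type BundleParams struct {
	Type       string `json:"type"`
	ForTimeMin int    `json:"bundle_for_time,omitempty"`
}

// FromBundle forces the discriminator to "bundle" before marshalling,
// so a caller cannot accidentally send a mismatched type tag.
func (w *WorkloadRequest) FromBundle(b BundleParams) error {
	b.Type = "bundle"
	raw, err := json.Marshal(b)
	if err != nil {
		return err
	}
	w.union = raw
	return nil
}

// Discriminator probes only the "type" field of the raw payload.
func (w WorkloadRequest) Discriminator() (string, error) {
	var probe struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal(w.union, &probe); err != nil {
		return "", err
	}
	return probe.Type, nil
}

func main() {
	var w WorkloadRequest
	_ = w.FromBundle(BundleParams{ForTimeMin: 3})
	d, _ := w.Discriminator()
	fmt.Println(d) // bundle
}
```

The generated `ValueByDiscriminator` then simply switches on this probed value to pick the concrete type.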

management/internals/server/boot.go (1)

147-152: Wire JobManager into gRPC management server construction

Updating the nbgrpc.NewServer call to pass s.JobManager() in the expected position (after SettingsManager() and before SecretsManager()) matches the new server signature and ensures the gRPC layer has access to the job subsystem for the new Job RPC.

This keeps the bootstrapping pattern consistent with other managers.

management/server/account_test.go (1)

2997-3004: LGTM: JobManager wiring correctly integrated into test setup.

The test setup properly creates the peersManager and passes it along with the store to job.NewJobManager. Passing nil for the metrics parameter is acceptable in test contexts where telemetry isn't being verified.

client/internal/engine_test.go (1)

1603-1636: LGTM: JobManager properly wired into management server test setup.

The test correctly creates the JobManager and passes it through both BuildManager and NewServer, aligning with the new constructor signatures. The integration enables Job-related functionality testing in the engine tests.

shared/management/client/grpc.go (3)

163-217: Missing connection state notification in Job stream.

Unlike handleSyncStream, which calls c.notifyConnected() after successfully opening the stream (line 320), handleJobStream only calls c.notifyDisconnected(err) on errors. If the Job stream should also report connection state for monitoring/status purposes, consider adding notifyConnected() after the successful handshake.

 	log.Debug("job stream handshake sent successfully")
+	c.notifyConnected()

 	// Main loop: receive, process, respond

114-161: Well-structured refactoring of streaming logic.

The extraction of withMgmtStream effectively centralizes connection readiness checks, server key retrieval, and retry backoff. Both Sync and Job now share this common infrastructure while maintaining their distinct streaming behaviors. This improves maintainability and reduces code duplication.


219-233: Handshake sends minimal JobRequest with only ID.

The handshake message contains only a generated UUID with no additional metadata. Verify this aligns with the server's expectations for the Job handshake protocol. If the server expects additional fields (e.g., client capabilities, version), they should be included here.

management/server/http/handlers/peers/peers_handler.go (1)

31-40: New job routes correctly wired to peers handler

The three new routes under /peers/{peerId}/jobs cleanly map to their handlers and follow the existing router style (same base path prefix, same OPTIONS coverage). No issues from a routing perspective.

management/server/peer.go (1)

400-425: GetAllPeerJobs: access control and scoping look correct

GetAllPeerJobs validates RemoteJobs/Read permission, ensures the peer belongs to the account via GetAccountIDByPeerID, and then fetches jobs per (accountID, peerID). That gives you clean account‑scoped job listings. No functional issues spotted here.

shared/management/http/api/openapi.yml (1)

41-156: Schema constraints are properly synchronized with server-side validation—no action needed

Verified that BundleParameters constraints match the server-side validation in management/server/types/job.go:

  • bundle_for_time: schema specifies 1–5, server enforces 1–5 ✓
  • log_file_count: schema specifies 1–1000, server enforces 1–1000 ✓

The validation is already implemented in the validateAndBuildBundleParams function, so there is no risk of 500 errors for values the schema claims are valid.

client/internal/engine.go (1)

206-210: Sync-response persistence locking & cloning look correct

The new syncRespMux and the changes in handleSync, SetSyncResponsePersistence, and GetLatestSyncResponse avoid races with latestSyncResponse/persistSyncResponse, and cloning via proto.Clone ensures callers can’t mutate internal state. This is a solid concurrency/encapsulation improvement.

Also applies to: 858-871, 1949-1985

management/internals/shared/grpc/server.go (1)

323-361: startResponseReceiver Recv loop is straightforward and non‑blocking

The startResponseReceiver goroutine cleanly handles EOF/ctx cancellation and logs invalid responses without tearing down the whole stream. Given the bidirectional nature of Job, this “one goroutine recv, one goroutine send” pattern is appropriate and doesn’t contend with the handshake (which consumes only the first message before spawning the goroutine).
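
The one-goroutine-recv / one-goroutine-send split can be illustrated with a toy stream, independent of gRPC. The fakeStream type below is a stand-in for the generated bidirectional Job stream, not the real type:

```go
package main

import (
	"fmt"
	"io"
)

// fakeStream stands in for the bidirectional gRPC Job stream.
type fakeStream struct {
	in  chan string // messages arriving from the peer
	out chan string // messages sent to the peer
}

func (s *fakeStream) Recv() (string, error) {
	msg, ok := <-s.in
	if !ok {
		return "", io.EOF // peer closed its side
	}
	return msg, nil
}

func (s *fakeStream) Send(msg string) error {
	s.out <- msg
	return nil
}

// startResponseReceiver drains responses in its own goroutine so the
// sender loop never blocks on Recv; EOF cleanly ends the goroutine
// without tearing down the sender.
func startResponseReceiver(s *fakeStream, done chan<- struct{}) {
	go func() {
		defer close(done)
		for {
			msg, err := s.Recv()
			if err != nil {
				return
			}
			fmt.Println("response:", msg)
		}
	}()
}

func main() {
	s := &fakeStream{in: make(chan string, 1), out: make(chan string, 1)}
	done := make(chan struct{})
	startResponseReceiver(s, done)

	// Sender side: one goroutine sends while the receiver goroutine recvs.
	_ = s.Send("job-request")
	fmt.Println("sent:", <-s.out)

	s.in <- "job-response"
	close(s.in)
	<-done
}
```

Because gRPC streams only allow one concurrent Recv and one concurrent Send, this strict one-goroutine-per-direction split is the standard safe pattern.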

management/server/store/sql_store.go (1)

42-51: Job schema wiring and query constants are consistent

Adding accountAndPeerIDQueryCondition and including &types.Job{} in AutoMigrate aligns the new Job model with existing patterns (account‑scoped queries, standard migrations). This should create the required table/indexes without special handling.

Also applies to: 108-117

Comment thread management/server/store/sql_store.go
@coderabbitai

coderabbitai Bot commented Dec 30, 2025

@pappz, I'll resolve all my existing review comments and start a fresh review of the PR.

✅ Actions performed

Comments resolved.


Full review triggered.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
client/jobexec/executor.go (2)

57-61: Consider adding retry logic for upload failures.

Transient network issues could cause upload failures. A simple retry with backoff would improve reliability.

🔎 Example retry pattern
+const maxUploadRetries = 3
+
 func (e *Executor) BundleJob(ctx context.Context, debugBundleDependencies debug.GeneratorDependencies, params debug.BundleConfig, waitForDuration time.Duration, mgmURL string) (string, error) {
     // ... existing code ...

-    key, err := debug.UploadDebugBundle(ctx, types.DefaultBundleURL, mgmURL, path)
-    if err != nil {
-        log.Errorf("failed to upload debug bundle: %v", err)
-        return "", fmt.Errorf("upload debug bundle: %w", err)
-    }
+    var key string
+    var err error
+    for attempt := 1; attempt <= maxUploadRetries; attempt++ {
+        key, err = debug.UploadDebugBundle(ctx, types.DefaultBundleURL, mgmURL, path)
+        if err == nil {
+            break
+        }
+        if ctx.Err() != nil {
+            return "", fmt.Errorf("upload debug bundle: %w", ctx.Err())
+        }
+        log.Warnf("upload attempt %d/%d failed: %v", attempt, maxUploadRetries, err)
+        if attempt < maxUploadRetries {
+            time.Sleep(time.Duration(attempt) * time.Second)
+        }
+    }
+    if err != nil {
+        return "", fmt.Errorf("upload debug bundle after %d attempts: %w", maxUploadRetries, err)
+    }

63-63: Clarify log message to reflect upload completion.

At this point, both generation and upload succeeded. The message could be more accurate.

🔎 Suggested fix
-    log.Infof("debug bundle has been generated successfully")
+    log.Infof("debug bundle has been generated and uploaded successfully")
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3757968 and 369b5c3.

📒 Files selected for processing (1)
  • client/jobexec/executor.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-13T00:29:53.247Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/cmd/ssh_exec_unix.go:53-74
Timestamp: 2025-11-13T00:29:53.247Z
Learning: In client/ssh/server/executor_unix.go, the method ExecuteWithPrivilegeDrop(ctx context.Context, config ExecutorConfig) has a void return type (no error return). It handles failures by exiting the process directly with appropriate exit codes rather than returning errors to the caller.

Applied to files:

  • client/jobexec/executor.go
📚 Learning: 2025-11-14T13:05:31.729Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/ssh/server/userswitching_windows.go:89-139
Timestamp: 2025-11-14T13:05:31.729Z
Learning: In client/ssh/server/executor_windows.go, the WindowsExecutorConfig struct's Pty, PtyWidth, and PtyHeight fields are intentionally left unused for now and will be implemented in a future update.

Applied to files:

  • client/jobexec/executor.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: iOS / Build
  • GitHub Check: Android / Build
  • GitHub Check: Client / Unit
  • GitHub Check: Client / Unit
  • GitHub Check: Build Cache
  • GitHub Check: release
  • GitHub Check: FreeBSD Port / Build & Test
  • GitHub Check: Linux
  • GitHub Check: Windows
  • GitHub Check: Darwin
  • GitHub Check: Client / Unit
  • GitHub Check: JS / Lint
🔇 Additional comments (4)
client/jobexec/executor.go (4)

1-14: LGTM!

Package declaration and imports are clean and appropriate for the executor functionality.


24-29: LGTM!

The stateless executor pattern is appropriate. The struct can be extended later if state (e.g., retry policies, shared dependencies) is needed.


67-76: LGTM!

The waitFor function correctly handles context cancellation and returns appropriate errors, addressing the previous review feedback.


20-22: No action required. ErrJobNotImplemented is actively used in client/internal/engine.go to indicate unsupported job types during job dispatch.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
shared/management/client/mock.go (1)

79-84: Fix infinite recursion in GetPKCEAuthorizationFlow

GetPKCEAuthorizationFlow currently calls itself instead of the function field, leading to infinite recursion (stack overflow) whenever the mock is used for PKCE:

func (m *MockClient) GetPKCEAuthorizationFlow(serverKey wgtypes.Key) (*proto.PKCEAuthorizationFlow, error) {
	if m.GetPKCEAuthorizationFlowFunc == nil {
		return nil, nil
	}
	return m.GetPKCEAuthorizationFlow(serverKey) // recursive call
}

This should delegate to GetPKCEAuthorizationFlowFunc, like the other mock methods do.

Proposed fix
 func (m *MockClient) GetPKCEAuthorizationFlow(serverKey wgtypes.Key) (*proto.PKCEAuthorizationFlow, error) {
 	if m.GetPKCEAuthorizationFlowFunc == nil {
 		return nil, nil
 	}
-	return m.GetPKCEAuthorizationFlow(serverKey)
+	return m.GetPKCEAuthorizationFlowFunc(serverKey)
 }
management/internals/shared/grpc/server.go (1)

55-77: Guard Server.Job against nil jobManager

Server now carries a jobManager *job.Manager, and Job unconditionally uses it:

updates := s.jobManager.CreateJobChannel(ctx, accountID, peer.ID)
...
s.startResponseReceiver(ctx, srv)
...
return s.sendJobsLoop(ctx, accountID, peerKey, peer, updates, srv)

If NewServer is ever called with a nil jobManager (e.g., misconfigured wiring or feature disabled), any Job RPC will panic. Either:

  • Enforce non-nil jobManager in NewServer (panic or return error when nil), or
  • Treat a nil jobManager as “feature disabled” and have Job return codes.Unimplemented up front.

This will make the behavior explicit and avoid runtime panics.

Also applies to: 79-132, 171-203

♻️ Duplicate comments (10)
management/server/types/job.go (1)

164-201: JSON shape mismatch between proto.BundleResult and api.BundleResult.

The issue flagged in the past review remains unresolved. When serializing r.Bundle (type *proto.BundleResult) directly with json.Marshal at line 194, the protobuf UploadKey field (which appears to be a protobuf wrapper type) will produce a different JSON structure than expected by api.BundleResult (which has UploadKey *string).

This will cause json.Unmarshal in buildBundleResponse (line 126) to fail when trying to deserialize succeeded jobs.

Consider mapping the proto result to the API result type before marshaling:

🔎 Recommended fix
 case *proto.JobResponse_Bundle:
-    if j.Workload.Result, err = json.Marshal(r.Bundle); err != nil {
-        return fmt.Errorf("failed to marshal workload results: %w", err)
-    }
+    apiResult := api.BundleResult{}
+    if r.Bundle != nil && r.Bundle.UploadKey != nil {
+        key := r.Bundle.UploadKey.GetValue()
+        apiResult.UploadKey = &key
+    }
+    if j.Workload.Result, err = json.Marshal(apiResult); err != nil {
+        return fmt.Errorf("failed to marshal workload results: %w", err)
+    }
management/server/store/sql_store.go (2)

195-224: GetPeerJobByID still ignores peer ownership (matches prior review concern)

GetPeerJobByID only filters on account_id and id, so any caller with an account-level context can read any peer’s job, regardless of which peer it belongs to. Earlier review feedback suggested verifying the requesting peer as well.

If this method is used from a peer-authenticated path (not just admin/management), consider either:

  • Extending the signature to GetPeerJobByID(ctx, accountID, peerID, jobID) and using accountAndPeerIDQueryCondition+" AND id = ?"; or
  • Restricting this method’s usage to privileged management flows and introducing a peer-scoped variant for peer access.

4197-4215: Handle “peer not found” explicitly in GetPeerIDByKey (prior feedback still applies)

GetPeerIDByKey uses .Scan(&peerID) and returns peerID without checking whether any row matched. On a missing peer, GORM leaves peerID empty and result.Error == nil, so callers see ("", nil) and can easily misinterpret that as “found a peer with empty ID”.

To match the patterns used by other getters here (GetPeerIdByLabel, GetPeerByPeerPubKey, etc.), consider:

  • Treating peerID == "" (or result.RowsAffected == 0) as “not found”, and
  • Returning a NotFound error (e.g. status.NewPeerNotFoundError(key) or similar).

This avoids silent misses and makes error handling clearer at the call site.

client/internal/engine.go (1)

1036-1069: Guard against nil ProfileConfig / ManagementURL in handleBundle (same concern as prior review)

handleBundle currently does:

uploadKey, err := e.jobExecutor.BundleJob(
    e.ctx, bundleDeps, bundleJobParams, waitFor,
    e.config.ProfileConfig.ManagementURL.String(),
)

If e.config.ProfileConfig or ProfileConfig.ManagementURL is nil (e.g., certain test setups or non‑profile-based runs), this will panic when a remote bundle job arrives.

Consider adding a defensive check and returning a failed job response instead of crashing the client, e.g.:

if e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
    return nil, fmt.Errorf("profile config or management URL not available for bundle generation")
}
mgmtURL := e.config.ProfileConfig.ManagementURL.String()
uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, mgmtURL)
management/server/job/manager.go (2)

47-65: Race condition: Store operation before lock acquisition.

The call to MarkAllPendingJobsAsFailed at lines 50-52 runs before acquiring the lock at line 54. This creates a race with methods like SendJob or HandleResponse that may concurrently modify the pending map or peer state. The store operation and in-memory cleanup should be coordinated under the same lock.

🔎 Suggested fix
 func (jm *Manager) CreateJobChannel(ctx context.Context, accountID, peerID string) *Channel {
+	jm.mu.Lock()
+	defer jm.mu.Unlock()
+
 	// all pending jobs stored in db for this peer should be failed
 	if err := jm.Store.MarkAllPendingJobsAsFailed(ctx, accountID, peerID, "Pending job cleanup: marked as failed automatically due to being stuck too long"); err != nil {
 		log.WithContext(ctx).Error(err.Error())
 	}
 
-	jm.mu.Lock()
-	defer jm.mu.Unlock()
-
 	if ch, ok := jm.jobChannels[peerID]; ok {
 		ch.Close()
 		delete(jm.jobChannels, peerID)
 	}
 
 	ch := NewChannel()
 	jm.jobChannels[peerID] = ch
 	return ch
 }

67-91: Race condition: Channel reference may become stale after releasing read lock.

After releasing the read lock at line 71, another goroutine can call CloseChannel, which closes and deletes the channel from the map. The ch reference then points to a closed channel, and calling AddEvent at line 85 will panic with "send on closed channel".

This can occur when:

  1. SendJob acquires ch and releases the read lock
  2. The gRPC stream ends, triggering CloseChannel from another goroutine
  3. SendJob calls ch.AddEvent on the now-closed channel
🔎 Suggested fix: hold write lock across the operation
 func (jm *Manager) SendJob(ctx context.Context, accountID, peerID string, req *proto.JobRequest) error {
-	jm.mu.RLock()
+	jm.mu.Lock()
+	defer jm.mu.Unlock()
+
 	ch, ok := jm.jobChannels[peerID]
-	jm.mu.RUnlock()
 	if !ok {
 		return fmt.Errorf("peer %s has no channel", peerID)
 	}
 
 	event := &Event{
 		PeerID:  peerID,
 		Request: req,
 	}
 
-	jm.mu.Lock()
 	jm.pending[string(req.ID)] = event
-	jm.mu.Unlock()
 
 	if err := ch.AddEvent(ctx, jm.responseWait, event); err != nil {
 		jm.cleanup(ctx, accountID, string(req.ID), err.Error())
 		return err
 	}
 
 	return nil
 }

Note: If holding the lock during AddEvent (which has a 5-minute timeout) is unacceptable, consider a different synchronization strategy such as checking channel validity after re-acquiring the lock or using a channel-level closed flag.

management/server/job/channel.go (1)

18-21: Channel.AddEvent can panic on send after Close – guard against closed channel

Close() closes jc.events, but AddEvent can still execute jc.events <- event concurrently, which will panic if the channel has just been closed. sync.Once prevents double-close, not send-on-closed.

Consider adding internal closed state with locking, or otherwise guaranteeing that:

  • AddEvent returns ErrJobChannelClosed without attempting a send when the channel is closed, and
  • Close cannot close the channel while an AddEvent is in the middle of a blocking send.

For example, introduce a mutex + closed flag on Channel, hold a read-lock around the send, and set closed under a write-lock before closing the channel, so sends cannot proceed after closure.

You may also want to move the timeout responsibility out of AddEvent (as per the TODO) into the job lifecycle manager, so enqueueing isn’t rejected solely because the buffer is full while a peer is slow or unresponsive.

Also applies to: 31-41, 43-47

management/server/peer.go (2)

325-398: Peer job creation: guard against nil jobManager and consider send-vs-persist ordering

A couple of points in CreatePeerJob:

  1. Nil jobManager will panic
    am.jobManager is used unconditionally:

    if !am.jobManager.IsPeerConnected(peerID) { ... }
    if am.jobManager.IsPeerHasPendingJobs(peerID) { ... }
    if err := am.jobManager.SendJob(...); err != nil { ... }

    If DefaultAccountManager is constructed with a nil jobManager (e.g., in older tests or miswired setups), this will panic the server on first job creation. Either:

    • Enforce a non-nil jobManager at construction and in all call sites, or
    • Add a defensive check at the top of CreatePeerJob returning an internal error or codes.Unimplemented when jobs are disabled.
  2. Job is sent before it is persisted
    The current order is:

    • Check for existing pending jobs.
    • Convert to stream request and call jobManager.SendJob(...).
    • Only then create the job in the store inside a transaction.

    If the DB write fails, the peer may already be executing a job that doesn’t exist in persistent storage or APIs. Consider persisting the job first (status = pending), then sending it to the peer, and, if sending fails, updating the job status to failed with an appropriate reason. This keeps runtime and storage in sync even on failures.
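
The persist-first ordering can be sketched with hypothetical stand-ins for the store and job manager (the real types live in management/server and differ in detail):

```go
package main

import (
	"errors"
	"fmt"
)

// Job mirrors only the fields needed for this sketch.
type Job struct {
	ID     string
	Status string
	Reason string
}

type store struct{ jobs map[string]*Job }

func (s *store) CreateJob(j *Job) error { s.jobs[j.ID] = j; return nil }

func (s *store) UpdateStatus(id, status, reason string) {
	if j, ok := s.jobs[id]; ok {
		j.Status, j.Reason = status, reason
	}
}

// sendToPeer simulates jobManager.SendJob, which may fail.
func sendToPeer(fail bool) error {
	if fail {
		return errors.New("peer channel unavailable")
	}
	return nil
}

// createPeerJob persists first (status pending), then sends; on send
// failure the stored job is marked failed, so the store never lags
// behind runtime state.
func createPeerJob(s *store, id string, sendFails bool) error {
	j := &Job{ID: id, Status: "pending"}
	if err := s.CreateJob(j); err != nil {
		return err
	}
	if err := sendToPeer(sendFails); err != nil {
		s.UpdateStatus(id, "failed", err.Error())
		return err
	}
	return nil
}

func main() {
	s := &store{jobs: map[string]*Job{}}
	_ = createPeerJob(s, "job-1", true)
	fmt.Println(s.jobs["job-1"].Status) // failed
}
```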


400-451: Ensure job ownership is enforced in GetPeerJobByID

GetPeerJobByID validates that the peer belongs to the account, but doesn’t ensure that the returned job actually belongs to the requested peer:

job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID)
if err != nil {
    return nil, err
}
return job, nil

A caller can request /peers/{peerA}/jobs/{jobOfPeerB} (same account) and still see jobOfPeerB. After loading the job, you should verify that job.PeerID == peerID and return a suitable error (e.g. status.NewPeerNotPartOfAccountError()) otherwise.

This is important to keep the per-peer job API consistent with the URL path and to avoid leaking jobs between peers within an account.

management/internals/shared/grpc/server.go (1)

363-385: Don’t swallow sendJob errors in sendJobsLoop

In sendJobsLoop, if sendJob fails you only log and return nil:

if err := s.sendJob(ctx, peerKey, event, srv); err != nil {
    log.WithContext(ctx).Warnf("send job failed: %v", err)
    return nil
}

This causes the Job RPC to complete with an OK status even when encryption, key retrieval, or Send fails, making it impossible for clients or observability to distinguish success from failure.

Consider:

  • Propagating the error (possibly via status.Errorf(codes.Internal, ...)) so the RPC status reflects job delivery failure.
  • Optionally marking the job as failed in the job manager/store when sendJob returns an error.

This will give callers accurate feedback and improve debuggability of job delivery issues.

Also applies to: 445-468

🧹 Nitpick comments (12)
client/internal/debug/upload.go (1)

61-64: Consider limiting error response body size.

The code reads the entire error response body without a size limit. While this is in error paths and less critical, a malicious or misconfigured server could return very large bodies that consume excessive memory.

🔎 Optional: limit error body reads
 	if putResp.StatusCode != http.StatusOK {
-		body, _ := io.ReadAll(putResp.Body)
+		body, _ := io.ReadAll(io.LimitReader(putResp.Body, 1024*1024)) // 1 MB limit
 		return fmt.Errorf("upload status %d: %s", putResp.StatusCode, string(body))
 	}

Apply similar change at line 84 for the GET request error handling.

Also applies to: 83-86

management/server/account_test.go (1)

37-37: JobManager wiring into tests looks correct; consider reusing peersManager instance

The new job import and job.NewJobManager(nil, store, peersManager) usage in createManager align with the updated BuildManager signature and keep tests close to production wiring. One minor improvement would be to pass the existing peersManager into ephemeral_manager.NewEphemeralManager instead of constructing a second peers.NewManager(store, permissionsManager) instance, to avoid duplicate managers over the same store in this test setup.

Also applies to: 2996-3005

management/server/route_test.go (1)

24-24: Router tests now wire JobManager consistently with account tests

Using peersManager := peers.NewManager(store, permissionsManager) and passing job.NewJobManager(nil, store, peersManager) into BuildManager aligns route tests with the new job‑aware account manager constructor and keeps test wiring close to production. As a minor clean‑up, you could reuse peersManager when creating the ephemeral manager instead of instantiating a second peers manager, but this is non‑blocking for tests.

Also applies to: 1292-1301

client/internal/engine_test.go (1)

28-28: JobManager wiring in test management server looks good; reuse metrics instead of nil

The new job.Manager integration in startManagement is consistent with the extended BuildManager and nbgrpc.NewServer signatures. To keep metrics usage consistent, you can construct the JobManager with the real metrics instance instead of nil.

Proposed tweak to construct JobManager with metrics
-	permissionsManager := permissions.NewManager(store)
-	peersManager := peers.NewManager(store, permissionsManager)
-	jobManager := job.NewJobManager(nil, store, peersManager)
-
-	ia, _ := integrations.NewIntegratedValidator(context.Background(), peersManager, nil, eventStore)
-
-	metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
+	permissionsManager := permissions.NewManager(store)
+	peersManager := peers.NewManager(store, permissionsManager)
+
+	ia, _ := integrations.NewIntegratedValidator(context.Background(), peersManager, nil, eventStore)
+
+	metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
 	require.NoError(t, err)
+
+	jobManager := job.NewJobManager(metrics, store, peersManager)

Also applies to: 1601-1638

management/server/nameserver_test.go (1)

21-21: Use the existing metrics when constructing JobManager in tests

The new JobManager wiring in createNSManager is correct. Since you already have a metrics instance, you can pass it into job.NewJobManager instead of nil to keep instrumentation consistent and avoid having a partially wired JobManager in tests.

Proposed change
-	permissionsManager := permissions.NewManager(store)
-	peersManager := peers.NewManager(store, permissionsManager)
+	permissionsManager := permissions.NewManager(store)
+	peersManager := peers.NewManager(store, permissionsManager)
@@
-	return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(nil, store, peersManager), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+	return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(metrics, store, peersManager), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Also applies to: 793-802

management/server/peer_test.go (1)

37-38: JobManager wiring across peer tests is correct; consider reusing metrics and sharing setup

The updated tests correctly construct a peersManager and pass a concrete JobManager into BuildManager, which is required after the signature change. Two small cleanups would make this more consistent and DRY:

  1. Pass metrics into job.NewJobManager instead of nil in each test, since you already create metrics and JobManager supports it.
  2. Factor a small helper that builds (accountManager, peersManager, jobManager) so Test_RegisterPeerByUser, Test_RegisterPeerBySetupKey, Test_RegisterPeerRollbackOnFailure, and Test_LoginPeer don’t repeat the same wiring.
Illustrative change for one test (`Test_RegisterPeerByUser`)
-	permissionsManager := permissions.NewManager(s)
-	peersManager := peers.NewManager(s, permissionsManager)
-
-	ctx := context.Background()
-	updateManager := update_channel.NewPeersUpdateManager(metrics)
+	permissionsManager := permissions.NewManager(s)
+	peersManager := peers.NewManager(s, permissionsManager)
+
+	ctx := context.Background()
+	updateManager := update_channel.NewPeersUpdateManager(metrics)
@@
-	networkMapController := controller.NewController(ctx, s, metrics, updateManager, requestBuffer, MockIntegratedValidator{}, settingsMockManager, "netbird.cloud", port_forwarding.NewControllerMock(), ephemeral_manager.NewEphemeralManager(s, peers.NewManager(s, permissionsManager)), &config.Config{})
-
-	am, err := BuildManager(context.Background(), nil, s, networkMapController, job.NewJobManager(nil, s, peersManager), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+	networkMapController := controller.NewController(ctx, s, metrics, updateManager, requestBuffer, MockIntegratedValidator{}, settingsMockManager, "netbird.cloud", port_forwarding.NewControllerMock(), ephemeral_manager.NewEphemeralManager(s, peers.NewManager(s, permissionsManager)), &config.Config{})
+
+	jobManager := job.NewJobManager(metrics, s, peersManager)
+	am, err := BuildManager(context.Background(), nil, s, networkMapController, jobManager, nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

You can apply the same pattern to the other three tests or extract a shared helper that returns am.

Also applies to: 1289-1301, 1370-1387, 1528-1541, 1605-1622

shared/management/proto/management.proto (1)

52-54: Job RPC and job messages are well-shaped; consider using Duration and string for clarity

The new Job streaming RPC and its associated messages (JobRequest, JobResponse, BundleParameters, BundleResult) look structurally sound and fit the existing pattern of wrapping concrete payloads inside EncryptedMessage.

Two non‑blocking API tweaks that could improve clarity:

  1. Model job duration explicitly

You already import google.protobuf.Duration and use it elsewhere. If bundle_for_time represents a duration, using Duration makes units explicit and avoids integer‑unit confusion:

message BundleParameters {
  bool bundle_for = 1;
- int64 bundle_for_time = 2;
+ google.protobuf.Duration bundle_for_time = 2;
  int32 log_file_count = 3;
  bool anonymize = 4;
}
  1. Use string for human‑readable reasons

If JobResponse.Reason is meant to carry a textual failure/success reason rather than opaque binary data, a string field is more idiomatic and avoids encoding ambiguity:

message JobResponse {
  bytes ID = 1;
  JobStatus status = 2;
- bytes Reason = 3;
+ string reason = 3;
  oneof workload_results {
    BundleResult bundle = 10;
  }
}

If you deliberately intend to carry serialized or encrypted blobs in Reason, keeping it as bytes is fine, but consider a more explicit name (e.g., reason_blob) to signal that.

Also applies to: 66-101

client/cmd/testutil_test.go (1)

21-21: Test management server correctly wires JobManager; reuse metrics instead of nil

The updated startManagement helper now correctly creates a peersmanager, a jobManager, and threads jobManager into both BuildManager and nbgrpc.NewServer, keeping this test utility aligned with the new APIs.

To keep metrics usage consistent with the rest of the setup, you can pass the existing metrics into job.NewJobManager instead of nil:

Proposed adjustment
-	permissionsManagerMock := permissions.NewMockManager(ctrl)
-	peersmanager := peers.NewManager(store, permissionsManagerMock)
-	settingsManagerMock := settings.NewMockManager(ctrl)
-
-	jobManager := job.NewJobManager(nil, store, peersmanager)
+	permissionsManagerMock := permissions.NewMockManager(ctrl)
+	peersmanager := peers.NewManager(store, permissionsManagerMock)
+	settingsManagerMock := settings.NewMockManager(ctrl)
@@
-	accountManager, err := mgmt.BuildManager(context.Background(), config, store, networkMapController, jobManager, nil, "", eventStore, nil, false, iv, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManagerMock, false)
+	jobManager := job.NewJobManager(metrics, store, peersmanager)
+
+	accountManager, err := mgmt.BuildManager(context.Background(), config, store, networkMapController, jobManager, nil, "", eventStore, nil, false, iv, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManagerMock, false)
@@
-	mgmtServer, err := nbgrpc.NewServer(config, accountManager, settingsMockManager, jobManager, secretsManager, nil, nil, &mgmt.MockIntegratedValidator{}, networkMapController)
+	mgmtServer, err := nbgrpc.NewServer(config, accountManager, settingsMockManager, jobManager, secretsManager, nil, nil, &mgmt.MockIntegratedValidator{}, networkMapController)

Also applies to: 100-105, 124-125, 133-134

management/server/http/handlers/peers/peers_handler.go (1)

31-41: Job handlers are correctly wired but consider aligning create response status with OpenAPI

Auth extraction, peer scoping, and delegation to types.NewJob and accountManager.{CreatePeerJob,GetAllPeerJobs,GetPeerJobByID} are consistent with existing patterns, and error handling via util.WriteError/WriteErrorResponse looks fine.

One divergence: CreateJob uses util.WriteJSONObject, which always returns HTTP 200, while the OpenAPI spec declares 201 for successful creates. Consider either:

  • Switching to a custom encoder here that sets w.WriteHeader(http.StatusCreated), or
  • Adjusting the spec to 200, if you prefer that convention.

Also applies to: 51-142

shared/management/http/api/openapi.yml (1)

2410-2513: Job paths look correct; consider tightening response contract

The new peer‑scoped job endpoints are consistent with existing peer routes in terms of auth, path params, and payload types.

Two contract nits:

  • POST /api/peers/{peerId}/jobs documents 201 but the handler currently responds with 200 via WriteJSONObject; either adjust the handler or change the spec to 200.
  • For GET operations, you may want to add a documented 404 response (reusing components/responses/not_found) for missing peers/jobs, as other resources often do.
client/internal/engine.go (1)

224-226: Job stream integration is sound, but not tracked in shutdownWg

Wiring jobExecutor via NewEngine and starting the mgmClient.Job loop in receiveJobEvents() with its own jobExecutorWG is reasonable: Stop() waits on jobExecutorWG before tearing down the engine, so job goroutines won’t outlive the context or DB clients.

If you want full symmetry with other long‑running loops, you could also account for this goroutine in shutdownWg; functionally it’s already safely terminated, so this is just an observability/consistency tweak.

Also applies to: 239-271, 997-1034

management/server/job/manager.go (1)

172-182: Consider using an index for peer-to-jobs lookup.

IsPeerHasPendingJobs iterates over all pending jobs to find any belonging to the given peer. If the pending map grows large, this O(n) scan could become a bottleneck. Consider maintaining a secondary index like peerJobs map[string]map[string]struct{} (peerID → set of jobIDs) for O(1) lookup.
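One possible shape for such an index, as a standalone sketch (the type and method names are assumptions, not the actual Manager API):

```go
package main

import "fmt"

// jobIndex keeps a peerID → set-of-jobIDs mapping so pending-job
// lookups per peer are O(1) instead of scanning every pending job.
type jobIndex struct {
	peerJobs map[string]map[string]struct{}
}

func newJobIndex() *jobIndex {
	return &jobIndex{peerJobs: make(map[string]map[string]struct{})}
}

func (idx *jobIndex) add(peerID, jobID string) {
	if idx.peerJobs[peerID] == nil {
		idx.peerJobs[peerID] = make(map[string]struct{})
	}
	idx.peerJobs[peerID][jobID] = struct{}{}
}

func (idx *jobIndex) remove(peerID, jobID string) {
	if set, ok := idx.peerJobs[peerID]; ok {
		delete(set, jobID)
		if len(set) == 0 {
			delete(idx.peerJobs, peerID) // avoid leaking empty sets
		}
	}
}

// hasPending is the O(1) replacement for the O(n) scan.
func (idx *jobIndex) hasPending(peerID string) bool {
	return len(idx.peerJobs[peerID]) > 0
}

func main() {
	idx := newJobIndex()
	idx.add("peer-1", "job-a")
	fmt.Println(idx.hasPending("peer-1")) // true
	idx.remove("peer-1", "job-a")
	fmt.Println(idx.hasPending("peer-1")) // false
}
```

The index would need to be updated under the same lock that guards the pending map, so the two structures cannot drift apart.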

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a8604ef and 369b5c3.

⛔ Files ignored due to path filters (4)
  • client/proto/daemon.pb.go is excluded by !**/*.pb.go
  • go.sum is excluded by !**/*.sum
  • shared/management/proto/management.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/management_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (54)
  • client/cmd/debug.go
  • client/cmd/status.go
  • client/cmd/testutil_test.go
  • client/cmd/up.go
  • client/embed/embed.go
  • client/internal/connect.go
  • client/internal/debug/debug.go
  • client/internal/debug/upload.go
  • client/internal/debug/upload_test.go
  • client/internal/engine.go
  • client/internal/engine_test.go
  • client/jobexec/executor.go
  • client/proto/daemon.proto
  • client/server/debug.go
  • client/server/server.go
  • client/server/server_test.go
  • client/status/status.go
  • client/status/status_test.go
  • client/ui/debug.go
  • go.mod
  • management/internals/modules/peers/manager.go
  • management/internals/modules/peers/manager_mock.go
  • management/internals/server/boot.go
  • management/internals/server/controllers.go
  • management/internals/server/modules.go
  • management/internals/shared/grpc/server.go
  • management/server/account.go
  • management/server/account/manager.go
  • management/server/account_test.go
  • management/server/activity/codes.go
  • management/server/dns_test.go
  • management/server/http/handlers/peers/peers_handler.go
  • management/server/http/testing/testing_tools/channel/channel.go
  • management/server/job/channel.go
  • management/server/job/manager.go
  • management/server/management_proto_test.go
  • management/server/management_test.go
  • management/server/mock_server/account_mock.go
  • management/server/nameserver_test.go
  • management/server/peer.go
  • management/server/peer_test.go
  • management/server/permissions/modules/module.go
  • management/server/route_test.go
  • management/server/store/sql_store.go
  • management/server/store/store.go
  • management/server/types/job.go
  • shared/management/client/client.go
  • shared/management/client/client_test.go
  • shared/management/client/grpc.go
  • shared/management/client/mock.go
  • shared/management/http/api/generate.sh
  • shared/management/http/api/openapi.yml
  • shared/management/http/api/types.gen.go
  • shared/management/proto/management.proto
💤 Files with no reviewable changes (1)
  • client/proto/daemon.proto
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-13T00:29:53.247Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/cmd/ssh_exec_unix.go:53-74
Timestamp: 2025-11-13T00:29:53.247Z
Learning: In client/ssh/server/executor_unix.go, the method ExecuteWithPrivilegeDrop(ctx context.Context, config ExecutorConfig) has a void return type (no error return). It handles failures by exiting the process directly with appropriate exit codes rather than returning errors to the caller.

Applied to files:

  • client/embed/embed.go
  • client/jobexec/executor.go
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.

Applied to files:

  • shared/management/client/client_test.go
🧬 Code graph analysis (31)
management/internals/server/controllers.go (2)
management/server/job/manager.go (2)
  • Manager (24-32)
  • NewJobManager (34-45)
management/internals/server/container.go (1)
  • Create (6-10)
client/server/server_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
management/server/account.go (1)
  • BuildManager (178-266)
management/internals/shared/grpc/server.go (1)
  • NewServer (80-132)
shared/management/client/client.go (3)
management/server/types/job.go (1)
  • Job (34-58)
shared/management/http/api/types.gen.go (2)
  • JobRequest (714-716)
  • JobResponse (719-727)
shared/management/proto/management.pb.go (6)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
client/cmd/testutil_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
management/server/account.go (1)
  • BuildManager (178-266)
management/internals/shared/grpc/server.go (1)
  • NewServer (80-132)
management/server/route_test.go (3)
management/internals/modules/peers/manager.go (1)
  • NewManager (44-49)
management/server/account.go (1)
  • BuildManager (178-266)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
management/server/management_proto_test.go (1)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
management/server/http/testing/testing_tools/channel/channel.go (3)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
management/internals/modules/peers/ephemeral/manager/ephemeral.go (1)
  • NewEphemeralManager (53-61)
management/server/account.go (1)
  • BuildManager (178-266)
management/server/types/job.go (2)
shared/management/proto/management.pb.go (20)
  • JobStatus (25-25)
  • JobStatus (57-59)
  • JobStatus (61-63)
  • JobStatus (70-72)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • BundleParameters (554-563)
  • BundleParameters (578-578)
  • BundleParameters (593-595)
  • BundleResult (625-631)
  • BundleResult (646-646)
  • BundleResult (661-663)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
  • JobResponse_Bundle (548-550)
  • JobResponse_Bundle (552-552)
  • JobRequest_Bundle (457-459)
  • JobRequest_Bundle (461-461)
shared/management/http/api/types.gen.go (8)
  • JobRequest (714-716)
  • WorkloadResponse (1998-2000)
  • BundleParameters (367-379)
  • BundleResult (382-384)
  • BundleWorkloadResponse (397-405)
  • WorkloadTypeBundle (198-198)
  • WorkloadRequest (1993-1995)
  • JobResponse (719-727)
client/internal/debug/upload.go (1)
upload-server/types/upload.go (3)
  • GetURLResponse (15-18)
  • ClientHeader (5-5)
  • ClientHeaderValue (7-7)
client/cmd/status.go (1)
client/status/status.go (1)
  • ConvertToStatusOutputOverview (122-169)
management/server/mock_server/account_mock.go (1)
management/server/types/job.go (1)
  • Job (34-58)
shared/management/client/grpc.go (3)
shared/management/proto/management.pb.go (6)
  • EncryptedMessage (322-333)
  • EncryptedMessage (348-348)
  • EncryptedMessage (363-365)
  • SyncResponse (721-738)
  • SyncResponse (753-753)
  • SyncResponse (768-770)
shared/management/proto/management_grpc.pb.go (2)
  • ManagementService_JobClient (169-173)
  • ManagementService_SyncClient (89-92)
encryption/message.go (2)
  • EncryptMessage (10-24)
  • DecryptMessage (27-40)
client/cmd/up.go (1)
util/log.go (1)
  • FindFirstLogPath (77-84)
management/server/store/store.go (1)
management/server/types/job.go (1)
  • Job (34-58)
management/internals/shared/grpc/server.go (4)
management/server/job/manager.go (2)
  • Manager (24-32)
  • Event (18-22)
shared/management/proto/management.pb.go (3)
  • EncryptedMessage (322-333)
  • EncryptedMessage (348-348)
  • EncryptedMessage (363-365)
management/server/job/channel.go (2)
  • Channel (18-21)
  • ErrJobChannelClosed (15-15)
encryption/message.go (1)
  • EncryptMessage (10-24)
client/ui/debug.go (1)
shared/management/status/error.go (1)
  • Error (54-57)
management/internals/server/boot.go (1)
management/internals/shared/grpc/server.go (1)
  • NewServer (80-132)
management/server/nameserver_test.go (3)
management/internals/modules/peers/manager.go (1)
  • NewManager (44-49)
management/server/account.go (1)
  • BuildManager (178-266)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
shared/management/client/client_test.go (4)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
management/internals/controllers/network_map/controller/controller.go (1)
  • NewController (75-111)
management/internals/modules/peers/ephemeral/manager/ephemeral.go (1)
  • NewEphemeralManager (53-61)
management/server/account.go (1)
  • BuildManager (178-266)
client/internal/engine.go (5)
client/internal/profilemanager/config.go (1)
  • Config (90-161)
client/jobexec/executor.go (2)
  • Executor (24-25)
  • NewExecutor (27-29)
shared/management/proto/management.pb.go (19)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
  • JobRequest_Bundle (457-459)
  • JobRequest_Bundle (461-461)
  • BundleParameters (554-563)
  • BundleParameters (578-578)
  • BundleParameters (593-595)
  • JobResponse_Bundle (548-550)
  • JobResponse_Bundle (552-552)
  • SyncResponse (721-738)
  • SyncResponse (753-753)
  • SyncResponse (768-770)
  • BundleResult (625-631)
  • BundleResult (646-646)
  • BundleResult (661-663)
shared/management/status/error.go (2)
  • Errorf (70-75)
  • Error (54-57)
client/internal/debug/debug.go (2)
  • GeneratorDependencies (243-248)
  • BundleConfig (237-241)
management/server/peer_test.go (3)
management/server/permissions/manager.go (1)
  • NewManager (33-37)
management/server/account.go (1)
  • BuildManager (178-266)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
management/server/account/manager.go (1)
management/server/types/job.go (1)
  • Job (34-58)
client/internal/connect.go (1)
client/internal/engine.go (1)
  • EngineConfig (84-143)
client/internal/debug/upload_test.go (2)
client/internal/debug/upload.go (1)
  • UploadDebugBundle (17-28)
upload-server/types/upload.go (1)
  • GetURLPath (9-9)
management/server/dns_test.go (2)
management/server/account.go (1)
  • BuildManager (178-266)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
client/status/status_test.go (1)
client/status/status.go (1)
  • ConvertToStatusOutputOverview (122-169)
management/server/http/handlers/peers/peers_handler.go (4)
management/server/context/auth.go (1)
  • GetUserAuthFromContext (25-30)
shared/management/http/util/util.go (3)
  • WriteError (84-120)
  • WriteErrorResponse (70-80)
  • WriteJSONObject (27-35)
shared/management/http/api/types.gen.go (3)
  • JobRequest (714-716)
  • JobResponse (719-727)
  • JobResponseStatus (730-730)
management/server/types/job.go (3)
  • NewJob (67-103)
  • Job (34-58)
  • Workload (60-64)
management/server/store/sql_store.go (2)
management/server/types/job.go (3)
  • Job (34-58)
  • JobStatusPending (18-18)
  • JobStatusFailed (20-20)
management/server/store/store.go (2)
  • LockingStrength (40-40)
  • LockingStrengthNone (47-47)
client/server/server.go (1)
client/status/status.go (1)
  • ToProtoFullStatus (549-635)
management/server/job/manager.go (2)
management/server/job/channel.go (2)
  • Channel (18-21)
  • NewChannel (23-29)
management/server/types/job.go (1)
  • Job (34-58)
management/internals/server/modules.go (1)
management/server/account.go (1)
  • BuildManager (178-266)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: Management / Unit (amd64, mysql)
  • GitHub Check: Relay / Unit (amd64, -race)
  • GitHub Check: Management / Benchmark (API) (amd64, sqlite)
  • GitHub Check: Management / Benchmark (API) (amd64, postgres)
  • GitHub Check: Client / Unit (386)
  • GitHub Check: Management / Benchmark (amd64, sqlite)
  • GitHub Check: Management / Unit (amd64, sqlite)
  • GitHub Check: Client / Unit (amd64)
  • GitHub Check: Management / Unit (amd64, postgres)
  • GitHub Check: Management / Benchmark (amd64, postgres)
  • GitHub Check: Client (Docker) / Unit
  • GitHub Check: Android / Build
  • GitHub Check: Client / Unit
  • GitHub Check: Client / Unit
  • GitHub Check: release
  • GitHub Check: Windows
  • GitHub Check: Client / Unit

Comment on lines +47 to +59
req, err := http.NewRequestWithContext(ctx, "PUT", response.URL, fileData)
if err != nil {
return fmt.Errorf("create PUT request: %w", err)
}

req.ContentLength = stat.Size()
req.Header.Set("Content-Type", "application/octet-stream")

putResp, err := http.DefaultClient.Do(req)
if err != nil {
return fmt.Errorf("upload failed: %v", err)
}
defer putResp.Body.Close()

⚠️ Potential issue | 🟠 Major

Configure HTTP client timeout to prevent indefinite hangs.

The http.DefaultClient has no timeout configured. If the upload service is unresponsive, the request could hang indefinitely, blocking the debug bundle generation.

🔎 Recommended fix: use context-aware client with timeout
-	req, err := http.NewRequestWithContext(ctx, "PUT", response.URL, fileData)
-	if err != nil {
-		return fmt.Errorf("create PUT request: %w", err)
-	}
-
-	req.ContentLength = stat.Size()
-	req.Header.Set("Content-Type", "application/octet-stream")
-
-	putResp, err := http.DefaultClient.Do(req)
+	req, err := http.NewRequestWithContext(ctx, "PUT", response.URL, fileData)
+	if err != nil {
+		return fmt.Errorf("create PUT request: %w", err)
+	}
+
+	req.ContentLength = stat.Size()
+	req.Header.Set("Content-Type", "application/octet-stream")
+
+	client := &http.Client{Timeout: 5 * time.Minute}
+	putResp, err := client.Do(req)

Note: A 5-minute timeout is reasonable for a 50 MB upload over typical network conditions. Adjust based on your expected network performance.

🤖 Prompt for AI Agents
In client/internal/debug/upload.go around lines 47 to 59, the code uses
http.DefaultClient which has no timeout and can hang indefinitely; replace use
of http.DefaultClient.Do with a dedicated http.Client configured with a
reasonable Timeout (e.g., 5 * time.Minute) and use the existing ctx on the
request so the request is cancellable; instantiate the client locally (or reuse
a package-level client) with Timeout set, call client.Do(req), and keep the
existing defer putResp.Body.Close() handling.

Comment on lines +70 to +81
getReq, err := http.NewRequestWithContext(ctx, "GET", url+"?id="+id, nil)
if err != nil {
return nil, fmt.Errorf("create GET request: %w", err)
}

getReq.Header.Set(types.ClientHeader, types.ClientHeaderValue)

resp, err := http.DefaultClient.Do(getReq)
if err != nil {
return nil, fmt.Errorf("get presigned URL: %w", err)
}
defer resp.Body.Close()

⚠️ Potential issue | 🟠 Major

Configure HTTP client timeout for presigned URL request.

Similar to the upload request, the http.DefaultClient has no timeout configured for fetching the presigned URL. This could hang indefinitely if the service is unresponsive.

🔎 Recommended fix: use client with timeout
 	getReq, err := http.NewRequestWithContext(ctx, "GET", url+"?id="+id, nil)
 	if err != nil {
 		return nil, fmt.Errorf("create GET request: %w", err)
 	}
 
 	getReq.Header.Set(types.ClientHeader, types.ClientHeaderValue)
 
-	resp, err := http.DefaultClient.Do(getReq)
+	client := &http.Client{Timeout: 30 * time.Second}
+	resp, err := client.Do(getReq)

Note: A 30-second timeout is reasonable for fetching a presigned URL.

🤖 Prompt for AI Agents
In client/internal/debug/upload.go around lines 70 to 81, the code uses
http.DefaultClient to fetch the presigned URL which has no timeout and can hang;
replace that call with a dedicated http.Client configured with a reasonable
timeout (e.g., 30s) and use that client's Do method (or reuse the existing
upload client if available) to perform the request, propagating context as
before and keeping the same header and error handling.

Comment thread management/server/activity/codes.go Outdated

AccountAutoUpdateVersionUpdated: {"Account AutoUpdate Version updated", "account.settings.auto.version.update"},

JobCreatedByUser: {"Create Job for peer", "peer.job.create"},

⚠️ Potential issue | 🟡 Minor

Use past tense for consistency with existing messages.

The message format should follow the established pattern of using simple past tense. All existing activity messages use past tense (e.g., "Peer added", "Route created", "User joined"), while "Create Job for peer" uses imperative form.

🔎 Suggested fix
-	JobCreatedByUser: {"Create Job for peer", "peer.job.create"},
+	JobCreatedByUser: {"Job created for peer", "peer.job.create"},
🤖 Prompt for AI Agents
In management/server/activity/codes.go around line 301, the activity message for
JobCreatedByUser uses imperative tense ("Create Job for peer") instead of past
tense like the other entries; change the message string to past tense — e.g.,
"Job created for peer" — keeping the same activity key ("peer.job.create") so
the pair remains {"Job created for peer", "peer.job.create"}.

Comment on lines +135 to +192
// CreatePeerJob persists a job in DB
func (s *SqlStore) CreatePeerJob(ctx context.Context, job *types.Job) error {
result := s.db.Create(job)
if result.Error != nil {
log.WithContext(ctx).Errorf("failed to create job in store: %s", result.Error)
return status.Errorf(status.Internal, "failed to create job in store")
}
return nil
}

func (s *SqlStore) CompletePeerJob(ctx context.Context, job *types.Job) error {
result := s.db.
Model(&types.Job{}).
Where(idQueryCondition, job.ID).
Updates(job)

if result.Error != nil {
log.WithContext(ctx).Errorf("failed to update job in store: %s", result.Error)
return status.Errorf(status.Internal, "failed to update job in store")
}
return nil
}

// job was pending for too long and has been cancelled
func (s *SqlStore) MarkPendingJobsAsFailed(ctx context.Context, accountID, peerID, jobID, reason string) error {
now := time.Now().UTC()
result := s.db.
Model(&types.Job{}).
Where(accountAndPeerIDQueryCondition+" AND id = ?"+" AND status = ?", accountID, peerID, jobID, types.JobStatusPending).
Updates(types.Job{
Status: types.JobStatusFailed,
FailedReason: reason,
CompletedAt: &now,
})
if result.Error != nil {
log.WithContext(ctx).Errorf("failed to mark pending jobs as Failed job in store: %s", result.Error)
return status.Errorf(status.Internal, "failed to mark pending job as Failed in store")
}
return nil
}

// job was pending for too long and has been cancelled
func (s *SqlStore) MarkAllPendingJobsAsFailed(ctx context.Context, accountID, peerID, reason string) error {
now := time.Now().UTC()
result := s.db.
Model(&types.Job{}).
Where(accountAndPeerIDQueryCondition+" AND status = ?", accountID, peerID, types.JobStatusPending).
Updates(types.Job{
Status: types.JobStatusFailed,
FailedReason: reason,
CompletedAt: &now,
})
if result.Error != nil {
log.WithContext(ctx).Errorf("failed to mark pending jobs as Failed job in store: %s", result.Error)
return status.Errorf(status.Internal, "failed to mark pending job as Failed in store")
}
return nil
}

⚠️ Potential issue | 🟡 Minor

Consider checking RowsAffected in job update helpers

CreatePeerJob, CompletePeerJob, MarkPendingJobsAsFailed, and MarkAllPendingJobsAsFailed log DB errors but don’t distinguish between “no matching row” and a successful update. That makes silent no-ops hard to detect, especially for idempotent job transitions.

If you care about detecting missing jobs, consider checking result.RowsAffected and returning a NotFound status error when it’s zero, similar to patterns used elsewhere in this file (e.g. UpdateAccountDomainAttributes, SavePeerStatus).

Comment thread shared/management/client/grpc.go
Comment thread shared/management/client/grpc.go Outdated
Comment on lines +246 to +252
// Verify that the message is intended for this peer
currentPeerID := c.key.PublicKey().String()
if encryptedMsg.WgPubKey != currentPeerID {
err := fmt.Errorf("job request peer ID mismatch: expected %s, got %s", currentPeerID, encryptedMsg.WgPubKey)
log.Warnf("%v", err)
return nil, err
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the actual code in the file around the mentioned lines
fd -t f grpc.go | grep -E "shared/management/client"

Repository: netbirdio/netbird

Length of output: 93


🏁 Script executed:

# Get the actual implementation of receiveJobRequest and surrounding code
cat -n shared/management/client/grpc.go | sed -n '240,270p'

Repository: netbirdio/netbird

Length of output: 1207


🏁 Script executed:

# Search for the sendJob implementation on the server side
rg -t go "func.*sendJob" -A 15

Repository: netbirdio/netbird

Length of output: 3863


🏁 Script executed:

# Look for how WgPubKey is set in the server implementation
rg -t go "WgPubKey.*=" -B 3 -A 3

Repository: netbirdio/netbird

Length of output: 9479


🏁 Script executed:

# Check how other encrypted management messages validate WgPubKey
rg -t go "encryptedMsg.WgPubKey" -B 5 -A 5

Repository: netbirdio/netbird

Length of output: 809


Job request peer-ID check is inverted – causes all job messages to be rejected

The client's receiveJobRequest validates that incoming messages have WgPubKey equal to the client's own public key:

currentPeerID := c.key.PublicKey().String()
if encryptedMsg.WgPubKey != currentPeerID {
    return nil, fmt.Errorf("job request peer ID mismatch: expected %s, got %s", currentPeerID, encryptedMsg.WgPubKey)
}

However, on the server side, sendJob sets WgPubKey to the server's WireGuard public key. This causes the client to reject every job request as a peer ID mismatch.

Validate against the expected serverPubKey instead (which is available in the function signature), matching the message semantics where the sender's key is embedded.
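A minimal sketch of the corrected check, extracted from the surrounding context (the function name is hypothetical; the point is comparing against the server's key, not the client's):

```go
package main

import "fmt"

// validateJobSender checks the embedded WgPubKey against the management
// server's known public key, since the sender's key is what the message carries.
func validateJobSender(msgWgPubKey, serverPubKey string) error {
	if msgWgPubKey != serverPubKey {
		return fmt.Errorf("job request sender mismatch: expected server key %s, got %s", serverPubKey, msgWgPubKey)
	}
	return nil
}

func main() {
	fmt.Println(validateJobSender("srv-key", "srv-key"))        // <nil>
	fmt.Println(validateJobSender("other", "srv-key") != nil)   // true
}
```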

🤖 Prompt for AI Agents
In shared/management/client/grpc.go around lines 246 to 252, the peer-ID
comparison is inverted: the code compares the incoming message's WgPubKey to the
client's own public key, but the message contains the sender (server) WireGuard
key. Change the check to validate encryptedMsg.WgPubKey against the serverPubKey
parameter (i.e., if encryptedMsg.WgPubKey != serverPubKey then log and return an
error), and update the error/log message to state the expected serverPubKey and
the received WgPubKey for clarity.

Comment thread shared/management/http/api/types.gen.go
# Conflicts:
#	client/cmd/debug.go
#	client/cmd/testutil_test.go
#	client/internal/engine_test.go
#	client/server/server.go
#	client/server/server_test.go
#	client/ui/debug.go
#	go.sum
#	management/internals/server/boot.go
#	management/internals/shared/grpc/server.go
#	management/server/account/manager.go
#	management/server/activity/codes.go
#	management/server/http/testing/testing_tools/channel/channel.go
#	management/server/management_proto_test.go
#	management/server/mock_server/account_mock.go
#	management/server/permissions/modules/module.go
#	management/server/store/sql_store.go
#	management/server/store/store.go
#	shared/management/client/client_test.go
#	shared/management/http/api/openapi.yml
#	shared/management/http/api/types.gen.go
@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
client/server/server.go (1)

15-20: Remove duplicate wgtypes import on line 19.

The same package is imported twice at lines 16 and 19, which will cause a Go compilation error.

🛠️ Proposed fix
-	"golang.zx2c4.com/wireguard/wgctrl/wgtypes"
🤖 Fix all issues with AI agents
In `@management/internals/shared/grpc/server.go`:
- Around line 177-208: The Server.Job handler must guard against a nil job
manager to avoid a panic; before calling s.jobManager.CreateJobChannel (and
before s.startResponseReceiver/sendJobsLoop), check if s.jobManager == nil and
return a gRPC error (e.g., status.Errorf(codes.Unavailable, "job manager not
available") or codes.Internal with an explanatory message). Update the nil-check
in the Job method around the CreateJobChannel usage and ensure you return the
gRPC status error instead of continuing when s.jobManager is nil.
♻️ Duplicate comments (9)
management/internals/shared/grpc/server.go (1)

371-391: sendJobsLoop still swallows send errors.
Same issue as previously flagged: returning nil on sendJob failure hides stream errors from callers.

client/internal/engine.go (1)

1036-1059: Potential nil dereference in handleBundle still present.
e.config.ProfileConfig or ManagementURL can be nil, causing a panic.

🛠️ Suggested fix
func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobResponse_Bundle, error) {
 	log.Infof("handle remote debug bundle request: %s", params.String())
 	syncResponse, err := e.GetLatestSyncResponse()
 	if err != nil {
 		log.Warnf("get latest sync response: %v", err)
 	}
@@
 	waitFor := time.Duration(params.BundleForTime) * time.Minute
-
-	uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
+	if e.config == nil || e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
+		return nil, errors.New("profile config or management URL not available for bundle generation")
+	}
+	uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
 	if err != nil {
 		return nil, err
 	}
shared/management/http/api/types.gen.go (1)

387-400: Type mismatch with proto definition persists.

As noted in a previous review, BundleForTime uses int here but int64 in the proto definition, and LogFileCount uses int here but int32 in proto. Since this is generated code, the fix should be applied to the OpenAPI specification (openapi.yml) by explicitly specifying format: int64 for BundleForTime and format: int32 for LogFileCount.

management/server/peer.go (3)

329-355: Guard against nil jobManager before use.

CreatePeerJob dereferences am.jobManager directly; if it’s nil in any deployment/test wiring, this will panic.

🛠️ Suggested guard
 func (am *DefaultAccountManager) CreatePeerJob(ctx context.Context, accountID, peerID, userID string, job *types.Job) error {
+	if am.jobManager == nil {
+		return status.Errorf(status.Internal, "job manager is not configured")
+	}
 	allowed, err := am.permissionsManager.ValidateUserPermissions(ctx, accountID, userID, modules.RemoteJobs, operations.Create)
 	if err != nil {
 		return status.NewPermissionValidationError(err)
 	}

369-399: Persist before dispatch to avoid orphaned jobs.

Sending the job prior to DB persistence can leave a running job with no record if the transaction fails. Consider storing first (pending), then sending, and marking failed if dispatch fails.


431-454: Verify job ownership in GetPeerJobByID.

The method returns any job by jobID within the account without confirming it belongs to peerID. That allows cross‑peer access within the same account.

🔒 Suggested ownership check
 	job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID)
 	if err != nil {
 		return nil, err
 	}
+	if job.PeerID != peerID {
+		return nil, status.NewPeerNotPartOfAccountError()
+	}
 
 	return job, nil
management/server/store/sql_store.go (3)

158-205: Return NotFound when no job rows are updated.

CompletePeerJob and the “mark failed” helpers treat zero‑row updates as success, which can silently mask missing jobs. Consider checking RowsAffected and returning a NotFound error.

✅ Example pattern (apply similarly to other job updates)
 result := s.db.
 	Model(&types.Job{}).
 	Where(accountAndPeerIDQueryCondition+" AND id = ?"+" AND status = ?", accountID, peerID, jobID, types.JobStatusPending).
 	Updates(types.Job{
 		Status:       types.JobStatusFailed,
 		FailedReason: reason,
 		CompletedAt:  &now,
 	})
 if result.Error != nil {
 	log.WithContext(ctx).Errorf("failed to mark pending jobs as Failed job in store: %s", result.Error)
 	return status.Errorf(status.Internal, "failed to mark pending job as Failed in store")
 }
+if result.RowsAffected == 0 {
+	return status.Errorf(status.NotFound, "job %s not found", jobID)
+}
 return nil

207-221: Scope job fetches to peer ownership.

GetPeerJobByID filters by account_id + id only. If peer scoping is required, add peer_id to the query (or make this method explicitly “account‑wide only”).


4459-4477: Return NotFound when key lookup yields no rows.

Scan(&peerID) returns a nil error even when no rows match, so callers receive an ambiguous ("", nil) result.

✅ Suggested handling
 	result := tx.Model(&nbpeer.Peer{}).
 		Select("id").
 		Where(GetKeyQueryCondition(s), key).
 		Limit(1).
 		Scan(&peerID)
 	if result.Error != nil {
 		log.WithContext(ctx).Errorf("failed to get peer ID by key: %s", result.Error)
 		return "", status.Errorf(status.Internal, "failed to get peer ID by key")
 	}
+	if peerID == "" {
+		return "", status.Errorf(status.NotFound, "peer not found for key %s", key)
+	}
 
 	return peerID, nil
🧹 Nitpick comments (5)
client/status/status.go (1)

584-584: Non-deterministic ordering of Networks field.

maps.Keys returns keys in non-deterministic order. If consistent ordering matters for display, debugging, or test stability, consider sorting the result.

🔧 Suggested fix to ensure deterministic ordering
-	pbFullStatus.LocalPeerState.Networks = maps.Keys(fullStatus.LocalPeerState.Routes)
+	networks := maps.Keys(fullStatus.LocalPeerState.Routes)
+	sort.Strings(networks)
+	pbFullStatus.LocalPeerState.Networks = networks

Similarly for line 605:

-			Networks:                   maps.Keys(peerState.GetRoutes()),
+			Networks:                   func() []string {
+				keys := maps.Keys(peerState.GetRoutes())
+				sort.Strings(keys)
+				return keys
+			}(),

Also applies to: 605-605

client/server/debug.go (1)

49-49: Consider propagating the request context.

The function receives a context parameter (unused _) but uses context.Background() for the upload. If the upload should respect cancellation signals, consider using the request context instead:

-	key, err := debug.UploadDebugBundle(context.Background(), req.GetUploadURL(), s.config.ManagementURL.String(), path)
+	key, err := debug.UploadDebugBundle(ctx, req.GetUploadURL(), s.config.ManagementURL.String(), path)

If using context.Background() is intentional (e.g., to ensure uploads complete regardless of client disconnection), a brief comment explaining this would clarify the intent.

shared/management/proto/management.proto (2)

66-89: Inconsistent field naming and consider string for Reason.

Two observations on these message definitions:

  1. Naming inconsistency: ID and Reason use PascalCase while workload_parameters and workload_results use snake_case. Proto3 convention typically uses snake_case for field names.

  2. Reason as bytes: Using bytes for a failure reason is unusual. Unless this field needs to carry binary/encrypted data, string would be more appropriate for human-readable error messages.

Suggested fix
 message JobRequest {
-  bytes ID = 1;
+  bytes id = 1;
 
   oneof workload_parameters {
     BundleParameters bundle = 10;
   }
 }
 
 message JobResponse{
-  bytes ID = 1;
+  bytes id = 1;
   JobStatus status=2;
-  bytes Reason=3;
+  string reason=3;
   oneof workload_results {
     BundleResult bundle = 10;
   }
 }

91-96: Clarify bundle_for field purpose.

The field name bundle_for as a bool is ambiguous. The name suggests a target/destination ("bundle for whom/what"), but a boolean doesn't convey that meaning. Consider renaming to clarify its purpose, such as:

  • enabled if it controls whether bundling is active
  • include_<something> if it's a toggle for including specific content
management/server/http/handlers/peers/peers_handler.go (1)

52-86: Consider returning HTTP 201 Created for resource creation.

The CreateJob handler uses util.WriteJSONObject, which returns HTTP 200 OK. For RESTful APIs, resource creation via POST should typically return HTTP 201 Created.

Suggested approach

Either create a new utility function or inline the response:

w.Header().Set("Content-Type", "application/json; charset=UTF-8")
w.WriteHeader(http.StatusCreated)
if err := json.NewEncoder(w).Encode(resp); err != nil {
    util.WriteError(ctx, err, w)
    return
}

Alternatively, if util.WriteJSONObject is used project-wide for POST creation endpoints and returning 200 is an established pattern, this can remain as-is for consistency.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 369b5c3 and fa41f8d.

⛔ Files ignored due to path filters (3)
  • client/proto/daemon.pb.go is excluded by !**/*.pb.go
  • go.sum is excluded by !**/*.sum
  • shared/management/proto/management.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (41)
  • client/cmd/debug.go
  • client/cmd/status.go
  • client/cmd/testutil_test.go
  • client/cmd/up.go
  • client/embed/embed.go
  • client/internal/connect.go
  • client/internal/engine.go
  • client/internal/engine_test.go
  • client/proto/daemon.proto
  • client/server/debug.go
  • client/server/server.go
  • client/server/server_test.go
  • client/status/status.go
  • client/status/status_test.go
  • go.mod
  • management/internals/modules/peers/manager.go
  • management/internals/server/boot.go
  • management/internals/server/controllers.go
  • management/internals/server/modules.go
  • management/internals/shared/grpc/server.go
  • management/server/account.go
  • management/server/account/manager.go
  • management/server/account_test.go
  • management/server/activity/codes.go
  • management/server/dns_test.go
  • management/server/http/handlers/peers/peers_handler.go
  • management/server/http/testing/testing_tools/channel/channel.go
  • management/server/management_proto_test.go
  • management/server/management_test.go
  • management/server/mock_server/account_mock.go
  • management/server/nameserver_test.go
  • management/server/peer.go
  • management/server/peer_test.go
  • management/server/permissions/modules/module.go
  • management/server/route_test.go
  • management/server/store/sql_store.go
  • management/server/store/store.go
  • shared/management/client/client_test.go
  • shared/management/http/api/openapi.yml
  • shared/management/http/api/types.gen.go
  • shared/management/proto/management.proto
💤 Files with no reviewable changes (1)
  • client/proto/daemon.proto
🚧 Files skipped from review as they are similar to previous changes (14)
  • management/internals/modules/peers/manager.go
  • management/server/account_test.go
  • client/embed/embed.go
  • go.mod
  • management/internals/server/controllers.go
  • management/server/activity/codes.go
  • management/server/account/manager.go
  • management/server/http/testing/testing_tools/channel/channel.go
  • management/server/mock_server/account_mock.go
  • client/cmd/testutil_test.go
  • management/server/route_test.go
  • management/server/dns_test.go
  • client/status/status_test.go
  • management/internals/server/boot.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.

Applied to files:

  • management/server/store/store.go
  • shared/management/client/client_test.go
🧬 Code graph analysis (20)
management/server/http/handlers/peers/peers_handler.go (4)
management/server/context/auth.go (1)
  • GetUserAuthFromContext (25-30)
shared/management/http/util/util.go (3)
  • WriteError (84-120)
  • WriteErrorResponse (70-80)
  • WriteJSONObject (27-35)
shared/management/http/api/types.gen.go (3)
  • JobRequest (816-818)
  • JobResponse (821-829)
  • JobResponseStatus (832-832)
management/server/types/job.go (3)
  • NewJob (67-103)
  • Job (34-58)
  • Workload (60-64)
client/cmd/up.go (1)
util/log.go (1)
  • FindFirstLogPath (77-84)
client/internal/engine_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
management/server/account.go (1)
  • BuildManager (178-266)
management/internals/server/server.go (1)
  • NewServer (71-83)
management/server/peer_test.go (4)
management/server/permissions/manager.go (1)
  • NewManager (33-37)
management/server/users/manager.go (1)
  • NewManager (22-26)
management/server/account.go (1)
  • BuildManager (178-266)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
client/internal/engine.go (3)
client/internal/profilemanager/config.go (1)
  • Config (91-162)
client/jobexec/executor.go (2)
  • Executor (24-25)
  • NewExecutor (27-29)
client/internal/debug/debug.go (1)
  • GeneratorDependencies (243-248)
management/server/management_test.go (7)
management/internals/modules/peers/manager.go (1)
  • NewManager (46-51)
management/server/settings/manager.go (1)
  • NewManager (34-41)
management/server/networks/routers/manager.go (1)
  • NewManager (40-46)
management/server/networks/resources/manager.go (1)
  • NewManager (42-49)
management/server/permissions/manager.go (1)
  • NewManager (33-37)
management/server/groups/manager.go (1)
  • NewManager (36-42)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
management/server/nameserver_test.go (2)
management/server/account.go (1)
  • BuildManager (178-266)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
client/status/status.go (2)
client/internal/peer/status.go (6)
  • FullStatus (152-163)
  • ManagementState (129-133)
  • SignalState (122-126)
  • LocalPeerState (107-113)
  • RosenpassState (136-139)
  • NSGroupState (143-149)
client/proto/daemon.pb.go (21)
  • FullStatus (2140-2154)
  • FullStatus (2167-2167)
  • FullStatus (2182-2184)
  • ManagementState (1820-1827)
  • ManagementState (1840-1840)
  • ManagementState (1855-1857)
  • SignalState (1759-1766)
  • SignalState (1779-1779)
  • SignalState (1794-1796)
  • LocalPeerState (1666-1677)
  • LocalPeerState (1690-1690)
  • LocalPeerState (1705-1707)
  • PeerState (1485-1507)
  • PeerState (1520-1520)
  • PeerState (1535-1537)
  • RelayState (1881-1888)
  • RelayState (1901-1901)
  • RelayState (1916-1918)
  • NSGroupState (1941-1949)
  • NSGroupState (1962-1962)
  • NSGroupState (1977-1979)
client/cmd/status.go (1)
client/status/status.go (1)
  • ConvertToStatusOutputOverview (123-170)
management/internals/shared/grpc/server.go (5)
management/server/job/manager.go (2)
  • Manager (24-32)
  • Event (18-22)
shared/management/status/error.go (7)
  • FromError (78-87)
  • NotFound (21-21)
  • Errorf (70-75)
  • PermissionDenied (18-18)
  • Unauthenticated (39-39)
  • InvalidArgument (27-27)
  • Internal (24-24)
shared/management/proto/management.pb.go (9)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
  • EncryptedMessage (322-333)
  • EncryptedMessage (348-348)
  • EncryptedMessage (363-365)
management/server/job/channel.go (2)
  • Channel (18-21)
  • ErrJobChannelClosed (15-15)
encryption/message.go (1)
  • EncryptMessage (10-24)
client/internal/connect.go (3)
client/internal/profilemanager/config.go (1)
  • Config (91-162)
shared/management/proto/management.pb.go (3)
  • PeerConfig (2082-2100)
  • PeerConfig (2115-2115)
  • PeerConfig (2130-2132)
client/internal/engine.go (1)
  • EngineConfig (84-143)
management/server/peer.go (4)
management/server/types/job.go (2)
  • Job (34-58)
  • Workload (60-64)
management/server/permissions/modules/module.go (1)
  • RemoteJobs (8-8)
shared/management/status/error.go (5)
  • NewPermissionValidationError (213-215)
  • NewPermissionDeniedError (209-211)
  • NewPeerNotPartOfAccountError (105-107)
  • Errorf (70-75)
  • Type (46-46)
management/server/posture/nb_version.go (1)
  • MeetsMinVersion (57-72)
management/server/store/sql_store.go (2)
management/server/types/job.go (3)
  • Job (34-58)
  • JobStatusPending (18-18)
  • JobStatusFailed (20-20)
management/server/store/store.go (2)
  • LockingStrength (43-43)
  • LockingStrengthNone (50-50)
client/server/debug.go (1)
client/internal/debug/upload.go (1)
  • UploadDebugBundle (17-28)
management/internals/server/modules.go (1)
management/server/account.go (1)
  • BuildManager (178-266)
management/server/store/store.go (1)
management/server/types/job.go (1)
  • Job (34-58)
shared/management/http/api/types.gen.go (1)
shared/management/proto/management.pb.go (6)
  • BundleParameters (554-563)
  • BundleParameters (578-578)
  • BundleParameters (593-595)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
shared/management/client/client_test.go (4)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
management/internals/controllers/network_map/controller/controller.go (1)
  • NewController (76-112)
management/internals/modules/peers/ephemeral/manager/ephemeral.go (1)
  • NewEphemeralManager (53-61)
management/server/account.go (1)
  • BuildManager (178-266)
client/server/server_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
management/server/account.go (1)
  • BuildManager (178-266)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-138)
management/server/management_proto_test.go (2)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-138)
🔇 Additional comments (43)
shared/management/client/client_test.go (4)

21-27: LGTM!

Import additions are correctly placed and necessary to support the new job management functionality in the test setup.


96-96: LGTM!

The jobManager instantiation correctly matches the NewJobManager signature. Passing nil for metrics is appropriate in test context.


122-123: LGTM!

Both calls correctly match their respective function signatures:

  • ephemeral_manager.NewEphemeralManager uses the aliased import correctly
  • BuildManager includes jobManager in the correct position (5th parameter) per the updated signature

134-134: No issues found. The nbgrpc.NewServer call matches the function signature exactly. All 10 arguments are in the correct order and type:

  1. config → *nbconfig.Config
  2. accountManager → account.Manager
  3. settingsMockManager → settings.Manager
  4. jobManager → *job.Manager
  5. secretsManager → SecretsManager
  6. nil → telemetry.AppMetrics
  7. nil → auth.Manager
  8. mgmt.MockIntegratedValidator{} → integrated_validator.IntegratedValidator
  9. networkMapController → network_map.Controller
  10. nil → idp.OAuthConfigProvider
management/server/permissions/modules/module.go (1)

5-38: LGTM: RemoteJobs is fully registered in permissions.

Constant and All map entry are consistent.

management/internals/server/modules.go (1)

88-91: LGTM: JobManager is wired into BuildManager.

Call site aligns with the updated signature.

client/server/server.go (1)

1524-1527: LGTM: log path is now passed into ConnectClient.Run.

No issues with the updated call.

shared/management/http/api/openapi.yml (3)

41-43: LGTM: Jobs tag added and marked experimental.


47-162: LGTM: Job/workload schemas are well-structured and consistent.


2677-2780: LGTM: Job endpoints are clearly defined with auth and responses.

client/internal/connect.go (2)

72-75: LGTM: logPath is threaded through Run and mobile entry points.

Also applies to: 95-96, 113-116


286-286: LGTM: EngineConfig now carries LogPath and ProfileConfig.

Also applies to: 473-513

client/cmd/up.go (1)

200-204: Log path propagation into Run looks correct.
This keeps foreground mode aligned with the new log‑path‑aware debug bundle flow.

management/internals/shared/grpc/server.go (3)

3-31: JobManager wiring in the server constructor is consistent.
The field + constructor parameter threading looks clean.

Also applies to: 57-122


331-369: Handshake + response receiver flow looks solid.
The sequence and error handling are coherent for the Job stream.


436-448: Encrypted send paths look consistent.
sendUpdate and sendJob follow the same encryption + send pattern.

Also applies to: 453-475

client/server/server_test.go (1)

22-24: JobManager wiring in tests is updated correctly.
The constructor call chain now matches the new signatures.

Also applies to: 310-333

client/internal/engine.go (4)

34-54: Config/engine struct extensions look good.
LogPath/ProfileConfig and job executor fields integrate cleanly.

Also applies to: 137-143, 206-225


239-271: Engine init + shutdown coordination for job executor looks solid.
New executor creation and Wait() on shutdown are appropriate.

Also applies to: 339-341


527-530: Job stream hookup is clear and consistent with the existing streams.
Nice to see it aligned with the Signal/Sync error handling pattern.

Also applies to: 999-1034


858-869: Sync response persistence is now properly guarded.
The dedicated lock and cloning behavior are good improvements.

Also applies to: 1957-1994

client/internal/engine_test.go (1)

28-29: JobManager wiring in engine tests is aligned with new signatures.
The setup looks consistent across server construction paths.

Also applies to: 1603-1637

client/status/status.go (3)

14-18: LGTM on new imports.

The new imports for durationpb, timestamppb, and maps are appropriate for the proto conversion functionality being added.


123-145: LGTM on signature change.

The updated ConvertToStatusOutputOverview signature correctly accepts a fully constructed *proto.FullStatus and a separate daemonVersion string parameter, aligning with the proto structure refactoring.


558-644: Well-structured conversion function.

The ToProtoFullStatus function correctly converts all fields from the internal peer.FullStatus to *proto.FullStatus. Error handling for ManagementState.Error, SignalState.Error, relay errors, and DNS state errors is properly implemented with nil checks before calling .Error().

shared/management/http/api/types.gen.go (1)

1-4: Generated file - changes should be made upstream.

This file is auto-generated by oapi-codegen. Any structural changes (like the type mismatch noted in past reviews for BundleParameters.BundleForTime and LogFileCount) should be addressed in the OpenAPI specification rather than in this file directly.

client/cmd/debug.go (1)

350-360: LGTM on field rename.

The change from LogFile to LogPath aligns with the broader refactoring in the debug package that renames log file references to path-based fields.

client/cmd/status.go (1)

102-102: LGTM on updated function call.

The call to ConvertToStatusOutputOverview correctly passes resp.GetFullStatus() as the first argument and resp.GetDaemonVersion() as the third argument, matching the updated function signature.

management/server/nameserver_test.go (2)

21-21: LGTM on new import.

The job package import is required for the new JobManager wiring.


794-801: LGTM on JobManager wiring.

The test setup correctly:

  1. Creates a peersManager using peers.NewManager(store, permissionsManager)
  2. Passes job.NewJobManager(nil, store, peersManager) to BuildManager

This follows the established pattern for wiring JobManager into test infrastructure, consistent with the broader PR changes.

management/server/account.go (1)

18-18: LGTM!

The JobManager dependency is properly wired into DefaultAccountManager following the existing patterns for other managers. The import, field declaration, parameter addition, and assignment are all consistent and correctly implemented.

Also applies to: 74-74, 183-183, 206-206

management/server/peer_test.go (1)

1293-1300: LGTM!

The test wiring consistently creates peersManager via peers.NewManager(s, permissionsManager) and passes job.NewJobManager(nil, s, peersManager) to BuildManager across all four test functions. Using nil for metrics is appropriate in test contexts.

Also applies to: 1379-1386, 1533-1540, 1614-1621

client/server/debug.go (1)

27-57: LGTM on the refactoring.

The field rename from LogFile to LogPath improves clarity, and delegating upload logic to debug.UploadDebugBundle properly centralizes the functionality.

shared/management/proto/management.proto (1)

51-53: LGTM on the new Job RPC.

The bidirectional streaming RPC for job execution is well-suited for long-running operations where status updates need to flow in both directions.

management/server/http/handlers/peers/peers_handler.go (1)

39-41: LGTM on the job endpoints and helper.

The new routes, ListJobs, GetJob handlers, and toSingleJobResponse helper follow existing patterns in the codebase. Error handling is consistent, and the response transformation correctly handles the optional FailedReason field.

Also applies to: 88-143, 620-640

management/server/management_test.go (2)

33-34: No review note.


205-245: JobManager wiring in tests looks consistent.

The updated setup threads the new dependency through BuildManager and NewServer as expected.

management/server/management_proto_test.go (2)

34-34: No review note.


365-387: JobManager wiring in test setup looks good.

The dependency is passed through BuildManager and NewServer consistently with the updated API.

management/server/peer.go (1)

34-34: No review note.

management/server/store/store.go (1)

229-235: Store interface extension matches new job lifecycle needs.

management/server/store/sql_store.go (2)

46-55: No review note.


124-130: No review note.


Comment on lines +177 to +208
func (s *Server) Job(srv proto.ManagementService_JobServer) error {
reqStart := time.Now()
ctx := srv.Context()

peerKey, err := s.handleHandshake(ctx, srv)
if err != nil {
return err
}

accountID, err := s.accountManager.GetAccountIDForPeerKey(ctx, peerKey.String())
if err != nil {
// nolint:staticcheck
ctx = context.WithValue(ctx, nbContext.AccountIDKey, "UNKNOWN")
log.WithContext(ctx).Tracef("peer %s is not registered", peerKey.String())
if errStatus, ok := internalStatus.FromError(err); ok && errStatus.Type() == internalStatus.NotFound {
return status.Errorf(codes.PermissionDenied, "peer is not registered")
}
return err
}
// nolint:staticcheck
ctx = context.WithValue(ctx, nbContext.AccountIDKey, accountID)
peer, err := s.accountManager.GetStore().GetPeerByPeerPubKey(ctx, store.LockingStrengthNone, peerKey.String())
if err != nil {
return status.Errorf(codes.Unauthenticated, "peer is not registered")
}

s.startResponseReceiver(ctx, srv)

updates := s.jobManager.CreateJobChannel(ctx, accountID, peer.ID)
log.WithContext(ctx).Debugf("Job: took %v", time.Since(reqStart))

return s.sendJobsLoop(ctx, accountID, peerKey, peer, updates, srv)

⚠️ Potential issue | 🟠 Major

Guard Job() when jobManager is nil.
If a server is instantiated without a JobManager, calling Job() will panic on CreateJobChannel. Return a gRPC error instead.

🛠️ Suggested fix
func (s *Server) Job(srv proto.ManagementService_JobServer) error {
 	reqStart := time.Now()
 	ctx := srv.Context()
+
+	if s.jobManager == nil {
+		return status.Error(codes.Unimplemented, "job manager not configured")
+	}
 
 	peerKey, err := s.handleHandshake(ctx, srv)
 	if err != nil {
 		return err
 	}
🤖 Prompt for AI Agents
In `@management/internals/shared/grpc/server.go` around lines 177 - 208, The
Server.Job handler must guard against a nil job manager to avoid a panic; before
calling s.jobManager.CreateJobChannel (and before
s.startResponseReceiver/sendJobsLoop), check if s.jobManager == nil and return a
gRPC error (e.g., status.Errorf(codes.Unavailable, "job manager not available")
or codes.Internal with an explanatory message). Update the nil-check in the Job
method around the CreateJobChannel usage and ensure you return the gRPC status
error instead of continuing when s.jobManager is nil.

@sonarqubecloud
@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@management/server/peer.go`:
- Around line 356-362: The current IsPeerHasPendingJobs check is racy; change
the logic in JobManager to provide a single atomic operation (e.g.,
AddPendingJobIfNone or TryEnqueuePendingJob) that both checks for an existing
pending job for peerID and enqueues/records the new pending job under the same
lock, returning a conflict/error if a job already exists; update the call site
in peer.go to call this new JobManager method instead of IsPeerHasPendingJobs so
the check-and-insert is protected by the JobManager's mutex and cannot be
interleaved by concurrent threads.
♻️ Duplicate comments (3)
management/server/peer.go (3)

329-354: Add a nil guard for am.jobManager before first use.

am.jobManager is dereferenced on Line 352. If it’s ever unset (tests/misconfig), this will panic.

🛠️ Suggested guard
 func (am *DefaultAccountManager) CreatePeerJob(ctx context.Context, accountID, peerID, userID string, job *types.Job) error {
+	if am.jobManager == nil {
+		return status.Errorf(status.Internal, "job manager is not configured")
+	}
 	allowed, err := am.permissionsManager.ValidateUserPermissions(ctx, accountID, userID, modules.RemoteJobs, operations.Create)
 	if err != nil {
 		return status.NewPermissionValidationError(err)
 	}

369-399: Avoid “send before persist” to prevent orphaned jobs.

If the DB transaction fails after SendJob, the peer may execute a job that doesn’t exist in storage. Prefer persisting first (status=pending), then sending, and marking the job failed if the send fails.


449-454: Verify the job belongs to the requested peer.

GetPeerJobByID doesn’t check that job.PeerID == peerID, so a caller can fetch another peer’s job within the same account.

🛠️ Suggested check
 	job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID)
 	if err != nil {
 		return nil, err
 	}
+	if job.PeerID != peerID {
+		return nil, status.NewPeerNotPartOfAccountError()
+	}
 
 	return job, nil

Comment thread management/server/peer.go
Comment on lines +356 to +362
	// check if already has pending jobs
	// todo: The job checks here are not protected. The user can run this function from multiple threads,
	// and each thread can think there is no job yet. This means entries in the pending job map will be overwritten,
	// and only one will be kept, but potentially another one will overwrite it in the queue.
	if am.jobManager.IsPeerHasPendingJobs(peerID) {
		return status.Errorf(status.BadRequest, "peer already has pending job")
	}

⚠️ Potential issue | 🟠 Major

Make the pending-job check atomic (current check is racy).

The TODO already notes the race: concurrent calls can both pass and then overwrite pending jobs. This can silently drop or misroute work. Consider a single JobManager method that atomically “check‑and‑enqueue” or returns a conflict.

🤖 Prompt for AI Agents
In `@management/server/peer.go` around lines 356-362: the current
IsPeerHasPendingJobs check is racy; change the logic in JobManager to provide a
single atomic operation (e.g., AddPendingJobIfNone or TryEnqueuePendingJob) that
both checks for an existing pending job for peerID and enqueues/records the new
pending job under the same lock, returning a conflict/error if a job already
exists; update the call site in peer.go to call this new JobManager method
instead of IsPeerHasPendingJobs so the check-and-insert is protected by the
JobManager's mutex and cannot be interleaved by concurrent threads.

@pappz pappz merged commit 58daa67 into main Jan 19, 2026
40 of 44 checks passed
@pappz pappz deleted the feature/remote-debug-release branch January 19, 2026 10:22
5 participants