[management] Fix L4 service creation deadlock on single-connection databases#5779

Merged
lixmal merged 2 commits into main from fix-l4-service-creation
Apr 2, 2026

Conversation

@lixmal lixmal commented Apr 2, 2026

Describe your changes

Creating an L4 (TCP/UDP/TLS) reverse proxy service hangs indefinitely on deployments using SQLite.

ensureL4Port calls ClusterSupportsCustomPorts, which queries the proxies table through the main DB handle. The call runs inside a transaction that already holds the single SQLite connection, so the query waits for a connection that the surrounding transaction never releases (a self-deadlock). The request blocks until the 5-minute transaction timeout expires, then returns HTTP 500.

  • Move the cluster capability query before the transaction to avoid the deadlock
  • Force L4 targets to enabled=true during validation, since per-target disable is meaningless for single-target L4 services and the Go zero value (false) causes empty proto mappings

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • Bug Fixes
    • Ensure Layer 4 targets are consistently treated as enabled so routing and path mappings behave predictably.
    • Fix service create/update flow to validate and assign listening ports correctly, preventing misassigned ports after domain or cluster changes and reducing transactional issues.

coderabbitai Bot commented Apr 2, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e33958ba-f4d9-4c89-af57-8eb94b890a5d

📥 Commits

Reviewing files that changed from the base of the PR and between 0ea39bb and fd0e1c2.

📒 Files selected for processing (1)
  • management/internals/modules/reverseproxy/service/manager/manager.go

📝 Walkthrough

Walkthrough

Refactors L4 port handling to precompute cluster custom-port capability outside DB transactions and forces L4 targets to be treated as enabled during validation; passes the precomputed capability into L4 port validation/assignment across service creation and update flows.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Manager transaction refactoring (management/internals/modules/reverseproxy/service/manager/manager.go) | Added clusterCustomPorts(ctx, svc) to compute customPorts before transactions. Updated persistNewService, persistNewEphemeralService, persistServiceUpdate, and executeServiceUpdate to accept and pass customPorts into ensureL4Port, removing internal capability lookups from transactions. |
| L4 target validation (management/internals/modules/reverseproxy/service/service.go) | validateL4Target now unconditionally sets target.Enabled = true for L4 targets prior to validation, ensuring targets are always considered enabled for downstream path mapping. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Manager
    participant ClusterCap as ClusterCapabilityService
    participant DB as Database

    Client->>Manager: create/update L4 service request
    Manager->>ClusterCap: ClusterSupportsCustomPorts(cluster)  (precompute)
    ClusterCap-->>Manager: customPorts (bool)
    Manager->>DB: ExecuteInTransaction(persist/modify service) with customPorts
    DB-->>Manager: transaction result
    Manager-->>Client: response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • pascal-fischer
  • bcmmbaga

Poem

🐰 I hopped through code, ports in my paw,

Precompute the cluster, skip the transaction’s law.
L4 targets bright, forever enabled they be,
A hop, a stitch, now services agree. 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main fix: resolving a deadlock issue in L4 service creation on SQLite databases by moving cluster capability checks outside transactions. |
| Description check | ✅ Passed | The description covers the key problem, root cause, and solutions. The checklist is properly filled (bug fix marked), and documentation status is explained. However, no issue ticket link and no test references are provided. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@management/internals/modules/reverseproxy/service/manager/manager.go`:
- Around line 530-536: The bug is that customPorts is computed before
executeServiceUpdate resolves the effective cluster, so clusterCustomPorts may
use an empty or stale cluster and cause ensureL4Port to make wrong decisions; in
persistServiceUpdate, delay computing customPorts until after
executeServiceUpdate has determined service.ProxyCluster (i.e., after
executeServiceUpdate returns or by resolving the effective cluster first), then
call clusterCustomPorts with the resolved cluster and pass that into
ensureL4Port/other checks; update references to clusterCustomPorts,
executeServiceUpdate, ensureL4Port, and service.ProxyCluster/serviceUpdateInfo
so the capability check uses the actual target cluster rather than the
precomputed value.
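
The ordering concern in this finding can be sketched as follows. This is an illustration under stated assumptions, not the fix that was eventually committed: `ProxyCluster` and `clusterCustomPorts` are names from the comment, while the `Service` struct shape, the `capabilities` map, the `resolveCluster` rule (keep the stored cluster unless the update names one), and the example cluster names are hypothetical.

```go
package main

import "fmt"

// Service is a hypothetical, trimmed-down service record.
type Service struct {
	Domain       string
	ProxyCluster string
}

// capabilities is a hypothetical lookup table standing in for the
// per-cluster capability query.
var capabilities = map[string]bool{
	"eu.example": true,
	"us.example": false,
}

// clusterCustomPorts reports whether the named cluster allows custom
// ports; an unknown or empty cluster reports false.
func clusterCustomPorts(cluster string) bool { return capabilities[cluster] }

// resolveCluster is an assumed resolution rule: the update wins when it
// names a cluster, otherwise the stored cluster is kept.
func resolveCluster(existing, update Service) string {
	if update.ProxyCluster != "" {
		return update.ProxyCluster
	}
	return existing.ProxyCluster
}

func main() {
	existing := Service{Domain: "db.eu.example", ProxyCluster: "eu.example"}
	update := Service{Domain: "db.eu.example"} // cluster omitted in the request

	// Lookup on the unresolved request: the empty cluster yields false.
	stale := clusterCustomPorts(update.ProxyCluster)

	// Lookup after resolving the effective cluster.
	resolved := clusterCustomPorts(resolveCluster(existing, update))

	fmt.Println("before resolution:", stale, "after resolution:", resolved)
}
```

Whatever form the resolution takes, the capability lookup still has to happen outside the transaction to avoid reintroducing the deadlock.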

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 41790b8c-44e0-448c-9378-ddf16a0cf9d6

📥 Commits

Reviewing files that changed from the base of the PR and between c2c6396 and 0ea39bb.

📒 Files selected for processing (2)
  • management/internals/modules/reverseproxy/service/manager/manager.go
  • management/internals/modules/reverseproxy/service/service.go

Comment thread management/internals/modules/reverseproxy/service/manager/manager.go Outdated
crn4 previously approved these changes Apr 2, 2026

sonarqubecloud Bot commented Apr 2, 2026

@lixmal lixmal merged commit 5bf2372 into main Apr 2, 2026
45 checks passed
@lixmal lixmal deleted the fix-l4-service-creation branch April 2, 2026 12:46