[management] fix domain uniqueness#5529
Conversation
📝 WalkthroughWalkthroughThis PR removes the Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
management/internals/modules/reverseproxy/service/manager/manager.go (2)
807-821:⚠️ Potential issue | 🟠 MajorKeep peer-domain operations scoped to the requested account.
GetServiceByDomainis now global, but this path never verifies that the returned row belongs toaccountID. That turns wrong-account lookups intoPermissionDeniedinstead ofNotFound, leaks whether another account owns the domain, and can resolve the wrong row on legacy duplicated data.Suggested guard
func (m *Manager) lookupPeerService(ctx context.Context, accountID, peerID, domain string) (*service.Service, error) { svc, err := m.store.GetServiceByDomain(ctx, domain) if err != nil { return nil, err } + if svc.AccountID != accountID { + return nil, status.Errorf(status.NotFound, "service with domain %s not found", domain) + } if svc.Source != service.SourceEphemeral { return nil, status.Errorf(status.PermissionDenied, "cannot operate on API-created service via peer expose") }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/internals/modules/reverseproxy/service/manager/manager.go` around lines 807 - 821, The lookupPeerService flow uses m.store.GetServiceByDomain which is global and doesn't check the account, so add an account-scoping guard after the call: in Manager.lookupPeerService, after svc, err := m.store.GetServiceByDomain(ctx, domain) verify that svc.AccountID == accountID (or call a domain+account store method if available), and if it does not match return a NotFound error (status.Errorf(status.NotFound, "service not found")) instead of PermissionDenied; keep the existing Source and SourcePeer checks intact after this guard so incorrect-account lookups do not leak ownership or resolve the wrong row.
264-275:⚠️ Potential issue | 🟠 MajorDon’t treat this preflight lookup as the only uniqueness guard.
Once
services.domainis enforced by a DB unique constraint, two concurrent create/update transactions can both pass this read. The loser then falls throughCreateService/UpdateServiceasstatus.Internal, so a normal domain collision intermittently becomes a 500 instead ofAlreadyExists.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/internals/modules/reverseproxy/service/manager/manager.go` around lines 264 - 275, The preflight check in checkDomainAvailable is not enough to guarantee uniqueness under concurrent requests; update the post-write error handling in the code paths that call CreateService and UpdateService (and any lower-level store methods that persist services) to detect the database unique-constraint violation for services.domain and translate it to status.AlreadyExists (instead of letting it surface as status.Internal). Specifically, add logic after calls to CreateService/UpdateService (or inside those functions in the store implementation) to inspect the returned error (SQL state / driver-specific unique-constraint error or a wrapped sentinel) and return status.Errorf(status.AlreadyExists, "service with domain %s already exists", domain) when it indicates a duplicate key; keep checkDomainAvailable for fast-fail but make the DB-level uniqueness translation the authoritative guard.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@management/internals/modules/reverseproxy/service/service.go`:
- Line 137: The Domain field is stored verbatim which allows case variants to
bypass UNIQUE across DBs; add normalization so Domain is lowercased before
validation/persistence. Implement a GORM model hook (e.g., a BeforeSave(db
*gorm.DB) error method on the struct that contains the Domain field) that sets
s.Domain = strings.ToLower(strings.TrimSpace(s.Domain)), and also update the
code paths that assign Domain (the functions that currently set Domain before
validation/persistence) to lowercase the value before any validation or
Create/Save calls to ensure consistent behavior across engines.
- Line 137: Auto-migrate will enforce the new unique index on Service.Domain
(the Domain field on rpservice.Service) immediately, but there is no
pre-migration to handle existing duplicate domains; add a migration in
getMigrationsPreAuto that runs before sql_store.go calls AutoMigrate to either
(a) detect and fail with a clear error listing duplicate Domain values, or (b)
deduplicate entries (e.g., keep one row per Domain and merge or remove others)
and log what was changed. Implement the migration to query the service table for
duplicate Domain groups, perform the chosen resolution, and return an error if
unresolved duplicates remain so AutoMigrate won’t fail; reference the Domain
field on rpservice.Service, getMigrationsPreAuto, and the AutoMigrate call in
management/server/store/sql_store.go when locating where to add this migration.
In `@management/server/store/sql_store.go`:
- Around line 4980-4982: Add a pre-auto-migration dedupe step that removes
duplicate rpservice.Service.Domain values before AutoMigrate runs (same pattern
as the existing RemoveDuplicatePeerKeys in migratePreAuto); implement a function
(e.g., RemoveDuplicateServiceDomains) that queries services grouped by domain,
keeps one canonical record per domain (by lowest ID or created_at), deletes the
other rows (using s.db.Where("domain = ? AND id != ?", domain,
keepID).Delete(&rpservice.Service{}) in a transaction), and invoke it from the
migratePreAuto flow before AutoMigrate(&rpservice.Service{}); update logging to
report how many duplicates were removed and ensure this protects
SqlStore.GetServiceByDomain and the new gorm:"uniqueIndex" on
rpservice.Service.Domain.
---
Outside diff comments:
In `@management/internals/modules/reverseproxy/service/manager/manager.go`:
- Around line 807-821: The lookupPeerService flow uses
m.store.GetServiceByDomain which is global and doesn't check the account, so add
an account-scoping guard after the call: in Manager.lookupPeerService, after
svc, err := m.store.GetServiceByDomain(ctx, domain) verify that svc.AccountID ==
accountID (or call a domain+account store method if available), and if it does
not match return a NotFound error (status.Errorf(status.NotFound, "service not
found")) instead of PermissionDenied; keep the existing Source and SourcePeer
checks intact after this guard so incorrect-account lookups do not leak
ownership or resolve the wrong row.
- Around line 264-275: The preflight check in checkDomainAvailable is not enough
to guarantee uniqueness under concurrent requests; update the post-write error
handling in the code paths that call CreateService and UpdateService (and any
lower-level store methods that persist services) to detect the database
unique-constraint violation for services.domain and translate it to
status.AlreadyExists (instead of letting it surface as status.Internal).
Specifically, add logic after calls to CreateService/UpdateService (or inside
those functions in the store implementation) to inspect the returned error (SQL
state / driver-specific unique-constraint error or a wrapped sentinel) and
return status.Errorf(status.AlreadyExists, "service with domain %s already
exists", domain) when it indicates a duplicate key; keep checkDomainAvailable
for fast-fail but make the DB-level uniqueness translation the authoritative
guard.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a1277f2d-da71-4d7a-916c-8cf35b3e048c
📒 Files selected for processing (7)
management/internals/modules/reverseproxy/service/manager/expose_tracker_test.gomanagement/internals/modules/reverseproxy/service/manager/manager.gomanagement/internals/modules/reverseproxy/service/manager/manager_test.gomanagement/internals/modules/reverseproxy/service/service.gomanagement/server/store/sql_store.gomanagement/server/store/store.gomanagement/server/store/store_mock.go
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@management/internals/modules/reverseproxy/service/service.go`:
- Line 137: The Domain struct tag now limits Domain to 255 chars but Validate()
only checks non-empty, so update the Validate() method (on the service
struct/type that defines Domain and includes a Validate() func) to also enforce
max length 255 for Domain (e.g., check len(s.Domain) > 255 and return a
validation error). Use the same 255 value used in the gorm tag (or extract to a
constant like domainMaxLen used by the Domain field and Validate()) so database
and validation rules stay in sync.
- Line 137: Add a post-AutoMigrate migration in getMigrationsPostAuto() to drop
the legacy non-unique index on the Domain column and create a new unique index
with a stable name: locate the ReverseProxy service model field "Domain" and in
getMigrationsPostAuto() (management/server/store/store.go) run raw SQL (or use
GORM's Migrator) to DROP INDEX if exists for the old non-unique index name and
then CREATE UNIQUE INDEX <stable_name> ON <table>(Domain); ensure the migration
is idempotent (check existence before drop/create) so upgrades from older
installs remove the old index and enforce uniqueness.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e6f78c0a-8506-41bf-b3a8-ee08ae8ec655
📒 Files selected for processing (1)
management/internals/modules/reverseproxy/service/service.go
| AccountID string `gorm:"index"` | ||
| Name string | ||
| Domain string `gorm:"index"` | ||
| Domain string `gorm:"type:varchar(255);uniqueIndex"` |
There was a problem hiding this comment.
Mirror the new domain length limit in Validate().
Line 137 now caps the column at 255 characters, but Validate() still only checks that Domain is non-empty. Overlong values will fail later with a DB error instead of a validation error.
💡 Proposed fix
func (s *Service) Validate() error {
if s.Name == "" {
return errors.New("service name is required")
}
if len(s.Name) > 255 {
return errors.New("service name exceeds maximum length of 255 characters")
}
if s.Domain == "" {
return errors.New("service domain is required")
}
+ if len(s.Domain) > 255 {
+ return errors.New("service domain exceeds maximum length of 255 characters")
+ }
if len(s.Targets) == 0 {
return errors.New("at least one target is required")
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Domain string `gorm:"type:varchar(255);uniqueIndex"` | |
| func (s *Service) Validate() error { | |
| if s.Name == "" { | |
| return errors.New("service name is required") | |
| } | |
| if len(s.Name) > 255 { | |
| return errors.New("service name exceeds maximum length of 255 characters") | |
| } | |
| if s.Domain == "" { | |
| return errors.New("service domain is required") | |
| } | |
| if len(s.Domain) > 255 { | |
| return errors.New("service domain exceeds maximum length of 255 characters") | |
| } | |
| if len(s.Targets) == 0 { | |
| return errors.New("at least one target is required") | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@management/internals/modules/reverseproxy/service/service.go` at line 137,
The Domain struct tag now limits Domain to 255 chars but Validate() only checks
non-empty, so update the Validate() method (on the service struct/type that
defines Domain and includes a Validate() func) to also enforce max length 255
for Domain (e.g., check len(s.Domain) > 255 and return a validation error). Use
the same 255 value used in the gorm tag (or extract to a constant like
domainMaxLen used by the Domain field and Validate()) so database and validation
rules stay in sync.
🧩 Analysis chain
🏁 Script executed:
# First, let's locate and examine the files mentioned in the review
find . -type f -name "service.go" | grep -i reverseproxyRepository: netbirdio/netbird
Length of output: 123
🏁 Script executed:
# Look for sql_store.go
find . -type f -name "sql_store.go"Repository: netbirdio/netbird
Length of output: 147
🏁 Script executed:
# Search for rpservice.Service or similar to understand the structure
rg -t go "type Service struct" --max-count 5Repository: netbirdio/netbird
Length of output: 260
🌐 Web query:
GORM v2 AutoMigrate index to uniqueIndex upgrade behavior
💡 Result:
In GORM v2, AutoMigrate is not an “alter everything to match tags” schema diff. It mainly creates what’s missing (tables/columns/indexes/constraints) and won’t reliably “upgrade” an existing non-unique index into a unique one for you. (gorm.io)
What happens if you change gorm:"index" → gorm:"uniqueIndex"
- If a non-unique index already exists,
AutoMigratewill typically try to create a new unique index (often with a different name), and leave the old non-unique index in place (because it doesn’t treat that as a “missing index” vs “needs alteration” case). (gorm.io) - In some cases, name-resolution/matching issues can cause repeated creation of additional unique indexes on subsequent runs (e.g.,
name, thenname_2,name_3, …). (github.com) - Depending on dialect/driver versions and the generated names, you may also see attempts to drop or reference a constraint/index name that doesn’t exist, due to mismatches between what GORM expects and what’s actually in the DB. (stackoverflow.com)
Practical takeaway / recommended migration approach
If you truly need to “upgrade” to unique, do it as an explicit migration:
- Drop the old non-unique index (by its real DB name).
- Create the new unique index (preferably with an explicit, stable name via
uniqueIndex:your_name). - Then
AutoMigratecan keep doing additive changes safely. (gorm.io)
Citations:
- 1: https://gorm.io/id_ID/docs/migration.html?utm_source=openai
- 2: https://gorm.io/id_ID/docs/migration.html?utm_source=openai
- 3: Unnecessarily unique indexes created when running
AutoMigratego-gorm/gorm#6224 - 4: https://stackoverflow.com/questions/78764311/why-gorm-automigrate-will-try-to-drop-a-non-exist-index?utm_source=openai
- 5: https://gorm.io/id_ID/docs/migration.html?utm_source=openai
🏁 Script executed:
# Read the Service struct definition in reverseproxy service
cat -n ./management/internals/modules/reverseproxy/service/service.go | head -150Repository: netbirdio/netbird
Length of output: 5737
🏁 Script executed:
# Check the sql_store.go file at lines 129-137
cat -n ./management/server/store/sql_store.go | sed -n '120,150p'Repository: netbirdio/netbird
Length of output: 1705
🏁 Script executed:
# Search for any explicit migrations related to Service or rpservice
rg -t go "type.*Migration|func.*[Mm]igrate.*Service|rpservice" --max-count 10Repository: netbirdio/netbird
Length of output: 10390
🏁 Script executed:
# Check git history for changes to the Domain field index tag
cd ./management/internals/modules/reverseproxy/service && git log -p --follow -S "index\|uniqueIndex" -- service.go | head -100Repository: netbirdio/netbird
Length of output: 43
🏁 Script executed:
# Search for migratePostAuto and migratePreAuto functions
rg -t go "func.*migratePostAuto|func.*migratePreAuto" -A 20Repository: netbirdio/netbird
Length of output: 2638
🏁 Script executed:
# Look for migration files that might handle Service table or Domain index
find . -type f -name "*migrat*" | grep -v node_modules | grep -v ".git"Repository: netbirdio/netbird
Length of output: 373
🏁 Script executed:
# Search for any explicit migration code handling indexes on Domain field
rg -t go "Domain.*uniqueIndex|Domain.*index|ALTER.*Domain|DROP.*index" -B 2 -A 2Repository: netbirdio/netbird
Length of output: 3072
🏁 Script executed:
# Check git log with broader context to see if index tag was changed
git log --oneline --follow -- ./management/internals/modules/reverseproxy/service/service.go | head -20Repository: netbirdio/netbird
Length of output: 91
🏁 Script executed:
# Get the full getMigrationsPostAuto function to see all migrations
rg -t go "func getMigrationsPostAuto" -A 100 management/server/store/store.goRepository: netbirdio/netbird
Length of output: 2999
🏁 Script executed:
# Check if there's a migration for Service Domain index
rg -t go "Service.*Domain|rpservice.*migration" management/server/migration/migration.goRepository: netbirdio/netbird
Length of output: 43
🏁 Script executed:
# Check the git commit that updated the type
git show ccbd554b --statRepository: netbirdio/netbird
Length of output: 50374
🏁 Script executed:
# Search for any recent Service-related migrations
rg -t go "Service\|rpservice" management/server/migration/migration.goRepository: netbirdio/netbird
Length of output: 43
🏁 Script executed:
# Search for any migrations explicitly handling Service or Domain
rg -t go "Service|Domain|rpservice" management/server/migration/migration.go -B 2 -A 2Repository: netbirdio/netbird
Length of output: 43
🏁 Script executed:
# Look at the full migration.go file to see all available migration helpers
wc -l management/server/migration/migration.goRepository: netbirdio/netbird
Length of output: 105
🏁 Script executed:
# Check migration package for index-related functions
rg -t go "CreateIndexIfNotExists|DropIndex" management/server/migration/migration.goRepository: netbirdio/netbird
Length of output: 394
🏁 Script executed:
# See if there are any Service-related migrations in post-auto
rg -t go "Service\|Domain" management/server/store/store.go | grep -i migratRepository: netbirdio/netbird
Length of output: 43
🏁 Script executed:
# Check git diff for recent changes to the Domain field
git diff HEAD~10..HEAD -- management/internals/modules/reverseproxy/service/service.go 2>/dev/null || echo "Not enough history or file unchanged recently"Repository: netbirdio/netbird
Length of output: 106
🏁 Script executed:
# Read the migration.go file to understand available helpers
cat -n management/server/migration/migration.go | head -100Repository: netbirdio/netbird
Length of output: 3597
🏁 Script executed:
# Look for the full CreateIndexIfNotExists function implementation
rg -t go "func CreateIndexIfNotExists" -A 30 management/server/migration/migration.goRepository: netbirdio/netbird
Length of output: 1028
🏁 Script executed:
# Check if there's any explicit migration for Service Domain
rg -t go "Domain\|rpservice\.Service" management/server/migration/migration.goRepository: netbirdio/netbird
Length of output: 43
🏁 Script executed:
# Search for any existing migrations that might handle Service or reverse proxy
find . -type f -name "*.go" -path "*/migration*" -exec grep -l "rpservice\|Service.*Domain" {} \;Repository: netbirdio/netbird
Length of output: 43
Add an explicit migration to handle the Domain field index change.
GORM v2's AutoMigrate does not reliably convert existing non-unique indexes to unique ones. When the Domain field tag changes from gorm:"index" to gorm:"uniqueIndex", AutoMigrate may create a new unique index while leaving the old non-unique one in place, resulting in duplicate indexes without enforcing the uniqueness guarantee on existing data.
Add an explicit migration in getMigrationsPostAuto() (management/server/store/store.go) to:
- Drop the old non-unique index on the Domain column
- Create the new unique index with an explicit, stable name
This ensures older installations properly enforce the uniqueness constraint on upgrade.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@management/internals/modules/reverseproxy/service/service.go` at line 137,
Add a post-AutoMigrate migration in getMigrationsPostAuto() to drop the legacy
non-unique index on the Domain column and create a new unique index with a
stable name: locate the ReverseProxy service model field "Domain" and in
getMigrationsPostAuto() (management/server/store/store.go) run raw SQL (or use
GORM's Migrator) to DROP INDEX if exists for the old non-unique index name and
then CREATE UNIQUE INDEX <stable_name> ON <table>(Domain); ensure the migration
is idempotent (check existence before drop/create) so upgrades from older
installs remove the old index and enforce uniqueness.
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
management/internals/modules/reverseproxy/service/manager/manager.go (1)
806-822: Add explicitaccountIDvalidation for defense-in-depth.With
GetServiceByDomainnow performing a global lookup, theaccountIDparameter is no longer validated within this function. While downstream calls todeletePeerServicewould eventually fail due to account-scopedGetServiceByID, explicitly checkingsvc.AccountID == accountIDhere would:
- Fail fast with a clear, consistent error message
- Avoid relying on downstream behavior for security invariants
- Prevent subtle information leakage via differing error responses
🛡️ Proposed fix to add explicit account validation
func (m *Manager) lookupPeerService(ctx context.Context, accountID, peerID, domain string) (*service.Service, error) { svc, err := m.store.GetServiceByDomain(ctx, domain) if err != nil { return nil, err } + if svc.AccountID != accountID { + return nil, status.Errorf(status.NotFound, "service not found") + } + if svc.Source != service.SourceEphemeral { return nil, status.Errorf(status.PermissionDenied, "cannot operate on API-created service via peer expose") } if svc.SourcePeer != peerID { return nil, status.Errorf(status.PermissionDenied, "cannot operate on service exposed by another peer") } return svc, nil }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/internals/modules/reverseproxy/service/manager/manager.go` around lines 806 - 822, lookupPeerService currently does a global GetServiceByDomain lookup but doesn't validate the passed accountID; add an explicit check in lookupPeerService that svc.AccountID == accountID and return a PermissionDenied (e.g., status.Errorf(status.PermissionDenied, "cannot operate on service owned by another account")) when it mismatches so callers like deletePeerService don't have to rely on downstream checks; locate the check immediately after retrieving svc from m.store.GetServiceByDomain in the lookupPeerService method and keep the error type/message consistent with existing PermissionDenied responses.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@management/internals/modules/reverseproxy/service/manager/manager.go`:
- Around line 806-822: lookupPeerService currently does a global
GetServiceByDomain lookup but doesn't validate the passed accountID; add an
explicit check in lookupPeerService that svc.AccountID == accountID and return a
PermissionDenied (e.g., status.Errorf(status.PermissionDenied, "cannot operate
on service owned by another account")) when it mismatches so callers like
deletePeerService don't have to rely on downstream checks; locate the check
immediately after retrieving svc from m.store.GetServiceByDomain in the
lookupPeerService method and keep the error type/message consistent with
existing PermissionDenied responses.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 699b2399-39ae-49db-a6db-9b845b605a96
📒 Files selected for processing (1)
management/internals/modules/reverseproxy/service/manager/manager.go



Describe your changes
Issue ticket number and link
Stack
Checklist
Documentation
Select exactly one:
Docs PR URL (required if "docs added" is checked)
Paste the PR link from https://github.com/netbirdio/docs here:
https://github.com/netbirdio/docs/pull/__
Summary by CodeRabbit