Skip to content

[management] add domain and service cleanup migration#5850

Merged
pascal-fischer merged 2 commits intomainfrom
fix/add-migration-to-clean-domains
Apr 11, 2026
Merged

[management] add domain and service cleanup migration#5850
pascal-fischer merged 2 commits intomainfrom
fix/add-migration-to-clean-domains

Conversation

@pascal-fischer
Copy link
Copy Markdown
Collaborator

@pascal-fischer pascal-fischer commented Apr 10, 2026

Describe your changes

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • Bug Fixes

    • Automatically detect and remove orphaned records during migrations while skipping cleanup when foreign-key constraints are present.
  • Tests

    • Added extensive tests covering missing tables, mixed valid/orphaned data, full deletions, idempotency, and behavior when foreign-key constraints exist.
  • Chores

    • Integrated orphan-cleanup into pre-migration steps so it runs as part of update routines.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 10, 2026

📝 Walkthrough

Walkthrough

Adds a generic migration CleanupOrphanedResources[T, R] and hasForeignKey helper to detect FKs and delete orphaned child rows; integrates two cleanup migrations into the pre-auto-migration pipeline and adds comprehensive tests covering FK presence and deletion scenarios.

Changes

Cohort / File(s) Summary
Orphan cleanup implementation
management/server/migration/migration.go
Added hasForeignKey(db, table, column) bool and generic CleanupOrphanedResources[T, R](ctx, db, fkColumn) error. Validates table/column existence, skips if FK exists, and deletes child rows whose fkColumn has no matching parent id. New logging and error paths added.
Orphan cleanup tests
management/server/migration/migration_test.go
Added tests and helpers: testParent, testChild, testChildWithFK, setupOrphanTestDB. Covers missing tables, empty tables, no-orphans, all-orphans (deletion), mixed sets, idempotency, and engine-gated FK-detection behavior for Postgres/MySQL.
Migration pipeline integration
management/server/store/store.go
Appended two pre-auto-migration steps invoking CleanupOrphanedResources for rpservice.Service and domain.Domain scoped by account_id.

Sequence Diagram(s)

sequenceDiagram
    actor Migrator
    participant Cleanup as CleanupOrphanedResources
    participant Schema as SchemaParser
    participant FK as FKDetector
    participant DB as Database

    Migrator->>Cleanup: Invoke cleanup(child T, parent R, fkColumn)
    Cleanup->>DB: Check child table exists
    alt child table missing
        Cleanup-->>Migrator: return (no-op)
    else
        Cleanup->>DB: Check parent table exists
        alt parent table missing
            Cleanup-->>Migrator: return (no-op)
        else
            Cleanup->>Schema: Parse child & parent models -> table names
            Cleanup->>Schema: Verify child has fkColumn
            alt fkColumn missing
                Cleanup-->>Migrator: return (error)
            else
                Cleanup->>FK: hasForeignKey(db, childTable, fkColumn)?
                alt FK exists
                    Cleanup-->>Migrator: return (no-op)
                else
                    Cleanup->>DB: DELETE FROM child WHERE fkColumn NOT IN (SELECT id FROM parent)
                    Cleanup-->>Migrator: return (success)
                end
            end
        end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐇 I nibble through schemas, tidy and spry,

Orphans removed with a hop and a sigh,
FK checks first, then a careful sweep,
Databases tidy, restful and deep,
Hop—cleanup done, now time for a nap!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is largely a template with minimal content. Required sections like 'Describe your changes' and 'Issue ticket number and link' are empty, and no explanation is provided for why documentation is not needed despite adding new public functions. Fill in 'Describe your changes' with a summary of the new CleanupOrphanedResources migration. Add the linked issue ticket number. Briefly explain why documentation isn't needed or update if it should be required for the new public API.
Docstring Coverage ⚠️ Warning Docstring coverage is 13.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: adding a cleanup migration for domains and services, which aligns with the core functionality introduced in the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/add-migration-to-clean-domains

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@management/server/migration/migration_test.go`:
- Around line 594-620: The test
TestCleanupOrphanedResources_SkipsWhenForeignKeyExists executes PostgreSQL-only
DDL (DROP CONSTRAINT, NOT VALID) but setupDatabase() defaults to SQLite, causing
failures on non-Postgres backends; gate the test to run only for Postgres by
detecting the store engine and skipping otherwise (e.g. check
os.Getenv("NETBIRD_STORE_ENGINE") == "postgres" or inspect db.Dialector().Name()
after calling setupDatabase(t)), and call t.Skipf with a short message before
running any Postgres-specific SQL so the rest of the test (the ALTER/NOT VALID
statements and migration.CleanupOrphanedResources[testChildWithFK, testParent])
only runs on Postgres.

In `@management/server/migration/migration.go`:
- Around line 493-529: Update hasForeignKey to return (bool, error) and
propagate DB errors instead of swallowing them: change the signature of
hasForeignKey(db *gorm.DB, table, column string) to hasForeignKey(db *gorm.DB,
table, column string) (bool, error), capture and check the error returned by
each db.Raw(...).Scan(...) call (for postgres/mysql and for sqlite where you
call db.Raw(fmt.Sprintf("PRAGMA foreign_key_list(%s)", table)).Scan(&fks)) and
return false with that error on failure; for sqlite still inspect fkInfo.From
entries and return (true, nil) when found or (false, nil) when none found; then
update callers (e.g., CleanupOrphanedResources) to handle the error return and
abort/propagate instead of assuming false.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 23c5f1cd-0b45-4544-acae-61ec8347f380

📥 Commits

Reviewing files that changed from the base of the PR and between ebd78e0 and 261228e.

📒 Files selected for processing (3)
  • management/server/migration/migration.go
  • management/server/migration/migration_test.go
  • management/server/store/store.go

Comment thread management/server/migration/migration_test.go
Comment on lines +493 to +529
func hasForeignKey(db *gorm.DB, table, column string) bool {
var count int64

switch db.Name() {
case "postgres":
db.Raw(`
SELECT COUNT(*) FROM information_schema.key_column_usage kcu
JOIN information_schema.table_constraints tc
ON tc.constraint_name = kcu.constraint_name
AND tc.table_schema = kcu.table_schema
WHERE tc.constraint_type = 'FOREIGN KEY'
AND kcu.table_name = ?
AND kcu.column_name = ?
`, table, column).Scan(&count)
case "mysql":
db.Raw(`
SELECT COUNT(*) FROM information_schema.key_column_usage
WHERE table_schema = DATABASE()
AND table_name = ?
AND column_name = ?
AND referenced_table_name IS NOT NULL
`, table, column).Scan(&count)
default: // sqlite
type fkInfo struct {
From string
}
var fks []fkInfo
db.Raw(fmt.Sprintf("PRAGMA foreign_key_list(%s)", table)).Scan(&fks)
for _, fk := range fks {
if fk.From == column {
return true
}
}
return false
}

return count > 0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find the hasForeignKey function and surrounding context
rg -n "func hasForeignKey" management/server/migration/ -A 50

Repository: netbirdio/netbird

Length of output: 3828


🏁 Script executed:

# Also check for CleanupOrphanedResources to see how hasForeignKey is called
rg -n "CleanupOrphanedResources" management/server/migration/ -B 5 -A 20

Repository: netbirdio/netbird

Length of output: 15120


🏁 Script executed:

# Check if hasForeignKey is called elsewhere in the codebase
rg -n "hasForeignKey" --type go

Repository: netbirdio/netbird

Length of output: 397


🏁 Script executed:

# Get the context around the hasForeignKey call at line 567
sed -n '560,580p' management/server/migration/migration.go

Repository: netbirdio/netbird

Length of output: 816


🏁 Script executed:

# Check if there's error checking after hasForeignKey call
sed -n '560,585p' management/server/migration/migration.go

Repository: netbirdio/netbird

Length of output: 990


🏁 Script executed:

# Verify the exact error handling in hasForeignKey - check if Error is ever checked
sed -n '493,530p' management/server/migration/migration.go | grep -i "error\|\.Error"

Repository: netbirdio/netbird

Length of output: 43


Change hasForeignKey to return (bool, error) so metadata lookup failures abort instead of silently allowing data loss.

hasForeignKey() ignores .Scan().Error for every dialect. If metadata introspection fails (connection issue, permission denial, SQL syntax error), the function returns false, and CleanupOrphanedResources() proceeds with the destructive DELETE as if no FK existed. Return (bool, error) and have the caller check and propagate errors.

Proposed fix
-func hasForeignKey(db *gorm.DB, table, column string) bool {
+func hasForeignKey(db *gorm.DB, table, column string) (bool, error) {
 	var count int64

 	switch db.Name() {
 	case "postgres":
-		db.Raw(`
+		if err := db.Raw(`
 			SELECT COUNT(*) FROM information_schema.key_column_usage kcu
 			JOIN information_schema.table_constraints tc
 			  ON tc.constraint_name = kcu.constraint_name
 			  AND tc.table_schema = kcu.table_schema
 			WHERE tc.constraint_type = 'FOREIGN KEY'
 			  AND kcu.table_name = ?
 			  AND kcu.column_name = ?
-		`, table, column).Scan(&count)
+		`, table, column).Scan(&count).Error; err != nil {
+			return false, fmt.Errorf("inspect foreign keys for %s.%s: %w", table, column, err)
+		}
 	case "mysql":
-		db.Raw(`
+		if err := db.Raw(`
 			SELECT COUNT(*) FROM information_schema.key_column_usage
 			WHERE table_schema = DATABASE()
 			  AND table_name = ?
 			  AND column_name = ?
 			  AND referenced_table_name IS NOT NULL
-		`, table, column).Scan(&count)
+		`, table, column).Scan(&count).Error; err != nil {
+			return false, fmt.Errorf("inspect foreign keys for %s.%s: %w", table, column, err)
+		}
 	default: // sqlite
 		type fkInfo struct {
 			From string
 		}
 		var fks []fkInfo
-		db.Raw(fmt.Sprintf("PRAGMA foreign_key_list(%s)", table)).Scan(&fks)
+		if err := db.Raw(fmt.Sprintf("PRAGMA foreign_key_list(%s)", table)).Scan(&fks).Error; err != nil {
+			return false, fmt.Errorf("inspect foreign keys for %s.%s: %w", table, column, err)
+		}
 		for _, fk := range fks {
 			if fk.From == column {
-				return true
+				return true, nil
 			}
 		}
-		return false
+		return false, nil
 	}

-	return count > 0
+	return count > 0, nil
 }
-	if hasForeignKey(db, childTable, fkColumn) {
+	hasFK, err := hasForeignKey(db, childTable, fkColumn)
+	if err != nil {
+		return err
+	}
+	if hasFK {
 		log.WithContext(ctx).Debugf("foreign key constraint for %s already exists on %s, no cleanup needed", fkColumn, childTable)
 		return nil
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/migration/migration.go` around lines 493 - 529, Update
hasForeignKey to return (bool, error) and propagate DB errors instead of
swallowing them: change the signature of hasForeignKey(db *gorm.DB, table,
column string) to hasForeignKey(db *gorm.DB, table, column string) (bool,
error), capture and check the error returned by each db.Raw(...).Scan(...) call
(for postgres/mysql and for sqlite where you call db.Raw(fmt.Sprintf("PRAGMA
foreign_key_list(%s)", table)).Scan(&fks)) and return false with that error on
failure; for sqlite still inspect fkInfo.From entries and return (true, nil)
when found or (false, nil) when none found; then update callers (e.g.,
CleanupOrphanedResources) to handle the error return and abort/propagate instead
of assuming false.

@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@management/server/migration/migration_test.go`:
- Around line 621-628: The FOREIGN_KEY_CHECKS toggles are session-scoped so
calling db.Exec separately can hit different connections; wrap the sequence in a
single pinned connection (use db.Transaction or db.Begin) and run all Exec calls
inside that transaction to ensure they execute on the same connection.
Concretely, replace the separate db.Exec calls with db.Transaction(func(tx
*gorm.DB) error { tx.Exec("SET FOREIGN_KEY_CHECKS = 0"); tx.Exec("ALTER TABLE
test_children DROP FOREIGN KEY fk_test_children_parent"); tx.Exec("DELETE FROM
test_parents WHERE id = ?", "p2"); tx.Exec("ALTER TABLE test_children ADD
CONSTRAINT fk_test_children_parent FOREIGN KEY (parent_id) REFERENCES
test_parents(id)"); tx.Exec("SET FOREIGN_KEY_CHECKS = 1"); return tx.Error })
(or use db.DB().Conn(ctx) and exec on the returned *sql.Conn) so all statements
run on the same connection.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 893ab728-0979-4f87-9230-6843edfd700b

📥 Commits

Reviewing files that changed from the base of the PR and between 261228e and 4e2357b.

📒 Files selected for processing (1)
  • management/server/migration/migration_test.go

Comment on lines +621 to +628
require.NoError(t, db.Exec("SET FOREIGN_KEY_CHECKS = 0").Error)
require.NoError(t, db.Exec("ALTER TABLE test_children DROP FOREIGN KEY fk_test_children_parent").Error)
require.NoError(t, db.Exec("DELETE FROM test_parents WHERE id = ?", "p2").Error)
require.NoError(t, db.Exec(
"ALTER TABLE test_children ADD CONSTRAINT fk_test_children_parent "+
"FOREIGN KEY (parent_id) REFERENCES test_parents(id)",
).Error)
require.NoError(t, db.Exec("SET FOREIGN_KEY_CHECKS = 1").Error)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify current MySQL FK test uses unpinned sequential Exec calls.
rg -n -C3 'SET FOREIGN_KEY_CHECKS|DROP FOREIGN KEY|ADD CONSTRAINT fk_test_children_parent' management/server/migration/migration_test.go
rg -n -C2 'Transaction\\(|\\.DB\\(\\)|Conn\\(' management/server/migration/migration_test.go

Repository: netbirdio/netbird

Length of output: 1260


🌐 Web query:

In MySQL, is FOREIGN_KEY_CHECKS session-scoped, and in Go database/sql can consecutive Exec calls on *sql.DB run on different pooled connections unless pinned via sql.Conn or a transaction?

💡 Result:

Yes, FOREIGN_KEY_CHECKS is session-scoped in MySQL. It can be set per session with SET foreign_key_checks=0 (or 1), affecting only that connection/session. There is also a global scope (SET GLOBAL), but session settings override for the current connection, and new sessions inherit from global. Consecutive Exec calls on *sql.DB can run on different pooled connections, as *sql.DB manages a connection pool and each call acquires a connection from the pool (possibly different ones sequentially). To pin to the same connection, use sql.Conn (via db.Conn(ctx)) or a transaction (tx := db.BeginTx(ctx, nil)), where operations are guaranteed on the same connection.

Citations:


Pin MySQL FK-check statements to one connection to avoid flaky behavior.

FOREIGN_KEY_CHECKS is session-scoped in MySQL. Each db.Exec() call on *sql.DB can acquire a different connection from the pool, making it nondeterministic whether checks remain disabled across all statements on lines 621–628.

Suggested fix (pin to a dedicated SQL connection for the mysql branch)
 case "mysql":
-		require.NoError(t, db.Exec("SET FOREIGN_KEY_CHECKS = 0").Error)
-		require.NoError(t, db.Exec("ALTER TABLE test_children DROP FOREIGN KEY fk_test_children_parent").Error)
-		require.NoError(t, db.Exec("DELETE FROM test_parents WHERE id = ?", "p2").Error)
-		require.NoError(t, db.Exec(
-			"ALTER TABLE test_children ADD CONSTRAINT fk_test_children_parent "+
-				"FOREIGN KEY (parent_id) REFERENCES test_parents(id)",
-		).Error)
-		require.NoError(t, db.Exec("SET FOREIGN_KEY_CHECKS = 1").Error)
+		sqlDB, err := db.DB()
+		require.NoError(t, err)
+		conn, err := sqlDB.Conn(context.Background())
+		require.NoError(t, err)
+		t.Cleanup(func() {
+			_, _ = conn.ExecContext(context.Background(), "SET FOREIGN_KEY_CHECKS = 1")
+			_ = conn.Close()
+		})
+
+		_, err = conn.ExecContext(context.Background(), "SET FOREIGN_KEY_CHECKS = 0")
+		require.NoError(t, err)
+		_, err = conn.ExecContext(context.Background(), "ALTER TABLE test_children DROP FOREIGN KEY fk_test_children_parent")
+		require.NoError(t, err)
+		_, err = conn.ExecContext(context.Background(), "DELETE FROM test_parents WHERE id = ?", "p2")
+		require.NoError(t, err)
+		_, err = conn.ExecContext(
+			context.Background(),
+			"ALTER TABLE test_children ADD CONSTRAINT fk_test_children_parent "+
+				"FOREIGN KEY (parent_id) REFERENCES test_parents(id)",
+		)
+		require.NoError(t, err)
+		_, err = conn.ExecContext(context.Background(), "SET FOREIGN_KEY_CHECKS = 1")
+		require.NoError(t, err)
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
require.NoError(t, db.Exec("SET FOREIGN_KEY_CHECKS = 0").Error)
require.NoError(t, db.Exec("ALTER TABLE test_children DROP FOREIGN KEY fk_test_children_parent").Error)
require.NoError(t, db.Exec("DELETE FROM test_parents WHERE id = ?", "p2").Error)
require.NoError(t, db.Exec(
"ALTER TABLE test_children ADD CONSTRAINT fk_test_children_parent "+
"FOREIGN KEY (parent_id) REFERENCES test_parents(id)",
).Error)
require.NoError(t, db.Exec("SET FOREIGN_KEY_CHECKS = 1").Error)
case "mysql":
sqlDB, err := db.DB()
require.NoError(t, err)
conn, err := sqlDB.Conn(context.Background())
require.NoError(t, err)
t.Cleanup(func() {
_, _ = conn.ExecContext(context.Background(), "SET FOREIGN_KEY_CHECKS = 1")
_ = conn.Close()
})
_, err = conn.ExecContext(context.Background(), "SET FOREIGN_KEY_CHECKS = 0")
require.NoError(t, err)
_, err = conn.ExecContext(context.Background(), "ALTER TABLE test_children DROP FOREIGN KEY fk_test_children_parent")
require.NoError(t, err)
_, err = conn.ExecContext(context.Background(), "DELETE FROM test_parents WHERE id = ?", "p2")
require.NoError(t, err)
_, err = conn.ExecContext(
context.Background(),
"ALTER TABLE test_children ADD CONSTRAINT fk_test_children_parent "+
"FOREIGN KEY (parent_id) REFERENCES test_parents(id)",
)
require.NoError(t, err)
_, err = conn.ExecContext(context.Background(), "SET FOREIGN_KEY_CHECKS = 1")
require.NoError(t, err)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/migration/migration_test.go` around lines 621 - 628, The
FOREIGN_KEY_CHECKS toggles are session-scoped so calling db.Exec separately can
hit different connections; wrap the sequence in a single pinned connection (use
db.Transaction or db.Begin) and run all Exec calls inside that transaction to
ensure they execute on the same connection. Concretely, replace the separate
db.Exec calls with db.Transaction(func(tx *gorm.DB) error { tx.Exec("SET
FOREIGN_KEY_CHECKS = 0"); tx.Exec("ALTER TABLE test_children DROP FOREIGN KEY
fk_test_children_parent"); tx.Exec("DELETE FROM test_parents WHERE id = ?",
"p2"); tx.Exec("ALTER TABLE test_children ADD CONSTRAINT fk_test_children_parent
FOREIGN KEY (parent_id) REFERENCES test_parents(id)"); tx.Exec("SET
FOREIGN_KEY_CHECKS = 1"); return tx.Error }) (or use db.DB().Conn(ctx) and exec
on the returned *sql.Conn) so all statements run on the same connection.

@pascal-fischer pascal-fischer merged commit 5259e5d into main Apr 11, 2026
49 of 52 checks passed
@pascal-fischer pascal-fischer deleted the fix/add-migration-to-clean-domains branch April 11, 2026 10:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants