Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make unlocking more reliable #59

Merged
merged 1 commit into from
Sep 26, 2024
Merged

Make unlocking more reliable #59

merged 1 commit into from
Sep 26, 2024

Conversation

jstudler
Copy link
Contributor

In some cases, the Prefix and IpAddress reconcilers are stuck because a resource update prior to unlocking failed (e.g. due to concurrent edit in the kube-api). This could be e.g. the following line https://github.com/netbox-community/netbox-operator/blob/main/internal/controller/prefix_controller.go#L182

The next run of the reconciler will have nil for ll in https://github.com/netbox-community/netbox-operator/blob/main/internal/controller/prefix_controller.go#L203 and thus not try to unlock it. Example log output of the netbox-operator:

2024-09-12T11:48:16+02:00       ERROR   Reconciler error        {"controller": "prefix", "controllerGroup": "netbox.dev", "controllerKind": "Prefix", "Prefix": {"name":"example","namespace":"default"}, "namespace": "default", "name": "example", "reconcileID": "1c20d680-fc77-48fb-a949-730a20ebd83a", "error": "Operation cannot be fulfilled on prefixes.netbox.dev \"example\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
        /Users/joel/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
        /Users/joel/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2
        /Users/joel/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224
2024-09-12T11:48:16+02:00       INFO    prefix reconcile loop started   {"controller": "prefix", "controllerGroup": "netbox.dev", "controllerKind": "Prefix", "Prefix": {"name":"example","namespace":"default"}, "namespace": "default", "name": "example", "reconcileID": "f12a1389-4102-4507-b5b0-21873447dc16"}

After this, the entire reconciler will be blocked because the next object trying to aquire the lock for the same parentPrefix will time out (to the configured 60 seconds) and thus all operations on the same CRs will be blocked for these 60 seconds (each reconcile loop is single threaded).

This code change enhances the unlocking by moving the leaselocker Unlock() call up in the reconciler to the position where it actually makes sense (right after the Prefix or IP was updated in NetBox). We also move up the status update right after unlocking in order not to lose the reference to the object in NetBox.

@jstudler jstudler self-assigned this Sep 12, 2024
@henrybear327 henrybear327 force-pushed the fix/enhance-unlocking branch 2 times, most recently from fbbfbbe to 78049aa Compare September 24, 2024 06:53
@jstudler jstudler requested a review from bruelea September 24, 2024 08:10
Copy link
Collaborator

@bruelea bruelea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
Small note about the update of the status fields: If the update of the status fields fails the reference to the object in NetBox won't be lost, it will be added the next time the reconcile function runs.

@henrybear327 henrybear327 merged commit 110cbb7 into main Sep 26, 2024
5 checks passed
@henrybear327 henrybear327 deleted the fix/enhance-unlocking branch September 26, 2024 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants