Skip to content

Node IP handler: Fix (include) keepalived VIPs#4168

Merged
tssurya merged 10 commits into
ovn-kubernetes:masterfrom
martinkennelly:fix-keepalived-vips
Feb 28, 2024
Merged

Node IP handler: Fix (include) keepalived VIPs#4168
tssurya merged 10 commits into
ovn-kubernetes:masterfrom
martinkennelly:fix-keepalived-vips

Conversation

@martinkennelly
Copy link
Copy Markdown
Contributor

#4040 introduced a constraint to filter VIPs. Remove that.
Also remove the external call to get these addresses.
Previously, we needed to use an external pkg func call because we wanted to filter EIPs but that is no longer needed so get the IP addrs locally and filter the addresses locally, and dont do it in two places which is confusing.

@coveralls
Copy link
Copy Markdown

Coverage Status

coverage: 52.017% (+0.03%) from 51.984%
when pulling 327f353 on martinkennelly:fix-keepalived-vips
into 23c8e0c on ovn-org:master.

@tssurya tssurya self-assigned this Feb 20, 2024
Copy link
Copy Markdown
Contributor

@tssurya tssurya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the commit splits! made review easier.
reviewed all commits except 6th, will come back to that with fresh mind tomorrow

Comment thread go-controller/pkg/node/node_ip_handler_linux.go
Comment thread go-controller/pkg/node/node_ip_handler_linux.go
// address but only later update kubelet configuration and restart kubelet (which in turn will update the reported
// IP address inside the node's status field).
nodeInformer := c.watchFactory.NodeInformer()
_, err := nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: c.watchFactory.NodeInformer().AddEventHandler(cache.ResourceEventHandlerFuncs{... ? I find it weird we are initializing a new variable with the informer cache? Is it intentional because we don't want to change the watchFactory node informer?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it weird we are initializing a new variable with the informer cache?

Not sure I am following you. var nodeInformer is just retrieving the existing node informer that was init by factory.

func (wf *WatchFactory) NodeInformer() cache.SharedIndexInformer {
	return wf.informers[NodeType].inf
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah no nothing major my comment was simply: c.watchFactory.NodeInformer().AddEventHandler( why don't we just do this in one go like that without assigning it to a variable first and if we had a reason to have a variable in the first place.. nw, its a nit..

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya - its a bit needless to do that. I just copy, pasted.

}()
}

// runInternal can be used by testcases to provide a fake subscription function
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its not just used by testcases right? we probably wanna update the func desc to be more informative

Comment thread go-controller/pkg/node/node_ip_handler_linux.go
})

Context("valid addresses", func() {
It("allows keepalived VIP", func() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make sure we add a unit test for:

link local addresses, OVN reserved IPs, and addresses marked as secondary or deprecated.

and double check to see these are not filtered at this stage?
if I am asking for too much and perhaps most of these are in the same line as what you have added, push back please but since you already have all the plumbing in place I was hoping this will be easier

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link local addresses would be filtered - but uniq link local addresses are not and I added a test for that.

Copy link
Copy Markdown
Contributor

@tssurya tssurya Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack thanks! and perhaps also for masquerade IPs (with combo of V6 addresses as well for robustness)

Comment thread go-controller/pkg/node/node_ip_handler_linux_test.go
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack based on your commit message, if you feel we need to have a cleanup done or consolidate things please open an upstream issue (perhaps we can try to mark that as "good-first-issue" or something)

fakeBridgeConfiguration := &bridgeConfiguration{bridgeName: "breth0"}

k := &kube.Kube{KClient: tc.fakeClient}
if createNs {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I would just make this another util:
configureKubeOVNContextWithNs which calls configureKubeOVNContext and then adds the extra plumbing you need like replace the ipManager or whatever..
then use configureKubeOVNContext from the old test and configureKubeOVNContextWithNs from your new test in next commit

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, thats cleaner. Thanks!

Expect(setupPrimaryInfFn()).ShouldNot(HaveOccurred())
}
}
//FIXME (mk): newAddressManagerInternal calls a sync - when this is removed, remove the need to call that func within NS
Copy link
Copy Markdown
Contributor

@tssurya tssurya Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment very well :( I'll come back to this tomorrow. (too much "this" and "that" for my empty stomach)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explained on slack but for anyone else - a call to newAddressManagerInternal performs the following commands:

// reproducibility of unit tests.
func newAddressManagerInternal(nodeName string, k kube.Interface, config *managementPortConfig, watchFactory factory.NodeWatchFactory, gwBridge *bridgeConfiguration, useNetlink bool) *addressManager {
	mgr := &addressManager{
		nodeName:       nodeName,
		watchFactory:   watchFactory,
		cidrs:          sets.New[string](),
		mgmtPortConfig: config,
		gatewayBridge:  gwBridge,
		OnChanged:      func() {},
		useNetlink:     useNetlink,
		syncPeriod:     30 * time.Second,
	}
	mgr.nodeAnnotator = kube.NewNodeAnnotator(k, nodeName)
	mgr.sync()  <------Notice
	return mgr
}

Notice the sync() call. Therefore, to ensure sync() exec in the correct netns, we must call newAddressManagerInternal in the correct netns.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks Mairt! makes sense,

Copy link
Copy Markdown
Contributor

@tssurya tssurya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reviewed all commits, logic lgtm
so once you push with:
#4168 (comment)
#4168 (comment)
#4168 (comment)
rebased on master
we can merge this

thank you Martin! ++ cookies for the tests

addrSubscribeOptions := netlink.AddrSubscribeOptions{
ErrorCallback: func(err error) {
klog.Errorf("Failed during AddrSubscribe callback: %v", err)
// Note: Not calling sync() from here: it is redudant and unsafe when stopChan is closed.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Note: Not calling sync() from here: it is redudant and unsafe when stopChan is closed.
// Note: Not calling sync() from here: it is redundant and unsafe when stopChan is closed.

doneWg.Add(1)
go func() {
c.runInternal(stopChan, subscribe)
c.runInternal(stopChan, c.getNetlinkAddrSubFunc(stopChan))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self-note: the same channel is passed into runInternal and into addrSubscribeAt
the former simply returns if stop is called and latter has a goroutine which listens to that channel and then calls socket close..

Previous changes in PR ovn-kubernetes#4040 added a new constraint
to not include keepalived vips in an external func
even though we filtered node IPs within node IP handler.

Remove the external func which filters addresses
and keep the filtering logic within node ip handler
code.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
…ternal

Do this in-order to allow follow on work to allow func
'runInternal' to be exec directly with 'go runInternal()'

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Because it is harder to test this func when we spawn the go routine
within our main logic func for node IP handler.

No logic change within runInternal.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
No func change.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
..to allow new tests to consume

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
For testing, we do not want to call run directly because
it will spawn a goroutine which can escape any network
namespace in which it is called.
Seperate it into its own func so it can be called during
testing and pinned to a net namespace.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
This test case was difficult to add to existing test cases
because they are focused on netlink addr update events and ignored
the sync func.

This commit creates a new test cases that test events and sync func.
Decided to use a new netns instead of mocking as it seems simpler and
easier to extend however there maybe overlap with existing test cases.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
…N reserved IPs

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Otherwise, we get exception printed out in logs.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
tssurya
tssurya previously approved these changes Feb 28, 2024
Copy link
Copy Markdown
Contributor

@tssurya tssurya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the tests soo nice! thanks Martin
change lgtm (checked only the new changes/bits in 3rd iteration).
will merge once CI is green.

tc.doneWg.Wait()
tc.watchFactory.Shutdown()
close(tc.addrChan)
util.ResetRunner()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh nice, learnt something new today.

useNetlink := true
var tc *testCtx
testNs.Do(func(netNS ns.NetNS) error {
tc = configureKubeOVNContext(nodeName, useNetlink)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lovely!

@martinkennelly
Copy link
Copy Markdown
Contributor Author

martinkennelly commented Feb 28, 2024

TC allows unique local address passed locally for me.. checking why it may fail.
Edit: ipv6 is disabled so in sync() it wipes out the added address... Seems like a problem that addr "event" handler doesnt respect the enabled IP family.

Some of the tcs require v6.
Unable to switch this support on per tc, because
we currently start the node ip handler before
running each test case and it inspects this IP
support before a tc is ever executed.

Enable for all tcs.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Copy link
Copy Markdown
Contributor

@tssurya tssurya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug All issues that are bugs and PRs opened to fix bugs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants