Node IP handler: Fix (include) keepalived VIPs#4168
Conversation
tssurya
left a comment
There was a problem hiding this comment.
thanks for the commit splits! made review easier.
reviewed all commits except 6th, will come back to that with fresh mind tomorrow
| // address but only later update kubelet configuration and restart kubelet (which in turn will update the reported | ||
| // IP address inside the node's status field). | ||
| nodeInformer := c.watchFactory.NodeInformer() | ||
| _, err := nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{ |
There was a problem hiding this comment.
nit: c.watchFactory.NodeInformer().AddEventHandler(cache.ResourceEventHandlerFuncs{... ? I find it weird we are initializing a new variable with the informer cache? Is it intentional because we don't want to change the watchFactory node informer?
There was a problem hiding this comment.
I find it weird we are initializing a new variable with the informer cache?
Not sure I am following you. var nodeInformer is just retrieving the existing node informer that was init by factory.
func (wf *WatchFactory) NodeInformer() cache.SharedIndexInformer {
return wf.informers[NodeType].inf
}
There was a problem hiding this comment.
yeah no nothing major my comment was simply: c.watchFactory.NodeInformer().AddEventHandler( why don't we just do this in one go like that without assigning it to a variable first and if we had a reason to have a variable in the first place.. nw, its a nit..
There was a problem hiding this comment.
ya - its a bit needless to do that. I just copy, pasted.
| }() | ||
| } | ||
|
|
||
| // runInternal can be used by testcases to provide a fake subscription function |
There was a problem hiding this comment.
its not just used by testcases right? we probably wanna update the func desc to be more informative
| }) | ||
|
|
||
| Context("valid addresses", func() { | ||
| It("allows keepalived VIP", func() { |
There was a problem hiding this comment.
can you make sure we add a unit test for:
link local addresses, OVN reserved IPs, and addresses marked as secondary or deprecated.
and double check to see these are not filtered at this stage?
if I am asking for too much and perhaps most of these are in the same line as what you have added, push back please but since you already have all the plumbing in place I was hoping this will be easier
There was a problem hiding this comment.
Link local addresses would be filtered - but uniq link local addresses are not and I added a test for that.
There was a problem hiding this comment.
ack thanks! and perhaps also for masquerade IPs (with combo of V6 addresses as well for robustness)
There was a problem hiding this comment.
ack based on your commit message, if you feel we need to have a cleanup done or consolidate things please open an upstream issue (perhaps we can try to mark that as "good-first-issue" or something)
| fakeBridgeConfiguration := &bridgeConfiguration{bridgeName: "breth0"} | ||
|
|
||
| k := &kube.Kube{KClient: tc.fakeClient} | ||
| if createNs { |
There was a problem hiding this comment.
so I would just make this another util:
configureKubeOVNContextWithNs which calls configureKubeOVNContext and then adds the extra plumbing you need like replace the ipManager or whatever..
then use configureKubeOVNContext from the old test and configureKubeOVNContextWithNs from your new test in next commit
There was a problem hiding this comment.
Ya, thats cleaner. Thanks!
| Expect(setupPrimaryInfFn()).ShouldNot(HaveOccurred()) | ||
| } | ||
| } | ||
| //FIXME (mk): newAddressManagerInternal calls a sync - when this is removed, remove the need to call that func within NS |
There was a problem hiding this comment.
I don't understand this comment very well :( I'll come back to this tomorrow. (too much "this" and "that" for my empty stomach)
There was a problem hiding this comment.
I explained on slack but for anyone else - a call to newAddressManagerInternal performs the following commands:
// reproducibility of unit tests.
func newAddressManagerInternal(nodeName string, k kube.Interface, config *managementPortConfig, watchFactory factory.NodeWatchFactory, gwBridge *bridgeConfiguration, useNetlink bool) *addressManager {
mgr := &addressManager{
nodeName: nodeName,
watchFactory: watchFactory,
cidrs: sets.New[string](),
mgmtPortConfig: config,
gatewayBridge: gwBridge,
OnChanged: func() {},
useNetlink: useNetlink,
syncPeriod: 30 * time.Second,
}
mgr.nodeAnnotator = kube.NewNodeAnnotator(k, nodeName)
mgr.sync() <------Notice
return mgr
}
Notice the sync() call. Therefore, to ensure sync() exec in the correct netns, we must call newAddressManagerInternal in the correct netns.
There was a problem hiding this comment.
thanks Mairt! makes sense,
There was a problem hiding this comment.
reviewed all commits, logic lgtm
so once you push with:
#4168 (comment)
#4168 (comment)
#4168 (comment)
rebased on master
we can merge this
thank you Martin! ++ cookies for the tests
| addrSubscribeOptions := netlink.AddrSubscribeOptions{ | ||
| ErrorCallback: func(err error) { | ||
| klog.Errorf("Failed during AddrSubscribe callback: %v", err) | ||
| // Note: Not calling sync() from here: it is redudant and unsafe when stopChan is closed. |
There was a problem hiding this comment.
| // Note: Not calling sync() from here: it is redudant and unsafe when stopChan is closed. | |
| // Note: Not calling sync() from here: it is redundant and unsafe when stopChan is closed. |
| doneWg.Add(1) | ||
| go func() { | ||
| c.runInternal(stopChan, subscribe) | ||
| c.runInternal(stopChan, c.getNetlinkAddrSubFunc(stopChan)) |
There was a problem hiding this comment.
self-note: the same channel is passed into runInternal and into addrSubscribeAt
the former simply returns if stop is called and latter has a goroutine which listens to that channel and then calls socket close..
Previous changes in PR ovn-kubernetes#4040 added a new constraint to not include keepalived vips in an external func even though we filtered node IPs within node IP handler. Remove the external func which filters addresses and keep the filtering logic within node ip handler code. Signed-off-by: Martin Kennelly <mkennell@redhat.com>
…ternal Do this in-order to allow follow on work to allow func 'runInternal' to be exec directly with 'go runInternal()' Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Because it is harder to test this func when we spawn the go routine within our main logic func for node IP handler. No logic change within runInternal. Signed-off-by: Martin Kennelly <mkennell@redhat.com>
No func change. Signed-off-by: Martin Kennelly <mkennell@redhat.com>
..to allow new tests to consume Signed-off-by: Martin Kennelly <mkennell@redhat.com>
For testing, we do not want to call run directly because it will spawn a goroutine which can escape any network namespace in which it is called. Seperate it into its own func so it can be called during testing and pinned to a net namespace. Signed-off-by: Martin Kennelly <mkennell@redhat.com>
This test case was difficult to add to existing test cases because they are focused on netlink addr update events and ignored the sync func. This commit creates a new test cases that test events and sync func. Decided to use a new netns instead of mocking as it seems simpler and easier to extend however there maybe overlap with existing test cases. Signed-off-by: Martin Kennelly <mkennell@redhat.com>
…N reserved IPs Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Otherwise, we get exception printed out in logs. Signed-off-by: Martin Kennelly <mkennell@redhat.com>
327f353 to
6402f56
Compare
tssurya
left a comment
There was a problem hiding this comment.
the tests soo nice! thanks Martin
change lgtm (checked only the new changes/bits in 3rd iteration).
will merge once CI is green.
| tc.doneWg.Wait() | ||
| tc.watchFactory.Shutdown() | ||
| close(tc.addrChan) | ||
| util.ResetRunner() |
There was a problem hiding this comment.
oh nice, learnt something new today.
| useNetlink := true | ||
| var tc *testCtx | ||
| testNs.Do(func(netNS ns.NetNS) error { | ||
| tc = configureKubeOVNContext(nodeName, useNetlink) |
|
TC |
f34c07f to
bd82594
Compare
Some of the tcs require v6. Unable to switch this support on per tc, because we currently start the node ip handler before running each test case and it inspects this IP support before a tc is ever executed. Enable for all tcs. Signed-off-by: Martin Kennelly <mkennell@redhat.com>
bd82594 to
7a9819b
Compare
#4040 introduced a constraint to filter VIPs. Remove that.
Also remove the external call to get these addresses.
Previously, we needed to use an external pkg func call because we wanted to filter EIPs but that is no longer needed so get the IP addrs locally and filter the addresses locally, and dont do it in two places which is confusing.