Service / LoadBalancer synchronization fixes #1352
f26798a to 5197676
cilium service delete
5197676 to 42fd6fb
Hitting an issue in the K8s multinode tests which I haven't seen before, so I'm removing this from pending-review until I can get those tests passing.
cd07c03 to 2ae19fc
OOOI - Out Of Order Imports
@@ -47,6 +48,10 @@ type LBBackEnd struct {
	Weight uint16
}

func (lbbe *LBBackEnd) String() string {
	return fmt.Sprintf("%s, weight: %d", lbbe.L3n4Addr.String(), lbbe.Weight)
Isn't there a String() already for this? I was 80% sure there was.
Since LBBackEnd embeds an L3n4Addr, I guess we could call String() on it, and it would use the L3n4Addr.String() function, but we wouldn't get the value of the weight for the LBBackEnd.
I mean for LBBackend. It's fine now
daemon/loadbalancer.go
Outdated
log.Debugf("adding service %d to BPF maps", feCilium.ID)

// Try to delete service before adding it and ignore errors as it might not exist.
_ = d.svcDeleteByFrontendLocked(&feCilium.L3n4Addr)
At least log the errors in debug mode.
Fixed.
numBackends := uint16(vval.GetCount())

// ServiceKeys are unique by their slave number, which corresponds to the number of backends. Delete each of these.
for i := numBackends; i > 0; i-- {
Do you need to delete all map entries?
If we're deleting the service, we should clean all of the entries out of the BPF map. Why would we not want to delete all of the entries?
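A rough sketch of why every entry has to go, under an assumed layout: the entry with slave number 0 acts as a master record holding the backend count, and slave numbers 1..N each hold one backend. The types serviceKey and serviceValue and the function deleteService are hypothetical, and a Go map stands in for the BPF map.

```go
package main

import "fmt"

// serviceKey and serviceValue are hypothetical, simplified stand-ins.
type serviceKey struct {
	Frontend string
	Slave    uint16
}

type serviceValue struct {
	Count   uint16 // only meaningful on the master entry (Slave == 0)
	Backend string // only meaningful on slave entries
}

// deleteService removes the master entry and every slave entry of a
// service. Removing only the master would leave orphaned slave entries.
func deleteService(m map[serviceKey]serviceValue, frontend string) error {
	master := serviceKey{Frontend: frontend, Slave: 0}
	v, ok := m[master]
	if !ok {
		return fmt.Errorf("service %s not found", frontend)
	}
	for i := v.Count; i > 0; i-- {
		delete(m, serviceKey{Frontend: frontend, Slave: i})
	}
	delete(m, master)
	return nil
}

func main() {
	m := map[serviceKey]serviceValue{
		{Frontend: "10.0.0.1:80", Slave: 0}: {Count: 2},
		{Frontend: "10.0.0.1:80", Slave: 1}: {Backend: "10.1.0.1:8080"},
		{Frontend: "10.0.0.1:80", Slave: 2}: {Backend: "10.1.0.2:8080"},
	}
	fmt.Println(deleteService(m, "10.0.0.1:80"), len(m))
}
```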
daemon/loadbalancer.go
Outdated
if err := lbmap.DeleteService(svcKey); err != nil {
// Clean services and rev nats from BPF maps that failed to be restored.
for _, svc := range failedSyncSVC {
	log.Debugf("Unable to restore, so rmoving service: %s", svc.FE)
typo s/rmoving/removing
Fixed.
pkg/maps/lbmap/lbmap.go
Outdated
@@ -19,6 +19,7 @@ import (
	"net"
	"unsafe"

	log "github.com/Sirupsen/logrus"
OOOI
Fixed.
@@ -361,17 +368,18 @@ func AddSVC2BPFMap(fe ServiceKey, besValues []ServiceValue, addRevNAT bool, revN

// L3n4Addr2ServiceKey converts the given l3n4Addr to a ServiceKey with the slave ID
// set to 0.
Update function's comment.
Fixed.
2ae19fc to 7c2e074
51bf723 to 1a45982
Dismissing review - I have addressed the comments and have requested another review.
common/types/loadbalancer.go
Outdated
@@ -372,7 +391,12 @@ func (a *L3n4Addr) IsIPv6() bool {
// KVStore.
type L3n4AddrID struct {
	L3n4Addr
	ID ServiceID
	ID    ServiceID
	Slave uint16 // Number of slaves associated with this L3n4AddrID
This changes the structure used in the kvstore. Is this safe? What will happen if we upgrade Cilium and the kvstore contains old service entries?
You're right. This actually isn't used anywhere in the cilium-agent code, so I'm going to revert it. I figured it would be nice to add it now and implement it at a later date, but if it's going to mess with the kvstore, I don't think it's worth it at this time. I'll have this removed in the next upload.
// Deletes a service by the frontend address
func (d *Daemon) svcDeleteByFrontend(frontend *types.L3n4Addr) error {
	d.loadBalancer.BPFMapMU.Lock()
	defer d.loadBalancer.BPFMapMU.Unlock()
This is confusing, can't we take this lock in the loadbalancer itself? Is this lock protecting the data structures in the daemon as well?
I made this function to give flexibility when we need to delete a service by its L3n4Addr, depending on whether we have already acquired BPFMapMU. This is similar to other functions throughout the daemon. Can we sync up over video chat about this? It would be easier to explain that way.
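The pattern described here is the common Go idiom of pairing a locking wrapper with a "Locked" variant that expects the caller to already hold the mutex. A minimal sketch, with a hypothetical loadBalancer type, string frontends, and a hypothetical svcReplace caller rather than the daemon's real structures:

```go
package main

import (
	"fmt"
	"sync"
)

// loadBalancer is a hypothetical stand-in for the daemon's LB state.
type loadBalancer struct {
	BPFMapMU sync.Mutex
	services map[string]struct{}
}

// svcDeleteByFrontend is for callers that do NOT hold BPFMapMU.
func (lb *loadBalancer) svcDeleteByFrontend(fe string) error {
	lb.BPFMapMU.Lock()
	defer lb.BPFMapMU.Unlock()
	return lb.svcDeleteByFrontendLocked(fe)
}

// svcDeleteByFrontendLocked must only be called with BPFMapMU held.
func (lb *loadBalancer) svcDeleteByFrontendLocked(fe string) error {
	if _, ok := lb.services[fe]; !ok {
		return fmt.Errorf("service %s not found", fe)
	}
	delete(lb.services, fe)
	return nil
}

// svcReplace already holds the lock for the whole update, so it has to use
// the Locked variant: sync.Mutex is not reentrant, and calling
// svcDeleteByFrontend here would deadlock.
func (lb *loadBalancer) svcReplace(fe string) {
	lb.BPFMapMU.Lock()
	defer lb.BPFMapMU.Unlock()
	_ = lb.svcDeleteByFrontendLocked(fe) // may not exist yet
	lb.services[fe] = struct{}{}
}

func main() {
	lb := &loadBalancer{services: map[string]struct{}{}}
	lb.svcReplace("10.0.0.1:80")
	fmt.Println(lb.svcDeleteByFrontend("10.0.0.1:80")) // <nil>
	fmt.Println(lb.svcDeleteByFrontend("10.0.0.1:80") != nil)
}
```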
891aa36 to d0fd73e
d0fd73e to 81df82f
ee2d9e6 to 4e964c5
* common/types: add String() function for LBBackend, add debug logs throughout, and update LBSVC object with its SHA256 hash.
* daemon: clean all LB, RevNAT maps on host when restore mode is not enabled. Add debug logs in the service-related code. When a service is updated or deleted, clean BPF LB maps of old map entries for the given service. Fix logic for syncing Cilium's LB maps with the BPF maps on disk when Cilium is restarted and restore mode is enabled.
* pkg/maps/lbmap: add logs when services or RevNAT entries change state.
* tests: refactor 06-lb.sh into functions to reduce duplicate code, and add test that ensures restore mode works appropriately for LB Maps when Cilium is restarted.

Signed-off by: Ian Vernon <[email protected]>
4e964c5 to 216d5a6
Please review.
I have addressed comments and have added more changes to the code since the previous review.
Can we check if we actually need all debug messages?
	delete(lb.SVCMap, oldSvc.Sha256)
}
log.Debugf("adding service %s with SHA %s to lb.SVCMap", svc.FE.String(), svc.Sha256)
Do we still need all these debug messages?
I found that the debug messages I added were useful in determining what was going on in this code. There were very few logs when I started out with this branch in the service code. I don't see a downside to having this log - it provides important information about the state managed in Cilium.
return &L3n4Addr{IP: ip, L4Addr: *lbport}, nil

addr := L3n4Addr{IP: ip, L4Addr: *lbport}
log.Debugf("created new L3n4Addr %s", addr)
here as well
I don't see the disadvantage of having more debug logs - see my comments above.
Global NodePort services are not supported when they are managed by iptables/ipvs, since they do not know about remote endpoints. Hence, let's skip the related connectivity tests when running in multicluster mode and KPR NodePort support is disabled, to prevent spurious failures.

Related: cilium#23128
Related: cilium#23266
Fixes: cilium#1352
Signed-off-by: Marco Iorio <[email protected]>
member to L3n4AddrID, add debug logs throughout, and update LBSVC object
with its SHA256 hash.
Add debug logs in the service-related code. When a service is updated or
deleted, clean BPF LB maps of old map entries for the given service. Fix
logic for syncing Cilium's LB maps with the BPF maps on disk when Cilium is
restarted and restore mode is enabled.
test that ensures restore mode works appropriately for LB Maps when Cilium
is restarted.
Signed-off by: Ian Vernon [email protected]
Fixes #1295