Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Service / LoadBalancer synchronization fixes #1352

Merged
merged 1 commit into from
Sep 14, 2017
Merged

Conversation

ianvernon
Copy link
Member

@ianvernon ianvernon commented Aug 23, 2017

  • common/types: add String() function for LBBackend & L3n4AddrID, add Slave
    member to L3n4AddrID, add debug logs throughout, and update LBSVC object
    with its SHA256 hash.
  • daemon: clean all LB, RevNAT maps on host when restore mode is not enabled.
    Add debug logs in the service-related code. When a service is updated or
    deleted, clean BPF LB maps of old map entries for the given service. Fix
    logic for syncing Cilium's LB maps with the BPF maps on disk when Cilium is
    restarted and restore mode is enabled.
  • pkg/maps/lbmap: add logs when services or RevNAT entries change state.
  • tests: refactor 06-lb.sh into functions to reduce duplicate code, and add
    test that ensures restore mode works appropriately for LB Maps when Cilium
    is restarted.

Signed-off by: Ian Vernon [email protected]
Fixes #1295

@ianvernon ianvernon added the wip label Aug 23, 2017
@ianvernon ianvernon force-pushed the fix-service-delete branch 3 times, most recently from f26798a to 5197676 Compare August 23, 2017 21:37
@ianvernon ianvernon changed the title WIP: fixing cilium service delete Service / LoadBalancer synchronization fixes Aug 23, 2017
@ianvernon ianvernon requested review from tgraf and aanm August 23, 2017 21:42
@ianvernon ianvernon added the kind/bug This is a bug in the Cilium logic. label Aug 23, 2017
@ianvernon
Copy link
Member Author

Hitting an issue in the K8s multinode tests which I haven't seen before, so I'm removing this from pending-review until I can get those tests passing.

@ianvernon ianvernon force-pushed the fix-service-delete branch 4 times, most recently from cd07c03 to 2ae19fc Compare August 28, 2017 13:46
aanm
aanm previously requested changes Aug 28, 2017
Copy link
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOOI - Out Of Order Imports

@@ -47,6 +48,10 @@ type LBBackEnd struct {
Weight uint16
}

func (lbbe *LBBackEnd) String() string {
return fmt.Sprintf("%s, weight: %d", lbbe.L3n4Addr.String(), lbbe.Weight)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there a String() already for this? I was 80% sure there was.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since LBBackEnd is embedded with an L3n4Addr, I guess we could call String() on it, and it would use the L3n4Addr.String() function, but we wouldn't get the value for the weight for the LBBackEnd.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean for LBBackend. It's fine now

log.Debugf("adding service %d to BPF maps", feCilium.ID)

// Try to delete service before adding it and ignore errors as it might not exist.
_ = d.svcDeleteByFrontendLocked(&feCilium.L3n4Addr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least run the erros in debug mode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

numBackends := uint16(vval.GetCount())

// ServiceKeys are unique by their slave number, which corresponds to the number of backends. Delete each of these.
for i := numBackends; i > 0; i-- {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to delete all map entries?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're deleting the service, we should clean all of the entries out of the BPF map. Why would we not want to delete all of the entries?

if err := lbmap.DeleteService(svcKey); err != nil {
// Clean services and rev nats from BPF maps that failed to be restored.
for _, svc := range failedSyncSVC {
log.Debugf("Unable to restore, so rmoving service: %s", svc.FE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo s/rmoving/removing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -19,6 +19,7 @@ import (
"net"
"unsafe"

log "github.com/Sirupsen/logrus"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOOI

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -361,17 +368,18 @@ func AddSVC2BPFMap(fe ServiceKey, besValues []ServiceValue, addRevNAT bool, revN

// L3n4Addr2ServiceKey converts the given l3n4Addr to a ServiceKey with the slave ID
// set to 0.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update function's comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@ianvernon ianvernon requested a review from aanm August 28, 2017 20:38
@ianvernon ianvernon force-pushed the fix-service-delete branch 2 times, most recently from 51bf723 to 1a45982 Compare August 29, 2017 20:09
@ianvernon ianvernon dismissed aanm’s stale review August 29, 2017 22:06

Dismissing review - I have addressed the comments and have requested another review.

tgraf
tgraf previously requested changes Aug 30, 2017
@@ -372,7 +391,12 @@ func (a *L3n4Addr) IsIPv6() bool {
// KVStore.
type L3n4AddrID struct {
L3n4Addr
ID ServiceID
ID ServiceID
Slave uint16 // Number of slaves associated with this L3n4AddrID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the structure used in the kvstore. Is this safe? What will happen if we upgrade Cilium and the kvstore contains old services entries?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. This actually isn't used anywhere in the cilium-agent code, so I'm going to revert it. I figured it would be nice to have to add it now and then actually implement it at a later date, but if it's going to mess with the kv-store, I don't think it's worth it at this time. I'll have this removed in the next upload.

// Deletes a service by the frontend address
func (d *Daemon) svcDeleteByFrontend(frontend *types.L3n4Addr) error {
d.loadBalancer.BPFMapMU.Lock()
defer d.loadBalancer.BPFMapMU.Unlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing, can't we take this lock in the loadbalancer itself? Is this lock protecting the data structures in the daemon as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this function to give flexibility when we need to delete a service by its L3n4Addr depending on if we have already acquired BPFMapMU. This is similar to other functions throughout the daemon. Can we sync up over video chat about this? It would be easier to explain that way.

@ianvernon ianvernon force-pushed the fix-service-delete branch 2 times, most recently from 891aa36 to d0fd73e Compare September 1, 2017 17:35
* common/types: add String() function for LBBackend, add debug logs throughout,
and update LBSVC object with its SHA256 hash.
* daemon: clean all LB, RevNAT maps on host when restore mode is not enabled.
Add debug logs in the service-related code. When a service is updated or
deleted, clean BPF LB maps of old map entries for the given service. Fix
logic for syncing Cilium's LB maps with the BPF maps on disk when Cilium is
restarted and restore mode is enabled.
* pkg/maps/lbmap: add logs when services or RevNAT entries change state.
* tests: refactor 06-lb.sh into functions to reduce duplicate code, and add
test that ensures restore mode works appropriately for LB Maps when Cilium
is restarted.

Signed-off by: Ian Vernon <[email protected]>
@ianvernon
Copy link
Member Author

Please review.

@ianvernon ianvernon dismissed tgraf’s stale review September 13, 2017 14:53

I have addressed comments and have added more changes to the code since the previous review.

Copy link
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we check if we actually need all debug messages?

@@ -47,6 +48,10 @@ type LBBackEnd struct {
Weight uint16
}

func (lbbe *LBBackEnd) String() string {
return fmt.Sprintf("%s, weight: %d", lbbe.L3n4Addr.String(), lbbe.Weight)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean for LBBackend. It's fine now

delete(lb.SVCMap, oldSvc.Sha256)
}
log.Debugf("adding service %s with SHA %s to lb.SVCMap", svc.FE.String(), svc.Sha256)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need all this debug messages?

Copy link
Member Author

@ianvernon ianvernon Sep 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that the debug messages I added were useful in determining what was going on in this code. There were very few logs when I started out with this branch in the service code. I don't see a downside to having this log - it provides important information about the state managed in Cilium.

return &L3n4Addr{IP: ip, L4Addr: *lbport}, nil

addr := L3n4Addr{IP: ip, L4Addr: *lbport}
log.Debugf("created new L3n4Addr %s", addr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the disadvantage of having more debug logs - see my comments above.

@tgraf tgraf merged commit 31a88cf into master Sep 14, 2017
@tgraf tgraf deleted the fix-service-delete branch September 14, 2017 17:55
michi-covalent pushed a commit to michi-covalent/cilium that referenced this pull request May 30, 2023
Global NodePort services are not supported when they are managed by
iptables/ipvs, since they do not know about remote endpoints. Hence,
let's skip the related connectivity tests when running in multicluster
mode and KPR NodePort support is disabled, to prevent spurious failures.

Related: cilium#23128
Related: cilium#23266
Fixes: cilium#1352

Signed-off-by: Marco Iorio <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug This is a bug in the Cilium logic.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants