Skip to content

Conversation

@ctelfer
Copy link
Contributor

@ctelfer ctelfer commented May 22, 2018

Cherry-pick commits 4aeb1fc and 2af06a0 to the bump_17.06 branch. These commits combined allow libnetwork to gracefully remove load balancing endpoints while not disconnecting the backend endpoints until they finish servicing existing connections.

This PR contains a fix for moby/moby#30321. There was a moby/moby#31142
PR intending to fix the issue by adding a delay between disabling the
service in the cluster and the shutdown of the tasks. However
disabling the service was not deleting the service info in the cluster.
Added a fix to delete service info from cluster and verified using siege
to ensure there is zero downtime on rolling update of a service.

Signed-off-by: abhi <[email protected]>
(cherry picked from commit 4aeb1fc)
@ctelfer ctelfer requested a review from fcrisciani May 22, 2018 22:13
This patch attempts to allow endpoints to complete servicing connections
while being removed from a service.  The change adds a flag to the
endpoint.deleteServiceInfoFromCluster() method to indicate whether this
removal should fully remove connectivity through the load balancer
to the endpoint or should just disable directing further connections to
the endpoint.  If the flag is 'false', then the load balancer assigns
a weight of 0 to the endpoint but does not remove it as a linux load
balancing destination.  It does remove the endpoint as a docker load
balancing endpoint but tracks it in a special map of "disabled-but-not-
destroyed" load balancing endpoints.  This allows traffic to continue
flowing, at least under Linux.  If the flag is 'true', then the code
removes the endpoint entirely as a load balancing destination.

The sandbox.DisableService() method invokes deleteServiceInfoFromCluster()
with the flag sent to 'false', while the endpoint.sbLeave() method invokes
it with the flag set to 'true' to complete the removal on endpoint
finalization.  Renaming the endpoint invokes deleteServiceInfoFromCluster()
with the flag set to 'true' because renaming attempts to completely
remove and then re-add each endpoint service entry.

The controller.rmServiceBinding() method, which carries out the operation,
similarly gets a new flag for whether to fully remove the endpoint.  If
the flag is false, it does the job of moving the endpoint from the
load balancing set to the 'disabled' set.  It then removes or
de-weights the entry in the OS load balancing table via
network.rmLBBackend().  It removes the service entirely via said method
ONLY IF there are no more live or disabled load balancing endpoints.
Similarly network.addLBBackend() requires slight tweaking to properly
manage the disabled set.

Finally, this change requires propagating the status of disabled
service endpoints via the networkDB.  Accordingly, the patch includes
both code to generate and handle service update messages.  It also
augments the service structure with a ServiceDisabled boolean to convey
whether an endpoint should ultimately be removed or just disabled.
This, naturally, required a rebuild of the protocol buffer code as well.

Signed-off-by: Chris Telfer <[email protected]>
(cherry picked from commit 2af06a0)
Copy link

@fcrisciani fcrisciani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fcrisciani fcrisciani merged commit 6451c32 into moby:bump_17.06 May 23, 2018
@ctelfer ctelfer deleted the bump_17.06 branch May 24, 2018 13:56
@ctelfer
Copy link
Contributor Author

ctelfer commented May 24, 2018 via email

@thaJeztah
Copy link
Member

cherry-picked commits come from #1824 and #2112

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants