feat(deploy): Disable nu-orca instances during deploy #635
Conversation
LGTM, with a few comments/questions.
String jobId = getJobExecutor().startJob(request);

// Wait for the proxy to spin up.
DaemonTaskHandler.safeSleep(TimeUnit.SECONDS.toMillis(5));
Maybe poll this a few times rather than waiting and trying once.
Tricky thing is that there is nothing clean to poll on (no clear success condition). If we try to open a connection to the local port, it's possible the container isn't accepting requests yet, or that something else is already bound to the local port (a false negative and a false positive, respectively).
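For illustration only, a minimal sketch of what a bounded port poll could look like, with the ambiguity above called out in comments. The class name, port, attempt count, and delay are hypothetical and not part of this PR:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.TimeUnit;

class ProxyPoller {
  // Polls the local port a bounded number of times. A successful connect only
  // proves *something* is listening (possibly another process: false positive),
  // and a refused connect may just mean the container isn't ready yet
  // (false negative) -- which is why the PR sleeps once instead of polling.
  static boolean waitForPort(int port, int attempts, long delayMillis)
      throws InterruptedException {
    for (int i = 0; i < attempts; i++) {
      try (Socket socket = new Socket()) {
        socket.connect(new InetSocketAddress("localhost", port), 1000);
        return true;
      } catch (IOException e) {
        TimeUnit.MILLISECONDS.sleep(delayMillis);
      }
    }
    return false;
  }
}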
if (!unknownVersions.isEmpty()) {
  String versions = String.join(", ", unknownVersions.stream()
      .map(orca::getVersionedName)
      .collect(Collectors.toList()));
  throw new HalException(new ProblemBuilder(Problem.Severity.ERROR,
      "The following orca versions (" + versions + ") could not safely be drained of work.")
      .setRemediation("Please make sure that no pipelines are running, and manually destroy the server groups at those versions.")
      .build());
}
What is the correct user action if Halyard can't reap the old Orcas? Is the remediation to manually drain the Orcas, destroy them, and then retry the deploy?
It should be exactly that help text: make sure that you aren't running pipelines, then delete those orcas by hand. Checking for no running pipelines is race-condition prone, since there is no guarantee that those nu-orca nodes won't pick up work again. The safest thing is to do this by hand.
This gracefully handles a mix of nu & old orcas by only disabling (and then flagging for either deletion or scaling down depending on the age) orca instances that implement the new /admin/instance/enabled endpoint. All other orcas are untouched & reported as "unkillable". See sample behavior here:
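For context, a hypothetical sketch of the capability probe the description implies: check whether an orca exposes /admin/instance/enabled before disabling it, and leave anything else untouched as "unkillable". The HTTP shape and helper names are assumptions for illustration, not the PR's actual code:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class OrcaCapabilityProbe {
  // Returns true if this orca instance appears to implement the admin
  // enable/disable endpoint; a 404 suggests an old orca that should be
  // left untouched and reported as "unkillable".
  static boolean supportsDisable(String baseUrl) {
    try {
      HttpURLConnection conn =
          (HttpURLConnection) new URL(baseUrl + "/admin/instance/enabled").openConnection();
      conn.setRequestMethod("GET");
      int code = conn.getResponseCode();
      conn.disconnect();
      return code != 404;
    } catch (IOException e) {
      // An unreachable instance can't be safely drained either.
      return false;
    }
  }
}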