Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(deploy): Disable nu-orca instances during deploy #635

Merged
merged 1 commit into from
Aug 2, 2017

Conversation

lwander
Copy link
Member

@lwander lwander commented Aug 1, 2017

This gracefully handles a mix of nu & old orcas by only disabling (and then flagging for either deletion or scaling down depending on the age) orca instances that implement the new /admin/instance/enabled endpoint. All other orcas are untouched & reported as "unkillable". See sample behavior here:
hal-nu-orca

@lwander lwander requested review from jtk54 and duftler August 1, 2017 19:08
Copy link
Contributor

@jtk54 jtk54 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, has a few comments/questions.

String jobId = getJobExecutor().startJob(request);

// Wait for the proxy to spin up.
DaemonTaskHandler.safeSleep(TimeUnit.SECONDS.toMillis(5));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe poll this a few times rather than waiting and trying once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tricky thing is that there is nothing clean to poll on (no clear success condition). If we try to open a connection to the local port, it's possible the container isn't accepting requests yet, or that there is already something else bound to the local port (false negative and false positive response respectively).

if (!unknownVersions.isEmpty()) {
String versions = String.join(", ", unknownVersions.stream().map(orca::getVersionedName).collect(Collectors.toList()));
throw new HalException(new ProblemBuilder(Problem.Severity.ERROR, "The following orca versions (" + versions + ") could not safely be drained of work.")
.setRemediation("Please make sure that no pipelines are running, and manually destroy the server groups at those versions.").build());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the correct user action if Halyard can't reap the old Orcas? Is the remediation is to manually drain the Orcas, destroy them, and then retry the deploy again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be exactly that help text, namely making sure that you aren't running pipelines, and then deleting those orcas by hand. Checking for no running pipelines is kinda race-condition prone since there is no guarantee that those nu-orca nodes won't pick up work again. The safest thing to do is to do this by hand.

@lwander lwander merged commit d8019a0 into spinnaker:master Aug 2, 2017
@lwander lwander deleted the halyard-orca-fancy-update branch August 2, 2017 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants