-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
auto promote canaries #5719
auto promote canaries #5719
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the core logic looks good! An e2e test would be nice or at least an integration test possibly in the nomad/
server package to test without so much mocking.
Great work!
Backwards compat
I think it's possible for HealthyCanaries to be permanently inaccurate if servers are upgraded mid-deploy. However those deploys could never have AutoPromote=true so I don't think it matters.
c02910f
to
dbee31a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Auto promote! E2E improvements! Typo fixes! So much to enjoy in one nice concise PR. My last 2 comments are blockers. Merge whenever you feel it's ready!
e966eb6
to
1ae24b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Love the tests.
Have a question about autopromotion with multi-taskgroup jobs and few other nitpicks.
@@ -30,6 +30,7 @@ IMPROVEMENTS: | |||
* core: Add `-verbose` flag to `nomad status` wrapper command [[GH-5516](https://github.com/hashicorp/nomad/pull/5516)] | |||
* core: Reduce the size of the raft transaction for plans by only sending fields updated by the plan applier [[GH-5602](https://github.com/hashicorp/nomad/pull/5602)] | |||
* core: Add ability to filter job deployments by most recent version of job [[GH-5702](https://github.com/hashicorp/nomad/issues/5702)] | |||
* core: Add job update `auto_promote` flag, which causes deployments to promote themselves when all canaries become healthy [[GH-5719](https://github.com/hashicorp/nomad/pull/5719)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest updating changelog after merging PR as the changelog is common cause of conflicts.
TASKGROUP: | ||
for tk, tv := range d.TaskGroups { | ||
if !tv.AutoPromote || tv.DesiredCanaries != len(tv.PlacedCanaries) { | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind if you elaborate on auto promotion and multiple task groups? I would have expected auto promotion of a job to take place if all its task group were set for AutoPromote or if AutoPromote was a job level property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto_promote will only update the job if all the task groups are healthy and marked for auto_promote, it will otherwise stay in the pending state
41bb0d5
to
68a9503
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question about job and task group update merging - but it's lgtm otherwise.
api/jobs.go
Outdated
@@ -433,6 +435,9 @@ func (u *UpdateStrategy) Copy() *UpdateStrategy { | |||
copy.Canary = intToPtr(*u.Canary) | |||
} | |||
|
|||
// boolPtrs need to preserve nil unset value | |||
copy.AutoPromote = u.AutoPromote |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite follow this comment and why the copying here is different?
I presume that the rest of assignments used boolToPtr
to avoid having two structures share the same reference when deep copying the structure; but I don't know for sure and we don't really do reference manipulation here afaik.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, I thought about that wrong. copy and merge should match the old behavior
command/agent/job_endpoint.go
Outdated
au.Merge(taskGroup.Update) | ||
} else { | ||
au = taskGroup.Update | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect the merging of jobUpdate might be a subtle change in behavior. How about use taskGroup.Update if it's not nil, otherwise use jobUpdate
? Either way, we should note it in the docs/changelog/release notes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can go either way on this, the current behavior does not match what the docs claim: "If specified at the job level, the configuration will apply to all groups within the job. If multiple update stanzas are specified, they are merged with the group stanza taking the highest precedence and then the job" I think if it's safer to keep the existing behavior, we should change the docs, and if we keep this change we can leave them as is.
0ee7235
to
6664e63
Compare
Co-Authored-By: Michael Schurter <[email protected]>
Co-Authored-By: Mahmood Ali <[email protected]>
6664e63
to
c9a837c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome!
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Add a new configuration option, auto_promote, that causes deployments with healthy canary allocations to be promoted automatically. This fixes #3636