# staging workflow #26
---
feature: staging-workflow
start-date: 2018-03-05
author: Vladimír Čunát (@vcunat)
co-authors: Frederik Rietdijk (@FRidh)
related-issues:
---
# Summary
[summary]: #summary
Define a new workflow for the `staging` branch that can better accommodate the
current and future influx of changes, in order to deliver mass-rebuilds to
`master` faster. As part of this new workflow an additional branch,
`staging-next`, shall be introduced.
# Motivation
[motivation]: #motivation
The current workflow cannot handle the high volume of mass-rebuilds that are
continuously delivered, resulting in long delays before these deliveries reach
`master`. When a delivery causes failures, attempts are typically made to fix
these failures and stabilize `staging` so that the delivery can still reach
`master`.
It often happens that other mass-rebuilds are submitted during this period of
stabilization, and it is not uncommon that these also introduce failures, thus
further increasing the time it takes for a delivery to reach `master`. This is
especially worrisome in the case of security fixes that need to be delivered as
soon as possible.
# Detailed design
[design]: #detailed-design
There shall be the following branches:

- `master` is the main branch, where all small deliveries go;
- `staging` is branched from `master`; mass-rebuilds and other large deliveries go to this branch;
- `staging-next` is branched from `staging`; only stabilization fixes and security fixes shall be delivered to this branch. It shall be merged into `master` when deemed of sufficiently high quality.
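The branch relationships above can be sketched with plain git. This is an
illustrative sketch only: the repository contents, commit messages, and the use
of a throwaway directory are hypothetical, not part of the proposal.

```shell
#!/bin/sh
# Sketch of the proposed branch flow (illustrative; commits are empty stand-ins).
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email "example@example.org"
git config user.name "Example"
git commit -q --allow-empty -m "initial"
git branch -M master            # main branch: small deliveries go here
git branch staging master       # mass-rebuilds and large deliveries go here
git branch staging-next staging # stabilization and security fixes go here

# A mass-rebuild is delivered to staging:
git checkout -q staging
git commit -q --allow-empty -m "gcc: major update (mass rebuild)"

# staging is merged into staging-next for stabilization:
git checkout -q staging-next
git merge -q staging
git commit -q --allow-empty -m "fix fallout from gcc update"

# When staging-next is deemed stable enough, it reaches master:
git checkout -q master
git merge -q staging-next
```

For most contributors nothing changes: they still target `master` or `staging`;
only the stabilization step moves to the new `staging-next` branch.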
Binary packages shall be built by Hydra for each of these branches. The
following table gives an overview of the branches, their check interval in
hours, their number of shares, and the jobset that they build.
| Branch         | Interval (h) | Shares | Jobset              |
|----------------|--------------|--------|---------------------|
| `master`       | 4            | High   | `release.nix`       |
| `staging-next` | 12           | Medium | `release.nix`       |
| `staging`      | 6            | Medium | `release-small.nix` |
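For illustration, the `staging-next` row could correspond to a Hydra jobset
specification roughly like the following sketch in Hydra's declarative-jobset
JSON format. The concrete values (description, share count, `keepnr`, input
names) are assumptions for the example, not part of the proposal.

```json
{
  "enabled": 1,
  "hidden": false,
  "description": "staging-next: stabilization on top of staging",
  "nixexprinput": "nixpkgs",
  "nixexprpath": "pkgs/top-level/release.nix",
  "checkinterval": 43200,
  "schedulingshares": 500,
  "enableemail": false,
  "emailoverride": "",
  "keepnr": 3,
  "inputs": {
    "nixpkgs": {
      "type": "git",
      "value": "https://github.com/NixOS/nixpkgs.git staging-next",
      "emailresponsible": false
    }
  }
}
```

Note that `checkinterval` is expressed in seconds (12 hours = 43200), and
`schedulingshares` is a relative weight; the actual numbers would be tuned, as
noted under "Unresolved questions".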
The check interval of `staging-next` is reduced from 24 hours (the current value
for `staging`) to 12 hours. This can be done because only stabilization fixes
shall be submitted, and thus fewer rebuilds shall typically have to be
performed.
The `staging` branch shall have a short interval of only 6 hours. This is
because of the relatively small jobset, and to obtain a higher resolution to
detect any troublesome deliveries.
# Drawbacks
[drawbacks]: #drawbacks
A potential drawback of this new workflow is that the additional branch may be
considered complicated and/or more difficult to work with. However, for most
contributors the workflow will remain the same: choose `master` or `staging`
depending on the number of rebuilds.
# Alternatives
[alternatives]: #alternatives
## Maintain the status quo
The current situation could be kept; however, that would not solve any of the
issues mentioned in the "Motivation" section.
## Single branch
Instead of multiple branches, only a single branch, say `master`, could be kept
for development. While this would remove the issue of merge conflicts, it would
result in continuous mass-rebuilds on `master`, slowing down the delivery of
binary substitutes and thus development.
## Reduce Hydra jobset size
Reducing the size of the Hydra jobset would allow a higher iteration pace, but
it has the downsides of testing fewer packages and of having fewer binary
substitutes available.
> **Member:** Another option here is to reduce the channel-blocking Hydra jobset size, but have a 48-hourly cycle of trying to build the long tail. This still probably increases the number of things broken in the channel, but it might be a middle ground. (I am not saying I prefer that compromise to the current proposal.)

> **Member (author):** The channel-update criteria include waiting for all builds to finish, but this is doable by having two jobsets on the branch. Personally I think the channel-blocking set should grow, not shrink, especially for the release branches.

> **Member:** I agree, but I also think it is good to have the options enumerated and explicitly dismissed, with a reason given.

> **Member (author):** Added a sentence describing the approach.
The part about fewer binary substitutes could be partially mitigated by adding
another, slower and larger jobset that wouldn't block the channel.
# Unresolved questions
[unresolved]: #unresolved-questions
- The exact number of shares, which still has to be determined.
# Future work
[future]: #future-work
- Document the new workflow;
- Create the new branch;
- Create a Hydra jobset for the new branch and adjust the existing jobs.
> **Review comment:** Looking at the proposal, I am not completely sure whether it will in fact reduce or increase the overall number of jobs for Hydra to build. It would increase useful information per unit of build time, but it might increase the load. This risk should either be mentioned as a possible drawback, or there should be a discussion of why it is unlikely.
> **Review comment:** I assume we will see how the load turns out in real life and adjust shares/intervals, or even the whole workflow. People actively working on staging stabilization should also be able to cancel jobs and force evaluations. In any case, I would like to push towards more power in Hydra over the long term, as I believe it will help with mass rebuilds, and human time is more precious than machine time.
> **Review comment:** Good point about Hydra access (this is part of the proposed workflow, right?), and I agree in principle about expanding the Hydra checks. I just want things mentioned explicitly. I think such explicit discussion could also be a way to better understand the trade-offs relative to complicated alternatives, e.g. `staging` (release-small) → `staging-next` (release-small) → `staging-next` (release, manually triggered, unless Hydra can have jobsets with preconditions).
> **Review comment:** Well, I was always doing cancels and evaluations on Hydra when working with staging. The right to manage Hydra jobsets doesn't seem more security-sensitive than push access to nixpkgs. Overall, this text so far hasn't mentioned people: who will be doing this staging stabilization, etc. On the complicated alternatives: another thing to consider is that Hydra seems likely to be over-powered on x86 Linux (relative to other platforms), and even for other reasons it makes sense to stabilize most changes for `x86_64-linux` first/deeper before other platforms.
> **Review comment:** Well, it doesn't mention people, but it does describe what tools these people are expected to use (and how). You do discuss possible evaluation frequencies. And yes, I would find it nice if you mentioned possibly using the build-power imbalance.