Merged
103 changes: 103 additions & 0 deletions rfcs/0026-staging-workflow.md
@@ -0,0 +1,103 @@
---
feature: staging-workflow
start-date: 2018-03-05
author: Vladimír Čunát (@vcunat)
co-authors: Frederik Rietdijk (@FRidh)
related-issues:
---

# Summary
[summary]: #summary

Define a new workflow for the `staging` branch that can better accommodate the
current and future influx of changes in order to deliver mass-rebuilds faster to
master. As part of this new workflow an additional branch, `staging-next`, shall
be introduced.


# Motivation
[motivation]: #motivation

The current workflow cannot handle the large number of mass-rebuilds that are
continuously delivered, resulting in long delays for these deliveries to reach
`master`. When a certain delivery causes failures, attempts are typically made to
fix these failures and stabilize `staging` so that the specific delivery can still
reach `master`.

Often, other mass-rebuilds are submitted during this period of stabilization,
and it is not uncommon that these also introduce failures, thus again
increasing the time it takes for a delivery to reach `master`. This is
especially worrisome in the case of security fixes that need to be delivered as
soon as possible.

# Detailed design
[design]: #detailed-design

There shall be the following branches:
- `master` is the main branch where all small deliveries go;
- `staging` is branched from `master` and mass-rebuilds and other large deliveries go to this branch;
- `staging-next` is branched from `staging` and only stabilization fixes and security fixes shall be delivered to this branch. This branch shall be merged into `master` when deemed of sufficiently high quality.

Binary packages shall be built by Hydra for each of these branches. The
following table gives an overview of the branches, the check interval in hours,
the amount of shares, and the jobset that they build.


| Branch         | Interval (h) | Shares | Jobset            |
|----------------|--------------|--------|-------------------|
| `master`       | 4            | High   | release.nix       |
| `staging-next` | 12           | Medium | release.nix       |
| `staging`      | 6            | Medium | release-small.nix |


The check interval of `staging-next` is reduced from 24 hours (the current value
for `staging`) to 12 hours. This can be done because only stabilization fixes
shall be submitted and thus fewer rebuilds shall typically have to be performed.

The `staging` branch shall have a short interval of only 6 hours. This is because
of the relatively small jobset, and to obtain a higher resolution to detect any
troublesome deliveries.
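
As an illustration only, the table above can be written down as data to see how often each branch would be checked per day. This is a minimal sketch, not Hydra configuration: the `BranchPolicy` class and its field names are hypothetical, and the share values are kept as the qualitative labels from the table because the exact numbers are still unresolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchPolicy:
    """Hypothetical record mirroring one row of the branch table."""
    branch: str
    interval_hours: int  # Hydra check interval, in hours
    shares: str          # qualitative label only; exact shares are unresolved
    jobset: str          # Nix expression built for this branch

    def checks_per_day(self) -> float:
        # Upper bound on evaluations per day implied by the check interval.
        return 24 / self.interval_hours


# Values copied from the table in this RFC.
POLICIES = [
    BranchPolicy("master", 4, "High", "release.nix"),
    BranchPolicy("staging-next", 12, "Medium", "release.nix"),
    BranchPolicy("staging", 6, "Medium", "release-small.nix"),
]

for p in POLICIES:
    print(f"{p.branch}: every {p.interval_hours} h, "
          f"at most {p.checks_per_day():g} checks/day, {p.jobset}")
```

Under these intervals `staging` is checked at most four times a day against the small jobset, while `staging-next` is checked at most twice a day against the full one.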

# Drawbacks
[drawbacks]: #drawbacks

Member

Looking at the proposal, I am not completely sure if it will in fact reduce or increase the overall number of jobs for Hydra to build. It would increase useful information per unit of build time, but it might increase the load.

This risk should either be mentioned as a possible drawback or there should be a discussion why this risk is not likely.

Member Author

I assume we will see how the load turns out in real life and adjust shares/intervals or even the whole workflow. People actively working on staging stabilization should also be able to cancel jobs and force evaluations.

In any case, I would like to push towards more power in Hydra over the long term, as I believe it will help with mass rebuilds and human time is more precious than machine time.

Member

Good point about Hydra access (this is a part of the proposed workflow, right?), and I agree in principle about expanding Hydra checks.

I just want things mentioned explicitly.

I think such explicit discussion could also be a way to better understand the trade-offs relative to complicated alternatives: staging (release-small) -> staging-next (release-small) -> staging-next (release) (manually triggered, unless Hydra can have jobsets with preconditions).

Member Author

Well, I was always doing cancels and evals on Hydra when working with staging. The right to manage Hydra jobsets doesn't seem more security-sensitive than push access to nixpkgs. Overall, this text so far hasn't mentioned people – who will be doing this staging stabilization, etc.

Complicated alternatives: another thing to consider is that Hydra seems likely to be over-powered on x86_64-linux (relative to other platforms), and even for other reasons it makes sense to stabilize most changes for x86_64-linux first/deeper before other platforms.

Member

Well, it doesn't mention people but it describes what tools these people are expected to use (and how). You do discuss possible evaluation frequencies.

And yes, I would find it nice if you mentioned possibly making use of the build power imbalance.

A potential drawback of this new workflow is that the additional branch may be
considered complicated and/or more difficult to work with. However, for most
contributors the workflow will remain the same, that is, choose `master` or
`staging` depending on the number of rebuilds.

# Alternatives
[alternatives]: #alternatives

## Maintain the status quo

The current situation could be kept; however, that would not solve any of the
issues mentioned in the "Motivation" section.

## Single branch

Instead of multiple branches only a single branch, say `master`, could be kept
for development. While this removes the issue of merge conflicts, it will result
in continuous mass-rebuilds on `master`, slowing down the delivery of binary
substitutes and thus development.

## Reduce Hydra jobset size

Reducing the size of the Hydra jobset would mean the iteration pace could be
higher, but has the downside of testing fewer packages, and having fewer binary
substitutes available.

Member

Another option here is to reduce the channel-blocking Hydra jobset size, but have a 48-hourly cycle of trying to build the long tail. It still probably increases the number of things broken in the channel, but might be a middle ground. (I am not saying I prefer that compromise to the current proposal.)

Member Author (@vcunat, Jun 24, 2018)

The channel-update criteria contain waiting for all builds to finish, but the thing is doable by having two jobsets on the branch. Personally I think the channel-blocking set should grow and not shrink, especially for the release branches.

Member

I agree, but I also think it is good to have options enumerated and explicitly dismissed with a reason given.

Member Author

Added a sentence describing the approach.


The downside of having fewer binary substitutes could be partially mitigated by
adding another slower, larger jobset that wouldn't block the channel.

# Unresolved questions
[unresolved]: #unresolved-questions

- The exact amount of shares, which has yet to be determined.

# Future work
[future]: #future-work

- Document the new workflow;
- Create the new branch;
- Create a Hydra jobset for the new branch and adjust the existing jobs.