186 changes: 186 additions & 0 deletions rfcs/0020-security-on-call.md
---
feature: security-on-call
start-date: 2017-10-30
author: Graham Christensen
co-authors: Franz Pletz
related-issues:
---

# Summary
[summary]: #summary

Organize and distribute the handling of public vulnerability
disclosures, through the use of a community-based team and rotating
"point" or "on-call" assignments.

# Motivation
[motivation]: #motivation

## 1. Security Posture, Prompt Patching, and Documented Process

Our process for handling security issues is currently fairly well executed, but
poorly defined. How to participate and how to do the job are both
nebulous, and boil down to:

1. Everybody pays attention
2. Everybody patches

This system means there is no division of labor and no mechanism
to ensure that issues are not lost. Furthermore, there is no
documentation about the procedures individuals should follow in
order to achieve predictable, high-quality results.

## 2. Community and Commitment Sensitivity

As a community, we have a few dedicated contributors patching
the vast majority of security issues. This has worked well, and has
even ensured fairly good coverage for some time now.

Security patching work is an easy way to burn contributors out. The
perpetual feed of new issues to fix is exhausting. This is concerning,
because most of the patches are being applied by highly skilled,
involved, and "core" members of the NixOS community. It would be a
shame to lose them.

At the same time, most security patches are easy to apply:

- Most announcements to security email lists include easy-to-apply
patches.
- Well established, well funded distributions have full time
employees focusing on security matters. These other distributions
regularly publish their minimal security patches on their bug
trackers or in their source trees.
- Many security issues can be easily fixed by minor package bumps.

Some of the more tricky parts are determining how to backport the
patch, and finding build capacity for testing large rebuild changes.
Both of these questions can easily be answered by asking more
experienced community members for help.

# Detailed design
[design]: #detailed-design

I propose we create a 24 hour on-call rotation of volunteers who will be
responsible for handling any publicly disclosed security issues that
arise during their shift, where "handling an issue" means:

- Create a ticket for the given issue to ensure that others can
see, follow, and contribute to the current state of affairs in
a coordinated manner.

- Determine whether Nixpkgs and/or NixOS is vulnerable to the given
vulnerability and record your findings in the appropriate ticket.

- If Nixpkgs or NixOS *is* vulnerable, then

(a) fix the issue by applying the appropriate patches or version
updates or

(b) ping the package's maintainers and ask them to apply the
necessary fixes.

Furthermore, it is good practice for the volunteers to track issues
they are responsible for beyond their respective shifts and to keep an
eye on the progress, possibly "nudging" others to complete the necessary
steps.

This team will generally not handle issues under embargo.

## Patching Team

The team should have at least 10 members, preferably more than 14.

Members of this team should range from new contributors looking to
participate, to more skilled and well known contributors.

### Requirements

Each team member should:

- Know how patches work, or at least be willing to learn
- Be comfortable with Git and our backporting workflow
- Know their personal limits and be confident asking for help

## On Call

An on-call rotation system should be adopted or built to handle scheduling
and to inform people about their shifts. A shift is 24 hours long, and
should probably start at midnight UTC, to be equally unfair to
everyone.
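
As a rough illustration of such a schedule, the sketch below assigns 24-hour shifts starting at midnight UTC in simple round-robin order. It is only one possible approach: the roster names, the `EPOCH` start date, and the `on_call` function are hypothetical and not part of this RFC.

```python
from datetime import date, datetime, timezone

# Hypothetical roster; the RFC does not name any team members.
ROSTER = ["alice", "bob", "carol"]

# Assumed first day of the rotation (here, the RFC's start-date).
EPOCH = date(2017, 10, 30)


def on_call(moment: datetime, roster=ROSTER, epoch=EPOCH) -> str:
    """Return the member whose 24-hour UTC shift covers `moment`."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    # Shifts start at midnight UTC, so the UTC calendar day picks the shift.
    day = moment.astimezone(timezone.utc).date()
    shift_index = (day - epoch).days
    # Round-robin: wrap around the roster once everyone has had a shift.
    return roster[shift_index % len(roster)]
```

With a roster of 10 to 14 members, this kind of rotation would put each person on call roughly once every 10 to 14 days.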

### Responsibilities

1. Monitor a well defined list of mailing lists for new issues.
2. Ensure each issue is triaged and addressed if needed.

#2 is a bit vaguely worded, as the person is not required to
_actually_ fix the issue. They are allowed to delegate the patching to
other people. However, they _are_ responsible for ensuring the issue
is _fixed_.

Review comment: "# 2" ==> "Responsibility # 2" to not trigger a markdown header.

Review comment: Once we have a MAINTAINERS file, we could delegate security issues in non-critical packages (let's say not in nixos-small) to the respective maintainers by default.


#### Triage and Fixing

1. Check to see if the issue impacts each supported version of NixOS.
2. Write and / or backport patches as applicable, either by version
bumps, large patches, or minimal patches.
3. Prepare an advisory to send to the nix-security-announce mailing
list, which a member of the NixOS Security Team will send.

## The Well Defined List of Mailing Lists

The list should not live in the RFC itself, but in a separate set
of documentation covering the security patching process.
However, here is an initial list to consider:

1. oss-security
2. full-disclosure
3. an assortment of distro advisory announcements:

- Arch
- Debian
- Gentoo
- Red Hat
- SUSE

## Ensuring Complete Mailing List Coverage

This is a tricky problem, and I propose that the first implementation
be naive, simple, quick, and ugly.

I propose we have a shared email account with a norm that if you mark
an issue read, you are obligated to handle the issue. Once issues are
patched and released to channels, they should be removed from the
inbox. Each member of the patching team will have access to the
account.
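
The norm described above can be modeled as a small state machine: a message is unread, then owned by whoever marks it read, then removed once the fix has reached the channels. The following is a minimal sketch of that model only. The `SharedInbox` and `Message` names are hypothetical, and the `claimed_by` field is an assumption: a real shared mailbox would not necessarily record who marked a message read.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Message:
    subject: str
    # Assumption: we can record who marked the message read.
    claimed_by: Optional[str] = None


class SharedInbox:
    """Toy model of the 'mark it read, you own it' norm."""

    def __init__(self) -> None:
        self.messages: Dict[str, Message] = {}

    def deliver(self, msg_id: str, subject: str) -> None:
        # New announcements arrive unread and unowned.
        self.messages[msg_id] = Message(subject)

    def mark_read(self, msg_id: str, member: str) -> None:
        # Marking a message read makes that member responsible for it.
        msg = self.messages[msg_id]
        if msg.claimed_by is not None:
            raise ValueError(f"{msg_id} is already owned by {msg.claimed_by}")
        msg.claimed_by = member

    def resolve(self, msg_id: str) -> None:
        # Only remove a message once someone has owned it and shipped the fix.
        if self.messages[msg_id].claimed_by is None:
            raise ValueError(f"{msg_id} has no owner yet")
        del self.messages[msg_id]

    def unclaimed(self) -> List[str]:
        # Anything still unread is work nobody has picked up.
        return [mid for mid, m in self.messages.items() if m.claimed_by is None]
```

In this model, a non-empty `unclaimed()` result at the end of a shift is the signal that coverage has a gap.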

# Drawbacks
[drawbacks]: #drawbacks

This process will take time away from other projects contributors may
be interested in undertaking.

This project will introduce more mass rebuilds and additional load on
Hydra.

# Alternatives
[alternatives]: #alternatives

1. RequestTracker for email-to-issues, but the RT module is somewhat
broken, not to mention the scars we all have around RT.
2. Custom email-to-issue software
3. Allow certain community members to be single points of failure

# Unresolved questions
[unresolved]: #unresolved-questions

1. A place to house documentation and run-books for the patching team
2. A review process for advisories
3. Guidelines for backporting vs. separate patches when fixing a
package for Stable
4. A tool for handling the On Call schedule assignments and
on-call/off-call notification reminders.

Review comment (Member): All team members should be on a shared channel (mailing list, IRC, wire.com, Hangouts, whatever), and at the end of their shift the currently responsible team member should ping their successor and remind them about the upcoming shift.

# Future work
[future]: #future-work

Please see Unresolved Questions :)