Closed
Changes from 3 commits
165 changes: 165 additions & 0 deletions rfcs/0020-security-on-call.md
@@ -0,0 +1,165 @@
---
feature: security-on-call
start-date: 2017-10-30
author: Graham Christensen
co-authors: Franz Pletz
related-issues:
---

# Summary
[summary]: #summary

Organize and distribute the handling of public vulnerability
disclosures, through the use of a community-based team and rotating
"point" or "on-call" assignments.

# Motivation
[motivation]: #motivation

## 1. Security Posture, Prompt Patching, and Documented Process

Our process for security issues is currently fairly well executed, but
porly defined. How to participate and how to do the job is very

porly -> poorly

nebulous and boils down to:

1. Everybody pays attention
2. Everybody patches

This system means there is no division of labor, and no documented way
to ensure issues are patched the same way, every time.

## 2. Community and Commitment Sensitivity

As a community, we have a few dedicated contributors patching
the vast majority of security issues. This has worked well, and has
even ensured fairly good coverage for some time now.

Security patching work is an easy way to burn contributors out. The
perpetual feed of new issues to fix is exhausting. This is concerning,
because most of the patches are being applied by highly skilled,
involved, and "core" members of the NixOS community. It would be a
shame to lose them.

At the same time, most security patches are easy to apply:

- Most announcements to security email lists include easy-to-apply
patches.
- Well-established, well-funded distributions have full-time
employees focusing on security matters. These other distributions
regularly publish their minimal security patches on their bug
trackers or in their source trees.
- Many security issues can be easily fixed by minor package bumps.

The trickier parts are determining how to backport the
patch, and finding build capacity for testing large rebuild changes.
Both of these questions can easily be answered by asking more
experienced community members for help.

# Detailed design
[design]: #detailed-design

I propose we create an on-call rotation for publicly disclosed
security issues. Each member of the team will be on call for 24 hours,
and expected to handle every new issue which is created within their
on-call period.

The patching team will not handle issues under embargo.

## Patching Team

@peti peti Nov 2, 2017

I would not refer to that group of people as "patching team". I'd rather call them "incident managers" or something like that. The way I see it, the primary role of those volunteers is not to fix the actual security issue. Their role is to ensure that:

  • no security issue is overlooked,
  • everything is recorded properly in some bug tracking system, so that others can check the current state of affairs there,
  • the process of fixing the issue is delegated to the package's maintainer,
  • maintainers are notified (and reminded) of these issues.

While I think it's fine for members of that on-call team to fix security issues themselves, I believe that the documentation part of the job is equally or even more important.


The team should have at least 10 members, preferably more than 14.

Members of this team should range from new contributors looking to
participate, to more skilled and well known contributors.

### Requirements

Each team member should:

- Know how patches work, or at least be willing to learn
- Be comfortable with Git and our backporting workflow
- Know their personal limits and be confident asking for help

## On Call

An on-call rotation system should be used or made to handle scheduling
and informing people about their shift. A shift is 24 hours long, and
should probably start at midnight UTC, to be equally unfair to
everyone.
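
As a minimal sketch of how such a rotation could be computed (not a proposal for the actual tool), the following assumes a fixed, ordered roster and an arbitrary start date. The names `ROSTER`, `ROTATION_START`, and `on_call` are hypothetical and do not come from this RFC.

```python
from datetime import date, datetime, timezone

# Hypothetical roster; the RFC asks for at least 10 members.
ROSTER = [
    "member-01", "member-02", "member-03", "member-04", "member-05",
    "member-06", "member-07", "member-08", "member-09", "member-10",
]

# Assumed first day of the rotation (a UTC calendar date).
ROTATION_START = date(2017, 11, 1)


def on_call(now, roster=ROSTER, start=ROTATION_START):
    """Return the member whose 24-hour shift covers `now`.

    Shifts begin at midnight UTC, so the shift index is simply the
    number of whole UTC calendar days since `start`.
    """
    if now.tzinfo is None:
        raise ValueError("now must be a timezone-aware datetime")
    utc_day = now.astimezone(timezone.utc).date()
    days_elapsed = (utc_day - start).days
    return roster[days_elapsed % len(roster)]
```

Calling `on_call(datetime.now(timezone.utc))` returns the current assignee. Converting to UTC before taking the date means a contributor in any time zone gets the same answer for the same instant.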

### Responsibilities

1. Monitor a well defined list of mailing lists for new issues.
2. Ensure each issue is triaged and addressed if needed.

#2 is a bit vaguely worded, as the person is not required to

"# 2" ==> "Responsibility # 2" to not trigger a markdown header.

_actually_ fix the issue. They are allowed to delegate the patching to
other people. However, they _are_ responsible for ensuring the issue
is _fixed_.

Once we have a MAINTAINERS file, we could delegate security issues in non-critical packages (let's say not in nixos-small) to the respective maintainers by default.


#### Triage and Fixing

1. Check to see if the issue impacts each supported version of NixOS.
2. Write and / or backport patches as applicable, either by version
bumps, large patches, or minimal patches.
3. Prepare an advisory to send to the nix-security-announce mailing
list, which a member of the NixOS Security Team will send.

## The Well Defined List of Mailing Lists

The list should not live in this RFC, but in a separate set of
documentation covering the security patching process.
However, here is an initial list to consider:

1. oss-security
2. full-disclosure
3. an assortment of distro advisory announcements:

- Arch
- Debian
- Gentoo
- Red Hat
- SUSE

## Ensuring Complete Mailing List Coverage

This is a tricky problem, and I propose that the first implementation
be naive, simple, quick, and ugly.

I propose we have a shared email account with a norm that if you mark
an issue read, you are obligated to handle the issue. Once issues are
patched and released to channels, they should be removed from the
inbox. Each member of the patching team will have access to the
account.
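
The norm above amounts to a small state machine: a message is unread, then claimed by whoever marks it read, then removed once the fix reaches the channels. A rough sketch of that model follows; the class and method names are hypothetical and it does not talk to any real mail server.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SharedInbox:
    """Toy model of the shared-inbox norm (illustration only)."""

    # subject -> member who marked it read, or None while still unread
    claims: Dict[str, Optional[str]] = field(default_factory=dict)

    def receive(self, subject: str) -> None:
        # A new announcement arrives unread and unowned.
        self.claims.setdefault(subject, None)

    def mark_read(self, subject: str, member: str) -> None:
        # Marking a message read obligates that member to handle it.
        if self.claims[subject] is not None:
            raise ValueError(f"{subject!r} is already claimed")
        self.claims[subject] = member

    def release(self, subject: str) -> None:
        # Patched and released to channels: remove it from the inbox.
        if self.claims[subject] is None:
            raise ValueError(f"{subject!r} was never claimed")
        del self.claims[subject]

    def unclaimed(self) -> List[str]:
        # Messages nobody has taken responsibility for yet.
        return [s for s, owner in self.claims.items() if owner is None]
```

Anything still listed by `unclaimed()` at the end of a shift is an issue nobody has picked up, which is the gap this scheme is meant to make visible.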

# Drawbacks
[drawbacks]: #drawbacks

This process will take time away from other projects contributors may
be interested in undertaking.

This project will introduce more mass rebuilds and additional load on
Hydra.

# Alternatives
[alternatives]: #alternatives

1. RequestTracker for email-to-issues, but the RT module is somewhat
broken, not to mention the scars we all have around RT.
2. Custom email-to-issue software
3. Allow certain community members to be single points of failure

# Unresolved questions
[unresolved]: #unresolved-questions

1. A place to house documentation and run-books for the patching team
2. A review process for advisories
3. Guidelines for backporting vs. separate patches when fixing a
package for Stable
4. A tool for handling the On Call schedule assignments and

All team members should be on a shared channel (mailing list, IRC, wire.com, Hangouts, whatever), and at the end of their shift the currently responsible team member should ping their successor and remind them about the upcoming shift.

on-call/off-call notification reminders.

# Future work
[future]: #future-work

Please see Unresolved Questions :)