Production Services, Ownership and Maintenance

For applications, services and tools that we run in "production" (whether on developer machines, on a server or in the browser), we expect a basic set of requirements to be met. These requirements exist to ensure consistent maintainability, security, observability and resiliency standards across our estate.

On top of that, every codebase should have ongoing security and upkeep maintenance applied to it. For low-priority or low-risk projects, this maintenance may be lightweight and relatively infrequent.

Baseline requirements for production services and applications

N.B. This guidance is only intended as a minimum baseline; in practice the expectation will be much higher for critical projects.

CI/CD

  • All source code should be version controlled using GitHub
  • CI/CD should be employed
    • An appropriate testing strategy should be considered. We do not aim for a specific % of test coverage, but important business logic should be unit tested
    • For CI, use GitHub Actions
    • For most (non-library) projects, deployment will be done via Riff-Raff

Security

  • A basic security assessment should be performed to understand the risks and available controls (e.g. authentication, network security, encryption, secret management). Expert guidance from outside the team should be sought for high-risk applications (e.g. those processing user data)
  • Any dependency manifest files should be scanned using Dependabot
  • Internal tools should be behind Google Authentication
    • A helper exists for Scala, and authentication can also be added directly to an ALB
    • Network-layer restrictions may also be recommended based on the context
  • All new 3rd party software / vendors used as part of a service should have approval via the official process

Infrastructure

  • AWS infrastructure should not be deployed to the Developer Playground account. This is for data security/privacy and cost/scale reasons. We also periodically purge infrastructure from that account.
  • All new AWS infrastructure should be defined using CDK, and legacy services should be migrated when possible
  • Infrastructure costs should be estimated and within existing budget (unless otherwise approved by Head/Director of Engineering)

Observability and support

  • Monitoring and alerting should exist to ensure the owner is notified when the core functionality of a service is unavailable or impaired
  • Application logs and (if necessary/applicable) telemetry data should be shipped, so that service impairment can be debugged effectively for common problems (e.g. application errors, resource constraints)
  • For high priority applications, a runbook should be created to describe to other engineers how to debug and address service impairment incidents (example)
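
The baseline requirements above can be read as a checklist. The following is a minimal sketch of that idea, not an official tool: the `ServiceBaseline` shape, its field names and the `unmetRequirements` function are all hypothetical, and the sketch covers only the requirements that reduce to a yes/no answer. It also treats every service as if the CDK requirement applied, whereas the guidance allows legacy services to migrate when possible.

```typescript
// Hypothetical model of the baseline requirements above.
// Every field and function name here is an assumption made for illustration.
interface ServiceBaseline {
  name: string;
  highPriority: boolean;      // runbook is only expected for high priority applications
  internalTool: boolean;      // Google Authentication is only expected for internal tools
  onGitHub: boolean;
  usesGitHubActions: boolean;
  securityAssessed: boolean;
  dependabotEnabled: boolean;
  behindGoogleAuth: boolean;
  definedInCdk: boolean;
  hasAlerting: boolean;
  shipsLogs: boolean;
  hasRunbook: boolean;
}

// Returns a description of each baseline requirement the service does not meet.
function unmetRequirements(service: ServiceBaseline): string[] {
  const checks: Array<[boolean, string]> = [
    [service.onGitHub, "Source code is not version controlled using GitHub"],
    [service.usesGitHubActions, "CI does not use GitHub Actions"],
    [service.securityAssessed, "No basic security assessment has been performed"],
    [service.dependabotEnabled, "Dependency manifests are not scanned by Dependabot"],
    [!service.internalTool || service.behindGoogleAuth, "Internal tool is not behind Google Authentication"],
    [service.definedInCdk, "AWS infrastructure is not defined using CDK"],
    [service.hasAlerting, "No monitoring and alerting for core functionality"],
    [service.shipsLogs, "Application logs are not shipped"],
    [!service.highPriority || service.hasRunbook, "High priority application has no runbook"],
  ];
  return checks.filter(([met]) => !met).map(([, gap]) => gap);
}

// Usage: an illustrative internal tool with two gaps.
const example: ServiceBaseline = {
  name: "example-internal-tool",
  highPriority: false,
  internalTool: true,
  onGitHub: true,
  usesGitHubActions: true,
  securityAssessed: true,
  dependabotEnabled: false,
  behindGoogleAuth: false,
  definedInCdk: true,
  hasAlerting: true,
  shipsLogs: true,
  hasRunbook: false,
};
console.log(unmetRequirements(example));
```

Conditional requirements (the runbook and Google Authentication) are expressed as "not applicable, or met", so a low priority service without a runbook is not reported as a gap.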

High level maintenance expectations

The broad expectations of ongoing maintenance of a production application are:

  1. Security vulnerabilities are addressed as a priority
  2. Dependencies are kept up to date, so that security patching requirements can be met efficiently
  3. Unused functionality is removed
  4. Impaired functionality is addressed, with priority determined by its importance
  5. Costs are monitored. Baseline monitoring is added globally
  6. Architecture/design is reviewed periodically to ensure adherence to best practices and any SLA/SLOs at the team or org level
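
One way to picture the ordering implied by points 1 and 4 is a simple sort over a maintenance backlog. This is only a sketch under stated assumptions: the `MaintenanceTask` shape, the task kinds and the numeric `importance` scale are hypothetical, and the guidance itself does not prescribe how non-security work is ranked beyond "priority determined by its importance".

```typescript
// Hypothetical task model; the kinds and the importance scale are assumptions.
type MaintenanceKind =
  | "security-vulnerability"
  | "impaired-functionality"
  | "dependency-update"
  | "unused-functionality";

interface MaintenanceTask {
  title: string;
  kind: MaintenanceKind;
  importance: number; // higher means more important (illustrative scale)
}

// Security vulnerabilities come first; everything else is ordered by importance.
// Returns a new array and leaves the input untouched.
function prioritise(tasks: MaintenanceTask[]): MaintenanceTask[] {
  const rank = (t: MaintenanceTask): number =>
    t.kind === "security-vulnerability" ? 0 : 1;
  return [...tasks].sort(
    (a, b) => rank(a) - rank(b) || b.importance - a.importance
  );
}

// Usage: an illustrative backlog.
const backlog: MaintenanceTask[] = [
  { title: "Remove unused export endpoint", kind: "unused-functionality", importance: 1 },
  { title: "Fix intermittent login failures", kind: "impaired-functionality", importance: 4 },
  { title: "Patch vulnerable JSON parser", kind: "security-vulnerability", importance: 2 },
  { title: "Upgrade logging library", kind: "dependency-update", importance: 3 },
];
console.log(prioritise(backlog).map((t) => t.title));
```

The security patch is listed first even though its importance score is lower than the login fix, which mirrors the expectation that vulnerabilities are addressed as a priority.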

Orphaned Projects

In the vast majority of cases we meet these maintenance expectations by assigning ownership of source code to an official engineering team. Official engineering teams have useful properties for this task, like resourcing oversight and planning ceremonies.

Some repositories containing production software that we have created have no clear official engineering team owner. In these cases we expect a small group of volunteers to be responsible for the above maintenance tasks. We expect these volunteers to:

  • Create a GitHub team and add volunteers as team members. Notify the Developer Experience team of this (the team should be admins of all relevant GitHub repos)
  • Create a Google group / email address
  • Find replacements for any leavers (can be delegated to their line manager if needed)
  • Create a lightweight process for regular maintenance work (e.g. a recurring meeting)

Each of these teams should include an Engineering Manager who will be able to help manage permissions and resourcing.