How they SRE

A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)

Introduction

How they SRE is a curated knowledge repository of the SRE best practices, tools, techniques, and culture adopted by leading technology and tech-savvy organizations.

Many organizations regularly share their best practices, tools, and techniques, and offer insight into their engineering culture, on public platforms such as engineering blogs, conferences, and meetups. This repository curates content from those sources.

Note to readers: Some of the articles, posts, videos, tools, and techniques listed here were published before 2015. Use such material with caution, as more recent advances in technology and practice may offer better alternatives and perspectives.

Topics

Site Reliability Engineering
Hiring and Building SRE teams
SRE Culture
DevOps
Monitoring & Observability
Alerting
Incident Response & Post-Mortem
On-Call
Testing in Production
Chaos Engineering
Automation
Performance

Organizations

Airbnb

Blog Posts

Algolia

Blog Posts

Asana

Blog Posts

ASOS

Blog Posts

Atlassian

Blog Posts

Back Market

Blog Posts

How Back Market SREs prepared for Black Friday

Baidu

Videos

Basecamp

Blog Posts

Books

Shape Up

Bloomberg

Videos

Booking.com

Blog Posts

Videos

Capital One

Blog Posts

Major incidents & analysis reports

Videos

DBS

Blog Posts

Videos

SREcon Conversations Asia/Pacific with Koon Seng Lim, DBS

DeepSource

Blog Posts

Dropbox

Blog Posts

Videos

Service Discovery Challenges at Scale

Facebook

Videos

Fastly

Videos

eBay

Blog Posts

Videos

Madaari: Ordering for the Monkeys

Etsy

Blog Posts

Videos

Expedia

Blog Posts

GitHub

Blog Posts

Major incidents & analysis reports

Videos

One on One SRE

GoCardless

Blog Posts

Major incidents & analysis reports

Google

Blog Posts

Books

Videos

Gojek

Blog Posts

Why We Swear by the RCA

Grab

Blog Posts

Grammarly

Blog Posts

Security Operations in an AWS Environment

Heroku

Blog Posts

Incident Response at Heroku

Indeed

Blog Posts

Videos

Are We Getting Better Yet? Progress Toward Safer Operations

Khan Academy

Blog Posts

Videos

Mercari

Blog Posts

Microsoft

Videos

MIRO

Blog Posts

Monzo

Blog Posts

Videos

Eventually Consistent Service Discovery

Netflix

Blog Posts

Major incidents & analysis reports

Post-mortem of October 22, 2012 AWS degradation

Videos

PayPal

Videos

Blog Posts

Videos

Postman

Blog Posts

Learn how your Kubernetes clusters respond to failure using Gremlin and Grafana

Scribd

Blog Posts

Shopify

Blog Posts

Videos

Slack

Blog Posts

Videos

SoundCloud

Blog Posts

Spotify

Blog Posts

Videos

Tracing, Fast and Slow: Digging into and Improving Your Web Service's Performance

Squarespace

Blog Posts

Under the Hood: Ensuring Site Reliability

Videos

Stack Overflow

Blog Posts

Videos

Low Context DevOps: Improving SRE Team Culture through Defaults, Documentation, and Discipline

Stripe

Blog Posts

Videos

Target

Blog Posts

Trivago

Blog Posts

How To Get Fooled By Metrics

Uber

Blog Posts

Videos

VGW

Blog Posts

The SRE Incident Response game

Videos

Level Up Your Incident Response With Gameplay

Wikimedia Foundation

Videos

Zerodha

Blog Posts

Infrastructure monitoring with Prometheus at Zerodha

SREcon Mix Playlist

Videos

Resources

Books

Events

Other Goodies

Credits

Inspired by Howtheytest by Abhijeet Vaikar.
The list of organizations is drawn from my other repo, awesome-engineering.
Banner image: cartoon vector created by vectorjuice - www.freepik.com

Contribute

Contributions welcome! Read the contribution guidelines first.

License

To the extent possible under law, Unmesh Gundecha has waived all copyright and related or neighboring rights to this work.

If you use this anywhere, please credit @upgundecha on Twitter. If you like my work, check out my other projects on GitHub.
