Commit b25c186

til: error budget
1 parent 8b1ee4b commit b25c186

1 file changed: +21 -0 lines changed

sre/error-budget.md

@@ -0,0 +1,21 @@
1+
# Error budget
2+
3+
Source: <https://www.atlassian.com/incident-management/kpis/error-budget>
4+
5+
Every development, operations, and IT team knows that sometimes incidents happen. This reality means SLAs should never promise 100% uptime.
6+
7+
It also means that if your company is very good at avoiding or resolving incidents, you might consistently knock your uptime goals out of the park. Perhaps you promise 99% uptime and actually come closer to 99.5%. Perhaps you promise 99.5% uptime and actually reach 99.99% in a typical month.
8+
9+
When that happens, industry experts recommend that instead of setting user expectations too high by constantly overshooting your promises, you treat the headroom between the promised and the achieved uptime (0.49 percentage points in the second example) as an error budget: time that your team can use to take risks.
10+
11+
## What is an error budget?
12+
13+
An **error budget** is the maximum amount of time that a technical system can fail without contractual consequences.
14+
15+
For example, if your SLA specifies that systems will function 99.99% of the time before the business has to compensate customers for the outage, that means your error budget (or the time your systems can go down without consequences) is 52 minutes and 35 seconds per year.
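
The arithmetic behind that figure can be checked with a short sketch. The function name `error_budget` and the choice of a 365.25-day year are assumptions made for illustration, not something the source specifies; a 365-day year gives a slightly smaller result.

```python
from datetime import timedelta


def error_budget(sla_percent: float, period: timedelta) -> timedelta:
    """Return the downtime allowed in `period` under an availability target.

    Hypothetical helper: the budget is simply the fraction of the period
    that the SLA does NOT promise to be up.
    """
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return period * (1 - sla_percent / 100)


# Assumption: an average year of 365.25 days.
YEAR = timedelta(days=365.25)

budget = error_budget(99.99, YEAR)
minutes, seconds = divmod(int(budget.total_seconds()), 60)
print(f"{minutes} min {seconds} s per year")
```

Passing `timedelta(days=30)` instead of `YEAR` gives the corresponding monthly budget, which is the window most teams track.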
16+
17+
## How to use an error budget
18+
19+
- Error budgets based on uptime:
20+
- Most teams monitor uptime on a monthly basis. If availability is above the number promised by the SLA/SLO, the team can release new features and take risks. If it’s below the target, releases halt until the target numbers are back on track.
21+
- Error budgets based on successful requests:
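
The uptime-based policy in the list above can be sketched as a simple release gate. This is a minimal illustration, not a prescribed implementation: the function name `can_release`, the 30-day month, and the decision to allow releases when availability exactly equals the target are all assumptions.

```python
def can_release(downtime_minutes: float, slo_percent: float,
                days_in_month: int = 30) -> bool:
    """Return True if this month's availability still meets the target.

    Hypothetical helper: availability is measured as the share of minutes
    in the month during which the system was up.
    """
    total_minutes = days_in_month * 24 * 60
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime must fit within the month")
    availability = 100 * (1 - downtime_minutes / total_minutes)
    # Assumption: meeting the target exactly still permits releases.
    return availability >= slo_percent


# 100 minutes of downtime against a 99.5% target: budget remains, keep shipping.
print(can_release(downtime_minutes=100, slo_percent=99.5))

# 300 minutes of downtime against the same target: budget spent, halt releases.
print(can_release(downtime_minutes=300, slo_percent=99.5))
```

In practice the downtime figure would come from a monitoring system rather than being passed in by hand.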
