diff --git a/docs/during/external_communication_guidelines.md b/docs/during/external_communication_guidelines.md
index ce290b7..23df9de 100644
--- a/docs/during/external_communication_guidelines.md
+++ b/docs/during/external_communication_guidelines.md
@@ -1,13 +1,11 @@
 ---
 cover: assets/img/covers/whos_on-call.png
 description: Information on how to manage external communications
-hero: assets/img/headers/who_oncall.png
-hero_alt_text: External Communication Guidelines
 ---
 
-## External Communication Guidelines
+Information on how to manage external communications during an incident. See our [role descriptions](../before/different_roles/) for information about who is responsible for external communications.
 
-### When to communicate publicly
+## When to communicate publicly
 
 Before you decide to communicate an incident, it’s important to have an agreed-upon set of criteria for when a major incident is communicated. False alarms and short-lived issues can sometimes kick off incident calls, so knowing when communication is appropriate will help your customers avoid widespread panic. This can be tied to your organization’s definition of [what an incident is](https://response.pagerduty.com/before/what_is_an_incident/), and/or your [severity levels](https://response.pagerduty.com/before/severity_levels/).
 
@@ -21,9 +19,9 @@ You might consider the following criteria as well:
 
 We also recommend coming up with a set of templates for different stages of an incident, including options for the communications below as well as special situations (long-running incidents, small or limited customer impact, incidents opened with immediate resolution, etc.)
 
-### How to communicate
+## How to communicate
 
-#### Initial communication:
+### Initial communication:
 
 The first communication should indicate that an incident is under investigation. The goal here is to avoid a customer experiencing symptoms of the incident, checking status pages or Twitter accounts, and not seeing awareness of the issue from the business.
 
@@ -31,7 +29,7 @@ The first communication should indicate that an incident is under investigation.
 - These messages should be entirely templated for ease of action.
 - These messages can be minimal in revealing scope which might not be known yet, but should indicate that scope will be coming soon.
 
-#### Second communication: Initial Scoping of Impact
+### Second communication: Initial Scoping of Impact
 
 This is a message that should be delivered within 5 minutes of the first communication, once some scope of impact is known. This post should outline:
 
@@ -39,7 +37,7 @@ This is a message that should be delivered within 5 minutes of the first communi
 - An update of which components and/or functionality are impacted
 - Which regions are affected.
 
-#### Updates
+### Updates
 
 Depending on the length of the incident, periodic updates will be necessary. These updates should be delivered **at least** every 20 minutes from the scoping update during the first two hours of an incident. After two hours, you may choose to update with reduced frequency and shift to a long incident communication model (see below). Regardless of expected frequency, when the degree of impact has meaningfully changed, updates should be posted. These updates should:
 
@@ -49,7 +47,7 @@ Depending on the length of the incident, periodic updates will be necessary. The
 
 Customers with special contracts around their Customer Support or Customer Success, such as a customer on a Premium Support plan, should also receive communication of impact delivered individually, whether through a Customer Liaison or their account team.
 
-#### Long Incidents
+### Long Incidents
 
 Incidents longer than two hours should be considered a long incident, and have different communication procedures as a result. When we know an incident will be extended, customer expectations have to be set appropriately, and customer notification fatigue due to content-less updates should be avoided. When in doubt, notify at the frequency which keeps updates meaningful.
 
@@ -57,7 +55,7 @@ Incidents longer than two hours should be considered a long incident, and have d
 - For incidents where we know a long running recovery, indicate this in an update when known.
 - If planning to reduce update frequency, continue to provide expectations of when the next update will be posted.
 
-#### Resolution
+### Resolution
 
 Your final communication should be posted when full recovery of the incident has been confirmed by the Incident Commander. This update should include:
 
@@ -67,6 +65,6 @@ Your final communication should be posted when full recovery of the incident has
 
 Once this is posted, continue to follow the steps for [After an Incident](https://response.pagerduty.com/after/after_an_incident/) and the [Postmortem Process](https://response.pagerduty.com/after/post_mortem_process/).
 
-### Quick Reference
+## Quick Reference
 
 ![Quick reference rubric for external communications spanning from initial investigation communication to resolution.](../assets/img/misc/decision-tree.png)