doc: add web-team IRP #42
Merged
304f554 doc: add web-team IRP (RafaelGSS)
5e1ae23 Update INCIDENT_RESPONSE_PLAN.md (RafaelGSS)
9ec4e44 Update INCIDENT_RESPONSE_PLAN.md (RafaelGSS)
6b63490 Update INCIDENT_RESPONSE_PLAN.md (RafaelGSS)
c669fe6 Update INCIDENT_RESPONSE_PLAN.md (RafaelGSS)
62d41a7 Update INCIDENT_RESPONSE_PLAN.md (RafaelGSS)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| # Node.js Web-Infra — Incident Response Plan (IRP) | ||
|
|
||
| ## Scope | ||
|
|
||
| This IRP covers incidents affecting Node.js web properties and supporting services operated by **@nodejs/web-infra** and **@nodejs/web-admins**, including: | ||
|
|
||
| * Repositories: `nodejs.org`, `nodejs.dev`, `node.js.org`, `website-cloudflare-worker`, `discord-status-worker`, `release-cloudflare-worker`, `doc-kit` | ||
|
|
||
| * External services: **Cloudflare** (zones, Workers, WAF, DNS), **Vercel** (deployments), **Sentry** (monitoring), **1Password** (secrets), **Chromatic/Codecov** (integrations). Cloudflare account access is governed together with **@nodejs/build**; `web-infra` generally has Write access and `web-admins` has Admin access. | ||
|
|
||
| This plan excludes CI/build farm operations and distribution servers primarily owned by **@nodejs/build**, but includes Web-Infra automations that touch them (e.g., release workers) and any user-visible impact on `nodejs.org`. | ||
|
||
|
|
||
| ## IC & Escalation | ||
|
|
||
| * **Incident Commander (IC):** The first `@nodejs/web-infra` member to take charge of the incident. | ||
|
||
|
|
||
| **Escalation:** | ||
| IC → `@nodejs/web-infra` → `@nodejs/web-admins` → `@nodejs/build` (Cloudflare account/zone-critical) and/or `@nodejs/security-wg` (security incidents) → `@nodejs/tsc`. | ||
|
|
||
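The escalation chain above can be sketched as a small helper. This is only an illustration: the function name and its two flags are hypothetical, and it assumes that `@nodejs/build` and `@nodejs/security-wg` are skipped when their condition does not apply, which the plan does not state explicitly.

```python
from typing import List


def escalation_path(cloudflare_critical: bool = False,
                    security_incident: bool = False) -> List[str]:
    """Return the ordered escalation targets for an incident.

    Team handles come from the escalation line in the plan; the flags
    are hypothetical inputs chosen for this sketch.
    """
    path = ["IC", "@nodejs/web-infra", "@nodejs/web-admins"]
    if cloudflare_critical:
        # Cloudflare account/zone-critical actions involve the build team.
        path.append("@nodejs/build")
    if security_incident:
        path.append("@nodejs/security-wg")
    # Assumption: the TSC is always the final escalation point.
    path.append("@nodejs/tsc")
    return path


if __name__ == "__main__":
    print(" -> ".join(escalation_path(security_incident=True)))
```

Calling `escalation_path(cloudflare_critical=True, security_incident=True)` covers the "and/or" case by listing both teams before the TSC.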
| ## Severity Levels & SLAs | ||
|
|
||
| * **P0 – Critical user impact** (global outage/defacement/security breach): | ||
|
|
||
| * Acknowledge: TBD | ||
|
|
||
| * **P1 – Major degradation** (partial outage, broken downloads/docs on a locale/route): | ||
|
|
||
| * Acknowledge: TBD | ||
|
|
||
| * **P2 – Minor** (noncritical errors, single integration down): | ||
|
|
||
| * Acknowledge: TBD | ||
|
|
||
| When in doubt, start at the higher severity and downgrade later. | ||
|
|
||
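One way to encode the "start higher, downgrade later" rule is sketched below. The `Severity` enum and `initial_severity` helper are hypothetical names for illustration; acknowledgement times are left out because the plan still lists them as TBD.

```python
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    # Lower number means more severe, matching the P0/P1/P2 labels.
    P0 = 0  # Critical user impact
    P1 = 1  # Major degradation
    P2 = 2  # Minor


def initial_severity(candidates: Iterable[Severity]) -> Severity:
    """Pick the most severe level among those the responder considers plausible."""
    levels = list(candidates)
    if not levels:
        raise ValueError("at least one candidate severity is required")
    # min() selects the lowest number, i.e. the highest severity.
    return min(levels)


if __name__ == "__main__":
    # Unsure whether a broken download route is P1 or P2: start at P1.
    print(initial_severity([Severity.P2, Severity.P1]).name)
```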
| ## Canonical Response Workflow | ||
|
|
||
| 1. **Declare** severity; assign IC and Comms Lead. | ||
|
|
||
| 2. **Stabilize users first:** | ||
| * Roll back to the last known-good deploy. | ||
| * If needed, enable Cloudflare “Under Attack” mode and/or WAF rules, plus emergency caching on critical paths. | ||
|
|
||
| 3. **Communicate:** post an initial status summary and known impact; repeat per SLA. (Use blog/announcements or org channel as appropriate; precedent: public [post-mortem for March 17 incident](https://nodejs.org/en/blog/announcements/node-js-march-17-incident).) | ||
|
|
||
| 4. **Contain & eradicate:** revoke keys/tokens, disable compromised deploy hooks, patch, and purge caches safely. | ||
|
|
||
| 5. **Recover:** redeploy a clean artifact, validate it, then progressively relax mitigations. | ||
|
|
||
| 6. **Review:** draft a blameless post-mortem covering impact, root cause, and follow-up engineering actions and process fixes. | ||
|
|
||
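The six workflow steps above could be tracked during an incident with a simple checklist. The sketch below is an assumption about how a responder might do that; the class and method names are hypothetical and not part of the plan.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Step names mirror the numbered workflow in the plan.
STEPS = ["Declare", "Stabilize", "Communicate",
         "Contain & eradicate", "Recover", "Review"]


@dataclass
class IncidentChecklist:
    done: List[str] = field(default_factory=list)

    def complete(self, step: str) -> None:
        """Mark a workflow step as done; unknown step names are rejected."""
        if step not in STEPS:
            raise ValueError(f"unknown step: {step}")
        if step not in self.done:
            self.done.append(step)

    def next_step(self) -> Optional[str]:
        """Return the first step not yet completed, or None when finished."""
        for step in STEPS:
            if step not in self.done:
                return step
        return None

    def to_markdown(self) -> str:
        """Render the checklist as a markdown task list for an incident thread."""
        return "\n".join(
            f"- [{'x' if step in self.done else ' '}] {step}" for step in STEPS
        )


if __name__ == "__main__":
    checklist = IncidentChecklist()
    checklist.complete("Declare")
    print(checklist.to_markdown())
```

The checklist does not force strict ordering, since communication repeats throughout an incident; `next_step` only points at the earliest step still open.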
| ## Common Incidents — What Happens & What They Cause | ||
|
|
||
| | Incident | Likely Cause | What users see | Immediate actions | Primary owner | | ||
| | ----- | ----- | ----- | ----- | ----- | | ||
| | **Token/secret leak in repo (EXAMPLE)** | Accidental commit or exposed CI logs. | Subsequent unauthorized changes/deploys. | Invalidate the token at the provider; rotate it in 1Password; search audit logs for usage; force a clean redeploy. | Service owner + Web-Admins. ([GitHub](https://raw.githubusercontent.com/nodejs/web-team/1cc6db145256efaaa5d11684249361139dff602c/PERMISSIONS.md)) | | ||
|
||
|
|
||
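For the token-leak example, the audit-log search might look like the sketch below. The log schema (`timestamp`, `token_id`, `action` keys with ISO 8601 timestamps) and the function name are hypothetical; real providers export audit logs in their own formats, so treat this as an illustration of the filtering idea only.

```python
from datetime import datetime
from typing import Dict, List


def entries_using_token(entries: List[Dict[str, str]],
                        token_id: str,
                        leaked_at: str) -> List[Dict[str, str]]:
    """Return audit entries that used `token_id` at or after the leak time.

    Assumes each entry has hypothetical keys "timestamp" (ISO 8601 with an
    explicit UTC offset), "token_id", and "action".
    """
    cutoff = datetime.fromisoformat(leaked_at)
    return [
        entry for entry in entries
        if entry["token_id"] == token_id
        and datetime.fromisoformat(entry["timestamp"]) >= cutoff
    ]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    log = [
        {"timestamp": "2025-01-01T09:00:00+00:00", "token_id": "tok_a", "action": "deploy"},
        {"timestamp": "2025-01-01T12:30:00+00:00", "token_id": "tok_a", "action": "deploy"},
        {"timestamp": "2025-01-01T13:00:00+00:00", "token_id": "tok_b", "action": "purge"},
    ]
    for hit in entries_using_token(log, "tok_a", "2025-01-01T12:00:00+00:00"):
        print(hit["timestamp"], hit["action"])
```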
| ## Communications | ||
|
|
||
| **Internal (private):** `@nodejs/web-infra` channel/thread; if Cloudflare account action is required, loop in `@nodejs/build`. | ||
|
||
|
|
||
| **Public (as needed):** short status updates; if user impact was material, publish a brief blog post or an addendum to an incident page (precedent: the public post-mortem for the March 17 incident). | ||
|
|
||
| ### Notes on authority & ownership | ||
|
|
||
| * Cloudflare account-level actions (e.g., role changes) are coordinated with **@nodejs/build**; `web-infra` generally holds Write access and `web-admins` holds Admin access. Keep this in mind when planning mitigations that require account scope. | ||