chore: Initial draft of public-testnet-runbook #10085
Merged
Commits:
- b66e3ac: initial draft of testnet-runbook (stevenplatt)
- d8c3d30: formatting nit (stevenplatt)
- 54b7766: updated connection info section (stevenplatt)
- 1dd0cb3: Updated devrel validator note (stevenplatt)
- 6ee4efd: updated details to address review feedback. (stevenplatt)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,116 @@ | ||
| # Aztec Protocol: Testnet Engineering Runbook | ||
|
|
||
| ## Overview | ||
|
|
||
| This runbook outlines the engineering team's responsibilities for managing Aztec Protocol testnets. The engineering team coordinates the building, testing, and deployment of a testnet for each release, and provides technical support for protocol and product queries from the community. This document describes the team's responsibilities during a release cycle and the actions to take in various testnet scenarios. The process runs from code freeze to completed deployment and covers both the QA phase (internal testing) and the public release phase. | ||
|
|
||
| ## Releases | ||
|
|
||
| The engineering team's testnet responsibilities begin after code freeze. The primary tasks are: | ||
|
|
||
| 1. Confirm with the engineering and product teams that all required PRs are merged. | ||
| 2. Create a release branch named `<repository>-v<major>.<minor>.<patch>` (e.g., `aztec-packages-v0.62.0`). | ||
| 3. Cherry-pick fixes into the release branch for bugs discovered during release testing. | ||
| 4. Initiate a final build by pushing an empty commit to the release branch, which triggers the `release-please` CI workflow. | ||
|
|
||
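The branch naming convention and the empty-commit trigger in the steps above can be sketched as follows. This is a minimal illustration, not the team's tooling: the helper names, the `origin` remote, and the commit message are assumptions. Only the `<repository>-v<major>.<minor>.<patch>` pattern and the use of an empty commit come from the runbook.

```python
import re

# Pattern for `<repository>-v<major>.<minor>.<patch>` (from step 2 above).
BRANCH_RE = re.compile(
    r"^(?P<repo>[A-Za-z0-9._-]+)-v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$"
)


def release_branch_name(repository: str, major: int, minor: int, patch: int) -> str:
    """Build a release branch name such as `aztec-packages-v0.62.0`."""
    return f"{repository}-v{major}.{minor}.{patch}"


def parse_release_branch(name: str) -> tuple[str, int, int, int]:
    """Split a release branch name into (repository, major, minor, patch)."""
    match = BRANCH_RE.match(name)
    if match is None:
        raise ValueError(f"not a release branch name: {name!r}")
    return (
        match["repo"],
        int(match["major"]),
        int(match["minor"]),
        int(match["patch"]),
    )


def empty_commit_commands(branch: str) -> list[list[str]]:
    """Return the git commands for step 4 without running them.

    Assumptions: the remote is called `origin` and the commit message is
    free-form. `git commit --allow-empty` creates a commit with no changes.
    """
    parse_release_branch(branch)  # refuse to build commands for other branches
    return [
        ["git", "checkout", branch],
        ["git", "commit", "--allow-empty", "-m", "chore: trigger release build"],
        ["git", "push", "origin", branch],
    ]
```

Returning the commands as argument lists, rather than executing them, leaves room for a review step before anything is pushed to the release branch.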
| ### Release Notes and Artifact Builds | ||
|
|
||
| Verify that the `release-please` CI workflow completed successfully and that release notes have been published. | ||
| A successful CI run publishes the following Barretenberg artifacts alongside the release notes: | ||
|
|
||
| - Barretenberg for Mac (x86 64-bit) | ||
| - Barretenberg for Mac (Arm 64-bit) | ||
| - Barretenberg for Linux (x86 64-bit) | ||
| - Barretenberg for WASM | ||
|
|
||
| Additionally, the following NPM packages are published: | ||
|
|
||
| - BB.js | ||
| - l1-contracts | ||
| - yarn-project (see [publish_npm.sh](https://github.com/AztecProtocol/aztec-packages/blob/aztec-packages-v0.63.0/yarn-project/publish_npm.sh)) | ||
|
|
||
| The following Docker containers are also published: | ||
|
|
||
| - aztecprotocol/aztec:latest | ||
| - aztecprotocol/aztec-nargo:latest | ||
| - aztecprotocol/cli-wallet:latest | ||
|
|
||
| Finally, any changes to the developer documentation are published to <https://docs.aztec.network>. | ||
|
|
||
| ## Deployment | ||
|
|
||
| After cutting a release, deploy a testnet (typically with 48 validators) using the new Docker containers. Enable verbose logging on Aztec nodes by default with the following environment variables: | ||
|
|
||
|
|
||
| - `LOG_JSON=1` | ||
| - `LOG_LEVEL=debug` | ||
| - `DEBUG=discv5*,aztec:*,-aztec:avm_simulator*,-aztec:circuits:artifact_hash,-json-rpc*,-aztec:world-state:database,-aztec:l2_block_stream*` | ||
|
|
||
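To make the `DEBUG` value above easier to read, the sketch below splits it into included and excluded namespaces and checks whether a given logger namespace would be enabled. It assumes the value follows the convention of the npm `debug` package (comma-separated patterns, `*` as a wildcard, a leading `-` to exclude). The runbook does not state this, and the namespaces used in the usage note are hypothetical.

```python
from fnmatch import fnmatchcase

# The DEBUG value from the runbook, copied verbatim.
DEBUG = (
    "discv5*,aztec:*,-aztec:avm_simulator*,-aztec:circuits:artifact_hash,"
    "-json-rpc*,-aztec:world-state:database,-aztec:l2_block_stream*"
)


def split_debug_filter(value: str) -> tuple[list[str], list[str]]:
    """Split a DEBUG value into (include patterns, exclude patterns)."""
    includes: list[str] = []
    excludes: list[str] = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        if part.startswith("-"):
            excludes.append(part[1:])  # drop the leading '-'
        else:
            includes.append(part)
    return includes, excludes


def is_enabled(namespace: str, value: str = DEBUG) -> bool:
    """Return True if `namespace` matches an include and no exclude pattern.

    Assumption: exclusions take precedence over inclusions, as in the
    npm `debug` package.
    """
    includes, excludes = split_debug_filter(value)
    if any(fnmatchcase(namespace, pattern) for pattern in excludes):
        return False
    return any(fnmatchcase(namespace, pattern) for pattern in includes)
```

Under these assumptions, a namespace such as `aztec:sequencer` would be logged, while anything under `aztec:avm_simulator` or `json-rpc` would be suppressed.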
| Deployments are initiated from CI by manually running the (_name pending_) workflow. | ||
|
|
||
| ### Sanity Check | ||
|
|
||
| After testnet deployment, perform these sanity checks (they can also be automated with a script): | ||
|
|
||
| 1. Monitor for crashes and network-level health: | ||
| - Review the testnet dashboard at `https://grafana.aztec.network/` to confirm node uptime and block production | ||
| - Verify overall TPS performance | ||
| - Create GitHub issues for new crash scenarios | ||
|
|
||
| 2. Spot check pod logs for component health: | ||
| - Tx gossiping (Bot: `Generated IVC proof`) | ||
| - Peer discovery (Validator (failure case): `Failed FINDNODE request`) | ||
| - Block proposal (Validator: `Can propose block`) | ||
| - Block processing (Validator: `l2BlockSourceHash`) | ||
| - Block proving (Prover: `Processed 1 new L2 blocks`) | ||
| - Epoch proving (Prover: `Submitted proof for epoch`) | ||
|
|
||
| 3. Test external node connection and sync | ||
|
|
||
|
|
||
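The log spot check in step 2 lends itself to automation. The sketch below scans log lines for the marker strings listed there and reports which components have been observed. The marker strings come from the runbook; the function name, the split into "healthy" and "failure" markers, and the sample lines are assumptions made for illustration.

```python
from typing import Iterable

# Marker strings from step 2; seeing one suggests the component is working.
HEALTHY_MARKERS = {
    "tx gossiping": "Generated IVC proof",
    "block proposal": "Can propose block",
    "block processing": "l2BlockSourceHash",
    "block proving": "Processed 1 new L2 blocks",
    "epoch proving": "Submitted proof for epoch",
}

# The runbook lists this marker as a failure case, so it is counted instead.
FAILURE_MARKERS = {
    "peer discovery": "Failed FINDNODE request",
}


def spot_check(lines: Iterable[str]) -> tuple[dict[str, bool], dict[str, int]]:
    """Return (component -> marker seen, component -> failure count)."""
    lines = list(lines)
    seen = {
        component: any(marker in line for line in lines)
        for component, marker in HEALTHY_MARKERS.items()
    }
    failures = {
        component: sum(marker in line for line in lines)
        for component, marker in FAILURE_MARKERS.items()
    }
    return seen, failures


# Hypothetical log lines, for illustration only.
SAMPLE_LOGS = [
    '{"level":"debug","msg":"Can propose block 12"}',
    '{"level":"debug","msg":"Failed FINDNODE request to peer"}',
    '{"level":"info","msg":"Submitted proof for epoch 3"}',
]
```

Because the markers originate from different pods (bot, validator, prover), a single pod's log is not expected to contain all of them. Combining logs from each pod type before calling `spot_check` gives a more meaningful result.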
| ### Network Connection Info | ||
|
|
||
| After a successful sanity check, share the following network connection information in the `#team-alpha` Slack channel and with the wider Aztec community: | ||
|
|
||
|
|
||
| 1. AZTEC_IMAGE (`aztecprotocol/aztec:latest`) | ||
| 2. ETHEREUM_HOST (Kubernetes: `kubectl get services -n <namespace> | (head -1; grep ethereum)`) | ||
| - ethereum-lb: `<EXTERNAL-IP>:8545` | ||
| 3. BOOT_NODE_URL (Kubernetes: `kubectl get services -n <namespace> | (head -3; grep boot)`) | ||
| - boot-node-lb-tcp: `<EXTERNAL-IP>:40400` | ||
| - boot-node-lb-udp: `<EXTERNAL-IP>:40400` | ||
|
|
||
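The connection values above can be assembled from the text output of `kubectl get services`. The sketch below is one way to do that, assuming the default column layout (`NAME`, `TYPE`, `CLUSTER-IP`, `EXTERNAL-IP`, `PORT(S)`, `AGE`). The function name and the sample output, including its IP addresses, are hypothetical; the service names and ports are the ones listed in the runbook.

```python
def external_endpoints(kubectl_output: str, name_filter: str, port: int) -> dict[str, str]:
    """Map matching service names to `<EXTERNAL-IP>:<port>` strings.

    `kubectl_output` is the plain-text table printed by
    `kubectl get services -n <namespace>`.
    """
    lines = kubectl_output.strip().splitlines()
    header = lines[0].split()
    name_idx = header.index("NAME")
    ip_idx = header.index("EXTERNAL-IP")

    endpoints: dict[str, str] = {}
    for line in lines[1:]:
        cols = line.split()
        if len(cols) <= ip_idx or name_filter not in cols[name_idx]:
            continue
        external_ip = cols[ip_idx]
        if external_ip.startswith("<"):
            continue  # skip placeholders such as <pending> or <none>
        endpoints[cols[name_idx]] = f"{external_ip}:{port}"
    return endpoints


# Hypothetical output; the addresses are from a documentation-only range.
SAMPLE_OUTPUT = """\
NAME               TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)           AGE
boot-node-lb-tcp   LoadBalancer   10.0.12.34     203.0.113.20    40400:31001/TCP   2d
boot-node-lb-udp   LoadBalancer   10.0.12.35     203.0.113.21    40400:31002/UDP   2d
ethereum-lb        LoadBalancer   10.0.12.36     203.0.113.10    8545:31003/TCP    2d
validator-lb       LoadBalancer   10.0.12.37     <pending>       40400:31004/TCP   2d
"""
```

For example, `external_endpoints(SAMPLE_OUTPUT, "ethereum", 8545)` yields the value to share as `ETHEREUM_HOST`, and filtering on `"boot"` with port `40400` yields the two boot node endpoints.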
|
|
||
| The latest node connection information must also be updated in any existing node connection guides and wherever it is referenced at <https://docs.aztec.network>. | ||
|
|
||
| ## Support | ||
|
|
||
| The following is a shortlist of support items that may be required during deployment or after a successful launch. | ||
|
|
||
| ### Issue Resolution Matrix | ||
|
|
||
| | Event | Action | Criticality | Owner(s) | | ||
| |-------|---------|------------|-----------| | ||
| | Build failure | Rerun CI or revert problematic changes | Blocker | | | ||
| | Deployment issues | Reference deployment `README` or escalate to Delta Team | Blocker | Delta Team | | ||
| | Network instability* | Create detailed issue report for Alpha Team | Blocker | Alpha Team | | ||
| | Challenge completion errors | Document issue and assess challenge viability | Major | Product Team | | ||
| | Minor operational issues | Create tracking issue | Minor | Delta Team | | ||
| | Hotfix deployment | Update testnet and verify fix | Major | Delta Team | | ||
|
|
||
| _*Defining Network Instability:_ | ||
|
|
||
| A testnet is considered unstable if it experiences any of the following: | ||
|
|
||
| 1. Block production stalls | ||
| 2. Proof generation failures | ||
| 3. Transaction inclusion issues | ||
| 4. Node synchronization problems | ||
| 5. Persistent crashes affecting network operation | ||
| 6. Persistent chain reorgs affecting network operation | ||
| 7. Bridge contract failures | ||
|
|
||
| ### Release Support Matrix | ||
|
|
||
| | Event | Action | Criticality | Owner(s) | | ||
| |-------|---------|------------|-----------| | ||
| | Challenge completion issues | Provide guidance or create issue | Minor | DevRel Team | | ||
| | Node stability issues | Collect logs and create issue | Major | Delta Team | | ||
| | Network-wide problems | Escalate to Delta Team | Critical | Alpha/Delta Teams | | ||
| | Bridge/Contract issues | Investigate and escalate if needed | Critical | Alpha Team | | ||