diff --git a/hadoop-hdds/docs/content/feature/OM-HA.md b/hadoop-hdds/docs/content/feature/OM-HA.md index 1a0a46481d6f..7eb83c5e5302 100644 --- a/hadoop-hdds/docs/content/feature/OM-HA.md +++ b/hadoop-hdds/docs/content/feature/OM-HA.md @@ -125,7 +125,37 @@ ozone om [global options (optional)] --bootstrap --force Note that using the _force_ option during bootstrap could crash the OM process if it does not have updated configurations. +## Automatic Snapshot Installation for Stale Ozone Managers + +Sometimes an OM follower node may be offline or fall far behind the OM leader's raft log. +Then, it cannot easily catch up by replaying individual log entries. +The OM HA implementation includes an automatic snapshot installation +and recovery process for such cases. + +How it works: + +1. Leader determines that the follower is too far behind. +2. Leader notifies the follower to install a snapshot. +3. The follower downloads and installs the latest snapshot from the leader. +4. After installing the snapshot, the follower OM resumes normal operation and log replication from the new state. + +This logic is implemented in the `OzoneManagerStateMachine.notifyInstallSnapshotFromLeader()`; +see the [code](https://github.com/apache/ozone/blob/ozone-2.0.0/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java#L520-L531) +in Release 2.0.0. + +Note that this `Raft Snapshot`, used for OM HA state synchronization, is distinct from `Ozone Snapshot`, which is used for data backup and recovery purposes. + +In most scenarios, stale OMs will recover automatically, even if they have missed a large number of operations. +Manual intervention (such as running `ozone om --bootstrap`) is only required when adding a new OM node to the cluster. + +**Important Note on Ozone Manager (OM) Disk Space for Snapshots** + +When an Ozone Manager (OM) acts as a follower in an HA setup, it downloads snapshot tarballs from the leader to its +local metadata directory. Therefore, always ensure your OM disks have at least 2x the current OM database size to +accommodate the existing data and incoming snapshots, preventing disk space issues and maintaining cluster stability. + ## References * Check [this page]({{< ref "design/omha.md" >}}) for the links to the original design docs * Ozone distribution contains an example OM HA configuration, under the `compose/ozone-om-ha` directory which can be tested with the help of [docker-compose]({{< ref "start/RunningViaDocker.md" >}}). +* [Apache Ratis State Machine API documentation](https://github.com/apache/ratis/blob/ratis-3.1.3/ratis-server-api/src/main/java/org/apache/ratis/statemachine/StateMachine.java)