apache · jojochuang · Jun 14, 2025 · Jun 8, 2025 · Jun 10, 2025 · Jun 10, 2025
diff --git a/hadoop-hdds/docs/content/feature/OM-HA.md b/hadoop-hdds/docs/content/feature/OM-HA.md
@@ -125,7 +125,37 @@ ozone om [global options (optional)] --bootstrap --force
 
 Note that using the _force_ option during bootstrap could crash the OM process if it does not have updated configurations.
 
+## Automatic Snapshot Installation for Stale Ozone Managers
+
+Sometimes an OM follower node may be offline or fall far behind the OM leader's raft log.
+Then, it cannot easily catch up by replaying individual log entries.
+The OM HA implementation includes an automatic snapshot installation
+and recovery process for such cases.
+
+How it works:
+
+1. Leader determines that the follower is too far behind.
+2. Leader notifies the follower to install a snapshot.
+3. The follower downloads and installs the latest snapshot from the leader.
+4. After installing the snapshot, the follower OM resumes normal operation and log replication from the new state.
+
+This logic is implemented in the `OzoneManagerStateMachine.notifyInstallSnapshotFromLeader()`;
+see the [code](https://github.com/apache/ozone/blob/ozone-2.0.0/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java#L520-L531)
+in Release 2.0.0.
+
+Note that this `Raft Snapshot`, used for OM HA state synchronization, is distinct from `Ozone Snapshot`, which is used for data backup and recovery purposes.
+
+In most scenarios, stale OMs will recover automatically, even if they have missed a large number of operations.
+Manual intervention (such as running `ozone om --bootstrap`) is only required when adding a new OM node to the cluster.
+
+**Important Note on Ozone Manager (OM) Disk Space for Snapshots**
+
+When an Ozone Manager (OM) acts as a follower in an HA setup, it downloads snapshot tarballs from the leader to its
+local metadata directory. Therefore, always ensure your OM disks have at least 2x the current OM database size to
+accommodate the existing data and incoming snapshots, preventing disk space issues and maintaining cluster stability.
+
 ## References
 
  * Check [this page]({{< ref "design/omha.md" >}}) for the links to the original design docs
  * Ozone distribution contains an example OM HA configuration, under the `compose/ozone-om-ha` directory which can be tested with the help of [docker-compose]({{< ref "start/RunningViaDocker.md" >}}).
+* [Apache Ratis State Machine API documentation](https://github.com/apache/ratis/blob/ratis-3.1.3/ratis-server-api/src/main/java/org/apache/ratis/statemachine/StateMachine.java)