openshift · vikram-redhat · Feb 10, 2021 · Oct 14, 2020
diff --git a/modules/machine-health-checks-about.adoc b/modules/machine-health-checks-about.adoc
@@ -33,6 +33,19 @@ To limit the disruptive impact of machine deletions, the controller drains and d
 
 To stop the check, remove the custom resource.
 
+[id="machine-health-checks-bare-metal_{context}"]
+== Machine health checks on bare metal
+
+Machine deletion on a bare metal cluster triggers reprovisioning of a bare metal host.
+Usually, bare metal reprovisioning is a lengthy process, during which the cluster
+is missing compute resources and applications might be interrupted.
+To change the default remediation process from machine deletion to a host power cycle,
+annotate the `MachineHealthCheck` resource with the
+`machine.openshift.io/remediation-strategy: external-baremetal` annotation.
+
+After you set the annotation, unhealthy machines are power-cycled by using
+baseboard management controller (BMC) credentials.
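+
+As a minimal sketch, the annotation belongs under `metadata.annotations` of the
+`MachineHealthCheck` resource. The fragment is illustrative only and omits the
+rest of the resource, such as the `spec` section:
+
+[source,yaml]
+----
+metadata:
+  annotations:
+    machine.openshift.io/remediation-strategy: external-baremetal
+----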
+
 [id="machine-health-checks-limitations_{context}"]
 == Limitations when deploying machine health checks
 

diff --git a/modules/machine-health-checks-resource.adoc b/modules/machine-health-checks-resource.adoc
@@ -7,9 +7,49 @@
 [id="machine-health-checks-resource_{context}"]
 = Sample `MachineHealthCheck` resource
 
-The `MachineHealthCheck` resource resembles the following YAML file:
+The `MachineHealthCheck` resource resembles one of the following YAML files:
 
-.`MachineHealthCheck`
+.`MachineHealthCheck` for bare metal
+[source,yaml]
+----
+apiVersion: machine.openshift.io/v1beta1
+kind: MachineHealthCheck
+metadata:
+  name: example <1>
+  namespace: openshift-machine-api
+  annotations:
+    machine.openshift.io/remediation-strategy: external-baremetal <2>
+spec:
+  selector:
+    matchLabels:
+      machine.openshift.io/cluster-api-machine-role: <role> <3>
+      machine.openshift.io/cluster-api-machine-type: <role> <3>
+      machine.openshift.io/cluster-api-machineset: <cluster_name>-<label>-<zone> <4>
+  unhealthyConditions:
+  - type:    "Ready"
+    timeout: "300s" <5>
+    status: "False"
+  - type:    "Ready"
+    timeout: "300s" <5>
+    status: "Unknown"
+  maxUnhealthy: "40%" <6>
+  nodeStartupTimeout: "10m" <7>
+----
+
+<1> Specify the name of the machine health check to deploy.
+<2> For bare metal clusters, you must include the `machine.openshift.io/remediation-strategy: external-baremetal` annotation in the `annotations` section to enable power-cycle remediation. With this remediation strategy, unhealthy hosts are rebooted instead of removed from the cluster.
+<3> Specify a label for the machine pool that you want to check.
+<4> Specify the machine set to track in `<cluster_name>-<label>-<zone>` format. For example, `prod-node-us-east-1a`.
+<5> Specify the timeout duration for a node condition. If a condition is met for the duration of the timeout, the machine will be remediated. Long timeouts can result in long periods of downtime for a workload on an unhealthy machine.
+<6> Specify the number of unhealthy machines allowed in the targeted pool. This can be set as a percentage or an integer.
+<7> Specify the timeout duration that a machine health check must wait for a node to join the cluster before a machine is determined to be unhealthy.
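+
+Because `maxUnhealthy` accepts either a percentage or an integer, an integer limit can replace the percentage shown in the sample. A minimal sketch, assuming an arbitrary limit of two machines chosen only for illustration:
+
+[source,yaml]
+----
+spec:
+  maxUnhealthy: 2 # illustrative value, not a recommendation
+----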
+
+[NOTE]
+====
+The `matchLabels` are examples only; you must map your machine groups based on your specific needs.
+====
+
+.`MachineHealthCheck` for all other installation types
 [source,yaml]
 ----
 apiVersion: machine.openshift.io/v1beta1