diff --git a/docs/user/agent/add-node/add-nodes.md b/docs/user/agent/add-node/add-nodes.md
index bfa990db756..143ebd86790 100644
--- a/docs/user/agent/add-node/add-nodes.md
+++ b/docs/user/agent/add-node/add-nodes.md
@@ -64,7 +64,7 @@ hosts:
 macAddress: 00:02:46:e3:9e:9c
 
 ## ISO generation
-Run the [node-joiner.sh](./node-joiner.sh):
+Run [node-joiner.sh](./node-joiner.sh):
 ```bash
 $ ./node-joiner.sh
 ```
@@ -84,11 +84,12 @@ $ ./node-joiner.sh config.yaml
 Use the iso image to boot all the nodes listed in the configuration file, and wait for the related
 certificate signing requests (CSRs) to appear.
 When adding a new node to the cluster, two pending CSRs will be generated, and they must be manually approved by the user.
-Use the following command to monitor the pending certificates:
+
+Use the following command, or the [node-joiner-monitor.sh](./node-joiner-monitor.sh) script described below, to monitor the pending certificates:
 ```
 $ oc get csr
 ```
-User the `oc` `approve` command to approve them:
+Use the `oc` `approve` command to approve them:
 ```
 $ oc adm certificate approve
 ```
@@ -100,4 +101,57 @@ extra-worker-0 Ready worker 1h v1.29.3+8628c3c
 master-0 Ready control-plane,master 31h v1.29.3+8628c3c
 master-1 Ready control-plane,master 32h v1.29.3+8628c3c
 master-2 Ready control-plane,master 32h v1.29.3+8628c3c
-```
\ No newline at end of file
+```
+
+# Monitoring
+After a node is booted using the ISO image, progress can be monitored using the node-joiner-monitor.sh script.
+
+Download the [node-joiner-monitor.sh](./node-joiner-monitor.sh) script to a local directory.
+
+The script requires the IP address of the node to monitor.
+
+Run [node-joiner-monitor.sh](./node-joiner-monitor.sh):
+```bash
+$ ./node-joiner-monitor.sh 192.168.111.90
+```
+
+The script will execute a command to monitor the node using a temporary namespace with
+prefix `openshift-node-joiner-monitor` in the target cluster. The output of this command
+is printed to stdout.
+
+The script shows useful information about the node as it joins the cluster:
+* Pre-flight validations. If the node fails one or more validations, the installation will not start. The output of the failed validations is reported so that users can fix the problems.
+* Installation progress, indicating the current stage. For example, writing the image to disk and the initial reboot are reported.
+* CSRs requiring the user's approval.
+
+The script exits either after the node has joined the cluster and is in the Ready state, or after 90 minutes have elapsed.
+
+Sample monitoring output:
+```
+INFO[2024-04-29T22:45:39-04:00] Monitoring IPs: [192.168.111.90]
+INFO[2024-04-29T22:48:17-04:00] Node 192.168.111.90: Assisted Service API is available
+INFO[2024-04-29T22:48:17-04:00] Node 192.168.111.90: Cluster is adding hosts
+INFO[2024-04-29T22:48:17-04:00] Node 192.168.111.90: Updated image information (Image type is "full-iso", SSH public key is set)
+INFO[2024-04-29T22:48:22-04:00] Node 192.168.111.90: Host ca241aa5-4f86-42bf-95a3-6b7ab7d4d66a: Successfully registered
+WARNING[2024-04-29T22:48:32-04:00] Node 192.168.111.90: Host couldn't synchronize with any NTP server
+WARNING[2024-04-29T22:48:32-04:00] Node 192.168.111.90: Host extraworker-0: updated status from discovering to insufficient (Host does not meet the minimum hardware requirements: Host couldn't synchronize with any NTP server)
+INFO[2024-04-29T22:49:28-04:00] Node 192.168.111.90: Host extraworker-0: updated status from known to installing (Installation is in progress)
+INFO[2024-04-29T22:50:28-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 5%
+INFO[2024-04-29T22:50:33-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 16%
+INFO[2024-04-29T22:50:38-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 28%
+INFO[2024-04-29T22:50:43-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 40%
+INFO[2024-04-29T22:50:48-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 51%
+INFO[2024-04-29T22:50:53-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 67%
+INFO[2024-04-29T22:50:58-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 77%
+INFO[2024-04-29T22:51:03-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 88%
+INFO[2024-04-29T22:51:08-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 93%
+INFO[2024-04-29T22:51:13-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Rebooting
+INFO[2024-04-29T22:56:35-04:00] Node 192.168.111.90: Kubelet is running
+INFO[2024-04-29T22:56:45-04:00] Node 192.168.111.90: First CSR Pending approval
+INFO[2024-04-29T22:56:45-04:00] CSR csr-257ms with signerName kubernetes.io/kube-apiserver-client-kubelet and username system:serviceaccount:openshift-machine-config-operator:node-bootstrapper is Pending and awaiting approval
+INFO[2024-04-29T22:58:50-04:00] Node 192.168.111.90: Second CSR Pending approval
+INFO[2024-04-29T22:58:50-04:00] CSR csr-tc8xt with signerName kubernetes.io/kubelet-serving and username system:node:extraworker-0 is Pending and awaiting approval
+INFO[2024-04-29T22:58:50-04:00] Node 192.168.111.90: Node joined cluster
+INFO[2024-04-29T23:00:00-04:00] Node 192.168.111.90: Node is Ready
+```
+
diff --git a/docs/user/agent/add-node/node-joiner-monitor.sh b/docs/user/agent/add-node/node-joiner-monitor.sh
new file mode 100755
index 00000000000..12b183001be
--- /dev/null
+++ b/docs/user/agent/add-node/node-joiner-monitor.sh
@@ -0,0 +1,115 @@
+#!/bin/bash
+
+set -eu
+
+if [ $# -eq 0 ]; then
+  echo "At least one IP address must be provided"
+  exit 1
+fi
+
+ipAddresses=$@
+
+# Set up a cleanup function to ensure the temporary
+# file is removed when the script exits.
+cleanup() {
+  if [ -f "${pullSecretFile:-}" ]; then
+    echo "Removing temporary file $pullSecretFile"
+    rm "$pullSecretFile"
+  fi
+}
+trap cleanup EXIT TERM
+
+# Retrieve the pull secret and store it in a temporary file.
+pullSecretFile=$(mktemp -p "/tmp" -t "nodejoiner-XXXXXXXXXX")
+oc get secret -n openshift-config pull-secret -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d > "$pullSecretFile"
+
+# Extract the baremetal-installer image pullspec from the current cluster.
+nodeJoinerPullspec=$(oc adm release info --image-for=baremetal-installer --registry-config="$pullSecretFile")
+
+# Use the same random temp file suffix for the namespace.
+namespace=$(echo "openshift-node-joiner-${pullSecretFile#/tmp/nodejoiner-}" | tr '[:upper:]' '[:lower:]')
+
+# Create the namespace to run the node-joiner-monitor, along with the required roles and bindings.
+staticResources=$(cat </dev/null 2>&1 &
\ No newline at end of file
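
Review note: the namespace derivation in node-joiner-monitor.sh (strip the fixed temp-file prefix, then lowercase the random suffix) can be tried in isolation, without a cluster. The sketch below is only an illustration of that one step. The helper name `namespace_for` and the fixed example path are hypothetical and not part of the patch; the real script takes the path from `mktemp`.

```bash
#!/bin/bash
set -eu

# Hypothetical helper (not in the patch): map a temp file path to a namespace name.
namespace_for() {
  local tmpFile=$1
  # Drop the fixed "/tmp/nodejoiner-" prefix so only the random suffix remains,
  # then lowercase it, since Kubernetes namespace names must be lowercase.
  echo "openshift-node-joiner-${tmpFile#/tmp/nodejoiner-}" | tr '[:upper:]' '[:lower:]'
}

# A fixed example path stands in for the mktemp result so the output is reproducible.
namespace=$(namespace_for "/tmp/nodejoiner-aB3dE9fGhI")
echo "$namespace"
# prints: openshift-node-joiner-ab3de9fghi
```

Reusing the `mktemp` suffix gives each run its own namespace without a second source of randomness, which is presumably why the script ties the two names together.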