docs/user/agent/add-node/add-nodes.md

hosts:
macAddress: 00:02:46:e3:9e:9c

## ISO generation
Run [node-joiner.sh](./node-joiner.sh):
```bash
$ ./node-joiner.sh
```
```bash
$ ./node-joiner.sh config.yaml
```
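For reference, a configuration file might look like the following. This is a hedged sketch: the `hosts`/`macAddress` fields echo the snippet at the top of this page, while the hostname and interface name are illustrative placeholders.

```yaml
hosts:
- hostname: extra-worker-0
  interfaces:
  - name: eth0
    macAddress: 00:02:46:e3:9e:9c
```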
Use the ISO image to boot all the nodes listed in the configuration file, and wait for the related
certificate signing requests (CSRs) to appear. When adding a new node to the cluster, two pending CSRs
will be generated, and they must be manually approved by the user.

Use the following command, or the [node-joiner-monitor.sh](./node-joiner-monitor.sh) script described below, to monitor the pending certificates:
```
$ oc get csr
```
Use the `oc adm certificate approve` command to approve them:
```
$ oc adm certificate approve <csr_name>
```
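When several CSRs are pending, picking out the `Pending` entries can be scripted. A minimal sketch follows; the sample output is embedded for illustration (the CSR names are taken from the monitoring log later on this page), and against a live cluster you would pipe `oc get csr` instead:

```shell
# Sample `oc get csr` output, embedded here so the sketch is self-contained.
sample='NAME        AGE   SIGNERNAME                                    REQUESTOR                                                                   CONDITION
csr-257ms   2m    kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-tc8xt   1m    kubernetes.io/kubelet-serving                 system:node:extraworker-0                                                   Pending'

# Keep only the rows whose last column is "Pending" and print their names.
pending=$(echo "$sample" | awk '$NF == "Pending" {print $1}')
echo "$pending"
```

Each reported name can then be passed to `oc adm certificate approve <csr_name>`.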
extra-worker-0 Ready worker 1h v1.29.3+8628c3c
master-0 Ready control-plane,master 31h v1.29.3+8628c3c
master-1 Ready control-plane,master 32h v1.29.3+8628c3c
master-2 Ready control-plane,master 32h v1.29.3+8628c3c
```

## Monitoring
After a node is booted from the ISO image, its progress can be monitored using the node-joiner-monitor.sh script.

Download the [node-joiner-monitor.sh](./node-joiner-monitor.sh) script to a local directory.

The script requires the IP address of the node to monitor; more than one address may be given.

Run [node-joiner-monitor.sh](./node-joiner-monitor.sh):
```bash
$ ./node-joiner-monitor.sh 192.168.111.90
```

The script monitors the node from a pod running in a temporary namespace with the
prefix `openshift-node-joiner-` in the target cluster. The output of this pod
is streamed to stdout.
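The unique namespace suffix comes from the temporary pull secret file that the script creates with `mktemp`. A minimal sketch of the derivation, using an illustrative stand-in for the random suffix:

```shell
# Illustrative mktemp-style result; the real script creates this file
# with: mktemp -p "/tmp" -t "nodejoiner-XXXXXXXXXX"
pullSecretFile="/tmp/nodejoiner-Ab12Cd34Ef"

# Strip the fixed prefix and lowercase the remainder, as the script does,
# to obtain a valid, unique namespace name.
namespace=$(echo "openshift-node-joiner-${pullSecretFile#/tmp/nodejoiner-}" | tr '[:upper:]' '[:lower:]')
echo "$namespace"   # → openshift-node-joiner-ab12cd34ef
```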

The script shows useful information about the node as it joins the cluster:
* Pre-flight validations. If the node does not pass one or more validations, the installation will not start. The output of the failed validations is reported so that users can fix the problems where required.
* Installation progress, indicating the current stage. For example, writing the image to disk and the initial reboot are reported.
* CSRs requiring the user's approval.

The script exits either after the node has joined the cluster and is in the Ready state, or after 90 minutes have elapsed.
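These exit conditions can be pictured as a poll loop with a deadline. The following is a scaled-down, hypothetical sketch (the real logic lives inside the node-joiner binary; three polls stand in for the 90-minute limit, and the Ready transition is simulated):

```shell
node_ready=false
max_polls=3   # stand-in for the 90-minute deadline
polls=0

# Poll until the node is Ready or the deadline is reached.
while [ "$polls" -lt "$max_polls" ] && [ "$node_ready" = false ]; do
    polls=$((polls + 1))
    # The real monitor checks node status via the cluster API here;
    # we simulate the node becoming Ready on the second poll.
    if [ "$polls" -eq 2 ]; then
        node_ready=true
    fi
done

echo "exited after $polls polls (ready=$node_ready)"
```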

Sample monitoring output:
```
INFO[2024-04-29T22:45:39-04:00] Monitoring IPs: [192.168.111.90]
INFO[2024-04-29T22:48:17-04:00] Node 192.168.111.90: Assisted Service API is available
INFO[2024-04-29T22:48:17-04:00] Node 192.168.111.90: Cluster is adding hosts
INFO[2024-04-29T22:48:17-04:00] Node 192.168.111.90: Updated image information (Image type is "full-iso", SSH public key is set)
INFO[2024-04-29T22:48:22-04:00] Node 192.168.111.90: Host ca241aa5-4f86-42bf-95a3-6b7ab7d4d66a: Successfully registered
WARNING[2024-04-29T22:48:32-04:00] Node 192.168.111.90: Host couldn't synchronize with any NTP server
WARNING[2024-04-29T22:48:32-04:00] Node 192.168.111.90: Host extraworker-0: updated status from discovering to insufficient (Host does not meet the minimum hardware requirements: Host couldn't synchronize with any NTP server)
INFO[2024-04-29T22:49:28-04:00] Node 192.168.111.90: Host extraworker-0: updated status from known to installing (Installation is in progress)
INFO[2024-04-29T22:50:28-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 5%
INFO[2024-04-29T22:50:33-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 16%
INFO[2024-04-29T22:50:38-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 28%
INFO[2024-04-29T22:50:43-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 40%
INFO[2024-04-29T22:50:48-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 51%
INFO[2024-04-29T22:50:53-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 67%
INFO[2024-04-29T22:50:58-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 77%
INFO[2024-04-29T22:51:03-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 88%
INFO[2024-04-29T22:51:08-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 93%
INFO[2024-04-29T22:51:13-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Rebooting
INFO[2024-04-29T22:56:35-04:00] Node 192.168.111.90: Kubelet is running
INFO[2024-04-29T22:56:45-04:00] Node 192.168.111.90: First CSR Pending approval
INFO[2024-04-29T22:56:45-04:00] CSR csr-257ms with signerName kubernetes.io/kube-apiserver-client-kubelet and username system:serviceaccount:openshift-machine-config-operator:node-bootstrapper is Pending and awaiting approval
INFO[2024-04-29T22:58:50-04:00] Node 192.168.111.90: Second CSR Pending approval
INFO[2024-04-29T22:58:50-04:00] CSR csr-tc8xt with signerName kubernetes.io/kubelet-serving and username system:node:extraworker-0 is Pending and awaiting approval
INFO[2024-04-29T22:58:50-04:00] Node 192.168.111.90: Node joined cluster
INFO[2024-04-29T23:00:00-04:00] Node 192.168.111.90: Node is Ready
```

docs/user/agent/add-node/node-joiner-monitor.sh
#!/bin/bash

set -eu

if [ $# -eq 0 ]; then
    echo "At least one IP address must be provided"
    exit 1
fi

ipAddresses=$@

# Set up a cleanup function to remove the temporary
# pull secret file when the script exits.
cleanup() {
    if [ -f "$pullSecretFile" ]; then
        echo "Removing temporary file $pullSecretFile"
        rm "$pullSecretFile"
    fi
}
trap cleanup EXIT TERM

# Retrieve the pull secret and store it in a temporary file.
pullSecretFile=$(mktemp -p "/tmp" -t "nodejoiner-XXXXXXXXXX")
oc get secret -n openshift-config pull-secret -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d > "$pullSecretFile"

# Extract the baremetal-installer image pullspec from the current cluster.
nodeJoinerPullspec=$(oc adm release info --image-for=baremetal-installer --registry-config="$pullSecretFile")

# Use the same random temp file suffix for the namespace.
namespace=$(echo "openshift-node-joiner-${pullSecretFile#/tmp/nodejoiner-}" | tr '[:upper:]' '[:lower:]')

# Create the namespace to run the node-joiner-monitor, along with the required roles and bindings.
staticResources=$(cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: ${namespace}
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: node-joiner-monitor
  namespace: ${namespace}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-joiner-monitor
rules:
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - get
  - list
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-joiner-monitor
subjects:
- kind: ServiceAccount
  name: node-joiner-monitor
  namespace: ${namespace}
roleRef:
  kind: ClusterRole
  name: node-joiner-monitor
  apiGroup: rbac.authorization.k8s.io
EOF
)
echo "$staticResources" | oc apply -f -

# Run the node-joiner-monitor pod to watch the node joining the cluster.
nodeJoinerPod=$(cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: node-joiner-monitor
  namespace: ${namespace}
  annotations:
    openshift.io/scc: anyuid
  labels:
    app: node-joiner-monitor
spec:
  restartPolicy: Never
  serviceAccountName: node-joiner-monitor
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: node-joiner-monitor
    imagePullPolicy: IfNotPresent
    image: $nodeJoinerPullspec
    command: ["/bin/sh", "-c", "node-joiner monitor-add-nodes $ipAddresses --log-level=info; sleep 5"]
EOF
)
echo "$nodeJoinerPod" | oc apply -f -

oc project "${namespace}"

oc wait --for=condition=Ready=true --timeout=300s pod/node-joiner-monitor

oc logs -f -n "${namespace}" node-joiner-monitor
echo "Cleaning up"
oc delete namespace "${namespace}" --grace-period=0 >/dev/null 2>&1 &