@@ -35,7 +35,7 @@ done
echo "Gathering bootstrap journals ..."
mkdir -p "${ARTIFACTS}/bootstrap/journals"
for service in approve-csr bootkube crio crio-configure image-customization ironic ironic-dnsmasq ironic-httpd ironic-ramdisk-logs \
kubelet master-bmh-update metal3-baremetal-operator release-image release-image-download sssd
kubelet master-bmh-update metal3-baremetal-operator release-image release-image-download sssd node-image-pull
do
journalctl --boot --no-pager --output=short --unit="${service}" > "${ARTIFACTS}/bootstrap/journals/${service}.log"
done
@@ -1,6 +1,11 @@
#!/bin/bash
set -euo pipefail

# shellcheck source=bootstrap-service-record.sh
. /usr/local/bin/bootstrap-service-record.sh

record_service_stage_start "node-image-pull"

# shellcheck source=release-image.sh.template
. /usr/local/bin/release-image.sh

@@ -64,6 +69,7 @@ ref=$(ostree refs --repo "${ostree_repo}" | grep ^ostree/container/image/docker)
if [ $(echo "$ref" | wc -l) != 1 ]; then
echo "Expected single docker ref, found:"
echo "$ref"
record_service_stage_failure
exit 1
fi
ostree refs --repo "${ostree_repo}" "$ref" --create coreos/node-image
@@ -88,3 +94,5 @@ if grep -q coreos.liveiso= /proc/cmdline; then
echo "Deleting temporary repo"
rm -rf "${ostree_repo}"
fi

record_service_stage_success
10 changes: 10 additions & 0 deletions pkg/gather/service/analyze.go
@@ -85,6 +85,7 @@ func analyzeGatherBundle(bundleFile io.Reader) error {
optional bool
}{
{name: "release-image", check: checkReleaseImageDownload, optional: false},
{name: "node-image-pull", check: checkNodeImagePull, optional: false},
{name: "bootkube", check: checkBootkubeService, optional: false},
Comment on lines 87 to 89
@tthvo (Member), Oct 27, 2025

Suggested change
- {name: "release-image", check: checkReleaseImageDownload, optional: false},
- {name: "node-image-pull", check: checkNodeImagePull, optional: false},
- {name: "bootkube", check: checkBootkubeService, optional: false},
+ {name: "node-image-pull", check: checkNodeImagePull, optional: false},
+ {name: "release-image", check: checkReleaseImageDownload, optional: false},
+ {name: "bootkube", check: checkBootkubeService, optional: false},

I think the order matters here, per #4751 (comment), right?

IIUC, node-image-pull starts before the other two 🤔: I saw that release-image never seemed to start while node-image-pull was throwing errors. I don't understand how that ordering comes about, though, because the service unit files don't define such dependencies 😞

$ cat log-bundle-20251027132247/bootstrap/journals/node-image-pull.log 
...output-omitted...
Oct 27 19:48:33 ip-10-0-160-222 node-image-pull.sh[1949]: Failed to fetch release image; retrying...
Oct 27 19:48:43 ip-10-0-160-222 ostree-containe[2243]: Fetching ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c7ba2a9638c369c24f9d564f9bfa8d59154df08085bb75510454b98aa0fda51e
Oct 27 19:48:44 ip-10-0-160-222 node-image-pull.sh[2243]: error: Creating importer: failed to invoke method OpenImage: failed to invoke method OpenImage: reading manifest sha256:c7ba2a9638c369c24f9d564f9bfa8d59154df08085bb75510454b98aa0fda51e in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized
...output-omitted...

$ cat log-bundle-20251027132247/bootstrap/journals/release-image.log 
-- No entries --

With the current change, we will only ever see the output below, which is not what we want, right?

$ openshift-install analyze --file=log-bundle-20251027132247.tar.gz 
ERROR The bootstrap machine did not execute the release-image.service systemd unit 

If I change the order as in the comment above, we now see:

$ openshift-install analyze --file=log-bundle-20251027132247.tar.gz 
ERROR Node image pull failed on the bootstrap machine 
INFO        
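
To illustrate why the order matters, here is a minimal sketch of a check loop that stops at the first problem. The `serviceState` type, the `analyze` function, and the generic failure message are hypothetical stand-ins, not the installer's actual code; the sketch only assumes, as the two outputs above suggest, that checks run in list order and that analysis ends at the first service that either never started or did not succeed.

```go
package main

import "fmt"

// serviceState is a hypothetical, simplified stand-in for the per-service analysis.
type serviceState struct {
	started    bool
	successful bool
}

// analyze walks the services in the given order and returns the first problem found.
// An empty string means no problem was detected.
func analyze(order []string, states map[string]serviceState) string {
	for _, name := range order {
		s := states[name] // a missing entry yields the zero value: never started
		if !s.started {
			return fmt.Sprintf("The bootstrap machine did not execute the %s.service systemd unit", name)
		}
		if !s.successful {
			return fmt.Sprintf("%s failed on the bootstrap machine", name)
		}
	}
	return ""
}

func main() {
	// State assumed from the log bundle above: node-image-pull started but has not
	// succeeded, and release-image has no journal entries.
	states := map[string]serviceState{
		"node-image-pull": {started: true, successful: false},
	}
	fmt.Println(analyze([]string{"release-image", "node-image-pull", "bootkube"}, states))
	fmt.Println(analyze([]string{"node-image-pull", "release-image", "bootkube"}, states))
}
```

With release-image listed first, the loop reports the unit that never ran and hides the node-image-pull failure that likely caused it; with node-image-pull first, the earlier failure is the one reported.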

I noticed the empty INFO line, which is supposed to print the last 3 lines of the service logs but does not here. It seems the node-image-pull service loops on the bootstrap machine and never exits, so its error is never captured.

while ! ostree container image pull --authfile "/root/.docker/config.json" \
"${ostree_repo}" ostree-unverified-image:docker://"${COREOS_IMAGE}"; do
echo 'Failed to fetch release image; retrying...'
sleep 10
done

$ systemctl status node-image-pull
● node-image-pull.service - Node Image Pull
     Loaded: loaded (/etc/systemd/system/node-image-pull.service; static)
     Active: activating (start) since Mon 2025-10-27 19:47:56 UTC; 1h 9min ago
    Process: 1943 ExecStartPre=chcon --reference=/usr/bin/ostree /usr/local/bin/node-image-pull.sh (code=exited, status=0/SUCCESS)
   Main PID: 1949 (node-image-pull)
      Tasks: 2 (limit: 99952)
     Memory: 608.0M
        CPU: 1min 10.703s
     CGroup: /system.slice/node-image-pull.service
             ├─1949 /bin/bash /usr/local/bin/node-image-pull.sh
             └─7897 sleep 10

Oct 27 20:57:05 ip-10-0-160-222 node-image-pull.sh[1949]: Failed to fetch release image; retrying...
Oct 27 20:57:15 ip-10-0-160-222 ostree-containe[7814]: Fetching ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c7ba2a9638c369c24f9d564f9bf>
Oct 27 20:57:15 ip-10-0-160-222 node-image-pull.sh[7814]: error: Creating importer: failed to invoke method OpenImage: failed to invoke method OpenImage: reading manifest s>
Oct 27 20:57:15 ip-10-0-160-222 node-image-pull.sh[1949]: Failed to fetch release image; retrying...
Oct 27 20:57:25 ip-10-0-160-222 ostree-containe[7826]: Fetching ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c7ba2a9638c369c24f9d564f9bf>
Oct 27 20:57:26 ip-10-0-160-222 node-image-pull.sh[7826]: error: Creating importer: failed to invoke method OpenImage: failed to invoke method OpenImage: reading manifest s>
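
The blank INFO line is consistent with how Go's `strings.Split` treats an empty string: it returns a slice holding one empty element, so a loop over the result still runs once. The sketch below assumes the captured error text is empty because the still-running service never recorded a failure; `splitLines` is a hypothetical helper used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLines splits captured error text into lines, the same way a
// strings.Split(text, "\n") call would before each line is logged.
func splitLines(lastError string) []string {
	return strings.Split(lastError, "\n")
}

func main() {
	lines := splitLines("") // assumed: no failure was ever recorded

	fmt.Println(len(lines))      // prints 1: one element, not zero
	fmt.Printf("%q\n", lines[0]) // prints "": the element is the empty string
}
```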

I think we can also improve the UX a bit by checking whether the error message is present. If it is not, we can direct the user to the log file. That seems like the simplest way. WDYT @patrickdillon?

func (a analysis) logLastError() {
for _, l := range strings.Split(a.lastError, "\n") {
logrus.Info(l)
}
}

}
for _, check := range analysisChecks {
@@ -114,6 +115,15 @@ func checkReleaseImageDownload(a analysis) bool {
return false
}

func checkNodeImagePull(a analysis) bool {
if a.successful {
return true
}
logrus.Error("Node image pull failed on the bootstrap machine")
a.logLastError()
return false
}

// bootstrap-verify-api-server-urls.sh is currently running as part of the bootkube service.
// And the verification of the API and API-Int URLs are the only stage where a failure is
// currently reported. So, here we are able to conclude that a failure corresponds to a
60 changes: 42 additions & 18 deletions pkg/gather/service/analyze_test.go
@@ -37,6 +37,15 @@ func failedReleaseImage() []logrus.Entry {
}
}

func failedNodeImagePull() []logrus.Entry {
return []logrus.Entry{
{Level: logrus.ErrorLevel, Message: "Node image pull failed on the bootstrap machine"},
{Level: logrus.InfoLevel, Message: "Line 1"},
{Level: logrus.InfoLevel, Message: "Line 2"},
{Level: logrus.InfoLevel, Message: "Line 3"},
}
}

func failedURLChecks() []logrus.Entry {
return []logrus.Entry{
{Level: logrus.InfoLevel, Message: "Line 1"},
@@ -69,56 +78,69 @@ func TestAnalyzeGatherBundle(t *testing.T) {
{
name: "bootkube not started",
files: map[string]string{
"log-bundle/bootstrap/services/release-image.json": generateSuccessOutput("pull-release-image"),
"log-bundle/bootstrap/services/bootkube.json": "[]",
"log-bundle/bootstrap/services/release-image.json": generateSuccessOutput("pull-release-image"),
"log-bundle/bootstrap/services/node-image-pull.json": generateSuccessOutput("node-image-pull"),
"log-bundle/bootstrap/services/bootkube.json": "[]",
},
expectedOutput: []logrus.Entry{
{Level: logrus.ErrorLevel, Message: "The bootstrap machine did not execute the bootkube.service systemd unit"},
},
},
{
name: "release-image and API Server URL successful",
name: "release-image, node-image and API Server URL successful",
files: map[string]string{
"log-bundle/bootstrap/services/release-image.json": generateSuccessOutput("pull-release-image"),
"log-bundle/bootstrap/services/bootkube.json": generateSuccessOutput("check-api-url"),
"log-bundle/bootstrap/services/release-image.json": generateSuccessOutput("pull-release-image"),
"log-bundle/bootstrap/services/node-image-pull.json": generateSuccessOutput("node-image-pull"),
"log-bundle/bootstrap/services/bootkube.json": generateSuccessOutput("check-api-url"),
},
},
{
name: "release-image and API Server URL successful bootstrap-in-place",
files: map[string]string{
"log-bundle/log-bundle-bootstrap/bootstrap/services/release-image.json": generateSuccessOutput("pull-release-image"),
"log-bundle/bootstrap/services/bootkube.json": generateSuccessOutput("check-api-url"),
"log-bundle/log-bundle-bootstrap/bootstrap/services/release-image.json": generateSuccessOutput("pull-release-image"),
"log-bundle/log-bundle-bootstrap/bootstrap/services/node-image-pull.json": generateSuccessOutput("node-image-pull"),
"log-bundle/bootstrap/services/bootkube.json": generateSuccessOutput("check-api-url"),
},
},
{
name: "only release-image failed",
files: map[string]string{
"log-bundle/bootstrap/services/release-image.json": generateFailureOutput("pull-release-image"),
"log-bundle/bootstrap/services/bootkube.json": generateSuccessOutput("check-api-url"),
},
expectedOutput: failedReleaseImage(),
},
{
name: "only node-image-pull failed",
files: map[string]string{
"log-bundle/bootstrap/services/release-image.json": generateSuccessOutput("pull-release-image"),
"log-bundle/bootstrap/services/node-image-pull.json": generateFailureOutput("node-image-pull"),
},
expectedOutput: failedNodeImagePull(),
},
{
name: "API Server URL failed",
files: map[string]string{
"log-bundle/log-bundle-bootstrap/bootstrap/services/release-image.json": generateSuccessOutput("pull-release-image"),
"log-bundle/bootstrap/services/bootkube.json": generateFailureOutput("check-api-url"),
"log-bundle/log-bundle-bootstrap/bootstrap/services/release-image.json": generateSuccessOutput("pull-release-image"),
"log-bundle/log-bundle-bootstrap/bootstrap/services/node-image-pull.json": generateSuccessOutput("node-image-pull"),
"log-bundle/bootstrap/services/bootkube.json": generateFailureOutput("check-api-url"),
},
expectedOutput: failedURLChecks(),
},
{
name: "API-INT Server URL failed",
files: map[string]string{
"log-bundle/log-bundle-bootstrap/bootstrap/services/release-image.json": generateSuccessOutput("pull-release-image"),
"log-bundle/bootstrap/services/bootkube.json": generateFailureOutput("check-api-int-url"),
"log-bundle/log-bundle-bootstrap/bootstrap/services/release-image.json": generateSuccessOutput("pull-release-image"),
"log-bundle/log-bundle-bootstrap/bootstrap/services/node-image-pull.json": generateSuccessOutput("node-image-pull"),
"log-bundle/bootstrap/services/bootkube.json": generateFailureOutput("check-api-int-url"),
},
expectedOutput: failedURLChecks(),
},
{
name: "both release-image and API Server URLs failed",
files: map[string]string{
"log-bundle/log-bundle-bootstrap/bootstrap/services/release-image.json": generateFailureOutput("pull-release-image"),
"log-bundle/bootstrap/services/bootkube.json": generateFailureOutput("check-api-url"),
"log-bundle/log-bundle-bootstrap/bootstrap/services/release-image.json": generateFailureOutput("pull-release-image"),
"log-bundle/log-bundle-bootstrap/bootstrap/services/node-image-pull.json": generateFailureOutput("node-image-pull"),
"log-bundle/bootstrap/services/bootkube.json": generateFailureOutput("check-api-url"),
},
expectedOutput: failedReleaseImage(),
},
@@ -135,8 +157,9 @@ func TestAnalyzeGatherBundle(t *testing.T) {
{
name: "empty bootkube.json",
files: map[string]string{
"log-bundle/bootstrap/services/release-image.json": generateSuccessOutput("pull-release-image"),
"log-bundle/bootstrap/services/bootkube.json": "",
"log-bundle/bootstrap/services/release-image.json": generateSuccessOutput("pull-release-image"),
"log-bundle/log-bundle-bootstrap/bootstrap/services/node-image-pull.json": generateSuccessOutput("node-image-pull"),
"log-bundle/bootstrap/services/bootkube.json": "",
},
expectedOutput: []logrus.Entry{
{Level: logrus.InfoLevel, Message: "Could not analyze the bootkube.service: service entries file does not begin with a token: EOF"},
@@ -156,8 +179,9 @@ func TestAnalyzeGatherBundle(t *testing.T) {
{
name: "malformed bootkube.json",
files: map[string]string{
"log-bundle/bootstrap/services/release-image.json": generateSuccessOutput("pull-release-image"),
"log-bundle/bootstrap/services/bootkube.json": "{}",
"log-bundle/bootstrap/services/release-image.json": generateSuccessOutput("pull-release-image"),
"log-bundle/log-bundle-bootstrap/bootstrap/services/node-image-pull.json": generateSuccessOutput("node-image-pull"),
"log-bundle/bootstrap/services/bootkube.json": "{}",
},
expectedOutput: []logrus.Entry{
{Level: logrus.InfoLevel, Message: "Could not analyze the bootkube.service: service entries file does not begin with an array"},