
AGENT-863: node-joiner cluster script #8242

Merged
openshift-merge-bot[bot] merged 9 commits into openshift:master from andfasano:agent-day2-cluster-script
Apr 26, 2024

@andfasano
Contributor

This patch adds the node-joiner binary, along with its required dependencies, to the installer image.
It also adds a node-joiner.sh script for running the node-joiner tool inside the target cluster that is to be expanded.
Documentation on how to use it is also provided.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 8, 2024
@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 8, 2024

@andfasano: This pull request references AGENT-863 which is a valid jira issue.


@openshift-ci openshift-ci bot requested review from bfournie and rwsu April 8, 2024 17:46
@andfasano andfasano force-pushed the agent-day2-cluster-script branch from 489e176 to 469c264 April 8, 2024 17:58
Member

@zaneb zaneb left a comment

This is great. How close are we to having something similar for the wait-for command?

Member

It looks like there's an opportunity here to modify the command to allow separate input and output directories, and have some sort of built-in signalling/waiting mechanism.

Contributor Author

Is there really a need for separate folders? We have usually kept the output in the same assets folder. Anyhow, I'm adding the file touch to the code, to keep the command simpler.
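The thread above discusses a built-in signalling/waiting mechanism, and the reply settles on having the command touch a file. A minimal sketch of the waiting side, assuming the node-joiner command writes its exit code into a file that the wrapper script can read locally (for example after copying it out of the pod). The function name `wait_for_exit_code_file`, the 5-second poll interval and the 600-second default timeout are hypothetical and are not taken from node-joiner.sh.

```shell
#!/usr/bin/env bash
# Sketch only: function name, poll interval and default timeout are hypothetical.

# Wait until "$1" exists and is non-empty, then return the exit code stored in it.
# Returns 124 on timeout (the same code coreutils `timeout` uses).
wait_for_exit_code_file() {
  local file="$1" timeout="${2:-600}" interval=5 elapsed=0 code
  # -s: the file exists AND is non-empty, so a file that has been created
  # but not yet written to is not read too early.
  until [ -s "$file" ]; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${elapsed}s waiting for $file" >&2
      return 124
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  # Strip the trailing newline written by e.g. `echo $? > file`.
  code=$(tr -d '[:space:]' < "$file")
  case "$code" in
    ''|*[!0-9]*)
      echo "unexpected content in $file: '$code'" >&2
      return 1
      ;;
  esac
  return "$code"
}
```

A real wrapper would more likely poll the file inside the pod through `oc` rather than a local path; the local-file version just keeps the polling and timeout logic easy to follow.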

Member

It's not necessary, but if the go command did what you actually wanted then you wouldn't need to write a shell script here, so that's why I said it's an opportunity 🙂
Any code that has to be built into oc (i.e. the contents of this script) is very hard to change, because you have basically no control over, or insight into, which version the user is running.
Even putting this shell script into the container image would be better than having it here.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 15, 2024
@andfasano andfasano force-pushed the agent-day2-cluster-script branch from 6619444 to 0db9133 April 15, 2024 14:53
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 15, 2024
@andfasano
Contributor Author

> This is great. How close are we to having something similar for the wait-for command?

@rwsu is making good progress in #8171; once that lands, we could add a sibling script for the monitor command.

Contributor

I think we should check here that the KUBECONFIG environment variable is set.
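A minimal sketch of the early check suggested above. It treats KUBECONFIG as a colon-separated list of files, the way oc and kubectl do, and succeeds if at least one entry is readable. The function name `check_kubeconfig` is hypothetical. Note that oc falls back to `~/.kube/config` when KUBECONFIG is unset, so a strict check like this one would reject setups that oc itself accepts, which is part of the argument in the reply that oc already reports the problem.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper, not part of the merged node-joiner.sh.

# The body runs in a subshell ( ... ) so that `set -f` and IFS do not leak
# into the calling script; `exit` only leaves that subshell.
check_kubeconfig() (
  if [ -z "${KUBECONFIG:-}" ]; then
    echo "error: KUBECONFIG is not set" >&2
    exit 1
  fi
  set -f    # disable globbing while splitting the path list
  IFS=':'   # KUBECONFIG may hold several paths separated by ':'
  for f in $KUBECONFIG; do
    if [ -r "$f" ]; then
      exit 0
    fi
  done
  echo "error: no readable kubeconfig found in KUBECONFIG=$KUBECONFIG" >&2
  exit 1
)

# Example usage at the top of a wrapper script:
#   check_kubeconfig || exit 1
```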

Contributor Author

Well, won't the oc command report that anyway?

@andfasano andfasano force-pushed the agent-day2-cluster-script branch from f094417 to a86f74b April 17, 2024 13:28
@andfasano andfasano force-pushed the agent-day2-cluster-script branch from a86f74b to 70d4619 April 17, 2024 19:10
Member

Suggest a separate directory for this stuff. There are going to be 3 files that go together on a topic separate to the rest of the agent stuff, so it could be confusing to have them all mixed together.

Contributor Author

A dedicated folder sounds good to me, but I only count two files (add-nodes.md and node-joiner.sh)?

Member

A thought that just occurred to me: if the cluster has FIPS enabled we want to run this in FIPS mode, with the dynamically-linked binary. This is safe because we're always running it in our own container, which will have the right deps.
So I think we actually want to add this to the baremetal-installer container, not this one. That's actually a better fit because it contains binaries for only one CPU architecture.
Also we'll want to set CGO_ENABLED=1 in hack/build-node-joiner.sh.
Finally, we'll want to set the fips=1 karg in the ISO when FIPS is enabled on the cluster, if we don't already (suggest you raise a separate ticket for that).

Contributor Author

I will create a new ticket for FIPS support in the new GA epic.

Member

This is as much for UPI users as agent users, so maybe we should just name this node.x86_64.iso?
We should think about how we will handle the architecture not being known (and potentially there being multiple in the future). Not needed for now, but it is coming.
Also PXE, presumably there will be a tarfile or something for that.
It may be time to create an Epic for GA-ing this feature, and start creating stories under it to track all the known work items that we'll need in addition to native oc support.

Member

@zaneb zaneb left a comment

/approve

Member

RBAC for Secrets in all namespaces is something to tidy up at some point.

@openshift-ci
Contributor

openshift-ci bot commented Apr 24, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zaneb


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 24, 2024
andfasano and others added 5 commits April 24, 2024 07:52
…th the required dependencies (nmstate and oc)
command now directly generates an exit code file.
random namespace generation.
config file name customizable.
Co-authored-by: Richard Su <rwsu@redhat.com>
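Two of the commit messages above mention random namespace generation and a customizable config file name. A minimal sketch of both, assuming a hypothetical `openshift-node-joiner` namespace prefix and a hypothetical default file name `nodes-config.yaml`; the function names `random_namespace` and `resolve_config_file` are also illustrative and are not taken from the merged script.

```shell
#!/usr/bin/env bash
# Sketch only: the prefix and default file name below are hypothetical.

# Print a namespace name with a random 6-hex-character suffix,
# e.g. openshift-node-joiner-3fa10c.
random_namespace() {
  local prefix="${1:-openshift-node-joiner}"
  local suffix
  # od reads exactly 3 bytes and prints them as hex; tr drops the spaces
  # and the newline. Hex digits keep the name a valid lowercase DNS label.
  suffix=$(od -An -N3 -tx1 /dev/urandom | tr -d ' \n')
  echo "${prefix}-${suffix}"
}

# Print the config file to use: the first argument if given,
# otherwise the hypothetical default.
resolve_config_file() {
  echo "${1:-nodes-config.yaml}"
}

# Example usage inside a wrapper script:
#   namespace=$(random_namespace)
#   config_file=$(resolve_config_file "${1:-}")
```

An alternative to generating the suffix client-side is to let the API server choose it, by creating a Namespace manifest that sets `metadata.generateName` (this works with `oc create`, not `oc apply`).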
@andfasano andfasano force-pushed the agent-day2-cluster-script branch from 5519a6d to 89bcfdf April 24, 2024 11:52
Contributor

@rwsu rwsu left a comment

/lgtm

Contributor

Nit

Suggested change
- # Runt the node-joiner pod to generate the ISO
+ # Run the node-joiner pod to generate the ISO

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 25, 2024
@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD e45745b and 2 for PR HEAD 89bcfdf in total

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 65b89ed and 1 for PR HEAD 89bcfdf in total

@rwsu
Contributor

rwsu commented Apr 25, 2024

/retest-required

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 25c12fc and 0 for PR HEAD 89bcfdf in total

@openshift-ci
Contributor

openshift-ci bot commented Apr 25, 2024

@andfasano: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Required | Rerun command |
|---|---|---|---|
| ci/prow/okd-e2e-aws-ovn-upgrade | 89bcfdf | false | /test okd-e2e-aws-ovn-upgrade |
| ci/prow/e2e-agent-ha-dualstack | 89bcfdf | false | /test e2e-agent-ha-dualstack |
| ci/prow/e2e-agent-compact-ipv4-appliance | 89bcfdf | false | /test e2e-agent-compact-ipv4-appliance |
| ci/prow/okd-e2e-agent-compact-ipv4 | 89bcfdf | false | /test okd-e2e-agent-compact-ipv4 |
| ci/prow/e2e-agent-compact-ipv4-appliance-diskimage | 89bcfdf | false | /test e2e-agent-compact-ipv4-appliance-diskimage |
| ci/prow/e2e-metal-assisted | 89bcfdf | false | /test e2e-metal-assisted |

@openshift-ci-robot
Contributor

/hold

Revision 89bcfdf was retested 3 times: holding

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 25, 2024
@andfasano
Contributor Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 26, 2024
@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 59cc24d and 2 for PR HEAD 89bcfdf in total

@openshift-merge-bot openshift-merge-bot bot merged commit cb05c07 into openshift:master Apr 26, 2024
joepvd added a commit to joepvd/ocp-build-data that referenced this pull request Apr 29, 2024
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/ocp-build-data that referenced this pull request Apr 29, 2024
rwsu added a commit to rwsu/installer that referenced this pull request May 7, 2024
Derived from a similar script by Andrea Fasano
to generate the add-nodes ISO.

openshift#8242

This script tweaks it and creates a node-joiner-monitor
pod to monitor adding nodes to a cluster.

Co-authored-by: Andrea Fasano <andfasano@redhat.com>

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
lgtm: Indicates that a PR is ready to be merged.

6 participants