AGENT-903: monitor-add-nodes should only show CSRs matching node (#8376)
openshift-merge-bot[bot] merged 11 commits into openshift:master
Conversation
The function now takes the kubeconfig file path, rendezvousIP, and sshKey as parameters. Previously it took a single assetStore parameter and searched the asset store to determine those three values.
Adds the ability to monitor a node being added during day 2. The command is: node-joiner monitor-add-nodes --kubeconfig <kubeconfig-file-path> <IP-address-of-node-to-monitor>. Both the kubeconfig file and the IP address are required. Multi-node monitoring will be added in a future PR.
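For context, here is a minimal sketch of how such a subcommand could be wired up with cobra (the CLI library the installer uses). The flag and usage string match the text above, but the function names and stub body are illustrative, not the PR's actual code:

```go
package agent

import (
	"fmt"

	"github.com/spf13/cobra"
)

// newMonitorAddNodesCmd wires up the command described above. The
// monitorSingleNode helper is a hypothetical stand-in for the PR's real
// monitoring entry point.
func newMonitorAddNodesCmd() *cobra.Command {
	var kubeconfigPath string
	cmd := &cobra.Command{
		Use:   "monitor-add-nodes <IP-address-of-node-to-monitor>",
		Short: "Monitor a node being added to a cluster on day 2",
		Args:  cobra.ExactArgs(1), // one node for now; multi-node comes later
		RunE: func(cmd *cobra.Command, args []string) error {
			return monitorSingleNode(kubeconfigPath, args[0])
		},
	}
	cmd.Flags().StringVar(&kubeconfigPath, "kubeconfig", "", "path to the cluster kubeconfig (required)")
	_ = cmd.MarkFlagRequired("kubeconfig")
	return cmd
}

func monitorSingleNode(kubeconfigPath, nodeIP string) error {
	// The real monitoring loop lives in pkg/agent; stubbed here.
	fmt.Printf("monitoring %s using kubeconfig %s\n", nodeIP, kubeconfigPath)
	return nil
}
```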
NewCluster needs both assetDir for the install workflow and kubeconfigPath for the add-nodes workflow. Cluster.assetDir should only be initialized for the install workflow.
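A rough sketch of that split, using minimal stand-in types so it compiles on its own; the installer's real Cluster and workflow types live elsewhere in the repo and their actual signatures may differ:

```go
package agent

// Minimal stand-ins for this sketch; the real workflow types may differ.
type agentWorkflowType string

const (
	workflowInstall  agentWorkflowType = "install"
	workflowAddNodes agentWorkflowType = "addnodes"
)

type Cluster struct {
	assetDir       string
	kubeconfigPath string
}

// NewCluster initializes assetDir only for the install workflow and
// kubeconfigPath only for the add-nodes workflow, as described above.
func NewCluster(assetDir, kubeconfigPath string, wf agentWorkflowType) (*Cluster, error) {
	c := &Cluster{}
	switch wf {
	case workflowInstall:
		c.assetDir = assetDir
	case workflowAddNodes:
		c.kubeconfigPath = kubeconfigPath
	}
	return c, nil
}
```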
Applied feedback from Andrea Fasano.
Co-authored-by: Andrea Fasano <afasano@redhat.com>
@rwsu: This pull request references AGENT-903, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.
Skipping CI for Draft Pull Request.
I don't think we can assume that. The hostname is optional, and node-config.yaml should not be an input to monitor-add-nodes.
The main issue here is the pending CSR matching heuristic. Not sure if there's a better way, but the hostname is required to identify the host the CSR is referring to, so we need to find a way to get it somehow. An alternative approach could be trying to use net.LookupAddr on the specified IP(s) from the command line?
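For illustration, the reverse-DNS lookup that suggestion refers to could look like this (a sketch, not the PR's code; the function name is illustrative):

```go
package agent

import (
	"fmt"
	"net"
	"strings"
)

// hostnameForIP resolves the user-supplied IP to a hostname via reverse DNS.
func hostnameForIP(ip string) (string, error) {
	names, err := net.LookupAddr(ip)
	if err != nil {
		return "", fmt.Errorf("could not resolve %s to a hostname: %w", ip, err)
	}
	if len(names) == 0 {
		return "", fmt.Errorf("no PTR record found for %s", ip)
	}
	// LookupAddr returns fully qualified names with a trailing dot,
	// e.g. "node-1.example.com.".
	return strings.TrimSuffix(names[0], "."), nil
}
```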
How would this normally be done? e.g. if you scaled out a machineset? I know in IPI that there's a nodelink controller that links a Node to a Machine by matching the IP address of the Node to one of the IPs of the Machine, and successful linkage results in the CSR being approved. So my thought was that we can look up the Node with the IP matching the one the user gave us and then sign the matching CSR. Does that only apply to the second CSR? How does the other one get approved?
There's pretty good docs here: https://github.com/openshift/cluster-machine-approver?tab=readme-ov-file#node-client-csr-approval-workflow
The NodeInternalDNS is the hostname obtained from inspection on baremetal. The server one is what I described above. It's slightly harder than I remembered, though: all of the alternative names in the certificate must match IPs or hostnames in the Machine. We'll only get one IP from the user. Usually the ones in the Machine are populated by inspection.

Does assisted-service post anything back to the cluster (e.g. create a Machine object) that could help us? If not, maybe we need it to. We could post a ConfigMap to a special namespace, and embed client creds with the ability to do that in the ISO. There's a reason we delayed actually approving CSRs beyond the initial preview release 🙂

For now, if we're just filtering CSRs to decide which ones to tell the user about (and letting them approve manually), I think looking up the Node name in DNS as you suggested and checking that the IP matches (for the client cert), and checking that one of the alternative names matches the IP (for the server cert), is sufficient.
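A sketch of that filtering, assuming the standard Kubernetes CSR types: the client CSR is matched by its subject CN (system:node:&lt;hostname&gt;) and the serving CSR by an IP SAN. The function name is illustrative:

```go
package agent

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"

	certificatesv1 "k8s.io/api/certificates/v1"
)

// csrMatchesNode reports whether a pending CSR belongs to the node with the
// given hostname (client CSR) or IP (serving CSR).
func csrMatchesNode(csr *certificatesv1.CertificateSigningRequest, hostname, ip string) (bool, error) {
	block, _ := pem.Decode(csr.Spec.Request)
	if block == nil {
		return false, fmt.Errorf("CSR %s contains no PEM block", csr.Name)
	}
	req, err := x509.ParseCertificateRequest(block.Bytes)
	if err != nil {
		return false, err
	}
	// First (client) CSR: the subject CN embeds the node name.
	if req.Subject.CommonName == "system:node:"+hostname {
		return true, nil
	}
	// Second (serving) CSR: check whether an alternative name matches the IP.
	for _, san := range req.IPAddresses {
		if san.String() == ip {
			return true, nil
		}
	}
	return false, nil
}
```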
(force-pushed from 789eb50 to 3f022a9)
pkg/agent/monitoraddnodes.go (outdated)
if the http.Get(url) is successful (no err), then shouldn't isKubeletRunningOnNode() return true?
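For reference, the behavior the comment asks for could look like the following sketch; the kubelet port and /healthz path are assumptions, not necessarily what the PR probes:

```go
package agent

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// isKubeletRunningOnNode treats any HTTP response (even a 401) as proof that
// the kubelet is listening, per the review comment above.
func isKubeletRunningOnNode(nodeIP string) bool {
	client := &http.Client{
		Timeout: 5 * time.Second,
		// The kubelet's serving cert may not be trusted (or even approved)
		// yet, so verification is skipped for this reachability probe.
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	resp, err := client.Get(fmt.Sprintf("https://%s:10250/healthz", nodeIP))
	if err != nil {
		return false // kubelet not reachable yet
	}
	defer resp.Body.Close()
	return true // a successful Get means the kubelet is up
}
```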
pkg/agent/monitoraddnodes.go (outdated)
nit: could be worth resolving the hostname once (at the beginning, in newAddNodeMonitor()) and storing the result as a field in the struct, rather than trying to resolve it every time?
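A minimal sketch of the suggested caching, with illustrative field and constructor names:

```go
package agent

import (
	"net"
	"strings"
)

// addNodeMonitor caches the resolved hostname, as the comment suggests.
type addNodeMonitor struct {
	nodeIP       string
	nodeHostname string // empty if reverse DNS failed
}

func newAddNodeMonitor(nodeIP string) *addNodeMonitor {
	m := &addNodeMonitor{nodeIP: nodeIP}
	// Resolve once here instead of on every CSR check.
	if names, err := net.LookupAddr(nodeIP); err == nil && len(names) > 0 {
		m.nodeHostname = strings.TrimSuffix(names[0], ".")
	}
	return m
}
```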
pkg/agent/monitoraddnodes.go (outdated)
Logging this condition as an error may sound a little confusing to the user. Since it's a condition that does not prevent the monitor workflow from proceeding, I think it could be logged with a message indicating that the hostname cannot be resolved and that the CSR checks are therefore skipped.
pkg/agent/monitoraddnodes.go (outdated)
Rather than returning an empty list, I'd suggest refactoring this portion of code so that, if the hostname cannot be resolved, clusterHasFirstCSRPending and clusterHasSecondCSRPending return true, to clearly indicate that these checks are skipped.
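Sketching that refactor (names, logging, and the surrounding logic are assumptions about the PR's code; the second-CSR check would mirror this one):

```go
package agent

import "log"

type addNodeMonitor struct {
	nodeHostname string // empty when reverse DNS failed (see earlier sketch)
}

// clusterHasFirstCSRPending returns true when the hostname is unresolved so
// the check reads as skipped rather than failed, and monitoring proceeds.
func (m *addNodeMonitor) clusterHasFirstCSRPending() bool {
	if m.nodeHostname == "" {
		log.Println("hostname could not be resolved; skipping CSR checks")
		return true
	}
	return m.firstCSRPendingFor(m.nodeHostname)
}

// firstCSRPendingFor is a stub standing in for the real CSR inspection.
func (m *addNodeMonitor) firstCSRPendingFor(hostname string) bool {
	return false
}
```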
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: andfasano. The full list of commands accepted by this bot can be found here. The pull request process is described here.
The first and second CSRs pending approval have the node name (hostname) embedded in their specs. monitor-add-nodes should only show CSRs pending approval for a specific node. Currently it shows all CSRs pending approval for all nodes. If the IP address of the node cannot be resolved to a hostname, we will not be able to determine if there are any CSRs pending approval for that node. The monitoring command will skip showing CSRs pending approval. In this case, users can still approve the CSRs, and the monitoring command will continue to check if the node has joined the cluster and has become Ready.
Previously, the code was resolving the hostname each time the CSRs were checked. Changed error logging to improve clarity.
/lgtm
/label acknowledge-critical-fixes-only
/override ci/prow/e2e-agent-compact-ipv4
@rwsu: Overrode contexts on behalf of rwsu: ci/prow/e2e-agent-compact-ipv4
@rwsu: The following tests failed. Full PR test history. Your PR dashboard.
/retest-required
/override ci/prow/e2e-agent-compact-ipv4
@rwsu: Overrode contexts on behalf of rwsu: ci/prow/e2e-agent-compact-ipv4
/override ci/prow/e2e-aws-ovn
@rwsu: Overrode contexts on behalf of rwsu: ci/prow/e2e-aws-ovn
[ART PR BUILD NOTIFIER] This PR has been included in build ose-installer-altinfra-container-v4.16.0-202405170342.p0.g52e2540.assembly.stream.el9 for distgit ose-installer-altinfra. |