-
Notifications
You must be signed in to change notification settings - Fork 438
Introduce Agent platform #760
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
✔️ Deploy Preview for hypershift-docs ready! 🔨 Explore the source changes: 6a13985 🔍 Inspect the deploy log: https://app.netlify.com/sites/hypershift-docs/deploys/61b841b3e78c10000979fb44 😎 Browse the preview: https://deploy-preview-760--hypershift-docs.netlify.app |
vendor/modules.txt
Outdated
| ## explicit | ||
| github.com/docker/libtrust | ||
| # github.com/eranco74/cluster-api-provider-agent/api v0.0.0-20211202085429-b633556695e5 | ||
| ## explicit; go 1.16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#703 bumped to 1.17 - does this need updating?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TMU this means that this cluster-api-provider-agent version specifies go 1.16 in go.mod.
It shouldn't affect anything (see other packages below with explicit older go versions).
We will bump the cluster-api-provider-agent to 1.17 and vendor again once we have some functionality update
| </td> | ||
| </tr><tr><td><p>"Agent"</p></td> | ||
| <td><p>AgentPlatform represents user supplied insfrastructure booted with agents.</p> | ||
| </td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need some docs somewhere that describe the management cluster dependencies that allow the agent provider to work (I'm not sure exactly where, but without prior knowledge this doc isn't self-explanatory like the cloud provider platforms)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I opened a task on that
alvaroaleman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least some docs on how to use this and ideally an e2e test are also needed
| return "", fmt.Errorf("unable to fetch node objects: %w", err) | ||
| } | ||
| if len(nodes.Items) < 1 { | ||
| return "", fmt.Errorf("no node objects found: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The err here is always nil, as you return before if its non-nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code was moved from the None platform as-is for reuse by the Agent platform, I'd rather not change it in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The err here is always nil, as you return before if its non-nil
Why though? would .List though an error if nodes.Items is of len 0? If so +1 to address separately and keep this PR focus on the multi-platform structure.
| }, | ||
| }, | ||
| } | ||
| specHash := hashStruct(agentMachineTemplate.Spec.Template.Spec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that if the spec ever gets changed (Doesn't happen today but might in the future?) we will create a new template, is that intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied this from AWSMachineTemplate in the same file, I assumed this is how it was intended in HyperShift
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any change to the nodePool platform specifics it's propagated and result in a new template being created so rolling upgrades can be supportd. This is by design atm. Template cleanup needs to be implemented in a platform agnostic way aside from this PR #760 (comment)
| span.AddEvent("reconciled awsmachinetemplate", trace.WithAttributes(attribute.String("name", machineTemplate.GetName()))) | ||
| case hyperv1.AgentPlatform: | ||
| machineTemplate = AgentMachineTemplate(nodePool, controlPlaneNamespace) | ||
| if err := r.Get(ctx, client.ObjectKeyFromObject(machineTemplate), machineTemplate); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use CreateOrUpdate? The current name-changing approach means that we orphan old templates on config change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I copied this from AWS, see reconcileAWSMachineTemplate() in the same file. It does get and create.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Templates can't just be changed inline as that would break rolling upgrades. This needs to be solved in a provider agnostic way and it is tracked here https://github.com/openshift/hypershift/blob/main/hypershift-operator/controllers/nodepool/nodepool_controller.go#L606
Let's tackle this aside from this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need to be consolidated with some of the logic it's in reconcileAWSMachineTemplate
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
Outdated
Show resolved
Hide resolved
hypershift-operator/controllers/hostedcluster/internal/platform/agent/agent.go
Show resolved
Hide resolved
|
/hold all comments addressed, testing to validate |
Also please squash/rephrase "Fix review comments" commit. |
6a13985 to
fcc4b2f
Compare
go.mod
Outdated
| github.com/clarketm/json v1.14.1 | ||
| github.com/coreos/ignition/v2 v2.10.1 | ||
| github.com/docker/distribution v2.7.1+incompatible | ||
| github.com/eranco74/cluster-api-provider-agent/api v0.0.0-20211202085429-b633556695e5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we now update to use https://github.com/openshift/cluster-api-provider-agent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
|
lgtm other than #760 (comment). |
The Agent platform uses cluster-api-provider-agent to match existing Agent resources with Machines. This is a platform-agnostic way of adding workers.
|
/lgtm |
|
As mentioned above, this is just an enabler development step towards eventually support an agent based workflow which will require docs and e2e testing. /approve |
|
/cancel hold |
|
/hold cancel |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: avishayt, enxebre The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
6 similar comments
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
@avishayt: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
The Agent platform uses cluster-api-provider-agent to match existing Agent resources with Machines. This is a platform-agnostic way of adding workers.