-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ support ability to move custom objects in clusterctl #3337
Conversation
Hi @rsmitty. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@wfernandes @vincepri @fabriziopandini I'm still working through some tests and validating locally, but here's a first cut for you to look at on the "clusterctl move" feature we talked about yesterday. |
/ok-to-test |
5262434
to
d92bc61
Compare
/assign @fabriziopandini @wfernandes |
/milestone v0.3.x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rsmitty thanks for this new version of this PR!
The PR looks good apart from some nits.
However, there are two points I would like to clarify with you:
- It is possible that the external CRD is defining global objects? Because if this is the case there is a set of considerations that should be addressed: What happens if they are used by two namespaces? Should we leave them in the original cluster? What happens if they get moved two times?
- What is the contract around moving objects which are not related to Cluster API provider? What happens if we are moving an object while it is being reconciled?
@fabriziopandini I've fixed all the naming nits in an amended commit. Gonna resolve those. To talk about questions:
It's definitely possible that the external CRD is global. That's what we're doing. Servers and ServerClasses are cluster-wide and can be associated with one or more clusters. We handle setting "in-use" labels on these resources to keep them from getting used my clusters at the same time. That said, I think we agreed the other day that cluster-wide objects shouldn't get deleted from the source cluster. I'll look into making that happen, as it seems like the safest option. Getting moved two times I don't think should happen unless I'm misunderstanding. We do checks to de-dup the nodes in the object graph.
Great question and I'm not sure I have a good answer. Do we have any kind of way to know a resource is being actively reconciled? If not, I'm not sure we can provide anything other than best effort moving of these resources. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rsmitty! I noticed that when running the tests locally I would see some interesting failures. In clusterctl/client/cluster
when I run go test ./... -count=3
I see failures such as
--- FAIL: Test_getMoveSequence (0.01s)
--- FAIL: Test_getMoveSequence/A_ClusterResourceSet_applied_to_a_cluster (0.00s)
mover_test.go:447:
Expected
<[]string | len:1, cap:1>: [
"addons.cluster.x-k8s.io/v1alpha3, Kind=ClusterResourceSet, ns1/crs1",
]
to consist of
<[]interface {} | len:1, cap:1>: [
[
"cluster.x-k8s.io/v1alpha3, Kind=Cluster, ns1/cluster1",
"addons.cluster.x-k8s.io/v1alpha3, Kind=ClusterResourceSet, ns1/crs1",
],
]
the missing elements were
<[]interface {} | len:1, cap:1>: [
"cluster.x-k8s.io/v1alpha3, Kind=Cluster, ns1/cluster1",
]
--- FAIL: Test_objectMover_move (0.02s)
--- FAIL: Test_objectMover_move/A_ClusterResourceSet_applied_to_a_cluster (0.00s)
mover_test.go:506: ns1/cluster1 not deleted in source cluster
mover_test.go:510: error = Object 'Kind' is missing in 'unstructured object has no kind' when checking for ns1/cluster1 deleted in source cluster
mover_test.go:506: ns1/cluster1 not deleted in source cluster
mover_test.go:506: ns1/cluster1-ca not deleted in source cluster
mover_test.go:506: ns1/cluster1-kubeconfig not deleted in source cluster
FAIL
FAIL sigs.k8s.io/cluster-api/cmd/clusterctl/client/cluster 1.099s
FAIL
Not sure what's going on here though.
continue | ||
} | ||
if !apierrors.IsNotFound(err) { | ||
if oFrom.GetNamespace() != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if oFrom.GetNamespace() != "" { | |
if err== nil && oFrom.GetNamespace() != "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I can quite do this. I tried it at first. If err is nil and there's no namespace for the object, it won't meet this condition and will fail in the condition below with the error mover_test.go:508: error = <nil> when checking for /externalTest deleted in source cluster
@wfernandes I know, I've been seeing that error in local testing too. It's weird because it's flaky. I can run it multiple times and get diff behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rsmitty thanks for updating for global object management.
Getting moved two times I don't think should happen unless I'm misunderstanding.
Sorry for my poor explanation. The use case is:
- I have a source cluster with CAPI objects in ns1 and ns2, both referencing a global object Server1
- move --namespace ns1; Server1 get copied in target cluster, but it remains in the source Cluster as well (Server1 now exists both in the source and in the target )
- Server1 is changed in source (or in target), so the two version of Server1 are dis-aligned
- move --namespace ns1; Server1 should be copied in target cluster, but the version are dis-aligned.
How do we manage conflict resolution?
if err != nil { | ||
return err | ||
} | ||
|
||
// Discovery the object graph for the selected types: | ||
// - Nodes are defined the Kubernetes objects (Clusters, Machines etc.) identified during the discovery process. | ||
// - Edges are derived by the OwnerReferences between nodes. | ||
if err := objectGraph.Discovery(namespace, types); err != nil { | ||
if err := objectGraph.Discovery(namespace); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be a topic for a follow-up PR/something to be tracked in a issue, but once we determine the external CRD to move, it would be great to check if they exist on the target cluster (we are already doing this for the providers in checkTargetProviders)
if _, ok := crd.Labels[clusterctlv1.ClusterctlMoveLabelName]; ok { | ||
o.types[getKindAPIString(typeMeta)].forceMove = true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
move this above L227 and change it into
if _, ok := crd.Labels[clusterctlv1.ClusterctlMoveLabelName]; ok { | |
o.types[getKindAPIString(typeMeta)].forceMove = true | |
} | |
if _, ok := crd.Labels[clusterctlv1.ClusterctlMoveLabelName]; ok { | |
typeMeta.forceMove = true | |
} |
Updated to address @fabriziopandini's nits. Still working on the test flakiness, as well as pondering on your question about conflict resolution. Not sure we've got a good answer for that yet. |
I'm not sure there's a great way to manage divergence between two global resources in two different clusters. We do, at least, already check to see if a resource already exists in the target cluster during the move step. Maybe we could refuse to update global resources during that check? But then again, it may be that I'm misunderstanding the longer term goals of the clusterctl move command. We're really wanting to use it specifically for bootstrap => initial management plane porting, as well as the occasional need to move the management plane to new servers or something. It's something I've not really thought about happening regularly and I've been assuming the source cluster essentially gets torn down after the move happens, so divergence wouldn't really matter. |
@rsmitty The long term vision for Hope that helps! |
@rsmitty @vincepri I have opened #3354 to discuss the log term vision for In the future, a possible solution to properly handle external resources (not CAPI resources) is to implement a set of hooks/a plugin mechanism so clusterctl can trigger resource specific tasks, but this requires some more design |
/milestone v0.3.9 |
@fabriziopandini I think that makes a lot of sense and I'd like to see this landed as soon as we're all comfortable with it. I added a note in the docs that specifically point out that global resources are copied, but not deleted from the source cluster. |
1256057
to
76e813f
Compare
There's something flaky going on with the mover_test and the ClusterResourceSet that I can not figure out for the life of me. I don't think it's necessarily an issue with this PR, but I'd love another set of eyes on it if you get some free time @fabriziopandini. |
I'm trying to reproduce errors locally |
This PR will add the ability for anything in clusterctl's object graph that is part of a CRD with the label `clusterctl.cluster.x-k8s.io/move: ""` to also be eligible for moving. This allows for providers to explicitly tag their own custom resources and ensure that they're also processed as part of `clusterctl move` Signed-off-by: Spencer Smith <[email protected]>
Cool, looks like we're passing tests in CI now (as well as locally). Thanks @fabriziopandini! Hopefully we're ready to go with this PR now. |
/lgtm |
I ran all the tests locally and nothing else stood out. |
Unfortunately, I don't have approver access. I'll defer to @vincepri 🙂 |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: fabriziopandini, wfernandes The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This PR will add the ability for anything in clusterctl's object graph
that is part of a CRD with the label
clusterctl.cluster.x-k8s.io/move: ""
to also be eligible for moving.This allows for providers to explicitly tag their own custom resources
and ensure that they're also processed as part of
clusterctl move
Signed-off-by: Spencer Smith [email protected]
Fixes #3329