Adds Managed APIServer Loadbalancer #401
Conversation
I'll test it today or tomorrow again against CoreOS. P.S.: I'll also check what's wrong with the tests.
Force-pushed from 6025adc to b2c47ed
I will have to rebase this on top of #404 when it's merged.
Cherry-picked a0919d3 for now.
Force-pushed from 91a5c68 to 5696f1d
I tested the following

I don't have an env with a LB at hand, so I have to read the code; no manual test locally ...
Force-pushed from 5696f1d to 94ee9e4
Review comments fixed. PTAL @chrigl
```go
// ManagedAPIServerLoadBalancer defines whether a LoadBalancer for the
// APIServer should be created. If set to true the following properties are
// mandatory: APIServerLoadBalancerFloatingIP, APIServerLoadBalancerPort
```
Why is the floating IP field necessary when this field is true, if the comment for that field indicates it will be created when not specified?
I'm not sure if I get it :). If this field is true, that means an APIServerLoadBalancer should be created. If an APIServerLoadBalancer should be created, we need a FloatingIP to associate with the LB.
The comment on the floating IP field doesn't say it will be created if not specified. It says the FloatingIP will be created in OpenStack if it doesn't already exist there. But you have to set the value.
Oh, I understand. The field is always necessary then, right? Thanks for clarifying.
APIServerLoadBalancerFloatingIP is necessary if ManagedAPIServerLoadBalancer is set to true.
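To summarize the contract in one place, here is a rough sketch of the spec fields involved (field names are taken from this thread; the exact types and comments are my assumption, not the authoritative definition from this PR):

```go
type OpenstackClusterProviderSpec struct {
	// ManagedAPIServerLoadBalancer defines whether a LoadBalancer for the
	// APIServer should be created.
	ManagedAPIServerLoadBalancer bool

	// APIServerLoadBalancerFloatingIP must be set by the user when
	// ManagedAPIServerLoadBalancer is true. The floating IP is created in
	// OpenStack if it doesn't exist there yet, but the value itself is mandatory.
	APIServerLoadBalancerFloatingIP string

	// APIServerLoadBalancerPort is the port of the APIServer listener.
	APIServerLoadBalancerPort int

	// APIServerLoadBalancerAdditionalPorts allows additional listeners (e.g. 22).
	APIServerLoadBalancerAdditionalPorts []int
}
```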
```go
	klog.V(3).Info("No need to create loadbalancer, due to missing ExternalNetworkID")
	return nil
}
if clusterProviderSpec.APIServerLoadBalancerFloatingIP == "" {
```
So the comment about the floating IP getting created if not provided is false then?
Oh wait, is it because the floating IP field would start out as nil, so if it's empty then something else didn't go right when setting it?
The idea is: you configure that the APIServerLoadBalancer should be managed. Only in this case is this method used. Then it's also required that the FloatingIP is set.
```go
lbID := clusterProviderStatus.Network.APIServerLoadBalancer.ID
subnetID := clusterProviderStatus.Network.Subnet.ID
```
```go
for _, port := range []int{22, 6443} {
	// ...
}
```
Maybe make these constants so there are no magic numbers, even if it's apparent what they're for. I'd say 6443 is less obvious if someone is unfamiliar with CAPI.
Oh I actually added properties for this, just forgot to use them. Thx :)
So I will iterate over: APIServerLoadBalancerPort & APIServerLoadBalancerAdditionalPorts
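A sketch of what iterating over those spec fields could look like (hypothetical variable names, assuming the fields discussed above; not necessarily the exact code in this PR):

```go
// Derive the listener ports from the spec instead of hard-coding 22 and 6443.
ports := []int{clusterProviderSpec.APIServerLoadBalancerPort}
ports = append(ports, clusterProviderSpec.APIServerLoadBalancerAdditionalPorts...)

for _, port := range ports {
	// create the listener, pool and members for this port ...
}
```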
```go
func waitForLoadBalancer(networkingClient *gophercloud.ServiceClient, id, target string) error {
	klog.Infof("Waiting for loadbalancer %s to become %s.", id, target)
```
What are all the possible values of target? A link to the documentation would be helpful.
Good point. I'll add this link: https://developer.openstack.org/api-ref/network/v2/?expanded=show-load-balancer-status-tree-detail#load-balancer-statuses
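For illustration, a minimal sketch of such a wait loop using gophercloud's neutron lbaas_v2 bindings (the poll interval and timeout are my assumptions, not values from this PR):

```go
import (
	"fmt"
	"time"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/networking/v2/extensions/lbaas_v2/loadbalancers"
	"k8s.io/klog"
)

// waitForLoadBalancer polls the load balancer until its provisioning status
// reaches the given target, e.g. "ACTIVE" (see the statuses documented in the
// link above).
func waitForLoadBalancer(networkingClient *gophercloud.ServiceClient, id, target string) error {
	klog.Infof("Waiting for loadbalancer %s to become %s.", id, target)
	start := time.Now()
	for time.Since(start) < 5*time.Minute {
		lb, err := loadbalancers.Get(networkingClient, id).Extract()
		if err != nil {
			return fmt.Errorf("could not get loadbalancer %s: %v", id, err)
		}
		if lb.ProvisioningStatus == target {
			return nil
		}
		time.Sleep(10 * time.Second)
	}
	return fmt.Errorf("timeout waiting for loadbalancer %s to become %s", id, target)
}
```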
| klog.Infof("Returning IP from machine annotation %s", ip) | ||
| return ip, nil | ||
| clusterProviderSpec, err := providerv1.ClusterSpecFromProviderSpec(cluster.Spec.ProviderSpec) | ||
| if err != nil { |
How do you know this error will always be because the IP isn't there?
The idea is not to say that the error in ClusterSpecFromProviderSpec happened because we could not get the IP. The idea is to concatenate the error from ClusterSpecFromProviderSpec with "could not get IP" to provide more context to the caller, which is a usual Go pattern to build stack-trace-like error messages.
If something like this isn't done, you get some low-level error in the log (e.g. "could not parse whatever") but you have no idea in which context something failed, or what failed.
Oh, so the returned error will be shown in a greater context by the caller, is that what you're saying? How do you know that's the case here? I assume because you're familiar with what calls the functions in deployer.go and that's the typical pattern? Thanks for the explanation.
It's just a typical Go pattern. The error is returned up the call stack and more and more context is added, so you get a useful error message when you print it out. If you don't do this, you'll end up with errors like "error parsing yaml" without any context, and you have no idea what went wrong.
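To illustrate the pattern on the quoted code (the exact error message is hypothetical; GetIP returning a string and an error matches the excerpt above):

```go
clusterProviderSpec, err := providerv1.ClusterSpecFromProviderSpec(cluster.Spec.ProviderSpec)
if err != nil {
	// Wrap the low-level error with the current operation's context, so the
	// caller sees "could not get IP: <underlying parse error>" instead of a
	// bare parsing error with no context.
	return "", fmt.Errorf("could not get IP: %v", err)
}
```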
@CamelCaseNotation PTAL
Force-pushed from 119e256 to 4e7408f
This commit adds the optional managed APIServer LoadBalancer feature. If the feature is enabled, a LoadBalancer for the control plane node(s) is created. The cluster actuator creates the LoadBalancers and all corresponding objects. The machine actuator creates and removes LoadBalancer members for control plane nodes.
Force-pushed from 4e7408f to e4ae3c5
Rebased onto master including the IP PR. PTAL @jichenjc
I like the approach. I know the LoadBalancer stuff, but I need some more time.
I can use this PR to create a cluster locally, thanks for the update ~
Thx for testing this :). I'll re-test it this evening in my environment.

Sorry, I meant for the LB case; my env doesn't have Octavia installed... By the way, I assume your current implementation is based on Octavia instead of LBaaS v1, right?
Hi, I tested this PR manually with Octavia. I deployed a master node. A load balancer with a floating IP was created, but the server address in the kubeconfig was the master's floating IP address. Is my configuration wrong?
Hi, I only have Neutron in my environment, no Octavia, so it's implemented based on that. (I wasn't aware it makes a difference.) @hidekazuna Hm, good question. I'm deploying the machines without a floating IP. The code should return the APIServerLoadBalancerFloatingIP if ManagedAPIServerLoadBalancer is true: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/401/files#diff-266c339b51cf18a076cf106526357c53R32
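Roughly, the logic described here would be (a sketch based on this comment, not a verbatim quote of the linked diff):

```go
// When the APIServer LoadBalancer is managed, clients should talk to the
// LB's floating IP, not to an individual machine's address.
if clusterProviderSpec.ManagedAPIServerLoadBalancer {
	return clusterProviderSpec.APIServerLoadBalancerFloatingIP, nil
}
// ... otherwise fall back to the machine's IP / annotation as before.
```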
@sbueringer I guess this might be a big question... https://wiki.openstack.org/wiki/Neutron/LBaaS/Deprecation you might know that this is already deprecated, and https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/using-octavia-ingress-controller.md has additional info on gophercloud for Octavia. @chrigl @hidekazuna can you provide some insight on this? I am ok to continue this way and move to Octavia later on, but I hope folks can be on the same page.
@jichenjc Yes, neutron-lbaas is deprecated and Octavia became the reference implementation for Neutron LBaaS version 2. I confirmed using Octavia for a Service of type LoadBalancer on Cloud Provider OpenStack: https://www.hidekazuna.org/?p=983 I think we do not need an ingress controller for our purpose.
Do we need something like the Octavia switch in the Terraform OpenStack provider? https://github.com/terraform-providers/terraform-provider-openstack/blob/77673807914733f1ebe536aafb7521c8189f2555/openstack/lb_v2_shared.go#L30
I think we might be able to support that, so go with LBaaS v1 now and do something to support Octavia later... Octavia is the future, but at least we have a reference solution now.
I agree. I just read the docs: neutron-lbaas is deprecated, but it currently implements LBaaS v2 (the same as Octavia), see https://wiki.openstack.org/wiki/Neutron/LBaaS/Deprecation. But Octavia is a superset of the Neutron APIs.
I'm not sure, but I think we can support Octavia by implementing a switch similar to the one I linked above from Terraform.
Unfortunately, I cannot test it because I don't have access to Octavia right now, and I will be stuck on neutron-lbaas for a while :/
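Such a switch could look roughly like the following with gophercloud (a sketch only; the useOctavia flag is hypothetical here, analogous to the Terraform option, and this is not part of this PR):

```go
import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack"
)

// newLoadBalancerClient returns an Octavia client when useOctavia is set and
// falls back to the Neutron networking endpoint (neutron-lbaas) otherwise.
func newLoadBalancerClient(provider *gophercloud.ProviderClient, region string, useOctavia bool) (*gophercloud.ServiceClient, error) {
	if useOctavia {
		return openstack.NewLoadBalancerV2(provider, gophercloud.EndpointOpts{Region: region})
	}
	return openstack.NewNetworkV2(provider, gophercloud.EndpointOpts{Region: region})
}
```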
Okay, I just re-tested the current version on Neutron LBaaS v2 on VIO5 (Queens, in my case based on NSX-V). How should we progress with this PR? @jichenjc @hidekazuna
@sbueringer I think we can merge if this PR works for Neutron LBaaS v2 on VIO5. After that (or in parallel), I'll test Octavia and fix things if needed.
@sbueringer I tested this PR with the following YAMLs, but clusterapi-controllers-0 ended with a panic: http://paste.openstack.org/show/755147/

```
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x10948ab]
goroutine 182 [running]:
sigs.k8s.io/cluster-api-provider-openstack/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x105
panic(0x13037c0, 0x232c050)
/usr/local/go/src/runtime/panic.go:522 +0x1b5
sigs.k8s.io/cluster-api-provider-openstack/pkg/cloud/openstack/cluster.(*Actuator).storeCluster(0xc000467e50, 0xc0000cd040, 0xc0000cd520, 0xc0000f4dc0, 0xc0006fa900, 0x22c860f, 0xb)
/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/cloud/openstack/cluster/actuator.go:229 +0x28b
sigs.k8s.io/cluster-api-provider-openstack/pkg/cloud/openstack/cluster.(*Actuator).Reconcile.func1(0xc000467e50, 0xc0000cd040, 0xc0000cd520, 0xc0000f4dc0, 0xc0006fa900)
/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/cloud/openstack/cluster/actuator.go:83 +0x74
sigs.k8s.io/cluster-api-provider-openstack/pkg/cloud/openstack/cluster.(*Actuator).Reconcile(0xc000467e50, 0xc0000cd040, 0x0, 0x0)
/go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/cloud/openstack/cluster/actuator.go:148 +0x8b2
sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/cluster-api/pkg/controller/cluster.(*ReconcileCluster).Reconcile(0xc00026a1b0, 0xc0000298a0, 0x7, 0xc000029890, 0x5, 0x2340180, 0x0, 0x0, 0x0)
/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/cluster-api/pkg/controller/cluster/cluster_controller.go:141 +0x39b
sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0002c0000, 0x0)
/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215 +0x1cc
sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1()
/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158 +0x36
sigs.k8s.io/cluster-api-provider-openstack/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0005b3630)
/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
sigs.k8s.io/cluster-api-provider-openstack/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0005b3630, 0x3b9aca00, 0x0, 0x1, 0xc0004b72c0)
/go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xf8
sigs.k8s.io/cluster-api-provider-openstack/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc0005b3630, 0x3b9aca00, 0xc0004b72c0)
```
OK, let's wait for @hidekazuna's issue to be fixed first, then we can go ahead and merge.
@hidekazuna I pushed a fix for the panic. I guess we'll now see the Octavia error, which can probably only be fixed by implementing Octavia handling. Shouldn't there be some kind of auto-discovery possible? (e.g. something like
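For context, the stack trace above points at a nil dereference on the status in storeCluster; the usual fix is a guard before touching nested fields, along these lines (a sketch using the standard errors package, not the actual diff):

```go
// Guard against a partially populated status, e.g. before the network or
// the load balancer has been reconciled.
if clusterProviderStatus.Network == nil || clusterProviderStatus.Network.APIServerLoadBalancer == nil {
	return errors.New("network is not yet available in clusterProviderStatus")
}
lbID := clusterProviderStatus.Network.APIServerLoadBalancer.ID
```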
@sbueringer Thanks for fixing. But the load balancer was not created and the k8s cluster came up. My bootstrap machine is in the same project as the target cluster, so even though the floating IP is not attached, the bootstrap machine can get the kubeconfig. The error messages are the following:

```
E0801 02:17:16.592773 1 actuator.go:427] Machine error openstack-master-m6hgj: Reconcile LoadBalancer Member err: network is not yet available in clusterProviderStatus
W0801 02:17:16.592839 1 machine_controller.go:226] Failed to create machine "openstack-master-m6hgj": Reconcile LoadBalancer Member err: network is not yet available in clusterProviderStatus
```

Apart from the quick fix, I studied cloud-provider-openstack. As we already know, cloud-provider-openstack has the UseOctavia option. It creates the gophercloud.ServiceClient depending on the option. cloud-provider-openstack has five switch cases on the UseOctavia option in pkg/cloudprovider/providers/openstack/openstack_loadbalancer.go. I think it is inevitable to use the UseOctavia option.
@hidekazuna Sounds good to me.
OK, let's merge this.

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jichenjc, sbueringer

The full list of commands accepted by this bot can be found here. The pull request process is described here.
* Adds Managed APIServer Loadbalancer

  This commit adds the optional managed APIServer LoadBalancer feature. If the feature is enabled, a LoadBalancer for the control plane node(s) is created. The cluster actuator creates the LoadBalancers and all corresponding objects. The machine actuator creates and removes LoadBalancer members for control plane nodes.

* add check for lb member
* fix panic
This commit adds the optional managed APIServer LoadBalancer feature. If the feature is enabled, a LoadBalancer for the control plane node(s) is created. The cluster actuator creates the LoadBalancers and all corresponding objects. The machine actuator creates and removes LoadBalancer members for control plane nodes.
What this PR does / why we need it:
This PR is also a prereq for the implementation of a multi-node control plane.

Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Implements part of #382