Upgrade CAPO version to v0.12.2#152

Closed
Dmitriy Rabotyagov (noonedeadpunk) wants to merge 10 commits into
vexxhost:mainfrom
noonedeadpunk:feature/capo_0.12.2

Conversation

@noonedeadpunk
Contributor

@noonedeadpunk Dmitriy Rabotyagov (noonedeadpunk) commented Apr 7, 2025

CAPO v0.11.2 has a severe bug that lets any tenant cause a
Denial of Service.

If a tenant manually removes a VM that is managed by CAPO, a pod
ends up in a crash loop. This has been fixed in [1] and is part
of the 0.12.2 release.

[1] kubernetes-sigs/cluster-api-provider-openstack#2477

@noonedeadpunk
Contributor Author

recheck

@noonedeadpunk
Contributor Author

OSError: [Errno 24] Too many open files in the linters job seems unrelated

@noonedeadpunk Dmitriy Rabotyagov (noonedeadpunk) marked this pull request as draft April 7, 2025 16:55
@noonedeadpunk
Contributor Author

So, in this CAPO version kind: Image is gone. I'd guess it also needs a more modern CAPI or something...

@noonedeadpunk
Contributor Author

Yeah, ok, it's not the CAPI version, it's the missing ORC, which was split into a separate project. Doh.
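
For anyone hitting the same thing: the failure quoted in this PR's commit messages is `no matches for kind "Image" in version "openstack.k-orc.cloud/v1alpha1"`. A minimal sketch of how such a message could be picked apart to tell which API group is missing is below. The helper names `parse_missing_kind` and `is_orc_kind` are hypothetical and not part of CAPO, CAPI, or ORC; the only facts taken from this thread are the message text and the `openstack.k-orc.cloud` group.

```python
import re

# Matches the "no matches for kind" error text, with or without
# backslash-escaped quotes (as it appears when embedded in JSON logs).
_NO_MATCH = re.compile(
    r'no matches for kind \\?"(?P<kind>[^"\\]+)\\?" '
    r'in version \\?"(?P<api_version>[^"\\]+)\\?"'
)


def parse_missing_kind(message):
    """Return (kind, group, version) from the error text, or None if absent."""
    match = _NO_MATCH.search(message)
    if match is None:
        return None
    # "group/version" for grouped APIs, plain "version" for the core API.
    group, _, version = match.group("api_version").rpartition("/")
    return match.group("kind"), group, version


def is_orc_kind(message):
    """True when the missing kind belongs to the ORC API group."""
    parsed = parse_missing_kind(message)
    return parsed is not None and parsed[1] == "openstack.k-orc.cloud"


error = 'no matches for kind "Image" in version "openstack.k-orc.cloud/v1alpha1"'
print(parse_missing_kind(error))
print(is_orc_kind(error))
```

If `is_orc_kind` returns true, the missing piece is most likely the ORC CRDs rather than the CAPI version.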

@mnaser
Member

Dmitriy Rabotyagov (@noonedeadpunk) could we get away with bumping to latest 0.11.x which might have the fix?

@noonedeadpunk
Contributor Author

Mohammed Naser (@mnaser) this is the first thing I checked, and unfortunately it's not there as of today. We could probably attempt backporting to 0.11, but I'm not that confident about the stable policy there :(

@mnaser
Member

Ah, the team is pretty flexible at backporting things especially if it's a crash. One moment.

@noonedeadpunk
Contributor Author

Oops, just realized I never added a fix, here it is: kubernetes-sigs/cluster-api-provider-openstack#2477

I'm also looking at what it would take to install ORC, as I'd guess sooner or later this needs to be done anyway.

@mnaser
Member

I pushed kubernetes-sigs/cluster-api-provider-openstack#2507

I'll ping folks for a review and hopefully we can get that landed, would still need a release :(

@mnaser
Member

> I'm also looking at what it would take to install ORC, as I'd guess sooner or later this needs to be done anyway.

I think the best way to go about this is to go over the install instructions on a normal Kind cluster and then see how to "replicate" this into the playbook.

@noonedeadpunk
Contributor Author

fwiw, regarding the Rocky failures in molecule here: we've seen the same failures, caused by AppArmor blocking PAM inside Docker containers running EL when the host is running Ubuntu 24.04. It affects become and/or SSH.

For SSH the workaround was to comment out UsePAM, but for become we just dropped become from the role...

@noonedeadpunk
Contributor Author

Ok, so I was able to spawn a healthy cluster with this PR in:

~# openstack coe cluster show 1458e73e-2440-4aff-a57e-37d7acb46c2f -c created_at -c status -c health_status -c labels -c coe_version -c labels_added -c health_status_reason
+----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field                | Value                                                                                                                                                                                                             |
+----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| status               | CREATE_COMPLETE                                                                                                                                                                                                   |
| health_status        | HEALTHY                                                                                                                                                                                                           |
| created_at           | 2025-04-07T19:46:37+00:00                                                                                                                                                                                         |
| coe_version          | v1.31.1                                                                                                                                                                                                           |
| labels               | {'cloud_provider_enabled': 'True', 'kube_tag': 'v1.31.1', 'calico_tag': 'v3.29.0', 'octavia_provider': 'amphorav2', 'octavia_lb_algorithm': 'SOURCE_IP_PORT', 'availability_zone': 'az1', 'auto_scaling_enabled': |
|                      | 'False', 'auto_healing_enabled': 'False', 'master_lb_floating_ip_enabled': 'True', 'kube_dashboard_enabled': 'True', 'ingress_controller': 'octavia'}                                                             |
| labels_added         | {'availability_zone': 'az1', 'auto_scaling_enabled': 'False', 'auto_healing_enabled': 'False', 'master_lb_floating_ip_enabled': 'True', 'kube_dashboard_enabled': 'True', 'ingress_controller': 'octavia'}        |
| health_status_reason | {'kube-pldql-default-worker-t4hhs-24zr6-59h7k.Ready': 'True', 'kube-pldql-default-worker-t4hhs-24zr6-7wqc2.Ready': 'True', 'kube-pldql-gx976-jgs24.Ready': 'True'}                                                |
+----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
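
As a side note, the `health_status_reason` value above can be checked programmatically rather than by eye. The following is a rough sketch, assuming the value is a Python-literal dict string exactly as printed in the table (a JSON-formatted CLI output would need `json.loads` instead). The function name `not_ready_nodes` is hypothetical.

```python
import ast


def not_ready_nodes(health_status_reason):
    """Return sorted node names whose '<node>.Ready' entry is not 'True'."""
    # Accept either the printed string form or an already-parsed dict.
    if isinstance(health_status_reason, str):
        reasons = ast.literal_eval(health_status_reason)
    else:
        reasons = health_status_reason
    suffix = ".Ready"
    return sorted(
        key[: -len(suffix)]
        for key, value in reasons.items()
        if key.endswith(suffix) and str(value) != "True"
    )


reason = (
    "{'kube-pldql-default-worker-t4hhs-24zr6-59h7k.Ready': 'True', "
    "'kube-pldql-default-worker-t4hhs-24zr6-7wqc2.Ready': 'True', "
    "'kube-pldql-gx976-jgs24.Ready': 'True'}"
)
print(not_ready_nodes(reason))  # prints []
```

An empty list means every node listed in the reason reports Ready, which matches the HEALTHY status shown.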

Though it adds a new required variable: cluster_api_openstack_controller_version: 2.0.3

Mohammed Naser (@mnaser) With that, I was wondering: how is CHANGELOG.md managed? Manually, or automated from some fragments?

@noonedeadpunk
Contributor Author

The upgrade job seems legitimately broken :(

https://zuul.atmosphere.vexxhost.dev/build/5d0e70976ea648d9ab3dd9d548995378

Comment thread roles/cluster_api/tasks/patch.yml Outdated
Comment thread roles/cluster_api/tasks/patch.yml
Comment thread roles/cluster_api/tasks/main.yml Outdated
@noonedeadpunk
Contributor Author

Regarding ansible-test: GitHub Actions does install the requirements for /opt/hostedtoolcache/Python/3.10.16/x64, but then ansible-test units tries to execute through /usr/bin/python3.12
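
To make the mismatch concrete, here is a small sketch that compares the Python version embedded in the two paths above. It only inspects path strings and does not run either interpreter. The helper names `python_minor` and `same_interpreter_version` are hypothetical, and the assumption is that both paths carry a `major.minor` version after `python` or `Python/`.

```python
import re


def python_minor(path):
    """Extract (major, minor) from an interpreter or toolcache path, or None."""
    match = re.search(r"[Pp]ython/?(\d+)\.(\d+)", path)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def same_interpreter_version(install_path, exec_path):
    """True only when both paths name the same major.minor Python version."""
    installed = python_minor(install_path)
    executed = python_minor(exec_path)
    return installed is not None and installed == executed


install_path = "/opt/hostedtoolcache/Python/3.10.16/x64"
exec_path = "/usr/bin/python3.12"
print(python_minor(install_path), python_minor(exec_path))
print(same_interpreter_version(install_path, exec_path))
```

Here the requirements land under 3.10 while the tests run under 3.12, so the packages installed for one interpreter are not visible to the other.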

@yaguangtang
Member

Dmitriy Rabotyagov (@noonedeadpunk) I have fixed the CI issue

@noonedeadpunk
Contributor Author

Would be really nice to get some reviews/progress on this one...

@noonedeadpunk
Contributor Author

Any updates?

CAPO v0.11.2 has a severe bug that lets any tenant cause a
Denial of Service.

If a tenant manually removes a VM that is managed by CAPO, a pod
ends up in a crash loop. This has been fixed in [1] and is part
of the 0.12.2 release.

Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
A more modern CAPO also requires a corresponding CAPI, otherwise
VM creation fails with:
`no matches for kind "Image" in version "openstack.k-orc.cloud/v1alpha1"`

Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
CAPO 0.12.0 has removed ORC [1], so it now needs to be installed
separately.

[1] https://github.com/kubernetes-sigs/cluster-api-provider-openstack/releases/tag/v0.12.0

Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
Mohammed Naser (mnaser) and others added 5 commits July 10, 2025 12:02
Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
* feat: allow setting capo instance creation timeout

Signed-off-by: Tadas Sutkaitis <tadasas@gmail.com>

* fix: license and rename variable

Signed-off-by: Tadas Sutkaitis <tadasas@gmail.com>

* fix: patch using native kubernetes module

Signed-off-by: Tadas Sutkaitis <tadas.sutkaitis@vexxhost.com>

---------

Signed-off-by: Tadas Sutkaitis <tadasas@gmail.com>
Signed-off-by: Tadas Sutkaitis <tadas.sutkaitis@vexxhost.com>
Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
Signed-off-by: Dong Ma <dong.ma@vexxhost.com>
Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
@noonedeadpunk
Contributor Author

omfg... Adding DCO seems to have pulled in quite a few unrelated changes with the rebase... I have no idea how to resolve that in GitHub tbh at this point...

@noonedeadpunk
Contributor Author

recheck - Error: etcdserver: request timed out

@noonedeadpunk
Contributor Author

recheck

@noonedeadpunk
Contributor Author

Dmitriy Rabotyagov (noonedeadpunk) commented Jul 10, 2025

Closing in favor of #165 due to the DCO mess-up
