Rebase on to CAPO main #263
Conversation
This avoids an extremely surprising write-after-read consistency bug when we later return after making subsequent object changes. The issue stems from the fact that we don't write to the same place we read from: we read from a cache (a shared informer populated from a watch against the apiserver), but we write directly to the apiserver rather than through the cache.

If we patch the watched object without returning, then while we continue to execute, the change is written to the apiserver and eventually propagates to our own shared informer. This queues a new reconcile for this version of the object, which now has a finalizer but no other changes. Taking the case of the machine controller, we then go on to create a server, patch the object with a providerID, and return. On return we write this change to the apiserver, which eventually propagates to our shared informer and results in another reconcile. However, our previous patch has already propagated, so we are *immediately* called again with the *old version* of the object. If we have been careful this is hopefully no worse than inefficient, but it can lead to very hard-to-debug errors.

The safe way to patch objects when reconcile is called from an informer is to always return after patching an object, which will directly result in another reconcile. We also *must not* set Requeue in the returned Result, as this would result in being called again with the stale object. When patching an object we must return and wait for the informer to see the change and call the reconciler again.
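A minimal sketch of that pattern, assuming a controller-runtime reconciler and cluster-api's patch helper (the reconciler, API types, and finalizer name below are illustrative, not the exact CAPO code):

```go
package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	"sigs.k8s.io/cluster-api/util/patch"

	infrav1 "sigs.k8s.io/cluster-api-provider-openstack/api/v1alpha6"
)

type MachineReconciler struct {
	Client client.Client
}

func (r *MachineReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	machine := &infrav1.OpenStackMachine{}
	if err := r.Client.Get(ctx, req.NamespacedName, machine); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	patchHelper, err := patch.NewHelper(machine, r.Client)
	if err != nil {
		return ctrl.Result{}, err
	}

	// If we just added the finalizer, patch the object and return immediately.
	// The write propagates back through the shared informer, which queues a
	// new reconcile with the updated object.
	if controllerutil.AddFinalizer(machine, infrav1.MachineFinalizer) {
		if err := patchHelper.Patch(ctx, machine); err != nil {
			return ctrl.Result{}, err
		}
		// Deliberately no Requeue: setting it would re-run Reconcile against
		// the stale cached copy before the patch has propagated.
		return ctrl.Result{}, nil
	}

	// ... continue reconciling: create the server, set the providerID, etc. ...
	return ctrl.Result{}, nil
}
```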
🐛 Fix Tilt by adding CAPO label in tilt-provider.json
✨ Add Tags to API-Loadbalancer resources
🐛 Return from reconciler after adding finalizer
📖 Add documentation about --ca-cert flag
🌱 Bump gophercloud to v1.2.0
Works around some go mod dependency issues upgrading to v1.51.1, and is in line with what CAPI does.
While there, remove some legacy skipped paths and rename skipped_dirs to reflect that it actually skips paths (including ensure-golangci-lint.sh, which is a file, not a directory).
Download golangci-lint instead of building it
A failure of boilerplate.py did not cause verify-boilerplate.sh to fail.
Also document the behaviour of each.
Fix boilerplate linter
The resource and machine tickers are responsible for fetching logs every 5 and 10 seconds respectively during a test run, with each fetch overwriting the previous one. They are problematic because, if a test ends in failure, we continue to capture these logs regularly during cleanup, so the state captured at the moment of failure is overwritten by the state during cleanup.

We already capture these resources after each test regardless of success or failure, so regular capture during the test is redundant. This change simply removes the tickers: we still get the same logs when we execute DumpSpecResourcesAndCleanup() after each test. Additionally, if there is an error during cleanup we optionally capture that state too, but to a separate directory.
Remove the resource and machine tickers from e2e tests
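For context, the removed behaviour has roughly the shape of the loop sketched below (the function and parameter names are hypothetical, not the actual e2e helper code):

```go
package e2e

import (
	"context"
	"time"
)

// dumpResourcesEvery is a hypothetical sketch of the kind of ticker loop being
// removed: every tick re-dumps logs and resources to the same directory, so a
// capture taken during cleanup silently replaces the state captured at the
// moment of failure.
func dumpResourcesEvery(ctx context.Context, interval time.Duration, artifactDir string, dump func(dir string)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// Each invocation overwrites the previous capture in artifactDir.
			dump(artifactDir)
		}
	}
}
```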
Fix "internal ip doesn't exist (yet)" in e2e logs
Uplift golang to 1.19.6 and x/net to 0.7.0 due to https://osv.dev/vulnerability/GO-2023-1571
🐛 uplift golang and x/net
CAPO uses any AZ it finds, rather than checking whether the AZ is actually available, meaning you can get into a state where it's impossible to provision a cluster. This change allows clients to filter based on AZ availability before scheduling.
🐛 Fix Provisioning to Unavailable AZs
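For illustration, filtering on availability with gophercloud's compute availability-zones API might look roughly like this (a sketch under those assumptions, not the actual CAPO implementation):

```go
package compute

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/compute/v2/extensions/availabilityzones"
)

// availableComputeAZs lists compute availability zones and keeps only those
// whose state is reported as available, so callers can avoid scheduling
// machines into an unavailable AZ. Sketch only; names are illustrative.
func availableComputeAZs(client *gophercloud.ServiceClient) ([]string, error) {
	allPages, err := availabilityzones.List(client).AllPages()
	if err != nil {
		return nil, err
	}

	zones, err := availabilityzones.ExtractAvailabilityZones(allPages)
	if err != nil {
		return nil, err
	}

	available := make([]string, 0, len(zones))
	for _, zone := range zones {
		if zone.ZoneState.Available {
			available = append(available, zone.ZoneName)
		}
	}
	return available, nil
}
```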
🐛 Switch to "4" instead of "ipip" for rules
fix: fix typo of worker rules and controller rules
✨ Support value specs for Ports
This adds additional information on how to create a test environment similar to the one used during continuous integration. Signed-off-by: Wolodja Wentland <[email protected]>
CAPO doesn't use vendoring, so let's have git ignore that directory. That will prevent someone from committing it.
🌱 gitignore: ignore vendor/ directory
/retest
Can't speak to the failing test, but this looks like a straightforward reset to upstream.
Uses strategy `ours` to replace previous HEAD.
mandre left a comment:
This is a straight copy of upstream's main branch, with openshift's needed commits clearly marked as CARRY patches.
❯ git log origin/capo-v0.8.0..k8s/main --oneline
(no result)
❯ git log k8s/main..origin/capo-v0.8.0 --oneline
5f50f580 (HEAD, origin/capo-v0.8.0) CARRY: go mod vendor
1415c4d1 CARRY: Don't ignore vendor directories
878a986d CARRY: Add OCP CI config
5ec07227 CARRY: Downstream OWNERS
I thought the test failure was just missing vendoring due to interaction with .gitignore, but it's more than that. I thought I had previously reproduced locally but I wasn't testing properly. To reproduce you need to explicitly set
mandre left a comment:
/lgtm
Looking at the diff between upstream's main branch and this, we only have the necessary OpenShift bits.
$ git diff k8s/main...origin/capo-v0.8.0 --name-only -- . ':!vendor' ':!hack/tools/vendor'
.ci-operator.yaml
.gitignore
Dockerfile.rhel
OWNERS
OWNERS_ALIASES
/override ci/prow/test
@mdbooth: The following test failed.
@mandre: Overrode contexts on behalf of mandre: ci/prow/test
@mdbooth: The following test failed.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: mandre.
The setup-envtest failure should be fixed by this upstream PR: kubernetes-sigs#1707
This is a rebase on to the current upstream CAPO main branch. It is a merge with the `ours` strategy; i.e. it is a rebase which also explicitly declares itself a successor to the previous history.
I identified downstream carry patches with:
There were only 2:
However, several other patches were also returned which were backports to the upstream release-0.6 branch. We probably should not pull upstream release branches to main, but we will need a strategy for our own release branches.
For now I have renamed all carry patches to have a `CARRY:` prefix and added a couple more to add vendoring.