Skip to content

Conversation

@kevinrizza
Copy link
Member

testing ordered namespace delete failures for 1.33 rebase

munnerz and others added 30 commits March 20, 2025 20:19
…ImgChangeE2E

Add e2e test for Regular Container image change
Optimize DS Controller Performance: Reduce Work Duration Time & Minimize Cache Locking.
test: switch gotestsum quiet output format
DRA device taints: fix some race conditions
[PodLevelResources] Pod Level Hugepage Resources
…opagation

APIServerTracing: Respect trace context only for privileged users
CI integration scripts: reduce log noise from installing etcd
KEP-4742: Copy topology labels from Node objects to Pods upon binding/scheduling
…d_of_caching_cluster_events_in_binding

Call queue.Done() before PreBind phase, removing the pod in binding from inFlightPods to save memory
[KEP-2371] add test about container metrics from cadvisor
…size

disable in-place pod vertical scaling for swap enabled pods
Remove general available feature-gate CPUManager
…ularContainerImgChangeE2E

Revert "Add e2e test for Regular Container image change"
The defaulting of TimeAdded randomly broke some of the tests:

   TestList:
       resttest.go:1393: expected:
       []runtime.Object{(*resource.DeviceTaintRule)(0xc000b83080), (*resource.DeviceTaintRule)(0xc000b831e0)},
       got:
       []runtime.Object{(*resource.DeviceTaintRule)(0xc0003db608), (*resource.DeviceTaintRule)(0xc0003db750)}
       ...

   TestCreate:
    resttest.go:346: unexpected obj: &resource.DeviceTaintRule{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"foo2", GenerateName:"", Namespace:"", SelfLink:"", UID:"18d3084d-7d11-4575-8730-4650b81cf1a7", ResourceVersion:"8", Generation:1, CreationTimestamp:time.Date(2025, time.March, 21, 8, 27, 23, 0, time.Local), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:resource.DeviceTaintRuleSpec{DeviceSelector:(*resource.DeviceTaintSelector)(nil), Taint:resource.DeviceTaint{Key:"example.com/taint", Value:"", Effect:"NoExecute", TimeAdded:time.Date(2025, time.March, 21, 8, 27, 23, 0, time.Local)}}}, expected &resource.DeviceTaintRule{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"foo2", GenerateName:"", Namespace:"", SelfLink:"", UID:"18d3084d-7d11-4575-8730-4650b81cf1a7", ResourceVersion:"8", Generation:1, CreationTimestamp:time.Date(2025, time.March, 21, 8, 27, 23, 0, time.Local), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:resource.DeviceTaintRuleSpec{DeviceSelector:(*resource.DeviceTaintSelector)(nil), Taint:resource.DeviceTaint{Key:"example.com/taint", Value:"", Effect:"NoExecute", TimeAdded:time.Date(2025, time.March, 21, 8, 27, 24, 0, time.Local)}}}

Failure rate before: 3m40s: 1332 runs so far, 7 failures (0.53%)

It's not obvious from the test failure, but the difference is the
TimeAdded. Setting it beforehand to a value that can be encoded (i.e. truncated
to seconds) fixes the flake.

Failure rate after: 5m0s: 1825 runs so far, 0 failures
p0lyn0mial and others added 23 commits May 12, 2025 10:36
…navailable errors for the etcd health checker client

UPSTREAM: <carry>: replace newETCD3ProberMonitor with etcd3RetryingProberMonitor
This commit fixes bug 1919737.

https://bugzilla.redhat.com/show_bug.cgi?id=1919737

* pkg/proxy/iptables/proxier.go (syncProxyRules): Prefer a local endpoint
for the cluster DNS service.
similarly to what we do for the managed CPU (aka workload partitioning)
feature, introduce a master configuration file
`/etc/kubernetes/openshift-llc-alignment` which needs to be present for
the LLC alignment feature to be activated, in addition to the policy
option being required.

Note this replace the standard upstream feature gate check.

This can be dropped when the feature per  KEP
kubernetes/enhancements#4800 goes beta.

Signed-off-by: Francesco Romani <[email protected]>
Explicitly exclude etcd and etcd-readiness checks (OCPBUGS-48177)
and have etcd operator take responsibility for properly reporting etcd readiness.
Justification: kube-apiserver instances get removed from a load balancer when etcd starts
to report not ready (as will KA's /readyz). Client connections can withstand etcd unreadiness
longer than the readiness timeout is. Thus, it is not necessary to drop connections
in case etcd resumes its readiness before a client connection times out naturally.
This is a downstream patch only as OpenShift's way of using etcd is unique.
The existing patch retried any etcd error returned from storage with the code "Unavailable". Writes
can only be safely retried if the client can be absolutely sure that the initial attempt ended
before persisting any changes. The "Unavailable" code includes errors like "timed out" that can't be
safely retried for writes.
Signed-off-by: Peter Hunt <[email protected]>

UPSTREAM: <carry>: authorization: add minimumkubeletversion package

MinimumKubeletVersion is a way for an admin to declare that nodes any older than the
minimum version cannot authorize with the apiserver. This effectively prevents them from joining.

Doing so means the apiservers can trust newer features are usable on clusters with version skews

Signed-off-by: Peter Hunt <[email protected]>

UPSTREAM: <carry>: authorizer: move mininum kubelet version authorizer to pkg/kubeapiserver and add authorization mode

this does require a line of code be moved from the enablement package to stop a cyclical import

Signed-off-by: Peter Hunt <[email protected]>

UPSTREAM: <carry>: crdvalidation: move latency profile file to be agnostic of field

Signed-off-by: Peter Hunt <[email protected]>

UPSTREAM: <carry>: features: add MinimumKubeletVersion feature

Signed-off-by: Peter Hunt <[email protected]>
Upstream enables volume group snapshots by editing yaml files in a shell
script [1]. We can't use this script in openshift-tests.

Create a brand new, OCP specific test driver based on csi-driver-hostpath,
only with the --feature-gate=VolumeGroupSnapshot on external-snapshotter command line.

We will need to carry this patch until the feature graduates to GA. I've
chosen to create brand new files in this carry patch, so it can't conflict
with the existing ones.

1: https://github.com/kubernetes/kubernetes/blob/91d6fd3455c4a071408df20c7f48df221f2b6d30/test/e2e/testing-manifests/storage-csi/external-snapshotter/volume-group-snapshots/run_group_snapshot_e2e.sh
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 12, 2025
@openshift-ci-robot
Copy link

@kevinrizza: the contents of this pull request could not be automatically validated.

The following commits are valid:

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci-robot
Copy link

@kevinrizza: the contents of this pull request could not be automatically validated.

The following commits are valid:

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci
Copy link

openshift-ci bot commented May 13, 2025

@kevinrizza: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/verify-deps f365335 link false /test verify-deps
ci/prow/e2e-aws-ovn-fips 7ca2197 link true /test e2e-aws-ovn-fips
ci/prow/e2e-aws-ovn-runc 7ca2197 link true /test e2e-aws-ovn-runc
ci/prow/e2e-aws-ovn-crun 7ca2197 link true /test e2e-aws-ovn-crun
ci/prow/e2e-aws-ovn-techpreview 7ca2197 link false /test e2e-aws-ovn-techpreview
ci/prow/e2e-aws-ovn-serial 7ca2197 link true /test e2e-aws-ovn-serial
ci/prow/e2e-gcp 7ca2197 link true /test e2e-gcp
ci/prow/e2e-aws-ovn-downgrade 7ca2197 link true /test e2e-aws-ovn-downgrade
ci/prow/e2e-aws-ovn-upgrade 7ca2197 link true /test e2e-aws-ovn-upgrade
ci/prow/e2e-aws-ovn-techpreview-serial 7ca2197 link false /test e2e-aws-ovn-techpreview-serial
ci/prow/e2e-aws-ovn-cgroupsv2 7ca2197 link true /test e2e-aws-ovn-cgroupsv2
ci/prow/e2e-aws-crun-wasm 7ca2197 link true /test e2e-aws-crun-wasm
ci/prow/okd-scos-e2e-aws-ovn 7ca2197 link false /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@kevinrizza kevinrizza closed this Jul 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. vendor-update Touching vendor dir or related files

Projects

None yet

Development

Successfully merging this pull request may close these issues.