Update k8s-staging-test-infra GCR images as needed #32863

Merged

k8s-infra-ci-robot
Contributor

@k8s-infra-ci-robot k8s-infra-ci-robot commented Jun 27, 2024

No gcr.io/k8s-testimages/ changes.

Multiple distinct gcr.io/k8s-staging-test-infra changes:

Commits | Dates | Images
--- | --- | ---
69ac574...6dd397d | 2024‑02‑05 → 2024‑06‑27 | bigquery
3b134c2...6dd397d | 2024‑03‑08 → 2024‑06‑27 | bootstrap
597c402...1dde27f | 2024‑06‑11 → 2024‑06‑25 | kubekins-e2e(1.29), kubekins-e2e(master)
1dde27f...6dd397d | 2024‑06‑25 → 2024‑06‑27 | krte(1.27), krte(1.28), krte(1.29), krte(1.30), krte(experimental), krte(master)

No us-central1-docker.pkg.dev/k8s-staging-test-infra/images changes.

No gcr.io/k8s-staging-apisnoop/ changes.

No gcr.io/k8s-staging-apisnoop/ changes.

/cc @nathanperkins

@k8s-ci-robot
Contributor

@k8s-infra-ci-robot: GitHub didn't allow me to request PR reviews from the following users: nathanperkins.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

No gcr.io/k8s-testimages/ changes.

Multiple distinct gcr.io/k8s-staging-test-infra changes:

Commits | Dates | Images
--- | --- | ---
597c402...1dde27f | 2024‑06‑11 → 2024‑06‑25 | kubekins-e2e(master)

No us-central1-docker.pkg.dev/k8s-staging-test-infra/images changes.

No gcr.io/k8s-staging-apisnoop/ changes.

No gcr.io/k8s-staging-apisnoop/ changes.

/cc @nathanperkins

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added labels cncf-cla: yes, size/XS, area/config, area/jobs, sig/node, sig/testing on Jun 27, 2024
@k8s-ci-robot added labels size/L, area/conformance and removed label size/XS on Jun 27, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: k8s-infra-ci-robot
Once this PR has been reviewed and has the lgtm label, please assign bentheelder for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added labels area/images, area/release-eng, sig/architecture, sig/instrumentation, sig/k8s-infra, sig/network, sig/release, sig/scheduling, sig/storage on Jun 27, 2024
@k8s-infra-ci-robot force-pushed the prowjobs-autobump branch 3 times, most recently from 81dab82 to 5ff2aad, on June 28, 2024 22:07
@k8s-ci-robot added labels area/provider/azure, sig/cluster-lifecycle on Jun 28, 2024
…nd k8s-staging-test-infra AR images and k8s-staging-apisnoop-test-infra and k8s-staging-apisnoop-apisnoop

No gcr.io/k8s-testimages/ changes.

Multiple distinct gcr.io/k8s-staging-test-infra changes:

Commits | Dates | Images
--- | --- | ---
kubernetes/test-infra@69ac574...6dd397d | 2024‑02‑05 → 2024‑06‑27 | bigquery
kubernetes/test-infra@3b134c2...6dd397d | 2024‑03‑08 → 2024‑06‑27 | bootstrap
kubernetes/test-infra@597c402...1dde27f | 2024‑06‑11 → 2024‑06‑25 | kubekins-e2e(1.29), kubekins-e2e(master)
kubernetes/test-infra@1dde27f...6dd397d | 2024‑06‑25 → 2024‑06‑27 | krte(1.27), krte(1.28), krte(1.29), krte(1.30), krte(experimental), krte(master)

No us-central1-docker.pkg.dev/k8s-staging-test-infra/images changes.

No gcr.io/k8s-staging-apisnoop/ changes.

No gcr.io/k8s-staging-apisnoop/ changes.
@dims added the skip-review label on Jul 1, 2024
@k8s-ci-robot merged commit 63170bb into kubernetes:master on Jul 1, 2024
7 checks passed
@k8s-ci-robot
Contributor

@k8s-infra-ci-robot: Updated the job-config configmap in namespace default at cluster test-infra-trusted using the following files:

  • key cloud-provider-kind-periodic.yaml using file config/jobs/kubernetes-sigs/cloud-provider-kind/cloud-provider-kind-periodic.yaml
  • key cloud-provider-kind-presubmits.yaml using file config/jobs/kubernetes-sigs/cloud-provider-kind/cloud-provider-kind-presubmits.yaml
  • key cluster-api-provider-azure-presubmits-main.yaml using file config/jobs/kubernetes-sigs/cluster-api-provider-azure/cluster-api-provider-azure-presubmits-main.yaml
  • key kind-presubmits.yaml using file config/jobs/kubernetes-sigs/kind/kind-presubmits.yaml
  • key kind-release-blocking.yaml using file config/jobs/kubernetes-sigs/kind/kind-release-blocking.yaml
  • key kind.yaml using file config/jobs/kubernetes-sigs/kind/kind.yaml
  • key hnc-e2e.yaml using file config/jobs/kubernetes-sigs/wg-multi-tenancy/hnc-e2e.yaml
  • key mtb-presubmit.yaml using file config/jobs/kubernetes-sigs/wg-multi-tenancy/mtb-presubmit.yaml
  • key conformance-audit.yaml using file config/jobs/kubernetes/sig-arch/conformance-audit.yaml
  • key sig-instrumentation-kind-periodics.yaml using file config/jobs/kubernetes/sig-instrumentation/sig-instrumentation-kind-periodics.yaml
  • key sig-instrumentation-presubmit.yaml using file config/jobs/kubernetes/sig-instrumentation/sig-instrumentation-presubmit.yaml
  • key sig-k8s-infra-test-infra.yaml using file config/jobs/kubernetes/sig-k8s-infra/trusted/sig-k8s-infra-test-infra.yaml
  • key sig-test-infra.yaml using file config/jobs/kubernetes/sig-k8s-infra/trusted/sig-test-infra.yaml
  • key sig-network-kind.yaml using file config/jobs/kubernetes/sig-network/sig-network-kind.yaml
  • key sig-node-presubmit.yaml using file config/jobs/kubernetes/sig-node/sig-node-presubmit.yaml
  • key 1.27.yaml using file config/jobs/kubernetes/sig-release/release-branch-jobs/1.27.yaml
  • key 1.28.yaml using file config/jobs/kubernetes/sig-release/release-branch-jobs/1.28.yaml
  • key 1.29.yaml using file config/jobs/kubernetes/sig-release/release-branch-jobs/1.29.yaml
  • key 1.30.yaml using file config/jobs/kubernetes/sig-release/release-branch-jobs/1.30.yaml
  • key sig-scheduling-config.yaml using file config/jobs/kubernetes/sig-scheduling/sig-scheduling-config.yaml
  • key sig-storage-kind.yaml using file config/jobs/kubernetes/sig-storage/sig-storage-kind.yaml
  • key conformance-e2e.yaml using file config/jobs/kubernetes/sig-testing/conformance-e2e.yaml
  • key kubernetes-kind.yaml using file config/jobs/kubernetes/sig-testing/kubernetes-kind.yaml

@BenTheElder
Member

BenTheElder commented Jul 1, 2024

@dims this broke kind.

EDIT: this comment is useless without further context and verification; in the future I will refrain from jumping to that assumption so quickly. Apologies.

@BenTheElder
Member

... confirming here kubernetes-sigs/kind#648 (comment)

integration tests hitting the same issue as https://kubernetes.slack.com/archives/CEKK1KTN2/p1719537867758879 by the looks of it.

@BenTheElder
Member

... pretty sure anyhow; this appears to be a problematic docker upgrade breaking IPv6 stuff.

cc @aojea we're going to have to dig into this. As soon as I get a failure on the no-op PR I will revert this, and then we can look into a rollback.

Looks like there are major changes to IPv6 in docker.

@BenTheElder
Member

BenTheElder commented Jul 1, 2024

I think maybe it became racy, with something else having modprobed on the node, but I'm still suspicious of a docker upgrade because:

  • I have not seen this failure before, our unit/integration tests in kind have been 100% reliable, no flakes
    • This is now sometimes passing and failing with the same diff, failure mode related to ip6tables
  • We recently had a report of this in another environment where they upgraded docker to v27 (see the slack thread)

Let me pull one of these images and confirm what docker version it has ...

The other thought is that it is mis-attributed to this change and is instead the build cluster.

@BenTheElder
Member

Yeah, we picked up docker 27.x:

$ docker run --rm --entrypoint=docker gcr.io/k8s-staging-test-infra/krte:v20240627-6dd397d329-master version
Unable to find image 'gcr.io/k8s-staging-test-infra/krte:v20240627-6dd397d329-master' locally
v20240627-6dd397d329-master: Pulling from k8s-staging-test-infra/krte
fea1432adf09: Pull complete 
910334bc68b1: Pull complete 
af497ffc85e2: Pull complete 
Digest: sha256:a9b0127377d84aadbf9729fc4a5c7bf5f9aadb07e3893f4512157c96ce47f78c
Status: Downloaded newer image for gcr.io/k8s-staging-test-infra/krte:v20240627-6dd397d329-master
Client: Docker Engine - Community
 Version:           27.0.2
 API version:       1.46
 Go version:        go1.21.11
 Git commit:        912c1dd
 Built:             Wed Jun 26 18:47:36 2024
 OS/Arch:           linux/amd64
 Context:           default
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
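
The version check above can also be scripted. This is a minimal sketch, assuming the client section of `docker version` keeps the `Version:` line layout shown in the output above. The helper name `docker_client_major` is hypothetical and is not part of test-infra.

```shell
# Hypothetical helper (not from test-infra): read `docker version` text on
# stdin and print the major version taken from the first "Version:" line.
# Lines such as "API version: 1.46" are skipped because their first field
# is "API", not "Version:".
docker_client_major() {
  awk '$1 == "Version:" { split($2, parts, "."); print parts[1]; exit }'
}
```

Piping the `docker run ... version` command shown above into `docker_client_major` would print the major version of the client baked into the image, which is enough to tell a 26.x image from a 27.x one.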

@BenTheElder
Member

Tracking the kind side of this at kubernetes-sigs/kind#3677

@dims
Member

dims commented Jul 2, 2024

> Yeah, we picked up docker 27.x:

Ugh! do we want to revert?

@BenTheElder
Member

Trying to figure out if it's causing issues in other CI, but I had to go out. It's only a light flake in kind, but I'm concerned that we're going to find more issues: it's causing network creation to flake, and there is some behavior change in IPv6 networking.

Let's leave it for the moment.

@aojea
Member

aojea commented Jul 2, 2024

OK, what seems to be happening is that docker now REQUIRES ip6tables.

We had a knob to enable this:

# optionally enable ipv6 docker
export DOCKER_IN_DOCKER_IPV6_ENABLED=${DOCKER_IN_DOCKER_IPV6_ENABLED:-false}

that installed the required module

# enable ipv6 iptables
modprobe -v ip6table_nat

and now some jobs do not seem to have it

network_integration_test.go:63: "Error response from daemon: Failed to Setup IP tables: Unable to enable NAT rule:  (iptables failed: ip6tables --wait -t nat -I POSTROUTING -s fc00:3051:9942:af9f::/64 ! -o br-4e53c7863d0d -j MASQUERADE: modprobe: FATAL: Module ip6_tables not found in directory /lib/modules/5.15.0-1054-gke\nip6tables v1.8.9 (legacy): can't initialize ip6tables table `nat': Table does not exist (do you need to insmod?)\nPerhaps ip6tables or your kernel needs to be upgraded.\n (exit status 3))\n"

@BenTheElder we are back in 2019 😄 , my memory may fail, but I think that some images didn't have that module?
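
One way to confirm whether a job environment has the module loaded before docker tries to create a network is to inspect the kernel's module listing. This is a minimal sketch, assuming the standard `/proc/modules` layout where the first whitespace-separated field is the module name. The function name `module_loaded` and its optional second argument are hypothetical, added here so the check can be exercised against a sample file.

```shell
# Hypothetical helper (not from test-infra): succeed if the named kernel
# module appears in a /proc/modules-style listing.
#   $1: module name, e.g. ip6table_nat
#   $2: listing to read; defaults to /proc/modules (override is for testing)
module_loaded() {
  module="$1"
  listing="${2:-/proc/modules}"
  # Exit status 0 when a line's first field matches the module name exactly.
  awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' "$listing"
}
```

For example, `module_loaded ip6table_nat || echo "ip6table_nat is not loaded"` would flag the condition that leads to the `Table does not exist` error quoted above.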

@dims
Member

dims commented Jul 2, 2024

is this a GKE cluster/pool issue? based on FATAL: Module ip6_tables not found in directory /lib/modules/5.15.0-1054-gke\nip6tables v1.8.9 (legacy) - we could try the eks prow cluster then

@BenTheElder
Member

> is this a GKE cluster/pool issue? based on FATAL: Module ip6_tables not found in directory /lib/modules/5.15.0-1054-gke\nip6tables v1.8.9 (legacy) - we could try the eks prow cluster then

on the ubuntu nodes it's just not loaded by default but the module is available, since the host nodes are ipv4.

previously we ran the modprobe only when we intended to use ipv6 and then enabled it in docker, but now ipv6 is always enabled

#32890
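
The change in behavior described here can be written as a small predicate. This is only a sketch of the reasoning in this thread, not the code from #32890: the function name `needs_ip6table_nat` is hypothetical, and the threshold of major version 27 is an assumption based on the version found in the krte image above.

```shell
# Hypothetical predicate (not the actual test-infra logic).
#   $1: value of DOCKER_IN_DOCKER_IPV6_ENABLED ("true" or "false")
#   $2: docker major version as an integer
# Assumption from this thread: from docker 27 onward the daemon sets up
# ip6tables rules regardless of the opt-in knob, so the module is always needed.
needs_ip6table_nat() {
  ipv6_enabled="$1"
  docker_major="$2"
  [ "$ipv6_enabled" = "true" ] || [ "$docker_major" -ge 27 ]
}
```

With the old images only the first condition could be true, which is why jobs that never set the knob never ran the modprobe.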

@BenTheElder
Member

let's use kubernetes-sigs/kind#3677 to track, even though technically this affects docker networks in all jobs, we do not have evidence yet that other jobs are creating networks (I could see this causing issues for the default bridge but we have no proof yet).

@BenTheElder
Member

I think this is resolved now, if not fully cleanly; I will keep an eye out for any other issues.

We're ensuring we load the ipv6 NAT module when setting up dind, and having all dind jobs mount /lib/modules.

We can do something more clever in the future.

There may be other issues from the v27 changes, but I'm not seeing them yet.
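
The mitigation described in this comment (always load the IPv6 NAT module during dind setup, which relies on /lib/modules being mounted into the job) could look roughly like the following. This is a hedged sketch rather than the actual entrypoint change: the function name `ensure_ip6table_nat` and the `MODULES_DIR` and `MODPROBE` overrides are hypothetical, introduced so the logic can be dry-run without touching the kernel.

```shell
# Hypothetical sketch (not the actual test-infra entrypoint).
# MODULES_DIR and MODPROBE are illustrative overrides for dry runs; by
# default the function checks /lib/modules and calls the real modprobe.
ensure_ip6table_nat() {
  modules_dir="${MODULES_DIR:-/lib/modules}"
  modprobe_cmd="${MODPROBE:-modprobe}"
  # Without the host's /lib/modules mounted, modprobe cannot find the module.
  if [ ! -d "$modules_dir" ]; then
    echo "warning: $modules_dir is not mounted; cannot load ip6table_nat" >&2
    return 1
  fi
  # Same invocation as the existing opt-in path, now run unconditionally.
  "$modprobe_cmd" -v ip6table_nat
}
```

Failing early when the directory is missing keeps the cause visible in the job log instead of surfacing later as an ip6tables error from the docker daemon.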

Labels

  • area/config (Issues or PRs related to code in /config)
  • area/conformance (Issues or PRs related to kubernetes conformance tests)
  • area/images
  • area/jobs
  • area/provider/azure (Issues or PRs related to azure provider)
  • area/release-eng (Issues or PRs related to the Release Engineering subproject)
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • sig/architecture (Categorizes an issue or PR as relevant to SIG Architecture.)
  • sig/cluster-lifecycle (Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.)
  • sig/instrumentation (Categorizes an issue or PR as relevant to SIG Instrumentation.)
  • sig/k8s-infra (Categorizes an issue or PR as relevant to SIG K8s Infra.)
  • sig/network (Categorizes an issue or PR as relevant to SIG Network.)
  • sig/node (Categorizes an issue or PR as relevant to SIG Node.)
  • sig/release (Categorizes an issue or PR as relevant to SIG Release.)
  • sig/scheduling (Categorizes an issue or PR as relevant to SIG Scheduling.)
  • sig/storage (Categorizes an issue or PR as relevant to SIG Storage.)
  • sig/testing (Categorizes an issue or PR as relevant to SIG Testing.)
  • size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)
  • skip-review (Indicates a PR is trusted, used by tide for auto-merging PRs.)