Migrate cli-utils jobs to eks-prow-build-cluster #29742

Merged
merged 1 commit into from
Jun 13, 2023

Conversation

rjsadow
Contributor

@rjsadow rjsadow commented Jun 9, 2023

This PR moves the cli-utils jobs from the default cluster to eks-prow-build-cluster.

ref: #29722
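
For readers unfamiliar with Prow job configs, a migration like this usually comes down to setting the job's `cluster` field and declaring explicit resource requests and limits. The fragment below is a minimal sketch of that shape, not the actual diff from this PR: the job name comes from the Prow link later in this thread, while the image, command, and resource values are hypothetical placeholders.

```yaml
# Illustrative sketch only; not the actual change merged in this PR.
presubmits:
  kubernetes-sigs/cli-utils:
  - name: cli-utils-presubmit-master-stress    # job name as it appears in the Prow link in this thread
    cluster: eks-prow-build-cluster            # when omitted, the job runs on the default cluster
    decorate: true
    branches:
    - master
    spec:
      containers:
      - image: example.invalid/builder:latest  # hypothetical image
        command:
        - make
        args:
        - test-stress                          # hypothetical make target
        resources:                             # hypothetical values; size these to the job's real usage
          requests:
            cpu: "2"
            memory: 4Gi
          limits:
            cpu: "2"
            memory: 4Gi
```

If the memory limit is set below what the tests actually need, the container can be OOM-killed, which is one possible reading of the stress-test failures reported further down.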

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 9, 2023
@k8s-ci-robot k8s-ci-robot added area/config Issues or PRs related to code in /config area/jobs sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jun 9, 2023
@ameukam
Member

ameukam commented Jun 13, 2023

/lgtm
/approve
/hold

@rjsadow Remove the hold when ready. Please keep an eye on those jobs for a few days.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 13, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 13, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ameukam, rjsadow

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 13, 2023
@rjsadow
Contributor Author

rjsadow commented Jun 13, 2023

/remove-hold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 13, 2023
@k8s-ci-robot k8s-ci-robot merged commit b007877 into kubernetes:master Jun 13, 2023
@k8s-ci-robot
Contributor

@rjsadow: Updated the job-config configmap in namespace default at cluster test-infra-trusted using the following files:

  • key cli-utils-presubmit-master.yaml using file config/jobs/kubernetes-sigs/cli-utils/cli-utils-presubmit-master.yaml

In response to this:

This PR transitions the cli-utils jobs from the default cluster to eks-prow-build-cluster

ref: #29722

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@karlkfi
Contributor

karlkfi commented Jul 18, 2023

This change (probably because of the added memory limit) seems to be causing the cli-utils stress tests to fail. Should I just make a PR to increase the memory limit? Where are the node sizes documented?

@sdowell
Contributor

sdowell commented Jul 21, 2023

cc @rjsadow @ameukam

It appears this change broke the cli-utils stress tests (as @karlkfi mentioned). Could you take a look?

@rjsadow
Contributor Author

rjsadow commented Jul 21, 2023

@sdowell Sorry, I missed that. Yes, I'll dig into it tomorrow morning.

@rjsadow
Contributor Author

rjsadow commented Jul 21, 2023

@sdowell Can you please link some current examples? There were issues a few days ago, but it looks like #30110 fixed them. I'm not seeing any current resource issues in Grafana, and the current job history seems OK. The couple of failures that are there seem to come from the tests themselves failing.

@sdowell
Contributor

sdowell commented Jul 21, 2023

@rjsadow

The common failure mode that I'm seeing is "Job execution failed: Pod got deleted unexpectedly". I'm not sure how to diagnose why the Pod is getting deleted without having access to the build cluster myself.

Here's an example where the tests passed but the job failed before posting a status (with the above error): https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cli-utils/622/cli-utils-presubmit-master-stress/1682149784174989312

@rjsadow
Contributor Author

rjsadow commented Jul 21, 2023

This might be more closely related to kubernetes/k8s.io#5473. I too have seen those "Job execution failed: Pod got deleted unexpectedly" errors off and on, but I don't have access to the build cluster either. If you want, we can move this job to the GCP cluster for now.

@sdowell
Contributor

sdowell commented Jul 21, 2023

@rjsadow That would be great if that is okay with you. We're currently blocked on submitting PRs and would prefer not to disable the presubmit if possible.

I appreciate it, and I understand this probably makes your job harder with the migration effort. We can move back to the EKS cluster once the Node health issue is better understood. Does that work for you?

Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
area/config: Issues or PRs related to code in /config
area/jobs
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
sig/testing: Categorizes an issue or PR as relevant to SIG Testing.
size/S: Denotes a PR that changes 10-29 lines, ignoring generated files.
5 participants