Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to AWS CI/CD #1356

Merged
merged 51 commits into from
Oct 26, 2020
Merged

Switch to AWS CI/CD #1356

merged 51 commits into from
Oct 26, 2020

Conversation

andreyvelich
Copy link
Member

Related: #1332.
I will debug the infra in this PR.

I also made few changes to improve CI/CD quality.

/cc @gaocegege @johnugeorge
/cc @Jeffwan @PatrickXYS @jlewi @Bobgy

@kubeflow-bot
Copy link

kubeflow-bot commented Oct 16, 2020

This change is Reviewable

@kubeflow kubeflow deleted a comment from kubeflow-bot Oct 16, 2020
--node-type m5.xlarge \
--nodes 2 \
--nodes-min 1 \
--nodes-max 3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@andreyvelich andreyvelich Oct 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I will use your script to create EKS cluster: https://github.com/kubeflow/testing/blob/master/images/aws-scripts/create-eks-cluster.sh. Since we clone kubeflow/testing repo to testing NFS, it should be possible.
What do you think @PatrickXYS ?

Copy link
Member

@PatrickXYS PatrickXYS Oct 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I will use your script to create EKS cluster

Yes we can, reuse common script makes a lot of sense to me.

@PatrickXYS
Copy link
Member

/test ?

@k8s-ci-robot
Copy link

@PatrickXYS: The following commands are available to trigger jobs:

  • /test kubeflow-katib-presubmit

Use /test all to run all jobs.

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@PatrickXYS
Copy link
Member

PatrickXYS commented Oct 16, 2020

Pasted from #1357

The reason that plugin trigger can not trigger prow test:

{"author":"PatrickXYS","component":"hook","error":"error checking trust of PatrickXYS: error in IsCollaborator: return code not 2XX: 403 Forbidden","event-GUID":"16d24480-0fdd-11eb-911f-618d74ee83e5","event-type":"issue_comment","file":"prow/hook/events.go:425","func":"k8s.io/test-infra/prow/hook.(*Server).handleGenericComment.func1","level":"error","msg":"Error handling GenericCommentEvent.","org":"kubeflow","plugin":"trigger","pr":1357,"repo":"katib","severity":"error","time":"2020-10-16T18:26:29Z","url":"https://github.com/kubeflow/katib/pull/1357#issuecomment-710377544"}

It's caused by IsCollaborrator error on aws-kf-ci-bot,

the reason k8s-ci-bot works fine is k8s-ci-bot is admin of kubeflow org, so it can fetch org list to validate if member is Collaborator

I'm guessing only give write access is not enough, need to investigate


Pasted Prow Plugin trigger definition:

The trigger plugin starts tests in reaction to commands and pull request events. 

It is responsible for ensuring that test jobs are only run on trusted PRs. 

A PR is considered trusted if the author is a member of the 'trusted organization' for the repository or if such a member has left an '/ok-to-test' command on the PR.

Trigger starts jobs automatically when a new trusted PR is created or when an untrusted PR becomes trusted, but it can also be used to start jobs manually via the '/test' command.

The '/retest' command can be used to rerun jobs that have reported failure.

@andreyvelich
Copy link
Member Author

andreyvelich commented Oct 16, 2020

@PatrickXYS Should we remove access k8s-ci-bot to Katib repo?
Or in that case, tide also doesn't work?

@PatrickXYS
Copy link
Member

PatrickXYS commented Oct 16, 2020

Removing kf-ci-bot from katib might be un-acceptable by google folks.

As we discussed in the proposal, Google Prow owns Tide functionality, and AWS Prow owns running tests and report PR status.

This should be one of the potential issue with mixed Prow, I'll keep you updated with the progress

/cc @andreyvelich

@k8s-ci-robot
Copy link

@PatrickXYS: GitHub didn't allow me to request PR reviews from the following users: andreyvelich.

Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Removing kf-ci-bot from katib might be un-acceptable by google folks.

As we discussed in the proposal, Google Prow owns Tide functionality, and AWS Prow owns running tests and report PR status.

This should be one of the potential issue with mixed Prow, I'll keep you updated with the progress

/cc @andreyvelich

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Jeffwan
Copy link
Member

Jeffwan commented Oct 16, 2020

I think the problem it won't be enable to fetch org member list to check collaborators.

Let's try to see if bot can get read access to entire org.

@PatrickXYS: GitHub didn't allow me to request PR reviews from the following users: andreyvelich.

Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.

em.. This should be unexpected @andreyvelich is definitely a member. Anything wrong with the upstream prow because of new prow?

@PatrickXYS
Copy link
Member

and authors cannot review their own PRs.

@Jeffwan This is expected, because @andreyvelich can't review his own PR.

@aws-kf-ci-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Jeffwan
Copy link
Member

Jeffwan commented Oct 16, 2020

@PatrickXYS seems bot does have access to the repo. However, I didn't see it can add label. approve label is added by k8s-ci-bit. Can you check backend logs?

@Jeffwan
Copy link
Member

Jeffwan commented Oct 16, 2020

and authors cannot review their own PRs.

@Jeffwan This is expected, because @andreyvelich can't review his own PR.

I see. my careless. I thought it's failed to assign.

@Jeffwan
Copy link
Member

Jeffwan commented Oct 16, 2020

/assign @andreyvelich

@andreyvelich
Copy link
Member Author

/retest

@PatrickXYS
Copy link
Member

/test ?

@aws-kf-ci-bot
Copy link
Contributor

@PatrickXYS: The following commands are available to trigger jobs:

  • /test kubeflow-katib-presubmit-e2e

Use /test all to run all jobs.

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Copy link

@PatrickXYS: The following commands are available to trigger jobs:

  • /test kubeflow-katib-presubmit

Use /test all to run all jobs.

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@PatrickXYS
Copy link
Member

/test kubeflow-katib-presubmit

@aws-kf-ci-bot
Copy link
Contributor

@PatrickXYS: The specified target(s) for /test were not found.
The following commands are available to trigger jobs:

  • /test kubeflow-katib-presubmit-e2e

Use /test all to run all jobs.

In response to this:

/test kubeflow-katib-presubmit

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot removed the lgtm label Oct 23, 2020
@PatrickXYS
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot removed the lgtm label Oct 24, 2020
@andreyvelich
Copy link
Member Author

andreyvelich commented Oct 24, 2020

@johnugeorge @gaocegege I was thinking about temporary solution for the registry.

Actually, it is easier to use just another workflow-v1beta1-release workflow during post-submit for us.
This workflow just pushes images to kubeflow-images-public and doesn't deploy any clusters.
In that case, we don't need to change website/manifests/katib repo's docs with temporary container registry.
After AWS release process implementation, we will directly switch registry to the new one.

What do you think?

@k8s-ci-robot
Copy link

@andreyvelich: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
kubeflow-katib-presubmit de3c82b link /test kubeflow-katib-presubmit

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jlewi
Copy link
Contributor

jlewi commented Oct 26, 2020

Actually, it is easier to use just another workflow-v1beta1-release workflow during post-submit for us.
This workflow just pushes images to kubeflow-images-public

How will these jobs have access to kubeflow-images-public?

@andreyvelich
Copy link
Member Author

andreyvelich commented Oct 26, 2020

@jlewi Using kubeflow-ci service account. This is just shor-term solution, until we implement new release process in AWS.
/cc @gaocegege @johnugeorge

@johnugeorge
Copy link
Member

/lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants