Prevent pod restarts caused by slow db boot #261

Merged


lkpdn
Contributor

@lkpdn lkpdn commented Nov 26, 2018

When we run ./scripts/deploy.sh for the first time, vizier-core is highly likely to go through a few pod restarts, because vizier-db takes much longer to become ready. This PR makes vizier-core wait for the DB to become ready, so that we no longer see any pod restarts when all components are properly deployed.



@k8s-ci-robot

Hi @lkpdn. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@YujiOshima
Contributor

/ok-to-test

@lkpdn
Contributor Author

lkpdn commented Nov 26, 2018

#260 has been merged, so let me rebase this branch (prevent-pod-restarts-caused-by-slow-db-boot) onto master; otherwise it would hide that Secret from both vizier-db and vizier-core.

Koichiro Den added 2 commits November 27, 2018 07:33
Fixes: 67e94c7 ("Set MYSQL_ROOT_PASSWORD via Secret (kubeflow#253)")
Signed-off-by: Koichiro Den <[email protected]>
Signed-off-by: Koichiro Den <[email protected]>
@lkpdn lkpdn force-pushed the prevent-pod-restarts-caused-by-slow-db-boot branch from 6232972 to a0d4d06 Compare November 26, 2018 22:34
pkg/db/interface.go Outdated
@lkpdn
Contributor Author

lkpdn commented Nov 28, 2018

Please take another look: dc7bb07
Thanks.

@lkpdn lkpdn force-pushed the prevent-pod-restarts-caused-by-slow-db-boot branch from dc7bb07 to 88cfaca Compare November 28, 2018 07:20
@lkpdn
Contributor Author

lkpdn commented Nov 28, 2018

My apologies for the lack of a test. I have fixed up the commit, so please review this one:
88cfaca ccd034e.

@lkpdn lkpdn force-pushed the prevent-pod-restarts-caused-by-slow-db-boot branch from 88cfaca to ccd034e Compare November 28, 2018 07:28
pkg/db/interface.go Outdated
pkg/db/interface.go Outdated
@lkpdn lkpdn force-pushed the prevent-pod-restarts-caused-by-slow-db-boot branch from ccd034e to 2ae5826 Compare November 29, 2018 03:09
@lkpdn
Contributor Author

lkpdn commented Nov 29, 2018

@toshiiw thanks for reviewing, PTAL: 2ae5826

@YujiOshima
Contributor

@lkpdn Thanks, great!
/lgtm
/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: YujiOshima

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 72a0fc0 into kubeflow:master Nov 30, 2018
@lkpdn lkpdn deleted the prevent-pod-restarts-caused-by-slow-db-boot branch November 30, 2018 13:00
@lkpdn
Contributor Author

lkpdn commented Nov 30, 2018

@YujiOshima thanks :)
