wip: Reduce flake for integration test TestStartStop #6999

Closed
wants to merge 4 commits into from

Conversation

medyagh
Member

@medyagh medyagh commented Mar 11, 2020

In the first version of this PR, I tried adding a check for the default service account being ready to minikube's wait logic, but that adds 13 seconds of waiting for a very small number of use cases.
Fixes #6997

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 11, 2020
@medyagh medyagh changed the title from "Test flake wait for default service account" to "reduce test flake wait for default service account" Mar 11, 2020
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Mar 11, 2020
@medyagh medyagh changed the title from "reduce test flake wait for default service account" to "Reduce Flake TestStartStop" Mar 11, 2020
@medyagh medyagh changed the title from "Reduce Flake TestStartStop" to "Reduce flake for integration test TestStartStop" Mar 11, 2020
@medyagh
Member Author

medyagh commented Mar 11, 2020

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Mar 11, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: medyagh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 11, 2020
@medyagh medyagh requested a review from tstromberg March 11, 2020 04:40
@minikube-pr-bot

Error: running mkcmp: exit status 1

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 11, 2020
@minikube-pr-bot

All Times minikube: [ 63.898231 67.423725 63.331392]
All Times Minikube (PR 6999): [ 64.428541 65.633788 65.886340]

Average minikube: 64.884449
Average Minikube (PR 6999): 65.316223

Averages Time Per Log

+----------------------+-----------+--------------------+
|         LOG          | MINIKUBE  | MINIKUBE (PR 6999) |
+----------------------+-----------+--------------------+
| minikube v           |  0.231943 |           0.213732 |
| Creating kvm2        | 41.078840 |          40.463399 |
| Preparing Kubernetes |  0.767395 |           0.783589 |
| Pulling images       |           |                    |
| Launching Kubernetes | 19.740141 |          20.764748 |
| Waiting for cluster  |  0.057964 |           0.227393 |
+----------------------+-----------+--------------------+

@codecov-io

codecov-io commented Mar 11, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@12c12c4). Click here to learn what that means.
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master    #6999   +/-   ##
=========================================
  Coverage          ?   37.23%           
=========================================
  Files             ?      144           
  Lines             ?     9079           
  Branches          ?        0           
=========================================
  Hits              ?     3381           
  Misses            ?     5272           
  Partials          ?      426

@minikube-pr-bot

All Times minikube: [ 66.552134 64.958802 65.883644]
All Times Minikube (PR 6999): [ 67.105611 64.430888 64.203486]

Average minikube: 65.798193
Average Minikube (PR 6999): 65.246662

Averages Time Per Log

+----------------------+-----------+--------------------+
|         LOG          | MINIKUBE  | MINIKUBE (PR 6999) |
+----------------------+-----------+--------------------+
| minikube v           |  0.219062 |           0.218661 |
| Creating kvm2        | 41.709804 |          40.988235 |
| Preparing Kubernetes |  0.792077 |           0.774145 |
| Pulling images       |           |                    |
| Launching Kubernetes | 20.254669 |          20.434818 |
| Waiting for cluster  |  0.068649 |           0.068963 |
+----------------------+-----------+--------------------+

@@ -106,7 +106,7 @@ func validateMountCmd(ctx context.Context, t *testing.T, profile string) {
 }

 start := time.Now()
-if err := retry.Expo(checkMount, time.Second, 15*time.Second); err != nil {
+if err := retry.Expo(checkMount, time.Second, Seconds(15)); err != nil {
Contributor

Seconds is not a good name for this function.

How about "ApproximateSeconds" or "SecondsWithMultiplier"?

// to avoid https://github.com/kubernetes/minikube/issues/6997
// adding this logic to minikube start is not necessary but useful for rare test flakes
saReady := func() error { // check if default service account is created.
if _, err := Run(t, exec.CommandContext(ctx, "kubectl", "--context", profile, "get", "serviceaccount", "default")); err != nil {
Contributor

This seems to be covering up an actual flaw that users can face. Shouldn't the start command wait for this?

Member Author

I thought about this. It rarely happens, but waiting for it eats up 13 seconds on my machine. I have a feeling that would not be helpful.

Contributor

As I understand it, this integration test is surfacing a bug: minikube start + kubectl is currently racy. As it is successfully capturing a rare race condition (yay!), we should either fix it, or leave it present, not paper over it.

I believe the right way to go about this is to add a --wait-for-serviceaccount flag to minikube start, default it to false initially, and override it in the tests that are affected. That way other users who prefer reliability over latency can benefit: integration tests, IDEs, etc. It also lets us better measure how to optimize for serviceaccount start latency.

I would argue that defaulting this flag to 'true' would be more in the spirit of minikube (reliability > latency), but I do understand the trade-off here and am comfortable leaving it as false for the time being.

Thoughts?

Member Author

@tstromberg
I agree with you. Let's not merge this PR, so the flake keeps bugging us in the integration tests and we fix minikube itself.
Here are my thoughts; let's keep the discussion in this issue so others can participate: #7011 (comment)

@medyagh medyagh changed the title from "Reduce flake for integration test TestStartStop" to "wip: Reduce flake for integration test TestStartStop" Mar 11, 2020
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 11, 2020
@medyagh
Member Author

medyagh commented Mar 25, 2020

closing in favor of #7209

@medyagh medyagh closed this Mar 25, 2020
@k8s-ci-robot
Contributor

@medyagh: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 25, 2020
@medyagh medyagh deleted the wait_sa branch May 2, 2020 22:31
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Flake TestStartStop: error looking up service account default/default: serviceaccount "default" not found
5 participants