Go Test failed sometimes #1649

anencore94 · 2021-08-31T14:54:44Z

/kind bug

What steps did you take and what happened:

One of the GitHub Actions jobs for Go Test fails sometimes, but it succeeds on retest.

I've checked some cases:

All of them failed with one of these errors:

    trial_controller_test.go:261: 
        Timed out after 40.000s.
        Expected
            <bool>: false
        to be true
FAIL

    experiment_controller_test.go:350: 
        Timed out after 40.000s.
        Expected
            <bool>: false
        to be true

What did you expect to happen:

If these failures were caused just by a network issue, what about increasing the timeout?
Does that make sense? WDYT?



andreyvelich · 2021-09-02T14:56:32Z

Thank you for creating this @anencore94!
Yes, this unit test is flaky.
envtest can sometimes fail in the controller tests.

Any ideas on how to improve this while keeping the same coverage level would be appreciated!

anencore94 · 2021-09-03T01:09:45Z

@andreyvelich Thanks :)
If the envtest you linked failed, could you link the action log in this issue?
That way we can keep track of it, and anybody could fix it.

andreyvelich · 2021-09-06T12:03:38Z

Let's continue the discussion from #1654 (review) about improving the controller unit tests here.
/priority p2

@anencore94 Thank you for taking this!
Are you sure that increasing the timeout will help avoid flaky unit tests?

For example, take this failed test: https://github.com/kubeflow/katib/runs/3523610062?check_suite_focus=true
There, the Trial status was updated to Metrics Unavailable, but the test failed at the previous step, where we compare the observation results.

I think that to fix it, we might need to run the Experiments in separate unit tests.

For example, currently in TestReconcileBatchJob we run 3 Experiments:

1. Trial with "Failed" BatchJob.
2. Trial with "Complete" BatchJob and available metrics.
3. Trial with "Complete" BatchJob and unavailable metrics.

Maybe we should split it into 3 different unit tests with separate test managers.
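
The isolation that such a split is meant to provide can be sketched in plain Go. This is only an illustration under assumptions: `fakeStore` and `runScenario` are hypothetical stand-ins for a test manager's state and a test Experiment, not Katib or envtest code, and the status strings are placeholders.

```go
package main

import "fmt"

// fakeStore is a hypothetical stand-in for the state behind one test manager.
type fakeStore struct {
	status map[string]string // trial name -> last written status
}

func newFakeStore() *fakeStore {
	return &fakeStore{status: map[string]string{}}
}

// runScenario returns the status the trial already had when the scenario
// started, then writes the scenario's own final status.
func runScenario(s *fakeStore, trial, final string) (leftover string) {
	leftover = s.status[trial]
	s.status[trial] = final
	return leftover
}

func main() {
	scenarios := []string{"Failed", "Succeeded", "MetricsUnavailable"}

	// Shared store: later scenarios can observe state written by earlier ones.
	shared := newFakeStore()
	for _, final := range scenarios {
		fmt.Printf("shared: want %q, leftover %q\n", final, runScenario(shared, "test-trial", final))
	}

	// One store per scenario: every scenario starts from an empty state.
	for _, final := range scenarios {
		fmt.Printf("split:  want %q, leftover %q\n", final, runScenario(newFakeStore(), "test-trial", final))
	}
}
```

With the shared store, the second and third scenarios start with a leftover status from the previous one; with one store per scenario, each starts empty. Whether leftover state is what actually makes the real test flaky is not established here.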

gaocegege · 2021-09-06T12:11:35Z

> Maybe we should split it into 3 different unit tests with separate test managers.

I am wondering why that would work. Splitting the test cases just makes them fail fast, I think.

anencore94 · 2021-09-07T02:12:34Z

> Are you sure that increasing the timeout will help avoid flaky unit tests?

No, I can't be sure, as you might have expected.

By the way, in the action result you linked, I couldn't understand this order of log lines:

    {"level":"info","ts":1630924172.6146765,"logger":"trial-controller","msg":"Trial status changed to Failed","Trial":"default/test-trial"}
    {"level":"info","ts":1630924172.6534996,"logger":"trial-controller","msg":"Trial status changed to Succeeded","Trial":"default/test-trial"}
    {"level":"info","ts":1630924172.6575294,"logger":"trial-controller","msg":"Trial status changed to Metrics Unavailable","Trial":"default/test-trial"}

According to trial_controller_test.go, I think a log line for the trial deletion should follow once the trial status has changed to Failed, but there isn't one.
Even supposing that the trial deletion is never logged at all, I still can't tell from the above GitHub workflow why test 3 started before test 2 had finished and its trial had been deleted.
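
For reference, the gaps between the three status changes can be computed from the `ts` fields of the log lines quoted above. A small Go sketch, where `logLine` and `gapsMillis` are hypothetical helper names introduced only for this illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// logLine holds the fields of interest from one JSON log entry.
type logLine struct {
	TS  float64 `json:"ts"`
	Msg string  `json:"msg"`
}

// gapsMillis parses JSON log lines and returns the time between
// consecutive entries in milliseconds.
func gapsMillis(lines []string) ([]float64, error) {
	var prev float64
	gaps := make([]float64, 0, len(lines))
	for i, l := range lines {
		var e logLine
		if err := json.Unmarshal([]byte(l), &e); err != nil {
			return nil, fmt.Errorf("line %d: %w", i, err)
		}
		if i > 0 {
			gaps = append(gaps, (e.TS-prev)*1000)
		}
		prev = e.TS
	}
	return gaps, nil
}

func main() {
	lines := []string{
		`{"level":"info","ts":1630924172.6146765,"logger":"trial-controller","msg":"Trial status changed to Failed","Trial":"default/test-trial"}`,
		`{"level":"info","ts":1630924172.6534996,"logger":"trial-controller","msg":"Trial status changed to Succeeded","Trial":"default/test-trial"}`,
		`{"level":"info","ts":1630924172.6575294,"logger":"trial-controller","msg":"Trial status changed to Metrics Unavailable","Trial":"default/test-trial"}`,
	}
	gaps, err := gapsMillis(lines)
	if err != nil {
		panic(err)
	}
	for i, g := range gaps {
		fmt.Printf("%.1f ms between entries %d and %d\n", g, i+1, i+2)
	}
}
```

For these three lines the gaps come out at roughly 39 ms and 4 ms, so all three status changes on `default/test-trial` land within about 43 ms of each other.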

andreyvelich · 2021-09-08T15:37:43Z

@anencore94 What do you think about splitting the Experiment test cases into different unit test functions?
Each function would have its own envtest manager setup.

anencore94 · 2021-09-09T09:19:07Z

> What do you think about splitting the Experiment test cases into different unit test functions?
> Each function would have its own envtest manager setup.

I think that helps with debugging, but once they are split, we couldn't call the test function ReconcileTest anymore?

andreyvelich · 2021-09-09T11:25:14Z

> we couldn't call the test function ReconcileTest anymore?

What do you mean by that?
We should have a different function name for each test Experiment.
For example:
TestReconcileJobFailed(t *testing.T), TestReconcileJobAvailableMetrics(t *testing.T), TestReconcileJobUnavailableMetrics(t *testing.T).

anencore94 · 2021-09-09T14:38:46Z

> We should have a different function name for each test Experiment.
> For example:
> TestReconcileJobFailed(t *testing.T), TestReconcileJobAvailableMetrics(t *testing.T), TestReconcileJobUnavailableMetrics(t *testing.T).

Yes, your suggestion is right.
The reason I asked "we couldn't call the test function ReconcileTest anymore?" was that I felt handling those 3 cases together made sense.

However, splitting those 3 would be OK.

stale · 2022-01-03T21:44:04Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale · 2022-03-02T11:58:34Z

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

tenzen-y · 2022-06-14T17:51:07Z

/reopen

google-oss-prow · 2022-06-14T17:51:12Z

@tenzen-y: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

tenzen-y · 2022-06-14T17:52:18Z

/lifecycle frozen

johnugeorge · 2022-06-14T18:20:34Z

We need to investigate whether Ginkgo v2 offers a better solution.

tenzen-y · 2022-06-14T18:23:01Z

> We need to investigate whether Ginkgo v2 offers a better solution.

I agree with you.
We can take this on after the next release.

tenzen-y · 2024-06-14T06:42:57Z

This should be fixed by #2350
/close

google-oss-prow · 2024-06-14T06:43:01Z

@tenzen-y: Closing this issue.

In response to this:

This should be fixed by #2350
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

google-oss-robot added the kind/bug label Aug 31, 2021

anencore94 mentioned this issue Sep 3, 2021

[bugfix]: increase test timeout #1654

Merged

1 task

google-oss-robot added the priority/p2 label Sep 6, 2021

tenzen-y mentioned this issue Sep 22, 2021

Use golangci-lint as linter for Go #1671

Merged

stale bot added the lifecycle/stale label Jan 3, 2022

stale bot closed this as completed Mar 2, 2022

google-oss-prow bot reopened this Jun 14, 2022

google-oss-prow bot added lifecycle/frozen and removed lifecycle/stale labels Jun 14, 2022

tenzen-y mentioned this issue Jun 14, 2022

Update job name and service name as configurable for cert generator #1889

Merged

1 task

tenzen-y mentioned this issue Oct 24, 2022

UI: Format code #1979

Merged

forsaken628 mentioned this issue Jun 10, 2024

Fix TestReconcileBatchJob #2350

Merged

1 task

google-oss-prow bot closed this as completed Jun 14, 2024
