OCPBUGS-62987: Skip idling service with RC test on TechPreview clusters #30375
|
@alebedev87: This pull request references Jira Issue OCPBUGS-62987, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@alebedev87: This pull request references Jira Issue OCPBUGS-62987, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/payload-job periodic-ci-openshift-release-master-nightly-4.20-e2e-vsphere-ovn-techpreview |
|
@alebedev87: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7935e2b0-a850-11f0-97c3-1e923ba4ab7f-0 |
ecf5c91 to
3c33bce
Compare
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alebedev87, sosiouxme The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/payload-job periodic-ci-openshift-release-master-nightly-4.20-e2e-vsphere-ovn-techpreview |
|
@alebedev87: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e70b2fd0-a877-11f0-8ec2-e54b4d6e79e3-0 |
|
The payload job successfully skipped the idling service for RC test: |
|
/retest |
g.BeforeEach(func() {
	if exutil.IsTechPreviewNoUpgrade(context.Background(), oc.AdminConfigClient()) {
		g.Skip("skipping, this test is only supported on Default featureset until https://issues.redhat.com/browse/NE-1984 is implemented")
	}
Can you explain why the "should idle the service and DeploymentConfig properly" test doesn't need a skip? It also defines a route in its fixture (idling-echo-server.yaml) and uses checkSingleIdle to verify that the application remains idle, so I would expect OCPBUGS-49908 to affect it as well.
so I would expect OCPBUGS-49908 to affect it as well.
Yes, I expect so too, but I don't recall seeing it fail. That's why https://issues.redhat.com/browse/OCPBUGS-49908 explicitly mentions only the RC case. https://issues.redhat.com//browse/OCPBUGS-62987 also mentions only the RC case. So I prefer to do things gradually. If the DC case pops up in a similar way, I'll disable it too. Until it does, though, this may be a good corner case to dig into during the https://issues.redhat.com/browse/NE-1984 implementation.
|
/retest-required |
|
/test go-verify-deps |
|
/test go-verify-deps |
3c33bce to
233f138
Compare
|
Rebased from |
|
Why is the test passing sometimes and not others? That seemed a little odd for lack of feature support. I'm a little uncomfortable turning TP tests off that are showing a feature is not complete. It feels very easy from that point for someone to get the impression all is good, you've got the signal you need to promote, and move forward. From the new dev guide would DPNU be correct for where the feature is right now? |
This commit skips the test "Idling with a single service and ReplicationController should idle the service and ReplicationController properly" when the TechPreviewNoUpgrade feature set is enabled. The router's Dynamic Configuration Manager (DCM) currently lacks full support for handling idled services, which causes this test to flake. The support gap will be addressed in https://issues.redhat.com/browse/NE-1984.
233f138 to
26049cf
Compare
When DCM mode is on for the router, the health check is not disabled for an idled backend (that is, a service). There can be a race between the router's health check ping and the test verifying that "no replicas are up for an idled service".
It depends on the NI&D team's priorities. All I can say for the moment is that we don't plan to GA DCM in 4.21.
We have the task you mentioned before; it's a prerequisite that has to be addressed before we can GA the feature.
Well, we already made an effort to promote the feature from DP to TP. One of the benefits of this is that now we have a constant signal about where the feature has gaps. |
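The race described above, between the router's health check ping and the test's "no replicas are up" verification, can be illustrated with a small deterministic model. This is only a sketch under stated assumptions: `idledService`, `router`, `healthCheckIdled`, and `staysIdle` are hypothetical names, not code from the router or from origin, and the model assumes that any connection reaching an idled service, including a health check probe, triggers unidling.

```go
package main

import "fmt"

// idledService is a hypothetical, simplified model of an idled workload:
// it stays at zero replicas until some traffic reaches it.
type idledService struct {
	replicas int
}

// receive models a connection arriving at the service. Assumption: any
// connection, including a health check probe, wakes an idled service.
func (s *idledService) receive() {
	if s.replicas == 0 {
		s.replicas = 1
	}
}

// router is a hypothetical model of the router's backend handling.
type router struct {
	// healthCheckIdled is true when health checks are left enabled for an
	// idled backend (the gap described in the comment above).
	healthCheckIdled bool
}

// tick runs one health check interval against the service.
func (r router) tick(s *idledService) {
	if r.healthCheckIdled {
		s.receive()
	}
}

// staysIdle reports whether a freshly idled service still has zero replicas
// after the router has run the given number of health check intervals.
func staysIdle(r router, intervals int) bool {
	s := &idledService{}
	for i := 0; i < intervals; i++ {
		r.tick(s)
	}
	return s.replicas == 0
}

func main() {
	fmt.Println("checks disabled, 3 intervals:", staysIdle(router{healthCheckIdled: false}, 3))
	fmt.Println("checks enabled, 3 intervals: ", staysIdle(router{healthCheckIdled: true}, 3))
	fmt.Println("checks enabled, 0 intervals: ", staysIdle(router{healthCheckIdled: true}, 0))
}
```

In this model the outcome depends on whether a probe fires before the verification runs, which is consistent with the test passing on some runs and failing on others.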
|
Appreciate the explanation, thanks. Apologies for this, as I know we do not presently have a great way for a partially complete feature to sit in some state that generates a lot of signal and slowly stabilizes over time. I don't think we should disable this test, and I think the feature gate should go back to DPNU, considering this period a soak test where issues were uncovered and more work is needed. Removing the test means we lose that coverage for everything else in techpreview for some yet undetermined number of releases. It exposes us to the possibility that it wasn't actually what we thought it was. Granted, this is all unlikely, and the move from TPNU to default should re-catch those things, but it could be at a very inopportune time, and it's not guaranteed someone will be there to witness it who remembers what we're looking to do in this PR. As things stand today, I think the feature should stick to DPNU for the time being. Is this possible, or will it cause mass headaches? |
|
@alebedev87: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
This test case verifies that the router doesn't wake up idle services. We know it can when the DCM feature is enabled by default. What is the "everything else" for which we would lose coverage?
We will need 2 OCP cycles again to get the feature to GA. |
|
/retest |
|
After discussion with @candita and @Miciah, I am moving DynamicConfigurationManager back to DevPreview. |
|
From the QE side, I agree with skipping the test in the origin repo. Thanks. |
|
@alebedev87: This pull request references Jira Issue OCPBUGS-62987, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
DCM feature was moved back to DevPreview (github.com/openshift/api/pull/2552). |
|
@alebedev87: This pull request references Jira Issue OCPBUGS-62987. The bug has been updated to no longer refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
This PR skips the test "Idling with a single service and ReplicationController should idle the service and ReplicationController properly" when the TechPreviewNoUpgrade feature set is enabled. The router's Dynamic Configuration Manager (DCM) currently lacks full support for handling idled services, which causes this test to flake.
The support gap will be addressed in https://issues.redhat.com/browse/NE-1984.
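The skip decision described here reduces to a feature-set check that runs before the test body. The following is a minimal, stdlib-only sketch of that decision, not the code from this PR: the actual change uses `exutil.IsTechPreviewNoUpgrade` and `g.Skip` as shown in the review diff. `skipReason` and the constant names are hypothetical. The feature set strings are assumed to match the values of the cluster FeatureGate's feature set field.

```go
package main

import "fmt"

// Feature set names, assumed to match the cluster FeatureGate values.
const (
	featureSetDefault     = ""
	featureSetTechPreview = "TechPreviewNoUpgrade"
)

// skipReason is a hypothetical helper. It returns a non-empty reason when
// the idling ReplicationController test should be skipped for the given
// feature set, and an empty string when the test should run.
func skipReason(featureSet string) string {
	if featureSet == featureSetTechPreview {
		return "skipping, this test is only supported on Default featureset until https://issues.redhat.com/browse/NE-1984 is implemented"
	}
	return ""
}

func main() {
	for _, fs := range []string{featureSetDefault, featureSetTechPreview} {
		if reason := skipReason(fs); reason != "" {
			fmt.Printf("featureSet=%q: %s\n", fs, reason)
			continue
		}
		fmt.Printf("featureSet=%q: run test\n", fs)
	}
}
```

Keeping the check keyed on a single feature set means the test keeps running, and keeps providing signal, on Default clusters.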