@alebedev87
Contributor

This PR skips the test "Idling with a single service and ReplicationController should idle the service and ReplicationController properly" when the TechPreviewNoUpgrade feature set is enabled.

The router's Dynamic Configuration Manager (DCM) currently lacks full support for handling idled services, which causes this test to flake.

The support gap will be addressed in https://issues.redhat.com/browse/NE-1984.

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 13, 2025
@openshift-ci-robot

@alebedev87: This pull request references Jira Issue OCPBUGS-62987, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

@openshift-ci openshift-ci bot requested review from deads2k and p0lyn0mial October 13, 2025 15:50
@alebedev87
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 13, 2025
@openshift-ci-robot

@alebedev87: This pull request references Jira Issue OCPBUGS-62987, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

@alebedev87
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.20-e2e-vsphere-ovn-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Oct 13, 2025

@alebedev87: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.20-e2e-vsphere-ovn-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7935e2b0-a850-11f0-97c3-1e923ba4ab7f-0

@alebedev87 alebedev87 force-pushed the OCPBUGS-62987-dcm-idle-flaky branch from ecf5c91 to 3c33bce Compare October 13, 2025 18:48
@sosiouxme
Member

/approve

@openshift-ci
Contributor

openshift-ci bot commented Oct 13, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alebedev87, sosiouxme

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 13, 2025
@alebedev87
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.20-e2e-vsphere-ovn-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Oct 13, 2025

@alebedev87: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.20-e2e-vsphere-ovn-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e70b2fd0-a877-11f0-8ec2-e54b4d6e79e3-0

@alebedev87
Contributor Author

The payload job successfully skipped the RC idling test:

: [sig-network-edge][Feature:Idling] Idling with a single service and ReplicationController should idle the service and ReplicationController properly [Suite:openshift/conformance/parallel]

Reason: skip [github.com/openshift/origin/test/extended/idling/idling.go:233]: skipping, this test is only supported on Default featureset until https://issues.redhat.com/browse/NE-1984 is implemented
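
For readers unfamiliar with the guard, the skip decision can be sketched as a plain function. This is a minimal, self-contained illustration rather than the origin code: the `FeatureSet` type, its constant values, and `skipReason` are hypothetical stand-ins for the cluster lookup that `exutil.IsTechPreviewNoUpgrade` performs, and the real test calls `g.Skip` instead of returning a string.

```go
package main

import "fmt"

// FeatureSet is a hypothetical stand-in for the cluster's configured feature set.
type FeatureSet string

// The string values below are assumptions made for this sketch.
const (
	Default              FeatureSet = ""
	TechPreviewNoUpgrade FeatureSet = "TechPreviewNoUpgrade"
	DevPreviewNoUpgrade  FeatureSet = "DevPreviewNoUpgrade"
)

// skipMessage is the skip reason reported by the payload job.
const skipMessage = "skipping, this test is only supported on Default featureset until https://issues.redhat.com/browse/NE-1984 is implemented"

// skipReason mirrors the guard in this PR: the test is skipped only when the
// feature set is TechPreviewNoUpgrade; every other feature set runs the test.
func skipReason(fs FeatureSet) (string, bool) {
	if fs == TechPreviewNoUpgrade {
		return skipMessage, true
	}
	return "", false
}

func main() {
	for _, fs := range []FeatureSet{Default, TechPreviewNoUpgrade, DevPreviewNoUpgrade} {
		_, skip := skipReason(fs)
		fmt.Printf("featureSet=%q skip=%v\n", fs, skip)
	}
}
```

Note that in this sketch, as in the guard, only TechPreviewNoUpgrade triggers the skip, even though the message says the test is supported only on the Default feature set.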

@alebedev87
Contributor Author

/retest

g.BeforeEach(func() {
	if exutil.IsTechPreviewNoUpgrade(context.Background(), oc.AdminConfigClient()) {
		g.Skip("skipping, this test is only supported on Default featureset until https://issues.redhat.com/browse/NE-1984 is implemented")
	}
Contributor

Can you explain why the "should idle the service and DeploymentConfig properly" test doesn't need a skip? It also defines a route in its fixture (idling-echo-server.yaml) and uses checkSingleIdle to verify that the application remains idle, so I would expect OCPBUGS-49908 to affect it as well.

Contributor Author

> so I would expect OCPBUGS-49908 to affect it as well.

Yes, I would expect that too, but I don't recall seeing it fail. That's why https://issues.redhat.com/browse/OCPBUGS-49908 explicitly mentions only the RC case. https://issues.redhat.com/browse/OCPBUGS-62987 also mentions only the RC case. So I prefer to proceed gradually. If the DC case shows up in a similar way, I'll disable it too. Until then, it may be a good corner case to dig into during the https://issues.redhat.com/browse/NE-1984 implementation.

@alebedev87
Contributor Author

/retest-required

@alebedev87
Contributor Author

alebedev87 commented Oct 20, 2025

/test go-verify-deps

@alebedev87 alebedev87 force-pushed the OCPBUGS-62987-dcm-idle-flaky branch from 3c33bce to 233f138 Compare October 20, 2025 08:44
@alebedev87
Contributor Author

Rebased from main to fix go-verify-deps.

@dgoodwin
Contributor

dgoodwin commented Oct 27, 2025

Why is the test passing sometimes and not others? That seemed a little odd for lack of feature support.
When is https://issues.redhat.com/browse/NE-1984 expected to be completed?
When is this feature expected to go GA?
How do we ensure this test gets re-enabled and not forgotten?

I'm a little uncomfortable turning TP tests off that are showing a feature is not complete. It feels very easy from that point for someone to get the impression all is good, you've got the signal you need to promote, and move forward.

From the new dev guide would DPNU be correct for where the feature is right now?

This commit skips the test
"Idling with a single service and ReplicationController should idle the
service and ReplicationController properly"
when the TechPreviewNoUpgrade feature set is enabled.

The router's Dynamic Configuration Manager (DCM) currently lacks full
support for handling idled services, which causes this test to flake.

The support gap will be addressed in https://issues.redhat.com/browse/NE-1984.
@alebedev87 alebedev87 force-pushed the OCPBUGS-62987-dcm-idle-flaky branch from 233f138 to 26049cf Compare October 27, 2025 16:28
@alebedev87
Contributor Author

> Why is the test passing sometimes and not others? That seemed a little odd for lack of feature support.

When DCM mode is enabled for the router, health checks are not disabled for an idled backend (that is, an idled service). This can lead to a race between the router's health check ping and the test's verification that no replicas are up for the idled service.
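
The race can be illustrated with a toy model. This is only a sketch under stated assumptions, not router or test code: it assumes that any request reaching an idled backend wakes it up (scales it to one replica), and the `service`, `event`, and `staysIdle` names are hypothetical. The point is that the same test check passes or fails depending only on whether a health check ping happens to land before it.

```go
package main

import "fmt"

// service is a hypothetical stand-in for an idled backend.
type service struct {
	replicas int
}

// receive models the assumption that traffic to an idled service wakes it up.
func (s *service) receive() {
	if s.replicas == 0 {
		s.replicas = 1
	}
}

// event is one step in an interleaving of router and test activity.
type event int

const (
	// healthCheck is the router pinging the backend.
	healthCheck event = iota
	// testCheck is the test asserting that no replicas are up.
	testCheck
)

// staysIdle replays the events against a freshly idled service and reports
// whether every testCheck observed zero replicas.
func staysIdle(events []event, healthChecksEnabled bool) bool {
	svc := &service{}
	for _, e := range events {
		switch e {
		case healthCheck:
			if healthChecksEnabled {
				svc.receive()
			}
		case testCheck:
			if svc.replicas != 0 {
				return false
			}
		}
	}
	return true
}

func main() {
	pingFirst := []event{healthCheck, testCheck}
	checkFirst := []event{testCheck, healthCheck}

	fmt.Println("ping before check, health checks on: ", staysIdle(pingFirst, true))
	fmt.Println("check before ping, health checks on: ", staysIdle(checkFirst, true))
	fmt.Println("ping before check, health checks off:", staysIdle(pingFirst, false))
}
```

In this model the test fails only for the ordering in which the ping precedes the check while health checks are enabled, which matches the intermittent failures described above.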

> When is https://issues.redhat.com/browse/NE-1984 expected to be completed?
> When is this feature expected to go GA?

It depends on the NI&D team's priorities. All I can say for the moment is that we don't plan to GA DCM in 4.21.

> How do we ensure this test gets re-enabled and not forgotten?
> I'm a little uncomfortable turning TP tests off that are showing a feature is not complete. It feels very easy from that point for someone to get the impression all is good, you've got the signal you need to promote, and move forward.

We have the task you mentioned earlier; it is a prerequisite that has to be addressed before we can GA the feature.

> From the new dev guide would DPNU be correct for where the feature is right now?

Well, we already made an effort to promote the feature from DP to TP. One of the benefits of this is that now we have a constant signal about where the feature has gaps.

@dgoodwin
Contributor

Appreciate the explanation, thanks.

Apologies for this as I know we do not presently have a great way for a partially complete feature to sit in some state that generates a lot of signal, and slowly stabilize over time. I don't think we should disable this test, and I think the feature gate should go back to DPNU, considering this period in time a soak test where issues were uncovered and more work is needed.

Removing the test means we lose that coverage for everything else in techpreview for some yet undetermined number of releases. It exposes us to the possibility it wasn't actually what we thought it was. Granted this is all unlikely and the move from TPNU to default should re-catch those things, but it could be at a very inopportune time and it's not guaranteed someone will be there to witness it who remembers what we're looking to do in this PR.

As things stand today, I think the feature should stick to DPNU for the time being. Is this possible, or will it cause mass headaches?

@openshift-ci
Contributor

openshift-ci bot commented Oct 27, 2025

@alebedev87: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-scos-e2e-aws-ovn | 233f138 | link | false | /test okd-scos-e2e-aws-ovn |

@alebedev87
Contributor Author

> Removing the test means we lose that coverage for everything else in techpreview for some yet undetermined number of releases

This test case verifies that the router doesn't wake up idle services. We know that it can when the DCM feature is enabled by default. What is the "everything else" for which we would lose coverage?

> As things stand today, I think the feature should stick to DPNU for the time being. Is this possible or will it cause mass headaches?

We will need 2 OCP cycles again to get the feature to GA.

@alebedev87
Contributor Author

/retest

@alebedev87
Contributor Author

After a discussion with @candita and @Miciah, I am moving DynamicConfigurationManager back to DevPreview.

@ShudiLi
Member

ShudiLi commented Oct 29, 2025

From the QE side, I agree with skipping the test in the origin repo. Thanks.
/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Oct 29, 2025
@openshift-ci-robot

@alebedev87: This pull request references Jira Issue OCPBUGS-62987, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @ShudiLi

@openshift-ci openshift-ci bot requested a review from ShudiLi October 29, 2025 02:27
@alebedev87
Contributor Author

DCM feature was moved back to DevPreview (github.com/openshift/api/pull/2552).

@alebedev87 alebedev87 closed this Nov 3, 2025
@openshift-ci-robot

@alebedev87: This pull request references Jira Issue OCPBUGS-62987. The bug has been updated to no longer refer to the pull request using the external bug tracker.

