NE-408: Allow configuring ELB connection idle timeout #451

Miciah · 2020-09-01T04:09:12Z

Bump openshift/api for ELB connection idle timeout

Bump to github.com/openshift/api@b25f69a603a76ccc809f986c9f5811f0825febbb to get the AWS ELB connection idle timeout API.

go.mod: Update.
go.sum:
manifests/00-custom-resource-definition.yaml:
pkg/manifests/bindata.go:
vendor/github.com/openshift/api/*:
vendor/modules.txt: Regenerate.

`test/e2e`: import API errors package as `apierrors`

Import the apimachinery errors package as "apierrors".

test/e2e/operator_test.go: Import the "k8s.io/apimachinery/pkg/api/errors" package as "apierrors" so as not to conflict with the standard "errors" package.

`desiredLoadBalancerService`: Simplify with "lb" var

pkg/operator/controller/ingress/load_balancer_service.go (desiredLoadBalancerService): Introduce an "lb" variable to shorten some long lines.

`desiredLoadBalancerService`: Check for nil LB status

pkg/operator/controller/ingress/load_balancer_service.go (desiredLoadBalancerService): Add a nil check just in case status.endpointPublishingStrategy.LoadBalancer is nil somehow.

setDefaultPublishingStrategy: Rework GCP logic

Rework the GCP load-balancer provider parameters defaulting logic in preparation for the next change. Besides simplifying the logic, this change also changes the defaulting logic to ignore unknown provider parameters and to ignore provider parameters for platforms other than the actual platform. These changes should avoid surprises when more provider parameters are added to the API later on as well as prevent weird behavior when the user sets GCP provider parameters on non-GCP clusters.

pkg/operator/controller/ingress/controller.go (setDefaultPublishingStrategy): Rework defaulting logic for GCP load-balancer provider parameters.
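
The defaulting rule described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the operator's code: `providerType`, `providerParameters`, `GCPClientAccess`, and `effectiveParameters` are hypothetical stand-ins for the real API types and logic. It shows provider parameters being honored only when they match the cluster's actual platform and belong to a provider type the code knows about.

```go
package main

import "fmt"

// providerType and providerParameters are hypothetical stand-ins for the
// operator API types; the real types have more fields.
type providerType string

const (
	awsProvider providerType = "AWS"
	gcpProvider providerType = "GCP"
)

type providerParameters struct {
	Type            providerType
	GCPClientAccess string
}

// effectiveParameters returns the parameters that defaulting should honor.
// It returns nil when no parameters are set, when they target a platform
// other than the cluster's, or when the provider type is not handled here.
func effectiveParameters(platform providerType, spec *providerParameters) *providerParameters {
	if spec == nil || spec.Type != platform {
		return nil
	}
	switch spec.Type {
	case gcpProvider:
		return &providerParameters{Type: gcpProvider, GCPClientAccess: spec.GCPClientAccess}
	default:
		// Unknown (or not yet supported) provider parameters are ignored.
		return nil
	}
}

func main() {
	spec := &providerParameters{Type: gcpProvider, GCPClientAccess: "Global"}
	fmt.Println(effectiveParameters(gcpProvider, spec) != nil) // GCP parameters on a GCP cluster
	fmt.Println(effectiveParameters(awsProvider, spec) != nil) // GCP parameters on an AWS cluster
}
```

Returning nil for the mismatched and unknown cases keeps a later commit free to add a new `case` without changing how existing platforms behave.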

Allow configuring ELB connection idle timeout

pkg/operator/controller/ingress/controller.go (setDefaultPublishingStrategy): Handle changes to the connection idle timeout for an AWS ELB.
pkg/operator/controller/ingress/controller_test.go (TestSetDefaultPublishingStrategyHandlesUpdates): Add test cases for changing the ELB connection idle timeout.
pkg/operator/controller/ingress/load_balancer_service.go (awsELBConnectionIdleTimeoutAnnotation): New constant.
(managedLoadBalancerServiceAnnotations): Add awsELBConnectionIdleTimeoutAnnotation.
(desiredLoadBalancerService): Set the connection idle timeout annotation if the ingresscontroller specifies a non-nil connectionIdleTimeout value.
pkg/operator/controller/ingress/load_balancer_service_test.go (TestDesiredLoadBalancerServiceAWSIdleTimeout): New test.
test/e2e/operator_test.go (TestAWSELBConnectionIdleTimeout): New test.
test/e2e/util.go (buildSlowHTTPDPod): New helper for TestAWSELBConnectionIdleTimeout.

Related to openshift/enhancements#461.

openshift-ci-robot · 2020-09-01T04:09:27Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Miciah]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2020-09-02T22:21:02Z

@Miciah: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-aws | `5c5286b` | link | `/test e2e-aws` |
| ci/prow/e2e-aws-operator | `5c5286b` | link | `/test e2e-aws-operator` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-merge-robot · 2020-10-21T22:10:53Z

@Miciah: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-upgrade | `5c5286b` | link | `/test e2e-upgrade` |

openshift-bot · 2021-01-20T00:38:56Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot · 2021-02-19T00:56:29Z

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot · 2021-03-21T05:13:36Z

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot · 2021-03-21T05:13:46Z

@openshift-bot: Closed this PR.

Miciah · 2022-04-07T06:31:07Z

/reopen
/remove-lifecycle rotten

openshift-ci · 2022-04-07T06:31:31Z

@Miciah: Reopened this PR.

Miciah · 2022-04-07T15:00:07Z

In the last e2e-aws-operator run, the new TestAWSELBConnectionIdleTimeout passed like a champ, but TestHostNetworkEndpointPublishingStrategy failed:

 === RUN   TestHostNetworkEndpointPublishingStrategy
    operator_test.go:758: failed to observe expected conditions: timed out waiting for the condition
    operator_test.go:760: deleted ingresscontroller host
--- FAIL: TestHostNetworkEndpointPublishingStrategy (300.03s)

That test has been failing on a few PRs over the past 26 hours.

/retest

Miciah · 2022-04-07T20:47:04Z

In the last e2e-aws-operator run, TestConfigurableRouteRBAC, TestConfigurableRouteNoSecretNoRBAC, TestConfigurableRouteNoConsumingUserNoRBAC, TestHostNetworkEndpointPublishingStrategy, and TestAWSELBConnectionIdleTimeout failed. The last failed with the following error message:

     operator_test.go:2208: failed to get ingresscontroller: Get "https://api.ci-op-1yr8trkg-43abb.origin-ci-int-aws.dev.rhcloud.com:6443/apis/operator.openshift.io/v1/namespaces/openshift-ingress-operator/ingresscontrollers/test-idle-timeout": http2: client connection lost

I could add some retry logic around the Get call that failed, but the E2E tests have a lot of Get calls without retry logic, and the other failures in the test run lead me to think that the cluster had some other issue unrelated to this PR.

/retest

Miciah · 2022-04-07T22:03:00Z

Latest push adds a unit test that I had forgotten to commit.

Rework the GCP load-balancer provider parameters defaulting logic in preparation for an upcoming commit. Besides simplifying the logic, this commit also changes the defaulting logic to ignore unknown provider parameters and to ignore provider parameters for platforms other than the actual platform. These changes should avoid surprises when more provider parameters are added to the API later on as well as prevent weird behavior when the user sets GCP provider parameters on non-GCP clusters.

* pkg/operator/controller/ingress/controller.go (setDefaultPublishingStrategy): Rework defaulting logic for GCP load-balancer provider parameters.

Miciah · 2022-05-10T13:10:40Z

Rebased for #735.

Miciah · 2022-05-16T15:43:30Z

/assign frobware
/assign gcs278

frobware

LGTM, apart from some places in the e2e test that really should call t.Fatal() and not t.Error(). Plus a question regarding the "Rework GCP logic" commit: does it belong in this PR? The commit message mentions it is in preparation for an upcoming change.

frobware · 2022-05-17T13:57:57Z

test/e2e/operator_test.go

+
+		return true, nil
+	}); err != nil {
+		t.Errorf("failed to observe expected condition: %v", err)


Should this be Fatal()? Can we carry on without lookup succeeding?

I suppose it wouldn't hurt to make this Fatal.

frobware · 2022-05-17T14:04:12Z

test/e2e/operator_test.go

+
+		return false, nil
+	}); err != nil {
+		t.Errorf("failed to observe expected condition: %v", err)


This should be fatal too.

Isn't it potentially useful to know whether a large value causes the expected behavior when diagnosing why a low value does not?

frobware · 2022-05-17T14:39:05Z

test/e2e/operator_test.go

+
+		return true, nil
+	}); err != nil {
+		t.Errorf("failed to observe expected condition: %v", err)


Should be fatal.

frobware · 2022-05-17T14:39:50Z

test/e2e/operator_test.go

+
+		return true, nil
+	}); err != nil {
+		t.Errorf("failed to observe expected condition: %v", err)


Should be fatal. Although it's at the end of the test we may as well be consistent.

You're suggesting making every Error into a Fatal. What's an example of where you would advise using Error?

We should use Fatal if any subsequent testing can only produce garbage. However, I think in some of these instances where I used Error in this test, there could be value in continuing the test. For example, even if setting a 3-second timeout didn't behave as expected, we can still try a 120-second timeout to see whether it behaves as expected, and the result could be helpful in diagnosing why the 3-second timeout didn't behave as expected.

> You're suggesting making every Error into a Fatal. What's an example of where you would advise using Error?

Table-driven unit tests.

> We should use Fatal if any subsequent testing can only produce garbage

Isn't that the case for all these e2e tests? We try to set up the cluster/objects/resources/state in a very particular way, and if that doesn't happen, is it worth generating cascading failure messages if the test were to continue? If you were to debug a failing test, you're likely to start with the first error message. Would further error messages help?

frobware · 2022-05-17T14:40:56Z

test/e2e/operator_test.go

+	if err := wait.PollImmediate(1*time.Second, 5*time.Minute, func() (bool, error) {
+		_, err := net.LookupIP(route.Spec.Host)
+		if err != nil {
+			t.Log(err)


This can be really chatty, particularly at 1s interval.

It's test output, it should be under 300 lines, and it's hidden unless a test fails. Is that too chatty? I could add some logic to suppress the log message if it's identical to the previous one.

Maybe just bump the interval to 3s?

On my sample size of 1 run, the lookup resolved in ~60s.

Perhaps I'm just a little wary if this becomes a parallel test candidate [1]. One downside of running some or a lot of the tests in parallel is the interleaved test output.

Not a blocker for me on the PR though. Was just an observation.

[1] PR #756.

frobware · 2022-05-17T15:10:05Z

pkg/operator/controller/ingress/controller.go

@@ -412,19 +412,63 @@ func setDefaultPublishingStrategy(ic *operatorv1.IngressController, infraConfig
 				changed = true
 			}



Does commit "rework GCP logic" belong to this PR? Can it go in the upcoming commit?

Miciah · 2022-05-17T22:56:17Z

I've changed Error to Fatal in the E2E test.

frobware · 2022-05-18T10:27:18Z

/lgtm

frobware · 2022-05-18T10:27:26Z

/retest

* pkg/operator/controller/ingress/controller.go (setDefaultPublishingStrategy): Handle changes to the connection idle timeout for an AWS ELB.
* pkg/operator/controller/ingress/controller_test.go (TestSetDefaultPublishingStrategyHandlesUpdates): Add test cases for changing the ELB connection idle timeout.
* pkg/operator/controller/ingress/load_balancer_service.go (awsELBConnectionIdleTimeoutAnnotation): New constant.
(managedLoadBalancerServiceAnnotations): Add awsELBConnectionIdleTimeoutAnnotation.
(desiredLoadBalancerService): Set the connection idle timeout annotation if the ingresscontroller specifies a non-nil connectionIdleTimeout value.
* pkg/operator/controller/ingress/load_balancer_service_test.go (TestDesiredLoadBalancerServiceAWSIdleTimeout): New test.
(TestLoadBalancerServiceChanged): Add a test case for the connection-idle-timeout annotation.
* test/e2e/operator_test.go (TestAWSELBConnectionIdleTimeout): New test.
* test/e2e/util.go (buildSlowHTTPDPod): New helper for TestAWSELBConnectionIdleTimeout.

Miciah · 2022-05-23T13:50:19Z

/retest

frobware · 2022-05-23T15:51:08Z

/lgtm

openshift-ci · 2022-05-23T15:51:50Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: frobware, Miciah

Needs approval from an approver in each of these files:

~~OWNERS~~ [Miciah,frobware]

openshift-bot · 2022-05-23T16:22:46Z

/retest-required

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2022-05-23T17:34:49Z

/retest-required

Please review the full test history for this PR and help us cut down flakes.

openshift-ci · 2022-05-23T18:06:47Z

@Miciah: all tests passed!

openshift-ci-robot requested review from frobware and ironcladlou September 1, 2020 04:09

openshift-ci-robot added the approved label Sep 1, 2020

Miciah changed the title ~~Allow configuring ELB connection idle timeout~~ WIP: Allow configuring ELB connection idle timeout Sep 1, 2020

openshift-ci-robot added the do-not-merge/work-in-progress label Sep 1, 2020

Miciah force-pushed the allow-configuring-ELB-connection-idle-timeout branch 2 times, most recently from 04c7846 to ff7701e Compare September 2, 2020 20:08

openshift-ci-robot added the needs-rebase label Sep 2, 2020

Miciah force-pushed the allow-configuring-ELB-connection-idle-timeout branch from ff7701e to 5c5286b Compare September 2, 2020 21:03

openshift-ci-robot removed the needs-rebase label Sep 2, 2020

openshift-ci-robot added the lifecycle/stale label Jan 20, 2021

openshift-ci-robot added lifecycle/rotten and removed lifecycle/stale labels Feb 19, 2021

openshift-ci-robot closed this Mar 21, 2021

openshift-ci bot reopened this Apr 7, 2022

openshift-ci bot removed the lifecycle/rotten label Apr 7, 2022

Miciah force-pushed the allow-configuring-ELB-connection-idle-timeout branch from 5c5286b to d024778 Compare April 7, 2022 06:31

Miciah force-pushed the allow-configuring-ELB-connection-idle-timeout branch from d024778 to 70f8c6b Compare April 7, 2022 22:02

Miciah force-pushed the allow-configuring-ELB-connection-idle-timeout branch from 90d4ea4 to 1a310ef Compare May 10, 2022 13:10

openshift-ci bot assigned frobware and gcs278 May 16, 2022

frobware requested changes May 17, 2022

Miciah force-pushed the allow-configuring-ELB-connection-idle-timeout branch from 1a310ef to fc7c323 Compare May 17, 2022 22:55

openshift-ci bot added the lgtm label May 18, 2022

frobware approved these changes May 18, 2022

Miciah force-pushed the allow-configuring-ELB-connection-idle-timeout branch from fc7c323 to eee2928 Compare May 18, 2022 15:47

openshift-ci bot removed the lgtm label May 18, 2022

Miciah force-pushed the allow-configuring-ELB-connection-idle-timeout branch from eee2928 to a635566 Compare May 19, 2022 18:30

openshift-ci bot added the lgtm label May 23, 2022

openshift-merge-robot merged commit 9ad500d into openshift:master May 23, 2022

blrm mentioned this pull request Aug 10, 2022

Set ELBConnectionTimeout on IngressController for v4.11+ openshift/cloud-ingress-operator#265

Merged

Miciah mentioned this pull request Aug 4, 2023

OCPBUGS-17359: test/e2e: Don't use openshift/origin-node #970

Merged

This was referenced Oct 25, 2023

[release-4.13] OCPBUGS-22402: test/e2e: Don't use openshift/origin-node #991

Merged

[release-4.12] OCPBUGS-22432: test/e2e: Don't use openshift/origin-node #992

Merged

[release-4.11] OCPBUGS-22433: test/e2e: Don't use openshift/origin-node #993

Merged
