controller-manager/scheduler: switch to secure ports #1576

Merged: openshift-merge-robot merged 1 commit into openshift:master from s-urbaniak:secure-ports on Apr 18, 2019

@s-urbaniak
Contributor

Recently, the control plane switched to secure ports in [1] and [2].
This aligns them in the installer.

[1] openshift/cluster-kube-scheduler-operator#88
[2] openshift/cluster-kube-controller-manager-operator#207

/cc @brancz @sttts

@s-urbaniak
Contributor Author

/test e2e-aws

@brancz
Contributor

brancz commented Apr 10, 2019

/lgtm

openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) on Apr 10, 2019

@abhinavdahiya
Contributor

/lgtm

openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Apr 10, 2019

@wking
Member

wking commented Apr 10, 2019

Can we just cut over? I'd have expected keeping both ports open during the transition. I guess we'll see how e2e-aws goes...

@brancz
Contributor

brancz commented Apr 10, 2019

The insecure port has already been disabled on the components, so I'd be surprised if e2e-aws only turned red now. We (as in monitoring) would have appreciated a transition though instead of a hard break, as this is currently blocking all our PRs as our CI is failing because of these not being available. We've almost completed the transition now, though, so this is more a note for next time 🙂.

@wking
Member

wking commented Apr 10, 2019

e2e-aws:

Flaky tests:

[sig-cli] Kubectl client [k8s.io] Simple pod should contain last line of the log [Suite:openshift/conformance/parallel] [Suite:k8s]

Failing tests:

[Feature:Prometheus][Conformance] Prometheus when installed on the cluster should start and expose a secured proxy and unsecured metrics [Suite:openshift/conformance/parallel/minimal]

/retest

@wking
Member

wking commented Apr 10, 2019

> We (as in monitoring) would have appreciated a transition though instead of a hard break, as this is currently blocking all our PRs as our CI is failing because of these not being available.

Sounds like you need to add a stronger monitoring check to the e2e-aws suite, so future changes are gated on not breaking you?

@brancz
Contributor

brancz commented Apr 10, 2019

> Sounds like you need to add a stronger monitoring check to the e2e-aws suite, so future changes are gated on not breaking you?

We had these, but they were disabled for migration purposes 🙃 (without us being informed).

@brancz
Contributor

brancz commented Apr 10, 2019

In the failed test run the OpenShift API never became available, which seems unrelated to this change.

/retest

@brancz
Contributor

brancz commented Apr 10, 2019

That seems like another failure in bringing up the cluster, I'm seeing:

                    {
                        "lastTransitionTime": "2019-04-10T15:19:36Z",
                        "message": "StaticPodsFailing: pods \"kube-apiserver-ip-10-0-130-161.ec2.internal\" not found\nStaticPodsFailing: pods \"kube-apiserver-ip-10-0-168-57.ec2.internal\" not found\nStaticPodsFailing: pods \"kube-apiserver-ip-10-0-148-58.ec2.internal\" not found",
                        "reason": "StaticPodsFailingError",
                        "status": "True",
                        "type": "Failing"
                    },

And

                    {
                        "lastTransitionTime": "2019-04-10T15:19:37Z",
                        "message": "StaticPodsFailing: pods \"kube-controller-manager-ip-10-0-130-161.ec2.internal\" not found\nStaticPodsFailing: pods \"kube-controller-manager-ip-10-0-168-57.ec2.internal\" not found\nStaticPodsFailing: pods \"kube-controller-manager-ip-10-0-148-58.ec2.internal\" not found",
                        "reason": "StaticPodsFailingError",
                        "status": "True",
                        "type": "Failing"
                    },

The same shows up for other components. I'm not entirely convinced this is due to this change, though, as apiserver and openshift-apiserver are failing as well.

Could installer folks have a deeper look? I'm restarting the tests for now; we might just have been unlucky.
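
For triaging dumps like the two conditions quoted above, here is a minimal sketch that pulls the missing static pod names out of `Failing` conditions. The surrounding `{"conditions": [...]}` wrapper and the helper name `missing_static_pods` are assumptions made for illustration; only the condition object itself is taken from the log above.

```python
import json
import re

# Raw string so the JSON escapes (\" and \n) reach json.loads unchanged.
# The condition is copied from the log above; the "conditions" wrapper is assumed.
STATUS_JSON = r"""
{"conditions": [
  {"lastTransitionTime": "2019-04-10T15:19:37Z",
   "message": "StaticPodsFailing: pods \"kube-controller-manager-ip-10-0-130-161.ec2.internal\" not found\nStaticPodsFailing: pods \"kube-controller-manager-ip-10-0-168-57.ec2.internal\" not found\nStaticPodsFailing: pods \"kube-controller-manager-ip-10-0-148-58.ec2.internal\" not found",
   "reason": "StaticPodsFailingError",
   "status": "True",
   "type": "Failing"}
]}
"""

POD_NOT_FOUND = re.compile(r'pods "([^"]+)" not found')


def missing_static_pods(status):
    """Return pod names reported as 'not found' by conditions with Failing=True."""
    pods = []
    for cond in status.get("conditions", []):
        if cond.get("type") == "Failing" and cond.get("status") == "True":
            pods.extend(POD_NOT_FOUND.findall(cond.get("message", "")))
    return pods


if __name__ == "__main__":
    for name in missing_static_pods(json.loads(STATUS_JSON)):
        print(name)
```

Grouping the names by node suffix makes it easier to see whether every control plane component is missing on the same hosts, which would point at cluster bring-up rather than at this change.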

/retest

@brancz
Contributor

brancz commented Apr 10, 2019

Looked into the failure, this time "only" compute nodes and apiserver seem to have failed, but I can't find further logs to find out more.

/retest

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@brancz
Contributor

brancz commented Apr 11, 2019

failed to create listener: failed to listen on 0.0.0.0:10257: listen tcp 0.0.0.0:10257: bind: address already in use

Seems like a flake, as this PR isn't the one switching this behavior.
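
For reference, the "address already in use" error above is what the kernel returns when a second listener tries to bind a port that is still held. A minimal sketch of a probe for that condition follows; the helper name `can_bind` is hypothetical and this is not how the components themselves check the port.

```python
import errno
import socket


def can_bind(port, host="0.0.0.0"):
    """Return True if a TCP socket can bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as err:
            if err.errno == errno.EADDRINUSE:
                # Same failure mode as "bind: address already in use".
                return False
            raise
    return True


if __name__ == "__main__":
    # 10257 is the port from the error message above.
    print("10257 free:", can_bind(10257))
```

The probe only tells you the state at the moment it runs, so a race with another process starting up remains possible.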

/retest

@s-urbaniak
Contributor Author

/retest

@s-urbaniak
Contributor Author

/retest

@s-urbaniak
Contributor Author

I finally verified this branch locally and confirmed the control plane is reachable on ports 10257 and 10259.
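
A reachability check of this kind can be sketched roughly as below. It only tests that a TCP connection can be opened and does not perform a TLS handshake or authenticate; the helper names and the example host are assumptions, not what was actually run for this PR.

```python
import socket

# Default secure ports of the two components touched by this PR.
SECURE_PORTS = {"kube-controller-manager": 10257, "kube-scheduler": 10259}


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_control_plane(host, ports=SECURE_PORTS):
    """Map each component name to whether its port accepts connections."""
    return {name: is_reachable(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    # "127.0.0.1" is a placeholder; point this at a control plane node.
    for name, ok in check_control_plane("127.0.0.1").items():
        print(f"{name}: {'reachable' if ok else 'unreachable'}")
```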

The flakes are addressed in openshift/origin#22543.

@s-urbaniak
Contributor Author

/retest

The flake PR merged, let's see how this goes.

@s-urbaniak
Contributor Author

/test e2e-aws

@s-urbaniak
Contributor Author

/test e2e-aws

@paulfantom
Contributor

/retest

@s-urbaniak
Contributor Author

/test e2e-aws

@s-urbaniak
Contributor Author

@abhinavdahiya this is a chicken-and-egg problem: Prometheus still tries to scrape the insecure ports. That is resolved in openshift/cluster-monitoring-operator#316, but the latter cannot land without this PR. We temporarily disabled this very e2e test and will re-enable it once this and the referenced PR land.

@s-urbaniak
Contributor Author

Side note: because of flakes, openshift/origin#22574 only got merged 6 hours ago, so let's observe the current builds.

@abhinavdahiya
Contributor

e2e-aws is still failing:

[Feature:Prometheus][Conformance] Prometheus when installed on the cluster should start and expose a secured proxy and unsecured metrics [Suite:openshift/conformance/parallel/minimal]

@abhinavdahiya
Contributor

/test e2e-metal

@s-urbaniak
Contributor Author

hmm .. it's still e2e'ing the scheduler metrics 🤔

Apr 16 21:35:17.877: INFO: missing some targets: [no match for map[job:scheduler] with health up and scrape URL ^(http|https)://.*/metrics$]

@paulfantom would you mind having a look at why this is the case?
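
To make the log line above concrete: the test appears to look for a target with the label `job=scheduler`, health `up`, and a scrape URL matching the quoted regex. Below is a rough sketch of that matching logic, assuming target records shaped like the `activeTargets` entries of the Prometheus targets API (`labels`, `scrapeUrl`, `health`). The function name and the sample data are hypothetical, and this is not the actual e2e test code.

```python
import re

# Regex copied from the log line above.
SCRAPE_URL = re.compile(r"^(http|https)://.*/metrics$")


def missing_targets(targets, expected_labels):
    """Return the expected label sets that no healthy, matching target satisfies."""
    missing = []
    for want in expected_labels:
        found = any(
            t["health"] == "up"
            and SCRAPE_URL.match(t["scrapeUrl"]) is not None
            and all(t["labels"].get(k) == v for k, v in want.items())
            for t in targets
        )
        if not found:
            missing.append(want)
    return missing


if __name__ == "__main__":
    # Made-up sample: the scheduler target is still scraped on the insecure port and is down.
    sample = [
        {"labels": {"job": "kube-controller-manager"},
         "scrapeUrl": "https://10.0.0.1:10257/metrics", "health": "up"},
        {"labels": {"job": "scheduler"},
         "scrapeUrl": "http://10.0.0.1:10251/metrics", "health": "down"},
    ]
    print(missing_targets(sample, [{"job": "scheduler"}]))
```

With a sample like this, the scheduler entry is reported as missing because its health is not `up`, even though its URL matches the regex.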

@s-urbaniak
Contributor Author

/retest

@s-urbaniak
Contributor Author

/retest

@s-urbaniak
Contributor Author

@abhinavdahiya PTAL, aws is green again. I don't have enough context about the bare-metal failure, but it seems unrelated.

@abhinavdahiya
Contributor

/retest

@abhinavdahiya
Contributor

/test e2e-aws

@brancz
Contributor

brancz commented Apr 18, 2019

/retest

@s-urbaniak
Contributor Author

fwiw openshift/cluster-monitoring-operator#316 just got merged, so once this lands, we can reenable the metrics e2e tests for the control plane.

@brancz
Contributor

brancz commented Apr 18, 2019

/retest

@brancz
Contributor

brancz commented Apr 18, 2019

/retest

@brancz
Contributor

brancz commented Apr 18, 2019

/retest

@abhinavdahiya
Contributor

/lgtm

e2e-metal is optional

openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) on Apr 18, 2019

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, brancz, s-urbaniak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Apr 18, 2019

@brancz
Contributor

brancz commented Apr 18, 2019

/retest

@brancz
Contributor

brancz commented Apr 18, 2019

/retest

@trown

trown commented Apr 18, 2019

/test e2e-openstack

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-merge-robot merged commit 6e8d748 into openshift:master on Apr 18, 2019

@openshift-ci-robot
Contributor

@s-urbaniak: The following test failed, say /retest to rerun them all:

Test name          Commit   Details  Rerun command
ci/prow/e2e-metal  36bb700  link     /test e2e-metal

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

maci0 added a commit to maci0/openshift4-ansible that referenced this pull request on Apr 18, 2019
s-urbaniak deleted the secure-ports branch on April 23, 2019 07:32
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
