Re-enable cluster ingress operator (remove from overridden set) #467

ramr · 2018-10-15T21:14:34Z

Drop openshift-cluster-ingress-operator from the list of cvo-overrides so that
it is re-enabled, now that the ingress operator has been changed to work out of the box.
Associated jira ticket: https://jira.coreos.com/browse/NE-88
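As a rough illustration of what dropping an entry from the cvo-overrides list means, here is a minimal Go sketch. The override type, its fields, and the dropNamespace helper are hypothetical stand-ins for the entries in pkg/asset/manifests/content/bootkube/cvo-overrides.go, not the installer's actual types; the assumption is that an entry marked unmanaged keeps the CVO from managing that component, so removing it hands the component back to the CVO.

```go
package main

import "fmt"

// override is a HYPOTHETICAL model of one cvo-overrides entry; the real
// manifest lives in pkg/asset/manifests/content/bootkube/cvo-overrides.go.
type override struct {
	Kind      string
	Namespace string
	Name      string
	Unmanaged bool // assumed: true means "CVO, leave this component alone"
}

// dropNamespace returns a copy of overrides without any entry in ns, so the
// CVO would manage (i.e. re-enable) those components again.
func dropNamespace(overrides []override, ns string) []override {
	kept := make([]override, 0, len(overrides))
	for _, o := range overrides {
		if o.Namespace == ns {
			continue // skip: this component is being re-enabled
		}
		kept = append(kept, o)
	}
	return kept
}

func main() {
	// Placeholder entries; only the first namespace comes from this PR.
	overrides := []override{
		{Kind: "Deployment", Namespace: "openshift-cluster-ingress-operator", Name: "example-operator", Unmanaged: true},
		{Kind: "Deployment", Namespace: "example-other-namespace", Name: "example-other-operator", Unmanaged: true},
	}
	for _, o := range dropNamespace(overrides, "openshift-cluster-ingress-operator") {
		fmt.Println(o.Namespace, o.Name) // prints "example-other-namespace example-other-operator"
	}
}
```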

And dropping as per comments in #415

@ironcladlou @abhinavdahiya @rajatchopra PTAL Thx - not sure who else to cc.

Edited the whole shebang!

ironcladlou · 2018-10-15T21:17:56Z

Can we disable tectonic-ingress-controller-operator as part of this?

abhinavdahiya · 2018-10-15T21:24:54Z

@ramr

the commit message title line is way too long. https://github.com/openshift/installer/blob/master/CONTRIBUTING.md#commit-message-format
You haven't added any comments or made any changes that explain why the comment in the override is no longer required or true.

rajatchopra · 2018-10-15T21:54:25Z

pkg/asset/manifests/content/bootkube/cvo-overrides.go

Did you mean to drop both the deployment for cluster-dns operator as well? As per comments in #415?

No, only the deployment; the service account is for the cluster-dns-operator.

ramr · 2018-10-15T22:47:53Z

@abhinavdahiya done PTAL Thx
btw, the file name is too long for the 70-char subject line limit, so I truncated it to the directory name and changed the title accordingly... also, git log|show does have a --stat option!

@ironcladlou I know of 3 or 4 files (pkg/asset/manifests/tectonic.go and the actual asset content) but am not sure what else is needed. Still checking through the code/repo. Thx

abhinavdahiya · 2018-10-15T23:04:32Z

@ramr

am not sure what else is needed

1. You can drop all tectonic-ingress-*.go from https://github.com/openshift/installer/tree/master/pkg/asset/manifests/content/tectonic
2. Delete these lines: https://github.com/openshift/installer/blob/master/pkg/asset/manifests/tectonic.go#L65-L70 and https://github.com/openshift/installer/blob/master/pkg/asset/manifests/tectonic.go#L44-L49 and https://github.com/openshift/installer/blob/master/pkg/asset/manifests/tectonic.go#L53
   For 1 and 2, I'm not sure if the TLS secret is still required.
3. Drop https://github.com/openshift/installer/blob/master/pkg/asset/manifests/content/bootkube/02-ingress-namespace.go
4. Delete this: https://github.com/openshift/installer/blob/master/pkg/asset/manifests/operators.go#L209

This should be most of the code.

ramr · 2018-10-15T23:37:33Z

@abhinavdahiya thx for the info.
I noticed that ./pkg/asset/manifests/content/tectonic/tectonic-system-01-ca-cert.go depends on IngressCaCert, so I'm assuming I'd need to keep whatever code sets IngressCaCert.
Unless you think the system-ca-cert code is also a candidate for removal?

And there's some cleanup to un{used,referenced} fields in tectonicTemplateData in pkg/asset/manifests/template.go.

abhinavdahiya · 2018-10-15T23:44:37Z

let's remove ./pkg/asset/manifests/content/tectonic/tectonic-system-01-ca-cert.go too :)

And there's some cleanup to un{used,referenced} fields in tectonicTemplateData in pkg/asset/manifests/template.go

Yeah that would be nice too.

ramr · 2018-10-16T21:45:00Z

@abhinavdahiya done and squashed the commits. PTAL Thx

abhinavdahiya · 2018-10-16T21:48:01Z

The changes look good. Let's see if the CI passes; will /lgtm as soon as that happens.

wking · 2018-10-16T23:47:37Z

e2e-aws:

Waiting for router to be created ...
NAME                           STATUS    ROLES       AGE       VERSION
ip-10-0-147-178.ec2.internal   Ready     worker      1h        v1.11.0+d4cacc0
ip-10-0-147-211.ec2.internal   Ready     worker      1h        v1.11.0+d4cacc0
ip-10-0-158-155.ec2.internal   Ready     worker      1h        v1.11.0+d4cacc0
ip-10-0-18-133.ec2.internal    Ready     master      1h        v1.11.0+d4cacc0
ip-10-0-2-102.ec2.internal     Ready     master      1h        v1.11.0+d4cacc0
ip-10-0-47-156.ec2.internal    Ready     master      1h        v1.11.0+d4cacc0
ip-10-0-6-122.ec2.internal     Ready     bootstrap   1h        v1.11.0+d4cacc0
Waiting for router to be created ...
Another process exited

Sometimes those timeouts are just flakes.

/retest

abhinavdahiya · 2018-10-17T00:19:03Z

testing locally

oc -n openshift-cluster-ingress-router get pods
NAME                   READY     STATUS    RESTARTS   AGE
router-default-pxqbp   1/1       Running   0          1h

i think it is necessary to run the router in openshift-ingress. cc @smarterclayton

smarterclayton · 2018-10-17T03:23:40Z

Yeah, let’s use openshift-ingress as the namespace for the operator: less to type, and "cluster" is redundant.

ironcladlou · 2018-10-17T11:27:51Z

@abhinavdahiya

i think it is necessary to run router in openshift-ingress. cc @smarterclayton

Why?

@smarterclayton

Yeah, let’s use openshift-ingress as the namespace for the operator: less to type, and "cluster" is redundant.

Let's not, because we'll have to make a bunch of changes. Also, you seem to be talking about the operator and @abhinavdahiya is talking about the namespace for routers managed by the operator.

I don't want to change any of it, not right now. These namespaces aren't configurable and everything's tested as-is today.

ironcladlou · 2018-10-17T16:30:26Z

Okay, talked with @smarterclayton and he convinced me it's easier for us to change the operator than to change the other stuff that wants routers in openshift-ingress. Will reference PR shortly.

abhinavdahiya · 2018-10-17T16:30:35Z

@ironcladlou
the origin-e2e cares:
https://github.com/openshift/origin/blob/01b0649c894ffb88a6f1a2d0a41d3abfa3f0b80c/test/extended/util/framework.go#L1429-L1452

the ci cares:
https://github.com/openshift/release/blob/46bde58076e277fe5c9f8afaab9d922ad74b3df8/ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml#L129-L142

ironcladlou · 2018-10-17T17:32:36Z

Operator and router namespaces renamed in openshift/cluster-ingress-operator#52.

ironcladlou · 2018-10-17T17:45:57Z

@abhinavdahiya @smarterclayton @ramr

@ironcladlou
the origin-e2e cares https://github.com/openshift/origin/blob/01b0649c894ffb88a6f1a2d0a41d3abfa3f0b80c/test/extended/util/framework.go#L1429-L1452
the ci cares https://github.com/openshift/release/blob/46bde58076e277fe5c9f8afaab9d922ad74b3df8/ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml#L129-L142

Both of those are going to have to change, as the new default router is actually a daemonset named router-default.

wking · 2018-10-18T14:43:41Z

... can someone explain why we even need this router gate at all?

We want to wait until the cluster is "up" before running the e2e tests. I guess the router is just a good marker for that.

ironcladlou · 2018-10-18T14:45:41Z

We want to wait until the cluster is "up" before running the e2e tests. I guess the router is just a good marker for that.

Wouldn't the apiserver be a better indicator in the general sense? Shouldn't any tests that rely on routes (like... the router tests) be retrying assertions while the router is coming up anyway?

crawford · 2018-10-18T14:57:03Z

The API server comes up "too" quickly. The router was arbitrarily chosen because it is one of the last things to come up. It was a good indication that everything else behaved correctly.

wking · 2018-10-18T14:58:59Z

Shouldn't any tests that rely on routes (like... the router tests) be retrying assertions while the router is coming up anyway?

I think the e2e tests assume they're running on a working cluster, not one that is in the process of launching itself.

deads2k · 2018-10-18T15:01:39Z

The API server comes up "too" quickly. The router was arbitrarily chosen because it is one of the last things to come up. It was a good indication that everything else behaved correctly.

Which API server though? The openshift one? oc get --raw /apis/apps.openshift.io/v1 requires dns, networking, and openshift APIs and aggregation before succeeding.

abhinavdahiya · 2018-10-18T16:47:58Z

/retest

1. Drop openshift-cluster-ingress-operator from the list of cvo-overrides so that it is re-enabled, as the ingress operator has changed to work out of the box. Associated jira ticket: https://jira.coreos.com/browse/NE-88
2. Remove tectonic-ingress assets and configuration, replaced by the github.com/openshift/cluster-ingress-operator code.

abhinavdahiya · 2018-10-19T21:31:16Z

testing locally, the operator pod is crashlooping:

oc -n openshift-ingress-operator logs ingress-operator-b748b79cf-rdbtw
time="2018-10-19T21:29:45Z" level=info msg="Go Version: go1.10.3"
time="2018-10-19T21:29:45Z" level=info msg="Go OS/Arch: linux/amd64"
time="2018-10-19T21:29:45Z" level=info msg="operator-sdk Version: 0.0.6+git"
time="2018-10-19T21:29:46Z" level=info msg="Metrics service ingress-operator created"
time="2018-10-19T21:29:46Z" level=fatal msg="Ensuring default cluster ingress: missing kco-config in configmap"

You should be using the install-config key from that configmap.

ramr · 2018-10-21T05:13:26Z

@abhinavdahiya it was fixed a couple of days back with openshift/cluster-ingress-operator#53... I'll rebase this and fix the conflicts on Monday.

ironcladlou · 2018-10-23T15:07:13Z

/retest

ironcladlou · 2018-10-23T18:01:00Z

I don't think this condition assertion is valid for a DaemonSet, causing the check to hang.

abhinavdahiya · 2018-10-23T18:17:22Z

I don't think this condition assertion is valid for a DaemonSet, causing the check to hang.

cc @smarterclayton

smarterclayton · 2018-10-23T19:43:18Z

What’s the equivalent? Does a daemonset have a condition we can wait for?

The reason we gate is because of installation lag

ironcladlou · 2018-10-23T19:46:43Z

What’s the equivalent? Does a daemonset have a condition we can wait for?
The reason we gate is because of installation lag

Haven't found a way to do it with oc wait; the daemonset exposes no conditions. Guess we could compare status.desiredNumberScheduled to status.numberAvailable from templated output using bash, or something... would that be acceptable? Got a better suggestion?
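The comparison proposed here can be sketched in Go. This assumes the DaemonSet has already been fetched as JSON (for example with oc get ds/router-default -n openshift-ingress -o json); status.desiredNumberScheduled and status.numberAvailable are real apps/v1 DaemonSetStatus fields, while daemonSetAvailable is a hypothetical helper. Treating a DaemonSet with zero desired pods as not ready is a choice made for this sketch, not something the thread specifies.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// daemonSetStatus holds the two apps/v1 DaemonSetStatus fields we compare.
type daemonSetStatus struct {
	DesiredNumberScheduled int32 `json:"desiredNumberScheduled"`
	NumberAvailable        int32 `json:"numberAvailable"`
}

// daemonSetAvailable reports whether every scheduled pod of the DaemonSet is
// available. Zero desired pods counts as "not ready" (an assumption) so that a
// freshly created DaemonSet with nothing scheduled yet does not pass the gate.
func daemonSetAvailable(raw []byte) (bool, error) {
	var ds struct {
		Status daemonSetStatus `json:"status"`
	}
	if err := json.Unmarshal(raw, &ds); err != nil {
		return false, err
	}
	s := ds.Status
	return s.DesiredNumberScheduled > 0 && s.NumberAvailable >= s.DesiredNumberScheduled, nil
}

func main() {
	// Partially rolled out: 2 of 3 router pods available.
	raw := []byte(`{"status":{"desiredNumberScheduled":3,"numberAvailable":2}}`)
	ok, err := daemonSetAvailable(raw)
	fmt.Println(ok, err) // prints "false <nil>"
}
```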

ramr · 2018-10-23T20:14:33Z

Or maybe check whether oc get ds/router-default -n openshift-ingress -o go-template='{{ne "0" (print .status.numberReady)}}' returns true? I couldn't get an int comparison to work.
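To see why the print trick helps, here is a small Go program that runs the same template through text/template (the package behind go-template output) against a JSON-decoded object. It assumes, as kubectl did at the time, that the object is decoded into a generic map, so numbers arrive as float64. Comparing that float64 with an integer literal via gt is rejected by text/template as an incompatible-types comparison, which is likely the int comparison problem mentioned above, whereas print turns the number into a string that ne can compare with "0". The render helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"text/template"
)

// render decodes doc into a generic map (JSON numbers become float64) and
// executes tmplText against it, returning the output or the error.
func render(tmplText, doc string) (string, error) {
	var obj map[string]interface{}
	if err := json.Unmarshal([]byte(doc), &obj); err != nil {
		return "", err
	}
	tmpl, err := template.New("check").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, obj); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	const printCheck = `{{ne "0" (print .status.numberReady)}}` // string comparison
	const intCheck = `{{gt .status.numberReady 0}}`             // float64 vs int literal
	ds := `{"status":{"numberReady":3}}`

	out, err := render(printCheck, ds)
	fmt.Println(out, err) // prints "true <nil>"

	_, err = render(intCheck, ds)
	fmt.Println(err != nil) // prints "true": the comparison is rejected
}
```

Note that this only confirms at least one router pod is ready, not that every scheduled pod is.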

ironcladlou · 2018-10-25T14:41:08Z

/retest

ironcladlou · 2018-10-25T15:22:56Z

@abhinavdahiya @smarterclayton looks like we finally resolved the e2e issue. I tested this manually in AWS and it looks like our operator comes up okay.

abhinavdahiya · 2018-10-25T17:56:06Z

/lgtm

openshift-ci-robot · 2018-10-25T17:56:54Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, ramr

The full list of commands accepted by this bot can be found here.

The pull request process is described here


Needs approval from an approver in each of these files:

~~OWNERS~~ [abhinavdahiya]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

crawford · 2018-10-25T19:41:44Z

Tests passed, but it looks like teardown might be having trouble. :/

ironcladlou · 2018-10-25T19:42:40Z

Noooooooooooo

ironcladlou · 2018-10-25T20:03:05Z

/retest

ironcladlou · 2018-10-25T21:06:57Z

/retest

openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Oct 15, 2018

openshift-ci-robot requested review from smarterclayton and yifan-gu October 15, 2018 21:14

rajatchopra reviewed Oct 15, 2018


ramr force-pushed the ingress-changes branch 2 times, most recently from 58ca076 to 400af19 October 15, 2018 22:39

ramr force-pushed the ingress-changes branch from 400af19 to 905d161 October 16, 2018 19:47

openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 16, 2018

ironcladlou mentioned this pull request Oct 16, 2018

kubernetes.io/cluster/CLUSTER_NAME tag missing from EIPs, NAT GWs, S3 buckets and R53 zones #458

Closed

ironcladlou mentioned this pull request Oct 17, 2018

Rename operator and router namespaces openshift/cluster-ingress-operator#52

Merged

This was referenced Oct 17, 2018

Update router namespace check for daemonset started by the cluster ingress operator openshift/release#1952

Merged

Fix extended tests to check router daemonsets created by the ingress operator openshift/origin#21292

Closed

ramr force-pushed the ingress-changes branch from 905d161 to 13341ca October 17, 2018 22:18

ramr force-pushed the ingress-changes branch from 13341ca to 8f8c941 October 22, 2018 20:59

ironcladlou mentioned this pull request Oct 23, 2018

Fix how we wait on routers - cluster ingress operator uses daemonsets openshift/release#2001

Closed

openshift-ci-robot assigned abhinavdahiya Oct 25, 2018

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 25, 2018

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 25, 2018

openshift-merge-robot merged commit aeb2938 into openshift:master Oct 25, 2018

ramr deleted the ingress-changes branch October 29, 2018 19:50
