Skip to content

Conversation

@ramr
Copy link
Contributor

@ramr ramr commented Oct 15, 2018

Drop openshift-cluster-ingress-operator from list of cvo-overrides so that
it is re-enabled as the ingress operator has changed to work out of the box.
Associated jira ticket: https://jira.coreos.com/browse/NE-88

And dropping as per comments in #415

@ironcladlou @abhinavdahiya @rajatchopra PTAL Thx - not sure who else to cc.

Edited the whole shebang!

@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Oct 15, 2018
@ironcladlou
Copy link
Contributor

Can we disable tectonic-ingress-controller-operator as part of this?

@abhinavdahiya
Copy link
Contributor

@ramr

  1. the commit message title like is way too long. https://github.com/openshift/installer/blob/master/CONTRIBUTING.md#commit-message-format
  2. You don't have any comments or have not made any changes that suggest why the comment in the override is no longer required or true

Copy link

@rajatchopra rajatchopra Oct 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to drop both the deployment for cluster-dns operator as well? As per comments in #415?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no only the deployment - the service account is for the cluster-dns-operator.

@ramr ramr force-pushed the ingress-changes branch 2 times, most recently from 58ca076 to 400af19 Compare October 15, 2018 22:39
@ramr
Copy link
Contributor Author

ramr commented Oct 15, 2018

@abhinavdahiya done PTAL Thx
btw, the file name is too long for the 70 char subject line limit so truncated it to the directory name and changed the title accordingly ... also git log|show does have a --stat option!

@ironcladlou I know a 3/4 files pkg/asset/manifests/tectonic.go and the actual asset content but am not sure what else is needed. Still checking thru' the code/repo. Thx

@ramr
Copy link
Contributor Author

ramr commented Oct 15, 2018

@abhinavdahiya thx for the info.
Did notice that ./pkg/asset/manifests/content/tectonic/tectonic-system-01-ca-cert.go depends on the IngressCaCert? So am assuming I'd need to keep whatever code that sets IngressCaCert.
Unless you think the system-ca-cert code is also a candidate to be removed?

And there's some cleanup to un{used,referenced} fields in tectonicTemplateData in pkg/asset/manifests/template.go.

@abhinavdahiya
Copy link
Contributor

lets remove ./pkg/asset/manifests/content/tectonic/tectonic-system-01-ca-cert.go too :)

And there's some cleanup to un{used,referenced} fields in tectonicTemplateData in pkg/asset/manifests/template.go

Yeah that would be nice too.

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 16, 2018
@ramr
Copy link
Contributor Author

ramr commented Oct 16, 2018

@abhinavdahiya done and squashed the commits. PTAL Thx

@abhinavdahiya
Copy link
Contributor

The changes look good. lets see if the ci passes. will /lgtm as soon as that happens.

@wking
Copy link
Member

wking commented Oct 16, 2018

e2e-aws:

Waiting for router to be created ...
NAME                           STATUS    ROLES       AGE       VERSION
ip-10-0-147-178.ec2.internal   Ready     worker      1h        v1.11.0+d4cacc0
ip-10-0-147-211.ec2.internal   Ready     worker      1h        v1.11.0+d4cacc0
ip-10-0-158-155.ec2.internal   Ready     worker      1h        v1.11.0+d4cacc0
ip-10-0-18-133.ec2.internal    Ready     master      1h        v1.11.0+d4cacc0
ip-10-0-2-102.ec2.internal     Ready     master      1h        v1.11.0+d4cacc0
ip-10-0-47-156.ec2.internal    Ready     master      1h        v1.11.0+d4cacc0
ip-10-0-6-122.ec2.internal     Ready     bootstrap   1h        v1.11.0+d4cacc0
Waiting for router to be created ...
Another process exited

Sometimes those timeouts are just flakes.

/retest

@abhinavdahiya
Copy link
Contributor

testing locally

oc -n openshift-cluster-ingress-router get pods
NAME                   READY     STATUS    RESTARTS   AGE
router-default-pxqbp   1/1       Running   0          1h

i think it is necessary to run router in openshift-ingress. cc @smarterclayton

@smarterclayton
Copy link
Contributor

Yeah, let’s us openshift-ingress as the namespace for the operator , less to type and cluster is redundant

@ironcladlou
Copy link
Contributor

@abhinavdahiya

i think it is necessary to run router in openshift-ingress. cc @smarterclayton

Why?

@smarterclayton

Yeah, let’s us openshift-ingress as the namespace for the operator , less to type and cluster is redundant

Let's not, because we'll have to make a bunch of changes. Also, you seem to be talking about the operator and @abhinavdahiya is talking about the namespace for routers managed by the operator.

I don't want to change any of it, not right now. These namespaces aren't configurable and everything's tested as-is today.

@ironcladlou
Copy link
Contributor

Okay, talked with @smarterclayton and he convinced me it's easier for us to change the operator than to change the other stuff that wants routers in openshift-ingress. Will reference PR shortly.

@ironcladlou
Copy link
Contributor

Operator and router namespaces renamed in openshift/cluster-ingress-operator#52.

@wking
Copy link
Member

wking commented Oct 18, 2018

... can someone explain why we even need this router gate at all?

We want to wait until the cluster is "up" before running the e2e tests. I guess the router is just a good marker for that.

@ironcladlou
Copy link
Contributor

ironcladlou commented Oct 18, 2018

We want to wait until the cluster is "up" before running the e2e tests. I guess the router is just a good marker for that.

Wouldn't the apiserver be a better indicator in the general sense? Shouldn't any tests that rely on routes (like... the router tests) be retrying assertions while the router is coming up anyway?

@crawford
Copy link
Contributor

The API server comes up "too" quickly. The router was arbitrarily chosen because it is one of the last things to come up. It was a good indication that everything else behaved correctly.

@wking
Copy link
Member

wking commented Oct 18, 2018

Shouldn't any tests that rely on routes (like... the router tests) be retrying assertions while the router is coming up anyway?

I think the e2e tests assume they're running on a working cluster, not one that is in the process of launching itself.

@deads2k
Copy link
Contributor

deads2k commented Oct 18, 2018

The API server comes up "too" quickly. The router was arbitrarily chosen because it is one of the last things to come up. It was a good indication that everything else behaved correctly.

Which API server though? The openshift one? oc get --raw /apis/apps.openshift.io/v1 requires dns, networking, and openshift APIs and aggregation before succeeding.

@abhinavdahiya
Copy link
Contributor

/retest

1. Drop openshift-cluster-ingress-operator from list of cvo-overrides
   so that it is re-enabled as the ingress operator has changed to
   work out of the box.
   Associated jira ticket: https://jira.coreos.com/browse/NE-88
2. Remove techtonic-ingress assets and configuration - replaced by
   the github.com/openshift/cluster-ingress-operator code.
@abhinavdahiya
Copy link
Contributor

abhinavdahiya commented Oct 19, 2018

testing locally, the operator pod is crashlooping:

oc -n openshift-ingress-operator logs ingress-operator-b748b79cf-rdbtw
time="2018-10-19T21:29:45Z" level=info msg="Go Version: go1.10.3"
time="2018-10-19T21:29:45Z" level=info msg="Go OS/Arch: linux/amd64"
time="2018-10-19T21:29:45Z" level=info msg="operator-sdk Version: 0.0.6+git"
time="2018-10-19T21:29:46Z" level=info msg="Metrics service ingress-operator created"
time="2018-10-19T21:29:46Z" level=fatal msg="Ensuring default cluster ingress: missing kco-config in configmap"

You should be using the install-config key from that configmap.

@ramr
Copy link
Contributor Author

ramr commented Oct 21, 2018

@abhinavdahiya it was fixed a couple of days back with openshift/cluster-ingress-operator#53 ... i'll rebase this and fix the conflicts on monday.

@ironcladlou
Copy link
Contributor

/retest

@ironcladlou
Copy link
Contributor

I don't think this condition assertion is valid for a DaemonSet, causing the check to hang.

@abhinavdahiya
Copy link
Contributor

I don't think this condition assertion is valid for a DaemonSet, causing the check to hang.

cc @smarterclayton

@smarterclayton
Copy link
Contributor

What’s the equivalent? Does a dameonset have a condition we can wait for?

The reason we gate is because of installation lag

@ironcladlou
Copy link
Contributor

What’s the equivalent? Does a dameonset have a condition we can wait for?
The reason we gate is because of installation lag

Haven't found a way to do it with oc wait; the daemonset exposes no conditions. Guess we could compare status.desiredNumberScheduled to status.numberAvailable from templated output using bash, or something... would that be acceptable? Got a better suggestion?

@ramr
Copy link
Contributor Author

ramr commented Oct 23, 2018

Or if oc get ds/router-default -n openshift-ingress -o go-template='{{ne "0" (print .status.numberReady)}}' returns true, maybe? I couldn't do an int comparison.

@ironcladlou
Copy link
Contributor

/retest

@ironcladlou
Copy link
Contributor

@abhinavdahiya @smarterclayton looks like we finally resolved the e2e issue. I tested this manually in AWS and it looks like our operator comes up okay.

@abhinavdahiya
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 25, 2018
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, ramr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 25, 2018
@crawford
Copy link
Contributor

Tests passed, but it looks like teardown might be having trouble. :/

@ironcladlou
Copy link
Contributor

Noooooooooooo

@ironcladlou
Copy link
Contributor

/retest

1 similar comment
@ironcladlou
Copy link
Contributor

/retest

@openshift-merge-robot openshift-merge-robot merged commit aeb2938 into openshift:master Oct 25, 2018
@ramr ramr deleted the ingress-changes branch October 29, 2018 19:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants