Skip to content

Conversation

@sgreene570
Copy link
Contributor

@sgreene570 sgreene570 commented Oct 15, 2020

This PR adds an initial canary controller in pkg/operator/controller/canary/ that manages Ingress canary resources in the openshift-ingress-canary namespace. The canary controller reconciles a daemonset, service, and route to expose the hello-openshift container. Phase 2 of NE-199 will implement the network checks and metrics/status reporting in the canary controller.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 15, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 15, 2020
@sgreene570 sgreene570 force-pushed the NE-199 branch 4 times, most recently from 096df7b to f4f40a8 Compare October 20, 2020 17:39
@sgreene570 sgreene570 changed the title [WIP] NE-199 Phase 1: Add initial Canary Controller and Canary Resources NE-199 Phase 1: Add initial Canary Controller and Canary Resources Oct 20, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 20, 2020
@sgreene570
Copy link
Contributor Author

 unable to parse image registry.build01.ci.openshift.org/ci-op-qzf3i2tn/stable@sha256:ac7e21e580c9b7b53a7c93c407967898e2e7420f4b41c2a350e896d860804983: cannot retrieve image configuration for manifest sha256:ac7e21e580c9b7b53a7c93c407967898e2e7420f4b41c2a350e896d860804983: received unexpected HTTP status: 500 Internal Server Error

Not sure if there's registry issues again, or if those test failures were one offs
/retest

@sgreene570
Copy link
Contributor Author

/retest

@sgreene570
Copy link
Contributor Author

/retest

1 similar comment
@sgreene570
Copy link
Contributor Author

/retest

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 24, 2020
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 26, 2020
@sgreene570
Copy link
Contributor Author

rebased.

@sgreene570
Copy link
Contributor Author

/retest

1 similar comment
@sgreene570
Copy link
Contributor Author

/retest

- containerPort: 8080
protocol: TCP
- containerPort: 8888
protocol: TCP
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might add deployment liveliness / readiness probes in a future PR.

@sgreene570
Copy link
Contributor Author

/test e2e-aws

@sgreene570
Copy link
Contributor Author

level=error msg=Error: Error launching source instance: InvalidParameterValue: Value (ci-op-kc8b4rtg-2945d-cxxvb-bootstrap-profile) for parameter iamInstanceProfile.name is invalid. Invalid IAM Instance Profile name

/retest

verbs:
- list
- watch
- '*'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could define a new role and rolebinding in the openshift-ingress-canary namespace to avoid granting such broad permissions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, but how would the operator be able to take advantage of a new role + role binding in the openshift-ingress-canary namespace? The canary controller (which resides within the ingress operator) needs to be able to perform all (well, most at least, so maybe the "*" isn't necessary) actions on routes.

Copy link
Contributor

@Miciah Miciah Oct 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The role would be in the openshift-ingress-canary namespace and allow full control over routes, and the role binding would bind that role to the operator's service account. Because it would be a role (not a clusterrole) in the openshift-ingress-canary namespace, the role binding should only grant access to routes within that namespace, if I understand correctly. Something like the following should work (but I have not verified that it does):

---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: ingress-operator-canary-controller
  namespace: openshift-ingress-canary
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
rules:
- apiGroups:
  - route.openshift.io
  resources:
  - routes
  verbs:
  - *
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: ingress-operator-canary-controller
  namespace: openshift-ingress-canary
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
subjects:
- kind: ServiceAccount
  name: ingress-operator
  namespace: openshift-ingress-operator
roleRef:
  kind: Role
  apiGroup: rbac.authorization.k8s.io
  name: ingress-operator-canary-controller

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ill include these resources in the manifests in my next push and verify that they work. Thanks!

Copy link
Contributor Author

@sgreene570 sgreene570 Oct 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Miciah the CVO does not (and in my opinion, probably should not) create the openshift-ingress-canary namespace, so the CVO cannot create a Role in the openshift-ingress-canary namespace. Additionally, the operator would need cluster-wide route RBAC permissions to be able to create a role in the openshift-ingress-canary namespace for routes.

So I see 2 solutions:

  1. The CVO creates the openshift-ingress-canary namespace, and then creates the ingress canary controller role.
    (This seems like an awkward thing to do especially if the canary controller isnt a mandatory component of the operator)
  2. We fall back to just giving the operator '*' permissions on routes in it's cluster-role.

Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, ive reverted this PR to use * permissions for routes in the operator's cluster role

Copy link
Contributor Author

@sgreene570 sgreene570 Nov 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~ @Miciah https://kubernetes.io/docs/reference/access-authn-authz/rbac/#privilege-escalation-prevention-and-bootstrapping is the solution we are looking for! I amended this PR so that the ingress operator is granted bind and escalate permissions on roles and rolebindings in it's cluster role. I also added the role.yaml and role-binding.yaml assets for the ingress canary. With these new changes, the ingress operator's route permissions are extended to * for the openshift-ingress-canary namespace only. PTAL when you can. ~

Copy link
Contributor Author

@sgreene570 sgreene570 Nov 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going with route '*' permissions after a discussion with miciah. While these permissions are broad, this is the "least bad" approach. The above mentioned escalate/bind approach introduces far more concerning privilege escalation issues to the operator (despite work around efforts to mitigate this).

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

2 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@sgreene570
Copy link
Contributor Author

/retest

2 similar comments
@sgreene570
Copy link
Contributor Author

/retest

@sgreene570
Copy link
Contributor Author

/retest

@sgreene570
Copy link
Contributor Author

Not sure where the retest bot went.

/retest

@sgreene570 sgreene570 closed this Nov 16, 2020
@sgreene570 sgreene570 reopened this Nov 16, 2020
@sgreene570
Copy link
Contributor Author

/retest

@sgreene570
Copy link
Contributor Author

Whoops, accidentally clicked the "close" button instead of "comment" 😢

@sgreene570
Copy link
Contributor Author

/retest

@sgreene570
Copy link
Contributor Author

/test e2e-aws

@sgreene570
Copy link
Contributor Author

/retest

1 similar comment
@sgreene570
Copy link
Contributor Author

/retest

@sgreene570
Copy link
Contributor Author

/refresh

@sgreene570
Copy link
Contributor Author

/test e2e-aws
/retest

@sgreene570
Copy link
Contributor Author

Not sure why the retest bot is absent still
/refresh
/skip

@openshift-ci-robot
Copy link
Contributor

@sgreene570: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws-operator 2a52a78 link /test e2e-aws-operator
ci/prow/e2e-aws 2a52a78 link /test e2e-aws

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sgreene570
Copy link
Contributor Author

/retest

3 similar comments
@sgreene570
Copy link
Contributor Author

/retest

@sgreene570
Copy link
Contributor Author

/retest

@sgreene570
Copy link
Contributor Author

/retest

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

2 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 058274c into openshift:master Nov 18, 2020
candita pushed a commit to candita/cluster-ingress-operator that referenced this pull request Nov 18, 2020
NE-199 Phase 1: Add initial Canary Controller and Canary Resources
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants