Skip to content

Conversation

@deads2k
Copy link
Contributor

@deads2k deads2k commented Mar 29, 2023

The kubelet consistently resolves the name. This change allows the CVO to use the kubelet's DNS configuration.

The kubelet consistently resolves the name.  This change allows the CVO
to use the kubelet's DNS configuration.
dnsPolicy: ClusterFirstWithHostNet
# this pod is hostNetwork and uses the internal LB DNS name when possible, which the kubelet also uses.
# this dnsPolicy allows us to use the same dnsConfig as the kubelet, without access to read it ourselves.
dnsPolicy: Default
Copy link
Contributor

@danwinship danwinship Mar 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: This doesn't cause it to use the kubelet's DNS config, it causes it to use 127.0.0.1:53 as its DNS resolver (regardless of what kubelet does). This only works if we run a recursive DNS resolver in the host network ns on nodes, which I guess we must...

Just leaving dnsPolicy unset would do the same thing, for a hostNetwork pod.
The naming of the dnsPolicy options is completely broken and led the author of this yaml file astray...

@deads2k
Copy link
Contributor Author

deads2k commented Mar 29, 2023

/hold

will cause

Address: "https://thanos-querier.openshift-monitoring.svc.cluster.local:9091",
to fail

also, that appears to be missing an e2e test.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 29, 2023
@deads2k
Copy link
Contributor Author

deads2k commented Mar 31, 2023

now allowing access to internal promql

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 31, 2023
deads2k added 2 commits March 31, 2023 14:19
Globals cause problems for unit tests (demonstrated in some of the unit
tests) and prevent general reuse.  The anti-pattern is avoided in
kuberentes code and should be OCP as well.  This updates to providing a
configured condition registry without init blocks and provides a
standard clustercondition registry for each consumption.
We do this so that our host-network pod can use the node's resolv.conf to resolve the internal load balancer name
on the pod before DNS pods are available and before the service network is available.  The side effect is that
the CVO cannot resolve service DNS names.
Copy link
Member

@wking wking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 31, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 31, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 31, 2023
@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 7f63e64 and 2 for PR HEAD 5e03367 in total

@deads2k
Copy link
Contributor Author

deads2k commented Apr 1, 2023

/test ci/prow/e2e-agnostic-ovn-upgrade-out-of-change

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 1, 2023

@deads2k: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test e2e-agnostic-operator
  • /test e2e-agnostic-ovn
  • /test e2e-agnostic-ovn-upgrade-into-change
  • /test e2e-agnostic-ovn-upgrade-out-of-change
  • /test gofmt
  • /test images
  • /test lint
  • /test unit

Use /test all to run all jobs.

Details

In response to this:

/test ci/prow/e2e-agnostic-ovn-upgrade-out-of-change

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@deads2k
Copy link
Contributor Author

deads2k commented Apr 1, 2023

/retest

@deads2k
Copy link
Contributor Author

deads2k commented Apr 2, 2023

/hold

upgrade out of change is failing consistently. It's upgrading just fine, but at the start of the upgrade there are dial failures to routes. I don't know how this change causes that, but it apparently does. Even stranger, the installation itself is working (see the agnostic ovn job)

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 2, 2023
@deads2k
Copy link
Contributor Author

deads2k commented Apr 2, 2023

/retest

@deads2k
Copy link
Contributor Author

deads2k commented Apr 2, 2023

actually this is a general azure regression. I've scheduled a slack message for TRT in the morning and left details in the thread.

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 2, 2023
@deads2k
Copy link
Contributor Author

deads2k commented Apr 4, 2023

/retest

Copy link
Member

@petr-muller petr-muller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big fan of getting rid of the global registry 🤮 . I am not sure whether the name resolving commit is an ugly hack or not: it looks to me that way but /shrug.


// NewClient creates a new Cincinnati client with the given client identifier.
func NewClient(id uuid.UUID, transport *http.Transport, userAgent string) Client {
func NewClient(id uuid.UUID, transport *http.Transport, userAgent string, conditionRegistry clusterconditions.ConditionRegistry) Client {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: this is a really good example why the global sucked. I would never expect an osus client code to depend on the condition registry data, this way it is at least explicit (although it's still fishy and seems wrong to me)

"reflect"
"testing"

"github.com/openshift/cluster-version-operator/pkg/clusterconditions/standard"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we come up with some better names here? Both clusterconditions and clusterconditions/standard have a ConditionRegistry type and NewConditionRegistry constructor, and at callsites they will look like clusterconditions.NewConditionRegistry and standard.NewConditionRegistry and it is not clear how they are different and when to use which.

defer ts.Close()

c := NewClient(clientID, nil, "")
c := NewClient(clientID, nil, "", standard.NewConditionRegistry(nil))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd prefer if the registry had a WithClient variant or a fluent method but I guess it's just a cosmetic thing


for _, config := range matchingRules {
condition, ok := Registry[config.Type]
condition, ok := r.registry[config.Type]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok this is a code I dislike (not your fault, another artifact of the global), the actual registry is needed here only to validate types, and the remaining code IMO does not belong to a registry. I believe this code should live elsewhere, and the registry should only provide a IsValid(type) method or maybe a method that returns a validator func to be used here (probably the latter because the registration can allow dynamic behavior, even if it is not dynamically changed right now)

I don't think this needs to be addressed in this PR, I'll probably want to clean this up as a followup


for _, config := range matchingRules {
condition, ok := Registry[config.Type]
condition, ok := r.registry[config.Type]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

})
}

delete(clusterconditions.Registry, "Error")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Comment on lines +64 to +67
svc, err := p.kubeClient.CoreV1().Services("openshift-monitoring").Get(ctx, "thanos-querier", metav1.GetOptions{})
if err != nil {
return "", err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any expectations about how reliable this is? Rock-solid, flakey, no idea? I'm just curious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any expectations about how reliable this is? Rock-solid, flakey, no idea? I'm just curious.

very reliable. It's what drives DNS

@deads2k
Copy link
Contributor Author

deads2k commented Apr 4, 2023

I'm going for it. It's https://issues.redhat.com/browse/OCPBUGS-11308

/override ci/prow/e2e-agnostic-ovn-upgrade-out-of-change

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 4, 2023

@deads2k: Overrode contexts on behalf of deads2k: ci/prow/e2e-agnostic-ovn-upgrade-out-of-change

Details

In response to this:

I'm going for it. It's https://issues.redhat.com/browse/OCPBUGS-11308

/override ci/prow/e2e-agnostic-ovn-upgrade-out-of-change

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot merged commit 73ee5e5 into openshift:master Apr 4, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 4, 2023

@deads2k: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@petr-muller
Copy link
Member

/shrug
/meow

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 5, 2023

@petr-muller: cat image

Details

In response to this:

/shrug
/meow

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the ¯\_(ツ)_/¯ ¯\\\_(ツ)_/¯ label Apr 5, 2023
@petr-muller
Copy link
Member

/retitle OCPBUGS-3783: Update dnsPolicy to allow consistent resolution of the internal LB

@openshift-ci openshift-ci bot changed the title Update dnsPolicy to allow consistent resolution of the internal LB OCPBUGS-3783: Update dnsPolicy to allow consistent resolution of the internal LB Apr 5, 2023
@openshift-ci-robot
Copy link
Contributor

@deads2k: Jira Issue OCPBUGS-3783: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-3783 has been moved to the MODIFIED state.

Details

In response to this:

The kubelet consistently resolves the name. This change allows the CVO to use the kubelet's DNS configuration.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@petr-muller
Copy link
Member

/cherry-pick release-4.13

@openshift-cherrypick-robot

@petr-muller: new pull request created: #923

Details

In response to this:

/cherry-pick release-4.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

DavidHurta added a commit to DavidHurta/cluster-version-operator that referenced this pull request Jul 10, 2023
This commit will introduce new flags and logic to the CVO regarding its
PromQL target for risk evaluation of conditional updates with the CVO
being run in a hosted control plane in mind.

For the CVO to successfully access a service that provides metrics
(in the case of the CVO in a standalone OpenShift cluster it's the
thanos-querier service), it needs three things. It needs the service's
address, a CA bundle to verify the certificate provided by the service
to allow secure communication using TLS between the actors [1], and the
authorization credentials of the CVO. Currently, the CVO hardcodes the
address, the path to the CA bundle, and the path to the credentials file.

This is not ideal, as CVO is starting to be used in other repositories
such as HyperShift [2].

A path to the service CA bundle file is added to allow explicitly
setting the CA bundle file of the given query service.

Currently, the CVO is using a kube-api server to resolve the IP address
of a specific service in the cluster [4]. New flags were added to
specify the service name and its namespace as the PromQL target may
reside in different namespaces or have a different name.

In HyperShift multiple kube-api servers are being used. A management
cluster creates a hosted control plane where a hosted CVO resides. This
hosted CVO is then given a kubeconfig file for a cluster where the
hosted CVO reconciles manifests. However, because the CVO cannot resolve
internal DNS names and, in this case, is given a kubeconfig for a remote
cluster, the CVO is not capable to communicate with internal services on
the management cluster. Thus a new flag is added to specify the path to
the management cluster kubeconfig file. In case the management cluster
kubeconfig is not specified, the CVO defaults to using the
`serviceaccount` directory as is the behavior in the standalone OCP
cluster.

A flag to specify the path to the credentials file was not added because
the CVO hardcodes the same path to the credentials file as is the
path to the credentials file used in other service accounts in
Kubernetes [3].

    [1] https://docs.openshift.com/container-platform/4.12/security/certificate_types_descriptions/service-ca-certificates.html
    [2] https://github.com/openshift/hypershift
    [3] https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#serviceaccount-admission-controller
    [4] openshift#920

This commit also adds a new flag to specify whether the CVO resides in
a hosted control plane. In this case, the CVO will inject its cluster
Id into PromQL queries to differentiate between multiple time series
belonging to different hosted clusters [5].

[5] openshift/cincinnati-graph-data#3591
DavidHurta added a commit to DavidHurta/cluster-version-operator that referenced this pull request Jul 11, 2023
This commit will introduce new flags and logic to the CVO regarding its
PromQL target for risk evaluation of conditional updates with the CVO
being run in a hosted control plane in mind.

For the CVO to successfully access a service that provides metrics
(in the case of the CVO in a standalone OpenShift cluster it's the
thanos-querier service), it needs three things. It needs the service's
address, a CA bundle to verify the certificate provided by the service
to allow secure communication using TLS between the actors [1], and the
authorization credentials of the CVO. Currently, the CVO hardcodes the
address, the path to the CA bundle, and the path to the credentials file.

This is not ideal, as CVO is starting to be used in other repositories
such as HyperShift [2].

A path to the service CA bundle file is added to allow explicitly
setting the CA bundle file of the given query service.

Currently, the CVO is using a kube-api server to resolve the IP address
of a specific service in the cluster [4]. New flags were added to
specify the service name and its namespace as the PromQL target may
reside in different namespaces or have a different name.

In HyperShift multiple kube-api servers are being used. A management
cluster creates a hosted control plane where a hosted CVO resides. This
hosted CVO is then given a kubeconfig file for a cluster where the
hosted CVO reconciles manifests. However, because the CVO cannot resolve
internal DNS names and, in this case, is given a kubeconfig for a remote
cluster, the CVO is not capable to communicate with internal services on
the management cluster. Thus a new flag is added to specify the path to
the management cluster kubeconfig file. In case the management cluster
kubeconfig is not specified, the CVO defaults to using the
`serviceaccount` directory as is the behavior in the standalone OCP
cluster.

A flag to specify the path to the credentials file was not added because
the CVO hardcodes the same path to the credentials file as is the
path to the credentials file used in other service accounts in
Kubernetes [3].

    [1] https://docs.openshift.com/container-platform/4.12/security/certificate_types_descriptions/service-ca-certificates.html
    [2] https://github.com/openshift/hypershift
    [3] https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#serviceaccount-admission-controller
    [4] openshift#920

This commit also adds a new flag to specify whether the CVO resides in
a hosted control plane. In this case, the CVO will inject its cluster
Id into PromQL queries to differentiate between multiple time series
belonging to different hosted clusters [5].

[5] openshift/cincinnati-graph-data#3591
DavidHurta added a commit to DavidHurta/hypershift that referenced this pull request Jul 11, 2023
This commit will enable the Cluster Version Operator (CVO) to evaluate
conditional updates.

For the CVO to evaluate conditional updates, it accesses a PromQL query
service holding the appropriate information. In the standalone
OpenShift, it is the thanos-querier service in the openshift-monitoring
namespace.

We are granting the CVO the `get` permission on namespaces resources
because to access Thanos Querier service APIs, the requesting account
must have `get` permission on the namespaces resource [1].

The CVO lost the ability to resolve service DNS names [2]. The CVO is
now using the kube-api server to get a specific service, and the CVO
then uses the IP address specified in the resource. This commit will
give the CVO permission to `get` the service resources.
This will enable the CVO to contact the appropriate query service and to
contact any potential services in the future.

This commit also adds more volumes to the CVO, such as the service CA
bundle and the /var/run/secrets/kubernetes.io/serviceaccount directory
for the CVO to be able to communicate with the query service.
The volumes were added similarly as to how are they set in the OpenShift
manifest file [3].

The hosted control planes scraped data can be either queried by
accessing the thanos-querier service or by accessing the
hypershift-monitoring-stack-prometheus service in the case the
ObO (Observability Operator) is responsible for scraping metrics [4].
The CPO (Control Plane Operator) will pass this information to the CVO,
in the case RHOBS monitoring is enabled.

[1] https://docs.openshift.com/container-platform/4.13/monitoring/accessing-third-party-monitoring-apis.html
[2] openshift/cluster-version-operator#920
[3] https://github.com/openshift/cluster-version-operator/blob/216683cfc09eebefd87bed66849d52b6a357c5ab/install/0000_00_cluster-version-operator_03_deployment.yaml
[4] https://gitlab.cee.redhat.com/service/rhobs-rules-and-dashboards/-/blob/main/docs/rules/hypershift-monitoring-architecture-introduction.md#the-big-picture
DavidHurta added a commit to DavidHurta/cluster-version-operator that referenced this pull request Jul 12, 2023
This commit will introduce new flags and logic to the CVO regarding its
PromQL target for risk evaluation of conditional updates with the CVO
being run in a hosted control plane in mind.

For the CVO to successfully access a service that provides metrics
(in the case of the CVO in a standalone OpenShift cluster it's the
thanos-querier service), it needs three things. It needs the service's
address, a CA bundle to verify the certificate provided by the service
to allow secure communication using TLS between the actors [1], and the
authorization credentials of the CVO. Currently, the CVO hardcodes the
address, the path to the CA bundle, and the path to the credentials file.

This is not ideal, as CVO is starting to be used in other repositories
such as HyperShift [2].

A path to the service CA bundle file is added to allow explicitly
setting the CA bundle file of the given query service.

Currently, the CVO is using a kube-api server to resolve the IP address
of a specific service in the cluster [4]. New flags were added to
specify the service name and its namespace as the PromQL target may
reside in different namespaces or have a different name. The port of the
service can also be different as in the case of the managed HyperShift
monitoring. This commit will make sure the port number is not hard-coded
but is rather dynamically found by the port's name.

In HyperShift multiple kube-api servers are being used. A management
cluster creates a hosted control plane where a hosted CVO resides. This
hosted CVO is then given a kubeconfig file for a cluster where the
hosted CVO reconciles manifests. However, because the CVO cannot resolve
internal DNS names and, in this case, is given a kubeconfig for a remote
cluster, the CVO is not capable to communicate with internal services on
the management cluster. Thus a new flag is added to specify the path to
the management cluster kubeconfig file. In case the management cluster
kubeconfig is not specified, the CVO defaults to using the
`serviceaccount` directory as is the behavior in the standalone OCP
cluster.

A flag to specify the path to the credentials file was not added because
the CVO hardcodes the same path to the credentials file as is the
path to the credentials file used in other service accounts in
Kubernetes [3].

    [1] https://docs.openshift.com/container-platform/4.12/security/certificate_types_descriptions/service-ca-certificates.html
    [2] https://github.com/openshift/hypershift
    [3] https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#serviceaccount-admission-controller
    [4] openshift#920

This commit also adds a new flag to specify whether the CVO resides in
a hosted control plane. In this case, the CVO will inject its cluster
Id into PromQL queries to differentiate between multiple time series
belonging to different hosted clusters [5].

[5] openshift/cincinnati-graph-data#3591
DavidHurta added a commit to DavidHurta/hypershift that referenced this pull request Jul 19, 2023
This commit will enable the Cluster Version Operator (CVO) to evaluate
conditional updates.

For the CVO to evaluate conditional updates, it accesses a PromQL query
service holding the appropriate information. In the standalone
OpenShift, it is the thanos-querier service in the openshift-monitoring
namespace.

We are granting the CVO the `get` permission on namespaces resources
because to access Thanos Querier service APIs, the requesting account
must have `get` permission on the namespaces resource [1].

The CVO lost the ability to resolve service DNS names [2]. The CVO is
now using the kube-api server to get a specific service, and the CVO
then uses the IP address specified in the resource. This commit will
give the CVO permission to `get` the service resources.
This will enable the CVO to contact the appropriate query service and to
contact any potential services in the future.

This commit also adds more volumes to the CVO, such as the service CA
bundle and the /var/run/secrets/kubernetes.io/serviceaccount directory
for the CVO to be able to communicate with the query service.
The volumes were added similarly as to how are they set in the OpenShift
manifest file [3].

The hosted control planes scraped data can be either queried by
accessing the thanos-querier service or by accessing the
hypershift-monitoring-stack-prometheus service in the case the
ObO (Observability Operator) is responsible for scraping metrics [4].
The CPO (Control Plane Operator) will pass this information to the CVO,
in the case RHOBS monitoring is enabled.

[1] https://docs.openshift.com/container-platform/4.13/monitoring/accessing-third-party-monitoring-apis.html
[2] openshift/cluster-version-operator#920
[3] https://github.com/openshift/cluster-version-operator/blob/216683cfc09eebefd87bed66849d52b6a357c5ab/install/0000_00_cluster-version-operator_03_deployment.yaml
[4] https://gitlab.cee.redhat.com/service/rhobs-rules-and-dashboards/-/blob/main/docs/rules/hypershift-monitoring-architecture-introduction.md#the-big-picture
DavidHurta added a commit to DavidHurta/cluster-version-operator that referenced this pull request Aug 1, 2023
This commit will introduce new flags and logic to the CVO regarding its
PromQL target for risk evaluation of conditional updates with the CVO
being run in a hosted control plane in mind.

For the CVO to successfully access a service that provides metrics
(in the case of the CVO in a standalone OpenShift cluster it's the
thanos-querier service), it needs three things. It needs the service's
address, a CA bundle to verify the certificate provided by the service
to allow secure communication using TLS between the actors [1], and the
authorization credentials of the CVO. Currently, the CVO hardcodes the
address, the path to the CA bundle, and the path to the credentials file.

This is not ideal, as CVO is starting to be used in other repositories
such as HyperShift [2].

A path to the service CA bundle file is added to allow explicitly
setting the CA bundle file of the given query service.

Currently, the CVO is using a kube-api server to resolve the IP address
of a specific service in the cluster [4]. New flags were added to
specify the service name and its namespace as the PromQL target may
reside in different namespaces or have a different name. The port of the
service can also be different as in the case of the managed HyperShift
monitoring. This commit will make sure the port number is not hard-coded
but is rather dynamically found by the port's name.

In HyperShift multiple kube-api servers are being used. A management
cluster creates a hosted control plane where a hosted CVO resides. This
hosted CVO is then given a kubeconfig file for a cluster where the
hosted CVO reconciles manifests. However, because the CVO cannot resolve
internal DNS names and, in this case, is given a kubeconfig for a remote
cluster, the CVO is not capable to communicate with internal services on
the management cluster. Thus a new flag is added to specify the path to
the management cluster kubeconfig file. In case the management cluster
kubeconfig is not specified, the CVO defaults to using the
`serviceaccount` directory as is the behavior in the standalone OCP
cluster.

A flag to specify the path to the credentials file was not added because
the CVO hardcodes the same path to the credentials file as is the
path to the credentials file used in other service accounts in
Kubernetes [3].

    [1] https://docs.openshift.com/container-platform/4.12/security/certificate_types_descriptions/service-ca-certificates.html
    [2] https://github.com/openshift/hypershift
    [3] https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#serviceaccount-admission-controller
    [4] openshift#920

This commit also adds a new flag to specify whether the CVO resides in
a hosted control plane. In this case, the CVO will inject its cluster
Id into PromQL queries to differentiate between multiple time series
belonging to different hosted clusters [5].

[5] openshift/cincinnati-graph-data#3591
DavidHurta added a commit to DavidHurta/cluster-version-operator that referenced this pull request Aug 1, 2023
This commit will introduce new flags and logic to the CVO regarding its
PromQL target for risk evaluation of conditional updates with the CVO
being run in a hosted control plane in mind.

For the CVO to successfully access a service that provides metrics
(in the case of the CVO in a standalone OpenShift cluster it's the
thanos-querier service), it needs three things. It needs the service's
address, a CA bundle to verify the certificate provided by the service
to allow secure communication using TLS between the actors [1], and the
authorization credentials of the CVO. Currently, the CVO hardcodes the
address, the path to the CA bundle, and the path to the credentials file.

This is not ideal, as CVO is starting to be used in other repositories
such as HyperShift [2].

A path to the service CA bundle file is added to allow explicitly
setting the CA bundle file of the given query service.

Currently, the CVO is using a kube-api server to resolve the IP address
of a specific service in the cluster [4]. New flags were added to
specify the service name and its namespace as the PromQL target may
reside in different namespaces or have a different name. The port of the
service can also be different as in the case of the managed HyperShift
monitoring. This commit will make sure the port number is not hard-coded
but is rather dynamically found by the port's name.

In HyperShift multiple kube-api servers are being used. A management
cluster creates a hosted control plane where a hosted CVO resides. This
hosted CVO is then given a kubeconfig file for a cluster where the
hosted CVO reconciles manifests. However, because the CVO cannot resolve
internal DNS names and, in this case, is given a kubeconfig for a remote
cluster, the CVO is not capable to communicate with internal services on
the management cluster. Thus a new flag is added to specify the path to
the management cluster kubeconfig file. In case the management cluster
kubeconfig is not specified, the CVO defaults to using the
`serviceaccount` directory as is the behavior in the standalone OCP
cluster.

A flag to specify the path to the credentials file was not added because
the CVO hardcodes the same path to the credentials file as is the
path to the credentials file used in other service accounts in
Kubernetes [3].

    [1] https://docs.openshift.com/container-platform/4.12/security/certificate_types_descriptions/service-ca-certificates.html
    [2] https://github.com/openshift/hypershift
    [3] https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#serviceaccount-admission-controller
    [4] openshift#920

This commit also adds a new flag to specify whether the CVO resides in
a hosted control plane. In this case, the CVO will inject its cluster
Id into PromQL queries to differentiate between multiple time series
belonging to different hosted clusters [5].

[5] openshift/cincinnati-graph-data#3591
DavidHurta added a commit to DavidHurta/cluster-version-operator that referenced this pull request Aug 2, 2023
This commit will introduce new flags and logic to the CVO regarding its
PromQL target for risk evaluation of conditional updates with the CVO
being run in a hosted control plane in mind.

For the CVO to successfully access a service that provides metrics
(in the case of the CVO in a standalone OpenShift cluster it's the
thanos-querier service), it needs three things. It needs the service's
address, a CA bundle to verify the certificate provided by the service
to allow secure communication using TLS between the actors [1], and the
authorization credentials of the CVO. Currently, the CVO hardcodes the
address, the path to the CA bundle, and the path to the credentials file.

This is not ideal, as CVO is starting to be used in other repositories
such as HyperShift [2].

A path to the service CA bundle file is added to allow explicitly
setting the CA bundle file of the given query service.

Currently, the CVO is using a kube-api server to resolve the IP address
of a specific service in the cluster [4]. New flags were added to
specify the service name and its namespace as the PromQL target may
reside in different namespaces or have a different name. The port of the
service can also be different as in the case of the managed HyperShift
monitoring. This commit will make sure the port number is not hard-coded
but is rather dynamically found by the port's name.

In HyperShift multiple kube-api servers are being used. A management
cluster creates a hosted control plane where a hosted CVO resides. This
hosted CVO is then given a kubeconfig file for a cluster where the
hosted CVO reconciles manifests. However, because the CVO cannot resolve
internal DNS names and, in this case, is given a kubeconfig for a remote
cluster, the CVO is not capable to communicate with internal services on
the management cluster. Thus a new flag is added to specify the path to
the management cluster kubeconfig file. In case the management cluster
kubeconfig is not specified, the CVO defaults to using the
`serviceaccount` directory as is the behavior in the standalone OCP
cluster.

A flag to specify the path to the credentials file was not added because
the CVO hardcodes the same path to the credentials file as is the
path to the credentials file used in other service accounts in
Kubernetes [3].

    [1] https://docs.openshift.com/container-platform/4.12/security/certificate_types_descriptions/service-ca-certificates.html
    [2] https://github.com/openshift/hypershift
    [3] https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#serviceaccount-admission-controller
    [4] openshift#920

This commit also adds a new flag to specify whether the CVO resides in
a hosted control plane. In this case, the CVO will inject its cluster
Id into PromQL queries to differentiate between multiple time series
belonging to different hosted clusters [5].

[5] openshift/cincinnati-graph-data#3591
DavidHurta added a commit to DavidHurta/cluster-version-operator that referenced this pull request Aug 2, 2023
This commit will introduce new flags and logic to the CVO regarding its
PromQL target for risk evaluation of conditional updates with the CVO
being run in a hosted control plane in mind.

For the CVO to successfully access a service that provides metrics
(in the case of the CVO in a standalone OpenShift cluster it's the
thanos-querier service), it needs three things. It needs the service's
address, a CA bundle to verify the certificate provided by the service
to allow secure communication using TLS between the actors [1], and the
authorization credentials of the CVO. Currently, the CVO hardcodes the
address, the path to the CA bundle, and the path to the credentials file.

This is not ideal, as CVO is starting to be used in other repositories
such as HyperShift [2].

A path to the service CA bundle file is added to allow explicitly
setting the CA bundle file of the given query service.

Currently, the CVO is using a kube-api server to resolve the IP address
of a specific service in the cluster [4]. New flags were added to
specify the service name and its namespace as the PromQL target may
reside in different namespaces or have a different name. The port of the
service can also be different as in the case of the managed HyperShift
monitoring. This commit will make sure the port number is not hard-coded
but is rather dynamically found by the port's name.

In HyperShift multiple kube-api servers are being used. A management
cluster creates a hosted control plane where a hosted CVO resides. This
hosted CVO is then given a kubeconfig file for a cluster where the
hosted CVO reconciles manifests. However, because the CVO cannot resolve
internal DNS names and, in this case, is given a kubeconfig for a remote
cluster, the CVO is not capable to communicate with internal services on
the management cluster. Thus a new flag is added to specify the path to
the management cluster kubeconfig file. In case the management cluster
kubeconfig is not specified, the CVO defaults to using the
`serviceaccount` directory as is the behavior in the standalone OCP
cluster.

A flag to specify the path to the credentials file was not added because
the CVO hardcodes the same path to the credentials file as is the
path to the credentials file used in other service accounts in
Kubernetes [3].

    [1] https://docs.openshift.com/container-platform/4.12/security/certificate_types_descriptions/service-ca-certificates.html
    [2] https://github.com/openshift/hypershift
    [3] https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#serviceaccount-admission-controller
    [4] openshift#920

This commit also adds a new flag to specify whether the CVO resides in
a hosted control plane. In this case, the CVO will inject its cluster
Id into PromQL queries to differentiate between multiple time series
belonging to different hosted clusters [5].

[5] openshift/cincinnati-graph-data#3591
DavidHurta added a commit to DavidHurta/cluster-version-operator that referenced this pull request Aug 2, 2023
This commit will introduce new flags and logic to the CVO regarding its
PromQL target for risk evaluation of conditional updates with the CVO
being run in a hosted control plane in mind.

For the CVO to successfully access a service that provides metrics
(in the case of the CVO in a standalone OpenShift cluster it's the
thanos-querier service), it needs three things. It needs the service's
address, a CA bundle to verify the certificate provided by the service
to allow secure communication using TLS between the actors [1], and the
authorization credentials of the CVO. Currently, the CVO hardcodes the
address, the path to the CA bundle, and the path to the credentials file.

This is not ideal, as CVO is starting to be used in other repositories
such as HyperShift [2].

A path to the service CA bundle file is added to allow explicitly
setting the CA bundle file of the given query service.

Currently, the CVO is using a kube-api server to resolve the IP address
of a specific service in the cluster [4]. New flags were added to
specify the service name and its namespace as the PromQL target may
reside in different namespaces or have a different name. The port of the
service can also be different as in the case of the managed HyperShift
monitoring. This commit will make sure the port number is not hard-coded
but is rather dynamically found by the port's name.

In HyperShift multiple kube-api servers are being used. A management
cluster creates a hosted control plane where a hosted CVO resides. This
hosted CVO is then given a kubeconfig file for a cluster where the
hosted CVO reconciles manifests. However, because the CVO cannot resolve
internal DNS names and, in this case, is given a kubeconfig for a remote
cluster, the CVO is not capable to communicate with internal services on
the management cluster. Thus a new flag is added to specify the path to
the management cluster kubeconfig file. In case the management cluster
kubeconfig is not specified, the CVO defaults to using the
`serviceaccount` directory as is the behavior in the standalone OCP
cluster.

A flag to specify the path to the credentials file was not added because
the CVO hardcodes the same path to the credentials file as is the
path to the credentials file used in other service accounts in
Kubernetes [3].

    [1] https://docs.openshift.com/container-platform/4.12/security/certificate_types_descriptions/service-ca-certificates.html
    [2] https://github.com/openshift/hypershift
    [3] https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#serviceaccount-admission-controller
    [4] openshift#920

This commit also adds a new flag to specify whether the CVO resides in
a hosted control plane. In this case, the CVO will inject its cluster
Id into PromQL queries to differentiate between multiple time series
belonging to different hosted clusters [5].

[5] openshift/cincinnati-graph-data#3591
DavidHurta added a commit to DavidHurta/cluster-version-operator that referenced this pull request Aug 7, 2023
This commit will introduce new flags and logic to the CVO regarding its
PromQL target for risk evaluation of conditional updates with the CVO
being run in a hosted control plane in mind.

For the CVO to successfully access a service that provides metrics
(in the case of the CVO in a standalone OpenShift cluster it's the
thanos-querier service), it needs three things. It needs the service's
address, a CA bundle to verify the certificate provided by the service
to allow secure communication using TLS between the actors [1], and
the authorization credentials of the CVO. Currently, the CVO hardcodes
the address, the path to the CA bundle, and the path to the credentials
file.

This is not ideal, as CVO is starting to be used in other repositories
such as HyperShift [2].

A path to the service CA bundle file is added to allow explicitly
setting the CA bundle file of the given query service.

Currently, the CVO is using a kube-api server to resolve the IP address
of a specific service in the cluster [3]. Add new flags that allow
configuring the CVO to resolve the IP address via DNS when DNS is
available to the CVO. This is the case for hosted CVOs in HyperShift.
The alternative in HyperShift would be to use the management cluster
kube-api server and give the hosted CVOs additional permissions.

Add flags to specify the PromQL target URL. This URL contains the used
scheme, port, and the server's name used for TLS configuration. In the
case, DNS is enabled, the URL is also used to query the Prometheus
server. In the case, the DNS is disabled, the server can be specified
via the Kubernetes service in the cluster that exposes the server.

A flag to specify the path to the credentials file was added for more
customizability. This flag also enables the CVO to not set the token
when it's not needed. A CVO can communicate with a Prometheus server
over HTTP. In the case, the token is not needed, it would be undesirable
for the CVO to send its token without reason over HTTP.

This commit also adds a new flag to specify whether the CVO resides in
a hosted control plane. In this case, the CVO will inject its cluster
Id into PromQL queries to differentiate between multiple time series
belonging to different hosted clusters [4].

	[1] https://docs.openshift.com/container-platform/4.12/security/certificate_types_descriptions/service-ca-certificates.html
	[2] https://github.com/openshift/hypershift
	[3] openshift#920
	[4] openshift/cincinnati-graph-data#3591
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ¯\_(ツ)_/¯ ¯\\\_(ツ)_/¯

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants