Skip to content

Conversation

@abhinavdahiya
Copy link
Contributor

@abhinavdahiya abhinavdahiya commented Feb 20, 2019

AWS:

  • All the resources are named <cluster_id>-<something> and all names use - for separator.
    This helps with consistently naming all the resources, and keeping the cluster_id in the reduces the probability of collisions.

  • All tags include Name key with name mathcing the name for the resource.
    This allows cluster to find resources using tags based on names.
    Also move some of the Name tags from aws root module into specific modules because setting Name at root level makes it difficult for overriding the Name locally.

  • All resources are tagged with kubernetes.io/cluster/<cluster_id>: owned

The loadbalancer target group names also have a max length of 32, therefore the target groups are using 4 character long suffixes.

Libvirt:

  • All resources are prefixed with cluster_id

Openstack:

  • all resources are prefixed with cluster_id
  • openstack continues to use openshiftClusterID: cluster_id to tag all its resources.

The machine-api requires that all machine objects reference back to cluster object as specified by sigs.k8s.io/cluster-api-cluster label
on the machine objects to make sure masters are correctly adopted. Therfore the name for cluster object is the id used to tag the resources.

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 20, 2019
@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 20, 2019
Copy link
Member

@wking wking Feb 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we drop this? A non-uniquified name seems like something we'd never use. Edit: except for creating DNS entries, so nevermind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're cleaning up, I think we can drop -role, which is already implied by the resource type. Basically, every bootstrap resource can use ${var.cluster_id}-bootstrap unless we create several of a single resource type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine if we keep the names as is. it doesn't hurt here as iam roles do not seem to have any limitation on length of name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 8a37f72 and f56c341 about not putting kubernetes.io/cluster/... on bootstrap resources except the instance. Do you feel like the accidental-adoption concern is ungrounded?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everybidy needs to have kubernetes.io/cluster/... tag so that destroy captures bootstrap resources.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we still planning to use the kubernetes.io/cluster/... tag for destroy? Since everything is tagged with the Name tag, can't we search on that instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we still planning to use the kubernetes.io/cluster/... tag for destroy? Since everything is tagged with the Name tag, can't we search on that instead?

Name is resource name and has nothing to do with the cluster. AWS will use kubernetes.io/cluster/... tag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of keeping a tag that Kubernetes ignores (like openshiftClusterID) for tearing down bootstrap resources that are not part of the Kubernetes cluster. But if folks think we don't have to worry about the cluster accidentally adopting bootstrap resources, I'm ok collapsing to just the kubernetes.io/cluster/... tag.

Copy link
Contributor

@staebler staebler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes make me happy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add human-readable to the description to increase the differentiation between the UUID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we differentiate between UUID and HumanID in terraform?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within terraform we do not need to differentiate. However, this is at the boundary between terraform and go. The go code is ambiguous with the term cluster ID. As a reader of the code, it is not immediately clear to me here whether the cluster ID that terraform is expecting is the UUID or the human ID. We could either (1) expect the reader of the code to dig further into the code to figure out which one is expected here or (2) add more information to the description of the variable to make it clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the boundary says give me an identifier for the cluster, it doesn't need to care if it was given an UUID or a ID that is human readable... ?

Let me update it to include constraints like length, characters, because that's what it cares any nothing more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be worthwhile to replace this with ClusterDomain. That way the logic for how the cluster domain is constructed is contained in only one place rather than in both go and terraform.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done with 3711b75

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why set Name on resources that have a name property, if we're setting the same value in both? I think it only makes sense when we feel like name must be overly truncated and want the tag for the unabbreviated version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from d6de59a

- All tags include `Name` key with name mathcing the name for the resource.
This allows cluster to find resources using tags based on names.

Copy link
Member

@wking wking Feb 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to trim sensitive characters too, see #1089.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is also imposing AWS-specific length and (eventually) charset restrictions on a nominally-platform-agnostic property. Will this end up as a race to the bottom? Maybe it should be a platform-specific asset? Or do we expect this to be something almost all platforms will need?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered this same thing with the PR that I put out earlier. My decision was that for the time being there is no reason to make the HumanID characteristics unique to the platform as we do not have any platforms currently that place more onerous restrictions than we face with ELBs from AWS. It should be easy enough to change to platform-specific HumanIDs in the future if and when they are beneficial.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we should be able to send cluste_ids to different platforms if that becomes a problem.
IMO this HumanID has enough constraints to work through most.

I think we want to trim sensitive characters too, see #1089.

Will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to trim sensitive characters too, see #1089.

Will do.

added here 55b2410

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation behind bringing this here from the route53 module? I would have expected that we would want to go the other way with these route53 resources and bring aws_route53_record.etcd_a_nodes, aws_route53_record.etcd_cluster, and aws_route53_zone.int to the route53 module instead of bringing route53 resources out of the route53 module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with how the current r53 module is setup it takes private zone id and base_domain, and I think pulling the public zone out makes it so that the module takes ids for both for better consistency.

from d6de59a

Also moves the public zone discover to root module and pass the `public_route53_id`, similar to private zone, to `route53` zone.

and for moving the etcd records into the module, that makes sense. let me see and if the change is simple i'll do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, my contention is that aws_route32_zone.int would be better placed in the route53 module. If that were moved, then there would be consistency in that neither public zone id nor private zone id would be passed into the route53 module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you then turn off either of them (not that you can do them right now..)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. How would you turn off either of them if they were in main instead of the route53 module? How would that be different if they were in the route53 module?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example if we move public and private zone into module, then the module needs 2 variables... base_domain cluster_domain. And if I want to switch off the public zone records, I will have to pass empty base_domain IMO i don't like this indicator.

on the other hand if we pass zone ids, in the root module i pass empty zone id for public zone and that IMO is better indicator.

thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were to make the public zone records optional, then there would need to be a variable coming into terraform to cover that configuration. That same variable would be passed down to the route53 module. I would not suggest overloading the semantics of base_domain to use it to indicate whether the public zone records are off or on. In my opinion, adding an additional variable to the route53 module that explicitly configures whether the public zone records are enabled is a worthwhile trade-off for keeping route53-specific resources contained within the route53 module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done with d4ce44e

@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 20, 2019
@abhinavdahiya
Copy link
Contributor Author

These changes make me happy.

* Should the following use cluster_id?
  
    
      
        [installer/data/data/aws/bootstrap/main.tf](https://github.com/openshift/installer/blob/e9073fd3dcdef7595b7eca927b7944e7e1e76dcd/data/data/aws/bootstrap/main.tf#L37)
      
      
           Line 37
        in
        [e9073fd](/openshift/installer/commit/e9073fd3dcdef7595b7eca927b7944e7e1e76dcd)
      
      
      
      
  
          
            
             name = "${var.cluster_name}-bootstrap-profile"

Fixed here d2d9bbc

* `pkg/asset/machines/worker.go` is using the UUID instead of the human ID.

Fixed here 1a2a37f bda7b94 6e6d71f

* I don't like that the term cluster ID is overloaded. It would be nice if one of the two uses were qualified in some way so that it is evident from the term which of the cluster IDs is being used.

I think it is fair to say the clusterID gived to 2 options a UUID that you can use where you absolutely want global uniquenes and HumanID where you care about uniqueness to only small level.

@staebler
Copy link
Contributor

I think it is fair to say the clusterID gived to 2 options a UUID that you can use where you absolutely want global uniquenes and HumanID where you care about uniqueness to only small level.

Fair enough. If you think that it will always be completely obvious which specific kind of cluster ID is required for every parameter passed to every function, then I am satisfied.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The max length is 27, not 32.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done b156efa

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/alphabumeric/alphanumeric/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done with 144ab05a5

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is of max length maxNameLen

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh Yeah :) will fix that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done with 9f5c0c0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather see the expected replacements written out explicitly in the test cases. This just ends up running the same code that is used in generateHumanID. If there is a bug in the replacement, this won't capture the bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to make it more meaningful with 9f5c0c0 -> 4650c15 compare

@abhinavdahiya abhinavdahiya force-pushed the human_id branch 4 times, most recently from 79a821c to 653a9b2 Compare February 20, 2019 22:33
@abhinavdahiya abhinavdahiya changed the title WIP *: use a human friendly ID to tag and name cloud resources. *: use a human friendly ID to tag and name cloud resources. Feb 20, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 20, 2019
@abhinavdahiya
Copy link
Contributor Author

looks like network operator error e2e-aws:

level=info msg="Waiting up to 30m0s for the cluster to initialize..."
level=fatal msg="failed to initialize the cluster: timed out waiting for the condition"
        {
            "apiVersion": "config.openshift.io/v1",
            "kind": "ClusterOperator",
            "metadata": {
                "creationTimestamp": "2019-02-20T23:36:45Z",
                "generation": 1,
                "name": "network",
                "resourceVersion": "29767",
                "selfLink": "/apis/config.openshift.io/v1/clusteroperators/network",
                "uid": "628e2788-3568-11e9-b802-0efd314e1e72"
            },
            "spec": {},
            "status": {
                "conditions": [
                    {
                        "lastTransitionTime": "2019-02-20T23:36:53Z",
                        "status": "False",
                        "type": "Failing"
                    },
                    {
                        "lastTransitionTime": "2019-02-20T23:36:53Z",
                        "message": "DaemonSet \"openshift-multus/multus\" is not available (awaiting 1 nodes)\nDaemonSet \"openshift-sdn/ovs\" is not available (awaiting 1 nodes)\nDaemonSet \"openshift-sdn/sdn\" is not available (awaiting 1 nodes)",
                        "reason": "Deploying",
                        "status": "True",
                        "type": "Progressing"
                    },
                    {
                        "lastTransitionTime": "2019-02-20T23:36:53Z",
                        "message": "DaemonSet \"openshift-multus/multus\" is not available (awaiting 1 nodes)\nDaemonSet \"openshift-sdn/ovs\" is not available (awaiting 1 nodes)\nDaemonSet \"openshift-sdn/sdn\" is not available (awaiting 1 nodes)",
                        "reason": "Deploying",
                        "status": "False",
                        "type": "Available"
                    }
                ],
                "extension": null,
                "version": ""
            }
        },

/retest

@abhinavdahiya
Copy link
Contributor Author

/retest

Currently the hive uses the `.clusterID` field as source for the ID used by telemeter and cluster-version-operator.

Adding the `infraID` field allows hive to continue using `clusterID` for cluster UUID purpose and `infraID` for ID used to
tag/name cloud resources.
@abhinavdahiya
Copy link
Contributor Author

missed dropping variable during rebase :P

@abhinavdahiya
Copy link
Contributor Author

Failing tests:

[Conformance][Area:Networking][Feature:Router] The HAProxy router should expose prometheus metrics for a route [Suite:openshift/conformance/parallel/minimal]

router flaking
/retest

Copy link
Contributor

@staebler staebler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 22, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, staebler

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,staebler]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit df57c7b into openshift:master Feb 22, 2019
provider := provider(clustername, config.Networking.MachineCIDR.String(), platform, userDataSecret)
name := fmt.Sprintf("%s-%s-%d", clustername, pool.Name, 0)
provider := provider(clusterID, config.Networking.MachineCIDR.String(), platform, userDataSecret)
name := fmt.Sprintf("%s-%s-%d", clusterID, pool.Name, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should Hive be passing when we call this function? This looks like it's still cluster ID, but should probably be infraID?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

N/M found where it was passed in, we're sending infraID, var could probably use a rename though.

abhinavdahiya added a commit to abhinavdahiya/installer that referenced this pull request Feb 23, 2019
image-registry-operator is using the ID from clusterversion object in the cluster to tag s3 buckets [1].

```
$ AWS_PROFILE=ci aws s3api get-bucket-tagging  --bucket image-registry-us-east-1-f23dc729905144e6b844ab3d76a42ed7-a306
{
    "TagSet": [
        {
            "Key": "openshiftClusterID",
            "Value": "f23dc729-9051-44e6-b844-ab3d76a42ed7"
        },
        {
            "Key": "expirationDate",
            "Value": "2019-02-23T05:39+0000"
        }
    ]
}
```
Therefore the destroy code is leaking registry s3 buckets. Adding the `openshiftClusterID` tag to AWS metadata back to make sure we don't
leak resources.

Installer tried to move to single tag for AWS clusters here [2].

[1]: https://github.com/openshift/cluster-image-registry-operator/blob/7228534c826c92dee38d578445c93f2537f0b775/pkg/storage/s3/s3.go#L288
[2]: openshift#1280
wking added a commit to wking/openshift-training that referenced this pull request Mar 5, 2019
The installer no longer sets this since openshift/installer@56f47611
(data/aws: use cluster_id to name all resources, 2019-02-19,
openshift/installer#1280).  Preserving the file seems fairly
straightforward, so I'm just dropping the reconstruction nodes.  We
may be able to add something similar back once the format stabilizes
again.
wking added a commit to wking/openshift-docs that referenced this pull request Mar 5, 2019
The installer no longer sets this since openshift/installer@56f47611
(data/aws: use cluster_id to name all resources, 2019-02-19,
openshift/installer#1280).
wking added a commit to wking/openshift-installer that referenced this pull request Mar 5, 2019
The installer no longer sets this since openshift/installer@56f47611
(data/aws: use cluster_id to name all resources, 2019-02-19,
openshift#1280).  The credentials operator still sets it on
users, but just leak those for now (there's work in progress to fix
them).
wking added a commit to wking/cluster-image-registry-operator that referenced this pull request Apr 23, 2019
…t.io

Mostly, this is replacing cluster-config-v1 (which is deprecated [1])
with the Infrastructure config object.  As part of that, I'm dropping
UserTags support (see discussion in [2,3]).  I'm also replacing the
openshiftClusterID tag with a "kubernetes.io/cluster/{infraName}:
owned" tag [4,5].

[1]: openshift/installer#680
[2]: openshift/api#231 (comment)
[3]: openshift/api#266
[4]: openshift/installer#1280
[5]: https://bugzilla.redhat.com/show_bug.cgi?id=1685089#c11
wking added a commit to wking/cluster-image-registry-operator that referenced this pull request Apr 23, 2019
…t.io

Mostly, this is replacing cluster-config-v1 (which is deprecated [1])
with the Infrastructure config object.  As part of that, I'm dropping
UserTags support (see discussion in [2,3]).  I'm also replacing the
openshiftClusterID tag with a "kubernetes.io/cluster/{infraName}:
owned" tag [4,5].

UnversionedRESTClientFor (vs. RESTClientFor) avoids errors like:

  $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/260/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws/1304/artifacts/e2e-aws/pods/openshift-image-registry_cluster-image-registry-operator-fc9948cc-b767n_cluster-image-registry-operator.log.gz | gunzip
  I0423 05:43:03.599910       1 main.go:18] Cluster Image Registry Operator Version: 4.0.0-142-g38fc8f9-dirty
  ...
  I0423 05:43:04.942285       1 bootstrap.go:38] generating registry custom resource
  E0423 05:43:04.942775       1 controller.go:235] unable to sync: unable to get storage configuration from cluster install config: GroupVersion is required when initializing a RESTClient, requeuing

InClusterConfig does not set GroupVersion [6] and RESTClientFor
requires it [7] while UnversionedRESTClientFor can fill it in [8].

[1]: openshift/installer#680
[2]: openshift/api#231 (comment)
[3]: openshift/api#266
[4]: openshift/installer#1280
[5]: https://bugzilla.redhat.com/show_bug.cgi?id=1685089#c11
[6]: https://github.com/kubernetes/client-go/blob/59781b88d0faa9d51be48b4ca49eb08e05246de4/rest/config.go#L422-L428
[7]: https://github.com/kubernetes/client-go/blob/59781b88d0faa9d51be48b4ca49eb08e05246de4/rest/config.go#L269-L270
[8]: https://github.com/kubernetes/client-go/blob/59781b88d0faa9d51be48b4ca49eb08e05246de4/rest/config.go#L331-L333
wking added a commit to wking/cluster-image-registry-operator that referenced this pull request Apr 23, 2019
…t.io

Mostly, this is replacing cluster-config-v1 (which is deprecated [1])
with the Infrastructure config object.  As part of that, I'm dropping
UserTags support (see discussion in [2,3]).  I'm also replacing the
openshiftClusterID tag with a "kubernetes.io/cluster/{infraName}:
owned" tag [4,5].

The trip through clientcorev1.NewForConfig is because calling
RESTClientFor on the InClusterConfig raises:

  $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/260/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws/1304/artifacts/e2e-aws/pods/openshift-image-registry_cluster-image-registry-operator-fc9948cc-b767n_cluster-image-registry-operator.log.gz | gunzip
  I0423 05:43:03.599910       1 main.go:18] Cluster Image Registry Operator Version: 4.0.0-142-g38fc8f9-dirty
  ...
  I0423 05:43:04.942285       1 bootstrap.go:38] generating registry custom resource
  E0423 05:43:04.942775       1 controller.go:235] unable to sync: unable to get storage configuration from cluster install config: GroupVersion is required when initializing a RESTClient, requeuing

InClusterConfig does not set GroupVersion [6] and RESTClientFor
requires it [7] while UnversionedRESTClientFor can fill it in [8].
But UnversionedRESTClientFor leaves NegotiatedSerializer unset:

  $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/260/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws/1305/artifacts/e2e-aws/pods/openshift-image-registry_cluster-image-registry-operator-66d8b56784-5wt9v_cluster-image-registry-operator.log.gz | gunzip
  I0423 09:21:42.998300       1 main.go:18] Cluster Image Registry Operator Version: 4.0.0-142-g64d5f62-dirty
  ...
  I0423 09:21:44.345677       1 bootstrap.go:38] generating registry custom resource
  E0423 09:21:44.346403       1 controller.go:235] unable to sync: unable to get storage configuration from cluster install config: NegotiatedSerializer is required when initializing a RESTClient, requeuing

So I'm washing through clientcorev1.NewForConfig to get defaults for
both [9].

[1]: openshift/installer#680
[2]: openshift/api#231 (comment)
[3]: openshift/api#266
[4]: openshift/installer#1280
[5]: https://bugzilla.redhat.com/show_bug.cgi?id=1685089#c11
[6]: https://github.com/kubernetes/client-go/blob/59781b88d0faa9d51be48b4ca49eb08e05246de4/rest/config.go#L422-L428
[7]: https://github.com/kubernetes/client-go/blob/59781b88d0faa9d51be48b4ca49eb08e05246de4/rest/config.go#L269-L270
[8]: https://github.com/kubernetes/client-go/blob/59781b88d0faa9d51be48b4ca49eb08e05246de4/rest/config.go#L331-L333
[9]: https://github.com/kubernetes/client-go/blob/59781b88d0faa9d51be48b4ca49eb08e05246de4/kubernetes/typed/core/v1/core_client.go#L144-L155
wking added a commit to wking/cluster-image-registry-operator that referenced this pull request Apr 23, 2019
…t.io

Mostly, this is replacing cluster-config-v1 (which is deprecated [1])
with the Infrastructure config object.  As part of that, I'm dropping
UserTags support (see discussion in [2,3]).  I'm also replacing the
openshiftClusterID tag with a "kubernetes.io/cluster/{infraName}:
owned" tag [4,5].

The trip through clientcorev1.NewForConfig is because calling
RESTClientFor on the InClusterConfig raises:

  $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/260/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws/1304/artifacts/e2e-aws/pods/openshift-image-registry_cluster-image-registry-operator-fc9948cc-b767n_cluster-image-registry-operator.log.gz | gunzip
  I0423 05:43:03.599910       1 main.go:18] Cluster Image Registry Operator Version: 4.0.0-142-g38fc8f9-dirty
  ...
  I0423 05:43:04.942285       1 bootstrap.go:38] generating registry custom resource
  E0423 05:43:04.942775       1 controller.go:235] unable to sync: unable to get storage configuration from cluster install config: GroupVersion is required when initializing a RESTClient, requeuing

InClusterConfig does not set GroupVersion [6] and RESTClientFor
requires it [7] while UnversionedRESTClientFor can fill it in [8].
But UnversionedRESTClientFor leaves NegotiatedSerializer unset:

  $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/260/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws/1305/artifacts/e2e-aws/pods/openshift-image-registry_cluster-image-registry-operator-66d8b56784-5wt9v_cluster-image-registry-operator.log.gz | gunzip
  I0423 09:21:42.998300       1 main.go:18] Cluster Image Registry Operator Version: 4.0.0-142-g64d5f62-dirty
  ...
  I0423 09:21:44.345677       1 bootstrap.go:38] generating registry custom resource
  E0423 09:21:44.346403       1 controller.go:235] unable to sync: unable to get storage configuration from cluster install config: NegotiatedSerializer is required when initializing a RESTClient, requeuing

So I'm washing through clientcorev1.NewForConfig to get defaults for
both [9].

The RBAC change avoids:

  $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/260/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws/1306/artifacts/e2e-aws/pods/openshift-image-registry_cluster-image-registry-operator-7f59899bf6-8c8x8_cluster-image-registry-operator.log.gz | gunzip | head -n7
  I0423 10:44:51.121540       1 main.go:18] Cluster Image Registry Operator Version: 4.0.0-142-gfa5b59b-dirty
  ...
  I0423 10:44:52.461691       1 bootstrap.go:38] generating registry custom resource
  E0423 10:44:52.464035       1 controller.go:235] unable to sync: unable to get storage configuration from cluster install config: infrastructures "cluster" is forbidden: User "system:serviceaccount:openshift-image-registry:cluster-image-registry-operator" cannot get resource "infrastructures" in API group "" at the cluster scope, requeuing

[1]: openshift/installer#680
[2]: openshift/api#231 (comment)
[3]: openshift/api#266
[4]: openshift/installer#1280
[5]: https://bugzilla.redhat.com/show_bug.cgi?id=1685089#c11
[6]: https://github.com/kubernetes/client-go/blob/59781b88d0faa9d51be48b4ca49eb08e05246de4/rest/config.go#L422-L428
[7]: https://github.com/kubernetes/client-go/blob/59781b88d0faa9d51be48b4ca49eb08e05246de4/rest/config.go#L269-L270
[8]: https://github.com/kubernetes/client-go/blob/59781b88d0faa9d51be48b4ca49eb08e05246de4/rest/config.go#L331-L333
[9]: https://github.com/kubernetes/client-go/blob/59781b88d0faa9d51be48b4ca49eb08e05246de4/kubernetes/typed/core/v1/core_client.go#L144-L155
wking added a commit to wking/cluster-image-registry-operator that referenced this pull request Apr 23, 2019
…t.io

Mostly, this is replacing cluster-config-v1 (which is deprecated [1])
with the Infrastructure config object.  As part of that, I'm dropping
UserTags support (see discussion in [2,3]).  I'm also replacing the
openshiftClusterID tag with a "kubernetes.io/cluster/{infraName}:
owned" tag [4,5].

The trip through clientcorev1.NewForConfig is because calling
RESTClientFor on the InClusterConfig raises:

  $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/260/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws/1304/artifacts/e2e-aws/pods/openshift-image-registry_cluster-image-registry-operator-fc9948cc-b767n_cluster-image-registry-operator.log.gz | gunzip
  I0423 05:43:03.599910       1 main.go:18] Cluster Image Registry Operator Version: 4.0.0-142-g38fc8f9-dirty
  ...
  I0423 05:43:04.942285       1 bootstrap.go:38] generating registry custom resource
  E0423 05:43:04.942775       1 controller.go:235] unable to sync: unable to get storage configuration from cluster install config: GroupVersion is required when initializing a RESTClient, requeuing

InClusterConfig does not set GroupVersion [6] and RESTClientFor
requires it [7] while UnversionedRESTClientFor can fill it in [8].
But UnversionedRESTClientFor leaves NegotiatedSerializer unset:

  $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/260/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws/1305/artifacts/e2e-aws/pods/openshift-image-registry_cluster-image-registry-operator-66d8b56784-5wt9v_cluster-image-registry-operator.log.gz | gunzip
  I0423 09:21:42.998300       1 main.go:18] Cluster Image Registry Operator Version: 4.0.0-142-g64d5f62-dirty
  ...
  I0423 09:21:44.345677       1 bootstrap.go:38] generating registry custom resource
  E0423 09:21:44.346403       1 controller.go:235] unable to sync: unable to get storage configuration from cluster install config: NegotiatedSerializer is required when initializing a RESTClient, requeuing

So I'm washing through clientcorev1.NewForConfig to get defaults for
both [9].

The RBAC change avoids:

  $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/260/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws/1306/artifacts/e2e-aws/pods/openshift-image-registry_cluster-image-registry-operator-7f59899bf6-8c8x8_cluster-image-registry-operator.log.gz | gunzip | head -n7
  I0423 10:44:51.121540       1 main.go:18] Cluster Image Registry Operator Version: 4.0.0-142-gfa5b59b-dirty
  ...
  I0423 10:44:52.461691       1 bootstrap.go:38] generating registry custom resource
  E0423 10:44:52.464035       1 controller.go:235] unable to sync: unable to get storage configuration from cluster install config: infrastructures "cluster" is forbidden: User "system:serviceaccount:openshift-image-registry:cluster-image-registry-operator" cannot get resource "infrastructures" in API group "" at the cluster scope, requeuing

[1]: openshift/installer#680
[2]: openshift/api#231 (comment)
[3]: openshift/api#266
[4]: openshift/installer#1280
[5]: https://bugzilla.redhat.com/show_bug.cgi?id=1685089#c11
[6]: https://github.com/kubernetes/client-go/blob/59781b88d0faa9d51be48b4ca49eb08e05246de4/rest/config.go#L422-L428
[7]: https://github.com/kubernetes/client-go/blob/59781b88d0faa9d51be48b4ca49eb08e05246de4/rest/config.go#L269-L270
[8]: https://github.com/kubernetes/client-go/blob/59781b88d0faa9d51be48b4ca49eb08e05246de4/rest/config.go#L331-L333
[9]: https://github.com/kubernetes/client-go/blob/59781b88d0faa9d51be48b4ca49eb08e05246de4/kubernetes/typed/core/v1/core_client.go#L144-L155
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants