Skip to content

Conversation

@jstuever
Copy link
Contributor

@jstuever jstuever commented Mar 11, 2020

This change adds direction in the GCP UPI install document about how to
install a cluster using a Shared VPC. Because the VPC, networks,
subnets, and dns zones are in a different project (the host project),
the installer has problems finding them while creating the Ignition
files. Furthermore, some changes are required to the cloud-provider in
order for the cluster to properly provision resources in the subnets. In
addition, it is assumed the service account in the service project will
likely not have sufficient permissions in the host project to perform
all of the required tasks.

Depends on: #2574, #3270, #3309

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 11, 2020
@jstuever
Copy link
Contributor Author

/cc @patrickdillon
/assign @abhinavdahiya

Copy link
Contributor

@abhinavdahiya abhinavdahiya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment

@jstuever jstuever force-pushed the cors1396 branch 2 times, most recently from 5f9abe1 to ce33cfb Compare March 12, 2020 22:19
@patrickdillon
Copy link
Contributor

patrickdillon commented Mar 13, 2020

This LGTM. Not sure if we are missing dpeendencies for e2e-gcp-upi but let's give it a shot...

This comment was intended for 3270.

@jstuever
Copy link
Contributor Author

jstuever commented Mar 13, 2020

Experiencing an issue with ingress controller creating the default-route. That blocks wait-for install-complete from succeeding. Will investigate, find a solution, and update the PR accordingly.

Looks to be failing to add a worker to an instance-group as expected when publish: Internal.

I0313 19:31:07.588979       1 gce_loadbalancer_internal.go:539] ensureInternalInstanceGroups(k8s-ig--3b88cb21c8b51f06): 5 nodes over 3 zones in region us-east1                                                                               
I0313 19:31:09.915592       1 gce_loadbalancer_internal.go:481] ensureInternalInstanceGroup(k8s-ig--3b88cb21c8b51f06, us-east1-b): checking group that it contains 1 nodes                                                                    
I0313 19:31:10.411576       1 gce_loadbalancer_internal.go:525] ensureInternalInstanceGroup(k8s-ig--3b88cb21c8b51f06, us-east1-b): adding nodes: [jstuev-74trp-w-0]                                                                           
E0313 19:31:11.437918       1 gce_loadbalancer.go:169] Failed to EnsureLoadBalancer(jstuev-74trp, openshift-ingress, router-default, ab98db2a8888942daa1299ee41987ca6, us-east1), err: googleapi: Error 400: Resource 'projects/<...>/zones/us-east1-b/instances/jstuev-74trp-w-0' is expected to be in the subnetwork 'projects/<...>/regions/us-east1/subnetworks/installer-shared-vpc-subnet-1' but is in the subnetwork 'projects/<...>/regions/us-east1/subnetworks/installer-shared-vpc-subnet-2'., wrongSubnetwork

Update: It appears the gce_loadbalancer_internal is ignoring the master IG because it contains the bootstrap node. As a result, it tries to put the master node in the new ingress IG. Then it fails to add the worker node because it is in a different subnet. Working on solution to move bootstrap node into its own IG.

Update: It is resolved by #3309

@jstuever jstuever force-pushed the cors1396 branch 3 times, most recently from f8143e9 to 464a381 Compare March 18, 2020 03:26
@jstuever jstuever requested a review from abhinavdahiya March 18, 2020 16:06
@jstuever
Copy link
Contributor Author

/hold for e2e-gcp-upi

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 18, 2020
@jstuever jstuever changed the title WIP: GCP UPI: document how to install into a Shared VPC GCP UPI: document how to install into a Shared VPC Mar 18, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 18, 2020
@jstuever
Copy link
Contributor Author

/hold cancel
This is documentation only, there are no functional changes.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 19, 2020
@jstuever
Copy link
Contributor Author

/retest

Comment on lines 28 to 30
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the installer will not be able to find the base domain zone in the host project.
Create the `install-config.yaml` manually using the documentation references/examples.
It must have `publish: Internal` to avoid the base domain verification when executing `create manifests` later in this workflow.
Create the `install-config.yaml` manually using the documentation references/examples.
It must have `publish: Internal` because the installer will not be able to find the public DNS zone for the the base domain in the host project, which is required for the External clusters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ptal 67658b5fca0aeaa92b80e428086fdd30037dfc9d

Comment on lines 160 to 197
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, i don't think removing is a solution that the ingress team will like, this probably needs to update the object correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file only exists when publish: Internal. It is not clear what it should look like, if at all, when publish: External

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with f512a3c to only modify the file from Internal to External

Comment on lines 759 to 824
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

afaik, the kube-controller-manger (gcp) emits events for the firewall rules that need to be created for each load balancer service.

so i think we should make it clear that the example here is opening up everything when technically they should be using those events to create specific ones. an example of what the event looks like and how to do that would be useful here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make it clear this is opening everything up. I'm not sure what the event looks like because I did not see them in my testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworked this section. PTAL 67658b5fca0aeaa92b80e428086fdd30037dfc9d

@jstuever jstuever requested a review from abhinavdahiya March 24, 2020 19:55
@jstuever jstuever force-pushed the cors1396 branch 2 times, most recently from bd1ff87 to 6445a18 Compare March 30, 2020 18:16
This change adds direction in the GCP UPI install document about how to
install a cluster using a Shared VPC. Because the VPC, networks,
subnets, and dns zones are in a different project (the host project),
the installer has problems finding them while creating the Ignition
files. Furthermore, some changes are required to the cloud-provider in
order for the cluster to properly provision resources in the subnets. In
addition, it is assumed the service account in the service project will
likely not have sufficient permissions in the host project to perform
all of the required tasks.
@abhinavdahiya
Copy link
Contributor

/lgtm
/approve

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 30, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 30, 2020
@openshift-merge-robot openshift-merge-robot merged commit 78f1acc into openshift:master Mar 30, 2020
@openshift-ci-robot
Copy link
Contributor

@jstuever: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-gcp-upi ce33cfb9b23d2ee5cc2f349566cd3ae0ca1bb3eb link /test e2e-gcp-upi
ci/prow/e2e-openstack 464a38145475fc05861ff8bbbda30f6ebde7b4d7 link /test e2e-openstack
ci/prow/e2e-aws 464a38145475fc05861ff8bbbda30f6ebde7b4d7 link /test e2e-aws
ci/prow/e2e-libvirt 464a38145475fc05861ff8bbbda30f6ebde7b4d7 link /test e2e-libvirt
ci/prow/e2e-aws-steps 67658b5fca0aeaa92b80e428086fdd30037dfc9d link /test e2e-aws-steps
ci/prow/e2e-aws-scaleup-rhel7 f512a3c link /test e2e-aws-scaleup-rhel7

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jstuever jstuever deleted the cors1396 branch June 17, 2020 22:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants