Conversation

@kalexand-rh kalexand-rh self-assigned this May 20, 2020
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 20, 2020
@kalexand-rh
Contributor Author

@jstuever, will you PTAL?

@kalexand-rh
Contributor Author

@shellyyang1989, will you PTAL?


Although the shared VPC and the host project should be pre-existing resources, IMO we still need to provide an end-to-end solution so that customers can deploy OCP from scratch. Therefore, I suggest removing this line and adding a section that describes how to configure the host project and the shared VPC.

Contributor Author

@shellyyang1989 or @jstuever, do you have more details about the host project and VPC configuration?


To learn more about how to enable a shared VPC, refer to https://cloud.google.com/vpc/docs/provisioning-shared-vpc#setting_up. Once shared VPC is enabled, all the VPC networks in the host project can be accessed by the service project.
The host project is dedicated to hosting the VPC network and DNS, which do not differ much from those in a standalone project.
The host project should have the following:

  1. A service account with these roles enabled:
    Compute Network User
    Compute Security Admin
    Deployment Manager Editor
    DNS Administrator
    Security Admin
    Network Management Admin
  2. The user-managed service account of the service project granted the 'Compute Network User' role on the host project
  3. The Google-managed service account of the service project, i.e. [email protected], granted the 'Compute Network User' role on the host project
  4. A public DNS zone for external DNS resolution
  5. A VPC network
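As a sketch only (the project and service-account names below are hypothetical, and the role IDs should be verified against the GCP IAM documentation), the role bindings in item 1 could be applied with gcloud projects add-iam-policy-binding. This snippet just prints the commands so they can be reviewed before running:

```shell
# Hypothetical names; replace with your host project ID and the
# host-project service account email.
HOST_PROJECT="example-host-project"
SA="[email protected]"

# Print (not run) one add-iam-policy-binding command per required role.
for role in roles/compute.networkUser \
            roles/compute.securityAdmin \
            roles/deploymentmanager.editor \
            roles/dns.admin \
            roles/iam.securityAdmin \
            roles/networkmanagement.admin; do
  echo "gcloud projects add-iam-policy-binding ${HOST_PROJECT}" \
       "--member=serviceAccount:${SA} --role=${role}"
done
```

Piping the output to a shell would apply the bindings, but review it first.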


@shellyyang1989 shellyyang1989 Jun 24, 2020


In a shared VPC scenario, there are two kinds of projects: the host project, which hosts the shared VPC network, and the service project, which hosts the OCP cluster. I think you'd like to describe the service project in this section, so I suggest updating the title to Configuring your GCP service project to make it clearer. If you'd like to learn more about shared VPC, refer to https://cloud.google.com/vpc/docs/shared-vpc#shared_vpc_networks.


Again, service is missing before project.


As mentioned before, add a "Configuring your GCP host project" section here.


@shellyyang1989 shellyyang1989 Jun 24, 2020


In a shared VPC scenario, the host project hosts the DNS zone for the cluster, which means the DNS is in a different project from the OCP cluster. Please update the content of this section accordingly.


A service account in the host project is also required. The required roles are:
Compute Network User
Compute Security Admin
Deployment Manager Editor
DNS Administrator
Security Admin
Network Management Admin


@shellyyang1989 shellyyang1989 Jun 24, 2020


network, controlPlaneSubnet, and computeSubnet are not required in install-config.yaml. If you specify them, you get the following error, because the network is in the host project instead of the service project:
FATAL failed to fetch Master Machines: failed to load asset "Install Config": platform.gcp.network: Invalid value: "aos-qe-network": failed to get network aos-qe-network: googleapi: Error 404: The resource 'projects/openshift-qe/global/networks/aos-qe-network' was not found, notFound
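For illustration, a minimal sketch of the platform stanza with those fields omitted (the project ID and region values are hypothetical placeholders; the field names follow the standard install-config.yaml schema):

```yaml
platform:
  gcp:
    projectID: example-service-project   # the service project, not the host project
    region: us-central1
    # network, controlPlaneSubnet, and computeSubnet are omitted:
    # the network lives in the host project, so specifying them here
    # triggers the notFound error shown above.
```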


@shellyyang1989 shellyyang1989 Jun 24, 2020


Regarding this: Required. The installation program prompts you for this value.
Please state explicitly that it is the public DNS zone that resides in the host project.


The replica count of worker nodes should be 0.
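As a sketch, the corresponding install-config.yaml stanza would look like this (the pool name worker is the installer default; the hyperthreading value is illustrative):

```yaml
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0   # worker nodes are created manually in the shared VPC UPI flow
```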


Hi @jstuever, please help confirm: in the shared VPC scenario, is creating worker nodes manually the only supported approach?
In a UPI installation in a standalone project, the cluster can provision the worker nodes; does that also work for shared VPC? I actually tried it, but the worker nodes were not created, and I am not sure whether that is expected.

@kalexand-rh kalexand-rh changed the title [WIP] OSDOCS-986 GCP UPI shared VPC OSDOCS-986 GCP UPI shared VPC Jun 26, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 26, 2020

@shellyyang1989 shellyyang1989 Jun 28, 2020


This file is suffixed 'yaml', not 'yml'. An absolute path would also be better.


The value of network-project-id is the ID of the host project.


@shellyyang1989 shellyyang1989 Jun 29, 2020


For an external cluster, we need to change the load balancer to external before creating the ignition files:
Open the <installation_directory>/manifests/cluster-ingress-default-ingresscontroller.yaml file.
Replace the value of the 'scope' parameter with External.
The content of this file looks like this:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  creationTimestamp: null
  name: default
  namespace: openshift-ingress-operator
spec:
  endpointPublishingStrategy:
    loadBalancer:
      scope: External
    type: LoadBalancerService
status:
  availableReplicas: 0
  domain: ''
  selector: ''


@shellyyang1989 shellyyang1989 Jun 29, 2020


I think an absolute path would be better.

@shellyyang1989

For $ jq -r .infraID /<installation_directory>/metadata.json, which outputs openshift-vw9j6, removing the first slash would be better.

@shellyyang1989

shellyyang1989 commented Jun 29, 2020

Regarding 01_vpc.py, it's recommended to use an auto_only network. Please get the latest VPC deployment config here: https://github.com/openshift/installer/blob/master/upi/gcp/01_vpc.py. You haven't completed the VPC network creation section, right? Because the VPC network is hosted in the host project, we need to use --project and --account to specify the host project and the service account of the host project, e.g.

gcloud deployment-manager deployments create ${INFRA_ID}-vpc --config 01_vpc.yaml --project ${HOST_PROJECT} --account ${HOST_PROJECT_ACCOUNT}
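The two variables referenced above are not defined in this comment; a minimal sketch of defining them, with hypothetical placeholder values:

```shell
# Hypothetical values; substitute your own host project ID and the
# service account email that has rights in the host project.
export HOST_PROJECT="example-host-project"
export HOST_PROJECT_ACCOUNT="[email protected]"

echo "Deploying into ${HOST_PROJECT} as ${HOST_PROJECT_ACCOUNT}"
```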

@shellyyang1989

shellyyang1989 commented Jun 29, 2020

In the Creating networking and load balancing components in GCP section:
Again, DNS is hosted in the host project whereas the load balancer is hosted in the service project, so we need to separate the creation of these two resources.
To create the private DNS zone:
Get 02_dns.py from https://github.com/openshift/installer/blob/master/upi/gcp/02_dns.py

cat <<EOF > ${DIR}/02_dns.yaml
imports:
- path: ${DIR}/02_dns.py
resources:
- name: cluster-dns
  type: ${DIR}/02_dns.py
  properties:
    infra_id: '${INFRA_ID}'
    cluster_domain: '${CLUSTER_NAME}.${BASE_DOMAIN}'
    cluster_network: '${CLUSTER_NETWORK}'
EOF

gcloud deployment-manager deployments create ${INFRA_ID}-dns --config ${DIR}/02_dns.yaml --project ${HOST_PROJECT} --account ${HOST_PROJECT_ACCOUNT}

For an external cluster, both an external LB and an internal LB are required. To create the LBs:
Get 02_lb_ext.py from https://github.com/openshift/installer/blob/master/upi/gcp/02_lb_ext.py
Get 02_lb_int.py from https://github.com/openshift/installer/blob/master/upi/gcp/02_lb_int.py

cat <<EOF >02_lb.yaml
imports:
- path: 02_lb_ext.py
- path: 02_lb_int.py
resources:
- name: cluster-lb-ext
  type: 02_lb_ext.py
  properties:
    infra_id: '${INFRA_ID}'
    region: '${REGION}'
- name: cluster-lb-int
  type: 02_lb_int.py
  properties:
    cluster_network: '${CLUSTER_NETWORK}'
    control_subnet: '${CONTROL_SUBNET}'
    infra_id: '${INFRA_ID}'
    region: '${REGION}'
    zones:
    - '${ZONE_0}'
    - '${ZONE_1}'
    - '${ZONE_2}'
EOF

gcloud deployment-manager deployments create ${INFRA_ID}-lb --config 02_lb.yaml

export CLUSTER_IP=$(gcloud compute addresses describe ${INFRA_ID}-cluster-ip --region=${REGION} --format json | jq -r .address)
export CLUSTER_PUBLIC_IP=$(gcloud compute addresses describe ${INFRA_ID}-cluster-public-ip --region=${REGION} --format json | jq -r .address)

For an internal cluster, only an internal LB is required. To create the internal LB only:

cat <<EOF > ${DIR}/02_lb.yaml
imports:
- path: ${DIR}/02_lb_int.py
resources:
- name: cluster-lb-int
  type: ${DIR}/02_lb_int.py
  properties:
    cluster_network: '${CLUSTER_NETWORK}'
    control_subnet: '${CONTROL_SUBNET}'
    infra_id: '${INFRA_ID}'
    region: '${REGION}'
    zones:
    - '${ZONE_0}'
    - '${ZONE_1}'
    - '${ZONE_2}'
EOF

gcloud deployment-manager deployments create ${INFRA_ID}-lb --config ${DIR}/02_lb.yaml || exit 2

export CLUSTER_IP=$(gcloud compute addresses describe ${INFRA_ID}-cluster-ip --region=${REGION} --format json | jq -r .address)

Note that the ${INFRA_ID}-cluster-public-ip address is created only by 02_lb_ext.py, so there is no CLUSTER_PUBLIC_IP to export in the internal-only case.


There is a redundant space between > and '.


Sorry for misleading you. Because the variable INFRA_ID is not available here, we can change ${INFRA_ID}-vpc to <vpc_deployment_name> or something else. And we need to define HOST_PROJECT and HOST_PROJECT_ACCOUNT before this command line.


Please change control plane to compute.


@shellyyang1989 shellyyang1989 Jul 9, 2020


As we define REGION in the Creating the VPC section, please add a note below to explain that the region should be consistent with REGION; that is, the region in which the cluster will be installed should be the same as the region in which the VPC network resides.


There is a redundant space in HOST_ PROJECT


This is not required since master-subnet is not used.


As HOST_PROJECT_CONTROL_SUBNET and HOST_PROJECT_COMPUTE_SUBNET are defined in the Creating the VPC section, please remove them from this section.

Contributor Author

I only see these variables used in this file now, so if you're looking at the preview, it might not have refreshed correctly. I'm uploading a fresh preview now.


worker-subnet is not used, so please remove this line.


Adding also before copy might make it clearer.


Adding also before export might make it clearer.


Adding also before add might make it clearer.


This section seems to duplicate Creating cluster-wide firewall rules for a shared VPC in GCP. Please check.

Contributor Author

@shellyyang1989, it looks like the options in the commands are different, and I am not sure whether they're trying to do exactly the same thing.

I don't work with the devs who work on ingress, so you know more about this than I do. What can I do today to make this work for GA, and will you file a bug for what you think we need to figure out to make this section right?


@kalexand-rh, I don't find that the command lines are different. If you see any difference, please let me know.


@shellyyang1989 shellyyang1989 Jul 9, 2020


Per the upstream doc, there are two approaches to adding firewall rules: one is adding firewall rules based on cluster events, and the other is adding cluster-wide firewall rules. So I suggest changing the section names as follows:

Adding the ingress firewall rules
    Adding firewall rules based on cluster events
    Adding cluster-wide firewall rules (alternative)
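As a hedged sketch of what a cluster-wide rule from the alternative approach might look like (every name, port list, and source range below is a hypothetical placeholder, not taken from the installer docs), the command is printed as a dry run rather than executed:

```shell
# Placeholders; replace with your infra ID, the host-project network,
# and the source ranges appropriate for your cluster.
INFRA_ID="example-infra"
NETWORK="example-network"

# `echo` keeps this a dry run; drop it to actually create the rule.
echo gcloud compute firewall-rules create "${INFRA_ID}-ingress" \
  --network="${NETWORK}" \
  --allow=tcp:80,tcp:443 \
  --source-ranges=0.0.0.0/0
```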

Contributor Author

I checked with the author of the dev docs, and dev's preference is to use the event-based rules to better mimic the IPI functionality. I structured the docs to mimic that.


I see. What I meant was that we need to explain that adding cluster-wide firewall rules is an alternative to the event-based rules: if there is any issue with events, the user can add cluster-wide firewall rules instead.

@kalexand-rh kalexand-rh added the peer-review-needed Signifies that the peer review team needs to review this PR label Jul 9, 2020
@lamek lamek left a comment

LGTM. I left a few quick comments.


Missing word? "... firewall rules in Google Cloud..."


Repeated word "that".

Slight reword suggestion:
"While making the service account an Owner of the project is the easiest way to gain the required permissions, this gives the service account complete control over the project. You must determine if the risk that comes from offering that power is acceptable."


Missing article: "...have a project that hosts a shared VPC network..."

@shellyyang1989

LGTM except for the copy buttons and the zone examples for bootstrap, master, and worker.

@kalexand-rh
Contributor Author

Great! I am going to fix the copy buttons in a separate PR because it will involve a lot of code block changes. I fixed the zones to use us-central1. Per the GCP docs, us-central1 uses zones a, b, c, and f instead of b, c, and d like the us-east1 examples, so I'm adjusting those references too.

@kalexand-rh
Contributor Author

@vikram-redhat approved merging this change with the final amendments.

@kalexand-rh kalexand-rh merged commit f397331 into openshift:master Jul 10, 2020
@kalexand-rh
Contributor Author

/cherrypick enterprise-4.5

@openshift-cherrypick-robot

@kalexand-rh: new pull request created: #23628

In response to this:

/cherrypick enterprise-4.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


Labels

branch/enterprise-4.5
peer-review-needed: Signifies that the peer review team needs to review this PR
size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files

7 participants