Skip to content

Conversation

@abhinavdahiya
Copy link
Contributor

pkg/types: add the publish strategy to install-config

This commit adds a enum(string) type PublishingStrategy. It supports 2 options

  • ExternalPublishingStrategy : the endpoints are exposed to the Internet
  • InternalPublishingStrategy : the endpoints are exposed to the Private Network only

This enum type is added to the InstallConfig to control the strategy for the cluster endpoints. Cluster endpoints include the API server,
the default Ingress controller, public IPs etc.

Also adds the defaulting to make sure this strategy is External by default, keeping the backward-compatible behavior, and validation to make sure only valid enum options are set.

pkg/asset: update machines and DNS for external-internal strategy

For aws, only when the strategy is External, do we set the public Route53 zone for the basedomain in DNSes.config.openshift.com cluster object.

data/aws: add publish strategy External/Internal feature

Adds the aws_publish_strategy variable to the terraform root variables to allow skipping public resources.

The brief list of resources that are only created when strategy is External are:

  • Public Load Balancers for API (6443) (aws_lb.api_external, aws_lb_target_group.api_external, aws_lb_listener.api_external_api)
  • Public DNS record (aws_route53_record.api_external)
  • No public IP address associated to bootstrap-host (associate_public_ip_address: false)
  • No security group rule that allows SSH to bootstrap-host from Internet (aws_security_group_rule.ssh), rather switching to VPC only.

Due to terraform issue hashicorp/terraform#12570, where count cannot use variables that are dynamic as they are not computed during plan, certain implicit assumes/hacks need to be continued

An error below shows how the aws_lb_target list can't be used as sole source for allowing other modules like bootstrap and master attach isntances without caring about publish strategy.
Similarly the error also shows how route53 module cannot turn off the public records based on existence of the external LBs alone.

ERROR Error: Invalid count argument
ERROR
ERROR   on ../../../../../../../tmp/openshift-install-940637305/bootstrap/main.tf line 154, in resource "aws_lb_target_group_attachment" "bootstrap":
ERROR  154:   count = length(var.target_group_arns)
ERROR
ERROR The "count" value depends on resource attributes that cannot be determined
ERROR until apply, so Terraform cannot predict how many instances will be created.
ERROR To work around this, use the -target argument to first apply only the
ERROR resources that the count depends on.
ERROR
ERROR
ERROR Error: Invalid count argument
ERROR
ERROR   on ../../../../../../../tmp/openshift-install-940637305/master/main.tf line 131, in resource "aws_lb_target_group_attachment" "master":
ERROR  131:   count = var.instance_count * length(var.target_group_arns)
ERROR
ERROR The "count" value depends on resource attributes that cannot be determined
ERROR until apply, so Terraform cannot predict how many instances will be created.
ERROR To work around this, use the -target argument to first apply only the
ERROR resources that the count depends on.
ERROR
ERROR
ERROR Error: Invalid count argument
ERROR
ERROR   on ../../../../../../../tmp/openshift-install-940637305/route53/base.tf line 2, in data "aws_route53_zone" "public":
ERROR    2:   count = var.api_external_lb_dns_name != null ? 1 : 0
ERROR
ERROR The "count" value depends on resource attributes that cannot be determined
ERROR until apply, so Terraform cannot predict how many instances will be created.
ERROR To work around this, use the -target argument to first apply only the
ERROR resources that the count depends on.

For instances to communicate with internet, they need to

  • either have public IP in a public subnet or,
  • be launched in private subnet.

For more info see https://forums.aws.amazon.com/thread.jspa?threadID=96369

Therefore the bootstrap instance is created in public subnet for External strategy and private subnet for others

Also the bootstrap instance SSH rules are modified to target Internet (0.0.0.0) for External and allow only VPC CIDR ranges otherwise

xref: https://jira.coreos.com/browse/CORS-1221

/cc @wking

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 17, 2019
@abhinavdahiya
Copy link
Contributor Author

abhinavdahiya commented Oct 17, 2019

Create your VPC+subnets

$ aws cloudformation create-stack  --stack-name adahiya-byo-vpc --template-body "$(cat upi/aws/cloudformation/01_vpc.yaml)" --parameters ParameterKey=AvailabilityZoneCount,ParameterValue=3
$ aws cloudformation describe-stacks --stack-name adahiya-byo-vpc --query 'Stacks[].Outputs[]' --output json

Create bastion into your VPC.

$ cat upi/aws/cloudformation/01b_bastian.yaml
AWSTemplateFormatVersion: 2010-09-09
Description: Template for OpenShift Cluster Security Elements (Security Groups & IAM)

Parameters:
  VpcId:
    Description: The VPC-scoped resources will belong to this VPC.
    Type: AWS::EC2::VPC::Id
  SubnetId:
    Description: The public subnet.
    Type: AWS::EC2::Subnet::Id
  AllowedSshCidr:
    AllowedPattern: ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\/([0-9]|1[0-9]|2[0-9]|3[0-2]))$
    ConstraintDescription: CIDR block parameter must be in the form x.x.x.x/0-32.
    Default: 0.0.0.0/0
    Description: CIDR block to allow SSH access to the bastian host.
    Type: String
  Ami:
    Description: The AMI to use for bastian.
    Type: AWS::EC2::Image::Id
  SshKeyName:
    Description: The key pair in AWS for bastian.
    Type: String

Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
    - Label:
        default: "Network Configuration"
      Parameters:
      - VpcId
      - SubnetId
      - AllowedSshCidr
    - Label:
        default: "Host Information"
      Parameters:
      - Ami
      - SshKeyName
    ParameterLabels:
      VpcId:
        default: "VPC ID"
      SubnetId:
        default: "Public Subnet"
      AllowedSshCidr:
        default: "Allowed SSH Source"
      Ami:
        default: "AMI ID"
      Ami:
        default: "The key pair name"

Resources:
  BastianSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Bastian Security Group
      SecurityGroupIngress:
      - IpProtocol: tcp
        FromPort: 22
        ToPort: 22
        CidrIp: !Ref AllowedSshCidr
      VpcId: !Ref VpcId

  BastianInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref Ami
      InstanceType: "m5.large"
      NetworkInterfaces:
      - AssociatePublicIpAddress: "true"
        DeviceIndex: "0"
        GroupSet:
        - !Ref BastianSecurityGroup
        SubnetId: !Ref SubnetId
      KeyName: !Ref SshKeyName

Outputs:
  BastianInstanceId:
    Description: Bastian Instance ID.
    Value: !Ref BastianInstance

  BastianPublicIp:
    Description: The bootstrap node public IP address.
    Value: !GetAtt BastianInstance.PublicIp


$ aws cloudformation create-stack  --stack-name adahiya-byo-vpc-bastian-2 --template-body "$(cat upi/aws/cloudformation/01b_bastian.yaml)" --parameters ParameterKey=VpcId,ParameterValue=vpc-0f582a396680d7058 ParameterKey=SubnetId,ParameterValue=subnet-00d14380eeb2b1111 ParameterKey=Ami,ParameterValue=ami-0e05097610fae5559 ParameterKey=SshKeyName,ParameterValue=adahiya
$ aws cloudformation describe-stacks --stack-name adahiya-byo-vpc-bastian-2 --query 'Stacks[].Outputs[]' --output json

I'm using sshuttle to create a tunnel to the VPC using ssh, it also handles DNS resolution

$ sshuttle --dns -r ec2-user@34.201.100.240 0/0
[local sudo] Password:
client: Connected.

Open another terminal and run the installer with your list of subnets and publish: Internal

$ openshift-install create cluster --dir dev
INFO Consuming Install Config from target directory
INFO Creating infrastructure resources...
INFO Waiting up to 30m0s for the Kubernetes API at https://api.adahiya-1.devcluster.openshift.com:6443...
INFO API v1.16.0-beta.2+a8701aa up
INFO Waiting up to 30m0s for bootstrapping to complete...
INFO Waiting up to 30m0s for the cluster at https://api.adahiya-1.devcluster.openshift.com:6443 to initialize...
INFO Waiting up to 10m0s for the openshift-console route to be created...
INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/adahiya/go/src/github.com/openshift/installer/dev/auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.adahiya-1.devcluster.openshift.com
INFO Login to the console with user: kubeadmin, password: xxxx-xxxxx-xxxxx-xxxxx

@abhinavdahiya abhinavdahiya changed the title AWS: gcp: Enable clusters with no public endpoints AWS: Enable clusters with no public endpoints Oct 17, 2019
@abhinavdahiya abhinavdahiya added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 17, 2019
@abhinavdahiya abhinavdahiya force-pushed the aws-internal-only branch 2 times, most recently from 40b2128 to 586800a Compare October 18, 2019 01:25
@abhinavdahiya abhinavdahiya removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 18, 2019
@wking
Copy link
Member

wking commented Oct 18, 2019

I'm using sshuttle to create a tunnel to the VPC using ssh...

Could you SSH in and run the installer on the bastion? I'd expect you'd have AWS-endpoint access to any cluster-managed AWS resources you'd need, because the cluster would be managing those resources going forward. And it would get us one step closer to the installer-appliance AMI flow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😂

@abhinavdahiya
Copy link
Contributor Author

I'm using sshuttle to create a tunnel to the VPC using ssh...

Could you SSH in and run the installer on the bastion?

It would mean I have to ssh, and move all the binaries, configuration, secrets to the Bastian and then run the installer, sshuttle was extremely useful for me to develop and iterate easily. Bonus sshuttle captured my DNS and I could access the cluster console through my browser...

Most of our customers will have their computers already configured to have access to network only private resources.. it was easy for me...

I'd expect you'd have AWS-endpoint access to any cluster-managed AWS resources you'd need, because the cluster would be managing those resources going forward. And it would get us one step closer to the installer-appliance AMI flow.

I agree 💯 but a UI based installer is very important to make that easy to use.

This commit adds a enum(string) type PublishingStrategy. It supports 2 options

* ExternalPublishingStrategy : the endpoints are exposed to the Internet
* InternalPublishingStrategy : the endpoints are exposed to the `Private Network` only

This enum type is added to the InstallConfig to control the strategy for the cluster endpoints. Cluster endpoints include the API server,
the default Ingress controller, public IPs etc.

Also adds the defaulting to make sure this strategy is `External` by default, keeping the backward-compatible behavior, and validation to make sure only valid enum options are set.

Also it looks like gofmt prefers `{}` over `struct{}{}` in validation map.
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_installer/2526/pull-ci-openshift-installer-master-gofmt/6229
For aws, only when the strategy is External, do we set the public Route53 zone for the basedomain in DNSes.config.openshift.com cluster object.

Also for aws, the master machines objects are configured to belong to the External Load Balancer target group.
@abhinavdahiya
Copy link
Contributor Author

All green :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 12570 was closed with 0.12's better error message, and the actual support tracking has moved to terraform#4149. Dunno if it's worth updating your links?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 12570 definitely has the correct context

@wking
Copy link
Member

wking commented Oct 18, 2019

Left three nits, but I don't care if they get fixed in this PR, in follow-up work, or never. Pull the hold if you want to punt on them.

/lgtm
/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 18, 2019
@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 18, 2019
Adds the `aws_publish_strategy` variable to the terraform root variables to allow skipping public resources.

The brief list of resources that are only created when strategy is External are:

- Public Load Balancers for API (6443) (aws_lb.api_external, aws_lb_target_group.api_external, aws_lb_listener.api_external_api)
- Public DNS record (aws_route53_record.api_external)
- No public IP address associated to bootstrap-host (associate_public_ip_address: false)
- No security group rule that allows SSH to bootstrap-host from Internet (aws_security_group_rule.ssh), rather switching to VPC only.

Due to terraform issue hashicorp/terraform#12570, where count cannot use variables that are dynamic as they are not computed during plan, certain implicit assumes/hacks need to be continued

An error below shows how the aws_lb_target list can't be used as sole source for allowing other modules like bootstrap and master attach isntances without caring about publish strategy.
Similarly the error also shows how route53 module cannot turn off the public records based on existence of the external LBs alone.

```
ERROR Error: Invalid count argument
ERROR
ERROR   on ../../../../../../../tmp/openshift-install-940637305/bootstrap/main.tf line 154, in resource "aws_lb_target_group_attachment" "bootstrap":
ERROR  154:   count = length(var.target_group_arns)
ERROR
ERROR The "count" value depends on resource attributes that cannot be determined
ERROR until apply, so Terraform cannot predict how many instances will be created.
ERROR To work around this, use the -target argument to first apply only the
ERROR resources that the count depends on.
ERROR
ERROR
ERROR Error: Invalid count argument
ERROR
ERROR   on ../../../../../../../tmp/openshift-install-940637305/master/main.tf line 131, in resource "aws_lb_target_group_attachment" "master":
ERROR  131:   count = var.instance_count * length(var.target_group_arns)
ERROR
ERROR The "count" value depends on resource attributes that cannot be determined
ERROR until apply, so Terraform cannot predict how many instances will be created.
ERROR To work around this, use the -target argument to first apply only the
ERROR resources that the count depends on.
ERROR
ERROR
ERROR Error: Invalid count argument
ERROR
ERROR   on ../../../../../../../tmp/openshift-install-940637305/route53/base.tf line 2, in data "aws_route53_zone" "public":
ERROR    2:   count = var.api_external_lb_dns_name != null ? 1 : 0
ERROR
ERROR The "count" value depends on resource attributes that cannot be determined
ERROR until apply, so Terraform cannot predict how many instances will be created.
ERROR To work around this, use the -target argument to first apply only the
ERROR resources that the count depends on.
```

For instances to communicate with internet, they need to
- either have public IP in a public subnet or,
- be launched in private subnet.

For more info see https://forums.aws.amazon.com/thread.jspa?threadID=96369

Therefore the bootstrap instance is created in public subnet for External strategy and private subnet for others

Also the bootstrap instance SSH rules are modified to target Internet (0.0.0.0) for External and allow only VPC CIDR ranges otherwise
The install-config's publish strategy is passed to the terraform using tf-variables

Since in Internal strategy, public subnets are going to be empty, making sure `not set` is not propagated in pre-existing VPC.
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Oct 18, 2019
@abhinavdahiya
Copy link
Contributor Author

/hold cancel

fixed two nits from #2526 (comment)

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 18, 2019
@wking
Copy link
Member

wking commented Oct 18, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 18, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@abhinavdahiya
Copy link
Contributor Author

/retest

@openshift-ci-robot
Copy link
Contributor

@abhinavdahiya: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws-scaleup-rhel7 2bd7e95 link /test e2e-aws-scaleup-rhel7

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 61fe0ea into openshift:master Oct 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants