Skip to content

Conversation

@wking
Copy link
Member

@wking wking commented Oct 1, 2019

Builds on #2437; no need to review before that lands.

The installer will no longer create (vs. the fully-IPI flow):

  • Internet gateways (aws_internet_gateway)
  • NAT gateways (aws_nat_gateway, aws_eip.nat_eip)
  • Subnets (aws_subnet)
  • Route tables (aws_route_table, aws_route, aws_route_table_association, aws_main_route_table_association)
  • VPCs (aws_vpc)
  • VPC DHCP options (aws_vpc_dhcp_options, vpc_dhcp_options_association)
  • VPC endpoints (aws_vpc_endpoint)

because those resources define the VPC/subnet networking and we can't pick values for CIDRs and routing and such without a risk of picking something that collides with some other VPC/subnet consumer.

The installer will continue to create (vs. the fully-IPI flow):

  • AMI copies (aws_ami_copy)
  • IAM roles and profiles (aws_iam_instance_profile, aws_iam_role, aws_iam_role_policy)
  • Instances (aws_instance, aws_network_interface.master)
  • Load balancers (aws_lb, aws_lb_listener, aws_lb_target_group, aws_lb_target_group_attachment)
  • Route 53 resources (aws_route53_zone, aws_route53_record)
  • S3 resources (aws_s3_bucket, aws_s3_bucket_object)
  • Security groups (aws_security_group, aws_security_group_rule)

because other VPC/subnet consumers won't care about those.

The installer will still need to grow new creation code (vs. the fully-IPI flow) for:

  • Adding kubernetes.io/cluster/.*: shared to the user-provided subnets. This will happen in Go immediately before calling out to Terraform to begin creating new resources, but I'm punting on that for this pull request.

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 1, 2019
@wking wking force-pushed the aws-bring-your-own-vpc branch from 04ef9f8 to 93e4c71 Compare October 1, 2019 00:21
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 1, 2019
@abhinavdahiya
Copy link
Contributor

/retest

@wking wking force-pushed the aws-bring-your-own-vpc branch 4 times, most recently from d25ace5 to 12ead21 Compare October 2, 2019 18:42
@wking
Copy link
Member Author

wking commented Oct 2, 2019

images:

 + go build -ldflags ' -X github.com/openshift/installer/pkg/version.Raw=unreleased-master-1852-g434c2dbeea19e03b8f2f1db612fae966dab9084f-dirty -X github.com/openshift/installer/pkg/version.Commit=434c2dbeea19e03b8f2f1db612fae966dab9084f -s -w' -tags ' release' -o bin/openshift-install ./cmd/openshift-install
error: build error: no such image 

Dunno what that is.

/retest

@wking wking force-pushed the aws-bring-your-own-vpc branch 2 times, most recently from 408576e to d4d1cc3 Compare October 2, 2019 21:36
@wking
Copy link
Member Author

wking commented Oct 3, 2019

Success :). Still need to get the shared tagging worked up, at which point this may be sufficient for some early CI work.

@abhinavdahiya
Copy link
Contributor

Success :). Still need to get the shared tagging worked up, at which point this may be sufficient for some early CI work.

The green only means it doesn't break exiting code-flow, and doesn't provide any information if the new code-flow works. It would be great if you can provide some test run example correctly setting up the resources..

@wking wking force-pushed the aws-bring-your-own-vpc branch from d4d1cc3 to 500337d Compare October 6, 2019 08:22
@wking
Copy link
Member Author

wking commented Oct 6, 2019

$ hack/build.sh
$ openshift-install version
openshift-install unreleased-master-1863-g500337dad0fcdfe6a060107468dfedf59a37822b
built from commit 500337dad0fcdfe6a060107468dfedf59a37822b
release image registry.svc.ci.openshift.org/origin/release:4.3
$ aws cloudformation create-stack  --stack-name wking-vpc --template-body "$(cat upi/aws/cloudformation/01_vpc.yaml)" --parameters ParameterKey=AvailabilityZoneCount,ParameterValue=3
$ aws cloudformation wait stack-create-complete --stack-name wking-vpc
$ VPC_JSON="$(aws cloudformation describe-stacks --stack-name wking-vpc --query 'Stacks[].Outputs[]' --output json)"
$ export TF_VAR_aws_vpc="$(echo "${VPC_JSON}" | jq -r '.[] | select(.OutputKey == "VpcId").OutputValue')"
$ export TF_VAR_aws_private_subnets="$(echo "${VPC_JSON}" | jq -r '.[] | select(.OutputKey == "PrivateSubnetIds").OutputValue | split(",")')"
$ export TF_VAR_aws_public_subnets="$(echo "${VPC_JSON}" | jq -r '.[] | select(.OutputKey == "PublicSubnetIds").OutputValue | split(",")')"
$ mkdir wking
$ cp aws.yaml wking/install-config.yaml
$ openshift-install --dir wking create cluster
...
INFO Install complete!
...
$ cat wking/.openshift_install.log
...
time="2019-10-06T00:47:20-07:00" level=info msg="Creating infrastructure resources..."
time="2019-10-06T00:47:21-07:00" level=debug msg="tagging arn:aws:ec2:us-west-2:269733383066:vpc/vpc-042d24ae44492d91b with kubernetes.io/cluster/wking-bn4p5: shared"
time="2019-10-06T00:47:21-07:00" level=debug msg="tagging arn:aws:ec2:us-west-2:269733383066:subnet/subnet-010a3c6fa6b41adf3 with kubernetes.io/cluster/wking-bn4p5: shared"
time="2019-10-06T00:47:21-07:00" level=debug msg="tagging arn:aws:ec2:us-west-2:269733383066:subnet/subnet-07b9f69f76f604cca with kubernetes.io/cluster/wking-bn4p5: shared"
time="2019-10-06T00:47:21-07:00" level=debug msg="tagging arn:aws:ec2:us-west-2:269733383066:subnet/subnet-087a619e9d0ec9aa2 with kubernetes.io/cluster/wking-bn4p5: shared"
time="2019-10-06T00:47:21-07:00" level=debug msg="tagging arn:aws:ec2:us-west-2:269733383066:subnet/subnet-06774084e67d2b987 with kubernetes.io/cluster/wking-bn4p5: shared"
time="2019-10-06T00:47:21-07:00" level=debug msg="tagging arn:aws:ec2:us-west-2:269733383066:subnet/subnet-03a1ec852b85df3aa with kubernetes.io/cluster/wking-bn4p5: shared"
time="2019-10-06T00:47:21-07:00" level=debug msg="tagging arn:aws:ec2:us-west-2:269733383066:subnet/subnet-0cedf32418a8d4f24 with kubernetes.io/cluster/wking-bn4p5: shared"
time="2019-10-06T00:47:21-07:00" level=debug msg="Symlinking plugin terraform-provider-google src: \"/home/trking/.local/lib/go/src/github.com/openshift/installer/bin/openshift-install\" dst: \"/tmp/openshift-install-412074759/plugins/terraform-provider-google\""
...
time="2019-10-06T00:47:22-07:00" level=debug msg="If you ever set or change modules or backend configuration for Terraform,"
time="2019-10-06T00:47:22-07:00" level=debug msg="rerun this command to reinitialize your working directory. If you forget, other"
time="2019-10-06T00:47:22-07:00" level=debug msg="commands will detect it and remind you to do so if necessary."
time="2019-10-06T00:47:26-07:00" level=debug msg="module.vpc.data.aws_subnet.private[0]: Refreshing state..."
time="2019-10-06T00:47:26-07:00" level=debug msg="module.vpc.data.aws_vpc.cluster_vpc: Refreshing state..."
time="2019-10-06T00:47:26-07:00" level=debug msg="module.vpc.data.aws_subnet.private[1]: Refreshing state..."
time="2019-10-06T00:47:26-07:00" level=debug msg="module.vpc.data.aws_subnet.private[2]: Refreshing state..."
time="2019-10-06T00:47:26-07:00" level=debug msg="module.vpc.data.aws_subnet.public[0]: Refreshing state..."
time="2019-10-06T00:47:26-07:00" level=debug msg="module.vpc.data.aws_subnet.public[2]: Refreshing state..."
time="2019-10-06T00:47:26-07:00" level=debug msg="module.vpc.data.aws_subnet.public[1]: Refreshing state..."
time="2019-10-06T00:47:26-07:00" level=debug msg="module.dns.data.aws_route53_zone.public: Refreshing state..."
time="2019-10-06T00:47:31-07:00" level=debug msg="module.bootstrap.aws_iam_role.bootstrap: Creating..."
time="2019-10-06T00:47:31-07:00" level=debug msg="module.masters.aws_iam_role.master_role: Creating..."
time="2019-10-06T00:47:31-07:00" level=debug msg="module.iam.aws_iam_role.worker_role: Creating..."
time="2019-10-06T00:47:31-07:00" level=debug msg="module.vpc.aws_security_group.worker: Creating..."
...

So that looks good to me :). But we'll want to land #2460 and #2453 before this PR.

@wking wking force-pushed the aws-bring-your-own-vpc branch from 500337d to d0efe35 Compare October 6, 2019 08:23
@wking wking force-pushed the aws-bring-your-own-vpc branch from d0efe35 to 7df157d Compare October 6, 2019 09:20
@wking wking force-pushed the aws-bring-your-own-vpc branch from 7df157d to 68be627 Compare October 6, 2019 09:23
@wking wking force-pushed the aws-bring-your-own-vpc branch from 68be627 to 120ce26 Compare October 6, 2019 09:34
@abhinavdahiya
Copy link
Contributor

/wip

The code has fixme comments

@abhinavdahiya abhinavdahiya added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 6, 2019
@wking
Copy link
Member Author

wking commented Oct 6, 2019

The code has fixme comments

Those are for install-config injection (vs. my current TF_VAR_* hack). I don't think we need to block the PR on that, but we can if you like. If we don't block on it, it makes things like blackholed-subnet proxy testing easier in the short term, but we'll obviously want the install-config knobs this week regardless of which PR they come in.

@abhinavdahiya
Copy link
Contributor

The code has fixme comments

Those are for install-config injection (vs. my current TF_VAR_* hack). I don't think we need to block the PR on that,

I don't think it is acceptable to merge the TF_var style changes.

I would rather accept a install-config wiring that is brittle rather than the env hacks.

@abhinavdahiya
Copy link
Contributor

Also can we move 105e9d9 out to a separate PR.. it's different enough that we can keep it out of already big PR.

The installer will no longer create (vs. the fully-IPI flow):

* Internet gateways (aws_internet_gateway)
* NAT gateways (aws_nat_gateway, aws_eip.nat_eip)
* Subnets (aws_subnet)
* Route tables (aws_route_table, aws_route,
  aws_route_table_association, aws_main_route_table_association)
* VPCs (aws_vpc)
* VPC DHCP options (aws_vpc_dhcp_options, vpc_dhcp_options_association)
* VPC endpoints (aws_vpc_endpoint)

because those resources define the VPC/subnet networking and we can't
pick values for CIDRs and routing and such without a risk of picking
something that collides with some other VPC/subnet consumer.

The installer will continue to create (vs. the fully-IPI flow):

* AMI copies (aws_ami_copy)
* IAM roles and profiles (aws_iam_instance_profile, aws_iam_role,
  aws_iam_role_policy)
* Instances (aws_instance, aws_network_interface.master)
* Load balancers (aws_lb, aws_lb_listener, aws_lb_target_group,
  aws_lb_target_group_attachment)
* Route 53 resources (aws_route53_zone, aws_route53_record)
* S3 resources (aws_s3_bucket, aws_s3_bucket_object)
* Security groups (aws_security_group, aws_security_group_rule)

because other VPC/subnet consumers won't care about those.

The installer will still need to grow new creation code (vs. the
fully-IPI flow) for:

* Adding kubernetes.io/cluster/.*: shared to the user-provided
  subnets.  This will happen in Go immediately before calling out to
  Terraform to begin creating new resources, but I'm punting on that
  for this commit.

I'd tried to use:

  id = var.vpc ? var.vpc : aws_vpc.new_vpc.*.id"

and similar, hoping for something like Python's truthiness [1], but
Terraform rejected that with [2]:

  The condition value is null. Conditions must either be true or false.

aws_vpc.new_vpc[0].id and similar for the newly conditional resources
(count = var.vpc == null ? 1 : 0, and similar) follows [3]:

  For resources where count is set -- even if the expression evaluates
  to 1 -- aws_instance.example returns a list of objects whose length
  is decided by the count. In this case aws_instance.example.id is an
  error, and must instead be written as aws_instance.example[0].id to
  access one of the objects before retrieving its id attribute value.

[1]: https://docs.python.org/3.6/library/stdtypes.html#truth-value-testing
[2]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_installer/2438/pull-ci-openshift-installer-master-e2e-aws/7844
[3]: https://www.terraform.io/upgrade-guides/0-12.html#working-with-count-on-resources
@wking wking force-pushed the aws-bring-your-own-vpc branch from aa14435 to bd92182 Compare October 7, 2019 19:56
@wking
Copy link
Member Author

wking commented Oct 7, 2019

Also can we move 105e9d9 out to a separate PR...

I've just punted everything after Terraform and rebased onto master with aa1443545 -> bd92182. I'll file the other stuff in follow-up work.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 7, 2019
@wking
Copy link
Member Author

wking commented Oct 7, 2019

/wip cancel

@abhinavdahiya
Copy link
Contributor

/approve

@abhinavdahiya
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 7, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wking
Copy link
Member Author

wking commented Oct 7, 2019

Upgrade failed, but it doesn't test code from in-flight installer PRs anyway.

/retest

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 3967b2f into openshift:master Oct 8, 2019
@openshift-ci-robot
Copy link
Contributor

@wking: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws-disruptive d4d1cc3ac452379a8b941a0f427e083935bbb21f link /test e2e-aws-disruptive
ci/prow/e2e-openstack bd92182 link /test e2e-openstack

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants