Skip to content

Conversation

@wking
Copy link
Member

@wking wking commented Oct 9, 2019

To avoid:

The DNS provider failed to ensure the record: failed to get hosted zone for load balancer target "afdffb7c0550244d39ac35bdfd21fff0-1591348991.us-east-1.elb.amazonaws.com": failed to describe load balancers: RequestError: send request failed caused by: Post https://elasticloadbalancing.us-east-1.amazonaws.com/: dial tcp 54.239.29.168:443: i/o timeout

when running the DNS provider in a blackhole where it can't get out to the wider internet. With the LB endpoint exposed inside the VPC, the operator will be able to reach the cloud provider without needing to go through the proxy. I expect we will want the ingress operator to respect proxy settings anyway, but at least with this change we won't need that for blackholed private subnets.

docs/user/aws/images/install_upi.dia needs to be updated to reflect the new endpoint, but I couldn't figure that out on short notice and it seems safe to punt. There's also an issue with tagging:

The DNS provider failed to ensure the record: failed to find hosted zone for record: failed to get tagged resources: RequestError: send request failed caused by: Post https://tagging.us-east-1.amazonaws.com/: dial tcp 52.94.233.76:443: i/o timeout

but apparently no VPC endpoints for tagging. But one step at a time ;).

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 9, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 9, 2019
To avoid [1]:

  The DNS provider failed to ensure the record: failed to get hosted
  zone for load balancer target
  "afdffb7c0550244d39ac35bdfd21fff0-1591348991.us-east-1.elb.amazonaws.com":
  failed to describe load balancers: RequestError: send request failed
  caused by: Post
  https://elasticloadbalancing.us-east-1.amazonaws.com/: dial tcp
  54.239.29.168:443: i/o timeout

when running the DNS provider in a blackhole where it can't get out to
the wider internet.  With the LB endpoint exposed inside the VPC, the
operator will be able to reach the cloud provider without needing to
go through the proxy.  I expect we will want the ingress operator to
respect proxy settings anyway, but at least with this change we won't
*need* that for blackholed private subnets.

docs/user/aws/images/install_upi.dia needs to be updated to reflect
the new endpoint, but I couldn't figure that out on short notice and
it seems safe to punt.  There's also an issue with tagging [1]:

  The DNS provider failed to ensure the record: failed to find hosted
  zone for record: failed to get tagged resources: RequestError: send
  request failed caused by: Post
  https://tagging.us-east-1.amazonaws.com/: dial tcp 52.94.233.76:443:
  i/o timeout

but apparently no VPC endpoints for tagging.  But one step at a time
;).

Tagging these VPC endpoints themselves would be nice too [2], but the
AWS Terraform provider grew support for that in 2.16.0 [3], and we're
still at 2.10.0 per pkg/terraform/exec/plugins/Gopkg.lock.

[1]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/5308/rehearse-5308-pull-ci-openshift-installer-master-e2e-aws-proxy/10/artifacts/e2e-aws-proxy/must-gather/namespaces/openshift-ingress-operator/ingress.operator.openshift.io/dnsrecords/default-wildcard.yaml
[2]: https://aws.amazon.com/about-aws/whats-new/2019/05/amazon-vpc-endpoints-now-support-tagging-for-gateway-endpoints-interface-endpoints-and-endpoint-services/
[3]: hashicorp/terraform-provider-aws@864285b
Generated with:

  $ dia install_upi.dia  # manual edits
  $ make
@wking
Copy link
Member Author

wking commented Oct 9, 2019

Added the Dia edits.

)
}

resource "aws_vpc_endpoint" "elasticloadbalancing" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So i don't think we should be creating these endpoints in IPI as they don't provide great value to the cluster.

If they do, let's start a general discussion about these elsewhere, github issue / enhacements.

DestinationCidrBlock: 0.0.0.0/0
NatGatewayId:
Ref: NAT3
LoadBalancingEndpoint:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again this UPI templates are reference only. And I don't think we allow special env setup to handle these and not add these for all.

This is not required for openshift clusters and don't provide value.

@abhinavdahiya
Copy link
Contributor

/hold

I think we should have a larger discussion on IPI and VPC endpoints.

and the change to UPI templates which is for reference only and bare minimum for installs doesn't seem to need these endpoints.

If the proxy test needs to create an environment, i would hope the test handle those separately.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 9, 2019
@openshift-ci-robot
Copy link
Contributor

@wking: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws 1bc4717 link /test e2e-aws
ci/prow/e2e-aws-scaleup-rhel7 1bc4717 link /test e2e-aws-scaleup-rhel7
ci/prow/e2e-openstack 1bc4717 link /test e2e-openstack

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@wking
Copy link
Member Author

wking commented Oct 9, 2019

I think we should have a larger discussion on IPI and VPC endpoints.

Sounds good. Spun off into #2486, because it's installer-specific and not a cross-team issue.

and the change to UPI templates which is for reference only and bare minimum for installs doesn't seem to need these endpoints.

By this reasoning we should drop our existing S3 VPC endpoint, which is for saving customers money (per #745) and allows for registry access without going through a proxy even from blackholed subnets. And we should be very clear about which environments we are targeting as a bare minimum (e.g. the templates would not be sufficient for blackholed private subnets). But I'm fine circling back to this after we settle #2486.

@abhinavdahiya
Copy link
Contributor

abhinavdahiya commented Oct 9, 2019

By this reasoning we should drop our existing S3 VPC endpoint, which is for saving customers money (per #745)

because you pull data from s3 apis, it is definitely saving our customers money. the elb is not.

And we should be very clear about which environments we are targeting as a bare minimum (e.g. the templates would not be sufficient for blackholed private subnets).

The default / getting ready / basic installation. blacked-holed subnets created by the template is not that. is the user skip the vpc and wants to do more advanced that's an exercise for them.

@danehans
Copy link
Contributor

@wking I don't believe this PR is required for proxy. The ingress operator calls the aws elb api, but that point it moot if we state https://github.com/openshift/installer/blob/master/docs/user/aws/install_upi.md#remove-dns-zones is required for proxy black-holed vpc setups. The k8s controller-manager calls the elb api when a svc of type LoadBalancer is created. The controller-manager did not properly support proxy prior to openshift/cluster-kube-controller-manager-operator#285. cc: @ewolinetz

@abhinavdahiya
Copy link
Contributor

I think we need a better consensus on addition to IPI setup #2486

and it seems like it doesn't help the UPI templates too #2485 (comment) and #2486 (comment)

So closing this for now. we can use the github issue to discuss this before opening a change.

/close

@openshift-ci-robot
Copy link
Contributor

@abhinavdahiya: Closed this PR.

Details

In response to this:

I think we need a better consensus on addition to IPI setup #2486

and it seems like it doesn't help the UPI templates too #2485 (comment) and #2486 (comment)

So closing this for now. we can use the github issue to discuss this before opening a change.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants