NAT Gateway Support #43
Comments
Should we allow customers to define how many NAT gateways to create, and in which zone (and in which subnet)?
Hmm, we discussed that; the main benefit is redundancy, right? That means if the NAT gateway fails in one zone, only machines in that zone lose egress, while machines in the other zones are still fine. On the other hand, this would not be possible for Availability Set based clusters (distribution across fault domains is not possible?) and would come with more costs, because in that case we would need one NAT gateway per zone and not only one per cluster.
Just to summarise it: the current network setup for Azure Shoot clusters consists of one subnet within a virtual network (vnet), and machines assigned to an AvailabilitySet or machines distributed across zones can be attached to that subnet. This approach has some implications for the NatGateway integration:
Btw, if we decide later on to go with multiple NatGateways, i.e. one per zone, then we could easily extend the suggested structure by adding zone information to each ip address or ip address range. The Gardener Azure extension controller would then only create public ip(s) for zones where the user does not specify an ip or ip range.
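As a rough illustration of that idea, the extended structure could look something like the sketch below. This is purely hypothetical: the `zone` field and the exact list layout are assumptions for discussion, not the implemented schema.

```yaml
# Hypothetical per-zone extension of the proposed natGateway section:
# each user-provided public ip (or ip range) carries an optional zone.
# For zones without an entry, the Gardener Azure extension controller
# would create a public ip itself.
networks:
  natGateway:
    enabled: true
    ipAddresses:              # assumed field shape
    - name: my-egress-ip-z1
      zone: "1"
    ipAddressRanges:          # assumed field shape
    - name: my-egress-prefix-z2
      zone: "2"
    # zone "3" omitted -> extension creates a public ip for it
```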
Another finding: the NatGateway is attached to a subnet which has a [...]. That would mean, if we want to enable NatGateways for AvailabilitySet based clusters, we would need to switch to [...]
It seems there is a bug in the [...]
I propose to go on with the Azure NatGateway integration in several steps.
Thanks @dkistner. Since users that deploy to multiple AZs want to do that for availability reasons, we should definitely have one NAT GW per zone, otherwise we introduce singletons and break the main motivation to go for AZs in the first place. We will never be able to explain/motivate this decision when push comes to shove. Enforcing NAT GWs for AvSet-based clusters is acceptable, because we don't want to go on with AvSet-based clusters as of today for two main reasons:
The plan above also makes a lot of sense in regards to the TF bug. Maybe MS could help fix it, or we could, or we could use the native SDKs, but "sitting it out" is absolutely fair, especially given all the work we have to do.
So with the current network setup for Azure Shoots we have only one subnet, and machines distributed across several zones can be attached to it. So far this wasn't an issue, as the Standard LoadBalancer used for ingress and egress (which is still the default, as the NatGateway should be optional) is automatically deployed zone-redundant by Azure. Azure subnets can currently be associated with only one NatGateway, which means that multiple NatGateways in different zones attached to the same subnet is not possible. So at the moment I see only two possibilities to enable zone-redundant NatGateways:
So for now I see only option 1 as a short-term solution, but with more effort to enable redundant NatGateways. That's probably also the only option which Gardener fully controls. Option 2 would mean waiting and going on without NatGateway HA for now. This will be an optional feature anyway, and therefore we can reason: if zone-redundant, reliable NATing is required, please go with the Standard LoadBalancer. But as I mentioned, AvailabilitySets are also a valid HA mechanism for machines, and in this case we will always have just one subnet and therefore only one NatGateway (except when multiple NatGateways are allowed to be attached to one subnet).
Well, "(1) larger effort" and "I see only (1) as short term solution" don't fit together for me. ;-) Considering what you wrote, I would then do (2), i.e. sit it out and hope nobody escalates before MS/Azure changes that. If MS/Azure offers cross-AZ subnets and has "zone-aware LBs", then I would expect that for all resources as well. AWS does it differently and scopes by zone.
:D Sorry for the misleading statements. I meant that I see more implementation effort for (1) because of the change in the network layout and the migration logic to move machines from one subnet to another. Of course we could do that. With short term I mean maybe within weeks; for (2) I do not know. I can only estimate, and I would guess months... I'm also for (2) in general, because we still have the Standard LoadBalancer with zone redundancy, which should be sufficient for most cases.
Hi @vlerenc |
Step 3, to make the NatGateway usable in combination with AvailabilitySets, will probably not be implemented, as we are planning to deprecate AvailabilitySet based clusters and replace them in the mid term with clusters based on VirtualMachineScaleSet Orchestration mode VMs (VMO). Those clusters will be compatible with the NatGateway out of the box.
What would you like to be added:
Azure will soon offer a NAT service (currently in a private preview), and there are some scenarios where users could need a dedicated NAT service, e.g. whitelisting scenarios which require stable ip(s) for egress connections initiated within the cluster.
Currently all egress traffic from a Gardener managed Azure cluster is routed via the cluster load balancer.
As the NAT gateway will come with additional costs, I would recommend integrating it optionally and making it configurable for users.
As the NAT gateway always requires at least one public ip assigned, I would propose to make it possible for users to pass their public ip address(es) or public ip address range(s) to the extension via
.spec.providerConfig.networks.natGateway.ipAddresses[] | .ipAddressRanges[]
. Only if both lists are empty would the Gardener extension create one public ip and assign it to the service. Later on, when we go on with multiple Availability Set support, we will probably need to make the NAT gateway required. In this case
.spec.providerConfig.networks.natGateway.enabled
would always need to be true.
Why is this needed:
Support scenarios which require a dedicated NAT service, e.g. whitelisting scenarios.
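Putting the proposed fields together, the optional NatGateway section of the Azure providerConfig could look roughly like this. The field paths follow the proposal above; the surrounding structure and example values are assumptions, not a finalized API.

```yaml
# Sketch of the proposed optional NatGateway configuration in the Azure
# providerConfig (field paths from the proposal; values are illustrative).
networks:
  natGateway:
    enabled: true        # optional; when false/absent, egress stays on the Standard LoadBalancer
    ipAddresses: []      # user-provided public ip address(es)
    ipAddressRanges: []  # user-provided public ip address range(s)
# If both lists are empty, the extension creates one public ip
# and assigns it to the NatGateway itself.
```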
Status
Step 3 – Enable the NatGateway for AvailabilitySet based/non-zoned clusters, once the Standard LoadBalancer is integrated for non-zoned clusters (larger effort, due to LoadBalancer migration etc.). This will probably require deploying the NatGateway as mandatory/non-optional in addition to the Standard LoadBalancer.
cc @vlerenc, @AndreasBurger, @MSSedusch