
[BUG] Allocated ports on load balancer (using SNAT) are not released when the connection is closed #4697

Open
daconstenla opened this issue Dec 11, 2024 · 0 comments
Describe the bug

Since around November 18th (the exact date varies across our 3 environments), we have noticed that short-lived connections to a publicly exposed Postgres Flexible Server leave SNAT ports allocated on the load balancer (named kubernetes, handling the aksOutboundRule) after they are closed. After multiple iterations, and before the idle timeout elapses, the number of available ports drops to 0, effectively dropping any new connection that requires a new port allocation.

To Reproduce

For us the problem occurs when using runatlantis/atlantis to run Terraform that configures Postgres Flexible Server via cyrilgdn/terraform-provider-postgresql.

As an example, in a single database we configure:

  • 21 postgresql_role
  • 21 postgresql_schema
  • 21 postgresql_grant
  • 63 postgresql_grant_role

That means roughly 126 connections that are opened and closed (no connection pooling) against the same host and port, which leads to many allocated ports on the load balancer.

Note

The setup predates the problem; nothing notable in our configuration changed around the detection date.

Expected behavior

We'd expect that a connection that is properly opened and closed does not leave a port allocated.

Screenshots

Last month of allocated SNAT ports on the public load balancer kubernetes across 3 different environments:
[Image: allocated SNAT ports over the last month, one graph per environment]

Environment (please complete the following information):

  • az cli 2.67.0
  • kubectl Version v1.30.6
  • Kubernetes Server Version: v1.30.6 and v1.30.5
  • network configuration: Azure CNI Node Subnet
  • Network policy: Calico
  • istio 1.24.1 (full mesh and ambient mode)
  • IPv4 only
  • single public IP per load balancer
  • default allocated outbound ports (per instance)

Additional context

We have mitigated the problem by:

  • increasing the allocated outbound ports (per instance)
  • attaching more IPs to the load balancer(s)
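Both mitigations amount to the same arithmetic: each public IP provides a fixed pool of SNAT ports, divided among nodes. A hedged back-of-the-envelope sketch follows, using figures from Azure's Standard Load Balancer documentation (64,000 SNAT ports per public IP; default automatic allocation shrinks as the cluster grows). The tier table is an assumption based on those docs and should be verified against the current version:

```python
# Rough capacity math behind the two mitigations. Figures are taken from
# Azure SLB docs (assumption: 64,000 usable SNAT ports per public IP and
# the documented default per-node allocation tiers).
PORTS_PER_IP = 64_000

def default_ports_per_node(node_count: int) -> int:
    # Documented default allocation tiers (assumption: unchanged in docs).
    tiers = [(50, 1024), (100, 512), (200, 256),
             (400, 128), (800, 64), (1000, 32)]
    for max_nodes, ports in tiers:
        if node_count <= max_nodes:
            return ports
    raise ValueError("more than 1000 nodes")

def max_nodes_supported(ip_count: int, ports_per_node: int) -> int:
    # Mitigation 1 (more ports per node) trades off against node count;
    # mitigation 2 (more IPs) raises the total pool being divided.
    return (ip_count * PORTS_PER_IP) // ports_per_node

print(default_ports_per_node(10))     # small cluster default allocation
print(max_nodes_supported(1, 1024))   # single IP, default allocation
print(max_nodes_supported(2, 4096))   # 2 IPs, raised per-node allocation
```

This is only a sizing sketch; it does not model per-flow reuse or the idle-timeout behavior that the bug concerns.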

We have verified that connections are opened and closed, following the troubleshooting guide at https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/connectivity/snat-port-exhaustion:

N.con  state    binary           ip
325    connect  terraform-provi  <IP-flexible-server>
325    close    terraform-provi  <IP-flexible-server>

We also used netstat in the atlantis pod to verify that connections are opened and closed.
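The counting in the table above can be sketched as follows. This is a hypothetical reconstruction: the sample events and IP are placeholders, and real input would come from `ss`/netstat or conntrack output rather than a hard-coded list; it only shows the tally per (state, binary, destination IP) that the troubleshoot doc's approach produces:

```python
# Tally connection events per (state, binary, destination IP), mirroring
# the N.con/state/binary/ip table. Sample data below is hypothetical.
from collections import Counter

sample_events = [
    # (state, binary, destination ip) - one tuple per observed event
    ("connect", "terraform-provi", "10.0.0.5"),
    ("close",   "terraform-provi", "10.0.0.5"),
    ("connect", "terraform-provi", "10.0.0.5"),
    ("close",   "terraform-provi", "10.0.0.5"),
]

counts = Counter(sample_events)
for (state, binary, ip), n in sorted(counts.items()):
    print(f"{n:<6} {state:<8} {binary:<16} {ip}")
```

Equal connect and close counts for the same binary and destination confirm that the application side is closing its connections, which points the leak at the SNAT layer.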
