
GCP Cloud Provider: Source IP preservation for Virtual IPs #27

Closed
girishkalele opened this issue Jul 17, 2016 · 87 comments
Labels: sig/network, stage/stable


girishkalele commented Jul 17, 2016

Feature Description


mwitkow commented Jul 19, 2016

Does this involve preserving the visibility of the Source IP of the TCP connection that hits a Pod behind a Service with an external IP address? Similar to what's discussed in kubernetes/kubernetes#10921?

@girishkalele (Author)

Yes, this is exactly that feature; sorry, I haven't had time to convert the design proposal from a Google Doc into Markdown.


mwitkow commented Jul 20, 2016

@girishkalele great to know. Can I ask a couple of questions for clarification?

We're operating Kubernetes for non-HTTP traffic (gRPC, and just plain TCP) on GCE using L3 Load Balancing (Network LB). We direct our L3 LBs to a bunch of Frontend Instances running kube-proxy. We rely on GCE network routes (programmed through Flannel) to direct the traffic to the Pod's IP address on a Worker Instance.

Ideally, what we'd like is for the kube-proxy on the Frontend Instance to substitute the DST IP and port to point to the Pod's IP and port using DNAT. The GCE network/Flannel routing would then do its magic and the traffic would end up on the Worker Instance at the Pod IP:port. Once our service running in the Pod responds, the kube-proxy on the Worker Instance would SNAT-rewrite the SRC IP and port (which is the private Pod IP:port) to be the IP:port of the external service and drop it on the GCE network for egress. This would mean we get Direct Server Return (DSR) for free.

Is that roughly what you guys are thinking of doing?

@girishkalele (Author)

@mwitkow

No... what we are planning is to eliminate the double hop by using GCE load balancer health checks. Instances/VMs that host endpoints for that service will respond healthy, and all other instances will respond unhealthy; thus the GCE LB will only split traffic across nodes that host the service in local pods.
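
To make "respond healthy/unhealthy" concrete: kube-proxy serves an HTTP health check on a per-service node port (the healthcheck-nodeport annotation mentioned later in this thread), and the cloud LB's health check is pointed at that port. A hedged sketch of the response, rendered as YAML for readability (the real body is JSON; the names and count below are hypothetical):

```yaml
# GET http://<node-ip>:<healthCheckNodePort>/healthz
#  -> HTTP 200 on a node that hosts at least one local endpoint for the service
#  -> HTTP 503 on every other node, so the cloud LB takes that node out of rotation
service:
  namespace: default        # hypothetical
  name: my-service          # hypothetical
localEndpoints: 2           # 0 on nodes with no local pods for the service
```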

@erictune (Member)

Does that mean that Source IP won't be preserved on non-GCE cloud providers?

@girishkalele (Author)

Sorry, I should have said "Cloud Load Balancer" instead of GCE.
Source IP preservation will also be possible on other cloud providers; we just need to program the health checks using the provider's API. This will be an update in the pkg/cloudprovider/ package.

Note:
The new feature will be opt-in via an annotation in the service spec; the old LB behavior will remain the default. For cloud provider packages that have not been updated, the annotation will be ignored. This allows contributors to catch up in subsequent releases without requiring a hard change-over for all providers in a single release.
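
As a hedged illustration of the opt-in, using the beta annotation name that appears later in this thread (the service name and selector are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service                                    # hypothetical
  annotations:
    # Opt in to source-IP preservation; omit the annotation to keep the old
    # LB behavior. Cloud provider packages that have not been updated ignore it.
    service.beta.kubernetes.io/external-traffic: "OnlyLocal"
spec:
  type: LoadBalancer
  selector:
    app: my-app                                       # hypothetical
  ports:
  - port: 80
```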


mwitkow commented Jul 20, 2016

@girishkalele ok, that makes sense. But that means that the MASQUERADE step needs to go away (and be replaced with SNAT+DNAT, I presume?), otherwise the service in the Pod won't see the SRC IP.

therc (Member) commented Jul 24, 2016

This is very close to what I am working on for node-local services, in that kube-proxy will have to go from a node-agnostic approach, where its iptables rules look the same on all nodes, to a node-aware one, where the rules and behaviour are tailored to each host.

@girishkalele, how do you plan on plumbing the node's IP through? It can be tricky on some cloud providers, namely AWS. And speaking of AWS, if you talk HTTP and X-Forwarded-For works for you, then you can already get the original source IP through the L7 annotations I implemented: https://github.com/kubernetes/kubernetes.github.io/blob/a135f0a717803b35ec80563768d5abb4b45ac4d1/docs/user-guide/services/index.md#ssl-support-on-aws

That said, removing extra hops is a great thing.
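
For reference, a hedged sketch of the kind of Service the AWS L7/SSL annotations in that guide apply to, which is what puts the client address into X-Forwarded-For (the certificate ARN, service name, and selector below are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-https-service                              # hypothetical
  annotations:
    # Terminate TLS at the ELB and proxy at L7 so X-Forwarded-For is populated.
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:..."   # hypothetical ARN
spec:
  type: LoadBalancer
  selector:
    app: my-app                                       # hypothetical
  ports:
  - port: 443
    targetPort: 80
```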

@girishkalele (Author)

@mwitkow

Yes, we eliminate the SNAT (and the masquerade rules shouldn't be hit for traffic with external source IPs).

@girishkalele (Author)

@therc

I didn't need to pass the node IP to kube-proxy for this; at least on GCE, none of the iptables rules need it for my purposes. Did you have a need for it for the local-service implementation?

I see the following note in that user guide:
"TCP and SSL will select layer 4 proxying: the ELB will forward traffic without modifying the headers."

The AWS LB looks like it will do L7 balancing with the source IP transferred to the headers, but for generic TCP services it wouldn't work. BTW, does it do WebSockets and HTTP?

@aronchick (Contributor)

Please have these discussions on the design proposal. Let's try to keep the feature repo meta.


@girishkalele (Author)

Update: Code and docs merged; the code is activated only with the alpha feature-gate flag.
E2Es cannot run until the E2E Jenkins job with the feature gate turned ON is ready (WIP).

janetkuo (Member) commented Sep 2, 2016

@girishkalele Please add the docs PR numbers to the issue description

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Sep 7, 2016
Automatic merge from submit-queue

External Load Balancer Source IP Preservation Proposal

Proposal for feature kubernetes/enhancements#27
@jaredbhatti

Ping. Can you add the docs PR to the issue description?

@idvoretskyi (Member)

@girishkalele the docs PR question is still open; please add the links.

@girishkalele (Author)

kubernetes/website#1062

@girishkalele (Author)

CCed kubernetes/docs on the docs PR and also added it to the tracking spreadsheet.

MrHohn (Member) commented May 18, 2017

@idvoretskyi Could you please copy my previous post into the top one? I don't yet have the privilege to edit the post. Thanks!

@idvoretskyi (Member)

@MrHohn thanks!

@idvoretskyi removed the "help wanted" label on Jun 6, 2017
MrHohn (Member) commented Jun 6, 2017

Quick update on this feature:

I'm now working on updating the existing docs to reference the new first-class fields instead of the beta annotations.

@idvoretskyi (Member)

@MrHohn thanks

liggetm commented Jul 3, 2017

I've tried running with NodePort and the OnlyLocal annotation (bare-metal/flannel) but receive no traffic on the local pod. When the annotation is applied to the service, packets are not marked for masquerading in iptables but are still always dropped by a rule that states "servicexyz has no local endpoints". This is not the case, however; the service does indeed have local endpoints. My guess is that the health check is failing. Is this likely the case, and if so, what do I need to do for the health check to succeed?

@cmluciano

@liggetm Please open an issue for this on the main kubernetes repository.

liggetm commented Jul 3, 2017

@cmluciano will do.

@cmluciano

@idvoretskyi This feature is complete for 1.8. Is there any more to do with this issue, or do we just close it?

MrHohn (Member) commented Jul 17, 2017

This feature was promoted to GA in 1.7 and the release has been shipped. I believe we can close this.

/close

perotinus pushed a commit to kubernetes-retired/cluster-registry that referenced this issue Sep 2, 2017
Automatic merge from submit-queue (batch tested with PRs 45623, 45241, 45460, 41162)

Promotes Source IP preservation for Virtual IPs from Beta to GA

Fixes #33625. Feature issue: kubernetes/enhancements#27.

Bullet points:
- Declare 2 fields (ExternalTraffic and HealthCheckNodePort) that mirror the ESIPP annotations.
- ESIPP alpha annotations will be ignored.
- Existing ESIPP beta annotations will still be fully supported.
- Allow promoting beta annotations to first class fields or reversely.
- Disallow setting invalid ExternalTraffic and HealthCheckNodePort on services. Default ExternalTraffic field for nodePort or loadBalancer type service to "Global" if not set.

**Release note**:

```release-note
Promotes Source IP preservation for Virtual IPs to GA.

Two api fields are defined correspondingly:
- Service.Spec.ExternalTrafficPolicy <- 'service.beta.kubernetes.io/external-traffic' annotation.
- Service.Spec.HealthCheckNodePort <- 'service.beta.kubernetes.io/healthcheck-nodeport' annotation.
```
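
A hedged sketch of the same opt-in using the GA fields named in the release note above (the service name, selector, and port values are hypothetical; healthCheckNodePort is allocated automatically if left unset):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service                 # hypothetical
spec:
  type: LoadBalancer
  selector:
    app: my-app                    # hypothetical
  ports:
  - port: 80
  # Replaces the 'service.beta.kubernetes.io/external-traffic' annotation.
  externalTrafficPolicy: Local
  # Replaces the 'service.beta.kubernetes.io/healthcheck-nodeport' annotation.
  healthCheckNodePort: 32123       # hypothetical
```
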
@unclewizard

I've noticed on GKE (running a NodePort service + Ingress) that this feature causes the backend to be marked as "UNHEALTHY" when I inspect the ingress with kubectl describe ing ingress (presumably because most nodes are failing the health check). Are there any negative side effects to this, or is it safe to ignore?

chy168 commented Sep 19, 2017

@unclewizard Does your GET / return 200 OK for the GCP health check?

@unclewizard

@chy168 Yep - here is a simple way to reproduce:

Follow the instructions here: https://cloud.google.com/container-engine/docs/tutorials/http-balancer but when you get to Step 2, use the following yaml to create the nginx service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  annotations:
    service.beta.kubernetes.io/external-traffic: "OnlyLocal"
spec:
  selector:
    run: nginx
  ports:
  - name: nginx
    port: 80
  type: NodePort
```

After you create the Ingress service in Step 3, wait about 10 minutes and then do a kubectl describe ing basic-ingress. The output will contain something like:

```
  ...
  backends:   {"k8s-be-30910--7b4223ab4c1af15d":"UNHEALTHY"}
```

If you visit the Ingress address you'll see that the page loads fine. I suspect the UNHEALTHY message doesn't actually matter but I wanted to make sure this is expected.

MrHohn (Member) commented Sep 20, 2017

@unclewizard That is expected. When externalTrafficPolicy is set to Local, nodes that don't have backend pods running on them will intentionally fail the LB health check.

Though the backend service could contain multiple backends, the backends annotation only shows the health of one of them, which seems incorrect. Opened kubernetes/ingress-nginx#1395.

MrHohn (Member) commented Sep 20, 2017

> That is expected. When externalTrafficPolicy is set to Local, nodes that don't have backend pods running on them will intentionally fail the LB health check.

Sorry, I said it wrong. We internally fail the health check for the L4 LB, but not for L7. The backend service shouldn't show as "UNHEALTHY".

@unclewizard Could you open a separate issue on https://github.com/kubernetes/ingress? Thanks.

MrHohn (Member) commented Sep 20, 2017

Sorry for spamming, please ignore what I said in the previous post. I need some sleep :(
