Add IPv4/IPv6 dual stack KEP #648
@aojea - Thanks for moving this. I have some edits in the works based on comments on the previous PR, mostly rearranging some sections. I should be able to get these posted late next week (currently busy with some other issues, as you know :) ). |
This remark is not valid. IPVS has full support for IPv6. The subsequent reference to a
@uablrek - Thanks for the clarification. That was something that was mentioned earlier, and will be reworded. |
Yes, IPVS proxier has full support for ipv6. |
One alternative to adding transition mechanisms would be to modify Kubernetes to provide support for IPv4 and IPv6 communications in parallel, for both pods and services, throughout the cluster (a.k.a. "full" dual stack).

A second, simpler alternative, which is a variation to the "full" dual stack model, would be to provide dual stack addresses for pods and nodes, but restrict service IPs to be single-family (i.e. allocated from a single service CIDR). In this case, service IPs in a cluster would be either all IPv4 or all IPv6, as they are now. Compared to a full dual-stack approach, this "dual-stack pods / single-family services" approach saves on implementation complexity, but would introduce some minor feature restrictions. (For more details on these tradeoffs, please refer to the "Variation: Dual-Stack Service CIDRs" section under "Alternatives" below).
In re-reading, the complexity saved is pretty minimal. We have to run dual-stack kube-proxy, so I think it's safe to say that the "full" path is almost certain to come to be, though it is 100% OK to add that as a second-step (on the assumption it is easier).
As an alternative, is it worthwhile to discuss an intermediate delivery where all kube-proxy is single-stack (including NodePorts) ?
In other words: step 1 is just pod IPs, hostPorts, and headless services (DNS). Step 2 adds nodeports, external IPs, ingress, and endpoints. Step 3 adds service IPs and single-family deployments.
Viable?
@thockin I'm all for chunking this task as much as we can. Your list of steps seems reasonable to me.
Added implementation plan as suggested in #808
- Service addresses: 1 service IP address per service
- Kube-DNS is expected to be End-of-Life soon, so dual-stack testing will be performed using CoreDNS.
- External load balancers that rely on Kubernetes services for load balancing functionality will only work with the IP family that matches the IP family of the cluster's service CIDR.
- Dual-stack support for Kubernetes orchestration tools other than kubeadm (e.g. Minikube, Kubespray, etc.) is considered outside of the scope of this proposal.
We'll need to communicate HOW to enable this.
Added note to communicate usage through documentation - #808
- NodePort: Support listening on both IPv4 and IPv6 addresses
- ExternalIPs: Can be IPv4 or IPv6
- Kube-proxy IPVS mode will support dual-stack functionality similar to kube-proxy iptables mode as described above. IPVS kube-router support for dual stack, on the other hand, is considered outside of the scope of this proposal.
- For health/liveness/readiness probe support, a kubelet configuration will be added to allow a cluster administrator to select a preferred IP family to use for implementing probes on dual-stack pods.
Possible option: check on both? Does this offer any avenue for single-family apps? Maybe not (readiness is per-pod). How would we want health-checks to work in a multi-net model?
Addressed in #808
- ExternalIPs: Can be IPv4 or IPv6
- Kube-proxy IPVS mode will support dual-stack functionality similar to kube-proxy iptables mode as described above. IPVS kube-router support for dual stack, on the other hand, is considered outside of the scope of this proposal.
- For health/liveness/readiness probe support, a kubelet configuration will be added to allow a cluster administrator to select a preferred IP family to use for implementing probes on dual-stack pods.
- The pod status API changes will include a per-IP string map for arbitrary annotations, as a placeholder for future Kubernetes enhancements. This mapping is not required for this dual-stack design, but will allow future annotations, e.g. allowing a CNI network plugin to indicate to which network a given IP address applies.
Are we going to give CNI/CRI hooks to fill these in?
addressed as part of #808
##### Default Pod IP Selection
Older servers and clients that were built before the introduction of full dual stack will only be aware of and make use of the original, singular PodIP field above. It is therefore considered to be the default IP address for the pod. When the PodIP and PodIPs fields are populated, the PodIPs[0] field must match the (default) PodIP entry. If a pod has both IPv4 and IPv6 addresses allocated, then the IP address chosen as the default IP address will match the IP family of the cluster's configured service CIDR. For example, if the service CIDR is IPv4, then the IPv4 address will be used as the default address. |
How will we want this to work if/when we support dual-stack service IPs?
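For reference, the default-IP rule quoted above (the default pod IP follows the IP family of the service CIDR) could be sketched roughly as follows. This is only an illustration under stated assumptions: `pickDefaultIP` and `isIPv6` are hypothetical helpers, not names from the KEP or from Kubernetes, and the addresses are made up.

```go
package main

import (
	"fmt"
	"net"
)

// isIPv6 reports whether ip is an IPv6 address (it has no 4-byte form).
func isIPv6(ip net.IP) bool {
	return ip != nil && ip.To4() == nil
}

// pickDefaultIP is a hypothetical helper (not part of the KEP). It returns
// the first pod IP whose family matches the family of the service CIDR.
func pickDefaultIP(podIPs []string, serviceCIDR string) (string, error) {
	_, svcNet, err := net.ParseCIDR(serviceCIDR)
	if err != nil {
		return "", fmt.Errorf("invalid service CIDR %q: %w", serviceCIDR, err)
	}
	wantV6 := isIPv6(svcNet.IP)
	for _, s := range podIPs {
		ip := net.ParseIP(s)
		if ip == nil {
			return "", fmt.Errorf("invalid pod IP %q", s)
		}
		if isIPv6(ip) == wantV6 {
			return s, nil
		}
	}
	return "", fmt.Errorf("no pod IP matches the family of %s", serviceCIDR)
}

func main() {
	// Example addresses are illustrative only.
	ips := []string{"fd00:10:244::5", "10.244.1.5"}
	def, err := pickDefaultIP(ips, "10.96.0.0/12")
	if err != nil {
		panic(err)
	}
	fmt.Println("default pod IP:", def)
}
```

With dual-stack service IPs there would be no single service CIDR family to key off, which is the open question raised in this thread.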
```
--cluster-cidr ipNetSlice (IP CIDRs, in a comma separated list, Default: [])
```
Only the first CIDR for each IP family will be used; all others will be ignored.
logged and ignored
updated in #808
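To make the "first CIDR per family, others logged and ignored" behavior concrete, here is a minimal sketch. It assumes the flag value arrives as a single comma-separated string; `firstCIDRPerFamily` is a hypothetical name, and the real flag parsing in the Kubernetes components may look quite different.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"strings"
)

// firstCIDRPerFamily is a hypothetical helper (not from the KEP). It keeps
// the first IPv4 CIDR and the first IPv6 CIDR from a comma-separated list
// and logs every additional CIDR of an already-seen family before ignoring it.
func firstCIDRPerFamily(flagValue string) (v4, v6 string, err error) {
	if strings.TrimSpace(flagValue) == "" {
		return "", "", nil // default: empty list
	}
	for _, raw := range strings.Split(flagValue, ",") {
		c := strings.TrimSpace(raw)
		_, ipNet, perr := net.ParseCIDR(c)
		if perr != nil {
			return "", "", fmt.Errorf("invalid CIDR %q: %w", c, perr)
		}
		isV4 := ipNet.IP.To4() != nil
		switch {
		case isV4 && v4 == "":
			v4 = ipNet.String()
		case !isV4 && v6 == "":
			v6 = ipNet.String()
		default:
			log.Printf("ignoring extra CIDR %s: a CIDR for this IP family is already set", c)
		}
	}
	return v4, v6, nil
}

func main() {
	// Example CIDRs are illustrative only.
	v4, v6, err := firstCIDRPerFamily("10.244.0.0/16,fd00:10:244::/64,10.245.0.0/16")
	if err != nil {
		panic(err)
	}
	fmt.Println("IPv4 CIDR:", v4)
	fmt.Println("IPv6 CIDR:", v6)
}
```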
### IPVS Support and Operation

Since IPVS functionality does not yet include IPv6 support (see [cloudnativelabs/kube-router Issue #307](https://github.com/cloudnativelabs/kube-router/issues/307)), support for IPVS functionality in a dual-stack cluster is considered a "nice-to-have" or stretch goal.
Isn't this out of date? IPVS does support v6 though kube-proxy might not. I don't think IPVS is optional - it is a GA feature.
Confirmed out of date and removed in #808
Currently, health, liveness, and readiness probes are defined without any concern for IP addresses or families. For the first release of dual-stack support, a cluster administrator will be able to select the preferred IP family to use for probes when a pod has both IPv4 and IPv6 addresses. For this selection, a new "--preferred-probe-ip-family" argument for the [kubelet startup configuration](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/) will be added:
```
--preferred-probe-ip-family string ["ipv4", "ipv6", or "none". Default: "none", meaning use the pod's default IP]
```
@mtaufen should we be adding flags or JUST component config fields?
resolved in #808. It was removed as we updated the behavior here.
- For health/liveness/readiness probe support, the default behavior will not change. An additional optional field will be added to the pod specification and respected by the kubelet. This will allow application developers to select a preferred IP family to use for implementing probes on dual-stack pods.
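One way the preferred-family selection for probes could behave is sketched below. The helper name `probeIP` is hypothetical, the family values are borrowed from the earlier flag proposal ("ipv4", "ipv6", "none"), and the fallback to the default IP for single-family pods is an assumption rather than something the KEP text states.

```go
package main

import (
	"fmt"
	"net"
)

// probeIP is a hypothetical helper (not from the KEP). It returns the pod IP
// a probe should target, given the pod's IPs (default IP first) and a
// preferred family of "ipv4", "ipv6", "none", or "".
func probeIP(podIPs []string, preferred string) (string, error) {
	if len(podIPs) == 0 {
		return "", fmt.Errorf("pod has no IPs allocated yet")
	}
	if preferred == "" || preferred == "none" {
		return podIPs[0], nil // unchanged default behavior
	}
	if preferred != "ipv4" && preferred != "ipv6" {
		return "", fmt.Errorf("unknown IP family %q", preferred)
	}
	wantV6 := preferred == "ipv6"
	for _, s := range podIPs {
		ip := net.ParseIP(s)
		if ip == nil {
			continue // skip entries that do not parse as an IP
		}
		if (ip.To4() == nil) == wantV6 {
			return s, nil
		}
	}
	// Assumption: a pod without an address in the preferred family is
	// probed on its default IP instead of failing the probe outright.
	return podIPs[0], nil
}

func main() {
	// Example addresses are illustrative only.
	ip, err := probeIP([]string{"10.244.1.5", "fd00:10:244::5"}, "ipv6")
	if err != nil {
		panic(err)
	}
	fmt.Println("probe target:", ip)
}
```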
- Because service IPs will remain single-family, pods will continue to access the CoreDNS server via a single service IP. In other words, the nameserver entries in a pod's /etc/resolv.conf will typically be a single IPv4 or single IPv6 address, depending upon the IP family of the cluster's service CIDR.
- Non-headless Kubernetes services: CoreDNS will resolve these services to either an IPv4 entry (A record) or an IPv6 entry (AAAA record), depending upon the IP family of the cluster's service CIDR.
- Headless Kubernetes services: CoreDNS will resolve these services to either an IPv4 entry (A record), an IPv6 entry (AAAA record), or both, depending on the service's endpointFamily configuration (see [Configuration of Endpoint IP Family in Service Definitions](#configuration-of-endpoint-ip-family-in-service-definitions)).
5 lines up it said "It is not expected that any changes will be needed for CoreDNS" but this is clearly a change. It has to be taught about the new plural field on endpoints
Resolved as part of #808
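To illustrate the headless-service case above, the following sketch splits a list of endpoint addresses into the sets that would back A and AAAA records. It is not CoreDNS code: `splitRecords` is a hypothetical helper, and it deliberately leaves out the endpointFamily filtering, whose exact values are not shown in this thread.

```go
package main

import (
	"fmt"
	"net"
)

// splitRecords is a hypothetical helper (not CoreDNS code). It partitions
// endpoint addresses by family: IPv4 addresses would be served as A records
// and IPv6 addresses as AAAA records. Unparseable entries are skipped.
func splitRecords(endpointIPs []string) (a, aaaa []string) {
	for _, s := range endpointIPs {
		ip := net.ParseIP(s)
		switch {
		case ip == nil:
			continue
		case ip.To4() != nil:
			a = append(a, s)
		default:
			aaaa = append(aaaa, s)
		}
	}
	return a, aaaa
}

func main() {
	// Example addresses are illustrative only.
	a, aaaa := splitRecords([]string{"10.244.1.5", "fd00:10:244:1::5", "10.244.2.7"})
	fmt.Println("A:", a)
	fmt.Println("AAAA:", aaaa)
}
```

A headless service whose endpoints carry only one family would end up with one of the two lists empty, which matches the "A, AAAA, or both" wording.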
```
fieldPath: status.podIPs
```

This definition will cause an environment variable setting in the pod similar to the following:
What about extra fields of podIPs? The parameters block, for example? Do we want to do anything with that? I am not sure.
Addressed in #808
I want to say what a good KEP this is. Thanks. |
Post holiday bump. |
Please remove any references to `NEXT_KEP_NUMBER` and rename the KEP to just be the draft date and KEP title.
KEP numbers will be obsolete once #703 merges.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: aojea If they are not already assigned, you can assign the PR to them by writing The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
KEP numbers will be obsolete once kubernetes#703 merges.
cc @khenidak |
@aojea Thank you for this. Are you planning to carry this forward with implementation as well? or just the KEP? |
The KEP is over a month with no word. @khenidak I think if you want to start thinking about impl, that's fair. You could even open a PR that builds on this one (keeping original work intact) if you want to push it ahead. |
Hi. Has implementation started? I would like to contribute, foremost in the "ipvs" area since I think the "nice to have" is not the right way, but also in other areas. Where do I sign up? |
- NodePort: Support listening on both IPv4 and IPv6 addresses
- ExternalIPs: Can be IPv4 or IPv6
- Kube-proxy IPVS mode will support dual-stack functionality similar to kube-proxy iptables mode as described above. IPVS kube-router support for dual stack, on the other hand, is considered outside of the scope of this proposal.
- For health/liveness/readiness probe support, a kubelet configuration will be added to allow a cluster administrator to select a preferred IP family to use for implementing probes on dual-stack pods.
This should be an auto-detect. Although the cluster will be running in dual-stack mode, some pods will be using only one address family.
// Properties: Arbitrary metadata associated with the allocated IP.
type PodIPInfo struct {
	IP string
	Properties map[string]string
What would be an example of `Properties`?
It is worth checking the previous discussion: https://github.com/kubernetes/community/pull/2254/files#r227458664
```
--pod-cidr ipNetSlice [IP CIDRs, comma separated list of CIDRs, Default: []]
```
Only the first address of each IP family will be used; all others will be ignored.
This is going to be difficult for users to understand. I would think an additional flag, `--pod-cidr-alternative` (or so), would be better, forcing the user to think about a main IP address family and an alternative IP address family.
previously discussed here https://github.com/kubernetes/community/pull/2254/files#r227469823
```
--cluster-cidr ipNetSlice [IP CIDRs, comma separated list of CIDRs, Default: []]
```
Only the first address of each IP family will be used; all others will be ignored.
same as pod-cidr
// "PodIP" field, and this default IP address must be recorded in the
// 0th entry (PodIPs[0]) of the slice. The list is empty if no IPs have
// been allocated yet.
PodIPs []PodIPInfo `json:"podIPs,omitempty" protobuf:"bytes,6,opt,name=podIPs"`
What would be a situation where a pod has more than two IPs? The argument I have here is to have one IP (the current one) and one additional.
IPv4+IPv6 now. I think it's short sighted to think that the first addition we want to make is also going to be the last.
I can think of things we might want to do in the future like: assigning external addresses directly to pods, IPv6 privacy addresses (which also rotate periodically), addresses from multiple IPv4/IPv6 prefixes during renumbering transitions, or as some future pod-to-pod federation optimisation.
We certainly don't have to design for any of these possibilities now, but once we have an array of addresses, I bet $5 we'll find a reason to use a third address.
IPv6 can have several addresses on the same interface depending on the scope (see IPv6 Scoped Address Architecture, RFC 4007).
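Whatever the final length of the list, the invariant from the field comment (PodIPs[0] must carry the default PodIP) is easy to check. A minimal sketch, assuming the `PodIPInfo` shape quoted in the diff; `validatePodIPs` is a hypothetical name and not the actual API validation code:

```go
package main

import "fmt"

// PodIPInfo mirrors the struct quoted in the diff under review.
type PodIPInfo struct {
	IP         string
	Properties map[string]string
}

// validatePodIPs is a hypothetical check (not the real API validation).
// When PodIPs is populated, its 0th entry must equal the singular PodIP.
// An empty PodIPs list is accepted: either no IPs are allocated yet, or
// the status was written by an older component that only sets PodIP.
func validatePodIPs(podIP string, podIPs []PodIPInfo) error {
	if len(podIPs) == 0 {
		return nil
	}
	if podIPs[0].IP != podIP {
		return fmt.Errorf("PodIPs[0] (%q) must match PodIP (%q)", podIPs[0].IP, podIP)
	}
	return nil
}

func main() {
	// Example addresses are illustrative only; the list may hold any number of entries.
	ips := []PodIPInfo{{IP: "10.244.1.5"}, {IP: "fd00:10:244::5"}}
	if err := validatePodIPs("10.244.1.5", ips); err != nil {
		panic(err)
	}
	fmt.Println("pod status is consistent")
}
```

Note that the check does not cap the slice at two entries, so a third address could be added later without touching it.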
#### Type Load Balancer

The cloud provider will provision an external load balancer. If the cloud provider load balancer maps directly to the pod IPs, then a dual-stack load balancer could be used. Additional information may need to be provided to the cloud provider to configure dual stack.
We will have to gate this KEP on input from the cloud providers. Currently `aws`, `azure` and `gcp` use node ports for routing `LB` to `node` (even when a CNI that hands out IPs from the cluster network is used).
Addressed as part of #808
I would also love to contribute to this effort. I've been working on adding IPv6 functionality and e2e testing to the kube-router project and have developed a vagrant based ipv6-only test environment. |
- Pod Connectivity: IPv4-to-IPv4 and IPv6-to-IPv6 access between pods
- Access to External Servers: IPv4-to-IPv4 and IPv6-to-IPv6 access from pods to external servers
- NGINX Ingress Controller Access: Access from IPv4 and/or IPv6 external clients to Kubernetes services via the Kubernetes NGINX Ingress Controller.
Could you shed some light on why the nginx ingress controller is part of the goals of this KEP? Does that mean there is a commitment to implement the necessary changes (if any) to the nginx ingress controller? Why would we not leave that up to individual ingress controllers to implement?
Explained in L456
I think we are also missing CC @lachie83 |
Added as part of #808
As with PodIP, corresponding changes will need to be made to NodeCIDR. These changes are essentially the same as the aforementioned PodIP changes: NodeCIDR is pluralized into a slice (NodeCIDRs) rather than a singular value, and the change is made across the internal representation and v1, with the associated conversions.
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
obsoleted by #808 |
Moving kubernetes/community#2254 from k/community to k/enhancements
cc: @leblancd