|
| 1 | +# KEP-3698: Multi-Network requirements |
| 2 | + |
| 3 | +<!-- toc --> |
| 4 | +- [Summary](#summary) |
| 5 | +- [Motivation](#motivation) |
| 6 | + - [Goals](#goals) |
| 7 | + - [Non-Goals](#non-goals) |
| 8 | +- [Proposal](#proposal) |
| 9 | + - [Personas](#personas) |
| 10 | + - [Network Administrator](#network-administrator) |
| 11 | + - [User](#user) |
| 12 | + - [Terminology](#terminology) |
| 13 | + - [User Stories](#user-stories) |
| 14 | + - [Story #1](#story-1) |
| 15 | + - [Story #1a](#story-1a) |
| 16 | + - [Story #2](#story-2) |
| 17 | + - [Story #3](#story-3) |
| 18 | + - [Story #4](#story-4) |
| 19 | + - [Story #5](#story-5) |
| 20 | + - [Story #6](#story-6) |
| 21 | + - [Story #7](#story-7) |
| 22 | + - [Story #8](#story-8) |
| 23 | + - [Story #9](#story-9) |
| 24 | + - [Requirements](#requirements) |
| 25 | + - [Phase I (base API and reference in Pod)](#phase-i-base-api-and-reference-in-pod) |
| 26 | + - [Phase II (scheduler, kubelet and API probing)](#phase-ii-scheduler-kubelet-and-api-probing) |
| 27 | + - [Phase III (basic Kubernetes features integration)](#phase-iii-basic-kubernetes-features-integration) |
| 28 | + - [Phase IV (extended functionality)](#phase-iv-extended-functionality) |
| 29 | + - [Risks and Mitigations](#risks-and-mitigations) |
| 30 | +- [Design Details](#design-details) |
| 31 | + - [Graduation Criteria](#graduation-criteria) |
| 32 | + - [Alpha](#alpha) |
| 33 | + - [Beta](#beta) |
| 34 | +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) |
| 35 | + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) |
| 36 | + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) |
| 37 | + - [Monitoring Requirements](#monitoring-requirements) |
| 38 | + - [Dependencies](#dependencies) |
| 39 | +- [Implementation History](#implementation-history) |
| 40 | +- [Drawbacks](#drawbacks) |
| 41 | +- [Alternatives](#alternatives) |
| 42 | +<!-- /toc --> |
| 43 | + |
| 44 | +## Summary |
| 45 | + |
| 46 | +Today Kubernetes Networking is very straightforward and easy to achieve. The main |
| 47 | +requirement is to enable connectivity between the Pods in the cluster. That |
| 48 | +simple approach satisfies most of Kubernetes customers, but it is not sufficient |
| 49 | +for cases where more complex networking is required: |
| 50 | +* some applications leverage different isolated networks, exposed through |
| 51 | + different interfaces |
| 52 | +* some leverage performance oriented interfaces (e.g. AF_XDP, memif, SR-IOV), |
| 53 | + besides the regular management interface |
| 54 | +* others require support for specific protocols not yet supported in Kubernetes |
| 55 | + |
| 56 | +For such requirements we need a solution that allows a user to express this in |
| 57 | +the pod specification. This can be done by an external solution leveraging |
| 58 | +annotations, but having it incorporated in Kubernetes would be a much cleaner |
| 59 | +and safer approach that would allow better compatibility and consistency for |
| 60 | +workloads with these needs. |
| 61 | + |
| 62 | +This KEP is an entry level document for the whole Multi-Network endeavor. Here |
| 63 | +we will define a set of requirements to follow. Additionally, we will introduce |
| 64 | +a phased approach where each phase will have its own KEP with detailed design. |
| 65 | + |
| 66 | +## Motivation |
| 67 | + |
| 68 | +We want to have a common API allowing us to define a catalog of different |
| 69 | +networks in the Kubernetes cluster. It would allow attaching a pod to one or |
| 70 | +several networks via a given type of interface depending on its connectivity |
| 71 | +or performance needs. |
| 72 | + |
| 73 | +### Goals |
| 74 | + |
| 75 | +Define user stories and requirements for the Multi-Network effort in Kubernetes. |
| 76 | + |
| 77 | +### Non-Goals |
| 78 | + |
| 79 | +Define API and implementation. |
| 80 | + |
| 81 | +## Proposal |
| 82 | + |
| 83 | +### Personas |
| 84 | + |
| 85 | +#### Network Administrator |
| 86 | + |
| 87 | +With this proposal we will introduce a new “persona”, called **Network |
| 88 | +Administrator**. This role will be responsible for defining and managing |
| 89 | +“Networks” (name TBD, referred to as “Object” in the Requirements section) that |
| 90 | +properly describe the infrastructure available for the cluster, namespace and |
| 91 | +workloads. This persona can define which users can “attach” to a specific |
| 92 | +“Network”. |
| 93 | + |
| 94 | +#### User |
| 95 | + |
| 96 | +The User is the consumer of Networks (name TBD, referred to as “Object” in the |
| 97 | +Requirements section) by referencing them in their workloads. Users usually will |
| 98 | +not create or remove the Network on their own. |
| 99 | + |
| 100 | +### Terminology |
| 101 | + |
| 102 | +* **Default network** - This is the initial cluster-wide Pod networking provided |
| 103 | + during cluster creation that is available to the Pod when no additional |
| 104 | + networking configuration is provided. |
| 105 | +* **Primary network** - This is the network inside the Pod whose interface is |
| 106 | + used for the default gateway. |
| 107 | + |
| 108 | +### User Stories |
| 109 | + |
| 110 | +All user stories represent the type of use cases the multi-networking API should |
| 111 | +be able to support. References to technologies or exact products do not |
| 112 | +indicate that this API will directly support them. The set of requirements |
| 113 | +derived from these use cases is the final indicator of what will be covered |
| 114 | +by this API and effort. |
| 115 | + |
| 116 | +#### Story #1 |
| 117 | +As a Cloud Native Network Function (CNF) vendor I require an additional |
| 118 | +interface to be provisioned into the Kubernetes Pod. Each of these interfaces |
| 119 | +has to be in an isolated network for regulatory compliance. The isolation has to |
| 120 | +be done at Layer-2. |
| 121 | + |
| 122 | +<p align="center"> |
| 123 | + <img src="mn-story-1.png?raw=true" alt="multi-network story 1 network L2 isolation"/> |
| 124 | +</p> |
| 125 | + |
| 126 | +#### Story #1a |
| 127 | +As a Cloud Native Network Function (CNF) vendor I require an additional interface |
| 128 | +to be provisioned into the Kubernetes Pod. Each of these interfaces has to be in |
| 129 | +an isolated network for regulatory compliance. The isolation has to be done at |
| 130 | +Layer-3. |
| 131 | + |
| 132 | +<p align="center"> |
| 133 | + <img src="mn-story-1a.png?raw=true" alt="multi-network story 1a network L3 isolation"/> |
| 134 | +</p> |
| 135 | + |
| 136 | +#### Story #2 |
| 137 | +As a Cloud Native Network Function (CNF) vendor I require a HW-based interface |
| 138 | +(e.g. SR-IOV VF) to be provisioned to my workload Pod. I need to leverage that HW |
| 139 | +for performance purposes (high bandwidth, low latency) so that my user-space |
| 140 | +application (e.g. DPDK-based) can use it. The VF will not use the standard netdev |
| 141 | +kernel module. The Pod’s scheduling to the nodes should be based on hardware |
| 142 | +availability (e.g. devicePlugin or some other way). |
| 143 | + |
| 144 | +<p align="center"> |
| 145 | + <img src="mn-story-2.png?raw=true" alt="multi-network story 2 HW interface"/> |
| 146 | +</p> |
| 147 | + |
| 148 | +#### Story #3 |
| 149 | +I have implemented my Kubernetes cluster networking using a virtual switch. In |
| 150 | +this implementation I am capable of creating isolated Networks. I need a means |
| 151 | +to express which Network my workloads connect to. |
| 152 | + |
| 153 | +<p align="center"> |
| 154 | + <img src="mn-story-3.png?raw=true" alt="multi-network story 3 virtual switch isolation"/> |
| 155 | +</p> |
| 156 | + |
| 157 | +#### Story #4 |
| 158 | +As a provider of a Virtual Machine-based compute platform that I run on top of |
| 159 | +Kubernetes and KubeVirt I require multi-tenancy. The isolation has to be |
| 160 | +achieved on Layer-2 for security reasons. |
| 161 | + |
| 162 | +<p align="center"> |
| 163 | + <img src="mn-story-4.png?raw=true" alt="multi-network story 4 Kubevirt VMs"/> |
| 164 | +</p> |
| 165 | + |
| 166 | +#### Story #5 |
| 167 | +As a platform operator I need to connect my on-premise networks to my workload |
| 168 | +Pods. I need to have the ability to represent these networks in my Kubernetes |
| 169 | +cluster in such a way that I can easily use them in my workloads. |
| 170 | + |
| 171 | +<p align="center"> |
| 172 | + <img src="mn-story-5.png?raw=true" alt="multi-network story 5 On-Premise Network representation"/> |
| 173 | +</p> |
| 174 | + |
| 175 | +#### Story #6 |
| 176 | +As a Kubernetes cluster administrator I wish to isolate workloads based on |
| 177 | +namespaces and network access by assigning a different default Network to a |
| 178 | +Namespace. I do not want the tenants to change their manifests for that purpose. |
| 179 | +Those workloads should have the same level of Kubernetes functionality: |
| 180 | +Services, NetworkPolicies, access to Kubernetes API. I wish to support |
| 181 | +“profiles” for a Namespace, where I can not only change the default Network, |
| 182 | +but also define a set of Networks assigned to a given Namespace that Pods created in |
| 183 | +that Namespace are automatically attached to. |
| 184 | + |
| 185 | +<p align="center"> |
| 186 | + <img src="mn-story-6.png?raw=true" alt="multi-network story 6 namespace networks profiles"/> |
| 187 | +</p> |
| 188 | + |
| 189 | +#### Story #7 |
| 190 | +As a “Power User” with admin privileges I wish to have the ability to modify my |
| 191 | +Pod network namespace without any restrictions. I am aware that by doing this I |
| 192 | +might break established contracts for the Kubernetes features. |
| 193 | + |
| 194 | +#### Story #8 |
| 195 | +As a provider of a Virtual Machine-based compute platform that I run on top of |
| 196 | +Kubernetes and KubeVirt I need the ability to add/remove Network Interfaces to |
| 197 | +existing VMs without re-creating the VM. This would not be applicable for the |
| 198 | +Primary Interface of the Pod. |
| 199 | + |
| 200 | +<p align="center"> |
| 201 | + <img src="mn-story-8.png?raw=true" alt="multi-network story 8 network interface hot-plug"/> |
| 202 | +</p> |
| 203 | + |
| 204 | +#### Story #9 |
| 205 | +As an infrastructure administrator I wish to support IPv6 SLAAC in the |
| 206 | +Kubernetes Pods. SLAAC's nature is to dynamically assign IP addresses and change |
| 207 | +them over time. Those should be reflected in Pod status. This would not be |
| 208 | +applicable for the Primary Interface of the Pod. |
| 209 | + |
| 210 | +### Requirements |
| 211 | + |
| 212 | +The requirements below are divided into phases. Each phase will produce a separate |
| 213 | +KEP with a detailed design for its set of requirements. Each of these KEPs |
| 214 | +will have to take all of the requirements, from every phase, into consideration |
| 215 | +when created. |
| 216 | + |
| 217 | +#### Phase I (base API and reference in Pod) |
| 218 | +1. This effort shall not change the behavior of existing clusters |
| 219 | +2. We need to introduce an “Object” that represents the existing |
| 220 | +infrastructure’s networking |
| 221 | +3. “Object” is decoupled from the network definition/description created by the |
| 222 | +network administrator |
| 223 | +4. The “Object” shall not itself define any implementation-specific |
| 224 | +parameters |
| 225 | +5. “Object” shall provide an option to define: |
| 226 | + * IPAM mode: external/internal |
| 227 | + * List of route prefixes - optional and not forced on the implementations |
| 228 | +6. “Object” can reference implementation-specific parameters |
| 229 | +7. “Object” is the consumer-facing declaration for workloads |
| 230 | +8. Cluster “Default” “Object” is the network the cluster has been created with |
| 231 | +9. Cluster “Default” “Object” cannot be changed or removed |
| 232 | +10. Pods shall reference the “Object” when trying to attach to specific |
| 233 | +networks |
| 234 | + * A Pod has to specify every “Object” that it wishes to attach to, |
| 235 | + including the Cluster “Default” “Object” |
| 236 | +11. Workloads can access “Object” when “attach” RBAC exists at the attachment time |
| 237 | + * “attach” RBAC is a new verb to allow referencing API objects |
| 238 | +12. The Pod reference to a Network is optional and when NOT specified, the Pod |
| 239 | +connects to the “Default” “Object” (network the cluster has been created with) |
| 240 | +13. Pod shall be able to provide additional configuration on how it attaches to |
| 241 | +a network |
| 242 | + * Identify what “Object” is the Primary network |
| 243 | + * Optional parameters: MAC address, IP address, speed, MTU, interface name etc. |
| 244 | +14. All Pods connected to a specific network (represented by the “Object”) must |
| 245 | +have connectivity to each other within that network inside the Cluster |
| 246 | + * A Pod connected to a specific network (represented by the “Object”) may or |
| 247 | + may not have cross connectivity between different networks (represented by |
| 248 | + the “Object”) |
| 249 | +15. Pods attached to a network are connected to each other in a manner defined |
| 250 | +by the “Object” implementation |
| 251 | +16. Basic network interface information for each attachment will be exposed to |
| 252 | +the running Pod (via e.g. environment variables, downward API etc.) |
| 253 | + |
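As an illustration only, requirements 10 and 12 can be read as a small resolution rule: no reference means the Pod attaches to the Cluster “Default” “Object”, and an explicit list is taken as given. The Go sketch below is a minimal reading of that rule, not a proposed API (defining the API is a non-goal of this KEP). The names `NetworkRef`, `DefaultNetworkName` and `resolveAttachments` are hypothetical, and marking the implicit default attachment as Primary is an assumption drawn from the Terminology section.

```go
package main

import "fmt"

// DefaultNetworkName is a hypothetical name for the Cluster "Default" "Object".
const DefaultNetworkName = "default"

// NetworkRef is a hypothetical Pod-side reference to an "Object".
type NetworkRef struct {
	Name          string // name of the referenced "Object"
	Primary       bool   // whether this attachment provides the default gateway
	InterfaceName string // optional interface name inside the Pod
}

// resolveAttachments returns the attachments a Pod would end up with.
// With no references, the Pod attaches only to the default network
// (requirement 12). With references, exactly the listed "Objects" are used,
// so the default network must be listed explicitly (requirement 10).
func resolveAttachments(refs []NetworkRef) []NetworkRef {
	if len(refs) == 0 {
		// Assumption: the only attachment is also the Primary network.
		return []NetworkRef{{Name: DefaultNetworkName, Primary: true}}
	}
	out := make([]NetworkRef, len(refs))
	copy(out, refs)
	return out
}

func main() {
	// Pod without any network reference.
	fmt.Println(resolveAttachments(nil))

	// Pod that lists the default network plus one additional network.
	fmt.Println(resolveAttachments([]NetworkRef{
		{Name: DefaultNetworkName, Primary: true},
		{Name: "dataplane", InterfaceName: "net1"},
	}))
}
```

The sketch deliberately leaves out RBAC (requirement 11) and validation of the Primary flag; those would belong to the Phase I design KEP.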
| 254 | +#### Phase II (scheduler, kubelet and API probing) |
| 255 | +17. Networks represented by “Object” can be selectively (per Node) available in |
| 256 | +the Cluster. This does NOT apply to the Cluster “Default” “Object” |
| 257 | +18. Kubelet network-based probing is optional for the “Object” connections to |
| 258 | +Pod |
| 259 | +19. “Object” connections to Pod are optionally able to connect to Kubernetes |
| 260 | +API - Pod connections via a non-default Pod network do not require access |
| 261 | +to the Kubernetes API |
| 262 | +20. Kubernetes API can optionally reach “Object” connections to |
| 263 | +Pod - Kubernetes API access to the Pod via a non-default Pod network is not |
| 264 | +required |
| 265 | + |
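Requirement 17 implies that placement has to take per-Node network availability into account. The Go sketch below shows one possible reading, assuming availability is known as a simple per-Node set of network names. The names `defaultNetwork`, `nodeSupports` and `feasibleNodes` are hypothetical, and nothing here reflects how the scheduler or kubelet will actually obtain this information.

```go
package main

import (
	"fmt"
	"sort"
)

// defaultNetwork is a hypothetical name for the Cluster "Default" "Object".
const defaultNetwork = "default"

// nodeSupports reports whether a Node offers every network a Pod references.
// The default network is skipped because requirement 17 states that selective
// availability does not apply to it.
func nodeSupports(available map[string]bool, wanted []string) bool {
	for _, name := range wanted {
		if name == defaultNetwork {
			continue
		}
		if !available[name] {
			return false
		}
	}
	return true
}

// feasibleNodes returns, in sorted order, the Nodes on which all wanted
// networks are available.
func feasibleNodes(nodes map[string]map[string]bool, wanted []string) []string {
	var out []string
	for node, available := range nodes {
		if nodeSupports(available, wanted) {
			out = append(out, node)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	nodes := map[string]map[string]bool{
		"node-a": {"dataplane": true},
		"node-b": {},
	}
	fmt.Println(feasibleNodes(nodes, []string{defaultNetwork, "dataplane"}))
	fmt.Println(feasibleNodes(nodes, []string{defaultNetwork}))
}
```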
| 266 | +#### Phase III (basic Kubernetes features integration) |
| 267 | +21. “Object” connections to Pod are optionally able to provide Service and |
| 268 | + NetworkPolicy functionality |
| 269 | + |
| 270 | +#### Phase IV (extended functionality) |
| 271 | +22. Have the capability to override the Cluster “Default” “Object” on the namespace |
| 272 | +“level” |
| 273 | +23. Have the capability to add/remove attachments to/from running Pods |
| 274 | +24. Have the capability to add/remove IP addresses on the network attachments of running Pods |
| 275 | + |
| 276 | +### Risks and Mitigations |
| 277 | + |
| 278 | +N/A |
| 279 | + |
| 280 | +## Design Details |
| 281 | + |
| 282 | +N/A |
| 283 | + |
| 284 | +### Graduation Criteria |
| 285 | + |
| 286 | +#### Alpha |
| 287 | + |
| 288 | +- Approval from subproject owners + KEP reviewers |
| 289 | + |
| 290 | +#### Beta |
| 291 | + |
| 292 | +N/A |
| 293 | + |
| 294 | +## Production Readiness Review Questionnaire |
| 295 | + |
| 296 | +### Feature Enablement and Rollback |
| 297 | + |
| 298 | +###### How can this feature be enabled / disabled in a live cluster? |
| 299 | + |
| 300 | +N/A |
| 301 | + |
| 302 | +###### Does enabling the feature change any default behavior? |
| 303 | + |
| 304 | +No |
| 305 | + |
| 306 | +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? |
| 307 | + |
| 308 | +N/A |
| 309 | + |
| 310 | +###### What happens if we reenable the feature if it was previously rolled back? |
| 311 | + |
| 312 | +N/A |
| 313 | + |
| 314 | +###### Are there any tests for feature enablement/disablement? |
| 315 | + |
| 316 | +N/A |
| 317 | + |
| 318 | +### Rollout, Upgrade and Rollback Planning |
| 319 | + |
| 320 | +###### How can a rollout or rollback fail? Can it impact already running workloads? |
| 321 | + |
| 322 | +N/A |
| 323 | + |
| 324 | +###### What specific metrics should inform a rollback? |
| 325 | + |
| 326 | +N/A |
| 327 | + |
| 328 | +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? |
| 329 | + |
| 330 | +N/A |
| 331 | + |
| 332 | +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? |
| 333 | + |
| 334 | +No |
| 335 | + |
| 336 | +### Monitoring Requirements |
| 337 | + |
| 338 | +###### How can an operator determine if the feature is in use by workloads? |
| 339 | + |
| 340 | +N/A |
| 341 | + |
| 342 | +###### How can someone using this feature know that it is working for their instance? |
| 343 | + |
| 344 | +N/A |
| 345 | + |
| 346 | +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? |
| 347 | + |
| 348 | +N/A |
| 349 | + |
| 350 | +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? |
| 351 | + |
| 352 | +N/A |
| 353 | + |
| 354 | +###### Are there any missing metrics that would be useful to have to improve observability of this feature? |
| 355 | + |
| 356 | +N/A |
| 357 | + |
| 358 | +### Dependencies |
| 359 | + |
| 360 | +###### Does this feature depend on any specific services running in the cluster? |
| 361 | + |
| 362 | +No |
| 363 | + |
| 364 | +## Implementation History |
| 365 | + |
| 366 | +N/A |
| 367 | + |
| 368 | +## Drawbacks |
| 369 | + |
| 370 | +N/A |
| 371 | + |
| 372 | +## Alternatives |
| 373 | + |
| 374 | +We will not define a unified API, and this feature will live on as just an addon |
| 375 | +to Kubernetes. |