GEP-1867: Per-Gateway Infrastructure #1868

Merged
merged 4 commits into from
May 30, 2023
134 changes: 134 additions & 0 deletions geps/gep-1867.md
@@ -0,0 +1,134 @@
# GEP-1867: Per-Gateway Infrastructure

* Status: Provisional
* Issue: [#1876](https://github.com/kubernetes-sigs/gateway-api/issues/1876)

## Overview

`Gateway`s represent a piece of infrastructure implemented by cloud load balancers, in-cluster deployments, or other mechanisms.
These often need vendor-specific configuration outside the scope of existing APIs (e.g. "size" or "version" of the infrastructure to provision).

Today `GatewayClass.spec.parametersRef` is available to attach arbitrary configuration to a `GatewayClass`.

This GEP will explain why that is not sufficient to meet common use cases, and introduce a new field - `infrastructure` - to address these cases.

Related discussions:
* [Support cluster-local Gateways](https://github.com/kubernetes-sigs/gateway-api/discussions/1247)
* [Scaling Gateway Resources](https://github.com/kubernetes-sigs/gateway-api/discussions/1355)
* [Manual deployments](https://github.com/kubernetes-sigs/gateway-api/issues/1687)
* [Merging Gateways](https://github.com/kubernetes-sigs/gateway-api/pull/1863)
* [In Cluster Gateway Deployments](https://github.com/kubernetes-sigs/gateway-api/pull/1757)

## Goals

* Provide the ability to configure arbitrary (implementation specific) attributes about a **specific Gateway**.
* Provide the ability to configure a standardized set of attributes about a **specific Gateway**.
Comment on lines +24 to +25
Member

These goals feel incomplete because you're also proposing standard infra-level config at the GatewayClass level but these only focus on individual Gateways.

Contributor Author

The implicit goal of the GEP is "Have consistency in the API". Providing infrastructure is purely to maintain consistency between the APIs; the entire goal of the GEP is to not use GatewayClass.

Member

> the entire goal of the GEP is to not use GatewayClass

This seems problematic. Not to use GatewayClass at all or just for this specific set of config?

Contributor Author

It is to be able to configure infrastructure options on a per-gateway basis. It's not to stop users from using GC, but they can already do that today. They cannot do it on a per-GW basis today, so this GEP aims to enable this.

Member

I do think a goal of this needs to be to standardize some infra level configuration at the GWC level as well. Although this can be configured via GWC params already, we don't have a standard way to represent shared concepts yet.


## Why not GatewayClass parameters?

`GatewayClass.spec.parametersRef` is the existing mechanism to configure arbitrary fields on a Gateway.
However, this introduces operational challenges when configuring Gateways.

### Scope

As a `Gateway` manager (with RBAC permissions to a specific `Gateway`) I should be able to declaratively make changes to that `Gateway` without the need for access to cluster-scoped resources (`GatewayClass`) and without affecting other `Gateway`s managed by the same `GatewayClass`.
This has been previously discussed in [this issue](https://github.com/kubernetes-sigs/gateway-api/issues/567).

As a cluster-scoped resource, `GatewayClass` does not meet this requirement.
This restricts customization use cases to either a few classes pre-provisioned by the admin, or running in an environment where the "Infrastructure Provider" and "Cluster Operator" are the same role.
The distinction between these roles is explicitly called out on the [homepage](https://gateway-api.sigs.k8s.io/#what-is-the-gateway-api).

### Custom Resource

`parametersRef` is entirely generic; its meaning is implementation-specific.
This means implementations will either need a custom CRD or use untyped resources like ConfigMap.
Neither of these offers any consistency between implementations.
While there will always be some vendor-specific requirements, there are also a number of configuration aspects of a Gateway that are common between implementations.
However, these cannot currently be expressed in a vendor-neutral way.

The original motivation behind `parametersRef` was for implementation-specific concepts, while portable concepts could be added into the API as first-class fields, but this has not been done (yet).

Additionally, there is hesitancy to use a CRD (which leads to CRD proliferation), which pushes users towards untyped ConfigMaps, which are not much better than annotations.
The scoping, as mentioned above, is also a bit awkward: a cluster-scoped resource points to a namespaced object.
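
As a rough illustration of the typing concern, the sketch below contrasts a hypothetical first-class field with the same setting read out of untyped ConfigMap data. The `Replicas` field, the `typedInfra` type, and the `"replicas"` key are assumptions made for this example only; they are not proposed by this GEP.

```go
package main

import (
	"fmt"
	"strconv"
)

// typedInfra is a hypothetical first-class field. With a typed API, a
// non-integer value can be rejected by schema validation before a
// controller ever sees it.
type typedInfra struct {
	Replicas int32
}

// replicasFromConfigMap reads the same setting from untyped ConfigMap data.
// The "replicas" key is a hypothetical, per-implementation convention, so
// every implementation has to write (and document) parsing like this itself.
func replicasFromConfigMap(data map[string]string) (int32, error) {
	raw, ok := data["replicas"]
	if !ok {
		return 0, fmt.Errorf("key %q not set", "replicas")
	}
	n, err := strconv.ParseInt(raw, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid replicas %q: %w", raw, err)
	}
	return int32(n), nil
}

func main() {
	typed := typedInfra{Replicas: 3}
	fmt.Println("typed field:", typed.Replicas)

	// A typo in the untyped form only surfaces when the controller parses it.
	if _, err := replicasFromConfigMap(map[string]string{"replicas": "three"}); err != nil {
		fmt.Println("untyped field:", err)
	}
}
```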

### Separation of concerns

While there is value in providing class-wide options as defaults, there is also value in providing these options on the object (Gateway) directly.

Some parallels in existing APIs:

[Policy Attachment](https://gateway-api.sigs.k8s.io/references/policy-attachment) offers a hierarchy of defaults and overrides, allowing attachment to GatewayClass and Gateway.
This is similar to our needs here, but representing infrastructure configuration as a "Policy" is a bit problematic, and the existing mechanisms have no hierarchy.

In core Kubernetes, Pods declare their requirements (for example, CPU requests) inline in the Pod resource; there is not a `ResourceClass` API that abstracts these further.
These higher level abstractions are handled by layered APIs (whether this is a CRD, an admission webhook, CI/CD tooling, etc).
This gives users the flexibility to easily configure things on a per-pod basis.
If the infrastructure admin wants to impose defaults or requirements on this flexibility, they are able to do so (in fact, `LimitRanger` provides a built-in mechanism to do so).
Comment on lines +63 to +66
Member

@robscott May 18, 2023

It would be helpful to provide some guidelines for when we'd want Gateway-level config to be able to override class-level config. I personally think there are many cases where admins would not want to allow config to be overridden. This feels especially problematic if we have something like parametersRef that could potentially allow all GatewayClass params to be overridden at the Gateway level, including something like implementation-specific firewall config.

Contributor Author

Also responding to your other comment here

> Why would we not want defaults and overrides for this at GatewayClass level? Are we sure that cluster admins are going to be fine with others creating custom variations of Gateways and overriding GatewayClass config?
> Why would the merging logic be implementation specific?

I think there are a few options:

  1. default and overrides in Gateway, aligning with policy attachment

Makes paramRefs awkward, since it doesn't have these concepts

Gives GatewayClass editor ability to decide what Gateway admins can do. I don't think this is useful in most cases, since most users are not actually editing GC but are using defaults provided by implementations during install.

  2. Implementation specific merge

This aligns with paramRef, which already has implementation specific merging with Gateways (if they have per-gateway settings, of course -- it's all impl specific so can be anything).

Gives the controller author ability to decide what gateway admins can do. I suspect this is often actually the person writing the GC, anyways, though, so this ends up being similar to (1) but simpler.

  3. No infrastructure field at GC

Not consistent with Gateway API. Users who want this at GC level would use a paramRef to something with the same fields, and then would have impl specific merging logic. So ends up the same as (2), roughly, but with an inconsistent UX I think.


So overall it seems like if we think GC owner wants overrides we should do (1), else we should do (2).

I don't really see why a GC would want to override TBH. For things like Version and Size, you probably don't want override -- you want a range. This is better addressed through other mechanisms like Gatekeeper (effectively what LimitRanger is in core). So I would lean towards (2).


I will note that the merging logic and the presence of the field are decoupled. Since the presence of the field is blocking 3 other GEPs, I can say the merging logic is WIP and a blocker for promotion beyond 'provisional' if it helps?

Member

In general, implementation-specific has been reserved for concepts like RegEx that are broadly understood but have some nuanced differences. I'd hate to extend that to something important like hierarchical values that really could be solved by the API itself. This feels entirely different from policy in that we'll have a well-known list of config, and some of these values will be representable at both GW and GWC levels. I agree that for at least some, we'd want something like both a default and a range on GWC. So IMO there's a 4th option which is for the API to define merging semantics for every infra field that is present on both GW and GWC.

I think at a minimum we should aim to have some patterns or principles defined here that cover:

  1. When a field should be configurable on GW
  2. When a field should be configurable on GWC
  3. What kinds of interactions we want to support between GWC and GW infra fields

Contributor Author

I added a section on this


### Dynamic Changes

Currently, the spec recommends `GatewayClass` to be used as a *template*.
Changes to it are not expected to change deployed `Gateway`s.

This makes declarative usage problematic.
For example, if I wanted to represent a `version` field and change that to trigger an upgrade, I would need to create an entirely new `Gateway`.
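
A minimal sketch of this difference, under stated assumptions: the `Version` field, the `classTemplate` and `deployment` types, and the `provision` and `reconcile` helpers are all hypothetical stand-ins, not part of the API. The first half models template semantics, where class settings are copied once at creation. The second half models a per-Gateway field that a controller can reconcile in place.

```go
package main

import "fmt"

// classTemplate stands in for infrastructure settings held on a GatewayClass.
// The Version field is hypothetical.
type classTemplate struct {
	Version string
}

// deployment stands in for the infrastructure backing a single Gateway.
type deployment struct {
	Gateway string
	Version string
}

// provision copies class settings once, at Gateway creation time
// (template semantics).
func provision(gateway string, tmpl classTemplate) deployment {
	return deployment{Gateway: gateway, Version: tmpl.Version}
}

// reconcile applies a per-Gateway desired version to an existing deployment
// and reports whether an upgrade was triggered.
func reconcile(d *deployment, desiredVersion string) bool {
	if desiredVersion == "" || d.Version == desiredVersion {
		return false
	}
	d.Version = desiredVersion
	return true
}

func main() {
	tmpl := classTemplate{Version: "1.0"}
	d := provision("gw-a", tmpl)

	// Editing the class afterwards does not touch the provisioned Gateway.
	tmpl.Version = "1.1"
	fmt.Println("after class edit:", d.Version)

	// A per-Gateway field can be changed in place to request the upgrade.
	upgraded := reconcile(&d, "1.1")
	fmt.Println("after gateway edit:", upgraded, d.Version)
}
```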

## API

To address the concerns above, I propose that a standard `infrastructure` API be added to `Gateway` and `GatewayClass`.
Note the important part of this is the `Gateway` change; the `GatewayClass` aspect is mostly for consistency.

The exact fields are out of scope for this GEP and will be handled by additional GEPs.
Some example GEPs already depending on this are [GEP-1713](/geps/gep-1713.md), [GEP-1651](/geps/gep-1651.md), and [GEP-1762](/geps/gep-1762.md).

The fields as defined below are, of course, not useful.
This is intended as a basis for other PRs, not to provide value on its own.
This GEP will remain in provisional until at least one field is ready to be promoted.

```go
type GatewaySpec struct {
	// Infrastructure defines infrastructure level attributes about this Gateway instance.
	Infrastructure GatewayInfrastructure `json:"infrastructure"`
	// ...
}

type GatewayClassSpec struct {
	// Infrastructure defines infrastructure level attributes for all Gateways in this class.
	// A Gateway may provide configuration for the same values; as all fields in GatewayInfrastructure are implementation specific,
	// the merging logic between these is as well. However, the GatewayClass is generally expected to be providing defaults
	// rather than overrides.
	Infrastructure GatewayClassInfrastructure `json:"infrastructure"`
	// ...
}

type GatewayInfrastructure struct {
	// ParametersRef provides arbitrary implementation-specific configuration for
	// fields not expressed directly in this struct.
	// This follows the same semantics as GatewayClass's ParametersRef, but lives on the Gateway.
	ParametersRef ParametersReference
}

type GatewayClassInfrastructure struct {
}
```
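
To make the shape of the proposed field more concrete, here is a self-contained sketch that builds a trimmed-down `GatewaySpec` and serializes it. It is illustrative only: `ParametersReference` is a simplified stand-in for the Gateway API type of the same name, the `parametersRef` JSON tag is an assumption (the struct above does not specify one), and the `exampleSpec` and `renderSpec` helpers plus the `example-class`, `example.com`, `GatewayConfig`, and `my-gateway-config` values are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ParametersReference is a simplified stand-in for the Gateway API type of
// the same name; only the fields needed for this illustration are included.
type ParametersReference struct {
	Group string `json:"group"`
	Kind  string `json:"kind"`
	Name  string `json:"name"`
}

// GatewayInfrastructure mirrors the struct proposed above. The JSON tag on
// ParametersRef is an assumption; the GEP does not specify one.
type GatewayInfrastructure struct {
	ParametersRef ParametersReference `json:"parametersRef"`
}

// GatewaySpec is trimmed to the fields relevant to this example.
type GatewaySpec struct {
	GatewayClassName string                `json:"gatewayClassName"`
	Infrastructure   GatewayInfrastructure `json:"infrastructure"`
}

// exampleSpec returns a Gateway spec that points at a hypothetical,
// implementation-specific configuration object for this Gateway only.
func exampleSpec() GatewaySpec {
	return GatewaySpec{
		GatewayClassName: "example-class",
		Infrastructure: GatewayInfrastructure{
			ParametersRef: ParametersReference{
				Group: "example.com",
				Kind:  "GatewayConfig",
				Name:  "my-gateway-config",
			},
		},
	}
}

// renderSpec serializes a spec to compact JSON.
func renderSpec(s GatewaySpec) (string, error) {
	out, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := renderSpec(exampleSpec())
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The point of the sketch is that the reference lives on the namespaced `Gateway` itself, so a user with access to only that `Gateway` can change it without touching the cluster-scoped `GatewayClass`.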

### API Principles

For any given field, we will need to make two decisions:
* Whether this should be a first-class field or expressed through a generic `parametersRef`.
* Whether this field should be configurable at the Gateway and/or GatewayClass level.

The choice to use an extension (`parametersRef`) or first-class field is a well-known problem across the API, and the same logic will be used here.
Fields that are generally portable across implementations and have widespread demand and use cases will be promoted to first-class fields,
while vendor-specific or niche fields will remain extensions.
Because infrastructure is somewhat inherently implementation specific, it is likely most fields will be Extended or ImplementationSpecific.
However, there are still a variety of concepts that have a shared meaning across implementations and can provide value to users.

Introduction at Gateway or GatewayClass level will depend on the specific field and use cases for the field.
In general, it makes sense to provide defaults (GatewayClass) and specific settings (Gateway) for most fields, but
this will be evaluated on a case-by-case basis.
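
One possible reading of "defaults on the GatewayClass, specific settings on the Gateway" is sketched below. This is not a semantics prescribed by this GEP, which leaves merging to be decided per field. The `infra` type, its `Size` and `Version` fields, and the `mergeInfra` and `ptr` helpers are hypothetical.

```go
package main

import "fmt"

// infra holds two hypothetical infrastructure fields. Pointers are used so
// that "unset" can be distinguished from an explicitly set value.
type infra struct {
	Size    *string
	Version *string
}

// mergeInfra treats class values as defaults: any field set on the Gateway
// takes precedence, and unset Gateway fields fall back to the class value.
func mergeInfra(class, gateway infra) infra {
	out := class
	if gateway.Size != nil {
		out.Size = gateway.Size
	}
	if gateway.Version != nil {
		out.Version = gateway.Version
	}
	return out
}

// ptr is a small helper for taking the address of a string literal.
func ptr(s string) *string { return &s }

func main() {
	class := infra{Size: ptr("small"), Version: ptr("1.0")}
	gateway := infra{Size: ptr("large")} // Version left unset on the Gateway

	effective := mergeInfra(class, gateway)
	fmt.Println("size:", *effective.Size, "version:", *effective.Version)
}
```

A field that an admin should be able to restrict (for example, to a permitted range) would need different handling than this simple "Gateway wins" rule, which is part of why each field is evaluated separately.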

### Status

The API should likely expose some status. However, it is not yet clear what that will look like.
This will be addressed prior to promotion beyond "Provisional".