Enhancement Proposal (Implementable): NKG config #930

Merged 5 commits on Aug 7, 2023
98 changes: 97 additions & 1 deletion docs/proposals/control-plane-config.md
@@ -1,7 +1,7 @@
# Enhancement Proposal-928: Control Plane Dynamic Configuration

- Issue: https://github.com/nginxinc/nginx-kubernetes-gateway/issues/928
- Status: Provisional
- Status: Implementable

## Summary

@@ -17,3 +17,99 @@ option that we will support is log level.
## Non-Goals

This proposal is *not* defining a way to dynamically configure the data plane.

## Introduction

The NKG control plane will evolve to have various user-configurable options. These could include, but are not
limited to, log level, tracing, or metrics. For the best user experience, users should be able to change these
options at runtime, without having to restart NKG. The first option that we will allow users to configure is the
log level. The easiest and most intuitive way to implement a Kubernetes-native API is through a CRD.

In this doc, the term "user" will refer to the cluster operator (the person who installs and manages NKG). The
cluster operator owns this CRD resource.

## API, Customer Driven Interfaces, and User Experience

The API would be provided in a CRD. An authorized user would interact with this CRD using `kubectl` to `get`
or `edit` the configuration.

Proposed configuration CRD example:

```yaml
apiVersion: gateway.nginx.org/v1alpha1
kind: NginxGateway
metadata:
  name: nginx-gateway-config
  namespace: nginx-gateway
spec:
  logging:
    level: info
  ...
status:
  conditions:
  ...
```

- The CRD would be Namespace-scoped, living in the same Namespace as the controller that it applies to.
- The CRD is initialized and created when NKG is deployed.
- NKG references the name of this CRD via a CLI argument (`--nginx-gateway-config-name`), and only watches this CRD.
If the resource doesn't exist, NKG logs an error, creates an event, and uses default values.
- If the user deletes the resource, NKG logs an error, creates an event, and reverts to default values.

This resource won't be referenced in the `parametersRef` of the GatewayClass, reserving that option for a data
plane CRD. The control plane may end up supporting multiple GatewayClasses, so linking the control CRD to a
GatewayClass wouldn't make sense. Referencing the CRD via a CLI argument ensures we only support one instance of
the CRD per control plane.
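
To illustrate the CLI reference described above, here is a minimal sketch of how the argument might appear in the NKG Deployment. Only the flag name `--nginx-gateway-config-name` and the resource name `nginx-gateway-config` come from this proposal; the container name and the surrounding Pod template layout are assumptions for illustration.

```yaml
# Hypothetical excerpt of the NKG Deployment Pod template.
# The container name is an assumption; the flag and its value come from this proposal.
spec:
  containers:
  - name: nginx-gateway
    args:
    # Name of the single NginxGateway resource this control plane watches.
    - --nginx-gateway-config-name=nginx-gateway-config
```

Because the name is fixed at startup, a second resource of the same kind in the Namespace would simply be ignored by this control plane.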

For discussion with team:

- kind name
- default resource name

## Use Cases

The high-level use case for dynamically changing settings in the NKG control plane is to allow users to alter
behavior without restarting NKG and experiencing downtime.

For the specific log level use case, users may be experiencing problems with NKG that require more information to
diagnose. These problems could include:

- Not seeing or processing Kubernetes resources that it should.
- Configuring the data plane incorrectly based on the defined Kubernetes resources.
- Crashes or errors without enough detail.

Dynamically changing the log level provides a quick way to obtain more information about
the state of the control plane without losing that state to a restart.
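
As an illustration of this use case, a cluster operator diagnosing a problem might edit the proposed resource so that only the log level changes. The value `debug` is an assumption; this proposal does not enumerate the supported levels.

```yaml
apiVersion: gateway.nginx.org/v1alpha1
kind: NginxGateway
metadata:
  name: nginx-gateway-config
  namespace: nginx-gateway
spec:
  logging:
    # Assumed value for a more verbose level; the proposal's example uses "info".
    level: debug
```

Once the problem is diagnosed, setting the level back to `info` would restore the original verbosity, again without a restart.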

## Testing

Unit tests can be leveraged for verifying that NKG properly watches and acts on CRD changes. These tests would
be similar in behavior to the current unit tests that verify Gateway API resource processing.

## Security Considerations

We need to ensure that any configurable field that is exposed to a user:

- Is properly validated. This means that the field should be the correct type (integer, string, etc.), have appropriate
length, and use regex patterns or enums to prevent any unwanted input. This will initially be done through
OpenAPI schema validation. If necessary as the CRD evolves, CEL or webhooks could be used.
- Has a valid use case. The more fields we expose, the more attack vectors we create. We should only be exposing
fields that are genuinely useful for a user to change dynamically.
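
The OpenAPI schema validation mentioned above could look roughly like the following CRD excerpt. This is a sketch under stated assumptions, not the final definition: the plural name `nginxgateways` and the `debug` and `error` enum values are hypothetical, and the `status` schema is left out to keep the focus on input validation.

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # Must be <plural>.<group>; the plural is an assumption.
  name: nginxgateways.gateway.nginx.org
spec:
  group: gateway.nginx.org
  scope: Namespaced
  names:
    kind: NginxGateway
    plural: nginxgateways
    singular: nginxgateway
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              logging:
                type: object
                properties:
                  level:
                    type: string
                    # Only "info" appears in this proposal; the other values are assumed.
                    enum:
                    - info
                    - debug
                    - error
```

With an enum in place, the API server rejects a resource whose `spec.logging.level` is not one of the listed values, so NKG never sees the invalid input.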

RBAC via the Kubernetes API server will ensure that only authorized users can update the CRD containing NKG control
plane configuration.
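
As a sketch of the RBAC restriction described above, a Namespace-scoped Role and RoleBinding could grant edit access to the single configuration resource. The Role name, the user name `cluster-operator`, and the plural resource name `nginxgateways` are assumptions for illustration.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: nginx-gateway-config-editor  # hypothetical name
  namespace: nginx-gateway
rules:
- apiGroups: ["gateway.nginx.org"]
  resources: ["nginxgateways"]  # assumed plural of NginxGateway
  # Restrict access to the one resource that NKG watches.
  resourceNames: ["nginx-gateway-config"]
  verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nginx-gateway-config-editor
  namespace: nginx-gateway
subjects:
- kind: User
  name: cluster-operator  # hypothetical user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: nginx-gateway-config-editor
  apiGroup: rbac.authorization.k8s.io
```

The verbs cover what `kubectl get` and `kubectl edit` need for a named resource; `list`, `watch`, and `create` are omitted here because requests of those kinds are not reliably matched by a `resourceNames` restriction.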

## Alternatives

- ConfigMap
A ConfigMap is another type of resource in which a user can provide configuration options. However, it lacks the
benefits of a CRD, specifically built-in schema validation, versioning, and conversion webhooks.

- Custom API server
The NKG control plane could implement its own custom API server. However, the overhead of implementing this, which
would include auth, validation, endpoints, and so on, would not be worth it because the Kubernetes
API server already does all of these things for us.

## References

- [Kubernetes Custom Resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)