From 407052042be601c26984aaaef0ffe717e5d05e77 Mon Sep 17 00:00:00 2001 From: Saylor Berman Date: Wed, 2 Aug 2023 14:30:19 -0600 Subject: [PATCH 1/4] Enhancement Proposal (Implementable): NKG config Update enhancement proposal with implementable details for NKG control plane dynamic configuration. --- docs/proposals/control-plane-config.md | 85 +++++++++++++++++++++++++- 1 file changed, 84 insertions(+), 1 deletion(-) diff --git a/docs/proposals/control-plane-config.md b/docs/proposals/control-plane-config.md index 751d7079f7..55eebcf74c 100644 --- a/docs/proposals/control-plane-config.md +++ b/docs/proposals/control-plane-config.md @@ -1,7 +1,7 @@ # Enhancement Proposal-928: Control Plane Dynamic Configuration - Issue: https://github.com/nginxinc/nginx-kubernetes-gateway/issues/928 -- Status: Provisional +- Status: Implementable ## Summary @@ -17,3 +17,86 @@ option that we will support is log level. ## Non-Goals This proposal is *not* defining a way to dynamically configure the data plane. + +## Introduction + +The NKG control plane will evolve to have various user-configurable options. These could include, but are not +limited to, log level, tracing, or metrics. For the best user experience, these options should be able to be +changed at runtime, to avoid having to restart NKG. The first option that we will allow users to configure is the +log level. The easiest and most intuitive way to implement a Kubernetes-native API is through a CRD. + +## API, Customer Driven Interfaces, and User Experience + +The API would be provided in a CRD. An authorized user would interact with this CRD using `kubectl` to `get` +or `edit` the configuration. + +Proposed configuration CRD example: + +```yaml +apiVersion: nginx.gateway.k8s.io/v1beta1 +kind: NGINXControlConfig +metadata: + name: nkg-config + namespace: nginx-gateway +spec: + logLevel: info + ... +``` + +- The CRD would be Namespace-scoped, living in the same Namespace as the controller that it applies to. +- CRD is initialized and created when NKG is deployed +- NKG references the name of this CRD via CLI arg, and only watches this CRD +- If user deletes resource, NKG logs an error and creates an event. Last state is used until CRD is restored. + +For discussion with team: + +- API group name +- kind name +- default resource name + +## Use Cases + +The high level use case for dynamically changing settings in the NKG control plane is to allow users to alter +behavior without the need for restarting NKG and experiencing downtime. + +For the specific log level use case, users may be experiencing problems with NKG that require more information to +diagnose. These problems could include: + +- Not seeing or processing Kubernetes resources that it should be. +- Configuring the data plane incorrectly based on the defined Kubernetes resources. +- Crashes or errors without enough detail. + +Being able to dynamically change the log level can allow for a quick way to obtain more information about +the state of the control plane without losing that state due to a required restart. + +## Testing + +Unit tests can be leveraged for verifying that NKG properly watches and acts on CRD changes. These tests would +be similar in behavior as the current unit tests that verify Gateway API resource processing. + +## Security Considerations + +We need to ensure that any configurable fields that are exposed to a user are: + +- Properly validated. This means that the fields should be the correct type (integer, string, etc.), have appropriate +length, and use regex patterns or enums to prevent any unwanted input. +- Have a valid use case. The more fields we expose, the more attack vectors we create. We should only be exposing +fields that are genuinely useful for a user to change dynamically. + +RBAC via the Kubernetes API server will ensure that only authorized users can update the CRD containing NKG control +plane configuration. + +## Alternatives + +- ConfigMap +A ConfigMap is another type of resource that a user can provide configuration options within, however it lacks the +benefits of a CRD, specifically built-in schema validation, versioning, and conversion webhooks. + +- Custom API server +The NKG control plane could implement its own custom API server. However the overhead of implementing this, which +would include auth, validation, endpoints, and so on, would not be worth it due to the fact that the Kubernetes +API server already does all of these things for us. + +## References + +- [Kubernetes Custom Resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) From 096a020a446a96a0083b62e86fcfe99f2eb6ed2b Mon Sep 17 00:00:00 2001 From: Saylor Berman Date: Thu, 3 Aug 2023 13:43:59 -0600 Subject: [PATCH 2/4] Address review feedback --- docs/proposals/control-plane-config.md | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/docs/proposals/control-plane-config.md b/docs/proposals/control-plane-config.md index 55eebcf74c..2ffba5f946 100644 --- a/docs/proposals/control-plane-config.md +++ b/docs/proposals/control-plane-config.md @@ -25,6 +25,9 @@ limited to, log level, tracing, or metrics. For the best user experience, these changed at runtime, to avoid having to restart NKG. The first option that we will allow users to configure is the log level. The easiest and most intuitive way to implement a Kubernetes-native API is through a CRD. +In this doc, the term "user" will refer to the cluster operator (the person who installs and manages NKG). The +cluster operator owns this CRD resource. + ## API, Customer Driven Interfaces, and User Experience The API would be provided in a CRD. An authorized user would interact with this CRD using `kubectl` to `get` @@ -33,20 +36,24 @@ or `edit` the configuration. Proposed configuration CRD example: ```yaml -apiVersion: nginx.gateway.k8s.io/v1beta1 +apiVersion: nginx.gateway.k8s.io/v1alpha1 kind: NGINXControlConfig metadata: name: nkg-config namespace: nginx-gateway spec: - logLevel: info + log: + level: info + ... +status: ... ``` - The CRD would be Namespace-scoped, living in the same Namespace as the controller that it applies to. -- CRD is initialized and created when NKG is deployed -- NKG references the name of this CRD via CLI arg, and only watches this CRD -- If user deletes resource, NKG logs an error and creates an event. Last state is used until CRD is restored. +- CRD is initialized and created when NKG is deployed. +- NKG references the name of this CRD via CLI arg, and only watches this CRD. If the resource doesn't exist, +then an error is logged and event created, and default values are used. +- If user deletes resource, NKG logs an error and creates an event. NKG will revert to default values. For discussion with team: @@ -79,7 +86,8 @@ be similar in behavior as the current unit tests that verify Gateway API resourc We need to ensure that any configurable fields that are exposed to a user are: - Properly validated. This means that the fields should be the correct type (integer, string, etc.), have appropriate -length, and use regex patterns or enums to prevent any unwanted input. +length, and use regex patterns or enums to prevent any unwanted input. This will initially be done through +OpenAPI schema validation. If necessary as the CRD evolves, CEL or webhooks could be used. - Have a valid use case. The more fields we expose, the more attack vectors we create. We should only be exposing fields that are genuinely useful for a user to change dynamically. From aa8dcbda9235b16fa5908724f1301ace8372b0ba Mon Sep 17 00:00:00 2001 From: Saylor Berman Date: Fri, 4 Aug 2023 08:16:31 -0600 Subject: [PATCH 3/4] More suggestions --- docs/proposals/control-plane-config.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/docs/proposals/control-plane-config.md b/docs/proposals/control-plane-config.md index 2ffba5f946..9f64657cf7 100644 --- a/docs/proposals/control-plane-config.md +++ b/docs/proposals/control-plane-config.md @@ -36,7 +36,7 @@ or `edit` the configuration. Proposed configuration CRD example: ```yaml -apiVersion: nginx.gateway.k8s.io/v1alpha1 +apiVersion: gateway.nginx.org/v1alpha1 kind: NGINXControlConfig metadata: name: nkg-config @@ -55,9 +55,13 @@ status: then an error is logged and event created, and default values are used. - If user deletes resource, NKG logs an error and creates an event. NKG will revert to default values. +This resource won't be referenced in the `parametersRef` of the GatewayClass, reserving that option for a data +plane CRD. The control plane may end up supporting multiple GatewayClasses, so linking the control CRD to a +GatewayClass wouldn't make sense. Referencing the CRD via a CLI argument ensures we only support one instance of +the CRD per control plane. + For discussion with team: -- API group name - kind name - default resource name From ad27bd7200572c3b839b969d7e46b3232dbfec54 Mon Sep 17 00:00:00 2001 From: Saylor Berman Date: Mon, 7 Aug 2023 11:15:27 -0600 Subject: [PATCH 4/4] More name changes --- docs/proposals/control-plane-config.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/docs/proposals/control-plane-config.md b/docs/proposals/control-plane-config.md index 9f64657cf7..954f2c41bc 100644 --- a/docs/proposals/control-plane-config.md +++ b/docs/proposals/control-plane-config.md @@ -37,22 +37,23 @@ Proposed configuration CRD example: ```yaml apiVersion: gateway.nginx.org/v1alpha1 -kind: NGINXControlConfig +kind: NginxGateway metadata: - name: nkg-config + name: nginx-gateway-config namespace: nginx-gateway spec: - log: + logging: level: info ... status: + conditions: ... ``` - The CRD would be Namespace-scoped, living in the same Namespace as the controller that it applies to. - CRD is initialized and created when NKG is deployed. -- NKG references the name of this CRD via CLI arg, and only watches this CRD. If the resource doesn't exist, -then an error is logged and event created, and default values are used. +- NKG references the name of this CRD via CLI arg (`--nginx-gateway-config-name`), and only watches this CRD. + If the resource doesn't exist, then an error is logged and event created, and default values are used. - If user deletes resource, NKG logs an error and creates an event. NKG will revert to default values. This resource won't be referenced in the `parametersRef` of the GatewayClass, reserving that option for a data