From 2afd8f0482402743c5c1658ab9903197b554ced0 Mon Sep 17 00:00:00 2001 From: Saylor Berman Date: Fri, 31 May 2024 10:36:33 -0600 Subject: [PATCH 1/3] Extend troubleshooting doc Problem: As a user, I want to know how to collect info to diagnose and get support when failures occur. Solution: Extend the troubleshooting doc to contain info about collecting status, events, and logs. --- .../how-to/monitoring/troubleshooting.md | 70 ++++++++++++++++++- 1 file changed, 68 insertions(+), 2 deletions(-) diff --git a/site/content/how-to/monitoring/troubleshooting.md b/site/content/how-to/monitoring/troubleshooting.md index 8b8f4c631e..fc8445ca2f 100644 --- a/site/content/how-to/monitoring/troubleshooting.md +++ b/site/content/how-to/monitoring/troubleshooting.md @@ -9,7 +9,73 @@ docs: "DOCS-1419" This topic describes possible issues users might encounter when using NGINX Gateway Fabric. When possible, suggested workarounds are provided. -### NGINX fails to reload +### **General Troubleshooting** + +When attempting to diagnose a problem or get support, there are a few important data points that can be collected to help with understanding what issues may exist. + +#### Resource Status + +To get the status of a resource, use `kubectl describe`. For example, to check the status of the `coffee` HTTPRoute: + +```shell +kubectl describe httproutes.gateway.networking.k8s.io coffee [-n namespace] +``` + +```text +... 
+Status: + Parents: + Conditions: + Last Transition Time: 2024-05-31T16:22:26Z + Message: The route is accepted + Observed Generation: 1 + Reason: Accepted + Status: True + Type: Accepted + Last Transition Time: 2024-05-31T16:22:26Z + Message: All references are resolved + Observed Generation: 1 + Reason: ResolvedRefs + Status: True + Type: ResolvedRefs + Controller Name: gateway.nginx.org/nginx-gateway-controller + Parent Ref: + Group: gateway.networking.k8s.io + Kind: Gateway + Name: gateway + Namespace: default + Section Name: http +``` + +If a resource has any errors relating to its configuration or relation to other resources, it is likely that those errors will be contained within the status. + +#### Events + +Events may be created by NGINX Gateway Fabric or other Kubernetes components that could indicate system or configuration issues. To see events: + +```shell +kubectl get events [-n namespace] +``` + +#### Logs + +Logs of the NGINX Gateway Fabric control plane and data plane can contain information that isn't otherwise reported in status or events. These could include errors in processing or passing traffic. + +To see logs for the control plane container (replacing the name of the deployment if necessary): + +```shell +kubectl -n nginx-gateway logs deployments/nginx-gateway-fabric -c nginx-gateway +``` + +To see logs for the data plane container (replacing the name of the deployment if necessary): + +```shell +kubectl -n nginx-gateway logs deployments/nginx-gateway-fabric -c nginx +``` + +You can also see the logs of a container that has crashed or been killed, by specifying the `-p` flag with the above commands. + +### **NGINX fails to reload** #### Description @@ -24,7 +90,7 @@ To resolve this issue you will need to set `allowPrivilegeEscalation` to `true`. - If using Helm, you can set the `nginxGateway.securityContext.allowPrivilegeEscalation` value. 
- If using the manifests directly, you can update this field under the `nginx-gateway` container's `securityContext`. -### Usage Reporting errors +### **Usage Reporting errors** #### Description From c66e51ac724f451a6711cd82d4de341f994da29b Mon Sep 17 00:00:00 2001 From: Saylor Berman Date: Fri, 31 May 2024 11:22:36 -0600 Subject: [PATCH 2/3] Address code review --- .../how-to/monitoring/troubleshooting.md | 46 +++++++++++-------- 1 file changed, 27 insertions(+), 19 deletions(-) diff --git a/site/content/how-to/monitoring/troubleshooting.md b/site/content/how-to/monitoring/troubleshooting.md index fc8445ca2f..9fdddb27fd 100644 --- a/site/content/how-to/monitoring/troubleshooting.md +++ b/site/content/how-to/monitoring/troubleshooting.md @@ -9,13 +9,13 @@ docs: "DOCS-1419" This topic describes possible issues users might encounter when using NGINX Gateway Fabric. When possible, suggested workarounds are provided. -### **General Troubleshooting** +### General Troubleshooting When attempting to diagnose a problem or get support, there are a few important data points that can be collected to help with understanding what issues may exist. -#### Resource Status +##### Resource Status -To get the status of a resource, use `kubectl describe`. For example, to check the status of the `coffee` HTTPRoute: +To get the status of a resource, use `kubectl describe`. 
For example, to check the status of the `coffee` HTTPRoute, which has an error: ```shell kubectl describe httproutes.gateway.networking.k8s.io coffee [-n namespace] @@ -26,17 +26,17 @@ kubectl describe httproutes.gateway.networking.k8s.io coffee [-n namespace] Status: Parents: Conditions: - Last Transition Time: 2024-05-31T16:22:26Z + Last Transition Time: 2024-05-31T17:20:51Z Message: The route is accepted - Observed Generation: 1 + Observed Generation: 4 Reason: Accepted Status: True Type: Accepted - Last Transition Time: 2024-05-31T16:22:26Z - Message: All references are resolved - Observed Generation: 1 - Reason: ResolvedRefs - Status: True + Last Transition Time: 2024-05-31T17:20:51Z + Message: spec.rules[0].backendRefs[0].name: Not found: "bad-backend" + Observed Generation: 4 + Reason: BackendNotFound + Status: False Type: ResolvedRefs Controller Name: gateway.nginx.org/nginx-gateway-controller Parent Ref: @@ -47,9 +47,9 @@ Status: Section Name: http ``` -If a resource has any errors relating to its configuration or relation to other resources, it is likely that those errors will be contained within the status. +If a resource has any errors relating to its configuration or relation to other resources, it is likely that those errors will be contained within the status. The `ObservedGeneration` in the status should match the `ObservedGeneration` of the resource. Otherwise, this could mean that the resource wasn't processed yet or the status failed to update. -#### Events +##### Events Events may be created by NGINX Gateway Fabric or other Kubernetes components that could indicate system or configuration issues. 
To see events: @@ -57,25 +57,33 @@ Events may be created by NGINX Gateway Fabric or other Kubernetes components tha kubectl get events [-n namespace] ``` -#### Logs +For example, this warning event is created when the NginxGateway configuration resource is deleted: + +```text +kubectl -n nginx-gateway get event +LAST SEEN TYPE REASON OBJECT MESSAGE +5s Warning ResourceDeleted nginxgateway/ngf-config NginxGateway configuration was deleted; using defaults +``` + +##### Logs Logs of the NGINX Gateway Fabric control plane and data plane can contain information that isn't otherwise reported in status or events. These could include errors in processing or passing traffic. -To see logs for the control plane container (replacing the name of the deployment if necessary): +To see logs for the control plane container: ```shell -kubectl -n nginx-gateway logs deployments/nginx-gateway-fabric -c nginx-gateway +kubectl -n nginx-gateway logs -c nginx-gateway ``` -To see logs for the data plane container (replacing the name of the deployment if necessary): +To see logs for the data plane container: ```shell -kubectl -n nginx-gateway logs deployments/nginx-gateway-fabric -c nginx +kubectl -n nginx-gateway logs -c nginx ``` You can also see the logs of a container that has crashed or been killed, by specifying the `-p` flag with the above commands. -### **NGINX fails to reload** +### NGINX fails to reload #### Description @@ -90,7 +98,7 @@ To resolve this issue you will need to set `allowPrivilegeEscalation` to `true`. - If using Helm, you can set the `nginxGateway.securityContext.allowPrivilegeEscalation` value. - If using the manifests directly, you can update this field under the `nginx-gateway` container's `securityContext`. 
-### **Usage Reporting errors** +### Usage Reporting errors #### Description From 7b49170ca4693a569318a6b45be361f1781c4160 Mon Sep 17 00:00:00 2001 From: Saylor Berman Date: Tue, 4 Jun 2024 08:22:27 -0600 Subject: [PATCH 3/3] Apply suggestions from code review Co-authored-by: Alan Dooley --- .../content/how-to/monitoring/troubleshooting.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/site/content/how-to/monitoring/troubleshooting.md b/site/content/how-to/monitoring/troubleshooting.md index 9fdddb27fd..fd85dc775a 100644 --- a/site/content/how-to/monitoring/troubleshooting.md +++ b/site/content/how-to/monitoring/troubleshooting.md @@ -9,13 +9,13 @@ docs: "DOCS-1419" This topic describes possible issues users might encounter when using NGINX Gateway Fabric. When possible, suggested workarounds are provided. -### General Troubleshooting +### General troubleshooting -When attempting to diagnose a problem or get support, there are a few important data points that can be collected to help with understanding what issues may exist. +When investigating a problem or requesting help, there are important data points that can be collected to help understand what issues may exist. -##### Resource Status +##### Resource status -To get the status of a resource, use `kubectl describe`. For example, to check the status of the `coffee` HTTPRoute, which has an error: +To check the status of a resource, use `kubectl describe`. This example checks the status of the `coffee` HTTPRoute, which has an error: ```shell kubectl describe httproutes.gateway.networking.k8s.io coffee [-n namespace] @@ -47,11 +47,11 @@ Status: Section Name: http ``` -If a resource has any errors relating to its configuration or relation to other resources, it is likely that those errors will be contained within the status. The `ObservedGeneration` in the status should match the `ObservedGeneration` of the resource. 
Otherwise, this could mean that the resource wasn't processed yet or the status failed to update. +If a resource has errors relating to its configuration or relationship to other resources, they can likely be read in the status. The `ObservedGeneration` in the status should match the `Generation` in the resource's metadata (`metadata.generation`). Otherwise, this could mean that the resource hasn't been processed yet or that the status failed to update. ##### Events -Events may be created by NGINX Gateway Fabric or other Kubernetes components that could indicate system or configuration issues. To see events: +Events created by NGINX Gateway Fabric or other Kubernetes components could indicate system or configuration issues. To see events: ```shell kubectl get events [-n namespace] @@ -67,7 +67,7 @@ LAST SEEN TYPE REASON OBJECT ##### Logs -Logs of the NGINX Gateway Fabric control plane and data plane can contain information that isn't otherwise reported in status or events. These could include errors in processing or passing traffic. +Logs from the NGINX Gateway Fabric control plane and data plane can contain information that isn't available in status or events. These can include errors in processing or passing traffic. To see logs for the control plane container: @@ -81,7 +81,7 @@ To see logs for the data plane container: kubectl -n nginx-gateway logs -c nginx ``` -You can see logs for a crashed or killed container by adding the `-p` flag to the above commands. +You can see logs for a crashed or killed container by adding the `-p` flag to the above commands. ### NGINX fails to reload
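
The three data points this series documents (resource status, events, and logs) can be gathered in one pass before asking for help. The following is a minimal sketch and not part of the patch: the function name `collect_ngf_diagnostics`, the output file layout, and the `KUBECTL` override are hypothetical, and it assumes the `nginx-gateway` namespace and the `deployments/nginx-gateway-fabric` target that appear in the first patch's log commands. Adjust both if your installation uses different names.

```shell
#!/usr/bin/env bash
# Sketch: collect status, events, and logs into one directory.
# Assumptions (not from the patch): the function name, the file layout,
# the KUBECTL override, and the default namespace/deployment names below.

KUBECTL="${KUBECTL:-kubectl}"

collect_ngf_diagnostics() {
  local route="$1"      # HTTPRoute to describe, for example: coffee
  local route_ns="$2"   # namespace of the HTTPRoute
  local out_dir="$3"    # directory that receives the collected files
  local ngf_ns="${4:-nginx-gateway}"
  local deploy="${5:-deployments/nginx-gateway-fabric}"

  mkdir -p "$out_dir" || return 1

  # Resource status: conditions are listed under Status in the describe output.
  "$KUBECTL" describe httproutes.gateway.networking.k8s.io "$route" -n "$route_ns" \
    > "$out_dir/route-status.txt" 2>&1

  # Events from the route's namespace and from the NGINX Gateway Fabric namespace.
  "$KUBECTL" get events -n "$route_ns" > "$out_dir/events-route-ns.txt" 2>&1
  "$KUBECTL" get events -n "$ngf_ns" > "$out_dir/events-ngf-ns.txt" 2>&1

  # Logs for the control plane (nginx-gateway) and data plane (nginx) containers.
  local container
  for container in nginx-gateway nginx; do
    "$KUBECTL" -n "$ngf_ns" logs "$deploy" -c "$container" \
      > "$out_dir/logs-$container.txt" 2>&1
    # -p asks for the previous container instance; it fails when there is none,
    # so the error text is kept in the file and the failure is ignored.
    "$KUBECTL" -n "$ngf_ns" logs "$deploy" -c "$container" -p \
      > "$out_dir/logs-$container-previous.txt" 2>&1 || true
  done
}
```

After sourcing the script, `collect_ngf_diagnostics coffee default ./ngf-diagnostics` writes one file per data point. Errors from `kubectl` are captured in the same files, so a missing resource or container is visible in the output rather than silently dropped.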