NGF doesn't wait long enough for new NGINX workers to start #1106
Labels: bug, refined, size/extra-small
Describe the bug
Got many reload errors in NGF during a longevity test over 4 days:
However, there were no errors about reload problems with the NGINX config.
Also note that the timeout we use for checking for new workers is 1s (nginx-gateway-fabric/internal/mode/static/nginx/runtime/manager.go, line 21 in 72b6c6e), while the timeout for checking for reload by sending a request is 60s (nginx-gateway-fabric/internal/mode/static/nginx/runtime/manager.go, line 22 in 72b6c6e).
To Reproduce
It is hard to reproduce under normal conditions. Overloading the CPU of the node where NGF is running should help, because it delays the start of new NGINX worker processes.
Expected behavior
NGF should not give up on new workers after only 1s; that is too soon.
Perhaps it is better to have a single timeout for the whole reload operation (the Reload() method of runtime.Manager), rather than individual timeouts.
Your environment
NGF:
Kubernetes:
In my environment, all successful reloads finished in less than 5s: