Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/increase nginx timeout #777

Merged
merged 3 commits into from
Jun 22, 2023

Conversation

kate-osborn
Copy link
Contributor

Proposed changes

Problems:

  1. The conformance tests consistently fail during setup because NKG times out waiting for the nginx process to start.
  2. The conformance test makefile command runs the incorrect set of tests and times out before running all tests.

Solutions:

  1. Increase the pid timeout (i.e. the time we wait for nginx to start) to 10s. This is sufficient in my local environment.
  2. Update the list of supported features and exempt features and pass them to the conformance test suite, and increase the go test timeout to 25 minutes to allow the suite to complete. I also added the debug flag to increase the verbosity of the test output.

Testing: Ran the conformance tests locally. Confirmed that they will consistently pass setup and run to completion.

Checklist

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

@kate-osborn kate-osborn requested a review from a team as a code owner June 21, 2023 22:22
@github-actions github-actions bot added the bug Something isn't working label Jun 21, 2023
@kate-osborn kate-osborn self-assigned this Jun 21, 2023
Copy link
Contributor

@pleshakov pleshakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Increase the pid timeout (i.e. the time we wait for nginx to start) to 10s. This is sufficient in my local environment.

I wonder if the timeout also depends on the time it takes for Docker to download NGINX image. if that is the case, it might take more than 10s in different environments and perhaps pre-loading NGINX image can also help?

perhaps the big problem here is that NKG once it fails to reload, doesn't try to recover from it and reload again? (unless the config is re-generated for some reason)
if NKG supported retrying, it would be able to wait until NKG started and eventually fix the statuses. I think being able to fix that will be important not only for conformance tests but for production (since the timeout 10s might not be enough)

at the same time, approving this PR, since it fixes the issue for the conformance tests and the other changes also make sense 👍

@kate-osborn
Copy link
Contributor Author

Increase the pid timeout (i.e. the time we wait for nginx to start) to 10s. This is sufficient in my local environment.

I wonder if the timeout also depends on the time it takes for Docker to download NGINX image. if that is the case, it might take more than 10s in different environments and perhaps pre-loading NGINX image can also help?

perhaps the big problem here is that NKG once it fails to reload, doesn't try to recover from it and reload again? (unless the config is re-generated for some reason) if NKG supported retrying, it would be able to wait until NKG started and eventually fix the statuses. I think being able to fix that will be important not only for conformance tests but for production (since the timeout 10s might not be enough)

at the same time, approving this PR, since it fixes the issue for the conformance tests and the other changes also make sense 👍

perhaps pre-loading NGINX image can also help

I think that's a smart idea. We should do that in our conformance test pipeline job @ciarams87 @lucacome

perhaps the big problem here is that NKG once it fails to reload, doesn't try to recover from it and reload again?

Yep, we need to handle this. Do we have a ticket for this yet? If not, I can create one.

@pleshakov
Copy link
Contributor

@kate-osborn

Yep, we need to handle this. Do we have a ticket for this yet? If not, I can create one.

we have a related #664 but nothing about retrying reloads

@kate-osborn
Copy link
Contributor Author

@kate-osborn

Yep, we need to handle this. Do we have a ticket for this yet? If not, I can create one.

we have a related #664 but nothing about retrying reloads

Updated the ticket to include retries

@kate-osborn kate-osborn merged commit 51f59d7 into nginxinc:main Jun 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants