@rebeccahhh
Member

@rebeccahhh rebeccahhh commented Jul 9, 2025

SUMMARY

Not timing out in the WSGI server before the proxy does undercuts the usefulness of our log-traceback-middleware in django-ansible-base, which logs a traceback for requests that get timed out, because uwsgi or gunicorn has to send the timeout signal to the worker handling the request. It also leads to requests that envoy has already timed out filling up the worker queues of the components.

Also, configure nginx to return a 503 if the WSGI server doesn't respond.

ADDITIONAL INFORMATION

co-authored by: @kdelee

Contributor

@dsavineau dsavineau left a comment

I think you removed a } character from the nginx configuration:

nginx: [emerg] unexpected end of file, expecting "}" in /etc/nginx/nginx.conf:92

I think you also need to update the CRD, otherwise users won't be able to customize the timeout value.

@rebeccahhh
Member Author

Apologies folks, I meant to set this as a draft.

@rebeccahhh
Member Author

[Screenshot from 2025-07-23 14-43-27]

The gunicorn_timeout value is now computed, but it can still be overridden in the CR. In these screenshots, 10 is the computed value and 120 is my override from galaxy-cr.yml.

[Screenshot from 2025-07-23 14-59-01]
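
For illustration only, the "computed default that the CR can still override" behavior described in this comment could be sketched as below. This is not the operator's actual logic: the function name, the subtraction of a grace period from the proxy timeout, and the inputs 30 and 20 are all assumptions, chosen only so the result lines up with the 10 and 120 mentioned here.

```python
from typing import Optional


def effective_gunicorn_timeout(
    client_request_timeout: int,
    grace_period: int,
    cr_override: Optional[int] = None,
) -> int:
    """Return the worker timeout in seconds.

    Hypothetical rule: the worker must time out before the proxy does,
    so the default is the proxy timeout minus a grace period. An explicit
    value from the CR always wins over the computed default.
    """
    if cr_override is not None:
        # Explicit value set by the user in the CR.
        return cr_override
    computed = client_request_timeout - grace_period
    if computed <= 0:
        raise ValueError("grace period must be smaller than the proxy timeout")
    return computed


# Assumed inputs, not taken from the operator's defaults.
print(effective_gunicorn_timeout(30, 20))
print(effective_gunicorn_timeout(30, 20, cr_override=120))
```

With these assumed inputs the first call yields the computed default and the second yields the override, which matches the two values visible in the screenshots.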

@rebeccahhh
Member Author

[Screenshot from 2025-07-23 15-06-25] [Screenshot from 2025-07-23 15-03-48]

This is not the best view of the API, but the expanded view ran into some OpenShift UI difficulties. I chose not to address them, because the point was that the API was successfully curled, not whether OpenShift's UI was working.

@rebeccahhh rebeccahhh force-pushed the timeout_api_before_proxy branch from 16d3a49 to 3fb99a2 Compare July 23, 2025 19:21
@rebeccahhh rebeccahhh requested a review from dsavineau July 23, 2025 19:30
@rebeccahhh
Member Author

The new 503 pattern is applied in the configmap.

[Screenshot from 2025-07-23 15-35-33] [Screenshot from 2025-07-23 15-35-23]

@rebeccahhh
Member Author

Confirmed I was able to sync and download a collection from my galaxy with these changes in play.

rebeccahhh@lappytoppy:~/ansible$ ansible-galaxy collection install awx.awx -vvv --force -c
ansible-galaxy [core 2.16.6]
  config file = /home/rebeccahhh/ansible/ansible.cfg
  configured module search path = ['/home/rebeccahhh/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/rebeccahhh/.local/lib/python3.12/site-packages/ansible
  ansible collection location = /home/rebeccahhh/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/rebeccahhh/.local/bin/ansible-galaxy
  python version = 3.12.7 (main, Oct  1 2024, 00:00:00) [GCC 13.3.1 20240913 (Red Hat 13.3.1-3)] (/usr/bin/python3)
  jinja version = 3.1.4
  libyaml = True
Using /home/rebeccahhh/ansible/ansible.cfg as config file
Starting galaxy collection install process
Found installed collection awx.awx:23.2.0 at '/home/rebeccahhh/.ansible/collections/ansible_collections/awx/awx'
Found installed collection flowerysong.hvault:0.2.0 at '/home/rebeccahhh/.ansible/collections/ansible_collections/flowerysong/hvault'
Found installed collection kubernetes.core:5.0.0 at '/home/rebeccahhh/.ansible/collections/ansible_collections/kubernetes/core'
Found installed collection redhat_performance.aap_perfscale:1.0.0 at '/home/rebeccahhh/.ansible/collections/ansible_collections/redhat_performance/aap_perfscale'
Found installed collection community.docker:3.4.9 at '/home/rebeccahhh/.ansible/collections/ansible_collections/community/docker'
Process install dependency map
Opened /home/rebeccahhh/.ansible/galaxy_token
Starting collection install process
Downloading https://galaxy-rhunter.apps-crc.testing/api/galaxy/v3/plugin/ansible/content/community/collections/artifacts/awx-awx-24.6.1.tar.gz to /home/rebeccahhh/.ansible/tmp/ansible-local-51440o9mfpgdo/tmpcf534wnv/awx-awx-24.6.1-_xh354fc
Collection 'awx.awx:24.6.1' obtained from server community https://galaxy-rhunter.apps-crc.testing/api/galaxy/content/community/
Installing 'awx.awx:24.6.1' to '/home/rebeccahhh/.ansible/collections/ansible_collections/awx/awx'
awx.awx:24.6.1 was installed successfully

@rebeccahhh rebeccahhh requested a review from fao89 July 23, 2025 20:09
@rebeccahhh
Member Author

@fao89 @dsavineau I have addressed your comments and moved this out of draft. It's ready for your review. :)

@rebeccahhh rebeccahhh force-pushed the timeout_api_before_proxy branch from 3fb99a2 to 33e483a Compare July 23, 2025 21:06
Contributor

@dsavineau dsavineau left a comment

I don't see changes in the CRD related to client_request_timeout, so one won't be able to update that value.

@rebeccahhh rebeccahhh force-pushed the timeout_api_before_proxy branch from 33e483a to 2e5e66c Compare July 29, 2025 19:18
@rebeccahhh rebeccahhh requested review from dsavineau and fao89 July 29, 2025 19:27
@rebeccahhh
Member Author

@dsavineau yes, that is intentional. Based on a conversation with @rooftopcellist, in order to increase that value we'd have to add the ability to set annotations on routes, which we currently don't have.

Contributor

@dsavineau dsavineau left a comment

Looks good to me, but I just want to point out that GUNICORN_TIMEOUT_GRACE_PERIOD won't be honored for the api and content pods with the current modification.

There's a mapping within the start-api and start-content-app entrypoints that translates environment variables to gunicorn CLI options, and the timeout grace period isn't one of them:

https://github.com/ansible/galaxy_ng/blob/main/docker/bin/start-api
https://github.com/ansible/galaxy_ng/blob/main/docker/bin/start-content-app
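
As a rough illustration of why an unmapped variable has no effect, here is a minimal Python sketch of the general pattern. It does not reproduce the linked entrypoints: the `ENV_TO_CLI` table, both function names, and the choice of mapped variables are hypothetical. Only `--timeout` and `--workers` are real gunicorn options.

```python
from typing import Dict, List, Mapping

# Hypothetical mapping table; the real entrypoints define their own list.
ENV_TO_CLI: Dict[str, str] = {
    "GUNICORN_TIMEOUT": "--timeout",
    "GUNICORN_WORKERS": "--workers",
}


def build_gunicorn_args(env: Mapping[str, str]) -> List[str]:
    """Translate known environment variables into gunicorn CLI arguments."""
    args: List[str] = []
    for name, flag in ENV_TO_CLI.items():
        value = env.get(name)
        if value:
            args += [flag, value]
    return args


def unmapped_gunicorn_vars(env: Mapping[str, str]) -> List[str]:
    """List GUNICORN_* variables that are set but never reach the CLI."""
    return sorted(
        name
        for name in env
        if name.startswith("GUNICORN_") and name not in ENV_TO_CLI
    )


env = {"GUNICORN_TIMEOUT": "10", "GUNICORN_TIMEOUT_GRACE_PERIOD": "2"}
print(build_gunicorn_args(env))
print(unmapped_gunicorn_vars(env))
```

In this sketch, setting the grace-period variable is harmless but inert until a matching entry is added to the mapping.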

@kdelee
Member

kdelee commented Jul 31, 2025

@dsavineau the plan is to set the env var GUNICORN_TIMEOUT_GRACE_PERIOD now, as it's harmless, and follow up with changes so that it is actually honored.
Looks like you need to dismiss your previous review requesting changes (we don't have that option given our permissions on this repo).

@dsavineau dsavineau self-requested a review July 31, 2025 17:35
@rooftopcellist rooftopcellist merged commit ff803c2 into ansible:main Aug 1, 2025
6 checks passed