Fix broken symlink python3 and unmet dependencies warnings. #124
Merged (+16 −7)
Conversation
Co-authored-by: peppepetra [email protected]
agileshaw approved these changes on May 5, 2023
LGTM
aieri approved these changes on May 5, 2023
LGTM
lathiat added a commit to lathiat/prometheus-openstack-exporter that referenced this pull request on Mar 27, 2024:
Currently, slow-running OpenStack API requests (either stuck connecting or still waiting for the actual response) from the periodic DataGatherer task will block the HTTPServer connections from being processed. Likewise, a stalled client of the HTTPServer (e.g. opening a telnet session and not sending a request) will block other HTTPServer connections from being processed and also block the DataGatherer task from running.

Observed Symptoms:
- Slow or failed prometheus requests
- Statistics not being updated as often as you would expect
- HTTP 500 responses and BrokenPipeError tracebacks being logged due to later trying to respond to prometheus clients which timed out and disconnected the socket

Cause: we intend to use the eventlet library for asynchronous non-blocking I/O, but not all of the running code correctly uses the patched or "green" versions of various standard libraries (e.g. socket). As a result, we sometimes block the other tasks from running.

Fix this by ensuring the entire program correctly uses eventlet and green-patched functions: import eventlet and call eventlet.patcher.monkey_patch() before importing any other modules.

== History Lesson ==

There have been several incorrect attempts to solve this and some related problems. To try to avoid any further such problems, I have comprehensively documented the historical issues and why those fixes have not worked below, both for my understanding and yours :)

1. eventlet implements asynchronous "non-blocking" socket I/O without any code changes to the application and without using real pthreads, by using co-operative "green threads" from the greenlet library. For this to work correctly, it needs to replace many standard libraries (e.g. socket, time, threading) with an alternative implementation. This applies both to our own code and to code within imported modules (e.g. novaclient). This does not happen automatically; you can find the full details at https://eventlet.readthedocs.io/en/latest/patching.html, but as a brief summary it can be done with 3 different methods (sketched in the code example after this commit message):
   - Explicitly importing all relevant modules from eventlet.green
   - Automatically during a single import with eventlet.patcher.import_patched
   - Automatically during future imports with eventlet.patcher.monkey_patch

2. The original Issue canonical#112 found that the process deadlocked with the following error: greenlet.error: cannot switch to a different thread. At the time, we used a native Python Thread for the DataGatherer class and separately used ForkingHTTPServer to allow both functions to operate simultaneously. We did not intend to use eventlet/green threads at all; however, the python-cinderclient library incorrectly imports eventlet.sleep, which results in sometimes using green threads accidentally, hence the error. We attempted to fix that in canonical#115 by importing the green version of threading.Thread explicitly. This avoided the "cannot switch to a different thread" issue by only using green threads and not mixing Python threads and green threads in the same process.

3. After merging canonical#115 it was found that the HTTPServer loop never co-operatively yielded to the DataGatherer's thread, so the stats were never updated. To fix this, canonical#116 imported the green versions of socket, asyncore and time and also littered a few sleep(0) calls around to force co-operative yielding at various points.

4. In canonical#124 we switched from ForkingHTTPServer to the normal HTTPServer because sometimes it would fork too many servers and hit the process or system-wide process limit. Though not noted elsewhere, when I reproduce this issue by connecting many clients using the tool `siege` to a server where I firewalled the nova API connections, I can see that all of those processes are defunct and not actually alive. This is most likely because the process is blocked and the calls to waitpid which would reap them never happen. Since we are not using the eventlet version of http.server.HTTPServer, without the forked model we now block any time we are handling a server request. Additionally, any time the DataGatherer green thread calls out through the OpenStack API libraries, it uses non-patched versions of socket/requests/urllib3 and also blocks the HTTPServer, which is now inside the same process.

== Testing ==

To test that we now have a working solution, you can:

1. Block access to the Nova API (which causes connect to hang for 120 seconds) using this firewall command:
   iptables -I OUTPUT -p tcp -m state --state NEW --dport 8774 -j DROP

2. Make many concurrent and repeated requests using siege:
   while true; do siege http://172.16.0.30:9183/metrics -t 5s -c 5 -d 0.1; done

When testing with these changes, I never see us block a server or client connection and all requests take a few milliseconds at most, whether or not the client requests are slow or we open a connection to the server that doesn't send a request.

Fixes: canonical#112, canonical#115, canonical#116, canonical#124, canonical#126
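For reference, here is a minimal sketch of the three patching approaches listed in point 1 of the history lesson above. It is not code from this repository: the modules patched (ftplib, http.server) are arbitrary illustrative choices, and a real program would pick one approach rather than mixing all three.

```python
import eventlet

# (1) Explicitly import green replacements for individual stdlib modules.
from eventlet.green import socket, threading, time  # cooperative versions

# (2) Patch a single module at import time; ftplib here is re-imported with
#     eventlet's green socket module injected into it.
ftplib = eventlet.patcher.import_patched("ftplib")

# (3) Globally monkey-patch the stdlib so every *later* import, including
#     imports made inside dependencies (e.g. novaclient), resolves to the
#     green versions. This is the approach the fix above settles on.
eventlet.patcher.monkey_patch()
import http.server  # imported after patching, so it uses green sockets
```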
lathiat added a commit to lathiat/prometheus-openstack-exporter that referenced this pull request on Jun 10, 2024:
Currently, slow-running OpenStack API requests (either stuck connecting or still waiting for the actual response) from the periodic DataGatherer task will block HTTPServer connections from being processed. Blocked HTTPServer connections will in turn block both other connections and the DataGatherer task.

Observed Symptoms:
- Slow or failed prometheus requests
- Statistics not being updated as often as you would expect
- HTTP 500 responses and BrokenPipeError tracebacks being logged due to later trying to respond to prometheus clients which timed out and disconnected the socket
- Hitting the forked process limit

This happens because the current code intends to use the eventlet library for asynchronous non-blocking I/O, but is not using it correctly. All code within the main application and all imported dependencies must import the special eventlet "green" versions of many python libraries (e.g. socket, time, threading, SimpleHTTPServer, etc), which yield to other green threads whenever they would otherwise block waiting for I/O or to sleep. Currently this does not always happen.

Fix this by importing eventlet and calling eventlet.patcher.monkey_patch() before importing any other modules. This automatically intercepts all future imports (including those inside dependencies) and loads the green versions of the relevant libraries (see the sketch below).

Documentation on how to correctly import eventlet can be found here: https://eventlet.readthedocs.io/en/latest/patching.html

A detailed and comprehensive analysis of the issue and the multiple previous attempts to fix it can be found in Issue canonical#130. If you intend to make further related changes to the use of eventlet, threads or forked processes, please read the detailed history lesson available there.

Fixes: canonical#130, canonical#126, canonical#124, canonical#116, canonical#115, canonical#112
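To illustrate the import ordering the fix relies on, here is a minimal, hypothetical sketch of an entry module. It is not the exporter's actual code: fake_gatherer and MetricsHandler are placeholders standing in for the real DataGatherer and request handler, and the port is arbitrary.

```python
# Hypothetical entry-module layout showing the fix described above:
# eventlet must be imported and monkey_patch() called before anything else,
# so that every later import (stdlib and OpenStack client libraries alike)
# resolves to the green, cooperative implementations.
import eventlet
eventlet.patcher.monkey_patch()  # patches socket, time, threading, select, ...

# Imported only after patching, so these use the green stdlib underneath.
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_gatherer():
    """Stand-in for the periodic DataGatherer loop."""
    while True:
        # A real gatherer would call the OpenStack APIs here; after the
        # monkey patch, that blocking I/O (represented by this green sleep)
        # yields instead of starving the HTTP server.
        time.sleep(30)


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"# placeholder metrics\n")


if __name__ == "__main__":
    # With patching in place this Thread is a green thread, and the server's
    # accept loop cooperates with it instead of blocking it.
    threading.Thread(target=fake_gatherer, daemon=True).start()
    HTTPServer(("", 9183), MetricsHandler).serve_forever()
```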