improve process is alive detection #1361
@elfenpiff This is related to both #611 and #620. We should follow RAII for the resources of the app. I suppose a hierarchical structure as sketched in the
@mossmaurice Loosely related, but the problem here is not the handling of shared memory resources. RouDi falsely assumes that an application has died because high CPU load prevented that application from sending its heartbeat within the required time frame.
@elfenpiff I think it is possible to use a pipe or a stream socket. AFAIK, when the writing end of a pipe/stream socket gets closed, the process with the receiving end would get a
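A minimal sketch of the pipe idea, assuming a POSIX system. It relies on a documented pipe property: once every write end of a pipe is closed, read() on the read end returns 0 (end of file), and the kernel closes all descriptors of a process when it terminates, whether it exits normally or crashes. Here a forked child stands in for an application holding the write end, and the parent stands in for RouDi. waitForPeerExit and demoPipeExitDetection are hypothetical names, not iceoryx API.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: blocks until every write end of the pipe is closed.
// This happens at the latest when the process owning the write end terminates,
// because the kernel closes all of its file descriptors.
// Returns true if EOF was observed, false on a read error.
bool waitForPeerExit(int readFd)
{
    assert(readFd >= 0);
    char buffer;
    while (true)
    {
        const ssize_t n = read(readFd, &buffer, 1);
        if (n == 0)
        {
            return true; // EOF: no writer left, the peer has exited (or closed its end)
        }
        if (n < 0 && errno != EINTR)
        {
            return false; // real error; EINTR just retries
        }
        // n > 0: the peer wrote data, which this sketch ignores; keep waiting
    }
}

// Hypothetical demo: a child holds the write end and exits,
// the parent detects the exit through EOF on the read end.
bool demoPipeExitDetection()
{
    int fds[2];
    if (pipe(fds) != 0)
    {
        return false;
    }
    const pid_t child = fork();
    if (child < 0)
    {
        return false;
    }
    if (child == 0)
    {
        close(fds[0]); // child keeps only the write end
        _exit(0);      // terminating closes the write end implicitly
    }
    close(fds[1]);     // parent must drop its own copy, otherwise EOF never arrives
    const bool detected = waitForPeerExit(fds[0]);
    close(fds[0]);
    waitpid(child, nullptr, 0); // reap the child
    return detected;
}
```

A monitor serving many applications would wait on all read ends at once with poll() instead of blocking in a single read(). One caveat of this design: if the application forks and its child inherits the write end, EOF only arrives once that child has closed it too.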
…essage handle thread, remove isMonitored and handle it directly in roudi Signed-off-by: Christian Eltzschig
…n background threads as well as the keepalive background thread of applications. Deactivated roudi monitoring as well Signed-off-by: Christian Eltzschig
Signed-off-by: Christian Eltzschig
…classes Signed-off-by: Christian Eltzschig
…hread and always disable monitoring in roudi Signed-off-by: Christian Eltzschig
Some info shared from my side about monitor mode:
…ory-for-process-alive-detection iox-#1361 Use shared memory for process alive detection
Brief feature description
Under high CPU load, the heartbeat thread may not get to send its heartbeats within the given time frame. RouDi then cleans up all resources of the application that missed the heartbeat, which may lead to the application using resources that have already been deleted.
The solution should be as efficient as possible and, if possible, avoid context switches and sending messages.
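One way to meet the efficiency requirement, sketched under assumptions: each application owns a heartbeat slot in shared memory and periodically stores a timestamp into it, while RouDi only loads that value and compares it against a timeout. A beat is then a single atomic store, with no syscall, no message and no wake-up of RouDi. HeartbeatSlot, beat, isAlive and monotonicNowNs are hypothetical names. The sketch also assumes that std::atomic&lt;std::uint64_t&gt; is lock-free, so that it can be shared across processes, and that std::chrono::steady_clock is system-wide, as on Linux where it is backed by CLOCK_MONOTONIC.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical heartbeat slot. In a real setup one slot per application would
// live in a shared memory segment mapped by both the application and RouDi.
struct HeartbeatSlot
{
    static_assert(std::atomic<std::uint64_t>::is_always_lock_free,
                  "heartbeat value must be lock-free to be shared across processes");

    // Timestamp of the last beat in nanoseconds; 0 means "never beaten".
    std::atomic<std::uint64_t> lastBeatNs{0};

    // Application side: one atomic store, no syscall, no message.
    void beat(std::uint64_t nowNs)
    {
        lastBeatNs.store(nowNs, std::memory_order_relaxed);
    }

    // Monitor side: one atomic load plus a comparison against the timeout.
    bool isAlive(std::uint64_t nowNs, std::uint64_t timeoutNs) const
    {
        assert(timeoutNs > 0 && "a zero timeout would report nearly every process as dead");
        const std::uint64_t last = lastBeatNs.load(std::memory_order_relaxed);
        // nowNs can be older than the last beat if the monitor read the clock first
        return nowNs <= last || nowNs - last <= timeoutNs;
    }
};

// Monotonic "now" in nanoseconds; steady_clock does not jump with wall-clock changes.
std::uint64_t monotonicNowNs()
{
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now().time_since_epoch())
            .count());
}
```

This only makes each beat cheap. A timeout is still involved, so a heartbeat thread starved under high CPU load can still miss it; the timeout has to be chosen generously, or the heartbeat has to be combined with an operating-system level check on the process itself.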
One approach could be to use getpgid, which returns the process group ID of a given PID and fails if no process with that PID exists. Since PIDs can be reused once a process has exited, the PID alone does not identify a process; if we could couple it with the process runtime or creation time, we could identify a process and check whether it is still alive.
Relates to #1380
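A sketch of the getpgid idea combined with the process creation time, assuming Linux and its /proc filesystem: field 22 of /proc/&lt;pid&gt;/stat holds the process start time in clock ticks since boot, so RouDi could record it when an application registers and compare it later. A matching start time rules out an unrelated process that happens to reuse the PID. readStartTime and isSameProcessAlive are hypothetical helpers, not existing iceoryx functions.

```cpp
#include <cassert>
#include <cerrno>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (Linux-specific): reads field 22 (starttime) of
// /proc/<pid>/stat. Field 2 (comm) is in parentheses and may contain spaces,
// so parsing starts after the last ')'.
std::optional<unsigned long long> readStartTime(pid_t pid)
{
    std::ifstream file("/proc/" + std::to_string(pid) + "/stat");
    std::string content;
    if (!file || !std::getline(file, content))
    {
        return std::nullopt; // no such process, or /proc not available
    }
    const auto closingParen = content.rfind(')');
    if (closingParen == std::string::npos)
    {
        return std::nullopt;
    }
    std::istringstream rest(content.substr(closingParen + 1));
    std::string skipped;
    // The first token after ')' is field 3 (state); skip fields 3 to 21.
    for (int field = 3; field <= 21; ++field)
    {
        if (!(rest >> skipped))
        {
            return std::nullopt;
        }
    }
    unsigned long long startTime = 0;
    if (!(rest >> startTime))
    {
        return std::nullopt;
    }
    return startTime;
}

// Hypothetical check: the PID must exist (getpgid does not fail with ESRCH)
// and must still belong to the same process (start time recorded at registration).
bool isSameProcessAlive(pid_t pid, unsigned long long recordedStartTime)
{
    if (getpgid(pid) == -1 && errno == ESRCH)
    {
        return false; // no process with this PID exists
    }
    const auto current = readStartTime(pid);
    return current.has_value() && *current == recordedStartTime;
}
```

One limitation: a process that has exited but has not yet been reaped by its parent (a zombie) still has a PID and a /proc entry, so it passes this check; its state field (field 3, 'Z') would have to be inspected as well. Other platforms would need a different source for the start time.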