improve process is alive detection #1361

elfenpiff · 2022-05-18T18:00:38Z

Brief feature description

Under high CPU load, the heartbeat thread may fail to send its heartbeats within the given time frame. This can cause RouDi to clean up all resources of the application that missed the heartbeat, which may lead to the use of resources that have already been deleted.

The solution should be as efficient as possible and, if possible, avoid context switches and sending messages.
One approach could be to use getpgid, which returns the process group ID of a given PID. If no process with that PID exists, the call fails. Since PIDs can be reused, coupling this with the process runtime or creation time would let us identify a process uniquely and check whether it is still alive.

Relates

#1380

mossmaurice · 2022-05-19T07:59:21Z

@elfenpiff This is related to both #611 and #620. We should follow RAII for the resources of the app. I suppose a hierarchical structure as sketched in the .puml would allow easier handling of the resources in shared memory.

elfenpiff · 2022-05-19T08:39:08Z

@mossmaurice Only loosely related. The problem here is not the handling of shared memory resources.

RouDi falsely assumes that an application has died because the high CPU load prevented that application from sending its heartbeat within the required time frame.

elBoberido · 2022-05-19T09:14:53Z

@elfenpiff I think we could use a pipe or a stream socket. AFAIK, when the writing end of a pipe/stream socket gets closed (which also happens implicitly when the writing process dies, since the kernel closes all of its file descriptors), the process holding the receiving end gets a POLLHUP via poll.

…essage handle thread, remove isMonitored and handle it directly in roudi Signed-off-by: Christian Eltzschig <[email protected]>

…n background threads as well as the keepalive background thread of applications. Deactivated roudi monitoring as well Signed-off-by: Christian Eltzschig <[email protected]>

qclzdh · 2022-06-20T02:06:49Z

Some information from my side about the monitoring mode:
When the CPU load is high, there is a good chance that the keepalive message cannot be sent to RouDi within PROCESS_KEEP_ALIVE_TIMEOUT. We use "posix::FileLock::create(runtimeName);" to check whether the process has really died.

elfenpiff added the enhancement (New feature), refactoring (Refactor code without adding features) and technical debt (unclean code and design flaws) labels May 18, 2022

elfenpiff added a commit to ApexAI/iceoryx that referenced this issue Jun 3, 2022

iox-eclipse-iceoryx#1361 deactive calls to introspection classes

ed17d08

Signed-off-by: Christian Eltzschig <[email protected]>

elfenpiff added a commit to ApexAI/iceoryx that referenced this issue Jun 3, 2022

iox-eclipse-iceoryx#1361 deactive calls to the process introspection …

892b3be

…classes Signed-off-by: Christian Eltzschig <[email protected]>

elfenpiff added a commit to ApexAI/iceoryx that referenced this issue Jun 3, 2022

iox-eclipse-iceoryx#1361 Do not send keepalive message in keepalive t…

36e3a9c

…hread and always disable monitoring in roudi Signed-off-by: Christian Eltzschig <[email protected]>

elBoberido mentioned this issue Jun 20, 2022

Disable monitoring (keepalive messages) at compile time #1380

Closed

elBoberido mentioned this issue Aug 11, 2022

icehello example may happen bug when system time rollback #1292

Open

elBoberido mentioned this issue Sep 14, 2023

service discovery: terminated publishers and servers are not removed from the service registry #2027

Open

elBoberido mentioned this issue Oct 5, 2023

zero runtime memory allocation application #2040

Open

elBoberido added a commit that referenced this issue Oct 15, 2023

iox-#1361 Add 'FixedPositionContainer' to dust

e85b640

elBoberido added a commit that referenced this issue Oct 15, 2023

iox-#1361 Add tests for 'FixedPositionContainer'

de2b88f

elBoberido added a commit that referenced this issue Oct 15, 2023

iox-#1361 Deactivate deprecated 'cert-dcl21-cpp' rule

c9a58e5

elBoberido added a commit that referenced this issue Oct 15, 2023

iox-#1361 Port 'PortPool' to new FixedPositionContainer

4177ed1

elBoberido added a commit that referenced this issue Oct 15, 2023

iox-#1361 Add heartbeat timestamp to shared memory

8d4b97d

elBoberido added a commit that referenced this issue Oct 15, 2023

iox-#1361 Squash me tests FixedPositionContainer

38fdcf4

elBoberido added a commit that referenced this issue Oct 15, 2023

iox-#1361 Update release notes

ee91ab8

elBoberido added a commit that referenced this issue Oct 15, 2023

iox-#1361 Add heartbeat timestamp to shared memory

5c4c3eb

elBoberido added a commit that referenced this issue Oct 23, 2023

iox-#1361 Add 'Heartbeat' class

5c41d9c

elBoberido added a commit that referenced this issue Oct 23, 2023

iox-#1361 Add 'HeartbeatPool'

520370a

elBoberido added a commit that referenced this issue Oct 23, 2023

iox-#1361 Add 'HeartbeatPool' to management segment

2d9685c

elBoberido added a commit that referenced this issue Oct 23, 2023

iox-#1361 Use 'Heartbeat' from shared memory

c29a8ee

elBoberido added a commit that referenced this issue Oct 23, 2023

iox-#1361 Remove obsolete code

39d68db

elBoberido added a commit that referenced this issue Oct 23, 2023

iox-#1361 Update release notes

6798135

elBoberido added a commit that referenced this issue Oct 24, 2023

iox-#1361 Add workaround for iox-#2055

6774499

elBoberido added a commit that referenced this issue Oct 24, 2023

iox-#1361 Increase time for heartbeat tests

d725812

elBoberido added a commit that referenced this issue Oct 25, 2023

iox-#1361 Add 'Heartbeat' class

0ba787d

elBoberido added a commit that referenced this issue Oct 25, 2023

iox-#1361 Add 'HeartbeatPool'

5a8d24e

elBoberido added a commit that referenced this issue Oct 25, 2023

iox-#1361 Add 'HeartbeatPool' to management segment

8a39f72

elBoberido added a commit that referenced this issue Oct 25, 2023

iox-#1361 Use 'Heartbeat' from shared memory

cbb478e

elBoberido added a commit that referenced this issue Oct 25, 2023

iox-#1361 Remove obsolete code

99c6878

elBoberido added a commit that referenced this issue Oct 25, 2023

iox-#1361 Update release notes

d79269d

elBoberido added a commit that referenced this issue Oct 25, 2023

iox-#1361 Add workaround for iox-#2055

e9c5917

elBoberido added a commit that referenced this issue Oct 25, 2023

iox-#1361 Increase time for heartbeat tests

c5bd28b

elBoberido added a commit that referenced this issue Oct 25, 2023

iox-#1361 Add 'Heartbeat' class

d6f5b22

elBoberido added a commit that referenced this issue Oct 25, 2023

iox-#1361 Add 'HeartbeatPool'

e8d7ed6

elBoberido added a commit that referenced this issue Oct 25, 2023

iox-#1361 Add 'HeartbeatPool' to management segment

dd4ffe6

elBoberido added a commit that referenced this issue Oct 25, 2023

iox-#1361 Use 'Heartbeat' from shared memory

e9a72cc

elBoberido added a commit that referenced this issue Oct 25, 2023

iox-#1361 Remove obsolete code

391a274

elBoberido added a commit that referenced this issue Oct 25, 2023

iox-#1361 Update release notes

09f48d3

elBoberido added a commit that referenced this issue Oct 25, 2023

iox-#1361 Add workaround for iox-#2055

0a9fd7e

elBoberido added a commit that referenced this issue Oct 25, 2023

iox-#1361 Increase time for heartbeat tests

2f92bb0

elBoberido added a commit that referenced this issue Oct 30, 2023

iox-#1361 Add 'Heartbeat' class

b3490ae

elBoberido added a commit that referenced this issue Oct 30, 2023

iox-#1361 Add 'HeartbeatPool'

4d44bde

elBoberido added a commit that referenced this issue Oct 30, 2023

iox-#1361 Add 'HeartbeatPool' to management segment

601761e

elBoberido added a commit that referenced this issue Oct 30, 2023

iox-#1361 Use 'Heartbeat' from shared memory

78eb496

elBoberido added a commit that referenced this issue Oct 30, 2023

iox-#1361 Remove obsolete code

ac2cf6e

elBoberido added a commit that referenced this issue Oct 30, 2023

iox-#1361 Update release notes

a9c0211

elBoberido added a commit that referenced this issue Oct 30, 2023

iox-#1361 Add workaround for iox-#2055

e9568a5

elBoberido added a commit that referenced this issue Oct 30, 2023

iox-#1361 Increase time for heartbeat tests

d0ec5ae

elBoberido added a commit that referenced this issue Oct 30, 2023

iox-#1361 Delete move/copy ctor and assignment operators from 'Heartb…

dda2e7f

…eat'

elBoberido added a commit that referenced this issue Nov 1, 2023

Merge pull request #2056 from eclipse-iceoryx/iox-1361-use-shared-mem…

6b43343

…ory-for-process-alive-detection iox-#1361 Use shared memory for process alive detection

elBoberido linked a pull request Nov 1, 2023 that will close this issue

iox-#1361 Use shared memory for process alive detection #2056

Merged

21 tasks

elBoberido mentioned this issue Feb 23, 2024

mutex owner died -> POPO__CHUNK_LOCKING_ERROR #2193

Open

elfenpiff commented May 18, 2022 (edited by elBoberido)
