In tests deliberately causing gRPC endpoints to fail, we found the EndpointWriter on the surviving node can get into an infinite loop.
The EndpointWriter is sent a Restarting message. Its restarting() method calls channel.shutdownNow(), but the lateinit channel has not been initialized yet. The resulting exception in restarting() makes the supervisor force another restart, and the cycle repeats, using 100% CPU.
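One possible way to keep restarting() from throwing is to check whether the lateinit property has been assigned before touching it. The sketch below is not the actual EndpointWriter code: `FakeChannel` and `WriterSketch` are hypothetical stand-ins, and only the `::channel.isInitialized` check is standard Kotlin.

```kotlin
// Hypothetical stand-in for the gRPC channel; only shutdownNow() is modelled.
class FakeChannel {
    var isShutdown = false
        private set

    fun shutdownNow() {
        isShutdown = true
    }
}

// Hypothetical, simplified writer. It mirrors the shape described in the
// issue (a lateinit channel assigned in started()), not the real class.
class WriterSketch {
    private lateinit var channel: FakeChannel

    fun started() {
        channel = FakeChannel()
    }

    // Returns true if a channel existed and was shut down,
    // false if started() never got as far as assigning it.
    fun restarting(): Boolean {
        if (!::channel.isInitialized) return false
        channel.shutdownNow()
        return true
    }
}

fun main() {
    val writer = WriterSketch()
    println(writer.restarting()) // channel not assigned yet, nothing to shut down
    writer.started()
    println(writer.restarting()) // channel assigned, shut down normally
}
```

This only stops the exception in restarting(). If started() keeps failing, the writer would still be restarted repeatedly, so the guard does not remove the need to handle the underlying failure.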
I think this happens when an exception is thrown in the started() method: restarting() is then called on an EndpointWriter that was never set up correctly. The exception in started() could be caused by sending a message to a pid with an empty address, which would then lead to the infinite loop observed.
I have confirmed that an empty pid was causing this issue.
The question remains what the correct behaviour should be when a message is sent to a nonexistent remote pid. Currently an EndpointWriter is created that throws an exception in started(), causing an infinite retry loop.
I think the answer might be a supervision strategy that allows only a limited number of restarts. Is that implemented anywhere else?
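To make the idea concrete, here is a minimal, library-independent sketch of a restart budget: it allows a fixed number of restarts within a time window and then tells the supervisor to stop the child. `Directive`, `RestartBudget` and the parameter names are made up for illustration and are not taken from the protoactor-kotlin API.

```kotlin
// Hypothetical directive type; a real supervisor may offer more options.
enum class Directive { Restart, Stop }

// Allows at most maxRestarts restarts within any windowMillis-long window.
class RestartBudget(
    private val maxRestarts: Int,
    private val windowMillis: Long
) {
    // Timestamps of recent failures, oldest first.
    private val failures = mutableListOf<Long>()

    // Called by the supervisor each time the child fails.
    fun onFailure(nowMillis: Long): Directive {
        // Forget failures that have fallen out of the window.
        failures.removeAll { nowMillis - it > windowMillis }
        failures.add(nowMillis)
        return if (failures.size > maxRestarts) Directive.Stop else Directive.Restart
    }
}

fun main() {
    val budget = RestartBudget(maxRestarts = 3, windowMillis = 1_000)
    // Four failures in quick succession: the fourth exceeds the budget.
    for (t in 0L..3L) {
        println("failure at $t ms: ${budget.onFailure(t)}")
    }
}
```

With a budget like this, a writer whose started() always throws would be stopped after a few attempts instead of spinning at 100% CPU. What should happen to messages addressed to the stopped endpoint is a separate decision.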
Relevant code: https://github.com/AsynkronIT/protoactor-kotlin/blob/2234df6fdc5cf4175624ec3f1632de72f718bcc0/proto-remote/src/main/kotlin/actor/proto/remote/EndpointWriter.kt#L67