Persistent protocol errors, probably not terminated socket #124

sarguez · 2024-02-26T10:05:54Z

Describe the bug
Hello, we have encountered (several times now) a case where a socket apparently gets stuck and the device cannot be initialized because of persistent protocol errors. This is what appears in the log:

    [/r2000_node 1708101306.555321]: Device found: R2000
    [/r2000_node 1708101306.559998]: protocol error: 120 Invalid handle or no handle provided
    [/r2000_node 1708101306.568282]: protocol error: 333 Socket couldn't be created: Invalid argument
    [/r2000_node 1708101306.569719]: Connection refused
    [/r2000_node 1708101306.569787]: Unable to establish TCP connection
    [/r2000_node 1708101306.569838]: Unable to initialize device

Remarks:

Restarting the r2000_node doesn't help; it repeats the same log messages, starting with "Device found: R2000".
Power-cycling the scanner itself doesn't help either. Restarting the scanner and the node in any order, any number of times, does not solve the problem.
Power-cycling the entire robot (which power-cycles both the scanner and the computer) solves the problem.

Since power-cycling the computer solves it, this appears to be some kind of lingering socket problem on the computer side.
One thing we haven't tried is waiting more than 2 minutes in the hope that the kernel cleans up the socket by itself.
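One way to check the lingering-socket hypothesis before power-cycling is to look for TCP connections to the scanner that are stuck in TIME_WAIT or CLOSE_WAIT, as listed by Linux in /proc/net/tcp (IPv4 only). The C++ sketch below parses that file. The names TcpEntry, parse_tcp_line and count_lingering are hypothetical, and treating port 80 as the scanner's HTTP command port is an assumption to adjust for your setup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// One data row of /proc/net/tcp (IPv4 sockets only).
struct TcpEntry {
    std::uint32_t remote_ip;    // hex value as printed; 192.168.0.11 shows as 0B00A8C0 on little-endian hosts
    std::uint16_t remote_port;  // printed in hex, e.g. 0050 = port 80
    unsigned state;             // kernel TCP state: 0x01 ESTABLISHED, 0x06 TIME_WAIT, 0x08 CLOSE_WAIT
};

// Parse one row such as
//   "   3: 0A00A8C0:C350 0B00A8C0:0050 06 00000000:00000000 ..."
// Returns std::nullopt for the header row or malformed input.
std::optional<TcpEntry> parse_tcp_line(const std::string& line) {
    std::istringstream in(line);
    std::string slot, local, remote, state;
    if (!(in >> slot >> local >> remote >> state)) return std::nullopt;
    if (slot.back() != ':') return std::nullopt;  // the header row starts with "sl"
    const auto colon = remote.find(':');
    if (colon == std::string::npos) return std::nullopt;
    try {
        TcpEntry e{};
        e.remote_ip = static_cast<std::uint32_t>(std::stoul(remote.substr(0, colon), nullptr, 16));
        e.remote_port = static_cast<std::uint16_t>(std::stoul(remote.substr(colon + 1), nullptr, 16));
        e.state = static_cast<unsigned>(std::stoul(state, nullptr, 16));
        return e;
    } catch (const std::logic_error&) {  // std::stoul throws invalid_argument or out_of_range
        return std::nullopt;
    }
}

// Count sockets whose remote port matches and whose state is TIME_WAIT or CLOSE_WAIT.
std::size_t count_lingering(std::istream& proc, std::uint16_t remote_port) {
    std::size_t n = 0;
    std::string line;
    while (std::getline(proc, line)) {
        const auto e = parse_tcp_line(line);
        if (e && e->remote_port == remote_port && (e->state == 0x06 || e->state == 0x08)) {
            ++n;
        }
    }
    return n;
}
```

For example, opening std::ifstream on "/proc/net/tcp" and calling count_lingering(f, 80) while the node keeps failing would show whether such sockets pile up. Filtering on the port alone may also match unrelated hosts; comparing remote_ip against the scanner's address narrows it down.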

Another finding:
We have encountered this issue several times. In some of these cases, scrolling up in the logs to where the issue begins shows a "Recv failure" error. After this error has been logged repeatedly for some time, restarting the node leads to the state described above.
This may hint at the root cause:
    [/r2000_node 1708101240.116282]: HTTP ERROR: Empty reply from server
    [/r2000_node 1708101240.118622]: HTTP ERROR: Recv failure: Connection reset by peer
    [/r2000_node 1708101240.119891]: HTTP ERROR: Recv failure: Connection reset by peer
    [/r2000_node 1708101240.121579]: HTTP ERROR: Recv failure: Connection reset by peer
    [/r2000_node 1708101240.123527]: HTTP ERROR: Recv failure: Connection reset by peer

Environment
OS: Ubuntu 20.04.06 LTS
ROS Version: ROS Noetic

Sensor
Device: R2000
FW Version:"1.62"
HW Version:"1.72"

Additional context
We built commit 682a1fb.


sarguez · 2024-03-12T15:04:32Z

I am not sure, but I think your sockets need the SO_REUSEADDR option to deal with this.

Currently, we have to power-cycle the entire robot (which requires physical access) to fix this problem.
It would be a huge improvement if power-cycling just the scanner worked, since we can do that remotely.
It seems to me that the connection is still refused after the scanner restarts because the socket on the computer side doesn't have this reuse-address option (or some similar option).

With Boost.Asio sockets, it can be set with something like this:

    // The socket must already be open; set the option before bind()/connect().
    boost::asio::socket_base::reuse_address option(true);
    socket.set_option(option);
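For reference, boost::asio::socket_base::reuse_address is a wrapper around the SO_REUSEADDR socket option. The sketch below shows the equivalent POSIX calls on Linux; make_reusable_tcp_socket and has_reuse_addr are hypothetical helper names. Note that SO_REUSEADDR governs whether bind() may reuse a local address whose previous connection is still in TIME_WAIT, so it mainly matters if the client binds a fixed local address or port. Whether it resolves this particular issue is untested.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Create an IPv4 TCP socket with SO_REUSEADDR enabled.
// The option must be set before bind() (or connect()) is called on the socket.
int make_reusable_tcp_socket() {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        throw std::runtime_error(std::string("socket: ") + std::strerror(errno));
    }
    int on = 1;
    if (::setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        int err = errno;  // save errno before close() can overwrite it
        ::close(fd);
        throw std::runtime_error(std::string("setsockopt: ") + std::strerror(err));
    }
    return fd;
}

// Read the option back, e.g. to confirm it was applied.
bool has_reuse_addr(int fd) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (::getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) < 0) return false;
    return value != 0;
}
```

The caller owns the returned descriptor and must close() it; in the Boost version the socket object handles that.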
