
Persistent protocol errors, probably not terminated socket #124

Open
sarguez opened this issue Feb 26, 2024 · 1 comment

sarguez commented Feb 26, 2024

Describe the bug
Hello, we have now run into this on several occasions: a socket apparently gets stuck, and the device cannot be initialized because of persistent protocol errors. This is what the log shows:

    [/r2000_node 1708101306.555321]: Device found: R2000
    [/r2000_node 1708101306.559998]: protocol error: 120 Invalid handle or no handle provided
    [/r2000_node 1708101306.568282]: protocol error: 333 Socket couldn't be created: Invalid argument
    [/r2000_node 1708101306.569719]: Connection refused
    [/r2000_node 1708101306.569787]: Unable to establish TCP connection
    [/r2000_node 1708101306.569838]: Unable to initialize device

Remarks:

  • Restarting the r2000_node doesn't help. It repeats the same log messages, starting with Device found: R2000.
  • Power-cycling the scanner itself doesn't help either. Restarting the device and the node, in any order and any number of times, doesn't solve the problem.
  • Power-cycling the entire robot (which power-cycles both the scanner and the computer) solves the problem.

Since power-cycling the computer solves this, it looks to me like some kind of lingering socket problem on the computer side.
One thing we didn't try is waiting more than 2 minutes in the hope that the kernel cleans up the socket on its own.
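
To check the lingering-socket theory before rebooting, something like the sketch below could be run on the robot's computer while the node is failing. It is only a minimal sketch and not part of the driver: it assumes Linux, reads the IPv4 table in /proc/net/tcp, and the helper names tcp_state_name and count_tcp_states are made up for illustration. Sockets piling up in CLOSE_WAIT or TIME_WAIT would support the theory; an empty result would point elsewhere.

```cpp
#include <fstream>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Map the hexadecimal state code used in /proc/net/tcp to a readable name.
std::string tcp_state_name(const std::string& hex_code) {
    static const std::map<std::string, std::string> names = {
        {"01", "ESTABLISHED"}, {"02", "SYN_SENT"},   {"03", "SYN_RECV"},
        {"04", "FIN_WAIT1"},   {"05", "FIN_WAIT2"},  {"06", "TIME_WAIT"},
        {"07", "CLOSE"},       {"08", "CLOSE_WAIT"}, {"09", "LAST_ACK"},
        {"0A", "LISTEN"},      {"0B", "CLOSING"}};
    auto it = names.find(hex_code);
    return it == names.end() ? "UNKNOWN" : it->second;
}

// Count sockets per TCP state from a table in /proc/net/tcp format.
// Taking a stream makes it possible to feed in a saved copy of the table.
std::map<std::string, int> count_tcp_states(std::istream& table) {
    std::map<std::string, int> counts;
    std::string line;
    std::getline(table, line);  // skip the header row
    while (std::getline(table, line)) {
        std::istringstream fields(line);
        std::string slot, local, remote, state;
        // Columns: slot, local address, remote address, state, ...
        if (!(fields >> slot >> local >> remote >> state)) continue;
        ++counts[tcp_state_name(state)];
    }
    return counts;
}

// Convenience wrapper for the live IPv4 table on Linux.
std::map<std::string, int> count_live_tcp_states() {
    std::ifstream table("/proc/net/tcp");
    return count_tcp_states(table);
}
```

Calling count_live_tcp_states() once while the errors are occurring and again a few minutes later would also show whether waiting alone clears anything.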

Another finding:
We have hit this issue several times. In some cases, scrolling up in the logs to where the problem starts shows a Recv failure error. The node spams this message for some time, and once the node is restarted it ends up in the state described above.
Maybe this gives a hint about the root cause.

    [/r2000_node 1708101240.116282]: HTTP ERROR: Empty reply from server
    [/r2000_node 1708101240.118622]: HTTP ERROR: Recv failure: Connection reset by peer
    [/r2000_node 1708101240.119891]: HTTP ERROR: Recv failure: Connection reset by peer
    [/r2000_node 1708101240.121579]: HTTP ERROR: Recv failure: Connection reset by peer
    [/r2000_node 1708101240.123527]: HTTP ERROR: Recv failure: Connection reset by peer

Environment (please complete the following information):
OS: Ubuntu 20.04.06 LTS
ROS Version: ROS Noetic

Sensor
Device: R2000
FW Version:"1.62"
HW Version:"1.72"

Additional context
We built commit 682a1fb.


sarguez commented Mar 12, 2024

I am not sure, but I think your sockets need the SO_REUSEADDR option to deal with this.

Currently, we have to power-cycle the entire robot (which requires physical access) to fix this problem.
It would be a huge improvement if power-cycling just the scanner worked, since we can do that remotely.
It seems to me that the connection is still refused after the scanner restarts because the socket on the computer side doesn't have this reuse-address option set (or maybe another, similar option).

It can be set with something like this using Boost sockets:

    // Set the option after the socket is opened and before it is bound.
    boost::asio::socket_base::reuse_address option(true);
    socket.set_option(option);
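
For reference, here is roughly the same thing with plain POSIX calls, in case the relevant socket isn't created through Boost. This is only a sketch under assumptions: I haven't checked where the driver creates its sockets, and the function names open_reusable_tcp_socket and reuse_address_enabled are hypothetical. Note that SO_REUSEADDR only influences bind(), so it would help only if the driver binds its socket to a fixed local port.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Create a TCP socket with SO_REUSEADDR set, then bind it to the given
// local port (0 lets the kernel choose). Returns the descriptor or -1.
int open_reusable_tcp_socket(unsigned short local_port) {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    // The option has to be set before bind() to have any effect.
    int enable = 1;
    if (::setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &enable, sizeof(enable)) < 0) {
        ::close(fd);
        return -1;
    }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(local_port);
    if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        ::close(fd);
        return -1;
    }
    return fd;
}

// Read the option back to confirm the kernel accepted it.
bool reuse_address_enabled(int fd) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (::getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) < 0) return false;
    return value != 0;
}
```

Reading the option back with getsockopt() is a cheap way to confirm it actually took effect on the socket in question.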
