ClientData SIGSEGV #5082

Closed
joecaraccio opened this issue Feb 12, 2023 · 8 comments · Fixed by #5101
Comments

@joecaraccio

Describe the bug
Apologies if this is redundant. I also posted about it on ChiefDelphi, but I am still searching for a solution. I am seeing a SIGSEGV which crashes the whole roboRIO at the same frame every single time.

The Trace is:
`
A fatal error has been detected by the Java Runtime Environment:
SIGSEGV (0xb) at pc=0xa9ece0d4, pid=3095, tid=3130
JRE version: OpenJDK Runtime Environment (17.0.3.7) (build 17.0.3.7-frc+0-2023-17.0.5u7-1)
Java VM: OpenJDK Client VM (17.0.3.7-frc+0-2023-17.0.5u7-1, mixed mode, emulated-client, g1 gc, linux-arm)
Problematic frame:
C [libntcore.so+0x7d0d4] (anonymous namespace)::SImpl::SetValue((anonymous namespace)::ClientData*, (anonymous namespace)::TopicData*, nt::Value const&)+0x17c
No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
If you would like to submit a bug report, please visit:
https://bugreport.java.com/bugreport/crash.jsp
`
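
For anyone comparing crash logs across teams, the "Problematic frame" line is the part worth lining up: it names the native library and the offset inside it. Below is a minimal sketch of pulling those fields out of such a line. `ProblematicFrame` is a hypothetical helper written for this illustration, not part of WPILib or the JDK, and the regular expression only assumes the frame format seen in the trace above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a WPILib or JDK class): parses an hs_err "Problematic frame" line. */
final class ProblematicFrame {
  // Assumed shape: "<type> [<library>+0x<offset>] <symbol text>", with an optional leading '#'.
  private static final Pattern FRAME =
      Pattern.compile("^\\s*#?\\s*(\\w)\\s+\\[([^\\]]+)\\+0x([0-9a-fA-F]+)\\]\\s*(.*)$");

  final char frameType; // 'C' marks a native code frame
  final String library; // e.g. libntcore.so
  final long offset;    // offset of the faulting instruction within the library
  final String symbol;  // remaining text after the bracket, may be empty

  private ProblematicFrame(char frameType, String library, long offset, String symbol) {
    this.frameType = frameType;
    this.library = library;
    this.offset = offset;
    this.symbol = symbol;
  }

  /** Returns the parsed frame, or empty if the line does not look like a frame line. */
  static Optional<ProblematicFrame> parse(String line) {
    Matcher m = FRAME.matcher(line);
    if (!m.matches()) {
      return Optional.empty();
    }
    return Optional.of(new ProblematicFrame(
        m.group(1).charAt(0), m.group(2), Long.parseLong(m.group(3), 16), m.group(4)));
  }

  public static void main(String[] args) {
    // The frame line from the trace in this issue.
    String line = "C [libntcore.so+0x7d0d4] (anonymous namespace)::SImpl::SetValue("
        + "(anonymous namespace)::ClientData*, (anonymous namespace)::TopicData*, "
        + "nt::Value const&)+0x17c";
    parse(line).ifPresent(f ->
        System.out.printf("%s frame in %s at +0x%x%n", f.frameType, f.library, f.offset));
  }
}
```

If two logs report the same library and offset on the same build, they are very likely the same crash; a different offset (as a maintainer notes later in this thread) suggests a different code path.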

To Reproduce
Steps to reproduce the behavior:
I've rarely seen a bug like this. We had a long day of integration on our robot, and around halfway through the day this started to occur. It wasn't related to a code change (when this started happening, we reverted to an earlier, more stable version of the code and still saw it).

There was some commentary on CD that perhaps it is related to an alternative connection. Looking at the error message, it does seem like it's sending data to some sort of client. As far as I'm aware, we had 2 instances of PhotonVision (which I believe just talk through NetworkTables) and 2 instances of Shuffleboard connected. That was it (unless somehow something was connected that I didn't realize). No crazy configurations or anything custom.

Expected behavior
The robot should not crash. Instead, it crashes sporadically, with no Java-level stack trace, only the SIGSEGV error.

Desktop (please complete the following information):

  • WPILib Version: 2023.3.2
  • OS: Windows 11
  • Java version: 11

Additional context
I had just upgraded to the latest WPILib version, 2023.3.2 (I think I was on 2023.3.0 before, but I wanted the fix from #5045). If there is a concern about stability, I could move back. Part of me wonders if others just haven't seen this issue yet, but I'm just speculating.

Here are 3 full log files from this occurrence.
https://www.chiefdelphi.com/uploads/short-url/68uM7kJvZQnJLcqpStjKKdgh0Nk.txt

https://www.chiefdelphi.com/uploads/short-url/caBSJP1NhoeHShjVKi5HZUcyVFr.txt

https://www.chiefdelphi.com/uploads/short-url/9wTaBJHKtVeWVumFAg9gDa1U9zt.txt

@PeterJohnson
Member

Unfortunately I don’t have another fix. Once I get more info (as requested on CD), I can work to find and fix the issue. You can try downgrading in the meantime.

@joecaraccio
Author

There's been some speculation on Discord, with another team seeing this issue, that it's related to the quantity of data put on NetworkTables. I guess removing DataLog usage helped one team.

I am using NT pretty heavily (it should be able to support the amount of data), but I guess that could be it? The trace looks related to some sort of client.
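
If data volume does turn out to matter (this is unconfirmed speculation in this thread, not a diagnosed cause), one way to test the theory is to reduce how often values are published. The sketch below forwards a value only when it has changed and a minimum interval has elapsed. `ThrottledPublisher` is a hypothetical helper, not a WPILib class, and the `DoubleConsumer` sink is a stand-in for whatever NetworkTables publish call the robot code actually uses.

```java
import java.util.function.DoubleConsumer;

/** Hypothetical helper (not a WPILib class): rate-limits and de-duplicates published values. */
final class ThrottledPublisher {
  private final DoubleConsumer sink; // stand-in for the real publish call
  private final long minIntervalNanos;
  private boolean hasSent = false;
  private long lastSentNanos;
  private double lastValue;

  ThrottledPublisher(DoubleConsumer sink, long minIntervalMillis) {
    this.sink = sink;
    this.minIntervalNanos = minIntervalMillis * 1_000_000L;
  }

  /**
   * Offers a value with the caller's current time in nanoseconds.
   * Returns true if the value was forwarded to the sink.
   */
  boolean offer(double value, long nowNanos) {
    if (hasSent) {
      boolean tooSoon = nowNanos - lastSentNanos < minIntervalNanos;
      boolean unchanged = Double.compare(value, lastValue) == 0;
      if (tooSoon || unchanged) {
        return false; // drop this update
      }
    }
    sink.accept(value);
    hasSent = true;
    lastSentNanos = nowNanos;
    lastValue = value;
    return true;
  }

  public static void main(String[] args) {
    // Demo sink just prints; real code would publish to NetworkTables here.
    ThrottledPublisher pub =
        new ThrottledPublisher(v -> System.out.println("publish " + v), 100);
    pub.offer(1.0, System.nanoTime());
    pub.offer(2.0, System.nanoTime()); // dropped: well inside the 100 ms window
  }
}
```

Passing the timestamp in explicitly keeps the helper easy to test. Note that a changed value arriving inside the interval is dropped rather than queued, so this is only suitable for telemetry where the next periodic update will carry the latest value anyway.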

@PeterJohnson
Member

The location is different than the other crashes, which have been related to reconnects.

@joecaraccio
Author

Oh, I see what you're saying.

@joecaraccio
Author

Not sure what to do then

@PeterJohnson
Member

Actually it might be related after all. But to confirm we’d need to get a core dump.

@PeterJohnson
Member

Can you try using the latest development build to see if it fixes the issue? Instructions are here: https://github.com/wpilibsuite/allwpilib/blob/main/DevelopmentBuilds.md#development-build

@sciencewhiz
Contributor

Have you been able to test the development build?

@PeterJohnson linked a pull request on Feb 17, 2023 that will close this issue