SPI overflow in CAN bus HAT communication - IRQ handler mcp251xfd_handle_tefif() #6644
Comments
Attached is a code snippet that seems to replicate the bug (in my setup). The longer it runs, the more overflows occur, although I'm not 100% sure, as it's difficult to gather statistics.
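Since gathering statistics is the hard part here, one option is to snapshot the kernel's per-interface counters at intervals and log only what changed. The sketch below is not the attached snippet: it is a minimal stdlib-only example that assumes the standard Linux sysfs layout `/sys/class/net/<iface>/statistics/` and an interface named `can0` (both the interface name and the helper names are assumptions made for illustration).

```python
import time
from pathlib import Path


def read_counters(iface, root="/sys/class/net"):
    """Return {counter_name: value} for every readable counter of iface."""
    stats_dir = Path(root) / iface / "statistics"
    counters = {}
    for entry in sorted(stats_dir.iterdir()):
        try:
            counters[entry.name] = int(entry.read_text().strip())
        except (OSError, ValueError):
            # Skip entries that cannot be read or are not plain integers.
            continue
    return counters


def counter_deltas(before, after):
    """Return only the counters whose value changed between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


def monitor(iface="can0", interval_s=10.0, rounds=6):
    """Print a timestamped line for each interval in which a counter moved."""
    previous = read_counters(iface)
    for _ in range(rounds):
        time.sleep(interval_s)
        current = read_counters(iface)
        changed = counter_deltas(previous, current)
        if changed:
            print(time.strftime("%H:%M:%S"), iface, changed)
        previous = current
```

Calling `monitor("can0")` while the test runs would show whether error and overrun counters keep growing over time or stall after a bus failure.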
Can you please copy https://github.com/linux-can/can-utils/tree/master/mcp251xfd/99-devcoredump.rules to If the IRQ handler fails with an error, the driver will generate a dump of the driver and chip state and write it to
I also ran into this problem. I'm on the latest available kernel. It doesn't happen every time; it occurs randomly. The driver also feels a bit unstable at the moment: most of the time I randomly hit "timeouts" where I'm unable to send or read, but I don't always get the IRQ handler failure.
I did that without success: no file was generated, but the IRQ handler message is present in the output of dmesg | grep can.
filogold@raspberrypi:/etc/udev/rules.d $ ls
filogold@raspberrypi:/usr/sbin $ ls
filogold@raspberrypi:/usr/sbin $ ifconfig
filogold@raspberrypi:~ $ dmesg | grep can
Did you run chmod +x /usr/sbin/devcoredump?
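A quick way to confirm this is to check that the helper exists as a regular file with at least one execute bit set. The following is a minimal sketch using only the Python standard library; the function name `is_executable_file` is hypothetical, and the path is the one mentioned above.

```python
import os
import stat


def is_executable_file(path):
    """Return True if path is a regular file with any execute bit set."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing file or no permission to stat it.
        return False
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    return stat.S_ISREG(mode) and bool(mode & exec_bits)
```

For example, `is_executable_file("/usr/sbin/devcoredump")` returning `False` would suggest the script is missing or was copied without `chmod +x`.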
Describe the bug
Hi everyone,
I'm encountering an issue with the IRQ handler. When I simulate multiple periodic messages over the CAN bus, at some point I suspect that SPI communication fails to keep up with the data exchange with the two HAT modules (https://www.waveshare.com/wiki/2-CH_CAN_FD_HAT) I am using.
I updated the kernel to the latest version (see below), which seemed to improve the situation slightly. However, it only mitigated the buffer issue by delaying the occurrence of the fault rather than fully resolving it.
My setup consists of a Raspberry Pi 5 with two CAN HAT 2CH FD modules, each using the MCP251XFD chip.
To ensure the problem is related to the transmitter and not the receiver, I tested two different scenarios:
Using two channels of the Raspberry Pi to transmit and the remaining two channels to receive (all configured with the same parameters: bitrate, data rate, and sampling time).
Using two channels of the Raspberry Pi to transmit and two channels of a Vector VN1640 (with the 1057Gcap installed) to receive. Also in this case, I carefully verified the CAN configuration.
In both cases I verified the termination: each bus measures 60 ohms.
I am aware that I am stressing the module, as I am sending multiple periodic messages (128 periodic messages per CAN FD, generating a 40% bus load). The messages require some processing to calculate the internal CRC and MC in the payload, which is correctly handled by the Python code.
Based on the ifconfig output below, the issue appears to be related to overrun packets, which I suspect are the cause of the failure.
ifconfig:
The information I provided above is essentially the same as in the second case, where I use the Vector hardware as the receiver. One observation I have made is that after a bus failure, the overruns seem to stop or decrease. However, after a certain period (several hours), at some point, the second bus also fails.
Thanks in advance
Steps to reproduce the behaviour
I can't find a piece of code that fully replicates the bug. I wrote a program that sends periodic messages, which generates some overruns, but not as many as my main code.
I use Bluetooth serial communication to share information about the message I want to simulate, including the ID, initial payload, and CAN bus settings. This communication is not constant.
I could try running the code without using Bluetooth, but I still need to work on it.
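As a starting point for a Bluetooth-free reproducer, periodic CAN FD traffic can be generated with nothing but the Python standard library and Linux SocketCAN. This is a sketch under stated assumptions, not the code used in this report: the interface name, CAN ID, payload, and period are placeholders, the helper names are hypothetical, and the CRC and MC payload processing described above is not included.

```python
import socket
import struct
import time

# Layout of struct canfd_frame: can_id, len, flags, 2 reserved bytes, 64 data bytes.
CANFD_FRAME_FMT = "=IBB2x64s"
CANFD_BRS = 0x01  # bit rate switch flag
VALID_FD_LENGTHS = (0, 1, 2, 3, 4, 5, 6, 7, 8, 12, 16, 20, 24, 32, 48, 64)


def build_canfd_frame(can_id, data, brs=True):
    """Pack one CAN FD frame (72 bytes) for a raw SocketCAN socket."""
    if len(data) not in VALID_FD_LENGTHS:
        raise ValueError(f"invalid CAN FD payload length: {len(data)}")
    flags = CANFD_BRS if brs else 0
    # The "64s" field zero-pads payloads shorter than 64 bytes.
    return struct.pack(CANFD_FRAME_FMT, can_id, len(data), flags, data)


def send_periodic(iface, can_id, payload, period_s, count):
    """Send `count` frames on `iface`, one every `period_s` seconds."""
    with socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW) as sock:
        # Allow CAN FD frames on this raw socket.
        sock.setsockopt(socket.SOL_CAN_RAW, socket.CAN_RAW_FD_FRAMES, 1)
        sock.bind((iface,))
        frame = build_canfd_frame(can_id, payload)
        deadline = time.monotonic()
        for _ in range(count):
            sock.send(frame)
            # Schedule against absolute deadlines so the period does not drift.
            deadline += period_s
            delay = deadline - time.monotonic()
            if delay > 0:
                time.sleep(delay)
```

A call such as `send_periodic("can0", 0x123, bytes(64), 0.01, 1000)` sends 1000 frames at a 10 ms period; running several of these in separate threads or processes with different IDs would approximate a multi-message load. Note that `sock.send` can raise `OSError` if the transmit queue is full, which is itself a useful signal when stressing the bus.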
How I initialize the bus:
To run a periodic message, I create a thread using the library function:
The message object is defined as follows:
Device(s)
Raspberry Pi 5
System
uname -a:
modinfo can:
modinfo mcp251xfd:
Logs
dmesg | grep can:
Additional context
@marckleinebudde
Thanks in advance