Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STM32 FDCAN: Get all messages from FIFO in RX ISR #1017

Merged
merged 4 commits into from
May 13, 2023

Conversation

ser-plu
Copy link

@ser-plu ser-plu commented May 9, 2023

  1. In cases, when a new message comes during RX ISR execution, it would not be processed until a next message arrives. Critical for request-response protocols.
  2. The header file missed a dependency

@rleh rleh added this to the 2023q2 milestone May 9, 2023
@rleh rleh requested review from rleh and chris-durand May 9, 2023 14:33
Copy link
Member

@rleh rleh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this up!
I had noticed this code might be buggy when I wrote the SAM MCAN driver (based on this driver), but then forgot about it.

src/modm/platform/can/stm32-fdcan/can.cpp.in Outdated Show resolved Hide resolved
@rleh rleh self-requested a review May 9, 2023 15:32
@ser-plu ser-plu force-pushed the stm32-fdcan-isr-fix branch from 66ef90e to e7021c9 Compare May 9, 2023 15:56
@ser-plu
Copy link
Author

ser-plu commented May 9, 2023 via email

@rleh rleh force-pushed the stm32-fdcan-isr-fix branch from 629cd2a to 78a9435 Compare May 9, 2023 17:43
@rleh
Copy link
Member

rleh commented May 9, 2023

I implemented @chris-durand|s idea and also added a commit that fixes the fdcan unit test (see #1013).
The unittest still works...

@rleh
Copy link
Member

rleh commented May 9, 2023

The unittest still works...

... but not with the modification (7bede35) that actually triggers our corner case:

>>> LUDecomposition_test␊
>>> ad7280a_test␊
>>> angle_test␊
>>> arithmetic_traits_test␊
>>> atomic_queue_test␊
>>> bit_operation_test␊
>>> block_allocator_test␊
>>> bme280_test␊
>>> bmp085_test␊
>>> bounded_deque_test␊
>>> bounded_queue_test␊
>>> bounded_stack_test␊
>>> button_group_test␊
>>> button_test␊
>>> can_bit_timings_test␊
>>> can_connector_base_test␊
>>> can_connector_test␊
>>> can_lawicel_formatter_test␊
>><0x11>G<break x 002>
h<0x04><0x04>edddd<break x 005>
t<0xc7><0x16>>> circle_2d_test␊
>>> clock_test␊
>>> color_test␊
>>> cxa_guard_test␊
>>> debounce_test␊
>>> delay_test␊
<0x11>G<break x 002>
h<0x01><0x01>piddd<break x 005>
s,<0x17>>>> doubly_linked_list_test␊
>>> drv832x_<0x11>G<break x 002>
x<0x07><0x07>toooooooo<break>
m<0xd8>'_array_test␊
>>> endi<0x11>G<break x 002>
x<0x03><0x03>t{{{o<break x 005>
Gg>h<0x02><0x02>ezzz

WTF?

@rleh rleh linked an issue May 9, 2023 that may be closed by this pull request
@ser-plu
Copy link
Author

ser-plu commented May 10, 2023

I implemented @chris-durand|s idea and also added a commit that fixes the fdcan unit test (see #1013). The unittest still works...

I changed the code to use ILE register and adjusted the IRQ disable logic.

@chris-durand
Copy link
Member

@rleh The modified testBuffer() unit test case writes 32 bytes into an 8 byte message data buffer:

for (uint_fast16_t i = 0; i <= numberOfMsgs; ++i) {
    message.setLength(i % 8);
    for (uint_fast8_t dataIndex = 0; dataIndex < i; ++dataIndex) {
        message.data[dataIndex] = i;
    }
    Fdcan1::sendMessage(message);
}

dataIndex goes up to numberOfMsgs - 1 which is 31. No wonder that this is crashing.

@ser-plu
Copy link
Author

ser-plu commented May 11, 2023

I fixed the CAN tests as well

@ser-plu ser-plu force-pushed the stm32-fdcan-isr-fix branch from da5ebc5 to d98598c Compare May 11, 2023 13:17
@rleh
Copy link
Member

rleh commented May 11, 2023

@salkinium What's wrong with the CI now?
It fails during the step Test run of docs.modm.io-generator-script with KeyboardInterrupt exception in the SpawnPoolWorker:

Click to view log
[...]
Process SpawnPoolWorker-4:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/__w/modm/modm/tools/scripts/docs_modm_io_generator.py", line 219, in create_target
    symlinks[hash(kfile.read_bytes())].append(kfile)
  File "/usr/lib/python3.8/pathlib.py", line 1229, in read_bytes
    with self.open(mode='rb') as f:
  File "/usr/lib/python3.8/pathlib.py", line 1222, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
KeyboardInterrupt
Traceback (most recent call last):
  File "tools/scripts/docs_modm_io_generator.py", line 143, in main
    results = pool.map(create_target, devices)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 765, in get
    self.wait(timeout)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 762, in wait
    self._event.wait(timeout)
  File "/usr/lib/python3.8/threading.py", line 558, in wait
    signaled = self._cond.wait(timeout)
  File "/usr/lib/python3.8/threading.py", line 302, in wait
    waiter.acquire()
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/scripts/docs_modm_io_generator.py", line 143, in main
    results = pool.map(create_target, devices)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 736, in __exit__
    self.terminate()
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 654, in terminate
    self._terminate()
  File "/usr/lib/python3.8/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 692, in _terminate_pool
    cls._help_stuff_finish(inqueue, task_handler, len(pool))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 675, in _help_stuff_finish
    time.sleep(0)
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/scripts/docs_modm_io_generator.py", line 261, in <module>
    exit(main())
  File "tools/scripts/docs_modm_io_generator.py", line 161, in main
    return results.count(False)
  File "/usr/lib/python3.8/tempfile.py", line 966, in __exit__
    self.cleanup()
  File "/usr/lib/python3.8/tempfile.py", line 970, in cleanup
    self._rmtree(self.name)
  File "/usr/lib/python3.8/tempfile.py", line 952, in _rmtree
    _rmtree(name, onerror=onerror)
  File "/usr/lib/python3.8/shutil.py", line 718, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/usr/lib/python3.8/shutil.py", line 655, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib/python3.8/shutil.py", line 655, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib/python3.8/shutil.py", line 655, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  [Previous line repeated 3 more times]
  File "/usr/lib/python3.8/shutil.py", line 673, in _rmtree_safe_fd
    os.unlink(entry.name, dir_fd=topfd)
KeyboardInterrupt

@salkinium
Copy link
Member

with KeyboardInterrupt exception

Wow, impressive. Maybe the NSA guy assigned to us got annoyed by all the dots, who knows…

Multiprocessing in Python is just utterly broken. It's an underlying issue of fork() vs. globals initialization, that cannot really be fixed. You can replace it with Threadpool + subprocess.run, which effectively does the same, but with a controlled entry point back into the script and thus properly initialized globals.

@chris-durand
Copy link
Member

chris-durand commented May 11, 2023

Running the hardware unit test still fails. I'm currently investigating the issue.

@salkinium salkinium force-pushed the stm32-fdcan-isr-fix branch from a6e4125 to 32e3dee Compare May 11, 2023 20:27
@chris-durand
Copy link
Member

I have found some issues with the TX handling. sendMsg() was checking for the software queue to be not full before sending instead of the hardware buffers. Also there was a race condition. sendMsg() could be called concurrently from sendMessage() and the TX interrupt handler. Both could write to the same TX buffer. The solution was to disable CAN interrupts in sendMessage(). Now the hardware test succeeds.

@chris-durand chris-durand force-pushed the stm32-fdcan-isr-fix branch from eea2857 to 1d36b44 Compare May 12, 2023 08:31
@salkinium
Copy link
Member

(There's still a race-condition in the docs generator deduplication code, I'll look at it on Sunday)

@salkinium
Copy link
Member

Ok, I've fixed the docs generator, the PR now needs to be rebased and squashed.

@chris-durand chris-durand force-pushed the stm32-fdcan-isr-fix branch from 63712e7 to 9d33843 Compare May 13, 2023 21:00
@chris-durand chris-durand added the ci:hal Triggers the exhaustive HAL compile CI jobs label May 13, 2023
@chris-durand chris-durand merged commit 9d33843 into modm-io:develop May 13, 2023
@rleh rleh removed the ci:hal Triggers the exhaustive HAL compile CI jobs label May 14, 2023
@ser-plu ser-plu deleted the stm32-fdcan-isr-fix branch May 17, 2023 06:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Unittest failing fdcan_test on Nucleo-G474RE
4 participants