Skip to content

Conversation

@william-tang914
Copy link
Contributor

Fix message buffer (MB) overrun when CAN FD enabled by dynamically calculate the number of MB based on payload size and CAN instance capability. Maximum number of MB is less than classical CAN when CAN FD enabled.

Add new data structure member 'tx_mb_number' and 'rx_mb_number', which reflect real TX MB and RX MB per FlexCAN instance can accommodate.

Use 'tx_mb_number' and 'rx_mb_number' to define maximum semaphore count and traverse atomic bitmap, ensure safe concurrent access to shared resources on different FlexCAN instance.

This commit assumes 'tx_mb_number' less than or equal to 'MCUX_FLEXCAN_MAX_TX'. For S32 platform, it requires suitable Kconfig symbol 'CAN_MAX_MB' value to ensure above relation. MCUX platform can ignore this limit.

Fixes #92798

@github-actions
Copy link

Hello @william-tang914, and thank you very much for your first pull request to the Zephyr project!
Our Continuous Integration pipeline will execute a series of checks on your Pull Request commit messages and code, and you are expected to address any failures by updating the PR. Please take a look at our commit message guidelines to find out how to format your commit messages, and at our contribution workflow to understand how to update your Pull Request. If you haven't already, please make sure to review the project's Contributor Expectations and update (by amending and force-pushing the commits) your pull request if necessary.
If you are stuck or need help please join us on Discord and ask your question there. Additionally, you can escalate the review when applicable. 😊

@zejiang0jason
Copy link
Contributor

This commit assumes 'tx_mb_number' less than or equal to 'MCUX_FLEXCAN_MAX_TX'. For S32 platform, it requires suitable Kconfig symbol 'CAN_MAX_MB' value to ensure above relation. MCUX platform can ignore this limit.

From the MCUX_FLEXCAN_MAX_MB implementation, seems CAN_MAX_MB must be defined for the SOCs which have different MB number by FlexCAN instance, because FSL_FEATURE_FLEXCAN_HAS_MESSAGE_BUFFER_MAX_NUMBERn(0) returns -1 for them, not only S32.

#ifdef CONFIG_CAN_MAX_MB
#define MCUX_FLEXCAN_MAX_MB CONFIG_CAN_MAX_MB
#else
#define MCUX_FLEXCAN_MAX_MB FSL_FEATURE_FLEXCAN_HAS_MESSAGE_BUFFER_MAX_NUMBERn(0)
#endif

@william-tang914
Copy link
Contributor Author

Hi @zejiang0jason

For the SOCs which have different MB number by FlexCAN instance, user should configure CAN_MAX_MB by themselves.
This update was introduced in #60200.

I insert a guard clause for MCUX_FLEXCAN_MAX_RX overflow real accommodated MB max_mb.

if (max_mb < MCUX_FLEXCAN_MAX_RX) {
LOG_ERR("Not enough message buffers for RX");
return -ENODEV;
}

Another guard clause for tx_mb_number overflow MCUX_FLEXCAN_MAX_TX, which will result traverse atomic bitmap out of boundary.

if (data->tx_mb_number > MCUX_FLEXCAN_MAX_TX) {
LOG_ERR("Not enough message buffers for TX");
return -ENODEV;
}

dleach02
dleach02 previously approved these changes Sep 17, 2025
Copy link
Member

@dleach02 dleach02 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@henrikbrixandersen can you provide a review. This LGTM

Copy link
Contributor

@str4t0m str4t0m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't test it on hardware, but I think I needs a few changes to be able to run on hw where instances have a different number of message buffers.

data->tx_mb_number = max_mb - MCUX_FLEXCAN_MAX_RX;
data->rx_mb_number = MCUX_FLEXCAN_MAX_RX;

if (data->tx_mb_number > MCUX_FLEXCAN_MAX_TX) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As tx_mb_number is scaled for canfd I think we need to adopt the MCUX_FLEXCAN_MAX_TX as well don't we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that MCUX_FLEXCAN_MAX_TX is static, unless update MCUX_FLEXCAN_MAX_MB by Kconfig.

return -ENODEV;
}

data->tx_mb_number = max_mb - MCUX_FLEXCAN_MAX_RX;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here we will always use all physically available mb of the CAN instance. Doesn't that make the CONFIG_CAN_MAX_MB kind of unneeded, as it will also have to be set to the max value to avoid getting an error in the following tx_mb_number check. And consequently I think it makes it impossible to use two CAN instances of differing max mb number, as we always assign the same value to flexcan_config.maxMbNum(reg MCR.MAXMB).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current implementation, CONFIG_CAN_MAX_MB is used to statically define the size of tx_cbs and tx_allocs via MCUX_FLEXCAN_MAX_TX. If we remove this macro, we can no longer rely on compile-time constants for these arrays, and would need to switch to dynamic initialization instead.

While dynamic allocation is technically feasible, I'm not sure if it's appropriate to introduce such a change within this pull request, as it affects memory layout and initialization logic. It might be better to keep the static allocation for now, or handle the transition to dynamic allocation in a separate PR to keep changes scoped and easier to review.

ATOMIC_DEFINE(rx_allocs, MCUX_FLEXCAN_MAX_RX);
struct k_mutex rx_mutex;
struct mcux_flexcan_rx_callback rx_cbs[MCUX_FLEXCAN_MAX_RX];
ATOMIC_DEFINE(tx_allocs, MCUX_FLEXCAN_MAX_TX);
struct k_sem tx_allocs_sem;
struct k_mutex tx_mutex;
struct mcux_flexcan_tx_callback tx_cbs[MCUX_FLEXCAN_MAX_TX];

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update flexcan_config.maxMbNum = MCUX_FLEXCAN_MAX_MB to flexcan_config.maxMbNum = max_mb.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't want go toward dynamic allocation either.
My concern is just that in case FSL_FEATURE_FLEXCAN_HAS_MESSAGE_BUFFER_MAX_NUMBERn(x) and MCUX_FLEXCAN_MAX_MB differ we might use more Buffers than configured in MCR.MAXMB.
And the following tx_mb_number check will not be triggered in FD mode.
I'll have to check this in more detail and will try to clarify what I meant, when I have more time, hopefully in the evening.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @str4t0m , let me introduce intent of the Pull Request.

The maximum number of FlexCAN message buffers (mailboxes) is determined by the following factors:

  1. Whether CAN FD is enabled.
  2. If CAN FD is enabled, the amount of memory allocated per mailbox (in this case, 64 bytes).
  3. The number of memory blocks available to each FlexCAN instance. Each instance's memory blocks is different on some SoCs (e.g., the S32K3 series).

Based on these three factors, I calculate the actual maximum number of mailboxes for each FlexCAN instance. Then, using the number of receive mailboxes configured via Kconfig, I compute the remaining available transmit mailboxes. Finally, I assign the actual maximum mailbox count to MCR[MAXMB].(will update)

The check max_mb < MCUX_FLEXCAN_MAX_RX ensures that the user does not configure more receive mailboxes than the hardware can support.

The check data->tx_mb_number > MCUX_FLEXCAN_MAX_TX prevents out-of-bounds access when iterating over the tx_cbs array. Expanding this condition leads to max_mb > MCUX_FLEXCAN_MAX_MB, which can only occur if:

  1. The number of memory blocks differs between FlexCAN instances, and
  2. The user incorrectly sets MCUX_FLEXCAN_MAX_MB via Kconfig.

This PR aims to make the mailbox allocation more robust and adaptable to different SoC configurations, while also preventing misconfiguration-related issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fully resolve this issue, the driver would need to dynamically calculate the actual maximum number of mailboxes for each FlexCAN instance, or alternatively, this information could be provided via the device tree.

I think each SoC should disclose the amount of mailboxes for each FlexCAN instance in the SoC device tree. Similar how it's done for the FIFO sizes for LPSPI.
https://docs.zephyrproject.org/latest/build/dts/api/bindings/spi/nxp%2Clpspi.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, device tree shall provide memory blocks each FlexCAN instance support, then calculate amount of mailboxes in driver. Payload size determines maximum number of message buffers per RAM block.
I'm not sure if it's appropriate to introduce device tree update in this PR considering granularity.
Screenshot 2025-07-30 144023

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been meaning to introduce dts properties for this for quite some time (it's also needed by the Zephyr-native FlexCAN driver, I am working on), but I have not gotten around to it yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @henrikbrixandersen , I'm glad you are trying to add mailboxes property for Zephyr-native CAN driver. Would you mind sharing me your plan or timeline to fix this issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @henrikbrixandersen , shall we apply a workaround before introduce mailbox to device tree?
On many platforms, there are only 7 mailboxes for CAN FD. So this pull request is necessary for those platfroms. Thanks a lot.
#91138

Fix message buffer (MB) overrun when CAN FD enabled by dynamically
calculate the number of MB based on payload size and CAN instance
capability. Maximum number of MB is less than classical CAN when
CAN FD enabled.

Add new data structure member 'tx_mb_number' and 'rx_mb_number',
which reflect real TX MB and RX MB per FlexCAN instance can
accommodate.

Use 'tx_mb_number' and 'rx_mb_number' to define maximum semaphore
count and traverse atomic bitmap, ensure safe concurrent access to
shared resources on different FlexCAN instance.

This commit assumes 'tx_mb_number' less than or equal to
'MCUX_FLEXCAN_MAX_TX'. For S32 platform, it requires suitable Kconfig
symbol 'CAN_MAX_MB' value to ensure above relation.
MCUX platform can ignore this limit.

Fixes zephyrproject-rtos#92798

Signed-off-by: William Tang <[email protected]>
@sonarqubecloud
Copy link

@dleach02
Copy link
Member

What is the plan for this PR

@dleach02 dleach02 added the bug The issue is a bug, or the PR is fixing a bug label Oct 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: CAN bug The issue is a bug, or the PR is fixing a bug platform: NXP Drivers NXP Semiconductors, drivers platform: NXP NXP

Projects

None yet

Development

Successfully merging this pull request may close these issues.

can: can_mcux_flexcan.c allows a mailbox overrun resulting in an imprecise data bus fault

8 participants