Enable D-Cache for Cortex-M7 #1222

Merged on Nov 10, 2024 (1 commit)

vishwamartur (Contributor) commented Nov 9, 2024

Closes #485

Enable D-Cache for Cortex-M7 devices.

  • Enable the D-Cache in src/modm/platform/core/cortex/startup.c.in by adding SCB_EnableDCache() after SCB_EnableICache().
  • Add a comment explaining the D-Cache enablement and the need for manual invalidation on certain operations.
  • Update the documentation in docs/src/reference/build-systems.md to reflect the D-Cache enablement for Cortex-M7 devices.
  • Add a note in the documentation about the need for manual invalidation on certain operations.
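
As a rough illustration of the ordering described in the list above, here is a minimal sketch. `SCB_EnableICache()` and `SCB_EnableDCache()` are the CMSIS-Core function names; the stub bodies, the counters, and the `enable_caches()` helper are assumptions added only so the snippet compiles off-target, and they are not taken from the actual `startup.c.in`.

```c
#include <assert.h>

/* Host-side stand-ins (assumption) for the CMSIS-Core functions of the same
 * name, so the call order can be compiled and checked off-target. On a real
 * Cortex-M7 build they are provided by the CMSIS core_cm7.h header. */
static int call_counter = 0;
static int icache_enabled_at = 0;  /* 0 = not enabled, otherwise call order */
static int dcache_enabled_at = 0;

static void SCB_EnableICache(void) { icache_enabled_at = ++call_counter; }
static void SCB_EnableDCache(void) { dcache_enabled_at = ++call_counter; }

/* Hypothetical helper mirroring the startup change described above. */
static void enable_caches(void)
{
    SCB_EnableICache();
    /* With the D-Cache on, buffers shared with DMA or other bus masters
     * need manual cache maintenance: clean before the peripheral reads,
     * invalidate before the CPU reads. */
    SCB_EnableDCache();
    assert(icache_enabled_at < dcache_enabled_at);
}
```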

salkinium (Member) commented Nov 9, 2024

Mostly worried about our DMA code, but I think @chris-durand already enabled the D-Cache on M7?

salkinium (Member) left a comment

DMA will probably fall over at some point, but that won't get fixed if it doesn't break.

salkinium merged commit 28c87e4 into modm-io:develop on Nov 10, 2024
12 checks passed
chris-durand (Member) commented Nov 11, 2024

> DMA will probably fall over at some point, but that won't get fixed if it doesn't break.

Sorry for not looking at this earlier. DMA is for sure broken on H7 with the D-Cache enabled if buffers are placed in cacheable memory regions.

@salkinium Could we make this an option? I'd rather avoid breaking working user code by default. I'm using an H723 with D-Cache, DMA and modm at work, but so far this is only possible with custom code.

I'm not even sure that there is a practical way to implement the appropriate cache maintenance operations in the peripheral drivers alone which will work with modm device drivers in their current state.

Cache maintenance operations work at the granularity of a 32-byte cache line. A device driver that contains a small buffer will share cache lines with unrelated memory. There are lots of edge cases that cause correctness issues when cache maintenance operations are performed on those shared cache lines.
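
To make the granularity point concrete, the following sketch computes the address range that a maintenance operation actually touches for a given buffer. The type and function names are hypothetical; the only fact assumed from the discussion is the 32-byte line size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u  /* D-Cache line size in bytes, as stated above */

/* Hypothetical type: the span that cache maintenance really affects. */
typedef struct {
    uintptr_t start;  /* address rounded down to a cache line boundary */
    size_t    size;   /* length rounded up to whole cache lines */
} cache_range_t;

/* Expand [addr, addr + len) to whole cache lines. */
static cache_range_t cache_range_for(uintptr_t addr, size_t len)
{
    const uintptr_t mask = ~(uintptr_t)(CACHE_LINE_SIZE - 1u);
    uintptr_t start = addr & mask;
    uintptr_t end = (addr + len + CACHE_LINE_SIZE - 1u) & mask;
    cache_range_t range = { start, (size_t)(end - start) };
    return range;
}
```

An 8-byte buffer that sits in the middle of a line still drags the other 24 bytes of that line into every clean or invalidate, and an 8-byte buffer that straddles a line boundary affects 64 bytes.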

For example, a write-back from cache to RAM can corrupt data that the DMA is writing to RAM. Even if you clean and invalidate the memory before the start of the DMA transaction, any unrelated write to the same cache line during the DMA operation fetches the line from RAM again. That line can later be evicted from the cache and written back, corrupting the data written by the DMA.

I'm not aware of a practical way to fix all of those issues in the general case without reserving exclusive cache lines for DMA buffers. That would be a non-trivial change to modm device drivers.
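
One way to picture "reserving exclusive cache lines" is a buffer type that is aligned to the line size and padded to a whole number of lines, so nothing else can share its lines. This is only a sketch under that assumption; the macro, type, and helper names are hypothetical and do not exist in modm.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u
/* Round a byte count up to a whole number of cache lines. */
#define CACHE_ROUND_UP(n) \
    ((((n) + CACHE_LINE_SIZE - 1u) / CACHE_LINE_SIZE) * CACHE_LINE_SIZE)

/* Hypothetical 10-byte receive buffer that owns one full cache line. */
typedef struct {
    alignas(CACHE_LINE_SIZE) uint8_t data[CACHE_ROUND_UP(10u)];
} dma_rx_buffer_t;

static dma_rx_buffer_t rx_buffer;

/* Returns 1 if the object starts on a line boundary and spans whole lines. */
static int owns_whole_cache_lines(const void *object, size_t size)
{
    return ((uintptr_t)object % CACHE_LINE_SIZE == 0u) &&
           (size % CACHE_LINE_SIZE == 0u);
}
```

The cost is visible in the sizes: 10 bytes of payload occupy 32 bytes of RAM, which is part of why applying this to every driver buffer is a non-trivial change.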

Others are struggling with the same issues. Zephyr also has no good solution to this problem. Another way to solve this is to allocate DMA buffers exclusively in non-cacheable memory regions, but that also wouldn't be enforceable for buffers inside modm device drivers, which you can instantiate anywhere.

This is clearly a non-trivial problem to solve, and not one we will fully fix now. In my opinion, the default caching setting should either be safe to use with DMA, inhibit certain DMA use, or at least warn the user. Of course there should be an option to override this if you know what you're doing and put buffers into non-cacheable memory, etc.

### Cache Initialization

For Cortex-M7 devices, both the I-Cache and D-Cache are enabled by default with
a write-through policy to significantly improve performance. However, it is

The default SRAM cache policy on STM32 M7 devices is write-back write-allocate, not write-through.

From AN4839:
[Image: Screenshot_20241111_190138]

Also, write-through is broken on half of the H7 devices, so we shouldn't change the code to enable it:
[Image: Screenshot_20241111_190712]

salkinium (Member)

> Could we make this an option?

We could enable it only if the :platform:dma module is not present, otherwise issue a warning.
I'll open a PR and also fix the description.

I think I would prefer marking a part of SRAM as non-cacheable with the MPU and having some kind of memcpy for small buffers or a fast non-cacheable block allocator for bigger buffers. Ideally in a way that's backwards compatible with in-place allocation (some template stuff or macro magic). I think that could work.

chris-durand (Member)

> I think I would prefer marking a part of SRAM as non-cacheable with the MPU and having some kind of memcpy for small buffers or a fast non-cacheable block allocator for bigger buffers.

Keep in mind that some DMA units in H7s can't access all SRAMs, e.g. the BDMA on a H72x/3x is restricted to SRAM4.

salkinium (Member)

> Keep in mind that some DMA units in H7s can't access all SRAMs, e.g. the BDMA on a H72x/3x is restricted to SRAM4.

Hm ok, so the device driver will have to ask the DMA driver for some non-cacheable memory. But then we could also use a cache-line aligned block allocator and dish out 32B blocks and manage the cache invalidation there without the MPU?

chris-durand (Member)

> Hm ok, so the device driver will have to ask the DMA driver for some non-cacheable memory. But then we could also use a cache-line aligned block allocator and dish out 32B blocks and manage the cache invalidation there without the MPU?

Something like that could work. The cache can be managed if the DMA memory is properly aligned and isn't shared with anything else.
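
A cache-line aligned block allocator of the kind discussed here could look roughly like the sketch below. All names and the pool size are hypothetical, it is not interrupt-safe as written, and it leaves out the cache maintenance calls themselves; it only shows that every block handed out starts on a 32-byte boundary and is shared with nothing else.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u
#define DMA_POOL_BLOCKS 8u  /* hypothetical pool size */

/* Each row is exactly one cache line; the pool itself is line-aligned. */
static alignas(CACHE_LINE_SIZE) uint8_t dma_pool[DMA_POOL_BLOCKS][CACHE_LINE_SIZE];
static uint8_t dma_pool_used[DMA_POOL_BLOCKS];

/* Hand out the first free 32-byte block, or NULL if the pool is exhausted. */
static void *dma_block_alloc(void)
{
    for (size_t i = 0; i < DMA_POOL_BLOCKS; ++i) {
        if (!dma_pool_used[i]) {
            dma_pool_used[i] = 1u;
            return dma_pool[i];
        }
    }
    return NULL;
}

/* Return a block previously obtained from dma_block_alloc(). */
static void dma_block_free(void *block)
{
    if (block == NULL) {
        return;
    }
    ptrdiff_t offset = (uint8_t *)block - &dma_pool[0][0];
    assert(offset >= 0 && (size_t)offset % CACHE_LINE_SIZE == 0u);
    size_t index = (size_t)offset / CACHE_LINE_SIZE;
    assert(index < DMA_POOL_BLOCKS);
    dma_pool_used[index] = 0u;
}
```

On an H7, such a pool would additionally have to be placed in an SRAM that the relevant DMA unit can reach, which is the constraint raised earlier about the BDMA and SRAM4.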

It seems to me that the cache management operations are best handled in the downstream peripheral drivers (UART DMA, SPI DMA, I2S, etc.). Otherwise, any more advanced access scheme would need to be hard-coded inside the DMA drivers. If one wanted to use UART DMA in circular mode with a half-transfer interrupt, the DMA driver would need to include special handling to do the right cache flushing and invalidation on each half of the buffer. The same goes for double-buffering and other features.

Furthermore, hard-coding this in the DMA drivers would prevent users from implementing custom drivers, e.g. with non-cacheable buffers, without copying and modifying the whole modm DMA implementation.
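
The circular-mode case mentioned above can be sketched as a small helper that a peripheral driver might use to decide which half of a receive buffer to invalidate for each DMA event. The enum, struct, and function are hypothetical; on target the returned range would presumably be passed to a call such as CMSIS-Core's `SCB_InvalidateDCache_by_Addr()`, and the buffer is assumed to be 32-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u

/* Hypothetical DMA events for a circular receive transfer. */
typedef enum {
    DMA_EVENT_HALF_TRANSFER,     /* DMA finished writing the first half */
    DMA_EVENT_TRANSFER_COMPLETE  /* DMA finished writing the second half */
} dma_event_t;

typedef struct {
    uint8_t *data;
    size_t   size;
} buffer_half_t;

/* Select the half that the DMA has just completed, i.e. the region the CPU
 * must invalidate before reading it. Each half must span whole cache lines
 * so that invalidating one half never touches the half still being written. */
static buffer_half_t half_to_invalidate(uint8_t *buffer, size_t size,
                                        dma_event_t event)
{
    assert(size % (2u * CACHE_LINE_SIZE) == 0u);
    size_t half = size / 2u;
    buffer_half_t result = {
        (event == DMA_EVENT_HALF_TRANSFER) ? buffer : buffer + half,
        half
    };
    return result;
}
```

Keeping this logic in the peripheral driver, rather than in the generic DMA driver, matches the argument above: only the peripheral driver knows which access scheme it is using.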

salkinium added this to the 2024q4 milestone on Dec 30, 2024