-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Fix delay_ns cycle calculation #642
Conversation
Ah, here's what introduced the bug: #540. |
3025794
to
72087a5
Compare
I wrote a measurement example to see the difference between
|
Hm, another problem is that jumping into CCM RAM is done through a veener, which incurs 5-7 cycles for loading and pipeline stall. This is a little disappointing of GNU ld tbh, but I can see the issue with the limited jumping range of the blx instruction.
|
Well, the inlining depends heavily on the surrounding code and optimization opportunities and may yield something worse even:
|
2f7fe02
to
e2234e1
Compare
Ok, so here are my findings:
I've therefore decided on the following:
The result is a minimum cycle count of <20 for any Flash config, at any speed regardless of calling environment (ie. does not depend in inlining quality). I've tested this in hardware for most STM32 (I have no L0, F2, G4 hardware, but made educated guesses) and SAMD as well. Here is the delay_ns measurements for STM32F334K8, with a minimum cycle count of 15 or ~250ns.
I am very, very happy with this solution. I added an example for testing a bunch of settings. (cc @rleh Do you have L0, F2 or G4 hardware and could you test the example on that?) |
e06083f
to
0ca2d1b
Compare
Very nice, I have tested L0 and G4. I don't have F2 hardware. The boot clock for L0 is wrong, it boots on 2.097 MHz MSI, not HSI. Could you cherry-pick the fix? Results for G4:
Results for L0:
|
Thank you very much, @chris-durand! I've fixed the |
I added a NUCLEO-F091RC BSP, since I had that lying around. I've then used it to add Cortex-M0 compilation of the unit tests to the CI which is important for the delay, since I use |
Amazing :D |
FYI: I'm going to merge this tonight if there are no further reviews. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice!
- .fastcode: always in RAM or instruction cache - .fastdata: always in RAM or data cache
Btw @rleh all jobs from travis-ci.org have been moved to travis-ci.com and it broke something in our ARM64 job. I assume it's just some missing setting, it's not super important for now. |
My fail. something with cmake handling has changed and the cmake file wasn't updated by running lbuild build |
There's a massive error in the algorithm since we muliply the nanoseconds with
modm::platform::delay_ns_per_loop
first for some reason. I'm honestly shocked at just how wrong this is?DWT->CYCCNT
orSysTick->VAL
CYCCNT
use to 32-bit instead of just 31-bit.modm::delay_us
by using binary scaling formodm::platform::delay_fcpu_MHz
.fastcode
section placement to always be in RAM or instruction cacheDWT->CYCCNT
.Fixes #641.
cc @XDjackieXD