Don't call __delay_4cycles for 0 cycle delay - takes a really long time.#11586
Don't call __delay_4cycles for 0 cycle delay - takes a really long time.#11586thinkyhead merged 5 commits intoMarlinFirmware:bugfix-2.0.xfrom
Conversation
|
I am not sure if this is the right thing to do: In fact, translating a 20ns to a 0ns delay is also not right. |
|
If the delay is so short it takes 0 (<4) cycles then it might stand to reason even the comparison to 0 was sufficient so there's no need to round up. Was this just introduced? 20ns will be less than 4 cycles on most (all?) current arm platforms. |
|
Btw. It looks like there is a similar issue with the AVR code as well. It does not look like new code (~2 Months or so). In most places DELAY_NS is called with a constant value. On my system (STM32) F_CPU is not constant and __builtin_constant_p(x) in DELAY_CYCLES does not evaluate to true, exposing the issue. If you look at the code in the __builtin_constant_p(x) == true code path there is a similar guard against calling __delay_4cycles with 0 value. |
|
Can you point out the lines in 1.1.x where the problem exists? |
Requirements
Calling DELAY_NS for small values takes really long time on ARM cpu.
Description
Stepper::set_directions was taking forever (more than 20 seconds). I traced the culprit to this line:
DELAY_NS(MINIMUM_STEPPER_DIR_DELAY);
MINIMUM_STEPPER_DIR_DELAY was set to 20. On my system F_CPU was 180Mhz, this was expanding the macro to DELAY_CYCLES(3).
In turn the ARM implementation was calling __delay_4cycles(x/4) which was computed as __delay_4cycles(0).
The loop in __delay_4cycles decrements first and checks for 0 later, making __delay_4cycles(0) actually run for 2^32 cycles.
Benefits
__delay_4cycles(0) would return immediately.
Related Issues