Skip to content

Don't call __delay_4cycles for 0 cycle delay - takes a really long time.#11586

Merged
thinkyhead merged 5 commits intoMarlinFirmware:bugfix-2.0.xfrom
ghent360:pr-vefremov-1
Aug 19, 2018
Merged

Don't call __delay_4cycles for 0 cycle delay - takes a really long time.#11586
thinkyhead merged 5 commits intoMarlinFirmware:bugfix-2.0.xfrom
ghent360:pr-vefremov-1

Conversation

@ghent360
Copy link
Contributor

Requirements

Calling DELAY_NS for small values takes really long time on ARM cpu.

Description

Stepper::set_directions was taking forever (more than 20 seconds). I traced the culprit to this line:
DELAY_NS(MINIMUM_STEPPER_DIR_DELAY);

MINIMUM_STEPPER_DIR_DELAY was set to 20. On my system F_CPU was 180Mhz, this was expanding the macro to DELAY_CYCLES(3).

In turn the ARM implementation was calling __delay_4cycles(x/4) which was computed as __delay_4cycles(0).

The loop in __delay_4cycles decrements first and checks for 0 later, making __delay_4cycles(0) actually run for 2^32 cycles.

Benefits

__delay_4cycles(0) would return immediately.

Related Issues

@ejtagle
Copy link
Contributor

ejtagle commented Aug 18, 2018

I am not sure if this is the right thing to do: In fact, translating a 20ns to a 0ns delay is also not right.
Perhaps, we should ALSO round UP

if (x = (x+3)/4) { ...

@p3p
Copy link
Member

p3p commented Aug 18, 2018

If the delay is so short it takes 0 (<4) cycles then it might stand to reason even the comparison to 0 was sufficient so there's no need to round up. Was this just introduced? 20ns will be less than 4 cycles on most (all?) current arm platforms.

@thinkyhead thinkyhead merged commit 00d24a8 into MarlinFirmware:bugfix-2.0.x Aug 19, 2018
@ghent360
Copy link
Contributor Author

Btw. It looks like there is a similar issue with the AVR code as well.

It does not look like new code (~2 Months or so). In most places DELAY_NS is called with a constant value.

On my system (STM32) F_CPU is not constant and __builtin_constant_p(x) in DELAY_CYCLES does not evaluate to true, exposing the issue.

If you look at the code in the __builtin_constant_p(x) == true code path there is a similar guard against calling __delay_4cycles with 0 value.

@thinkyhead
Copy link
Member

Can you point out the lines in 1.1.x where the problem exists?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants