[2.0.x] Fix stepper/planner race condition, Stepper pulse timer#11081
[2.0.x] Fix stepper/planner race condition, Stepper pulse timer#11081thinkyhead merged 3 commits intoMarlinFirmware:bugfix-2.0.xfrom ejtagle:misc-fixes
Conversation
…ner, some of them pointed out by @AnHardt, some of my own findings
…l time required to set ports
|
Great analysis. Let's hope this does the trick and fixes some E axis anomalies too! |
|
@ejtagle update looks good, appears to fix minimum pulse length on all axes Unfortunately it doesn't seem to fix the odd planner behaviour |
|
@p3p : Unfortunately, there is still a race condition there, I knew it was, but there is no easy fix for it. The reasoning is simple: -The Planner does not take into consideration if a block is BUSY or not. That means that it could be using the executing block as reference (-- that is perfectly fine--), and then, at some point, that block is discarded (it just completed execution) and the next block becomes busy (because the planner has accessed it to use it as reference, but not modified it yet) The result is exactly the same as described above: Planner trying to join blocks, but not able to update the executing block... This problem is caused by the inherent race condition between the planner and the stepper. The planner just takes time to execute and it is not an option to disable the stepper ISR, as that would cause severe stutter. Excluding the last(s) blocks is also not the answer - It´s a game that does not always work, because there is no easy way to predict block consumption speed. There are 2 possibilities here:
If you want to detect the situation, just add at the end of Planner::calculate_trapezoid_for_block()
I must admit this race condition is mostly diabolically hard to reproduce if you try to do it by hand, as it requires specific and perfect timing to trigger it, but Pronterface seems to be the "specialist"... |
|
Hi! After merging this PR I got this warning: |
|
@skarcha : Yes, i know there is a warning. I have tried nearly everything to get rid of it, didn´t found the cause yet. It only happens in AVR... |
|
Some of |
|
Trust me @thinkyhead . The first thing i tried is exactly that. And the warning was still there. I suspect that, as we are using F_CPU, this is right now unsolvable (and the warning only happens in AVR GCC |
|
@ejtagle @thinkyhead. Ok, after some tests, I think the problem is "the minus sign", I mean: Unfortunately I don't know how to solve these situations, but I hope this helps... ;) |
It worked! aa4cd2e |
|
I see that this PR originally defined |



As pointed out by @AnHardt in #11047 , and measured by @p3p , there seems to be something wrong with the Planner, that somehow , when jogging the Pronterface controls, outputs a first movement block that properly accelerates and decelerates, and then a 2nd block that starts at maximum feed rate and then decelerates.
There is an strong suspicion that this is caused by a race condition between the Planner and the Stepper ISR, that goes like this: When there are few blocks queued, the planner starts the forward and backward passes. And, at some point, there is a chance the block entry and exit speed will be modified before tagging it as "RECALCULATE".
That allows the Stepper to "kidnap" the block and mark it as busy after the entry and exit speed was updated by the planner, but before the new stepper parameters of the block are updated in the calc_trapezoids() function.
As the block is marked as BUSY, the planner tries to calculate the new trapezoid parameters, but is unable to update the block.
But, as the block entry and exit speed IS updated, then the planner will use those updated entry and exit speeds to plan the rest of the blocks: The result is a 1st block that decelerates to 0 (but the planner has replanned it and it should NOT decelerate!) and the 2nd block uses that assumption and starts accelerated.
The fix is to mark the block as RECALCULATE before updating its entry and exit speeds, and not after as it was done previously. Marking the block as RECALCULATE inhibits the Stepper ISR using and "kidnappiing" the block, allowing proper update of the trapezoidal form.
@AnHardt pointed out 2 places, and I found out 2 other places where this race condition exists. So, hopefully this PR will fix the "Proterface jog losing steps issue"
The other thing found by @p3p doing actual measurements on hw is that the width of the Stepper pulses is a little bit short compared with the configured value. This was caused by the fact the program takes some time to toggle the STEP line, but the pulse timing is measured from the point the first STEP line is set, to the first STEP line that is reset.
That makes the last (typically the extruder stepper) to have a pulse shorter than the specified one.
I did disassemble the code to count the cycles that takes to SET the STEPPER line for both ARM and AVR targets, and now that time is also taken into account in the timing calculations, hopefully will ensure all pulse widths are at least the specified time long.