
[2.0.x] 6th-order jerk-controlled motion planning in real-time#10337

Merged
thinkyhead merged 2 commits intoMarlinFirmware:bugfix-2.0.xfrom
ejtagle:bugfix-2.0.x
Apr 8, 2018


@ejtagle
Contributor

@ejtagle ejtagle commented Apr 7, 2018

Only for 32bit CPUs as AVR simply does not have enough processing power for this.

As explained here (https://github.com/synthetos/TinyG/wiki/Jerk-Controlled-Motion-Explained), this PR modifies the Marlin planner so it plans movements using a 6th-order jerk-controlled motion.

The idea is "simple": We still use the Trapezoidal profile to "coalesce" movements, and we do it exactly the way it was done previously.

But, in the stepper ISR, instead of estimating the time to the next step as the inverse of the speed, and the instantaneous speed as a linear function of time and acceleration, what we do with the new algorithm is to fit a bezier curve to that trapezoidal shape.

The bezier curve guarantees that the first derivative of velocity (acceleration) starts and ends at 0, and that the 2nd derivative of velocity (= jerk) also starts and ends at 0.

Then, we evaluate the bezier curve in realtime (yay!! :D) and compute the instantaneous speed as a function of time. That speed is used to compute the time to the next ISR.
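A minimal floating-point sketch of the idea above, not the PR's fixed-point code. It assumes the velocity ramp is a quintic Bezier whose six control points are (v0, v0, v0, v1, v1, v1), which is the shape described on the linked TinyG wiki page; the function names are hypothetical. With those control points the curve collapses to a single blend polynomial, and acceleration and jerk are zero at both ends.

```cpp
#include <cassert>

// Blend factor of a quintic Bezier with control points (v0,v0,v0,v1,v1,v1):
// s(t) = 6t^5 - 15t^4 + 10t^3 for t in [0, 1] (assumed curve shape).
double bezier_blend(double t) {
  assert(t >= 0.0 && t <= 1.0);
  return t * t * t * (10.0 + t * (-15.0 + 6.0 * t));
}

// Instantaneous speed (steps/s) at normalized time t of the ramp.
double bezier_speed(double v0, double v1, double t) {
  return v0 + (v1 - v0) * bezier_blend(t);
}

// Time until the next step (seconds) for the current speed.
double step_interval(double speed_steps_per_s) {
  assert(speed_steps_per_s > 0.0);
  return 1.0 / speed_steps_per_s;
}
```

Halfway through the ramp the blend factor is exactly 0.5, so the time to cover the ramp matches the trapezoid's average speed.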

The result is mostly incredible: There is no vibration of the machine itself while printing. You have to see it in action to believe it! :D

To be able to evaluate it in realtime, I had to convert the coefficients to fixed point (no problem there). Using specific instructions available on all ARM Cortex M3/M4 CPUs, which perform 32x32 to 64bit multiplications in just 5 cycles, I was able to evaluate the bezier curve in just 40 clock cycles.
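A hedged sketch of what a fixed-point evaluation along these lines could look like in portable C++. It is not the PR's implementation: the Q0.32 time format, the blend polynomial 6t^5 - 15t^4 + 10t^3, the 24-bit speed limit and all names are assumptions made for illustration. Each `mul_q32` is one 32x32 to 64bit multiply of which only the top half is kept.

```cpp
#include <cassert>
#include <cstdint>

// Multiply two Q0.32 fractions and keep the top 32 bits of the 64-bit product.
static inline uint32_t mul_q32(uint32_t a, uint32_t b) {
  return uint32_t((uint64_t(a) * b) >> 32);
}

// Blend factor s(t) = 6t^5 - 15t^4 + 10t^3, with t as a Q0.32 fraction.
// The result is s scaled by 2^32 (range 0 .. 2^32).
int64_t blend_q32(uint32_t t) {
  const uint32_t t2 = mul_q32(t, t);
  const uint32_t t3 = mul_q32(t2, t);
  const uint32_t t4 = mul_q32(t3, t);
  const uint32_t t5 = mul_q32(t4, t);
  const int64_t s = 10 * int64_t(t3) - 15 * int64_t(t4) + 6 * int64_t(t5);
  return s < 0 ? 0 : s;  // guard against truncation pushing s below zero
}

// Speed at normalized time t, for speeds below 2^24 steps/s (assumed limit).
uint32_t speed_q32(uint32_t v0, uint32_t v1, uint32_t t) {
  assert(v0 < (uint32_t(1) << 24) && v1 < (uint32_t(1) << 24));
  const uint64_t s = uint64_t(blend_q32(t));
  if (v1 >= v0) return v0 + uint32_t((uint64_t(v1 - v0) * s) >> 32);
  return v0 - uint32_t((uint64_t(v0 - v1) * s) >> 32);
}
```

Handling the sign of `v1 - v0` explicitly keeps every multiply unsigned, so the same routine covers acceleration and deceleration ramps.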

The bad news is this evaluation is impossible to perform in realtime in AVR (5 multiplications of 32x32bits ... probably more than 200 cycles... I really doubt AVR is able to perform it. The generic code is there, just in case anyone wants to try it...)

To enable this new jerk-free planner, just define USE_JERK_CONTROL in Configuration.h and recompile. No other changes are required. Give it a try.

There is a small caveat here: The new planner could temporarily exceed the maximum acceleration configured for the machine. That is not a problem, as the change in acceleration is extremely slow compared to the previous planner.

Well, as this change is very complex to evaluate, i ask people to try it and comment on their experiences, and i am open to suggestions here.

As a curiosity, the code is NOT the one used by Synthetos. They use a fixed-rate step ISR, and that allows them to use a forward differencing technique to evaluate the bezier curve. We can't use that method, so there is no way around evaluating the bezier with the full formula!

This PR probably could address FR #6193, at least for 32bit CPUs

On AVR, assuming 90 cycles per 32x32->64 multiplication and 8 such operations, evaluating the bezier curve would take 720 cycles. That would, by itself, limit the step ISR rate to 22 kHz... And the ISR needs to do much more than that.. Anyone willing to write an ASM version of the _calc_bezier_curve_coeffs function for AVR? (As a base we could use http://www.vfx.hu/avr/download/mult32.asm) -- Right now there is generic code written in C that should work, but I don't know if it will run in realtime or not...
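The 22 kHz figure follows from dividing the CPU clock by the cycles spent per ISR. A small sketch of that arithmetic, assuming a 16 MHz AVR clock (the clock speed is not stated in the comment; it is the value that reproduces the quoted rate), with a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on the ISR rate if each ISR spends cycles_per_isr cycles
// and nothing else runs on the CPU.
uint32_t max_isr_rate_hz(uint32_t f_cpu_hz, uint32_t cycles_per_isr) {
  assert(cycles_per_isr > 0);
  return f_cpu_hz / cycles_per_isr;
}
```

With 8 multiplications at 90 cycles each, `max_isr_rate_hz(16000000, 8 * 90)` gives 22222 Hz, before counting any of the other work the stepper ISR has to do.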

@ejtagle ejtagle changed the title 6th-order jerk-controlled motion planning in real-time [2.0.x] 6th-order jerk-controlled motion planning in real-time Apr 7, 2018
@thinkyhead
Member

Pretty good stuff! I see that estimate_acceleration_time is defined but never used. Is something coming later that will use this planner method?

@ejtagle
Contributor Author

ejtagle commented Apr 7, 2018

That method is unused. It can be removed. I will not use it. It was a leftover of a previous implementation i did... (Y)

@thinkyhead thinkyhead added PR: Improvement PR: New Feature T: HAL & APIs Topic related to the HAL and internal APIs. labels Apr 7, 2018
@thinkyhead
Member

So, only need to add a #define USE_JERK_CONTROL in the Configuration_adv.h (and examples) along with a paragraph of description. Since most users perceive Marlin as already using "jerk control," maybe we should call it REALTIME_JERK_CONTROL instead.

@ejtagle
Contributor Author

ejtagle commented Apr 7, 2018

Marlin is using the term "Jerk" as "maximum allowable jump in speed between movement segments".
The actual meaning should be "change of acceleration". You are right. The JERK term was used for a different purpose.
This patch implements an algorithm that makes sure acceleration changes are smooth, and that eliminates vibration. Jerk in this context means "change of acceleration".
The result is no vibrations while printing, and improved performance. I am still thinking on the proper term ... REALTIME_JERK_CONTROL ... ACCELERATION_SMOOTHING ...
I will do whatever changes are required. Just wanted to get some consensus on the proper terminology 👍

@thinkyhead
Member

I've applied the name change and added the option to configs just under the other Jerk options. This can be merged for testing as soon as it passes Travis CI.

@ejtagle
Contributor Author

ejtagle commented Apr 7, 2018

👍 ... Yes. I am pretty interested in how it performs. At least here, the improvement in smoothness of movements is pretty amazing. I haven't tried it yet, but I do suspect that this algorithm could also allow running the machine faster... Combined with LINEAR_ADVANCE, speed improvements should be achievable :D ...
The math itself works. I have compared my fixed-point implementation with a floating point one, and the maximum error is +1 lsb on speed (it could probably be improved by properly rounding results, at the expense of some extra cycles), but I think the current precision is more than enough 👍

@hg42
Contributor

hg42 commented Apr 7, 2018

what about JERK_CONTROL_BEZIER, there could be other methods in the future

@teemuatlut
Member

Is this applicable for all frame types?

@ejtagle
Contributor Author

ejtagle commented Apr 7, 2018

@hg42: Mathematically speaking, this is mostly the ideal method. On 32bit cpus, there is simply no point degrading it.. Maybe on 8bit avrs there could exist degraded versions of this method (for example, S-curves), to try to make feasible their implementations on such low powered computing platforms.

@teemuatlut : Yes. It applies to all. By decomposition of forces and accelerations on the printing head/bed, you always end up controlling motor accelerations. That is the way Marlin handled them prior to this method, and that is the way they are still handled after this PR.

Please note: I am aware that for Deltas, in fact, there are 2 masses that should be considered: One of them are the actuator masses, and the other is the printing head. The movements (acceleration/velocity) of the actuators are directly controlled by the motors themselves, and the head mass movement is a result of the composition of those movements, but the acceleration of such head has a non linear dependency on the acceleration of the motors and actuators.

Even though the 6th order implementation of this PR is not completely modeling that non-linear dependency, you can consider that small movements can be thought as linear. So, the 6th order curve WILL improve movements even on deltas.

On physics modeling, there is always a tradeoff between an exact model and an approximate one. Exact modeling requires precisely measuring the model parameters, and those models must be computable in realtime. It also depends on how stable the mechanical structure of the machine is, as those parameters could vary over time!

I think an approximate model is much easier to handle and much easier to evaluate, even if not completely optimal.

The idea of a 6th order bezier is just to smooth out speed transitions and acceleration changes. That results in no sudden changes of the forces applied to the machine, and no sudden changes of forces means no excitation of the mechanically resonant frequencies of the machine's structure.

No excitation of mechanical resonances means no vibration of the print head, and no vibration of the print head means more exact deposition of plastic into the model.

Hope you get the idea... 👍

@ejtagle
Contributor Author

ejtagle commented Apr 7, 2018

Imagine head vibration is represented by movements of beer, see how much time it takes for beer to stop moving after movement of the print head (=glass) stops.
https://www.youtube.com/watch?v=qYJpl7SNoww

They are using S curves. We are even improving on that!

Synthetos did a demo on their 6th-order controlled jerk motion planner, just imagine the pendulum to be the print head
https://youtu.be/u1nPK70QWlU?t=8s

We should be getting exactly the same performance with our implementation 👍

thinkyhead and others added 2 commits April 7, 2018 21:03
Enable 6th-order jerk-controlled motion planning in real-time.
Only for 32bit MCUs. (AVR simply does not have enough processing power for this!)
@thinkyhead
Member

thinkyhead commented Apr 8, 2018

Imagine head vibration is represented by movements of beer…

If only we could simulate the oscillation-dampening effects of a foamy head. That would eliminate slosh in our printed objects.

@thinkyhead thinkyhead merged commit 4382f21 into MarlinFirmware:bugfix-2.0.x Apr 8, 2018
@ejtagle
Contributor Author

ejtagle commented Apr 8, 2018

Foamy head damps vibration, but that means something is exciting those vibrations. The idea of this algorithm is to suppress that excitation (probably, the force applied by the motors to the head to produce acceleration is the excitation).
Right now i am playing with an AVR assembly routine to also be able to evaluate the bezier curve in realtime... :)

@ejtagle
Contributor Author

ejtagle commented Apr 8, 2018

If i get it running, i will probably open a new PR with it 👍

@oysteinkrog
Contributor

@ejtagle: This is really exciting, it would definitely be interesting to see if an AVR could handle that. I guess it would be necessary to reduce the step rate as much as possible, but I think with TMC2100 one could do 1/4 microstepping instead of 1/16 to reduce it a lot..

@thinkyhead
Member

On AVR it might be possible to segment the acceleration/deceleration portions of the move using a short table of fixed-point cosine factors. The result would be composed of shorter linear accelerations, which taken together would be generally S-shaped.
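One way the cosine-table suggestion could be sketched, purely as an illustration and not something from the PR: the table size, the Q16 scaling and the function names are all assumptions, and on an AVR the table would be a precomputed constant array rather than built at runtime. Each entry gives the fraction of the speed change reached at a segment boundary; between boundaries the existing linear ramp would be used.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Build n+1 blend factors in Q16 (0 .. 65536) following (1 - cos(pi*i/n)) / 2.
std::vector<uint32_t> make_cosine_table(int n) {
  assert(n > 0);
  const double pi = std::acos(-1.0);
  std::vector<uint32_t> table(n + 1);
  for (int i = 0; i <= n; ++i)
    table[i] = uint32_t(std::lround(32768.0 * (1.0 - std::cos(pi * i / n))));
  return table;
}

// Target speed at the boundary of segment i of an acceleration ramp.
uint32_t segment_speed(uint32_t v0, uint32_t v1,
                       const std::vector<uint32_t>& table, int i) {
  assert(v1 >= v0);  // acceleration only, to keep the sketch short
  assert(i >= 0 && i < int(table.size()));
  return v0 + uint32_t((uint64_t(v1 - v0) * table[i]) >> 16);
}
```

With 8 segments the ramp from 1000 to 5000 steps/s passes through 3000 at the midpoint, with the gentlest slopes at the two ends.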

@ejtagle
Contributor Author

ejtagle commented Apr 8, 2018

I already have the bezier interpolation running on AVR. It takes 150 cycles to evaluate each point of the bezier curve. (I wrote an assembler version of the evaluator and reduced the precision of the coefficients as much as possible.)
That allows a step ISR rate of 100 kHz .. Realistically, let's say 20 kHz is more than achievable.
The only missing point is that to estimate one of the coefficients of the curve, i need to compute a 24bit division. This is done twice once per trapezoid, but the default division routine takes 450cycles.
There must be a better algorithm to compute 24bit divisions.
After much investigation, there seems to be a way: A Newton-Raphson iterative approximation to division could theoretically allow 24 bit divisions in 100 cycles on AVR. The algorithm itself is used by AMD for hardware divisions and it is called the Goldschmidt algorithm.
As soon as i get it working, i think it will be ready

@thinkyhead
Member

thinkyhead commented Apr 9, 2018

a better way and algorithm to compute 24bit divisions.
After much investigation … A newton-raphson iterative approximation

There's also fixed-point multiplication of the reciprocal, which is faster if the divisors are known ahead of time, i.e., constants.
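A small sketch of the reciprocal-multiplication idea for a divisor known at compile time. The divisor 10 is only an example chosen here; the constant 0xCCCCCCCD is ceil(2^35 / 10), the usual magic number for unsigned 32-bit division by 10. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Multiply by a precomputed fixed-point reciprocal and shift back down.
uint32_t mul_shift(uint32_t n, uint32_t magic, unsigned shift) {
  assert(shift < 64);
  return uint32_t((uint64_t(n) * magic) >> shift);
}

// n / 10 for any 32-bit n, with no division instruction.
uint32_t div10(uint32_t n) {
  return mul_shift(n, 0xCCCCCCCDu, 35);
}
```

This only helps when the divisor is the fixed operand; a fixed numerator with a varying divisor still needs a reciprocal of the divisor at run time.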

@ejtagle
Contributor Author

ejtagle commented Apr 9, 2018

Yes, but in this case the fixed value is in the numerator. According to this paper https://www.microsoft.com/en-us/research/wp-content/uploads/2008/08/tr-2008-141.pdf , it is possible to get maximum precision using 2 iterations of Newton-Raphson and a 1 KB lookup table. 32bit multiplies cost about 29 cycles, so I estimate the full 32bit division to take 4*29 = 116, i.e. about 120 cycles. Way less than the default GCC implementation, which takes 400+ cycles to perform the exact same division with the same precision.
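For readers unfamiliar with the iteration, here is a floating-point sketch of Newton-Raphson for a reciprocal. It is only meant to show why a table-based first guess plus two iterations can be enough; the function name and the starting values are illustrative. Each step computes x = x * (2 - d * x), which roughly squares the relative error.

```cpp
#include <cassert>

// Refine an initial estimate x0 of 1/d. Converges when 0 < x0 * d < 2.
double newton_reciprocal(double d, double x0, int iterations) {
  assert(d > 0.0 && x0 > 0.0 && x0 * d < 2.0);
  double x = x0;
  for (int i = 0; i < iterations; ++i) x = x * (2.0 - d * x);
  return x;
}
```

Starting from 0.3 as a guess for 1/3 (10% relative error), the error drops to about 1%, then 0.01%, then 0.000001% over three iterations, so the better the table-based first guess, the fewer multiplications are needed.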

@ejtagle
Contributor Author

ejtagle commented Apr 9, 2018

I've got the Newton-Raphson approximation working. All that remains is to translate it to AVR assembler, and the PR for AVR will be ready...

// This function computes (1<<24) / d using Newton-Raphson approximations
uint32_t get_0x1000000_div_x(uint32_t d) {
  static const uint8_t inv_tab[256] = {
    255,253,252,250,248,246,244,242,240,238,236,234,233,231,229,227,
    225,224,222,220,218,217,215,213,212,210,208,207,205,203,202,200,
    199,197,195,194,192,191,189,188,186,185,183,182,180,179,178,176,
    175,173,172,170,169,168,166,165,164,162,161,160,158,157,156,154,
    153,152,151,149,148,147,146,144,143,142,141,139,138,137,136,135,
    134,132,131,130,129,128,127,126,125,123,122,121,120,119,118,117,
    116,115,114,113,112,111,110,109,108,107,106,105,104,103,102,101,
    100,99,98,97,96,95,94,93,92,91,90,89,88,88,87,86,
    85,84,83,82,81,80,80,79,78,77,76,75,74,74,73,72,
    71,70,70,69,68,67,66,66,65,64,63,62,62,61,60,59,
    59,58,57,56,56,55,54,53,53,52,51,50,50,49,48,48,
    47,46,46,45,44,43,43,42,41,41,40,39,39,38,37,37,
    36,35,35,34,33,33,32,32,31,30,30,29,28,28,27,27,
    26,25,25,24,24,23,22,22,21,21,20,19,19,18,18,17,
    17,16,15,15,14,14,13,13,12,12,11,10,10,9,9,8,
    8,7,7,6,6,5,5,4,4,3,3,2,2,1,0,0
  };

  if (!d) return uint32_t(-1L);

  // Compute initial estimation of 0x1000000/x

  // #1] Get most significant bit set on divisor
  uint8_t idx = 0;
  uint32_t nr = d;
  if (!(nr & 0xFF0000)) {
    nr <<= 8;
    idx += 8;
    if (!(nr & 0xFF0000)) {
      nr <<= 8;
      idx += 8;
      if (!(nr & 0xFF0000)) {
        nr <<= 8;
        idx += 8;
      }
    }
  }
  if (!(nr & 0xF00000)) {
    nr <<= 4;
    idx += 4;
  }
  if (!(nr & 0xC00000)) {
    nr <<= 2;
    idx += 2;
  }
  if (!(nr & 0x800000)) {
    nr <<= 1;
    idx++;
  }

  // #2] Isolate top 9 bits of the denominator, to be used as index into the initial estimation table
  uint32_t tidx = nr >> 15,         // top 9 bits. bit8 is always set
           ie = 0x100 | inv_tab[tidx & 0xFF], // Get the table value. bit8 is always set
           x = idx <= 8 ? (ie >> (8 - idx)) : (ie << (idx - 8)); // Position the estimation at the proper place

  // #3] Now, refine estimation by newton-raphson. 2 iterations are enough
  // (uint32_t(1) << 25 avoids overflowing a 16-bit int, as on AVR)
  x = uint32_t((x * uint64_t((uint32_t(1) << 25) - x * d)) >> 24);
  x = uint32_t((x * uint64_t((uint32_t(1) << 25) - x * d)) >> 24);

  // Estimate quotient
  uint32_t q = x * d;

  // And remainder
  uint32_t r = (uint32_t(1) << 24) - q;

  // Check if we must adjust result
  if (r >= d) x++;
  
  // x holds the proper estimation
  return uint32_t(x);
}

@ejtagle
Contributor Author

ejtagle commented Apr 9, 2018

Just to give you an idea, my instruction cycle budget is about 150 cycles maximum per ISR. The interpolator itself uses 150 cycles, and the coefficient calculation should also use 150 cycles maximum. (they never run simultaneously). So, once i translate this to assembler, it should be ready for testing.

This is the interpolator itself, already done and working. But i have also to rewrite the _calc_bezier_curve_coeffs_avr function in assembler, and that last division is what i am going to replace with this newton-raphson equivalent
Attached the code i am working on, just in case.. ;)
sketch_apr07a.zip

@ejtagle
Contributor Author

ejtagle commented Apr 9, 2018

And yes, i do prototyping of the algorithms in plain C, because i must be sure they work before translating into assembler. Assembler is extremely difficult to follow and patch. So the transcription to ASM is done once i am absolutely sure the algorithms work in each and every possible use case.
BezierComparer.zip

@thinkyhead
Member

thinkyhead commented Apr 9, 2018

the transcription to ASM is done once i am absolutely sure the algorithms work

I would be surprised if the assembler is much more optimized than what the C++ compiler can produce, given the utter simplicity of the operation.

@thinkyhead
Member

150 cycles maximum per ISR

My old career (circa 1990) was programming 680x0 Assembler in Amiga DevPac, so I was trained to be crazy about shaving off cycles using clever techniques, counting cycles, picking the fastest op-codes, and so on. I got pretty good at optimization, wrote a nice optimized LZH decompressor, MFM disk encoder/decoder using the "blitter" co-processor, etc. Ah, those were the days… Now we can generally trust the C/C++ compilers to be smarter than most humans.

Still, there are tricks the compiler might not think of…

I suppose this has to run in the ISR, and there's no way we can pre-compute the curves as part of assembling the blocks in the planner…?

@teemuatlut
Member

How does M201 relate to the acceleration in this new planner?
I'm looking at the animations on the TinyG GitHub page: do the acceleration values with this planner refer to the peak acceleration or the average acceleration?

@ejtagle
Contributor Author

ejtagle commented Apr 9, 2018

@teemuatlut: M201 accelerations right now limit the average acceleration. The peak acceleration can exceed it. We could limit the peak acceleration, but there are reasons for not doing that (and also reasons for doing it).
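To put a number on "peak versus average", here is a sketch under one assumption: that the speed ramp follows the blend 6t^5 - 15t^4 + 10t^3 (the quintic Bezier shape from the TinyG description; the exact coefficients used by this PR are not shown in the thread). Its derivative, 30 t^2 (1 - t)^2, is the instantaneous acceleration divided by the average acceleration of the ramp. The function name is hypothetical.

```cpp
#include <cassert>

// Instantaneous acceleration divided by the ramp's average acceleration,
// at normalized time t in [0, 1], for the assumed quintic blend.
double accel_ratio(double t) {
  assert(t >= 0.0 && t <= 1.0);
  const double u = t * (1.0 - t);
  return 30.0 * u * u;
}
```

Under that assumption the ratio is zero at both ends of the ramp and peaks at 1.875 in the middle, so the peak would be a little under twice the M201 value.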

Usually, acceleration is used to limit skipping steps. Steps are lost because of sudden changes in direction where the motor has to overcome the mass inertia of the mechanism of the printer.

With this algorithm, there are no sudden changes in applied force, as there are no sudden changes in acceleration. Thus there are very good reasons to believe that the maximum acceleration can be increased quite a bit compared to the old method without losing steps. That is why i chose to use the m201 acceleration as the mean acceleration.

Due to the way the curve fitting works, the time to cover a specific distance is the same right now with either algorithm (trapezoidal or bezier).

The other reason for keeping it as it is is that limiting maximum acceleration requires even more calculations, and will surely increase printing times a lot.

@thinkyhead: On any 32bit or even 16bit processor I would agree with you. GCC is pretty good at doing all kinds of optimizations and transformations to the code to save cycles, but lacks support for specific architectural optimizations that sometimes can improve execution time quite a bit for specific algorithms: For example, in ARM Cortex M3/M4 there are multiply-add instructions (called MACs) that allow you to compute the product of 2 32 bit numbers, get a 64bit result and add that result to a 64 bit accumulator, all that in 5 instruction cycles in M3 or just 1 cycle in M4. GCC does not use that instruction and so compiles that sequence to a load of plain 32bit multiplications (1 cycle per plain multiplication in M3, 2 cycles per plain multiplication in M4) and 32 bit additions. The resulting code runs 5 to 10 times slower than using that specific MAC instruction. That is exactly the instruction we use to evaluate beziers.. ;)

On the AVR the situation is far worse: The processor has 8bit registers and no barrel shifter, but ANSI C states that all types must be promoted to 16 bits to perform operations on them. Sometimes GCC is able to realize that there is no need for that promotion and is able to carry out the operation using just one 8bit register, but there are many times when it does not realize that:

Examples:

uint8_t v,a;
v &= ~a; // promotes the negation to 16 bits, promotes v to 16 bits, does the operation, discards the top 8 bits and then stores the result. 
uint32_t v1; uint8_t v2;
v2 = 0x14;
v1 |= v2; // v2 is PROMOTED to 32 bits (ouch!), and then a 32bit or operation is performed over v1.
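The promotions described in these examples can be checked at compile time. The sketch below is plain standard C++ and only shows what the language rules require; whether a particular GCC version later narrows the operation back to 8 bits is a separate question that it does not answer. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// ~ on a uint8_t operand is computed as int (16 bits wide on AVR).
static_assert(std::is_same<decltype(~uint8_t(0)), int>::value,
              "bitwise NOT promotes uint8_t to int");

// Mixing uint32_t and uint8_t widens the 8-bit operand to 32 bits.
static_assert(std::is_same<decltype(uint32_t(0) | uint8_t(0)), uint32_t>::value,
              "uint8_t operand is converted to uint32_t");

// Clearing bits with an explicit cast back to 8 bits after the promoted operation.
uint8_t clear_bits(uint8_t v, uint8_t a) {
  const uint8_t result = uint8_t(v & ~a);
  assert((result & a) == 0);
  return result;
}
```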

All shifts left and right promote to 16 or 32 bits and are painfully done one bit at a time using a loop. The processor has a swap instruction that should allow 4bit shifts but it is rarely used by GCC. And sometimes it is cheaper to rewrite (in asm) something like

 a << 3 

as

a << 4; a >> 1;

the first loop takes 3 iterations, the alternative version takes 2 instruction cycles (SWAP + right shift). GCC does not use that optimization at all.
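A quick check, in plain C++ rather than AVR assembler, of when the two shift forms give the same 8-bit result. This is a sketch with hypothetical helper names, and it models the rewrite as a 4-bit left shift truncated to 8 bits followed by a 1-bit right shift. Under that model the forms agree only when bit 4 of the operand is clear, so the rewrite would need that to be known in advance (or bit 7 of the result not to matter).

```cpp
#include <cassert>
#include <cstdint>

uint8_t shl3(uint8_t a) { return uint8_t(a << 3); }

// 4-bit left shift truncated to 8 bits, then a 1-bit right shift.
uint8_t shl4_shr1(uint8_t a) { return uint8_t(uint8_t(a << 4) >> 1); }

bool forms_agree(uint8_t a) { return shl3(a) == shl4_shr1(a); }

// Count the 8-bit inputs for which the two forms differ.
int count_mismatches() {
  int mismatches = 0;
  for (int a = 0; a < 256; ++a) {
    // The forms differ exactly when bit 4 of the input is set.
    assert(forms_agree(uint8_t(a)) == ((a & 0x10) == 0));
    if (!forms_agree(uint8_t(a))) ++mismatches;
  }
  return mismatches;
}
```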

GCC is using some kind of peephole rule based optimizer, and not doing the analysis of ranges and proper decomposition to 8bit operations. The main problem is that GCC was created with 32bit archs in mind, so there was no effort in the analysis passes required to reduce operations to 8bit primitives, as 32bit archs do not and did not require them - Thus unless the expression is very simple, GCC will always produce suboptimal code sequences for AVR.

The other problem is the C standard: GCC can't express a 24bit multiplication. There is no 24bit type in C. But as AVR has an 8 bit multiplier, a 32bit x 32bit to 32bit multiplication is resolved by GCC as 10 8bit x 8bit multiplies and about 8 32bit adds. That gives an execution time of 10+4*8=42 cycles.

If you need a 64bit result from a 32bit x 32bit product, the only way to get it in ANSI C is to expand both operands to 64 bits. Let's assume GCC knows this and does not expand, and instead solves the 32x32 to 64bit product directly: AVR requires 16 8bit multiplications and 12 64bit additions to compute the result: 16+12*8 = 112 cycles.

On the Bezier algorithm, I need about 3 24bit x 16bit multiplies, and only need the top 24 bits of the result: that takes 6 8bit multiplies and 4 24bit adds: 6+3×4 = 18 cycles. I also need 4 16bit x 16bit to 16bit multiplies, and only the top 16 bits of the result: that takes 4 8bit multiplies + 3 16bit adds: 4+3×2 = 10 cycles. And I also need an extra 32bit x 32bit to 16bit multiply (only the top 16 bits of the result are needed). This takes 6 8bit multiplies and 6 16bit adds: 6+2×12 = 30 cycles.

So the total amount of cycles for the bezier evaluation is 30 + 18x3 + 4x10 = 124 cycles.

If done in plain C, that would take 112cycles×3 + 5×40= 456 cycles. So the assembler version is 3.5x faster.

With the newton-Raphson divide, is more or less the same problem. The asm version ends being 3x faster than the plain division used by GCC.

@ejtagle
Contributor Author

ejtagle commented Apr 9, 2018

Yes, we could precompute the curve coefficients in the planner. And also we could, at the expense of a lot of RAM, precompute the curve values. The first one is perfectly doable; the 2nd one would be a problem due to the known lack of RAM on this device.. Placing extra things in the planner is something I don't like, because the planner uses an iterative approach to recompute speeds of movements every time a new movement is queued, leading to several redos of each calculation... I prefer to delay as much as possible those calculations that the planner algorithm does not need.
But if we delay them to the moment a block is retired by the isr, then we end placing calculations on the isr... ;) the best and worst place at the same time !

@hg42
Contributor

hg42 commented Apr 9, 2018

@ejtagle thanks for those extensive explanations. They answer the questions I asked in an email to your github email address, so simply forget that email.

I was worrying about the much increased maximum acceleration (may be 2x-3x, right?).
But because I never saw skips in the middle of a straight move, but only at the start or end of moves, I assume/hope the skips are indeed caused by abrupt changes of force (=acceleration) and maximum acceleration may be much higher when changing it smoothly.

Btw. you have quite impressive knowledge...

@ejtagle
Contributor Author

ejtagle commented Apr 9, 2018

@thinkyhead : After all kinds of optimizations to the ASM version of the coefficient estimation routine, it takes 360 cycles to complete. That is more than double the time of the bezier point evaluator, which takes 150 cycles.
Seems the only way to make it run on AVR is to perform coefficient estimation in the planner; otherwise, we could be consuming too many cycles in the stepper ISR...
Will do that in the following PR... even if I don't completely like it...

@thinkyhead
Member

……… The asm version ends being 3x faster than the plain division used by GCC.

Good explanation of the compiler issues, thanks! I guess I shouldn't be surprised that GCC is leaving 8 bit architectures behind. I'd be interested to see your 360-cycle assembler version, just in case my "naive questions" might lead to a breakthrough that could cut it in half… by caching previous results or other such trickery.

Meanwhile, coefficient estimation in the planner does seem the only viable way to go.

@ejtagle
Contributor Author

ejtagle commented Apr 10, 2018

You are right. Caching could improve things. On 32bit CPUs it makes no sense (division takes 10 cycles) but on AVR maybe.. ;) - Of course, @thinkyhead. All improvements to the routines/algorithms are always welcome!

@ejtagle
Contributor Author

ejtagle commented Apr 10, 2018

BTW. i managed to reduce the count of cycles of the coefficient estimator to 210 cycles. Seems that we can get a proper estimate using some small tables and just one Newton-Raphson iteration

@me-honest

me-honest commented Apr 24, 2018

Any updates on this? Pretty interesting, I believe this could be a large step in quality, especially with the large vibration-prone beds currently in use, such as CR10, etc. EDIT: Found the pull request implementing it for AVR, will try and report back since I couldn't find much feedback on the actual printing results.

@ejtagle
Contributor Author

ejtagle commented Apr 24, 2018

It does what it says, nearly no ringing. But i do suspect that the linear advance feature (if you use it) also requires to be updated to take into account the new acceleration profile that this interpolation produces. I am thinking about that right now

@oysteinkrog
Contributor

This is relevant and interesting, what do you guys think?
http://fightpc.blogspot.no/2018/05/a-better-stepper-drive-s-curve.html

@ejtagle
Contributor Author

ejtagle commented May 3, 2018

I'd agree, but that is not the main problem in 3D printing. The intent of the bezier curves is not to increase printing speed, but rather to reduce resonances. As a matter of fact, the printing time with the bezier curve is exactly the same as without it, but with less vibration of the mechanical structure.
Besides, I do not completely agree with the article. The main reason is that in 3D printing we are not limited by motor torque at maximum speed (the motors we and you use are capable of going way faster than we actually run them). So torque at the speeds we use is not a problem.
The main problem is starting, because of the heavy loads (print head, bed, etc.) the motor must take out of rest while overcoming static friction. Reducing acceleration at the beginning (as the Bezier curve does) helps with that.
So even if the article seems to be right, i am not sure it is our case... ;)

@hg42
Contributor

hg42 commented May 3, 2018

a test without mass is totally nonsense.

I once tested Klipper on lpc1768 and could reach very insane speeds and accelerations that were only limited by CPU power. 2000mm/s and 20000mm/s^2 were easy and still not the limit, no obvious skipping.
I am also missing a physical model behind it. E.g. why should it be an exponential curve?

@ejtagle
Contributor Author

ejtagle commented May 3, 2018

@hg42 : Without trying to defend the article writer, what I get out of it is that the author is trying to maximize the increase of speed, reaching the maximum speed the motor can give as fast as possible, and it is true that torque decreases as speed increases (any motor has that limitation: http://lancet.mit.edu/motors/motors3.html). The same curves also apply to stepper motors.
But, as said, and as you confirm, in 3D printing we are very far from the speed limits of the stepper motors we are using. So that speed/torque curve does not play a role, and is not a limitation.
The main problems in 3D printing are mainly inertial, due to the mass that must be accelerated/decelerated, the instantaneous forces the motor must produce to change the movement direction (and that includes starting from a stop condition), and avoiding oscillations of the mechanical structure as much as possible.
The 6th-order bezier curve is probably the optimum here. And it CAN improve and reduce printing times (as reducing/suppressing oscillations allows going at a higher speed).
Marlin itself has to improve its Jerk control algorithm, and also improve its linear pressure advance algorithms. None of them are complex things to do... But i am right now busy, so it will have to wait a bit...

@hg42
Contributor

hg42 commented May 3, 2018

@ejtagle : with mentioning my tests, I actually wanted to support your statement, that speed is not the problem.
I don't doubt a dependency of speed vs. torque, but your link suggests a linear dependency, so why should he use an exponential term?

Btw. I did some tests with the new s-curve on Marlin on my corexy (X5S with lots of mechanical improvements to prevent vibrations, e.g. diagonal bars, fixating the z-axis rods, etc.).
I used Marlin with and without BEZIER for each test.

With my low current TMC2130 setup, I always had to limit acceleration to about 1000 mm/s^2 to prevent skips.
I hoped that the s-curve could allow higher acceleration settings. But it's still limited to the same value (determined experimentally).
My printer doesn't vibrate much at that acceleration even for fast prints (I tested up to 500mm/s without filament) and the s-curve didn't improve on that.

So at least there are operating conditions where the s-curve doesn't make a difference.


6 participants