Skip to content

[2.0.x] Migrate Marlin to a new TMC library#11792

Closed
teemuatlut wants to merge 4 commits intoMarlinFirmware:bugfix-2.0.xfrom
teemuatlut:bf2_new_library
Closed

[2.0.x] Migrate Marlin to a new TMC library#11792
teemuatlut wants to merge 4 commits intoMarlinFirmware:bugfix-2.0.xfrom
teemuatlut:bf2_new_library

Conversation

@teemuatlut
Copy link
Member

I'm starting the process to migrate to a new TMC library that aims to support more driver models.
It might be a bit early but I'd like to get a review going to make things go smoothly.

Here's a general plan for the rest of the year. Ticked boxes mark the changes introduced in this PR.

  • Move to new universal TMCStepper library
  • Implement inherited Marlin TMC classes
  • Support TMC2660
  • Support TMC5160
  • Add sanity checking for driver errors
  • Greatly reduce communication attempts for monitoring
  • Improve TMC error verbosity with better error messages
  • Add communication checking on boot
  • Add the ability to change chopper timings (spreadCycle)
  • Tweak default chopper settings for quieter spreadCycle operation
  • Support Trinamic coolStep technology
  • Improve M122 with fetching raw register data
  • Add TMC section to the LCD
  • Support TMC5130
  • Support TMC2224
  • Add SOFTWARE_ENABLE for all smart drivers (Request from Panucatt)

Regarding this PR.

A more universal library allows me to support more chips without creating a completely new library for each and every one of them. The maintenance would quickly become too great when the number of supported models is more than a couple.
The current aim is to support TMC2130, TMC2208, TMC2224, TMC5130, TMC5160 and TMC2660.
The library itself should be in pretty good condition but as always, it is mostly untested at this point.

Implementing a child class within Marlin allows us to attach certain functions to the stepper objects so they don't have to be passed around as parameters, or be tracked as indexes to an array. This PR primarily uses this for printing out the axis labels and otpw counter values.

Basically this PR forms the base for many of the new features, functions and fixes.
I ask that the PRs don't get squashed together as I've tried to organize the commits thematically and it'll make it easier to debug in case of any issues.

@thinkyhead
Copy link
Member

Looks great so far. I like the axis label being made a part of the stepper object.

@ejtagle
Copy link
Contributor

ejtagle commented Sep 10, 2018

The only comment i have here is that on AVR, virtual functions consume A LOT of SRAM to store virtual function tables.
Yes, virtual function tables should be stored in FLASH, but there is a bug in the AVR compiler that stores them in SRAM, and there is no workaround for it.
I´d suggest to use a "handcrafted" virtual mechanism instead, using FLASH function pointers instead
The other choice would be to use templates as a way to implement derivation

@teemuatlut
Copy link
Member Author

I don't know much about the subject, is there somehow I can measure the impact the virtual functions have on memory consumption?

@ejtagle
Copy link
Contributor

ejtagle commented Sep 11, 2018

@teemuatlut : Just compile with and without TMC support and see the amount of free SRAM. You will be suprised! .. (In fact, i was surprised the first time i tested this.. :( ) ... On 32bit processors, there is no difference at all, as virtual function tables are stored on FLASH ... Just count the amount of storage bytes your class requires. You will notice that adding a class with virtual methods consumes much more space than the one you are using. Even defining a class that inherits and overrides just one virtual function, but adds no data members, will consume a lot of SRAM (just try it! ;) )

AVR does not store virtual function tables on FLASH: Each derived class requires a virtual function table, and all of them are stored in SRAM in AVR.

That is the reason there are no virtual functions on the Marlin codebase!

AVR SRAM is very little, so every possible effort to reduce its consumption should be done 👍
You can get mostly the same effect as virtual functions with templates...

@ejtagle
Copy link
Contributor

ejtagle commented Sep 11, 2018

Please look at
https://stackoverflow.com/questions/16988450/c-what-is-the-curiously-recurring-template-pattern-and-can-curiously-recurri

You will get even better performance without the SRAM penalty of virtual functions!

@teemuatlut
Copy link
Member Author

I'll definitely go through your link. Thanks!

I remember I did originally try some other approaches than inheritance and virtual functions but ran into some issues or just didn't know how to write something.

Perhaps it's not best to compare with and without TMC enabled, as of course with a new feature enabled, and a somewhat substantial as this, the memory consumption will increase.
So what I did was compare commits from current bugfix-2.0.x branch and after the commits introduced in this PR. The current TMC2130Stepper library makes no use of virtual functions.
Here's what I got with up-to-date PIO:

Typical 4x TMC2130 setup on AVR

Old lib

DATA:    [====      ]  37.6% (used 3080 bytes from 8192 bytes)
PROGRAM: [==        ]  22.4% (used 56894 bytes from 253952 bytes)

New lib

DATA:    [====      ]  36.9% (used 3026 bytes from 8192 bytes)
PROGRAM: [==        ]  22.8% (used 57944 bytes from 253952 bytes)

Vanilla Marlin

DATA:    [===       ]  31.0% (used 2540 bytes from 8192 bytes)
PROGRAM: [==        ]  20.9% (used 53018 bytes from 253952 bytes)

To me this seems like a small reduction in SRAM use. Maybe we can conclude that the old library is even worse then :D

Also comparing to vanilla Marlin with no configuration changes, the penalty for enabling the TMC2130 drivers is +6% in SRAM. I think I'm okey with this.

I'll try testing a 32bit platform once I get back home.

@ejtagle
Copy link
Contributor

ejtagle commented Sep 11, 2018

@teemuatlut : If the previous library also had virtual functions or was using or inheriting from classes with virtual functions, that could explain a bit.
I could be wrong, please correct me if i am, but, what are the requirements of static storage of the TMC library ? (i am referring to globals, or per instance data, not local variables ? - I think they probably are non existant or quite small.
Adding the TMC library is using (from vanilla Marlin) 3026 - 2540 = 486 bytes of globals? ... I think it is pretty high ... ! - That would essentially allow to store 120 bytes per axis! - I even doubt the TMC chips have so many internal registers...
I´d suggest inspecting the .MAP file created by the linker...
The problem with SRAM is the Planner. The more available memory we have, the more we can plan ahead movements (and that helps with stutter ... longer planned movement queue means less chances of it becoming empty, specially important with small segments or UBL...

I am a big fan of templates and templated functions: They have no runtime penalties, they compile to optimal code, If you look at the STL sources, you will notice there are no virtual methods or classes...
In AVR, the indirect call to virtual functions not only costs SRAM, but also costs execution time, as it is not straightforward to perform an indirect call
(On 32 bits, that cost decreases to nearly nothing... AVR is the problem .. ;)

@teemuatlut
Copy link
Member Author

teemuatlut commented Sep 11, 2018

If the previous library also had virtual functions or was using or inheriting from classes with virtual functions, that could explain a bit.

(TMC2130Stepper 'old library')
No inheritance, no virtual functions. Only one you could consider is an SW SPI object created within the class.

There are 26 shadow register per driver/stepper/axis. Each is 32bits.
The old library also stores EN, DIR, STEP and CS pins as 16bit variables.
Additionally there's a 16bit value for storing the user configured RMS input and a boolean flag for using SW SPI.
Then there's a stored sense resistor value, a started flag, a tracked otpw flag and a variable for storing a status response byte.
Currently there are also some variables for storing values needed by the LCD implementation but those will be moved to within Marlin.

The SW SPI stores some precalculated fast io registers so they don't have to be looked up on every write. These are 3x16b + 3x8b + 3x8b.

There are no static variables used.

All this is per one instance of the class. By my calculations this is 1136b = 142B. Multiply this by the number of axis and you'll get 568 bytes.

Looking at it like this, there definitely is room for optimization but I don't want to start changing a working architecture to chase after 10 or 20 bytes.

Concerning templates, I think if you look around in Marlin you'll see that I'm pretty much the only one who has made use of them =)

@ejtagle
Copy link
Contributor

ejtagle commented Sep 11, 2018

Yes, @teemuatlut ... templates could be helping quite a bit on Marlin ... Perhaps instead of the millions of macro lines ;) ...
Yes, i do agree that SRAM consumption matches.
My experience with virtual functions on AVR was always terrible. Specifically the Stream class from Arduino has a huge virtual function table, and everything seems to inherit from it...

@SJ-Innovation
Copy link
Contributor

Following this: started my own rewrite but realised how much of a job it was going to be. Really like how the labels are held to each stepper instance. Thats very VERY useful.

@teemuatlut
Copy link
Member Author

@thinkyhead I've merged in your PR and squashed it in with the original config file update commit.
I also took the opportunity to disable sensorless Z by default.

@thinkyhead
Copy link
Member

So, I did a first attempt to throw away the virtuals, and of course it's not that simple.

The TMCStepper class is calling the virtual methods from within itself, which of course requires the virtual routing to get to the correct subclass method.

So now I'm wondering if there's another technique to get to the right subclass method. For example, by using a helper class with methods that take the different object pointer types as the first argument. I'll do some experimentation soon and see what I can come up with.

@teemuatlut
Copy link
Member Author

Virtual functions were my solution when a parent class needs to call a method from a child class.
I believe this is what virtual functions were made for. What would my alternative be?

Read and write were made virtual because the implementation differs from driver type to another, and the parent class TMCStepper needs to call the right one.
The same pretty much applies to the rest of the virtual methods. They are all called by a parent class method but the definitions themselves are part of a child class.

So for example, the TMCStepper::blank_time() method needs to call either TMC2130Stepper::tbl() or TMC2208Stepper::tbl(), depending on which child class created the parent.
When I search for "How to call child method from parent" virtual functions seem to be the appropriate answer. Of course the discussion doesn't normally include mentions of AVR.

@teemuatlut
Copy link
Member Author

of course it's not that simple

😄 Quoted for emphasis.

I actually spent quite a few weeks trying to come up with the proper architecture and this is what worked for me.
And I'd still like to point out that even if we were to get it working and in a sensible manner, it still needs have reasonably significant memory savings to be worth the trouble.

@thinkyhead
Copy link
Member

thinkyhead commented Sep 12, 2018

I've been thinking about possible solutions. The first idea that comes to mind is to use a custom router in the root class, replacing all the methods that are currently pure virtual…

Code Samples…
void TMCStepper::write(const uint8_t addr, const uint32_t config) {
  switch (this->classID) { // Populated by constructors (see below)
    case TMC2130_CLASS: static_cast<TMC2130Stepper*>(this)->write(addr, config); break;
    case TMC2208_CLASS: static_cast<TMC2208Stepper*>(this)->write(addr, config); break;
    case TMC2660_CLASS: static_cast<TMC2660Stepper*>(this)->write(addr, config); break;
    case TMC5130_CLASS: static_cast<TMC5130Stepper*>(this)->write(addr, config); break;
    case TMC5160_CLASS: static_cast<TMC5160Stepper*>(this)->write(addr, config); break;
  }
}

(…which can be shortened to…)

#define ROUTE(T) break; case TMC##T##_CLASS: static_cast<TMC##T##Stepper*>(this)

void TMCStepper::write(const uint8_t addr, const uint32_t config) {
  switch (this->classID) { default:
    ROUTE(2130)->write(addr, config);
    ROUTE(2208)->write(addr, config);
    ROUTE(2660)->write(addr, config);
    ROUTE(5130)->write(addr, config);
    ROUTE(5160)->write(addr, config);
  }
}

While this does cost some PROGMEM, it costs only 1 extra byte of SRAM per object, for the classID.

enum TMCClassID : unsigned char {
  TMC2130_CLASS,
  TMC2208_CLASS,
  TMC2660_CLASS,
  TMC5130_CLASS,
  TMC5160_CLASS
};

class TMCStepper {
  TMCClassID classID;
  . . .
  TMCStepper(TMCClassID cid, float RS) : classID(cid), Rsense(RS) {};
}
TMC2130Stepper::TMC2130Stepper(uint16_t pinCS, float RS) :
  TMCStepper(TMC2130_CLASS, RS),
  _pinCS(pinCS)
  {}

@ejtagle
Copy link
Contributor

ejtagle commented Sep 12, 2018

Templates! ... You are not thinking in templates.. Let me show a way (it is NOT the only way, just a way:

class Base {
public:
  void performTask() {
    method1();
    method2();
  }

  virtual void method1() {}
  virtual void method2() {}
};

class Derived : public Base {
public:
  virtual void method1() { printf("Derived method1"); }
  virtual void method2() { printf("Derived method2"); }
};

Replacing virtual by templated base class:

template <class DerivedClass>
class Base {
  void performTask() {
    self().method1();
    self().method2();
  }
  
protected:
  DerivedClass& self() { return *static_cast<DerivedClass*>(this); }
  const DerivedClass& self() const { return *static_cast<const DerivedClass*>(this); }
};

class Derived : public Base<Derived> {
public:
  void method1() { printf("Derived method1"); }
  void method2() { printf("Derived method2"); }
}

The ONLY reason for using virtual classes is if you want to have an array with pointers to the Base class! And, on AVR, i´d even
avoid it... How ? -- Implementing "virtual function table by hand"

typedef void (*pfnMethod1)(void);
typedef void (*pfnMethod2)(void);

struct VirtualTable {
  pfnMethod1 method1;
  pfnMethod2 method2;
};

static void myMethod1() {
}

static void myMethod2() {
}

VirtualTable vtbl PROGMEM = {
  myMethod1,
  myMethod2
};

@thinkyhead
Copy link
Member

thinkyhead commented Sep 12, 2018

Here's a little C++ program that demonstrates the routing technique:

router.cpp.txt

@thinkyhead
Copy link
Member

thinkyhead commented Sep 12, 2018

The ONLY reason for using virtual classes is if you want to have an array with pointers to the Base class!

Or, if you want to do a call of a method from the root class, but that method is only promised in the root class (pure virtual) and only implemented in the derived classes.

TMCStepper::do_something(int x, int y) {
  my_method(123, y, 456); // this method is implemented differently by each derived class
}

@thinkyhead
Copy link
Member

thinkyhead commented Sep 12, 2018

Ah, I see that your self() methods in the templated Base<> class accomplish the same thing, but in a much neater way. So, following that technique we get this much simpler looking thing!

templated.cpp.txt

I'm now a much bigger fan of static_cast.

@ejtagle
Copy link
Contributor

ejtagle commented Sep 12, 2018

The sweetness of templates is that all the calls are resolved at compile time.. No wasted cycles, no wasted SRAM or resources :)

@thinkyhead
Copy link
Member

thinkyhead commented Sep 12, 2018

Funny, because I've employed this technique before… A looong time ago. It's pretty much a necessity in many cases to have a self() to get to the method you want. I've had to do this in C# and JavaScript also.

@thinkyhead
Copy link
Member

thinkyhead commented Sep 12, 2018

Are templated sub-classes also possible without extending this? For example, TMC5130Stepper is a subclass of TMC2130Stepper. So we need to be able to do this…

class TMC5130Stepper : public TMC2130Stepper<TMC5130Stepper> {
  . . .
};

@ejtagle
Copy link
Contributor

ejtagle commented Sep 12, 2018

Should be possible.
static_cast will always work for single inheritance, just because C++ just adds the derived class members in the same order as the inheritance (base class type ... outmost derived class type) - So the address of the base class is the same as the address of the derived classes

@thinkyhead
Copy link
Member

thinkyhead commented Sep 12, 2018

It's tricky. This of course doesn't work at all for obvious reasons…

template <class T>
class TMC2130Stepper : public TMCStepper<T> { . . . }

template <class T>
class TMC5130Stepper : public TMCStepper<T> { . . . }

class TMC5160Stepper : public TMC5130Stepper<TMC5160Stepper> { . . . }

@ejtagle
Copy link
Contributor

ejtagle commented Sep 12, 2018

(and yes, you can achieve the results you want by making TMC2130Stepper a template itself ... There are easy ways to make it behave

template <class T>
class TMC2130Stepper  { };

A typedef can "remove" the parameters and give an even cleaner usage pattern if you want that.

@ejtagle
Copy link
Contributor

ejtagle commented Sep 12, 2018

template <class T>
class TMC2130Stepper : public TMCStepper<T> { . . . };

template <class T>
class TMC5130Stepper : public TMCStepper<T> { . . . };

That should work ... Why not ? ...

@thinkyhead
Copy link
Member

Rebased, and had to move to a new PR… #11919

@thinkyhead thinkyhead closed this Sep 24, 2018
@teemuatlut
Copy link
Member Author

I disabled edits so I could review any additional changes needed.

@thinkyhead
Copy link
Member

The rebase was complicated by some dead commits. It needed a refresh. You can pull the HEAD of the new PR’s branch in as the new HEAD of your branch to rectify it, using a git reset.

@teemuatlut
Copy link
Member Author

teemuatlut commented Sep 25, 2018

I hard reseted to your branch and rebased it to upstream.
I then added some new commits. Can you review the last 5 commits and I can then sort & rebase them into something sensible. I thought it would be easier for you to go through them before I combined the new things into old commits.
You may need to re-open this PR, or look at the branch directly. Commit diff.

Here are the current standings in terms of memory consumption:

Config (4x TMC2130) PROGMEM SRAM
Upstream Marlin with TMC disabled 52192 2499
TMC enabled 56002 2999
New library 55594 2752 (-247B)

So with this PR, basic TMC2130 support should consume only 253 bytes of SRAM (down from 500B).

@ejtagle
Copy link
Contributor

ejtagle commented Sep 25, 2018

@teemuatlut : Make TMCStorage a template also, and instead of passing a pointer to the label, you can pass the label as a template parameter.
Very easy, and can save 4 extra bytes per axis 👍

(using SRAM to store non changing pointers or values is a waste of resources ;) )

BTW: This TMCLibrary is in great shape... I see no other optimizations possible right now. Great work!!

@teemuatlut
Copy link
Member Author

I'll take a look.
The storing of the label was supposed to be a somewhat direct port from before but I guess at some point it did turn into a run time variable.

@teemuatlut
Copy link
Member Author

Feel free to pitch in. Getting too late for me.
This is what I'm trying to test with:

static const char LABEL[] PROGMEM = "MSG_A";

template<char L>
class myClass {
	public:
	void foo() { char* c = L; }
};

void setup() {
  myClass<LABEL> c1;
  c1.foo();
}
CPP_template_label:13:11: error: the value of 'LABEL' is not usable in a constant expression
CPP_template_label.ino:4:19: note: 'LABEL' was not declared 'constexpr'
CPP_template_label:13:16: error: conversion from 'const char*' to 'char' not considered for non-type template argument

@ejtagle
Copy link
Contributor

ejtagle commented Sep 25, 2018

@teemuatlut : This works for me... the only requirement is to make the string constexpr

static constexpr char demo [] PROGMEM = "DEMO";

template <const char* p> class myClass {
public:
	static const char* text() { return p; }
};

int main()
{
	myClass<demo> test;
	printf("%s", test.text());
}

This works for me on both AVR and DUE

@teemuatlut
Copy link
Member Author

Okey, the next step that I need is passing the object as a argument to a function. I don't know why it's complaining as it's exactly the same as I had in Marlin.
The compiler message isn't helping much either 😄
expected a constant of type 'const char*', got 'const char*'

static constexpr char LABEL[] PROGMEM = "MSG_A";

template<const char* L>
class myClass {
	public:
	void foo() { const char* c = L; (void)c; }
};

void fun(myClass<const char*> &st) { (void)st; }

void setup() {
  myClass<LABEL> c1;
  c1.foo();
  fun(c1);
}

@ejtagle
Copy link
Contributor

ejtagle commented Sep 26, 2018

That one is "easy"...

The template you are using requires not a type, but a value to be able instantiate it...

This will work, but probably is not what you need:

static constexpr char LABEL[] PROGMEM = "MSG_A";

template<const char* L>
class myClass {
  public:
  void foo() { const char* c = L; (void)c; }
};

void fun(myClass<LABEL> &st) { (void)st; }

void setup() {
  myClass<LABEL> c1;
  c1.foo();
  fun(c1);
}

@ejtagle
Copy link
Contributor

ejtagle commented Sep 26, 2018

One possible solution to this is to use

template <typename T> void fun(T &st) { (void)st; }

@ejtagle
Copy link
Contributor

ejtagle commented Sep 26, 2018

Another possible solution is

template <const char* T> void fun(myClass<T> &st) { (void)st; }

This latter one is less generic, but i think it is the one you need @teemuatlut

@teemuatlut
Copy link
Member Author

Well, it compiles but doesn't link. For some reason Marlin lost the definition for stepperX.

teemuatlut@0f0da30

extern TMCMarlin<TMC2130Stepper, TMC_X_LABEL> stepperX;

TMCMarlin<TMC2130Stepper, TMC_X_LABEL> stepperX(X_CS_PIN, R_SENSE);

@ejtagle
Copy link
Contributor

ejtagle commented Sep 26, 2018

Wow! ... Let me see ... ;) ...

@ejtagle
Copy link
Contributor

ejtagle commented Sep 26, 2018

@teemuatlut : This seems to be a bug of GCC 4.8.x / 4.9.x series when using LTO (linker time optimizations) that sometimes removes global symbols from modules when it shouldn't (https://answers.launchpad.net/gcc-arm-embedded/+question/261316) , (https://trac.mplayerhq.hu/ticket/1695).
I am trying to find a workaround for it, but until now, none worked...

@thinkyhead
Copy link
Member

I hard reseted to your branch and rebased it to upstream.

I don't see that this PR was rebased, or new commits added. And now it cannot be re-opened, so you may want to just open a new PR based on your current work branch.

@teemuatlut
Copy link
Member Author

I don't think github keeps tracking PRs that were closed. If you follow the links to the branch commit history, you'd see plenty of new commits.

@thinkyhead
Copy link
Member

Probably. If I had re-opened it sooner then it would have them. Anyway, go ahead and start a new PR so you can have control over its content.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants

Comments