Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Lite" version? #50

Open
w9ran opened this issue Jul 13, 2017 · 16 comments
Open

"Lite" version? #50

w9ran opened this issue Jul 13, 2017 · 16 comments
Assignees
Milestone

Comments

@w9ran
Copy link

w9ran commented Jul 13, 2017

I'm a noob to Arduino but the need for frequency control using the Si5351 has lit a fire under me! So first, thank you for your efforts! I really like the v2 updated library which I find much more intuitive to use.

My only request would be to create a downsized library where the combined libraries needed to control the Si5351 (e.g. "Si5151" and "Wire") could run on the ATTiny85-20. This is probably a tall order since it's only got about 6K left after the bootloader, but it would be very cool to be able to shrink the size of a frequency source to that of an old-style crystal using this 8-pin uC. Especially now that the last custom crystal maker has gone out of business.

If it were possible to slim down the library for this purpose, I'd be content with just the basic setup commands and the new set_freq() method - just the bare minimum. A simple sketch that uses just "Wire" takes a little over 2K of memory, which would leave 4K at most for "Si5351 Lite" and the user's sketch. For comparison, a fairly simple sketch using the current libraries runs about 12K.

Thanks and 73,
Bob W9RAN

@NT7S
Copy link
Member

NT7S commented Jul 16, 2017

Bob,

I would love to do so, but I have to be realistic about my lack of time availability given the large set of other tasks on my plate. So I will certainly keep this issue alive, but it's going to be back burner for now while I devote time to work on Etherkit products. If/when the opportunity arises to do this, I'll certainly give it a try. Keep in mind that the main reason that the code from this library compiles to the size that it does is because of the 64-bit integer math that is required to give a decent level of precision in setting the frequency. Cutting it down to 32-bit math, for example, will hurt the precision in setting the frequency a fair bit.

Thanks for filing the issue.

@NT7S NT7S self-assigned this Jul 19, 2017
@NT7S NT7S added this to the v3.0.0 milestone Jul 19, 2017
@ghost
Copy link

ghost commented Aug 29, 2017

Remove the 3 instances of new/delete and this will make your light version for free.

You're on a 8-bit micro, not a Xeon server...

@ghost
Copy link

ghost commented Sep 1, 2017

Also each buffer uses 8 bytes, not 20. declaring a:

uint8_t params[8]

should be enough.

Your bad code was copied all over the web, see here: https://github.com/NT7S/Si5351/blob/master/si5351.c

Also you can use fixed values instead of i++ code will be even better.

@NT7S
Copy link
Member

NT7S commented Sep 1, 2017

So you've made those changes and confirmed that it substantially improves the code? I'd like to see the data on how much things are improved and if they are I'll happily accept a PR.

Remove the 3 instances of new/delete and this will make your light version for free.

The issue here is the large amount of 64-bit math done to in order to get an accurate output frequency, which translates into more code in flash storage, not the RAM that is temporarily allocated within that method.

Your bad code was copied all over the web

A link to an old version of the code on my personal account is "all over the web"?

Drop the hyperbole.

@ghost
Copy link

ghost commented Sep 1, 2017

okay, now my feet hurts.

hyperbole apart, dynamic allocation probably triggers the linking of a set of functions to implement malloc and free, which probably also has a significant flash memory cost.

@ChocolateFrogsNuts
Copy link

Hi All,
I was running out of space on a project (code for the BIT40 transceiver, running on an Arduino Nano aka ATmega328P) and after doing a lot of work in my code, I started looking at libraries to optimize.
I got nearly a whole KB (yes, just under 1000 bytes) out of this library. That's a lot given that this and Wire only occupy about 6.5KB total.

Removing the one new in si5351 only makes about 8 bytes difference... sometimes, depending on circumstances, and I looked in Wire - it doesn't use new, and I didn't see it anywhere else.
It made no difference to the dynamic memory usage (because it's allocated on the stack/heap which share the same space anyway).

The things that REALLY make a difference are:

  • Some changes to the header to make macros for the address of some clock registers.

  • using those macros to convert some repeated code into loops
    That gave me about 350+ bytes. There may be more code that can benefit fro this.

  • Replacing both of the select_r_div methods with a single piece of code written to use a loop.
    It makes the same decisions in the same order as the old code, and returns the same results, but reduces the size by 650+ bytes! Run times should be almost identical.

Modified files attached, plus a diff so you can see what I actually changed.
Mike
vk6mn
Si5351-update.zip

@ChocolateFrogsNuts
Copy link

One more thing - the linker won't include code that's never called, so simply not calling things you don't need will result in the library being only as big as it needs to be - writing a cut down version may not yield the gains you think it will, unless you write out complexities such as not being able to use more clocks than you need (eg removing support for CLK6/7 which requires extra code).
Possibly this could be done with #if - I may look at it when I run low on progmem again as I only need CLK0-3.

@ghost
Copy link

ghost commented Oct 25, 2017

My initial remarks were about RAM savings. some buffers allocated dynamically are clearly unneededly big, and could be allocated once and for all as a global/class variable.
Glad you could find mode flash savings by optimizing the code.

@ChocolateFrogsNuts
Copy link

I ended up chasing the obvious savings available from optional support for CLK6/7.. and while I was at it found other uses of new... no idea why ctrl+F didn't find them for me before... anyway....
I set the params buffers to 10 bytes all round, so it's a fair test....
I tried both versions - using new and just putting the data on the stack. By getting rid of all uses of new, I gained 692 bytes of flash and 10 bytes RAM (probably from malloc() management structures). I think that counts as significant.
Making the buffers one global is likely to cost more RAM - keeping them local only allocates 10 bytes more stack when needed - making them global reduces the available room for stack by 10 bytes permanently.

I also added the ability to disable support for clocks 6 and 7 with a #define - for a gain of 2200+ bytes!
The only problem with that is there is no way in the IDE to set the define... other than editing the .h file in the library itself. There could be slightly more places to trim, but I've left in full initialization for all clocks and got most of the easy reductions.
At least it's a full solution for Makefile and IDEs that support defines, and a workable solution for Arduino IDE users that don't need to use both versions.

I've yet to actually test the code (and I can't fully test the extra clocks) but I'll post it when I've had a chance to load it into a chip and make sure it's not a total disaster :)

@ghost
Copy link

ghost commented Oct 25, 2017

Are you sure about that? It seems to me that GNU LD is linking whole files, at least that is the case on my x86 linux pc; and there is no reason to have a different behaviour for avr.

That's the reason why projects that want to include only the required function actually split their code into multiple files, each containing a single function.

Stack buffers still use RAM, but temporarily, and this limits the possible levels of recursion, which can grow unexpectedly fast in modular code. Sure, a static RAM buffer uses the same mem at all times, but you have better chances to avoid stack overflow (in general).

IIRC, Last time I read the code 8 bytes were sufficient for storing registers, why did you chose 10 bytes? - note: thats only 8 bytes, so stack allocation is completely fine in that case.

@NT7S
Copy link
Member

NT7S commented Oct 25, 2017

@ChocolateFrogsNuts Thanks, I appreciate your analysis. I'll dig into it soon, as I'm currently knee-deep in a few other things but I'd like to see your optimizations.

@ChocolateFrogsNuts
Copy link

seems to work ok, so here's the updated code.
Si5351-update2.zip

@ChocolateFrogsNuts
Copy link

@f4grx it definitely only links code that is referenced on my x86 linux pc with IDE 1.8.4 here. Adding in a single line of code that references something new in an already linked library can add large amounts of size. It definitely also works that way in the sketch - #ifdef-ing out a function that isn't used anywhere makes no difference to the compiled size.

Limiting the possible levels of recursion is going to happen no matter where the buffer is allocated - the only difference is whether that limiting is happening to other code when the function in question is not running. The only time a static buffer will reduce the chances of a stack overflow are if one of the functions using a buffer calls another that uses a buffer, which doesn't happen here. A better way to manage that risk is to simply declare the buffer within a code block that's closed by the time the call is made (if possible) - that way the local variable is removed from the stack before the call is made.

10 bytes was a round number - it leaves a little margin for error if someone updates the code and miscounts :-) 8 bytes is all that's really needed.

@ChocolateFrogsNuts
Copy link

ChocolateFrogsNuts commented Oct 27, 2017

Ok, I don't understand why this one makes such a big difference - I expected a few bytes, but almost 500???
In set_pll() I did this:

#if 0
        if(target_pll == SI5351_PLLA)
	{
		pll_calc(SI5351_PLLA, pll_freq, &pll_reg, ref_correction[plla_ref_osc], 0);
	}
	else
	{
		pll_calc(SI5351_PLLB, pll_freq, &pll_reg, ref_correction[pllb_ref_osc], 0);
	}
#else
	// this is 480 bytes smaller??? really?? somebody please verify this!
	pll_calc(target_pll, pll_freq, &pll_reg, 
		 ref_correction[(target_pll == SI5351_PLLA) ? plla_ref_osc : pllb_ref_osc],
		 0);
#endif

Making similar changes elsewhere only made a few bytes difference as expected. Maybe it's the result of an optimiser bug.... or maybe due to the position in the parameter list the optimiser can't see how to do it?

I also got a few bytes by merging the common code from set_pll() and set_vcxo() into a write_pll_parameters() function. That one's more about improving code management - the code does exactly the same thing. I did test replacing i++ with constants in that code.. for a grand saving of 0 bytes... I prefer the i++ version. It also ensures the params buffer only exists when it's actually needed, at the expense of 1 call frame on the stack. Given we've cut the buffer size by 10 I'm not too concerned about the extra 7 bytes for the call.

I also noticed that set_pll() updates plla_freq or pllb_freq as appropriate, but set_vcxo does not after performing the pll parameter write. Is this a bug? If so I'll apply the fix.

@ChocolateFrogsNuts
Copy link

ChocolateFrogsNuts commented Oct 28, 2017

I discovered a good answer to selecting the "Lite" version for compiling with or without CLK67 support from within the IDE!
It's a bit tricky to get right, but it's dead simple to use.
With the update2.zip version above installed....
Define a custom board in the IDE by creating a boards.txt in sketch_dir/hardware/board_name/avr
Set it up with a line:

board_name.name=My Custom Board
board_name.build.extra_flags=-DSI5351_WITH_CLK67=0`

and a bunch of other stuff you will have to look up but it's basically a copy of the appropriate section of the boards.txt for your arduino. I made one for the nano that works boards.txt

That's it - in the IDE you will have a new board to choose from which will use the 'Lite' version of SI5351. So if you select "Arduino Nano" you get the full version of SI5351, and if you select "My Custom Board" it automatically rebuilds with the lite version.
Unfortunately there's no way to remember the setting per-project :( but at least it works, and the .h file is set up so that the compiler will produce an error if your sketch use CLK6 or CLK7 when compiling the lite version.
EDIT: updated boards.txt because I didn't test uploading the sketch with the original.

@WestfW
Copy link

WestfW commented Jul 29, 2020

Would it still be useful to have a version that uses 32bit integers for frequency calculations (instead of int64)? That would cut down the size substantially on 8bit cpus, and would still allow operation up to about 40MHz, I think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants