
Added integer based sin()/cos() functions, changed all trig functions to wled_math #4181

Merged
merged 8 commits into Aircoookie:0_15 on Nov 27, 2024

Conversation

Collaborator

@DedeHai DedeHai commented Oct 5, 2024

I wrote a new integer math based sine/cosine approximation function implementing Bhaskara I's approximation formula. It is blazing fast and quite accurate (almost 3 significant digits when converting to float).

  • new sin16_t() / cos16_t() are more accurate than fastled versions
  • sin_approx() / cos_approx() are float wrappers for sin16_t() / cos16_t() and are accurate enough to replace sinf()/cosf() for all current WLED purposes
  • atan2() is used only in octopus to calculate center offset, new approximated version saves flash
  • tan(), atan(), asin(), acos(), floor(), fmod() are used only for sunrise/sunset calculation, using existing wled_math version saves flash
  • beatsinx() replacements are to make use of new sin16_t()/sin8_t() functions to reduce flash size
  • Extensively tested sunrise/sunset calculation: deviation is 1 min. max
  • Tested some of the relevant FX and found no visual difference: Julia, 2D Drift, Drift Rose, Ghost rider, Rotozoomer, Palette, Arc 1D expansion
  • total flash savings: ~7.4k
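
Bhaskara I's formula can be sketched in 16-bit fixed point as follows; this is an illustrative reimplementation (function name and scaling are assumptions, not the exact code merged in this PR):

```c
#include <stdint.h>

// Sketch of Bhaskara I's sine approximation in 16-bit fixed point.
// angle: 0..65535 maps to 0..2*pi; return value is in -32767..32767.
static int16_t sin16_bhaskara(uint16_t angle) {
    int32_t x = angle & 0x7FFF;          // position within half-period N = 32768
    int32_t p = x * (32768 - x);         // x*(N-x), peaks at N^2/4 = 2^28
    // Bhaskara I: sin(pi*x/N) ~ 4*p / (5*N^2/4 - p); exact at 0, pi/2 and pi
    int64_t num   = (int64_t)4 * p * 32767;        // scale output to +/-32767
    int64_t denom = 5LL * 32768 * 32768 / 4 - p;
    int16_t s = (int16_t)(num / denom);
    return (angle & 0x8000) ? -s : s;    // second half of the period is negative
}
```

The worst-case error of the underlying formula is about 0.0016 of full scale, consistent with the "almost 3 significant digits" quoted above.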

@DedeHai DedeHai added the waiting for feedback (additional information needed to better understand the issue) and Awaiting testing labels Oct 5, 2024
@DedeHai DedeHai changed the title from "Added integer based sin()/cos() functions, changed all trig functins to wled_math" to "Added integer based sin()/cos() functions, changed all trig functions to wled_math" Oct 5, 2024
Collaborator

softhack007 commented Oct 5, 2024

  • total flash savings: ~7.4k

Is this with or without the audioreactive usermod? Because ArduinoFFT pulls in some trig functions (sin, cos, atan, sqrt), so this could actually increase flash size, because two variants of the functions would need to be linked into the final binary.

Edit: I have a fork of ArduinoFFT where "double" trigonometry functions are replaced with float functions (sinf, cosf, atanf). I think it's not possible to replace them with approximations ... FFT could become very noisy if we do, because the Fourier transform of the window function must be as exact as possible. Any inaccuracy will add "ghost frequencies".

Collaborator

softhack007 commented Oct 5, 2024

  • sin_approx() / cos_approx() are float wrappers for sin16_t() / cos16_t() and are accurate enough to replace sinf()/cosf() for all current WLED purposes

One question: is the approximated function also faster than sinf() and cosf()? In earlier tests I was surprised to find that the libm functions are really fast (faster than fastled sin16/cos16), at least on ESP32 and ESP32-S3 with FPU.

@softhack007
Collaborator

I'll give it a try with pinwheel mapping in the next days.

Pinwheel is very sensitive to inaccuracies in sinf() and cosf(), and possibly leaves holes as a consequence.

// the math.h functions use several kB of flash and are to be avoided if possible
// sin16_t / cos16_t are faster and much more accurate than the fastled variants
// sin_approx and cos_approx are float wrappers for sin16_t/cos16_t and have an accuracy of +/-0.0015 compared to sinf()
// sin8_t / cos8_t are fastled replacements and use sin16_t / cos16_t. Slightly slower than fastled version but very accurate
Collaborator

👍 very good that we have a replacement for sin8 and cos8.
These FastLED functions were so inaccurate that you could not even draw a circle with them.

Collaborator Author

@DedeHai DedeHai Oct 6, 2024

Here is the comparison between the new version and fastled (compared to sinf()):
sin8() absolute error, output range is ±127: [graph]

sin16() absolute error, output range is ±32767: [graph]

The new sin8_t() is a bit slower than the fastled variant due to overhead in the conversion to the 16-bit range. The approximation can be rewritten to any scale, so it could be done for 8-bit inputs/outputs natively, but since sin16_t() is now faster than sin8() it makes more sense to use sin16_t() instead wherever speed is of utmost importance.

Collaborator

Thanks, I like the graphs 👍 very useful to understand accuracy and limitations.

Collaborator

but since sin16_t() is now faster than sin8() it would make more sense to use the sin16_t() instead wherever speed is of utmost importance.

I agree - I don't think we need a specialized sin8, as most "sine wave" effects are using beatsin88() and not sin8.

Collaborator

Another idea about sin8_t and cos8_t:

You could move these two functions into fx.cpp and make them "static inline". This will allow the compiler to optimize out function calls (-> faster). I think there is no other place where these 8-bit versions would be needed.

Collaborator Author

I think there is no other place where these 8-bit versions would be needed.

It is used in beatsin8, which is also used in the sound simulation in util.cpp.
While making it an inline function could boost speed, I don't think those extra few instructions will do any harm. If we really want it faster, it should have its own dedicated function (which is doable, it uses roughly 100 bytes).

Collaborator

@softhack007 softhack007 Oct 6, 2024

(oops wrong thread) see #4181 (comment)

Collaborator Author

That works, maybe a good tradeoff between accuracy and speed.
[graph]

Collaborator

👍 👍

Collaborator

@softhack007 softhack007 Oct 6, 2024

That works, maybe a good tradeoff between accuracy and speed.

I've just compared your new trig functions against the old ones, using the "Lissajous" effect on 128x64. It's really a major difference. The "fastled" lissajous curve looks very edgy and somehow wrong, more like a twisted hexagon. With your new functions, I see a perfect wave.

Good job👍 thanks.

Collaborator Author

DedeHai commented Oct 6, 2024

Is this with or without audioreactive usermod?

Good point. I checked and it is the same: ArduinoFFT uses cos(x), which is the double float implementation, and sqrt() was not touched in this PR.
Btw: did you ever check if the Espressif DSP library functions can be used for FFT, and if there is any speed/memory/precision advantage? According to the docs it has speed-optimized FFT for ESP32 and ESP32-S3.

Collaborator Author

DedeHai commented Oct 6, 2024

One question: is the approximated function also faster than sinf() and cosf()? In earlier tests I was surprised to find that the libm functions are really fast (faster than fastled sin16/cos16), at least on ESP32 and ESP32-S3 with FPU.

Another good point. I did not actually measure that, but now I checked with a rough test. Here are my findings, in function calls per second:

| MCU | sinf() | sin_approx() |
| --- | --- | --- |
| ESP32-C3 | 9k | 75k |
| ESP32-S3 | 130k | 184k |

I am not 100% sure about the method I used, so no warranty on the correctness of these numbers. Using the same method for the sin16() functions gave something like a 50x speed increase, which seems impossible without the compiler playing some tricks.

So on an S3 the difference is (as expected) not as large thanks to the FPU but still an improvement. On S2, C3 (and ESP8266) the speed-gain is huge.

wled00/wled_math.cpp Outdated Show resolved Hide resolved
Collaborator Author

DedeHai commented Oct 6, 2024

I'll give it a try with pinwheel mapping in the next days.

Pinwheel is very sensitive to inaccuracies in sinf() and cosf(), and possibly leaves holes as a consequence.

I am looking at pinwheel right now. The issue is not the precision of sin()/cos() but the number of steps:
I just tried with 128x16 and even with floats there are missing pixels. If I increase the steps for XL, they are gone, even when using the integer version of sin()/cos() (more testing required though).

edit:
The missing pixels are also related to the Fixed_Scale value. Any reason it is 512? It could be as large as 2^31 / max, where max is maxwidth or maxheight (a uint16). So it could be 2^15, or, taking into account that any matrix dimension beyond 4096 is nonsensical for WLED, even larger - but then not future proof.

Collaborator

softhack007 commented Oct 6, 2024

@DedeHai pinwheel has a general problem with holes - we tested it up to 56x56, so below 56x56 there should be no holes (using sinf()/cosf(), not the _t approximations from AC).

Collaborator

softhack007 commented Oct 6, 2024

any reason it is 512? it could be [larger]

512 was just a "shot in the dark". You're right we could increase the value to 2^15 or higher.

Collaborator Author

DedeHai commented Oct 6, 2024

@DedeHai pinwheel has a generic problem with holes - we tested it until 56x56, so below 56x56 there should be no holes (using sinf()/cosf(), not the _t approximations from AC).

I saw that, and I am trying to work out the math of why that is.
Here is what I have so far:

(maxdim/2) * sin(2*pi/steps) <= 1 -> requirement for no holes, i.e. the maximum angle such that the arc at the furthest distance is less than 1 pixel
therefore:
steps = 2*pi / asin(2/maxdim)
since asin(x) ≈ x for x << 1, this resolves into:
steps >= pi * maxdim
On top of that comes the integer math and the rounding errors adding up, since the drawing is done incrementally and not by multiplication. So steps needs to be increased further to overcome that; that is what I am trying to figure out.

Collaborator Author

DedeHai commented Oct 6, 2024

@softhack007 looking at the pinwheel some more. I have some fixes but also: why is the center calculated in such a complicated way (with shifting odd rays)? I removed all that code and to me it still looks the same. Can you explain the intentions?

Collaborator

softhack007 commented Oct 6, 2024

@softhack007 looking at the pinwheel some more. I have some fixes but also: why is the center calculated in such a complicated way (with shifting odd rays)? I removed all that code and to me it still looks the same. Can you explain the intentions?

@DedeHai the "odd rays" part was an invention from @Brandon502, to reduce the number of "full rays" by only drawing outer parts of odd-numbered rays. That was around 30% faster in our tests.

It's correct that omitting this optimization should not influence the look. But the odd-rays optimization should be kept as-is for performance reasons.

Collaborator

softhack007 commented Oct 6, 2024

@softhack007 looking at the pinwheel some more. I have some fixes

@DedeHai it would be nice if we find an improvement, because each time I get a larger setup, there are new holes in pinwheel. Looks like a kind of Moiré effect.

I've found the "magic parameters" up to 128x128: see https://github.com/MoonModules/WLED/blob/a84216947bfb28488e58837d5a6732836be93d6c/wled00/FX_fcn.cpp#L833-L854

I'm really curious to know if there is a better way to avoid these inaccuracies, while preserving speed.

@Brandon502
Contributor

@softhack007 looking at the pinwheel some more. I have some fixes

@DedeHai would be nice if we find an improvement, because each time I get a larger setup, there are new holes in pinwheel. Looks like a kind of Moire effect.

I've found the "magic parameters" up to 128x128: see https://github.com/MoonModules/WLED/blob/a84216947bfb28488e58837d5a6732836be93d6c/wled00/FX_fcn.cpp#L833-L854

I'm really curious to know if there is a better way to avoid these inaccuracies, while preserving speed.

How much memory can an expand1D type use? If it could work similarly to jMap, you could create a lookup map on the first call and never have to do the math again. It would never produce holes and would likely be much faster, but the memory footprint might be too big for 128x128+.

Collaborator Author

DedeHai commented Oct 7, 2024

@Brandon502 I have sort-of-working code improvements: it may take a slight hit on loop speed, but it works for larger sizes and reduces steps for smaller ones by dynamically calculating the required steps, resulting in an overall speed improvement. "Pixel holes" are also controlled better by reducing the step size and running more 'blind' loops (but setting fewer pixels):
https://github.com/DedeHai/WLED/tree/0_15_pinwheel_improvement
I have a basic concept of how to improve speed further by dynamically calculating the required steps vs. the current "radius" being drawn. It is not perfect, but a compromise between code complexity and calculation speed (it needs to run on the ESP32-C3 and ESP8266 as well). For the ESP32 and ESP32-S3, more calculations could be done in float to squeeze out more speed (by more accurately calculating whether a pixel needs to be drawn or not). I will leave those improvements to the advanced devs at MM ;)
I will update the branch later today and make a PR; we can continue the discussion there, as this PR has a different purpose.

@softhack007 softhack007 self-requested a review October 7, 2024 21:02
Collaborator

@softhack007 softhack007 left a comment

Tested and works well for me.

@softhack007 softhack007 added the optimization (re-working an existing feature to be faster, or use less memory) label and removed the waiting for feedback (additional information needed to better understand the issue) label Oct 10, 2024
approximation was incorrect, now doing it right.
also removed hypotf() from octopus, saving a little flash.
@@ -88,8 +88,8 @@ uint8_t cos8_t(uint8_t theta) {
float sin_approx(float theta)
{
theta = modd(theta, TWO_PI); // modulo: bring to -2pi to 2pi range
if(theta < 0) theta += M_TWOPI; // 0-2pi range
uint16_t scaled_theta = (uint16_t)(theta * (0xFFFF / M_TWOPI));
if(theta < 0) theta += TWO_PI; // 0-2pi range
Collaborator

Why? Why not change M_PI below?

Collaborator Author

Good question. I think this was an auto-complete suggestion from Copilot; I will make it consistent with the M_PI variants.
Also: the conversion of the angle using float is really not needed, I added that to fix a bug. But the actual fix is: cast a negative float to an int and NEVER to an unsigned type, as that behavior is undefined and depends on the CPU/compiler - hence the issues on the C3, which does (uint)(float) = 0.
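
A minimal sketch of the safe cast described here (helper name is hypothetical; the scaled input is assumed to fit in an int32_t):

```c
#include <stdint.h>

// Converting a negative float straight to an unsigned type is undefined
// behavior in C/C++ (the ESP32-C3 happens to produce 0). Casting to a
// signed integer first is well defined, and the final int32 -> uint16
// conversion then wraps modulo 2^16, which is exactly what an angle needs.
static uint16_t wrap_angle(float theta_scaled) {
    return (uint16_t)(int32_t)theta_scaled;   // NOT (uint16_t)theta_scaled
}
```

wrap_angle(-1.0f) yields 0xFFFF on any conforming compiler, whereas the direct unsigned cast of a negative float is undefined.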

-replaced all PI references with M_PI version
-there is no need to do the angle-modulo in float, casting it to an integer does the same BUT it has to be cast to an `int` first, see comment.
@blazoncek
Collaborator

IMO, no but there is no reason to take my answer into account.

Collaborator

softhack007 commented Nov 2, 2024

I am not sure about this. These changes were not in any beta release

@DedeHai thanks, I think your judgement as the original author is very important. I agree with you, better to get some confidence (beta testing) before releasing this (and #4138) to all users. 👍

So we'll merge all the PRs tagged as "optimization" after 0.15.0 is released, right?

Collaborator Author

DedeHai commented Nov 2, 2024

I think the decision is up to @Aircoookie whether to push 0.15 release ASAP or add in these improvements.

If it were up to me, I would add all the pending improvements @blazoncek and I made, as they improve FX speed, FPS, and FX quality quite a bit - but only if they are extensively tested, as there is a chance of introducing new bugs. By "all" I mean this PR as well as #4138 #4145 #4245

@TripleWhy
Contributor

FYI I just stumbled upon this PR, and decided to run some quick execution time benchmarks on the ESP32 (lower is better): [benchmark graph]
benchmark.log

Btw. I believe the _t postfix used to indicate that the implementation was a Taylor series, so adding new functions with that postfix but without Taylor seems strange.

@blazoncek
Collaborator

@TripleWhy there's also a code size aspect to sin_t.

This was the primary reason @Aircoookie introduced these functions.

Collaborator Author

DedeHai commented Nov 25, 2024

@TripleWhy interesting results, do you have a link to the code you used? It would be interesting to compare other MCUs.
The _t I just chose to match the old names; if that is confusing, I do not mind changing it, maybe to _b for Bhaskara?

One thing that is really strange in your graph is that sin_approx and sin16_t are so far apart in speed: approx is just one multiplication on top, plus the overhead of the function call. Maybe it should be inlined? Even stranger is the same for cos_approx, which is just a call to sin_approx with one addition, so there is zero reason for it not to be almost identical to sin_approx - yet it takes double the time. How sure are you about the results (and the naming)?

Collaborator

softhack007 commented Nov 25, 2024

Thanks @TripleWhy for the benchmark 🥇
Actually from my POV the main reason to use the new functions is accuracy when compared to FastLED sin8 or sin16 - especially FastLED sin8 is so bad that you can't even draw a circle with it. So the new functions proposed by @DedeHai seem to provide a good balance between accuracy and speed.

Btw. I believe the _t postfix used to indicate that the implementation was a taylor series, so adding new functions with that postfix but without taylor seems strange

Finally someone has an explanation for the "_t" - it's not written in the code, and my best guess was it stands for "trick" 😆

As nobody except the three of us knows what "_t" stands for, it does not matter much which letter we pick. But for minimal code change, imho it would be good to keep the "_t".

@TripleWhy
Contributor

there's also a code size aspect to sin_t.

I didn't mean to give any recommendation here, just information.

interesting results, do you have a link to the code you used?

Just pushed it here: https://github.com/TripleWhy/WLED/tree/benchTrig
I believe there are some other changes, but the measurements should be representative.

One thing that is really strange in your graph is that sin_approx and sin16_t are so far apart in speed: approx is just one multiplication on top plus the overhead of the function call.

And some float<->int conversions.

Even stranger is the same for cos_approx which is just a call to sin_approx with one addition

I guess float additions and calls are expensive?
If you want to do a relative comparison, you also need to take into account that the benchmark code itself also consumes a bit of time. So, here is an updated graph with float additions, and all times adjusted for benchmark code overhead:

[graph]

Well... According to this diagram, I have no idea what's going on. I guess more fiddling is required here.

how sure are you about the results (and the naming)?

Very. Look at the raw log of sin_t vs sin_approx and cos_t vs cos_approx. They are aliases, I just left them in for sanity checking. The numbers are identical. There is also (almost) no run-to-run variation, these are fixed frequency single threaded risc chips without task scheduling after all. You are free to check the code and run it for yourself.

main reason to use the new functions is accuracy when compared to fastLed sin8 or sin16 - especially fastled sin8 is so bad that you can't even draw a circle with it.

I don't disagree. As I said, I was just publishing information that I generated locally.

@TripleWhy
Contributor

there's also a code size aspect to sin_t.

Oh right, I also did a size comparison:

with both *_t and *_approx:
RAM:   [==        ]  16.0% (used 52592 bytes from 327680 bytes)
Flash: [========= ]  86.4% (used 1358941 bytes from 1572864 bytes)


with std::*
RAM:   [==        ]  16.0% (used 52592 bytes from 327680 bytes)
Flash: [========= ]  86.8% (used 1364649 bytes from 1572864 bytes)


with *f:
RAM:   [==        ]  16.0% (used 52592 bytes from 327680 bytes)
Flash: [========= ]  86.7% (used 1363177 bytes from 1572864 bytes)

@TripleWhy
Contributor

sin_t vs sin_approx and cos_t vs cos_approx. They are aliases

Uh... looking at the code a bit more, it appears that this aliasing isn't intentional?

Collaborator Author

DedeHai commented Nov 25, 2024

Uh... looking at the code a bit more, it appears that this aliasing isn't intentional?

It is intentional: sin_t / cos_t are the original functions and are replaced with the new, faster implementation.

I ran my own benchmark code and it comes out to this:
[graph]
This is the time in ms for 1 million executions (+ a little overhead). In my test, sin_approx and cos_approx are closer to what I would expect - still slower than what I hoped, given that the underlying sin16_t is so much faster and the ESP32 has hardware float.

@TripleWhy
Contributor

it is intentional, sin_t / cos_t are the original functions and are replaced with the new, faster implementation.

In that case the original implementations should be removed.

this is the time in ms for 1 million executions (+ a little overhead), in my test, sin_approx and cos_approx are closer to what I would expect.

Interesting. What are the differences to my runs?
Were those executed on a regular ESP32 (not a RISC-V version) as well? What numbers do you call the functions with? I took samples from all normal floats greater than 0 and less than or equal to TWO_PI. How sure are you that the compiler didn't optimize the repetition? Is your benchmark code available somewhere?

Collaborator Author

DedeHai commented Nov 25, 2024

Here is the same code running on a C3, where the new code really shines:
[graph]

Collaborator Author

DedeHai commented Nov 25, 2024

In that case the original implementations should be removed.

I wanted to keep the original functions as they do take a completely different approach. I could comment them out for readability though.

Interesting. What are the differences to my runs? Were those executed on a regular ESP32 (not a RISC-V version) as well? What numbers do you call the functions with? I took samples from all normal floats greater than 0 and less than or equal to TWO_PI. How sure are you that the compiler didn't optimize the repetition? Is your benchmark code available somewhere?

I ran it on a regular ESP32; this is the code I used:
trig_benchmark.zip

Edit: I am not 100% sure the compiler does not play tricks, but using volatile should force it not to remove stuff.

@blazoncek
Collaborator

I want to clarify that anything I introduced into WLED was libc based. I think those functions are well tested and optimised for speed and accuracy. And I never bothered doing size or speed comparisons of alternatives.

But that does not matter. There may have been a time in WLED development that needed those functions, and that time may have passed. It may be time to move forward; there has been a lot of development and progress in every way since that introduction. There will always be a reason to optimize either for speed or accuracy or code size. It is difficult to have all at once.

As for this PR: Since @DedeHai started to replace FastLED core functions (here and elsewhere) it might be beneficial to remove FastLED entirely from dependencies if those replacements are not copy/paste and perform better.

@TripleWhy
Contributor

The most obvious difference between your benchmark and mine is that you only test for a single input: 0.5.

Here is my esp32 result again (this time with the same dimensions as the following images for better comparison): [graph]

And here is one where I replaced the call argument with 0.5: [graph]

For completeness, here is my run on a c3: [graph]

Collaborator Author

DedeHai commented Nov 26, 2024

For completeness, here is my run on a c3:

This is really weird, I get completely different results, especially on sinf/cosf: they get much worse if I use an incremented input.
[graph]

I also found room to optimize the approx versions: it turns out M_PI and its variants are treated as double floats by default.

- speed improvement: by default M_TWOPI is treated as a double float
- directly calling sin16_t in cos_approx() saves a lot of overhead
@netmindz
Collaborator

Are you happy for this to go into 0.15.0.rc1 @softhack007 or wait for 0.15.1?

@softhack007
Collaborator

Are you happy for this to go into 0.15.0.rc1 @softhack007 or wait for 0.15.1?

I think it could even go into 0.15.0.rc1 - just that @DedeHai previously said he would like to see some more confidence testing on 8266. If @DedeHai is happy with the benchmarks and tests we have, then OK for me to put this into 0.15.0.rc1.

Collaborator Author

DedeHai commented Nov 27, 2024

I think it can go into 0.15.0.rc1. @TripleWhy and I were running some more tests and it is looking good; I will post results in a bit, once I have confirmed it works on S2, S3 and ESP8266 without any issues.

@netmindz
Collaborator

Ok, if you can fix the merge conflict @DedeHai, we can merge

Collaborator Author

DedeHai commented Nov 27, 2024

My tests are done, it's all looking good, but there may be a small pitfall @TripleWhy found: calling sin_approx() and cos_approx() with large floats leads to increasing errors. mode_2DJulia() for example calls the sin() function with a parameter of strip.now/305.f, which can grow large. There may be other places in the code where this happens. These function calls should be replaced with the integer version, as that one truncates to a uint16_t and always works.
I will check if I find other places in the code where this could be problematic.
Apart from that, sure, I can fix the merge errors and merge.

As for my final results, here are some graphs. Input to the float versions was approximately 0-2pi; the integer versions just use an increment of 1 for each loop.
[graph]
[graph]
[graph]
[graph]

…th incrementing number

- sin/cos calls with incrementing numbers can lead to bad outcomes, the functions (_approx or original sinf/cosf) return bad values for very large float inputs
@DedeHai DedeHai merged commit 92e59af into Aircoookie:0_15 Nov 27, 2024
19 checks passed
softhack007 pushed a commit to MoonModules/WLED that referenced this pull request Nov 28, 2024
Added integer based `sin()/cos()` functions, changed all trig functions to wled_math
Labels
good job, optimization (re-working an existing feature to be faster, or use less memory)
6 participants