Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization: color_blend() variable range now determined by overloading #4245

Open
wants to merge 5 commits into
base: 0_15
Choose a base branch
from

Conversation

DedeHai
Copy link
Collaborator

@DedeHai DedeHai commented Nov 2, 2024

Removing the bool saves on code size and makes the function a tiny bit faster. Also this is a cleaner solution IMHO.

Tested the affected FX, did not notice any change.

…ding

Removing the bool saves on code size and makes the function a tiny bit faster. Also this is a cleaner solution IMHO.
Tested the affected FX, did not notice any change.
@DedeHai DedeHai added the optimization re-working an existing feature to be faster, or use less memory label Nov 2, 2024
changes got lost along the way to PR...
@DedeHai DedeHai added this to the 0.15.1 candidate milestone Nov 2, 2024
@DedeHai DedeHai changed the title Optimization: color_blend() variable range now determined by ovleroding Optimization: color_blend() variable range now determined by overloading Nov 2, 2024
wled00/fcn_declare.h Outdated Show resolved Hide resolved
- previous version did not return C2 when blend=255
- inlining 0 and max check may be faster in some cases and uses just 40 extra bytes.
- 8bit and 16bit versions both call the 16bit base function
@softhack007
Copy link
Collaborator

softhack007 commented Nov 15, 2024

@DedeHai I've played a bit with the color_blend function, and I found some improvements

converting 8bit blend to 16bit:

(uint16_t)b << 8 : with the shift only, the max "blend16" we can achieve is 0xFF00 so if (blend == blendmax) return color2 is never taken.

To use the full range, we could use (uint16_t)b << 8) | b.

improved blend accuracy

The blend logic seems to be based on some very old FastLED code. FastLED has changed their blend8() function about 3 years ago, to avoid discontinuities and rounding errors. The new math is explained here and it seems to be numerically more stable:

https://github.com/FastLED/FastLED/blob/9176e0ff9a67365c7bd4393da52be76b6ce4a5b3/src/lib8tion/math8.h#L565

For the 8bit case, we can use the same logic - i've tested it and indeed its more accurate than the code in 0_15.

   uint8_t r3 = (((r1 << 8) | r2) + (r2 * blend) - (r1*blend) ) >> 8;

For 16bit blend, it might be trickier, because | r2 (solves rounding problems) should be either | r2<<8 or | r2<<8 | r2

@DedeHai
Copy link
Collaborator Author

DedeHai commented Nov 15, 2024

with the shift only, the max "blend16" we can achieve is 0xFF00

this was fixed in the latest commit by adding a check in the inline function.

I need to take a closer look at the optimization, maybe this new version can be done the way I did with color_add() in #4138 (calculating two colors in one go)

- efficient color blend calculation in fews operations possible
- omitting min / max checks makes it faster on average
- using 8bit for "blend" variable does not significantly influence the resulting color, just transition points are slightly shifted but yield very good results (and better than the original 16bit version using the old fastled math with improper rounding)
- updated drawCircle and drawLine to use 8bit directly instead of 16bit with a shift
@DedeHai
Copy link
Collaborator Author

DedeHai commented Nov 21, 2024

I made some further optimizations based on the new fastLED version you proposed. In my tests, your suggested variant was a tiny bit faster when testing it on the ripple FX, my updated version uses less code so I am guessing the compile did some inlining to speed up drawCircle() in some way.
The new variant with ditching the 16bit blend and going to 8bit gives almost identical results, just the rounding is slightly different. When plotting the output of the functions, the 8bit variant is actually not any worse, at least in the cases I have taken a closer look at. When checking the ripple FX, which is the only one using the blend function apart from transitions, I could not see any difference but the new code is slightly faster.
@softhack007 Can you please check if this works as expected in your test cases?

@netmindz
Copy link
Collaborator

netmindz commented Dec 5, 2024

Are we good to merge this once the conflict is resolved?

@DedeHai
Copy link
Collaborator Author

DedeHai commented Dec 5, 2024

there are some pending changes.
@blazoncek can you merge your changes into this PR rather than the speed_optimization PR?

edit: sorry, I think I confused this with #4256
edit2: I think this can be merged, I will resolve conflicts and do so.

@blazoncek
Copy link
Collaborator

I'm not sure I need to do anything regarding this PR.
However, IMO the side effect of this PR is that it will always blend using 8 bit logic which may introduce posterization. If that is of any concern is up to the users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimization re-working an existing feature to be faster, or use less memory
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants