replaced repeated progress() calculation calls with a variable #4256

DedeHai · 2024-11-07T18:11:15Z

progress() is called in setPixelColor() re-calculating the transition progress for each pixel.
Replaced that call with an inline function to get the new segment variable. The progress is updated in service() when handleTransition() is called. The new variable is in a spot where padding is added, so this should not use more RAM.

Result: over 10% increase in FPS on 16x16 matrix during transitions
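
To make the idea concrete, here is a minimal sketch of the caching pattern described above. It is not the WLED implementation: `CachedSegment`, its fields, `recalcCount` and `renderFrame()` are hypothetical names invented for illustration, and the linear time-to-progress mapping is an assumption. Only `updateTransitionProgress()` and `progress()` are names taken from the PR.

```cpp
#include <cstdint>

// Hypothetical stand-in for a segment; this is NOT the WLED Segment class.
struct CachedSegment {
  uint32_t transitionStart = 0;       // ms timestamp when the transition began
  uint16_t transitionDur   = 0;       // transition duration in ms
  uint16_t cachedProgress  = 0xFFFF;  // 0..0xFFFF, refreshed once per frame
  unsigned recalcCount     = 0;       // instrumentation for this sketch only

  // Meant to be called once per frame (the PR does this via handleTransition()).
  void updateTransitionProgress(uint32_t nowMs) {
    ++recalcCount;
    uint32_t elapsed = nowMs - transitionStart;
    if (transitionDur == 0 || elapsed >= transitionDur) {
      cachedProgress = 0xFFFF;        // transition finished
      return;
    }
    // elapsed < 65535 here, so the product fits in 32 bits
    cachedProgress = static_cast<uint16_t>((elapsed * 0xFFFFU) / transitionDur);
  }

  // Cheap getter for the per-pixel path: no time lookup, no division.
  inline uint16_t progress() const { return cachedProgress; }
};

// Render one 16x16 frame: one recalculation followed by 256 cheap reads.
inline uint32_t renderFrame(CachedSegment& seg, uint32_t nowMs) {
  seg.updateTransitionProgress(nowMs);
  uint32_t sum = 0;
  for (int i = 0; i < 256; ++i) sum += seg.progress();
  return sum;
}
```

In this sketch the division runs once per frame instead of once per pixel, which is the kind of saving the PR description attributes the FPS gain to.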

softhack007 · 2024-11-09T17:32:05Z

wled00/FX_fcn.cpp

@@ -317,12 +317,12 @@ void Segment::stopTransition() {
 }

 // transition progression between 0-65535
-uint16_t IRAM_ATTR Segment::progress() const {
+void IRAM_ATTR Segment::updateTransitionProgress() {


I think you could use IRAM_ATTR_YN here - it means that esp32 puts the function into IRAM, while 8266 doesn't. We'll save some IRAM space especially on the "_compat" builds.

As the function is only called once per frame in WS2812FX::service() - via seg.handleTransition() - it might even be better to remove the IRAM_ATTR as this call is not performance critical any more.
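
As a rough illustration of the platform-conditional attribute idea, the sketch below defines a macro that expands to `IRAM_ATTR` everywhere except on ESP8266. The macro name `SKETCH_IRAM_ATTR_YN` and the helper `scale8Sketch()` are hypothetical, and the real `IRAM_ATTR_YN` definition in WLED may be conditioned differently (for example on `WLED_SAVE_IRAM`). The fallback `#define IRAM_ATTR` is only there so the snippet also compiles when no ESP header has defined it.

```cpp
#include <cstdint>

// If no ESP header has defined IRAM_ATTR (e.g. on a host build), make it a no-op.
#ifndef IRAM_ATTR
  #define IRAM_ATTR
#endif

// Hypothetical macro modelled on the IRAM_ATTR_YN idea described above.
#if defined(ESP8266)
  #define SKETCH_IRAM_ATTR_YN            // ESP8266: leave the function in flash
#else
  #define SKETCH_IRAM_ATTR_YN IRAM_ATTR  // elsewhere: request IRAM placement
#endif

// Example of a hot per-pixel helper that might justify IRAM on ESP32 only.
uint8_t SKETCH_IRAM_ATTR_YN scale8Sketch(uint8_t value, uint8_t scale) {
  return static_cast<uint8_t>((static_cast<uint16_t>(value) * (scale + 1)) >> 8);
}
```

A function that runs only once per frame would simply omit the attribute, as suggested above.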

I thought about removing the attribute but left it as is since I have no way to check the difference. I once as a test removed all IRAM_ATTR and on my setup there was zero performance change.
I think removing it here is safe; as you say, this is only called once per frame.

esp8266 will thank you ;-)

Maybe WLED_SAVE_IRAM should also be defined on ESP32 C3: it is not as performant as the ESP32 anyway, and putting stuff in IRAM uses a lot more flash for some reason. If I enable WLED_SAVE_IRAM on the C3, that saves 1.6k of flash. On the ESP32 it only saves 68 bytes of flash.
Any suggestions for performance tests that would show if this is a valid option?

Any function marked with IRAM_ATTR will always be kept in fast SRAM and will never be fetched from flash. The basic idea of IRAM_ATTR is that it is used for ISRs or for functions that may access (write to) flash directly.
The benefit of using it elsewhere is faster access to such a function, as the call never goes through the cache hit/miss logic.

Contrary to what @softhack007 is saying I still think adding IRAM_ATTR to functions that are called very often is beneficial. I say this from experience with over 50 installed ESP8266s with various options and usermods installed.

> Contrary to what @softhack007 is saying I still think adding IRAM_ATTR to functions that are called very often is beneficial.

It might be beneficial; however, we were talking about the new progress(), which is now called only a few hundred times per second (max). The function is no longer time critical with this PR, so why use IRAM_ATTR for it?

(side-topic)

> I once as a test removed all IRAM_ATTR and on my setup there was zero performance change.

This aligns with my own experiments on -S3 and esp32 with 80 MHz flash: no noticeable performance impact, although IRAM_ATTR sometimes increases program size. This can be explained by the compiler being unable to inline such a function, even when inlining would benefit program size.

It may also depend a lot on the CPU caches. In fact, a function that's called really often has a good chance of being cached by the CPU already. Also, a board with fast flash (QIO 80 MHz) is about 4x faster at flash reads than one with slow flash (DOUT 40 MHz).

Many cheap 8266 boards still have 40 MHz DOUT flash, plus smaller caches, so it makes sense that IRAM_ATTR still brings some benefit on boards with slow flash.

I have no doubt that any ESP32 performs adequately without IRAM_ATTR.
However, ESP8266 is another matter: while it may be old and lacking, it is still used by many users (including me) who keep attaching plenty of peripherals to it while running WLED. Hence I strongly urge keeping IRAM_ATTR in as many places as possible.

Can you test whether this PR, with the latest commit, has any impact on ESP8266? I.e. with IRAM_ATTR removed from updateTransitionProgress().

The other question was to add WLED_SAVE_IRAM to C3 builds as it would save on flash size but may have negative performance impacts in certain situations / setups. If there is a way to test that on the C3 I could check. After all, the C3 is more of an upgraded ESP8266 compared to other ESP32 variants.

wled00/FX_fcn.cpp

`updateTransitionProgress()` is called only once per frame, no need to put it in RAM.

blazoncek · 2024-11-11T16:50:41Z

wled00/FX.h

@@ -363,6 +363,7 @@ typedef struct Segment {
    };
    uint8_t startY;  // start Y coodrinate 2D (top); there should be no more than 255 rows
    uint8_t stopY;   // stop Y coordinate 2D (bottom); there should be no more than 255 rows
+    uint16_t transitionprogress; // current transition progress 0 - 0xFFFF


This could be a static variable as it will be modified in each handleTransition() call at the start of service() loop.

What is the current behaviour if, for example, the palette of one segment is changed with a 10s transition, and after 5s the palette of a second segment is changed? Will that stop the transition of the first segment? I.e. are transition start/stop per segment or global?

The transition time is bound to strip ATM but each segment may start transition at its own (point in) time. So each segment should transition on its own (that was my goal when implementing current transitions, compared to previous, limited transitions) independent from others.

> This could be a static variable as it will be modified in each handleTransition() call at the start of service() loop.

Rather than make it static, I'd say put it into strip.

Static member attributes of a class are always a good source of confusion when reading code written by someone else, because a 'static' member is technically not even part of the object you work with. It survives deleting the segment, and changing it through one object instance also changes the value seen by all other instances. Copying one segment to another will not create a copy of static member attributes; there will still be only one value.
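
The behaviour described above can be shown with a small, generic C++ sketch. `SharedSegment` and its members are hypothetical names used only for illustration; they do not correspond to the actual WLED `Segment` layout.

```cpp
#include <cstdint>

struct SharedSegment {
  uint8_t opacity = 255;            // ordinary member: one copy per object
  static uint16_t sharedProgress;   // static member: one copy for the whole class
};

// A static data member lives outside every object and needs its own definition.
uint16_t SharedSegment::sharedProgress = 0;
```

Writing `a.sharedProgress = 1234;` through one instance makes `b.sharedProgress` read 1234 as well, copying `a` into a new object copies only `opacity`, and `sizeof(SharedSegment)` does not include the static member.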

> Rather than make it static, I'd say put it into strip.

That would be a worse choice IMO. It can be considered in the same way as _vLength, etc in speed improvements branch by @DedeHai . It is used as a speedup, pre-calculated value.
If you think this will cause confusion, keep it as an instance member rather than strip member.
But that is just my opinion, no need to take it into account.

progress() determines how far along the transition is. It should not be "frame" based. It only depends on the time since the transition (for that segment) started. BTW it is possible to have multiple segments in transition with differing progress() values.

What is the reason progress() should depend on the order of segment processing, and when does this give an advantage over calculating it once per frame? If, for example, there are lots of segments with heavy FX calculation (>1ms), later segments would already be further along in the transition within the same frame, resulting in a different palette or brightness, for example. That does not appear to be the correct approach IMHO. It would only be right if startTransition() were called with the same time delay, which I don't think it is. Using the 'per segment' update is more consistent code though, as you said. I think your suggestion of following the Segment::beginDraw() is a good compromise.

> what is the reason progress() should depend on the order of segment processing

It should not be dependent on the order of segment processing. Each "segment" (if you want it) has its own "next time" for update in the form of the effect function's return value. So some effects may run at an independent frame rate. Look at Android and a few others (e.g. Game of Life).

What matters is the time difference between the start of transition and current time (in regards to the transition duration). This is not frame or segment based. It is true that segment holds both start time and transition duration and these two are unique to each segment. This was the prime reason to have progress() as a function that calculates how far transition has progressed. If you impose a limit that progress is calculated at the start of frame (for optimisation for speed) then it needs to be pre-calculated for each segment separately. Hence the suggestion for beginDraw().
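
A minimal sketch of a progress value that depends only on a segment's own start time and duration could look like the following. `progressAt()` is a hypothetical helper, and the linear mapping onto 0..0xFFFF is an assumption rather than the exact formula used in WLED.

```cpp
#include <cstdint>

// Hypothetical helper: maps the time elapsed since a segment's transition
// started onto the range 0..0xFFFF. Each segment passes its own start/duration.
inline uint16_t progressAt(uint32_t nowMs, uint32_t startMs, uint16_t durMs) {
  uint32_t elapsed = nowMs - startMs;           // unsigned math tolerates timer wrap-around
  if (durMs == 0 || elapsed >= durMs) return 0xFFFF;  // finished (or no transition)
  // elapsed < 65535 here, so the product fits in 32 bits
  return static_cast<uint16_t>((elapsed * 0xFFFFU) / durMs);
}
```

With two 10s transitions, one started at t = 0 and the other at t = 5s, the same timestamp of 7.5s yields about 75% progress for the first and about 25% for the second. Evaluating this once per segment, for instance at the start of each segment's draw, keeps the value independent of the other segments.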

Why do you think that having a different progress value for each segment is wrong? Each segment operates independently and displays its effect and/or palette independently of other segments, even if overlaid. It is totally acceptable for two segments to be at different progress levels.

And it is quite easy to achieve that.

There are different scenarios, and your reasoning seems sound to me. My scenario was this: have 10 segments, all displaying the same FX, and change the palette on all of them at the same time -> a transition starts on all segments. The palette change may now not be simultaneous on all segments; the last ones may already have progressed further. Is my thinking here correct for the current code? I am not arguing against your suggestion to keep it 'per segment', which I think is the way to go, just trying to understand where a 'once per frame' would be better and where it would be worse.

It is correct and in most cases this will be the case (same start for all segments). But not in all cases (and not all segments may have same effect and/or palette).

So, if you have an exception, treat it as a regular behaviour (not all segments may start transition at the same time).

blazoncek · 2024-11-16T10:15:16Z

I've tested the implementation that uses static transitionProgress variable but otherwise uses exactly the same logic as this PR and everything seems to work as expected.

netmindz · 2024-12-05T08:18:22Z

Did we reach an agreement on the final implementation? @DedeHai

DedeHai · 2024-12-05T08:21:28Z

we did, as just (wrongly) mentioned in the other PR: @blazoncek has already implemented and tested the changes

netmindz · 2024-12-05T08:27:41Z

Are these changes in another PR, already merged, or just local on your machine at the moment, @blazoncek?

blazoncek · 2024-12-05T08:42:32Z

All of the changes are in my fork, combined with other updates. Probably not cherry-pickable, unfortunately. Implementation used there uses static member.
IMO a better place for this introduction would be the "speed improvements" PR, as it is just that.

Speed improvements and progress() improvements are running on my set-ups for over a month now without issues or glitches. But with some of my additions on top.

DedeHai · 2024-12-05T10:48:10Z

I am not sure it is the best idea to put all changes in the same "speed improvements" PR; if you feel confident about that, I am fine with closing this one. We just have to remember that the speed improvements PR must not be squash-merged, or debugging will be hard.

softhack007 · 2024-12-05T12:52:03Z

> I am not sure it is the best idea to put all changes in the same "speed improvements" PR

I agree with you - mashing up everything into a "speed improvement" PR which will not be cherry-pickable (as bugfixes and enhancements are all squashed up) does not seem like a good solution. We need a way to keep the codebase understandable and maintainable, so smaller PRs are better IMHO.

It might be possible to create a temporary branch of 0_15, and then merge all optimizations into this 0_15_optimized branch. Finally create a new PR where everything is consistent, but commit history is preserved. This may require a few "git merge" steps from the command line, though.

Edit: If I get a list of PRs to combine, I can give it a try next week.

DedeHai · 2024-12-06T06:33:53Z

@softhack007 I have most of my PRs running here already, merging them all is a bit of a pain as there are many conflicts to resolve. I merged them in this branch: https://github.com/DedeHai/WLED/tree/0_15_PS_potpourri
The open PRs in question besides this one are:
#4245 #4225 #4138
also #4145 (adds new FX features)

@blazoncek are your changes public? I can manually add them to this PR if they are not cherry-pickable; it's just a few lines that changed, right?

blazoncek · 2024-12-06T06:43:21Z

Anything I do is public when stable enough. Never hid anything. 😉

DedeHai · 2024-12-06T07:21:23Z

@softhack007 I think merging into a new branch is not necessary, I can do the (non squashed) merge of the PRs in the correct order to have the least conflicts and resolve any that remain. What I probably will do is a test-run of the merges in my repo to figure out what the best order is: in that potpourri branch I had to start over twice...

this is now more aligned with other variables using the same logic

DedeHai · 2024-12-07T09:37:12Z

@blazoncek are the changes in the latest commit ok?

blazoncek · 2024-12-07T12:27:21Z

I think yes, though I also moved updateTransitionProgress() into inline handleTransition() as it is pointless to force a function call since the function is only called once.

DedeHai · 2024-12-07T17:27:07Z

Pointless, yes, but I find it much more readable than jamming it all into that inline.

DedeHai requested a review from softhack007 November 7, 2024 18:11

softhack007 reviewed Nov 9, 2024

removed IRAM_ATTR

7f62152

blazoncek reviewed Nov 11, 2024

DedeHai mentioned this pull request Nov 23, 2024

Crashes when using -D WLED_ENABLE_DMX builds with led output #4298

DedeHai mentioned this pull request Dec 5, 2024

Optimization: color_blend() variable range now determined by overloading #4245

softhack007 added the optimization re-working an existing feature to be faster, or use less memory label Dec 5, 2024

changed transitionprogress to static, private variable

35d4438
