replaced repeated progress() calculation calls with a variable #4256
base: 0_15
`progress()` is called in `setPixelColor()`, so the transition progress is recalculated for every pixel. This PR replaces that call with an inline function that returns a new segment variable. The variable is updated in `service()` when `handleTransition()` is called. It sits in a spot where padding would otherwise be added, so this should not use more RAM. Result: over 10% increase in FPS on a 16x16 matrix.
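As a rough illustration of the change described above, the sketch below caches the progress once per frame and reads it on the per-pixel path. `SegmentSketch`, `computeProgress()`, `setPixelBrightness()` and `renderFrame()` are hypothetical stand-ins, not WLED's actual code; only the names `handleTransition()`, `progress()` and `transitionProgress` echo the PR.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for WLED's Segment; illustrative only.
struct SegmentSketch {
  unsigned recalcCount = 0;              // how often the costly calculation ran
  uint16_t transitionProgress = 0xFFFF;  // cached value, 0 - 0xFFFF

  // Placeholder for the real time-based calculation (assumed to be costly).
  uint16_t computeProgress() { ++recalcCount; return 0x8000; }

  // Runs once per frame and refreshes the cached value.
  void handleTransition() { transitionProgress = computeProgress(); }

  // Cheap accessor used on the per-pixel path: a plain read.
  inline uint16_t progress() const { return transitionProgress; }

  // Stand-in for setPixelColor(): scales a brightness by the cached progress.
  uint8_t setPixelBrightness(uint8_t target) const {
    return static_cast<uint8_t>((static_cast<uint32_t>(target) * progress()) >> 16);
  }
};

// Renders one frame of `pixels` pixels and returns the recalculation count.
unsigned renderFrame(SegmentSketch& seg, unsigned pixels) {
  seg.handleTransition();  // one recalculation per frame
  uint8_t last = 0;
  for (unsigned i = 0; i < pixels; ++i) last = seg.setPixelBrightness(200);
  (void)last;
  return seg.recalcCount;
}
```

With a 16x16 matrix (256 pixels), the calculation in this sketch runs once per frame rather than 256 times.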
wled00/FX_fcn.cpp
@@ -317,12 +317,12 @@ void Segment::stopTransition() {
 }

 // transition progression between 0-65535
-uint16_t IRAM_ATTR Segment::progress() const {
+void IRAM_ATTR Segment::updateTransitionProgress() {
I think you could use `IRAM_ATTR_YN` here - it means that esp32 puts the function into IRAM, while 8266 doesn't. We'll save some IRAM space, especially on the "_compat" builds.
As the function is only called once per frame in WS2812FX::service() - via seg.handleTransition() - it might even be better to remove the IRAM_ATTR as this call is not performance critical any more.
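A conditional attribute macro of this kind could look roughly like the sketch below. The exact definition in WLED may differ; the link between `IRAM_ATTR_YN` and `WLED_SAVE_IRAM` is an assumption here, and `scaleByProgress()` is a hypothetical helper used only to show where the attribute goes.

```cpp
#include <cstdint>

// Host-side stub so the sketch compiles off-target; on real ESP builds
// the SDK provides IRAM_ATTR.
#ifndef IRAM_ATTR
  #define IRAM_ATTR
#endif

// Assumed shape of the switch; the project's actual definition may differ.
#ifdef WLED_SAVE_IRAM
  #define IRAM_ATTR_YN             // expands to nothing: function stays in flash
#else
  #define IRAM_ATTR_YN IRAM_ATTR   // function is placed in IRAM
#endif

// Hypothetical hot-path helper: scales `value` by a 0 - 0xFFFF progress.
uint16_t IRAM_ATTR_YN scaleByProgress(uint16_t value, uint16_t progress) {
  return static_cast<uint16_t>((static_cast<uint32_t>(value) * progress) >> 16);
}
```

The call sites stay the same either way; only the build flag decides whether the function is placed in IRAM.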
I thought about removing the attribute but left it as is since I have no way to check the difference. I once as a test removed all IRAM_ATTR and on my setup there was zero performance change.
I think removing it here is safe, as you say, this is only called once per frame.
esp8266 will thank you ;-)
Maybe `WLED_SAVE_IRAM` should also be defined on ESP32 C3: it is not as performant as the ESP32 anyway, and putting stuff in IRAM uses a lot more flash for some reason. If I enable `WLED_SAVE_IRAM` on the C3, that saves 1.6k of flash. On the ESP32 it only saves 68 bytes of flash.
Any suggestions for performance tests that would show if this is a valid option?
Any function marked with IRAM_ATTR will always be kept in fast SRAM and will never be fetched from flash. The basic idea of IRAM_ATTR is that it is used for ISRs or for functions that may access (write to) flash directly.
The benefit of using it elsewhere is faster access to such functions, as calls never go through the cache hit/miss logic.
Contrary to what @softhack007 is saying I still think adding IRAM_ATTR to functions that are called very often is beneficial. I am saying this from experience with over 50 installed ESP8266s with various options and usermods installed.
> Contrary to what @softhack007 is saying I still think adding IRAM_ATTR to functions that are called very often is beneficial.
It might be beneficial; however, we were talking about the new `progress()`, which is now called only a few hundred times per second (max). With this PR the function is no longer time critical, so why use IRAM_ATTR for it?
(side-topic)
> I once as a test removed all IRAM_ATTR and on my setup there was zero performance change.
This aligns with my own experiments on -S3 and esp32 with 80 MHz flash: no noticeable performance impact, but sometimes IRAM_ATTR increases program size. This can be explained by the compiler being unable to inline such a function, even when inlining would reduce program size.
It may also depend a lot on the CPU caches. In fact, a function that's called really often has a good chance of already being cached by the CPU. Also, a board with fast flash (qio 80 MHz) is about 4x faster at reading flash than one with slow flash (dout 40 MHz).
Many cheap 8266 boards still have 40 MHz dout flash, plus smaller caches, so it makes sense that IRAM_ATTR still brings some benefit on boards with slow flash.
I have no doubt that any ESP32 performs adequately without IRAM_ATTR.
However, ESP8266 is another matter: while it may be old and lacking, it is still used by many users (including me) who keep attaching plenty of peripherals to it while running WLED. Hence I strongly urge keeping IRAM_ATTR wherever possible.
Can you test whether this PR with the latest commit (which removed `IRAM_ATTR` from `updateTransitionProgress()`) has any impact on ESP8266?
The other question was whether to add `WLED_SAVE_IRAM` to C3 builds, as it would save on flash size but may have negative performance impacts in certain situations / setups. If there is a way to test that on the C3, I could check. After all, the C3 is more of an upgraded ESP8266 compared to other ESP32 variants.
`updateTransitionProgress()` is called only once per frame, no need to put it in RAM.
wled00/FX.h
@@ -363,6 +363,7 @@ typedef struct Segment {
 };
 uint8_t startY; // start Y coodrinate 2D (top); there should be no more than 255 rows
 uint8_t stopY; // stop Y coordinate 2D (bottom); there should be no more than 255 rows
+uint16_t transitionprogress; // current transition progress 0 - 0xFFFF
This could be a `static` variable as it will be modified in each `handleTransition()` call at the start of `service()` loop.
What is the current behaviour if, for example, the palette of one segment is changed with a 10s transition, and after 5s the palette of a second segment is changed? Will that stop the transition of the first segment? I.e. are transition start/stop per segment or global?
The transition time is bound to `strip` ATM, but each segment may start its transition at its own point in time. So each segment should transition on its own, independent of the others (that was my goal when implementing the current transitions, compared to the previous, limited transitions).
> This could be a `static` variable as it will be modified in each `handleTransition()` call at the start of `service()` loop.
Rather than make it static, I'd say put it into `strip`.
Static member attributes of a class are always a good source of confusion when trying to read code written by someone else, because a 'static' member is technically not even part of the object you work with. It survives `delete segment`, and changing it in one object instance also changes the value in all other instances. Copying one segment to another will not create a copy of static member attributes; there will still be only one value.
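The three behaviours listed above can be checked with a small sketch. `SegmentSketch`, `sharedProgress`, `ownProgress` and `staticMemberDemo()` are hypothetical names chosen for illustration and are not part of WLED.

```cpp
#include <cstdint>

// Hypothetical segment with one static and one per-instance member.
struct SegmentSketch {
  static uint16_t sharedProgress;  // one value for the whole class
  uint16_t ownProgress = 0;        // one value per object
};
uint16_t SegmentSketch::sharedProgress = 0;

// Returns true if the behaviours described in the comment hold.
bool staticMemberDemo() {
  SegmentSketch a, b;
  a.sharedProgress = 1000;  // written through a, also visible through b
  a.ownProgress = 1000;     // b.ownProgress is untouched
  bool sharedAcrossInstances =
      (b.sharedProgress == 1000) && (b.ownProgress == 0);

  // Copying an object copies ownProgress only; no second static value exists.
  SegmentSketch* tmp = new SegmentSketch(a);
  bool copyHasOwnValue = (tmp->ownProgress == 1000);
  tmp->sharedProgress = 2000;
  delete tmp;  // the static value outlives the deleted object

  bool survivesDelete = (SegmentSketch::sharedProgress == 2000) &&
                        (a.sharedProgress == 2000);
  return sharedAcrossInstances && copyHasOwnValue && survivesDelete;
}
```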
> Rather than make it static, I'd say put it into `strip`.
That would be a worse choice IMO. It can be considered in the same way as _vLength, etc. in the speed improvements branch by @DedeHai. It is used as a speedup: a pre-calculated value.
If you think this will cause confusion, keep it as an instance member rather than a `strip` member.
But that is just my opinion, no need to take it into account.
`progress()` determines how far along the transition is. It should not be "frame" based. It only depends on the time since the transition (for that segment) started. BTW it is possible to have multiple segments in transition with differing `progress()` values.
what is the reason `progress()` should depend on the order of segment processing, or when does this give an advantage over calculating it once per frame? If, for example, there are lots of segments with heavy FX calculation (>1ms), later segments would already be further into the transition within the same frame, resulting in a different palette or brightness, for example. That does not appear to be the correct approach IMHO. It would only be right if startTransition() were called with the same time delay, which I don't think it is. Using the 'per segment' update is more consistent code though, as you said. I think your suggestion of following the `Segment::beginDraw()` is a good compromise.
> what is the reason `progress()` should depend on the order of segment processing
It should not be dependent on the order of segment processing. Each "segment" (if you want it) has its own "next time" for update in the form of the effect function's return value. So some effects may run at an independent frame rate. Look at Android and a few others (e.g. Game of life).
What matters is the time difference between the start of the transition and the current time (relative to the transition duration). This is not frame or segment based. It is true that the segment holds both the start time and the transition duration, and these two are unique to each segment. This was the prime reason to have `progress()` as a function that calculates how far the transition has progressed. If you impose a limit that progress is calculated at the start of the frame (as a speed optimisation), then it needs to be pre-calculated for each segment separately. Hence the suggestion for `beginDraw()`.
Why do you think that having different progress values for different segments is wrong? Each segment operates independently and displays its effect and/or palette independently of other segments. Even if overlaid. It is totally acceptable for two segments to be at different progress levels.
And it is quite easy to achieve that.
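A minimal sketch of a purely time-based, per-segment progress value is shown below. `TransitionSketch` and `progressAt()` are hypothetical names, and the layout is an assumption rather than WLED's actual data structure; the point is only that the result depends on each segment's own start time and duration.

```cpp
#include <cstdint>

// Hypothetical per-segment transition state (not WLED's actual layout).
struct TransitionSketch {
  uint32_t startMs;     // when this segment's transition began
  uint16_t durationMs;  // transition length in milliseconds
};

// Progress in 0 - 0xFFFF derived from elapsed time only; it does not depend
// on frame count or on the order in which segments are processed.
uint16_t progressAt(const TransitionSketch& t, uint32_t nowMs) {
  uint32_t elapsed = nowMs - t.startMs;  // unsigned math tolerates timer wrap
  if (t.durationMs == 0 || elapsed >= t.durationMs) return 0xFFFF;
  // elapsed < durationMs <= 65535, so the product fits in 32 bits.
  return static_cast<uint16_t>(elapsed * 0xFFFFu / t.durationMs);
}
```

For two segments with 10s transitions, one started at 0 ms and one at 5000 ms, sampling both at 7500 ms gives 49151 and 16383 respectively, so the two segments are legitimately at different progress levels.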
There are different scenarios, and your reasoning seems sound to me. My scenario was this: have 10 segments, all displaying the same FX, and change the palette on all of them at the same time -> a transition starts on all segments. The palette change may now not be simultaneous on all segments; the last ones may already have progressed further. Is my thinking here correct for the current code? I am not arguing against your suggestion to keep it 'per segment', which I think is the way to go, just trying to understand where a 'once per frame' would be better and where it would be worse.
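To get a feel for the size of the effect in the scenario above, the sketch below estimates how far apart the first and last segment end up when each samples its progress just before being drawn. `progressSkew()` is a hypothetical helper, and the per-segment render time of 2 ms and the transition durations used afterwards are assumed figures, not measurements.

```cpp
#include <cstdint>

// Progress difference (0 - 0xFFFF scale) between the first and last segment
// when every segment samples the clock right before it is drawn and all
// transitions started at the same moment with the same duration.
uint16_t progressSkew(unsigned segments, uint32_t msPerSegment,
                      uint16_t durationMs) {
  if (segments < 2 || durationMs == 0) return 0;
  uint32_t lagMs = (segments - 1) * msPerSegment;  // delay before last draw
  if (lagMs >= durationMs) return 0xFFFF;
  return static_cast<uint16_t>(lagMs * 0xFFFFu / durationMs);
}
```

Under these assumptions, 10 segments at 2 ms each give a skew of 117 out of 65535 for a 10s transition, and 1685 out of 65535 (roughly 2.6%) for a 700 ms transition.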
It is correct and in most cases this will be the case (same start for all segments). But not in all cases (and not all segments may have same effect and/or palette).
So, if you have an exception, treat it as a regular behaviour (not all segments may start transition at the same time).
I've tested the implementation that uses static
Did we reach an agreement on the final implementation? @DedeHai
we did, as just (wrongly) mentioned in the other PR: @blazoncek has already implemented and tested the changes
Are these changes in another pr, already merged or just local on your machine at the moment @blazoncek ?
All of the changes are in my fork, combined with other updates. Probably not cherry-pickable, unfortunately. Implementation used there uses static member. Speed improvements and
I am not sure it is the best idea to put all changes in the same "speed improvements" PR, but if you feel confident about that, I am fine with closing this one. We just have to remember that the speed improvements PR must not be squash-merged or debugging will be hard.
I agree with you - mashing up everything into a "speed improvement" PR which will not be cherry-pickable (as bugfixes and enhancements are all squashed up) does not seem like a good solution. We need a way to keep the codebase understandable and maintainable, so smaller PRs are better imho. It might be possible to create a temporary branch of 0_15, and then merge all optimizations into this Edit: If I get a list of PRs to combine, I can give it a try next week.
@softhack007 I have most of my PRs running here already; merging them all is a bit of a pain as there are many conflicts to resolve. I merged them in this branch: https://github.com/DedeHai/WLED/tree/0_15_PS_potpourri @blazoncek are your changes public? I can manually add them to this PR if not cherry-pickable, it's just a few lines that changed, right?
Anything I do is public when stable enough. Never hid anything. 😉
@softhack007 I think merging into a new branch is not necessary; I can do the (non-squashed) merge of the PRs in the correct order to have the fewest conflicts and resolve any that remain. What I probably will do is a test run of the merges in my repo to figure out what the best order is: in that potpourri branch I had to start over twice...
this is now more aligned with other variables using the same logic
@blazoncek are the changes in the latest commit ok?
I think yes, though I also moved
pointless yes but I find it much more readable than jamming it all in that inline. |
`progress()` is called in `setPixelColor()`, re-calculating the transition progress for each pixel. Replaced that call with an inline function to get the new segment variable. The progress is updated in `service()` when `handleTransition()` is called. The new variable is in a spot where padding is added, so this should not use more RAM. Result: over 10% increase in FPS on 16x16 matrix during transitions