Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed improvements for discussion #4138
base: 0_15
Are you sure you want to change the base?
Speed improvements for discussion #4138
Changes from 12 commits
c3f472f
9341768
feac45f
992d11b
b07658b
09428dc
ec938f2
d45b4ad
2afff05
6a37f25
0e5bd4e
f3137eb
686866c
6962905
a88436c
17d59d3
0a54002
33cf82a
906f8fc
bef1ac2
c44b9f8
b404458
a76a895
7c0fe12
202901b
c842994
9114867
ffbc8c5
336da25
8e78fb4
0ae7329
ee380c5
ba3a61f
a15c391
ca06214
eb5ad23
be64930
210191b
ef1e24c
5c2bac4
0a05611
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is only used as an uint8_t down the line and not involved in any calculation.
edit:
correction: I have index as an unsigned in the
ColorFromPaletteWLED()
and do abyte()
cast there anyway to prevent overflows. so it may be better to revert back to unsigned for consistency.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. ATM. what about in the future?
if it is unsigned, compiler will trim it down when passing to a function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok reverted to unsigned locally. saves 6 bytes of code. I thought compiler would do better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XTensa doesn't have non-word arithmetic instructions -- any operations done on a 8-bit or 16-bit types have to be performed by upcasting, operating, and downcasting afterwards to restrict the output range. At least here,
int
orunsigned
is almost always less code (and slightly faster).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general: absolutely yes. But if a function is performed purely in 8bit (like most of the fastled stuff) sometimes the compiler optimizes in a way that makes it actually faster (smaller code) when using
uint8_t
, don't ask me why exactly, I have been playing with types to improve code quite a bit and sometimes its just not really clear what is happening.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non obligatory: I would prefer
__attribute__
at the end but[[...]]
in front.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly have no Idea what the precompiler instructions mean in detail, I copied this from what they use in fastled...
the idea behind this is to get rid of the 'function entry' instructions that are added when a function is called. When I added the
inline
flash size increased by a few bytes, telling me that it is actually inlined. Since this short function is only called from two places and is called A LOT this may be faster. I have no way to check (would need a proper debugger that shows assembly instructions being executed line by line).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I've recently learned these are compiler attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 not sure if always_inline plays well with IRAM_ATTR .... the first tells the compiler to always inline the function, the latter says "put the function into IRAM" which means that a real function is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is what I was wondering too, this is just a suggestion, i.e. to inline this for speed but how to tell the compiler to inline it to the functions that are in ram... not sure how it will do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To my understanding:
inline
is a hint/suggestion to the compiler. So it might get inlined, or not.__attribute__((always_inline))
is a directive. So the compiler must inline this function, no matter if its efficient or not.If you want to optimize function calls, its sometime useful to add
__attribute__((pure))
or__attribute__((const))
to the function declaration. But only do this after double-checking that the code is actually "pure" (no side-effects) or "const" (solely depends on arguments). I did this in the MoonModules fork, but honestly it does not give you more than 1 or 2 fps even if you apply it to lots of functions.See MoonModules@7f9da30