Reduce Binary Size by Removing <typename Uint> from int_writer #1778
Comments
I understand that using the largest data type is a naive approach, but maybe some sort of type erasure can help to remove that template parameter.
Reduce code bloat by removing multiple instantiation of int_writer based on the <typename UInt> parameter. Resolves fmtlib#1778
Reduce code bloat by removing multiple instantiations of int_writer based on the <typename UInt> parameter.

Rationale:
- The only function that gains a speedup from the int size is int_writer::on_dec(), through its call to count_digits, which uses CLZ. To keep that speedup, we store the size of the int so that a switch statement can call the correct count_digits.
- All other implementations of count_digits require some sort of loop that terminates when the value hits zero, regardless of the size of the int.

Caveats:
- There is a performance hit when dealing with and passing around 64-bit/128-bit values compared to 32-bit values on 32-bit platforms, and with 64-bit values on 64-bit systems. But this should not reduce performance that dramatically.
- There is also a performance hit for on_dec() due to the added switch statement. But, due to its size, this should reduce to a jump table.

Resolves fmtlib#1778
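The "store the size, then switch" idea from the rationale can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: int_writer_sketch, make_writer, count_digits32 and count_digits64 are hypothetical names, and the digit counters are plain loops standing in for the CLZ-based versions.

```cpp
#include <cstdint>

// Hypothetical stand-ins for width-specific count_digits overloads.
// Plain loops are used here; a real implementation might use CLZ instead.
inline int count_digits32(std::uint32_t n) {
  int count = 1;
  while (n >= 10) {
    n /= 10;
    ++count;
  }
  return count;
}

inline int count_digits64(std::uint64_t n) {
  int count = 1;
  while (n >= 10) {
    n /= 10;
    ++count;
  }
  return count;
}

// Hypothetical non-template writer: the value is always widened to 64 bits,
// and the size of the original integer type is remembered next to it.
struct int_writer_sketch {
  std::uint64_t abs_value;
  unsigned size_bytes;  // sizeof the original integer type

  // Dispatch on the stored size so narrow values still take the 32-bit path.
  int num_digits() const {
    switch (size_bytes) {
      case 1:
      case 2:
      case 4:
        return count_digits32(static_cast<std::uint32_t>(abs_value));
      default:
        return count_digits64(abs_value);
    }
  }
};

// Thin helper that records the width before the type information is dropped.
template <typename UInt>
int_writer_sketch make_writer(UInt value) {
  return {static_cast<std::uint64_t>(value),
          static_cast<unsigned>(sizeof(UInt))};
}
```

With this shape, only make_writer is instantiated per integer type, and it is small enough to inline; the writer itself exists once.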
Thanks for the suggestion - I commented on the PR. BTW you might want to try using format string compilation (https://fmt.dev/latest/api.html#compile-api) - this will give you more code per call but a potentially smaller overall binary if you don't have too many formatting function calls.
* Remove <typename UInt> from int_writer

  Reduce code bloat by removing multiple instantiations of int_writer based on the <typename UInt> parameter.

  Rationale:
  - The only function that gains a speedup from the int size is int_writer::on_dec(), through its call to count_digits, which uses CLZ. To keep that speedup, we store the size of the int so that a switch statement can call the correct count_digits.
  - All other implementations of count_digits require some sort of loop that terminates when the value hits zero, regardless of the size of the int.

  Caveats:
  - There is a performance hit when dealing with and passing around 64-bit/128-bit values compared to 32-bit values on 32-bit platforms, and with 64-bit values on 64-bit systems. But this should not reduce performance that dramatically.
  - There is also a performance hit for on_dec() due to the added switch statement. But, due to its size, this should reduce to a jump table.

  Resolves #1778

* Add FMT_USE_SMALLEST_INT flag

  When defined and set to zero, the largest available integer container is used for writing ints. This has the benefit of reducing the number of instances of the int_writer class, which reduces the binary cost.

* Rename flag to FMT_REDUCE_INT_INSTANTIATIONS

  Add comment above FMT_REDUCE_INT_INSTANTIATIONS definition describing why a developer would use it.

* Move FMT_REDUCE_INT_INSTANTIATIONS to format.h

Co-authored-by: Khalil Estell <[email protected]>
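A compile-time flag of this kind could select the integer container along the following lines. This is a minimal sketch, assuming a type alias picks the storage type; SKETCH_REDUCE_INT_INSTANTIATIONS and writer_uint_t are hypothetical names chosen for illustration and are not taken from format.h. The extra bool template parameter exists only so both settings can be shown side by side.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical macro playing the role of the flag described above.
// Off by default: small integers keep their own 32-bit container.
#ifndef SKETCH_REDUCE_INT_INSTANTIATIONS
#  define SKETCH_REDUCE_INT_INSTANTIATIONS 0
#endif

// Pick the unsigned container used to write a value of type T.
// With Reduce == true, every integer is written through the 64-bit
// container, so code templated on the container is instantiated once.
template <typename T, bool Reduce = (SKETCH_REDUCE_INT_INSTANTIATIONS != 0)>
using writer_uint_t =
    typename std::conditional<(sizeof(T) <= 4) && !Reduce, std::uint32_t,
                              std::uint64_t>::type;

// Compile-time checks showing the effect of each setting.
static_assert(std::is_same<writer_uint_t<std::uint16_t, false>,
                           std::uint32_t>::value,
              "flag off: small ints use the 32-bit container");
static_assert(std::is_same<writer_uint_t<std::uint16_t, true>,
                           std::uint64_t>::value,
              "flag on: small ints are widened to 64 bits");
```

A project that cares more about binary size than about 32-bit formatting speed would define the macro to 1 before including the header; the default keeps the faster path.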
Current Situation
Current class template parameters for int_writer:
Proposal
Removing <typename UInt> from the structure template parameters. This would eliminate a few of the template instances generated for the int_writer structure. Simply replacing the UInt with uint64_t showed a 2476-byte reduction for the following:

What do you all think? I'm thinking of either using the largest data size or something along those lines in order to eliminate this template parameter.
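To make the proposal concrete, here is a rough sketch of widening at the boundary: a thin template forwards every unsigned type to one non-template worker that takes uint64_t, so the heavy code is compiled once. The names write_dec and write_dec_u64 are hypothetical and this is not how int_writer itself is written; it only illustrates why replacing UInt with uint64_t removes instantiations.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Single non-template worker: compiled once no matter how many integer
// types the callers use.
inline std::string write_dec_u64(std::uint64_t value) {
  char buf[20];  // a 64-bit unsigned value has at most 20 decimal digits
  char* end = buf + sizeof(buf);
  char* p = end;
  // Emit digits from least to most significant, filling the buffer backwards.
  do {
    *--p = static_cast<char>('0' + value % 10);
    value /= 10;
  } while (value != 0);
  return std::string(p, end);
}

// Thin per-type wrapper: the only code instantiated per UInt is this cast.
template <typename UInt>
inline std::string write_dec(UInt value) {
  static_assert(std::is_unsigned<UInt>::value,
                "this sketch handles unsigned types only");
  return write_dec_u64(static_cast<std::uint64_t>(value));
}
```

The trade-off matches the caveats raised in the discussion: on a 32-bit target, the 64-bit division in the worker is slower than a 32-bit one would be.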