Reduce Binary Size by Removing <typename Uint> from int_writer #1778

Closed
kammce opened this issue Jul 16, 2020 · 2 comments · Fixed by #1781

Comments

@kammce
Contributor

kammce commented Jul 16, 2020

Current Situation

Current class template parameters for int_writer:

template <typename OutputIt, typename Char, typename UInt> struct int_writer {};

Proposal

Remove <typename UInt> from the struct's template parameters. This would eliminate several of the template instantiations generated for int_writer. Simply replacing UInt with uint64_t gave a 2476-byte reduction with the following configuration:

  • Compiler: arm-none-eabi-gcc
  • Processor: cortex-m3
  • Optimization level: -Os -fno-lto
  • float/double/long double all disabled

What do you all think? I'm thinking of either always using the largest integer type or something along those lines in order to eliminate this template parameter.
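The effect described above can be illustrated with a minimal sketch. It is not the library's code: to_decimal_per_type and to_decimal_widened are hypothetical names, and the formatting loop is deliberately naive. The point is only that the templated version is compiled once per distinct UInt, while the widened version exists once in the binary.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a writer templated on the unsigned type.
// Every distinct UInt used by callers produces a separate copy of this
// function in the binary.
template <typename UInt>
std::string to_decimal_per_type(UInt value) {
  std::string out;
  do {
    out.insert(out.begin(),
               static_cast<char>('0' + static_cast<int>(value % 10)));
    value = static_cast<UInt>(value / 10);
  } while (value != 0);
  return out;
}

// Widened variant: callers convert to uint64_t, so only one copy of the
// formatting loop is generated, at the cost of 64-bit arithmetic.
inline std::string to_decimal_widened(std::uint64_t value) {
  std::string out;
  do {
    out.insert(out.begin(),
               static_cast<char>('0' + static_cast<int>(value % 10)));
    value /= 10;
  } while (value != 0);
  return out;
}
```

Calling to_decimal_per_type with uint8_t, uint16_t, uint32_t and uint64_t arguments yields four instantiations, whereas to_decimal_widened accepts all of them through implicit conversion. The actual size difference depends on the compiler and flags.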

@kammce
Contributor Author

kammce commented Jul 16, 2020

I understand that using the largest data type is a naive approach, but maybe some sort of type erasure can help to remove that template parameter.

@kammce kammce changed the title Reduce Binary size by remove <typename Uint> from int_writer Reduce Binary Size by Removing <typename Uint> from int_writer Jul 16, 2020
kammce pushed a commit to kammce/fmt that referenced this issue Jul 17, 2020
Reduce code bloat by removing multiple instantiations of int_writer based
on the <typename UInt> parameter.

Resolves fmtlib#1778
kammce pushed a commit to kammce/fmt that referenced this issue Jul 17, 2020
Reduce code bloat by removing multiple instantiations of int_writer based
on the <typename UInt> parameter.

Rationale:
- The only function that gains a speedup from the int size is
  int_writer::on_dec()'s call to count_digits, which uses CLZ. To
  still take advantage of this speedup, we store the size of the int
  so we can use a switch statement to call the correct count_digits.
- All other implementations of count_digits require some sort of loop
  that terminates when the value hits zero, regardless of the size of
  the int.

Caveats:
- There is a performance hit when passing around 64-bit/128-bit values
  instead of 32-bit values on 32-bit platforms, or instead of 64-bit
  values on 64-bit systems. This should not reduce performance
  dramatically.
- There is also a performance hit for on_dec() due to the addition of a
  switch statement. But, due to its size, this should reduce to a jump
  table.

Resolves fmtlib#1778
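The rationale in this commit message (store the size of the original int, then switch to the matching count_digits) could look roughly like the sketch below. All names here (erased_uint, make_erased, count_digits32, count_digits64) are hypothetical and chosen for illustration, and the digit counters use plain loops rather than the CLZ-based versions the message refers to.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit digit counter (loop-based stand-in for a CLZ version).
inline int count_digits32(std::uint32_t n) {
  int count = 1;
  while (n >= 10) {
    n /= 10;
    ++count;
  }
  return count;
}

// Hypothetical 64-bit digit counter.
inline int count_digits64(std::uint64_t n) {
  int count = 1;
  while (n >= 10) {
    n /= 10;
    ++count;
  }
  return count;
}

// The value is stored in the widest type; the original width is kept
// alongside it so the narrower routine can still be selected at run time.
struct erased_uint {
  std::uint64_t value;
  unsigned char size_bytes;  // sizeof the type the caller passed in
};

template <typename UInt>
erased_uint make_erased(UInt v) {
  return {static_cast<std::uint64_t>(v),
          static_cast<unsigned char>(sizeof(UInt))};
}

// Dispatch on the stored size instead of on a template parameter.
inline int count_digits(const erased_uint& e) {
  switch (e.size_bytes) {
    case 1:
    case 2:
    case 4:
      return count_digits32(static_cast<std::uint32_t>(e.value));
    default:
      return count_digits64(e.value);
  }
}
```

Only make_erased is instantiated per integer type, and it is a trivial conversion, so the bulk of the formatting code is shared. Whether the switch compiles to a jump table depends on the compiler.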
@vitaut
Contributor

vitaut commented Jul 18, 2020

Thanks for the suggestion - I commented on the PR.

BTW you might want to try using format string compilation (https://fmt.dev/latest/api.html#compile-api) - this will give you more code per call but a potentially smaller overall binary if you don't have too many formatting function calls.

vitaut pushed a commit that referenced this issue Jul 19, 2020
* Remove <typename UInt> from int_writer

Reduce code bloat by removing multiple instantiations of int_writer based
on the <typename UInt> parameter.

Rationale:
- The only function that gains a speedup from the int size is
  int_writer::on_dec()'s call to count_digits, which uses CLZ. To
  still take advantage of this speedup, we store the size of the int
  so we can use a switch statement to call the correct count_digits.
- All other implementations of count_digits require some sort of loop
  that terminates when the value hits zero, regardless of the size of
  the int.

Caveats:
- There is a performance hit when passing around 64-bit/128-bit values
  instead of 32-bit values on 32-bit platforms, or instead of 64-bit
  values on 64-bit systems. This should not reduce performance
  dramatically.
- There is also a performance hit for on_dec() due to the addition of a
  switch statement. But, due to its size, this should reduce to a jump
  table.

Resolves #1778

* Add FMT_USE_SMALLEST_INT flag

When defined and set to zero, the largest available integer container
is used for writing ints. This has the benefit of reducing the number of
instances of the int_writer class, which reduces the binary size.

* Rename flag to FMT_REDUCE_INT_INSTANTIATIONS

Add comment above FMT_REDUCE_INT_INSTANTIATIONS definition describing
why a developer would use it.

* Move FMT_REDUCE_INT_INSTANTIATIONS to format.h

Co-authored-by: Khalil Estell <[email protected]>
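
A compile-time flag of the kind described in this commit could be wired up roughly as follows. This is a sketch under assumptions, not the code that was merged: REDUCE_INT_INSTANTIATIONS_SKETCH and writer_uint_t are hypothetical names, and the semantics (nonzero means "always widen") are assumed for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical flag: when nonzero, every integer is written through the
// 64-bit path so that fewer writer instantiations are generated.
#ifndef REDUCE_INT_INSTANTIATIONS_SKETCH
#  define REDUCE_INT_INSTANTIATIONS_SKETCH 1
#endif

// Picks the unsigned type the writer is instantiated with. Types of up to
// 32 bits use uint32_t only when the flag is off; otherwise uint64_t.
template <typename T>
using writer_uint_t = typename std::conditional<
    sizeof(T) <= 4 && !REDUCE_INT_INSTANTIATIONS_SKETCH,
    std::uint32_t, std::uint64_t>::type;

// Reports how many bytes the selected writer type occupies.
template <typename T>
constexpr unsigned writer_width() {
  return static_cast<unsigned>(sizeof(writer_uint_t<T>));
}
```

With the flag at its default of 1 in this sketch, writer_width<std::uint8_t>() and writer_width<std::uint64_t>() both report 8, so a writer templated on writer_uint_t<T> would be instantiated once. Building with -DREDUCE_INT_INSTANTIATIONS_SKETCH=0 restores the separate 32-bit path.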