optimize append #2164

Merged (3 commits) on Mar 13, 2021

Conversation

moiwi (Contributor) commented Mar 5, 2021

Avoid one unnecessary empty loop pass, e.g. with

std::string message = fmt::sprintf("The answer is %d", 42);

(This mostly matters when the % placeholder is at the beginning or end of the string being formatted.)

I agree that my contributions are licensed under the {fmt} license, and agree to future changes to the licensing.

optimize loop
moiwi changed the title from "Update format.h" to "optimize append" on Mar 5, 2021

vitaut (Contributor) commented Mar 10, 2021

Thanks for the PR. Could you post a benchmark demonstrating the effect of this change in the comments here?

rimathia (Contributor) commented Mar 11, 2021

I ran the following basic benchmark:
fmt::format("{}", 'a') (and the corresponding cases with 5 and 10 arguments).
The benchmark parameter N adds N non-format characters after each "{}". The N != 0 cases are mainly there to check that the "normal" case isn't affected in an unexpected way.
The expected improvement for N=0 shows up in the five-argument case for gcc and in the five- and ten-argument cases for clang.
I only checked the reproducibility of the benchmark results by manually comparing runs.

gcc (g++-10 (Homebrew GCC 10.2.0_4) 10.2.0):

cd ../fmt/ && git checkout e718ec3e93 && git log -1 && cd ../build_gcc && make dowhileopt && ./dowhileopt
HEAD is now at e718ec3e Make truncating_iterator an output_iterator (#2158)
commit e718ec3e93dc75598bfbcdaf492554ae2114ec19 (HEAD, moiwi/master)
Author: Jason Cobb <[email protected]>
Date:   Thu Mar 4 18:53:08 2021 -0500

    Make truncating_iterator an output_iterator (#2158)
<build cut out>
2021-03-11 17:25:45
Running ./dowhileopt
Run on (16 X 2400 MHz CPU s)
CPU Caches:
  L1 Data 32K (x8)
  L1 Instruction 32K (x8)
  L2 Unified 262K (x8)
  L3 Unified 16777K (x1)
Load Average: 2.15, 2.08, 1.86
---------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
---------------------------------------------------------------------
one_format_argument/0            7.77 ns         7.77 ns     80873433
one_format_argument/10           29.2 ns         29.2 ns     23374863
one_format_argument/100          90.5 ns         90.5 ns      7942361
one_format_argument/1000          256 ns          256 ns      2957542
five_format_arguments/0          62.5 ns         62.5 ns     11517325
five_format_arguments/10          144 ns          144 ns      4760383
five_format_arguments/100         307 ns          307 ns      2168290
five_format_arguments/1000        976 ns          976 ns       655977
ten_format_arguments/0            111 ns          111 ns      6218298
ten_format_arguments/10           264 ns          264 ns      2778362
ten_format_arguments/100          458 ns          458 ns      1462666
ten_format_arguments/1000        1782 ns         1782 ns       414768
Mathiass-MacBook-Pro-2:build_gcc mathiasritzmann$ cd ../fmt/ && git checkout b3e6d && git log -1 && cd ../build_gcc && make dowhileopt && ./dowhileopt
Previous HEAD position was e718ec3e Make truncating_iterator an output_iterator (#2158)
HEAD is now at b3e6d017 Update format.h
commit b3e6d017a5045e7d6a5828bc0a59066da03c216a (HEAD, moiwi/moiwi-opt)
Author: moiwi <[email protected]>
Date:   Fri Mar 5 09:52:54 2021 +0100

    Update format.h
    
    optimize loop
<build cut out>
2021-03-11 17:26:25
Running ./dowhileopt
Run on (16 X 2400 MHz CPU s)
CPU Caches:
  L1 Data 32K (x8)
  L1 Instruction 32K (x8)
  L2 Unified 262K (x8)
  L3 Unified 16777K (x1)
Load Average: 1.76, 1.99, 1.83
---------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
---------------------------------------------------------------------
one_format_argument/0            7.64 ns         7.63 ns     82745251
one_format_argument/10           28.8 ns         28.8 ns     25440576
one_format_argument/100          84.4 ns         84.4 ns      8184359
one_format_argument/1000          244 ns          244 ns      2932625
five_format_arguments/0          55.7 ns         55.7 ns     10177821
five_format_arguments/10          139 ns          139 ns      4978840
five_format_arguments/100         316 ns          316 ns      2271562
five_format_arguments/1000        945 ns          945 ns       722797
ten_format_arguments/0            109 ns          109 ns      6446681
ten_format_arguments/10           252 ns          252 ns      2669351
ten_format_arguments/100          469 ns          469 ns      1511367
ten_format_arguments/1000        1694 ns         1694 ns       413663

clang (Apple clang version 12.0.0 (clang-1200.0.32.29)):

cd ../fmt/ && git checkout e718ec3e93 && git log -1 && cd ../build_clang && make dowhileopt && ./dowhileopt
Previous HEAD position was b3e6d017 Update format.h
HEAD is now at e718ec3e Make truncating_iterator an output_iterator (#2158)
commit e718ec3e93dc75598bfbcdaf492554ae2114ec19 (HEAD, moiwi/master)
Author: Jason Cobb <[email protected]>
Date:   Thu Mar 4 18:53:08 2021 -0500

    Make truncating_iterator an output_iterator (#2158)
<build cut out>
2021-03-11 17:36:30
Running ./dowhileopt
Run on (16 X 2400 MHz CPU s)
CPU Caches:
  L1 Data 32K (x8)
  L1 Instruction 32K (x8)
  L2 Unified 262K (x8)
  L3 Unified 16777K (x1)
Load Average: 1.66, 1.74, 1.76
---------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
---------------------------------------------------------------------
one_format_argument/0            4.26 ns         4.26 ns    157338374
one_format_argument/10           26.9 ns         26.9 ns     27105518
one_format_argument/100          87.7 ns         87.7 ns      7845950
one_format_argument/1000          266 ns          266 ns      2863302
five_format_arguments/0          45.8 ns         45.8 ns     15294262
five_format_arguments/10          155 ns          155 ns      4683339
five_format_arguments/100         313 ns          313 ns      2212935
five_format_arguments/1000       1062 ns         1062 ns       685019
ten_format_arguments/0           86.0 ns         86.0 ns      7977026
ten_format_arguments/10           255 ns          255 ns      2765093
ten_format_arguments/100          512 ns          512 ns      1022570
ten_format_arguments/1000        1846 ns         1846 ns       378706
Mathiass-MacBook-Pro-2:build_clang mathiasritzmann$ cd ../fmt/ && git checkout b3e6d && git log -1 && cd ../build_clang && make dowhileopt && ./dowhileopt
Previous HEAD position was e718ec3e Make truncating_iterator an output_iterator (#2158)
HEAD is now at b3e6d017 Update format.h
commit b3e6d017a5045e7d6a5828bc0a59066da03c216a (HEAD, moiwi/moiwi-opt)
Author: moiwi <[email protected]>
Date:   Fri Mar 5 09:52:54 2021 +0100

    Update format.h
    
    optimize loop
<build cut out>
2021-03-11 17:37:19
Running ./dowhileopt
Run on (16 X 2400 MHz CPU s)
CPU Caches:
  L1 Data 32K (x8)
  L1 Instruction 32K (x8)
  L2 Unified 262K (x8)
  L3 Unified 16777K (x1)
Load Average: 1.68, 1.73, 1.75
---------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
---------------------------------------------------------------------
one_format_argument/0            4.36 ns         4.36 ns    128839889
one_format_argument/10           26.7 ns         26.7 ns     26903417
one_format_argument/100          88.2 ns         88.2 ns      7920523
one_format_argument/1000          261 ns          261 ns      2792951
five_format_arguments/0          42.3 ns         42.3 ns     16470123
five_format_arguments/10          156 ns          156 ns      4676081
five_format_arguments/100         314 ns          314 ns      2210008
five_format_arguments/1000       1069 ns         1069 ns       687251
ten_format_arguments/0           71.9 ns         71.9 ns      9592458
ten_format_arguments/10           252 ns          252 ns      2639756
ten_format_arguments/100          528 ns          528 ns      1000000
ten_format_arguments/1000        1837 ns         1837 ns       370500

Here's the benchmark code:

#include <benchmark/benchmark.h>
#include <fmt/format.h>
#include <fmt/printf.h>

#include <stdexcept>  // std::runtime_error
#include <string>     // std::string

void one_format_argument(benchmark::State& state) {
  const auto format_string = "{}" + std::string(state.range(0), 'x');
  const auto test_output = fmt::format(format_string, 'a');
  if (test_output != "a" + std::string(state.range(0), 'x')) {
    throw std::runtime_error("wrong");
  }
  for (auto _ : state) {
    // One argument, matching the single "{}" and the test_output check above.
    benchmark::DoNotOptimize(fmt::format(format_string, 'a'));
  }
}

void five_format_arguments(benchmark::State& state) {
  auto format_string = std::string();
  for (int i = 0; i < 5; ++i) {
    format_string += "{}" + std::string(state.range(0), 'x');
  }
  const auto test_output = fmt::format(format_string, 'a', 'a', 'a', 'a', 'a');
  if (state.range(0) == 0 && test_output != std::string(5, 'a')) {
    throw std::runtime_error("wrong");
  }
  for (auto _ : state) {
    benchmark::DoNotOptimize(
        fmt::format(format_string, 'a', 'a', 'a', 'a', 'a'));
  }
}

void ten_format_arguments(benchmark::State& state) {
  auto format_string = std::string();
  for (int i = 0; i < 10; ++i) {
    format_string += "{}" + std::string(state.range(0), 'x');
  }
  const auto test_output = fmt::format(format_string, 'a', 'a', 'a', 'a', 'a',
                                       'a', 'a', 'a', 'a', 'a');
  if (state.range(0) == 0 && test_output != std::string(10, 'a')) {
    throw std::runtime_error("wrong");
  }
  for (auto _ : state) {
    benchmark::DoNotOptimize(fmt::format(format_string, 'a', 'a', 'a', 'a', 'a',
                                         'a', 'a', 'a', 'a', 'a'));
  }
}

BENCHMARK(one_format_argument)->Arg(0)->Arg(10)->Arg(100)->Arg(1000);
BENCHMARK(five_format_arguments)->Arg(0)->Arg(10)->Arg(100)->Arg(1000);
BENCHMARK(ten_format_arguments)->Arg(0)->Arg(10)->Arg(100)->Arg(1000);

BENCHMARK_MAIN();

vitaut merged commit b8ff3c1 into fmtlib:master on Mar 13, 2021

vitaut (Contributor) commented Mar 13, 2021

Thank you both!
