Performance compare with native dragonbox::to_chars #3675
Comments
It is expected that the (default) runtime formatting will be slightly slower than calling Dragonbox directly, because the formatting function has to do some extra work. This overhead can be reduced by using format string compilation: https://fmt.dev/latest/api.html#compile-api. It's hard to say anything more specific, and the numbers don't look particularly meaningful because your test is quite broken: the {fmt} cases do additional nul termination, there is stack corruption, and you are using gtest instead of a proper benchmark. I recommend looking at an existing benchmark, e.g. https://github.com/miloyip/dtoa-benchmark. Also keep in mind that {fmt} uses compact Dragonbox tables by default, so if you want maximum performance at the cost of binary size, you could switch to the larger tables.
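A minimal sketch of a fairer timing harness, assuming C++17 `std::to_chars` as a stand-in converter (it is neither Dragonbox nor {fmt}; you would swap in `dragonbox::to_chars` or `fmt::format_to` for the real candidates). The names `shortest_to_buffer`, `ns_per_call` and `random_doubles` are hypothetical. Every candidate writes the same terminator into a bounded buffer, sees identical seeded inputs, and feeds a checksum so the optimizer cannot drop the work. It needs a standard library with floating-point `to_chars` (e.g. GCC 11 or later).

```cpp
#include <array>
#include <charconv>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <system_error>
#include <vector>

// Stand-in converter (assumption): std::to_chars emits the shortest
// round-trip representation. Reserving one byte for '\0' keeps the write
// inside the buffer, and every candidate pays for the same terminator.
// Returns the length excluding the terminator, or 0 if the buffer is too small.
inline std::size_t shortest_to_buffer(double value, char* first, char* last) {
  std::to_chars_result r = std::to_chars(first, last - 1, value);
  if (r.ec != std::errc()) return 0;
  *r.ptr = '\0';
  return static_cast<std::size_t>(r.ptr - first);
}

// Average nanoseconds per call of `convert` over `inputs`, repeated `rounds`
// times. Lengths are summed into `checksum` so the conversions are not
// discarded as dead code.
template <typename Convert>
double ns_per_call(const std::vector<double>& inputs, int rounds,
                   Convert convert, std::uint64_t& checksum) {
  if (inputs.empty() || rounds <= 0) return 0.0;
  std::array<char, 64> buffer{};
  auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < rounds; ++r) {
    for (double v : inputs) {
      checksum += convert(v, buffer.data(), buffer.data() + buffer.size());
    }
  }
  auto stop = std::chrono::steady_clock::now();
  std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / (static_cast<double>(inputs.size()) * rounds);
}

// Random doubles from a fixed seed, so every candidate sees identical input.
inline std::vector<double> random_doubles(std::size_t n, unsigned seed) {
  std::mt19937_64 gen(seed);
  std::uniform_real_distribution<double> dist(-1e10, 1e10);
  std::vector<double> out(n);
  for (double& x : out) x = dist(gen);
  return out;
}
```

For example, `ns_per_call(random_doubles(100000, 42), 10, shortest_to_buffer, sum)` times the stand-in. A dedicated tool such as the dtoa-benchmark linked above (or Google Benchmark) is still the better choice for published numbers.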
I tested again with the optimized method and the benchmark tool you mentioned, but the result is still about 2x slower than Dragonbox.
Verifying doubleconv... OK. Length Avg = 22.426, Max = 25
Verifying dragonbox... OK. Length Avg = 22.027, Max = 24
Verifying dragonbox_comp... OK. Length Avg = 22.027, Max = 24
Verifying fmt... OK. Length Avg = 22.445, Max = 24
Verifying fmt_full_cache_test... OK. Length Avg = 22.445, Max = 24
Verifying ostringstream... OK. Length Avg = 22.940, Max = 24
Verifying ostrstream... OK. Length Avg = 22.940, Max = 24
Verifying sprintf... OK. Length Avg = 22.940, Max = 24
Benchmarking randomdigit doubleconv... Done
Benchmarking randomdigit dragonbox... Done
Benchmarking randomdigit dragonbox_comp... Done
Benchmarking randomdigit fmt... Done
Benchmarking randomdigit fmt_full_cache_test... Done
Benchmarking randomdigit null... Done
Benchmarking randomdigit ostringstream... Done
Benchmarking randomdigit ostrstream... Done
Benchmarking randomdigit sprintf... Done

| Function | Min ns | RMS ns | Max ns | Sum ns | Speedup (vs. sprintf) |
|:--------------------|-------:|---------:|-------:|--------:|------:|
| null | 1.6 | 1.600 | 1.6 | 27.2 | ×597.4 |
| dragonbox | 28.4 | 30.379 | 33.6 | 515.9 | ×31.5 |
| dragonbox_comp | 34.4 | 36.937 | 41.9 | 627.3 | ×25.9 |
| fmt_full_cache_test | 53.4 | 59.377 | 68.7 | 1007.5 | ×16.1 |
| fmt | 53.5 | 59.513 | 68.1 | 1010.0 | ×16.1 |
| doubleconv | 82.9 | 129.439 | 168.7 | 2170.8 | ×7.5 |
| sprintf | 868.0 | 957.211 | 1028.4 | 16249.7 | ×1.0 |
| ostrstream | 1197.1 | 1285.831 | 1357.9 | 21841.8 | ×0.7 |
| ostringstream | 1279.8 | 1377.940 | 1462.4 | 23401.2 | ×0.7 |

Appending null termination is necessary for correctness.
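The Speedup column in the table appears to be the sprintf total divided by each row's total (16249.7 / 515.9 ≈ 31.5 for dragonbox). The small sketch below reproduces that arithmetic; the helper names `speedup` and `slowdown` are hypothetical. By the same calculation, fmt takes about 1.96 times as long as dragonbox in this run (1010.0 / 515.9).

```cpp
#include <cstdio>

// Speedup as it appears to be used in the table: baseline total time
// (sprintf) divided by the candidate's total time. Values above 1 are faster.
constexpr double speedup(double baseline_sum_ns, double candidate_sum_ns) {
  return baseline_sum_ns / candidate_sum_ns;
}

// How many times longer one candidate takes than a reference candidate.
constexpr double slowdown(double candidate_sum_ns, double reference_sum_ns) {
  return candidate_sum_ns / reference_sum_ns;
}

// Prints ratios computed from the "Sum ns" column of the table above.
inline void print_table_ratios() {
  const double sprintf_sum = 16249.7;
  const double dragonbox_sum = 515.9;
  const double fmt_sum = 1010.0;
  std::printf("dragonbox vs sprintf: x%.1f\n", speedup(sprintf_sum, dragonbox_sum));  // x31.5
  std::printf("fmt vs sprintf: x%.1f\n", speedup(sprintf_sum, fmt_sum));              // x16.1
  std::printf("fmt / dragonbox: %.2f\n", slowdown(fmt_sum, dragonbox_sum));           // 1.96
}
```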
I will need to look in more detail, but one surprising thing is that the full and compact cache results are identical.
So I looked in more detail, and one obvious problem with the new benchmark is an ODR violation: you are trying to use {fmt} compiled with different configurations in different translation units (TUs). This is undefined behavior (UB). If you correctly enable the full Dragonbox cache with
you'll get a noticeable speedup from
to
on my system. It is still not as fast as calling Dragonbox directly, which is worth investigating further.
@zhiqiang-hhhh If you really want to test {fmt} with multiple different configurations in a single executable, you can do something like this to avoid the ODR issue: https://github.com/jk-jeon/dtoa-benchmark/blob/master/src/fmt_full_cachetest.cpp. This still feels like a terrible hack, but since it is just for testing, I think it should be alright.
Hello, I am using `dragonbox::to_chars` as my floating-point-to-string method, and I am trying to replace Dragonbox with {fmt} 10.x, since {fmt} already integrates Dragonbox and offers more control over the output format. But according to my simple benchmark, Dragonbox is almost 1.7x faster than {fmt} when converting floating-point numbers to strings.
Built in release mode, results:
As far as I understand, the time spent converting a floating-point number to decimal should be almost the same now that {fmt} integrates Dragonbox. So is the determining factor in the performance difference the output formatting control?
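To make the "extra work" concrete: after digit generation produces a shortest significand and a decimal exponent (value = significand × 10^exponent, the convention shortest-digit algorithms such as Dragonbox typically use), a formatter still has to lay the digits out. The sketch below is a hypothetical, simplified layout step, not {fmt}'s actual code. The function name `layout_decimal` and the fixed/scientific thresholds are assumptions, sign handling is omitted, and the significand is assumed to have no trailing zeros.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical layout step: value = significand * 10^exponent.
// Uses fixed notation for moderate magnitudes and scientific otherwise;
// the thresholds (-4 <= sci_exp < 16) are illustrative assumptions.
inline std::string layout_decimal(std::uint64_t significand, int exponent) {
  const std::string digits = std::to_string(significand);
  const int num_digits = static_cast<int>(digits.size());
  // Exponent of the leading digit in scientific form d.ddd x 10^sci_exp.
  const int sci_exp = exponent + num_digits - 1;

  std::string out;
  if (sci_exp >= -4 && sci_exp < 16) {
    if (exponent >= 0) {
      // Integer value: append zeros, e.g. 12 * 10^3 -> "12000".
      out = digits + std::string(static_cast<std::size_t>(exponent), '0');
    } else if (sci_exp >= 0) {
      // Decimal point falls inside the digits, e.g. 12345 * 10^-2 -> "123.45".
      const std::size_t int_len = static_cast<std::size_t>(sci_exp + 1);
      out = digits.substr(0, int_len) + "." + digits.substr(int_len);
    } else {
      // Leading zeros after the point, e.g. 5 * 10^-3 -> "0.005".
      out = "0." + std::string(static_cast<std::size_t>(-sci_exp - 1), '0') + digits;
    }
  } else {
    // Scientific notation with a two-digit minimum exponent, e.g. "1.23e+22".
    out = digits.substr(0, 1);
    if (num_digits > 1) out += "." + digits.substr(1);
    out += (sci_exp < 0) ? "e-" : "e+";
    const int abs_exp = sci_exp < 0 ? -sci_exp : sci_exp;
    if (abs_exp < 10) out += '0';
    out += std::to_string(abs_exp);
  }
  return out;
}
```

A general-purpose formatter does this kind of work on top of digit generation, together with format-spec handling such as width, precision and alignment, which is why some overhead relative to a bare `to_chars` call is expected.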