
use dragonbox algorithm; alternative to #18008 #18139

Merged
merged 8 commits into from
Jun 1, 2021

Conversation

@Araq (Member) commented May 31, 2021

Note: the string handling code is still nuts, impossible to verify. We should rewrite it to write into var openArray and benefit from the bounds checking.

@timotheecour (Member) commented Jun 1, 2021

After benchmarking this PR vs #18008 (with -d:danger), this PR results in slower code, losing important benefits of using dragonbox for people interested in performance; float to string can be a bottleneck in some applications.

In addition, this causes significantly slower code in the VM for addFloat and a slight increase in compile times for processing system/dragonbox.nim, and we lose the connection with the original code in case it gets updates; there will be more such cases where we need to handle external C and C++ dependencies anyway.

I'd rather follow your suggestion from #18008 (comment)

We need an external "bundler" tool that is available at boot time (maybe by generating C sources for it).

and update the original PR to do that instead, deferring the addDependency approach to future work

(Unrelated: the bootstrap failure in this PR is yet another reason why updating csources_v1 to a recent Nim release reduces friction when writing compiler code; in this case >= 1.4.0 would make it work; refs timotheecour#251. At least now we have better CI/build script integration that should correctly handle csources_v1 updates, and soon a nimdigger tool ready to be used.)

@Araq (Member, Author) commented Jun 1, 2021

after benchmarking this PR vs #18008 (with -d:danger), this results in slower code

Which benchmark, how much slower is it, and why? Otherwise we'll be stuck for good in the "omg, we need to use existing (risky) C(++) code because we lose performance otherwise" world. I like the NIH world better; it's also what Go/D/Rust do, btw.

@Araq (Member, Author) commented Jun 1, 2021

there will be more such cases where we need to handle external C and C++ dependencies anyway.

But not for this case. Code like

  var n: int = c_sprintf(addr buf, "%.16g", value)
  var hasDot = false
  for i in 0..n-1:
    if buf[i] == ',':
      buf[i] = '.'  # some locales print a decimal comma; normalize to '.'
      hasDot = true
    elif buf[i] in {'a'..'z', 'A'..'Z', '.'}:
      hasDot = true
  if not hasDot:
    # make sure the result still looks like a float, e.g. "123" -> "123.0"
    buf[n] = '.'
    buf[n+1] = '0'
    buf[n+2] = '\0'
    result = n + 2
  else:
    result = n
  # On Windows nice numbers like '1.#INF', '-1.#INF' or '1.#NAN' or 'nan(ind)'
  # or '-1.#IND' are produced.
  # We want to get rid of these here:
  if buf[n-1] in {'n', 'N', 'D', 'd', ')'}:
    writeToBuffer(buf, "nan")
    result = 3
  elif buf[n-1] == 'F':
    if buf[0] == '-':
      writeToBuffer(buf, "-inf")
      result = 4
    else:
      writeToBuffer(buf, "inf")
      result = 3

does scream "we need to be in control of the library!"...
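The rewrite into `var openArray` mentioned at the top of this PR could look roughly like this (a hypothetical sketch only; `normalizeFloatBuf` and its signature are illustrations, not the merged API -- the point is that every `buf[i]` access becomes bounds-checked by the compiler instead of relying on manual index arithmetic):

```nim
# Hypothetical sketch: the same post-processing step, but writing into an
# openArray so that all accesses are bounds-checked in non-danger builds.
proc normalizeFloatBuf(buf: var openArray[char]; n: int): int =
  ## Normalizes a sprintf-produced float string in `buf[0..<n]`:
  ## turns a locale decimal comma into '.' and appends ".0" when the
  ## output would otherwise not look like a float. Returns the new length.
  result = n
  var hasDot = false
  for i in 0 ..< n:
    if buf[i] == ',':
      buf[i] = '.'          # normalize locale decimal comma
      hasDot = true
    elif buf[i] in {'a'..'z', 'A'..'Z', '.'}:
      hasDot = true
  if not hasDot and n + 2 <= buf.len:
    buf[n] = '.'            # "123" -> "123.0"
    buf[n+1] = '0'
    result = n + 2
```

Out-of-bounds writes then raise `IndexDefect` (or are caught at compile time where possible) rather than silently corrupting memory.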

@timotheecour (Member) commented Jun 1, 2021

Which benchmark, how much is it slower and why?

1.2 to 1.6X slower depending on the distribution (in the VM it's up to 2.8X slower, which matters less, but still).
Will share benchmarks tomorrow after double checking.

As to why, I don't know yet, but it could be due to something getting lost in the c2nim/manual post-processing translation. I tried different things, including replacing -d:danger with --passc:-O3 --stacktrace:off --checks:off to get closer to what the original PR uses to compile dragonbox.cc, but no effect there.

While 1.6X doesn't matter for most programs, it will matter for programs where float to string is on the critical path, and it's easy enough to just erase the gap.

@Araq (Member, Author) commented Jun 1, 2021

Independent CI failure, merging.

@Araq Araq merged commit 63db2b1 into devel Jun 1, 2021
@Araq Araq deleted the araq-dragonbox branch June 1, 2021 20:29
@Varriount (Contributor) commented

If one must use the C++ version, would it be possible to use patchFile to substitute an alternate implementation?
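For reference, `patchFile` is a NimScript API that substitutes one module for another at compile time. A hedged sketch of what this could look like in a project's `config.nims` (the replacement module path here is hypothetical; it would have to provide the same interface as `system/formatfloat`):

```nim
# config.nims -- sketch, assuming a local module lib/myformatfloat.nim
# exists and exposes the same procs as system/formatfloat.
# patchFile(package, filename, replacement) swaps in the replacement
# module; the path is given without the .nim extension.
patchFile("stdlib", "formatfloat", "lib/myformatfloat")
```

This keeps the substitution in build configuration rather than in the compiler sources themselves.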

@timotheecour (Member) commented Jun 2, 2021

@Araq here's the benchmark:

results

(taking best of 5 runs independently for CT vs RT, but it doesn't vary much between runs)
PR #18008:
CT: 0.195
RT: 2.188

PR #18139 (which just got merged in devel):
CT: 0.484
RT: 3.109

PR #18008 is ~1.4X faster at RT and ~2.5X faster at CT compared to #18139

details

# for PR #18008
nim r -o:/tmp/z03 -d:case_pr_18008 -d:danger --experimental:vmopsdanger -f t12341.nim
# for PR #18139
nim r -o:/tmp/z02 -d:danger --experimental:vmopsdanger -f t12341.nim
  • --experimental:vmopsdanger and -f are only needed for static: main(): the former so that cpuTime works in the VM, and -f forces recompiling so that the VM code re-runs; you can remove both if you only look at non-VM code

  • n is taken to be smaller for the VM to avoid having to pass --maxLoopIterationsVM

  • in RT code, I'm using the fastest float to string code available for each approach, i.e. writing into an array buffer instead of calling addFloat on a var string, to avoid the corresponding (small but nonzero) overhead, for applications that care about the best achievable performance

  • getFloat can be changed to any other code generating floats, but this template was chosen to minimize its own cost (which would otherwise pollute the benchmark with unrelated work) as well as to generate realistic floats (hence starting at 1.0 instead of at 0, which would just generate subnormal numbers); using rand(1.0) is possible too, but would add to the timing because rand isn't as cheap

  • cast[float](start + (i * 2038074743)) (multiplying by the 100000000'th prime) or similar is another sensible choice, which generates uniformly log-spaced floats from 1.0 to about 4e13; the RT factor in this case is 1.3X instead of 1.4X, presumably because of the extra cost of the multiplication

benchmark code

# t12341.nim:
when true:
  import std/times
  when defined case_pr_18008:
    # PR https://github.com/nim-lang/Nim/pull/18008
    # 3080108d57ea553142036764a118921845c9005f, after reverting the diff in system.nim (moved it to another PR)
    import std/strfloats
    const N = 64
  else:
    # after 63db2b19bf0787db89ce89c8861e976d44e33cbd https://github.com/nim-lang/Nim/pull/18139
    import system/formatfloat
    const N = 65

  const start = cast[uint](1.0)
  template getFloat(i): float =
    cast[float](start + i) # consecutive floats
    # cast[float](start + (i * 2038074743)) # consecutive floats spaced by the 100000000'th prime

  proc main() =
    var s = ""
    var c = 0
    var n = 0'u
    when nimvm:
      n = 1_000_000'u # can increase if you also increase `--maxLoopIterationsVM`
    else:
      n = 100_000_000'u
    var buf: array[N, char]
    let t = cpuTime()
    for i in 0'u ..< n:
      let f = getFloat(i)
      when nimvm:
        s.setLen 0
        s.addFloat f
        c+=s.len
      else:
        # fastest float to string (avoiding addFloat overhead)
        when defined case_pr_18008:
          let m = toString(buf, f)
        else:
          let m = writeFloatToBuffer(buf, f)
        c+=m
    echo (c, cpuTime() - t)
  static: main() # requires --experimental:vmopsdanger -f
  main()

conclusions

The float32 case still fails:

var a = 1.1'f32
doAssert $a == "1.1", $a # fails

As explained in #18008 (comment), Drachennest's schubfach_32 seems like the simplest approach, as I did in timotheecour#732 using lib/vendor/drachennest/schubfach_32.cc from https://github.com/abolz/Drachennest/blob/master/src/schubfach_32.cc; this would be the code to port/wrap.
=> followup #18148

  • Independently of all this, IMO addDependency is still a desirable feature, for the reasons described in #18008 (comment) (10-70x faster $float, roundtrip + correct rounding via dragonbox); it's the simplest/most flexible way to handle complex dependencies like those needed for std/decimals, std/bigints, etc., which are significantly more involved than dragonbox.cc (for which c2nim already required manual patching)

kaushalmodi added a commit to kaushalmodi-forks/parsetoml that referenced this pull request Jun 7, 2021
The float string representation fix in Nim devel (see
nim-lang/Nim#18139) uncovered a bug in
the logic for parsing float TOML values.

For example, to parse "foo = 0.123", internally 0.1 + 0.02 + 0.003 was
computed, which evaluates to 0.12300000000000001. The earlier
float stringification bug caused that value to print out as "0.123"
instead of "0.12300000000000001".

This is now fixed by using parseutils.parseFloat, which parses the
"0.123" float value as 0.123 and not 0.12300000000000001.

Fixes NimParsers#45.