-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Grisu printing for floating-point numbers is now the standard.
This means that what we print for Float64s will *always* eval to the same floating-point value we originally had. Moreover, the printed form uses the least number of digits possible to accomplish this for any floating-point value. There are still some unresolved issues around 32-bit floats. See issue #335.
- Loading branch information
1 parent
1cfeaf6
commit d37b9df
Showing
3 changed files
with
54 additions
and
20 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
d37b9df
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coolness.
Why both
print_shortest
andshow(Float)
? Looks like there is duplicated code between the two. There should be one core routine that takes all options andshow
calls that with defaults.d37b9df
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I replied to this, but the reply appears to have gone missing. Basically, the two methods are similar, but not similar enough to make it easy to merge the two. Note, in particular that
print_shortest
does not use proper scientific notation, but rather always uses an integer mantissa followed by an exponent. That's because this "improper" form takes one less character on account of not having a.
in it. Also, we want these methods to be very fast for printing lots and lots of numbers — especiallyprint_shortest
since it's likely to be used to print huge volumes of data into CSV or TSV files. For that purpose, I suspect the grisu algorithm will be very effective at giving good compression. I'm guessing that grisu-formatted, bzipped CSV/TSV files will actually be a very compact data representation for a lot of real-world data, which tends to be more "special" when written in decimal than in binary.d37b9df
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This already saves so many digits that I don't feel the need to play games to save an extra byte.
We need to find out if performance tweaking on our end is helpful, or if all the time is already inside the grisu library. By merging the routines, we can avoid the ASCIIString and tuple allocation. There is also some array copying in the
print(digits[1:pt])
types of operations.It would also be nice to have the
num2str
function back.d37b9df
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick test and this is only about 2.6x slower than printing an integer. I'm quite pleased with that, considering the extra objects and indexing and such involved. Still might make sense to try to max out performance though.
In related news, it may be time for the built-in integer printing methods (based on printf) to die:
d37b9df
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's some value to being able to generate the absolute tersest human-readable ASCII representation of data. Saving a byte for every value adds up for large amounts of data. I'm going to leave it. Are you talking about merging all of these into a single function that contains the call to the grisu library? If so, that's a non-starter because I'm going to have to write a fair number of these to implement printf formatting and there's no way they're all going to be one huge function with a bazillion options. Once it all works, I can figure out how to avoid the
ASCIIString
allocation and substring copy. The most obvious thing would be for all of them to share a byte buffer and just have the core call return how many digits there are (which is what it actually returns). On the printing side, I need to expose a write method for printing part of a byte array without having to create a subarray copy.d37b9df
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woah. That's awesome. So the Julia-implemented integer printing is that much faster? Very cool. That only leaves bootstrapping concerns. It would really, really suck to not be able to print integers or floats before bootstrapping. Floats, sure, but integers, that's brutal. It's already hard enough to debug changes in, say,
range.j
.d37b9df
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can deal with that using the debug code block at the top of sysimg.j. We can put in a definition that does a ccall to print an integer. Then when our bootstrapping process improves we can just delete that whole block, and its functionality will be provided by a prior version of the julia library.
d37b9df
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that's fair enough. Moving more things from C to Julia (and gaining performance) is always good.
d37b9df
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, we have
write(stream, pointer(arr, start), len)
. But this reminds me we need awrite
method forSubArray
.d37b9df
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, I can use that.