NaNs are weird #750

Closed
pkoppstein opened this issue Apr 14, 2015 · 24 comments
Comments

@pkoppstein
Contributor

In brief:

$ jq -n '1e+1234 * 0'
null

In long:

$ /usr/local/bin/jq --version
jq-1.4
$ /usr/local/bin/jq -n '0 * 1e+1234'
null

$ jq --version
jq-1.5rc1-57-gddad961
$ jq -n '0 * 1e+1234'
null

@wtlangford
Contributor

So, I took a quick look and this is, oddly enough, intended behavior.

1e+1234 renders to +inf in a double representation. Once we do +inf * 0, we get back NaN.
Since JSON doesn't support NaN, we display null instead.

$ ./jq -n '1e+1234 * 0' --debug-dump-disasm
0000 TOP
0001 LOADK null
0003 RET

null
$ ./jq -n '1e+14 * 0' --debug-dump-disasm
0000 TOP
0001 LOADK 0
0003 RET

0
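
The overflow-then-NaN sequence described above can be reproduced outside jq. This is a minimal C sketch, assuming the literal is converted with the standard `strtod` (jq's own number parser may differ). The helper name `parse_times_zero` is made up for illustration.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical helper: convert a decimal literal to a double, then
 * multiply it by zero, mirroring the expression '1e+1234 * 0'. */
double parse_times_zero(const char *literal) {
    assert(literal != NULL);
    /* On overflow strtod returns HUGE_VAL, which is +infinity on
     * IEEE 754 platforms. */
    double x = strtod(literal, NULL);
    /* infinity * 0 is NaN; any finite value * 0 is 0. */
    return x * 0.0;
}
```

Under these assumptions `parse_times_zero("1e+1234")` yields a NaN, while `parse_times_zero("1e+14")` yields 0.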

@pkoppstein
Contributor Author

Hmmm.

$ jq -n '1e+1234'
1.7976931348623157e+308
$ jq -n '1.7976931348623157e+308'
1.7976931348623157e+308
$ jq -n '1.7976931348623157e+308 * 0'
0

@wtlangford
Contributor

Right. Remember, 1.79769e+308 is the maximum value a double can contain, so 1.79769e+308 * 0 -should- return 0. I'm a little unclear on why 1e+1234 is returning dbl_max instead of inf, though. Maybe that should be the thing to look into.

@wtlangford
Contributor

Also, it's worth pointing out that the null in my code is still a number, it's just being displayed as null due to how we print jvs.

@nicowilliams
Contributor

This is a "bug" in the constant folding code. See the output of jq --debug-dump-disasm -n '0 * 1e+1234':

0000 TOP
0001 LOADK null
0003 RET

The code that produces this is:

    case '*': res = jv_number(na * nb); break;

in parser.y, but if you were to step into this with a debugger you'd see
that jv_number() did not return jv_null(). Instead what happens is
that NaNs are rendered as "null" by the printer.

We could have the constant folding code special-case multiplication by zero
when the other number is not a NaN...
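
The special case floated above could look roughly like this. It is only a sketch in C: `fold_multiply` is a hypothetical helper, not the code in parser.y, and it ignores the sign of zero.

```c
#include <math.h>

/* Hypothetical constant-folding helper (not jq's actual code). */
double fold_multiply(double a, double b) {
    /* Special case: zero times anything that is not NaN folds to 0,
     * so 0 * infinity no longer produces NaN. */
    if ((a == 0.0 && !isnan(b)) || (b == 0.0 && !isnan(a)))
        return 0.0;
    /* Everything else is ordinary IEEE 754 multiplication. */
    return a * b;
}
```

With such a rule the folded result for 0 * infinity would differ from what plain IEEE 754 multiplication gives (NaN), so folded and unfolded expressions would have to be kept in agreement.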

@nicowilliams
Contributor

nicowilliams commented Apr 14, 2015 via email

@wtlangford
Contributor

@nicowilliams I can confirm that everything's a NaN, not a null.
Interestingly, there's some code in jv_print.c (lines 167-170) that causes +inf and -inf to be printed as +DBL_MAX and -DBL_MAX, which is somewhat upsetting to me.
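
The printing behavior described in this thread (NaN shown as null, infinities shown as +DBL_MAX and -DBL_MAX) can be sketched in C as follows. This is an illustration only, not the code from jv_print.c; the function name `format_number` and the `"%.17g"` format are assumptions.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdio.h>

/* Hypothetical formatter illustrating the behavior discussed here. */
void format_number(double x, char *buf, size_t size) {
    assert(buf != NULL && size > 0);
    if (isnan(x)) {
        /* JSON has no NaN, so it is shown as null. */
        snprintf(buf, size, "null");
        return;
    }
    /* Clamp infinities to the largest finite double. */
    if (x > DBL_MAX)
        x = DBL_MAX;
    else if (x < -DBL_MAX)
        x = -DBL_MAX;
    snprintf(buf, size, "%.17g", x);
}
```

Under these assumptions an infinite input and DBL_MAX produce the same text, so the printed form, once parsed back, no longer behaves like an infinity.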

@nicowilliams
Contributor

nicowilliams commented Apr 14, 2015 via email

@wtlangford
Contributor

What's the best behavior, then? Does JSON allow +inf and -inf? How do we want to render these numbers that are too great in magnitude to store in double format?

@nicowilliams
Contributor

nicowilliams commented Apr 14, 2015 via email

@nicowilliams
Contributor

nicowilliams commented Apr 14, 2015 via email

@pkoppstein
Contributor Author

Using JavaScript-C 1.8.5 2011-03-31:

$ js
js> 1e1234
Infinity
js> 0 * 1e1234
NaN
js> 1/0
Infinity

v8 gives the same results.

It seems to me that we'd get results that are both self-consistent and generally conformant with javascript if we treated infinities and NaN as null.

This would, however, mean that 1/0 would evaluate to null, in contrast to the current behavior, which (misleadingly) shows the result as 1.7976931348623157e+308. It is misleading because currently 0 * (1/0) => null whereas 0 * 1.7976931348623157e+308 => 0.
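
The proposal above, rendering every non-finite number as null, can be sketched in C like this. `encode_number` is a hypothetical name and `"%.17g"` is an assumed output format; this is not jq's printer.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>

/* Hypothetical encoder: every non-finite number becomes null. */
void encode_number(double x, char *buf, size_t size) {
    assert(buf != NULL && size > 0);
    if (!isfinite(x)) {
        /* Covers NaN, +infinity and -infinity in one test. */
        snprintf(buf, size, "null");
        return;
    }
    snprintf(buf, size, "%.17g", x);
}
```

With this rule both 1/0 and 0 * (1/0) would print as null, which removes the mismatch between what is displayed and how the value behaves in arithmetic.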

@nicowilliams
Contributor

nicowilliams commented Apr 14, 2015 via email

@pkoppstein
Contributor Author

@nicowilliams asked:

What does JavaScript do when encoding a NaN/infinity in JSON?

js> JSON.stringify(1/0)
"null"
js> JSON.stringify(0*1e1234)
"null"

Just to be clear, the "bug" that I was referring to is the inconsistency in jq's handling of the various "edge cases", not its handling of any particular case.

While we're on the topic of null and arithmetic, it might be worth revisiting the fact that jq dislikes 0 * null, whereas it doesn't have the same prejudice against '{} + null' or '[] + null' or even 'true + null'.

@nicowilliams
Contributor

@pkoppstein Maybe I'm being dense, but can you list the inconsistencies?

Addition of null to anything makes sense, but multiplication by null... what should that mean?

@pkoppstein
Contributor Author

@nicowilliams asked:

Maybe I'm being dense, but can you list the inconsistencies?

I think you and @wtlangford have already identified the problem; specifically, you wrote:

@wtlangford I agree, that's weird and upsetting. Feel free to fix it.

That is, the problem is exemplified by this transcript:

$ jq -n 1e1234
1.7976931348623157e+308
$ jq -n 1.7976931348623157e+308
1.7976931348623157e+308
$ jq -n 0*1.7976931348623157e+308
0
$ jq -n 0*1e1234
null

Regarding "multiplication by null", I was only pointing out that a case could be made for evaluating 0 * null as 0. The analogy is with '0 + null #=> 0'.

One could say, in the spirit of Nihil fit ex nihilo, that if the nothingness that is 0+null is 0, then the nothingness that is 0*null should also be 0.

Yes, I realize that one might not want 0 * (1/0) to be 0. But jq already allows the convenience of having 0+null evaluate to 0, and I think the same argument-from-convenience holds for 0*null.

@nicowilliams
Contributor

@pkoppstein But that is an artifact of IEEE754 yielding a NaN in that case. I suppose we could check that we're multiplying by zero, and then always return zero. But... suppose instead of byte-compiling and interpreting that we were coding to LLVM and thus generated highly optimized native object code... each additional special case is one more branch we could not dispense with. I'm quite tempted to say that when you play with IEEE754, you get what you get, and you have to know going in. The alternative for me isn't to special case IEEE754 oddities, but to add an arbitrary precision mode, and that's not happening any time soon.

@nicowilliams
Contributor

On the plus side, I finally added an ieee754 label for this sort of issue! :^)

@nicowilliams
Contributor

Also, recall that we represent NaNs as null only on output. This results in this weirdness:

$ ./jq -n '1e+1234 * 0'
null
$ ./jq -n '1e+1234 * 0|type'
"number"
$

How about that. Internally, NaNs are numbers.
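
One way to picture this: the value's type tag says "number" regardless of what the double holds, and only the printer turns NaN into null. The following C sketch uses a made-up tagged value type; `value`, `KIND_NUMBER`, `make_number` and `type_name` are all hypothetical and are not jq's jv API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tagged value; jq's real jv type is more involved. */
typedef enum { KIND_NULL, KIND_NUMBER } kind;

typedef struct {
    kind   k;
    double number;   /* meaningful only when k == KIND_NUMBER */
} value;

/* Build a number value; NaN is stored like any other double. */
value make_number(double x) {
    value v = { KIND_NUMBER, x };
    return v;
}

/* The type name depends on the tag alone, never on the payload. */
const char *type_name(const value *v) {
    assert(v != NULL);
    return v->k == KIND_NUMBER ? "number" : "null";
}
```

In this sketch a value built from a NaN still reports "number" as its type, matching the transcript above.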

As to addition of null, the intent there almost certainly had to do with addition of objects (where one may be null).

I'm inclined to leave everything as is, except that I'll consider special-casing multiplication by zero when the other operand is a number and not a NaN or infinity, but my take as to that is to reject it (for the reasons given a couple of comments back).

@nicowilliams changed the title from "bug: multiplication by 0" to "NaNs are weird" on Jun 18, 2015

@nicowilliams
Contributor

@pkoppstein In master now:

% jq -n '1e+1234 * 0'
null
% jq -n '1e+1234 * 0 | isnan'
true

I suppose we could add new filters, analogous to numbers, for infinities and NaNs, but it's not clear that they'd be useful. Perhaps numbers should be redefined to filter out infinities and NaNs, or at least NaNs.

@pkoppstein
Contributor Author

The "numbers" filter should, I believe, match the "number" type. Assuming the semantics of the latter is not going to change, maybe a new filter to select the ordinary (finite) numbers could be added. (Names such as "finitenumbers" or "decimals" should be OK.)

@nicowilliams
Contributor

"decimals" won't do (they aren't, not internally). "finites" and "normals" would do, the former for not-infinite and not-NaN numbers, and the latter for finite numbers that are not sub-normal. This would match the C <math.h> isfinite() and isnormal() macros. (A not-normal finite would be a "sub-normal" or zero.)

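For reference, the two C macros mentioned above classify values as shown in this small sketch. The wrapper names `is_finite_number` and `is_normal_number` are invented for illustration.

```c
#include <math.h>
#include <stdbool.h>

/* True for every number that is neither NaN nor an infinity
 * (zero and sub-normals included). */
bool is_finite_number(double x) {
    return isfinite(x);
}

/* True only for normal numbers: not zero, not sub-normal,
 * not infinite, and not NaN. */
bool is_normal_number(double x) {
    return isnormal(x);
}
```

Note that zero is finite but not normal, so a filter built on isnormal() would drop 0 as well as sub-normals.
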
@pkoppstein
Contributor Author

@nicowilliams - Did you intend that "finites" would select non-numbers?

Currently:

def finites: select(isinfinite|not);

I assume you meant: select(isfinite)
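
The difference between the two definitions shows up for NaN, which is one reading of the question above: "not infinite" lets NaN through, whereas "finite" does not. Here is a C sketch of the two predicates. The function names are hypothetical, and the jq filters themselves are not reproduced here.

```c
#include <math.h>
#include <stdbool.h>

/* Analogue of select(isinfinite|not): rejects only the infinities. */
bool passes_not_infinite(double x) {
    return !isinf(x);
}

/* Analogue of select(isfinite): rejects infinities and NaN. */
bool passes_finite(double x) {
    return isfinite(x);
}
```

The two predicates agree on every ordinary number and on both infinities; they disagree only on NaN.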

@nicowilliams
Contributor

On Wed, Jun 17, 2015 at 10:31:57PM -0700, pkoppstein wrote:

@nicowilliams - Did you intend that "finites" would select non-numbers?

Nah, just a thinko. Thanks.
