Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Produce valid JSON in the presence of NaN or INF values #22

Closed
JonjonHays opened this issue Nov 23, 2019 · 7 comments
Closed

Produce valid JSON in the presence of NaN or INF values #22

JonjonHays opened this issue Nov 23, 2019 · 7 comments
Labels
tag:numerical tag:reference info that may be of interest to future audiences type:bug
Milestone

Comments

@JonjonHays
Copy link
Contributor

Currently the backend server extension is using Python's native JSON package to serialize the payload. Python's JSON library will produce invalid JSON in the presence of NaN or Inf values. The JSON library will add values equivalent to Javascript's NaN and Inf values, but unfortunately these values are not considered valid JSON, and we fail on the frontend when parsing the JSON payload. While this could be handled on the frontend, I think it makes more sense to handle this server-side. Unfortunately, I wasn't able to find a solution for this without bringing in a third party dependency. The simplejson library provides an ignore_nan flag as input to its dump function that encodes NaN values as None/null. While Javascript data rendering and plotting libraries typically prefer null values, Python libraries (e.g, Matplotlib) tend to prefer NaN values, and may produce errors in the presence of null values. Is it likely this extension will ever be called from a non-Javascript context? If not, it might make sense to simply encode NaN values as null by default. However, if it is likely that this extension will be called from a non-Javascript context, we could add an ignoreNan flag to the query parameters. In my PR for n-dim support, I added an implementation of the former. Here's a discussion on this topic:

https://stackoverflow.com/questions/15228651/how-to-parse-json-string-containing-nan-in-node-js

@telamonian
Copy link
Member

Erg. I'd classify this as a major issue. I'm gonna do a quick whip-around of low-level javascript experts and see what they think

@kgryte
Copy link
Member

kgryte commented Nov 25, 2019

The standard approach for dealing with custom values in serialized JSON-like blobs in JavaScript is to use revivers. For example, supposing that Python inserts NaN and Infinity into a JSON blob, thus generating something like

var str = '{"a":NaN,"b":Infinity,"c":[NaN,Infinity,-Infinity]}';

then you'd create the following reviver function

function reviver( key, value ) {
    if ( value === '***NaN***' ) {
        return NaN;
    }
    if ( value === '***Infinity***' ) {
        return Infinity;
    }
    if ( value === '***-Infinity***' ) {
        return -Infinity;
    }
    return value;
}

which can then be applied as follows:

var tmp = str.replace( /(NaN|-?Infinity)/g, '***$1***' );
var obj = JSON.parse( tmp, reviver );
// returns { a: NaN, b: Infinity, c: [ NaN, Infinity, -Infinity ] }

thus, behaving as expected.

Two caveats:

  1. You should expect a perf hint on the frontend in JupyterLab, as applying a custom reviver can be relatively expensive for larger JSON blobs.
  2. If your serialized JSON blob can contain the character sequences NaN and Infinity in places other than values, such as in a string of text, you'll need to improve the regular expression provided to replace.

The other alternative is to use a custom JavaScript JSON parser which can handle NaN and Infinity, which is straightforward based on Crockford's work.

Hope this helps.

@kgryte
Copy link
Member

kgryte commented Nov 25, 2019

One additional addendum:

One can avoid the string replace if the serialization code properly encodes special values, such as NaN and Infinity. For example, in Python, one can provide a parse_constant callback.

Doing this work upon serialization, while not obviating the need for a custom reviver, should improve performance upon parsing the blob in JavaScript and circumvents tricky edge cases such as character sequences appearing in places other than as JSON values (e.g., such as in the middle of a string).

@telamonian
Copy link
Member

Thanks @kgryte for the detailed info.

I feel like I missed a beat somewhere. If we ignore the possibility of NaN or Infinity occurring within normal strings (which I think will be safe for this particular application), why can't the reviver just be:

function reviver( key, value ) {
    if ( value === 'NaN' ) {
        return NaN;
    }
    if ( value === 'Infinity' ) {
        return Infinity;
    }
    if ( value === '-Infinity' ) {
        return -Infinity;
    }
    return value;
}

?

Or is this exactly what you were referring to in your addendum?

@kgryte
Copy link
Member

kgryte commented Nov 25, 2019

Because value arguments are deserialized values. If you attempt to parse a JSON-like string with bare NaN and Infinity values, you should get a parse error if a parser conforms to the JSON-specification. E.g.,

JSON.parse('{"a":NaN,"b":Infinity}')
// throws SyntaxError: Unexpected token N in JSON at position 5

If the serialization logic happens to encode NaN and Infinity as string values, e.g.,

var str = '{"a":"NaN","b":"Infinity"}';

then, yes, you can simplify the reviver implementation to what you describe.

@kgryte
Copy link
Member

kgryte commented Nov 25, 2019

I chose to replace with '***$1***', but this could be anything to serve as a sentinel, including just '$1'.

@telamonian telamonian added this to the reference milestone Dec 14, 2020
@telamonian
Copy link
Member

This issue should be handled by the work in #41, so I'm closing this

@telamonian telamonian added the tag:reference info that may be of interest to future audiences label Dec 14, 2020
@telamonian telamonian modified the milestones: reference, 0.5.0 Dec 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tag:numerical tag:reference info that may be of interest to future audiences type:bug
Projects
None yet
Development

No branches or pull requests

3 participants