Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jsonbig doesn't seem to be as performant #114

Closed
ryanwilliamquinn opened this issue Sep 2, 2014 · 22 comments
Closed

jsonbig doesn't seem to be as performant #114

ryanwilliamquinn opened this issue Sep 2, 2014 · 22 comments

Comments

@ryanwilliamquinn
Copy link
Contributor

i'm parsing large json responses (about 50mb). it works fine in 0.2.9, but in 0.3 it hangs on JSONbig.parse(text). if i change that back to JSON.parse it works again. maybe the answer is for me to get less data per batch, just thought i would let you know.

@marc-portier
Copy link
Collaborator

thx for letting us know, would be great if we could get a smoking gun on this (is it really size - off the whole set, or off one of the fields?, or rather depth/complexitiy or maybe one specific subtype of data that is causing this?)

Having a sample of json and/or test-case that fails would be most helpfull. Is there a sample you can share?

I'm also quite sure @sidorares will like to know about this to fix in https://github.com/sidorares/json-bigint

@lbdremy
Copy link
Owner

lbdremy commented Sep 2, 2014

Thanks @ryanwilliamquinn, like @marc-portier said it would be really helpful to have an example with which JSONbig.parse hangs, to be honest I haven't try with a sample of 50mb maybe the size's criteria is enough.

Also I need to benchmark JSON vs JSONbigint, if the performance are not similar (and this problem is not a bug), we should allow the user to switch between the two and document the problem.

@sidorares
Copy link

@ryanwilliamquinn can you post example data somewhere? ( link to dropbox etc ) Feel free to open https://github.com/sidorares/json-bigint/issues/new

@lbdremy it's definitely slower. Built-in JSON.parse is super optimized deep inside v8. For example, we are doing a lot of xml -> js transformations and tried to build objects in binary module. However, high level v8 api to build tree is still slower than "create large string + JSON.parse". I'll post benchmarks later here

@sidorares
Copy link

2322420 bytes JSON:
json-bigint: 7.34 per sec ( 0.13 sec )
native JSON.parse: 65.36 / sec ( 0,015 sec )

@ryanwilliamquinn
Copy link
Contributor Author

I'm not sure I can post the data, I will talk to my boss about it and get back to you.

I tried manually parsing the data in the node repl. I downloaded the json data from solr, read it from the file, and tried both parsers. JSON.parse works fine, in about 2 seconds. The json-big parser ends up throwing an error after a second, saying that the data:

has no method 'charAt'
at next (/Users/ryan.quinn/repos/Homes-SitemapCreator/node_modules/json-bigint/lib/parse.js:127:23)
at white (/Users/ryan.quinn/repos/Homes-SitemapCreator/node_modules/json-bigint/lib/parse.js:225:17)
at value (/Users/ryan.quinn/repos/Homes-SitemapCreator/node_modules/json-bigint/lib/parse.js:325:9)
at Function.parse (/Users/ryan.quinn/repos/Homes-SitemapCreator/node_modules/json-bigint/lib/parse.js:350:18)
at repl:1:17
at REPLServer.self.eval (repl.js:110:21)
at repl.js:249:20
at REPLServer.self.eval (repl.js:122:7)
at Interface. (repl.js:239:12)
at Interface.emit (events.js:95:17)

@ryanwilliamquinn
Copy link
Contributor Author

I talked to my boss and unfortunately we can't distribute the data. I can run any script you want on the data though.

@sidorares
Copy link

can you make sure input is a string (it looks like it's a Buffer)? I might add extra toString() on argument if it's not a string

@sidorares
Copy link

read it from the file, and tried both parsers

readFile returns buffer unless you pass encoding as extra argument

@lbdremy
Copy link
Owner

lbdremy commented Sep 2, 2014

Oops wrong click, adding back the message...

This is the code where the response is deserialized https://github.com/lbdremy/solr-node-client/blob/master/lib/solr.js#L715-L743, text is a utf8 string thanks to res.setEncoding('utf-8').

@ryanwilliamquinn
Copy link
Contributor Author

ah got it, sorry about that. here is the latest test i ran:

'use strict';
var fs = require('fs'),
  jb = require('json-bigint');

var buff = fs.readFileSync('/tmp/solrdata');
var data = buff.toString();

console.time('json.parse');
JSON.parse(data);
console.timeEnd('json.parse');

console.time('json-bigint.parse');
jb.parse(data);
console.timeEnd('json-bigint.parse');

here is the response:

json.parse: 1047ms

FATAL ERROR: JS Allocation failed - process out of memory
fish: Job 1, 'node jsonparsetest.js ' terminated by signal SIGABRT (Abort)

the error occurred after about 10 minutes. i could try with a smaller dataset if that would be helpful.

@ryanwilliamquinn
Copy link
Contributor Author

3.5M data:
json.parse: 73ms
json-bigint.parse: 510ms

17M data:
json.parse: 278ms
json-bigint.parse: 2506ms

35M data:
json.parse: 541ms
json-bigint.parse: 6075ms

52M:
json.parse: 809ms
json-bigint.parse: 8582ms

71M:
json.parse: 1086ms
FATAL ERROR: JS Allocation failed - process out of memory

@sidorares
Copy link

I think I need to rewrite it and have first pass producing valid json with big numbers replaced by string + some sentinel value and second pass - native JSON.parse + reviver

@lbdremy
Copy link
Owner

lbdremy commented Sep 3, 2014

Ok thanks @sidorares.
In the meantime, what are the alternatives for us? @marc-portier do you know if we can ask Solr to send big numbers as string? Like that we could cast those string (with a hint from Solr telling us which string are actual big number) into BigNumber with https://github.com/MikeMcl/bignumber.js/, after JSON.parse.

@lbdremy
Copy link
Owner

lbdremy commented Sep 3, 2014

The first thing to do is to allow the user to use the native JSON.parse by setting a flag on the Client instance.

Also @ryanwilliamquinn, do you need support for big number for your response or JSON.parse is enough for you?

@marc-portier
Copy link
Collaborator

@lbdremy I immediately (2-3 weeks ago) flagged the bignumber issue in SOLR land as: https://issues.apache.org/jira/browse/SOLR-6364

There already is a conceptual solution there to introduce some param (&json.long) and value ('long'=default or 'string' probably) to return these bigints in the json as being string.

But AFAICS they have no active plan to implement and release such thing. (Even if the own js based solr ui is affected by this as well.)

Obviously providing a patch might enhance their enthusiasm.

On short notice I think a client-options-switch to be able to choose between classic json or the json-bigint is our safest bet.

@ryanwilliamquinn
Copy link
Contributor Author

@lbdremy I do not need support for big number, JSON.parse is fine for me.

@lbdremy
Copy link
Owner

lbdremy commented Sep 6, 2014

Alright, considering the difference of performance between JSON and JSONbig lib, I kind want to switch back and use the native implementation as default. Obviously we will allow the user to switch to the JSONbig via a method or a flag (any suggestion for this method/flag name?). I will publish a new version, I will bump a minor version (0.4.0) because it breaks the current behavior (expecting big int to be parsed correctly). @marc-portier are you ok with that, anything I'm missing?

@sidorares
Copy link

@lbdremy I'll let you know when two-pass version of JSONbig is available ( it should have nearly native performance )

@lbdremy
Copy link
Owner

lbdremy commented Sep 7, 2014

Ok thanks @sidorares

@marc-portier
Copy link
Collaborator

@lbdremy 100% ok with the approach - makes sense
in terms of overlooking; don't think so: you'll probably want to have the classic json parse on by default? So you'll need to configure the client in the bigints-test differently then, but the failing test was going to tell you that for sure :) - hm, 2nd thought: considering full test coverage might be wise to run all tests (except the bigint one) with both json parsers, but that might be overdoing things? --> we could keep that as a todo in the issues?

And also: if #90 can slide into 0.4.0 that would be great, but if it needsmore work we should not keep others waiting... Thx.

@lbdremy
Copy link
Owner

lbdremy commented Sep 14, 2014

Alright this commit does the necessary 4c805c3 and this one 3ef7224 makes sure the test suite runs with JSON and JSONbig.

@lbdremy
Copy link
Owner

lbdremy commented Sep 14, 2014

Ok I think we are done with this, 0.4.0 has been released, let me know if you have troubles with it.

@lbdremy lbdremy closed this as completed Sep 14, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants