Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize parseLiteral for number-heavy JSON files (ala GeoJSON) #34

Merged
merged 2 commits into from
Sep 16, 2020

Conversation

mbullington
Copy link
Contributor

When parsing number-heavy files like GeoJSON, trying the naive parseFloat first then going for full JSON compatibility if NaN provides meaningful performance benefits.

Tried on a 75MB GeoJSON file, raw parse time was ~3500ms before change and ~2600-2700ms after.

Best.

@aeschli
Copy link
Contributor

aeschli commented Jun 9, 2020

Cool. Do you have examples of numbers that need to go through JSON.parse ? We should add some test cases: https://github.com/microsoft/node-jsonc-parser/blob/master/src/test/json.test.ts#L180

@mbullington
Copy link
Contributor Author

mbullington commented Jun 10, 2020

Good call. I switched from parseFloat(...) to Number(...) for correctness. While the scanner may prevent this beforehand, parseFloat("123abc") would return 123 instead of NaN as expected.

According to MDN the only cases Number(...) may not have JSON-parity are when there is a leading zero, for example 001 or -002.5. In JS these are valid but in JSON they are not, however not sure how strict node-jsonc-parser should be about this in the spirit of fault tolerance.

@aeschli
Copy link
Contributor

aeschli commented Jun 10, 2020

Actually, the scanner in scanNumber already ensures that the number format is correct following the spec from https://www.json.org/json-en.html.
So there can't be any numberics leading nulls in the parser and no extra validation on the number format is necessary. Any number at that point is a valid JSON number. I believe the only thing that can happen that it can't be that it can't be expressed by a JS number, e.g. as it is too big, but that's a not JSON issue. InvalidNumberFormat is wrong and probably whould be fixed, but not sure its wort the efford as it is a breaking change.

You changes look good. Any additional tests you want to add or are we already covered well enough?

@mbullington
Copy link
Contributor Author

I think the tests are already sufficient, I double checked to make sure exponent corner cases were tested. Thanks!

@mbullington
Copy link
Contributor Author

Hi @aeschli , are there any updates on this PR? Thanks!

@aeschli aeschli merged commit ad06ba4 into microsoft:master Sep 16, 2020
@aeschli
Copy link
Contributor

aeschli commented Sep 16, 2020

@mbullington Sorry for the wait, thanks for the PR!

@aeschli aeschli added this to the September 2020 milestone Sep 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants