Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proof of concept: new JSON parser #933

Open
wants to merge 54 commits into
base: main
Choose a base branch
from
Open

Conversation

radex
Copy link
Contributor

@radex radex commented Mar 4, 2023

Hi, I've been playing around for fun with the JSON parser, and made a new proof of concept implementation, which in my measurements is ~2x faster to the current implementation. Some more detail on it here: https://radex.io/react-native/json-parse/ , but the most important points are:

  • I've used simdjson as the underlying JSON parser (note: simdjson is already used successfully in React Native land by WatermelonDB)
  • I've replaced Unicode transcoding functions with simdutf (note: Node and Bun use simdutf for Unicode transcoding)
  • JSON.parse-specific fast paths for setting object properties and array items (not nicely extracted into an JSObject API in this PR, just turned some of its private methods to public so I could use them in the PoC), also SymbolID fast path

The question to Hermes team: is there any interest in possibly adding simdjson and simdutf as dependencies to Hermes? I can imagine that given the project's main goals, this might be a hard sell, as they are quite large. They added ~180KB (~4%) to bin/hermes in release mode, but TBH I haven't messed with build/linker settings to see if they're really that large, or if unused code isn't properly pruned.

Please treat this PR as just a proof of concept, the diff is full of junk and duplicated (rather than changed) code, for my own convenience while experimenting. If there is any interest in using any of this code, this would be submitted in multiple clean PRs.

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Mar 4, 2023
@radex radex mentioned this pull request Mar 4, 2023
1 task
@tmikov
Copy link
Contributor

tmikov commented Mar 4, 2023

Wow Thanks for publishing that! I haven't even looked at the code yet, but I started looking at simdjson, and it appears to only support 64-bit CPUs https://github.com/simdjson/simdjson/blob/master/doc/implementation-selection.md .
I am not sure why, but it states (perhaps incorrectly?) that even its fallback implementation requires a 64-bit CPU.

Since lots of ARM32 devices are running Hermes, it would be interesting to know whether it is supported and what the performance is there.

EDIT: I really enjoyed reading your blog post! We will start the internal discussion right away and will try to keep as much of it as possible public here in the comments.

@radex
Copy link
Contributor Author

radex commented Mar 4, 2023

Thanks! This seems incorrect, I believe simdjson will use generic implementation fine on 32-bit https://github.com/simdjson/simdjson/blob/ec0b48b7722112e4fa005e557aa0531756c1718a/include/simdjson/portability.h#L89-L90 but I must admit that I haven't checked this, and I haven't benchmarked on Android.

EDIT: Okay, I think fallback and generic are two different things. fallback is a very simplified multi-arch SIMD implementation, and generic is a scalar implementation

@lemire
Copy link

lemire commented Mar 6, 2023

Some additional information...

  1. The simdutf library used in this project is part of bun and node.js.
  2. The simdjson runs 'everywhere': we even test on big-endian mainframes. IBM engineers contributed a POWER kernel. Intel engineers contributed an AVX-512 kernel. Evidently, the kernels matching recent commodity hardware are more performant.

@tmikov
Copy link
Contributor

tmikov commented Mar 6, 2023

@lemire Hermes is intended primarily for mobile devices, many of which are still running underpowered ARM32 CPUs. Do you have an intuition of the performance there, or do you know of existing usages in a similar environment?

@lemire
Copy link

lemire commented Mar 6, 2023

@tmikov We have decent fallback performance in simdjson thanks to @jkeiser’s great engineering. Of course, our purpose is to milk the performance of higher performance 64-bit commodity cores.

@fbmal7
Copy link
Contributor

fbmal7 commented Mar 6, 2023

Hey @radex, thanks for this PR! I have just got around to reading your blogpost and slowly starting to poke around at some of the code. That was a great write-up, and the team is super appreciative of all the effort you put in to it!

There's a lot to respond to, but here are some of my initial thoughts.

V8's JSON.parse is still >2x faster than Hermes without pulling in an external library like simdjson. I think, as you note in your blogpost, there is probably a lot more room for optimization aside from the direct scanning. A very rich area to explore is the concept of speculation, as you say. V8 implements an iterative scanner, and interestingly they do not build their VM data structures in-place as opposed to Hermes. This allows them to do some things like allocating VM objects and arrays with the proper sizes. Here is a relevant code pointer if you are interested. Just to get a taste of some of the things V8 does, here is one cool optimization that I actually have a prototype diff for in Hermes. Basically, they check the 'next' transition on the current hidden class, and compare that to the string they just finished parsing. If it matches, no need to perform a hash computation+lookup into the identifier table.

I 100% agree with you on a fast-path API for setting properties on a JSON object. I also noticed you even added some missing LLVM_UNLIKELYs in our existing code, thanks for that!

If we did go the route of pulling in simdjson/some version of it, I wonder if the absolute best-performance would mean including some VM concepts inside of the parsing itself. For example: constructing the Symbol hash incrementally, which is an optimization I landed a couple weeks ago. Maybe we wouldn't have to write another separate pass over the data structure that simdjson produces.

I will continue reviewing the PR and start the discussion with the team of the best way of incorporating these ideas into Hermes. Thanks again, it has been very fun so far reading the blog and code :)

@radex
Copy link
Contributor Author

radex commented Mar 7, 2023

@fbmal7 Thanks for insight on V8 parser, very interesting!

@fbmal7
Copy link
Contributor

fbmal7 commented Jun 2, 2023

Hey @radex, sorry we have not been able to get back sooner. We have still been discussing this PR as a team. I have been looking through your code more recently and have some more thoughts to share. I think I would lean in on the opinion I shared in my initial reaction: simdjson / simdutf are not the best ways to improve our JSON parser. Potential perf gains aside, this is also a dependency the team would have to take on which would come with binary size increase and also maintenance overhead. All in all, I don't think we have the capacity right now to be able to import these libraries.

However, I do think your PR shows scattered inefficiencies across the parser and also how we deal with strings, specifically with regards to how we treat UTF16/UTF8 strings. I still really like your ideas on speculation and I absolutely think we should use fast paths for setting values on objects and arrays. I would be very interested to see how much faster the parser would be if all the non-simdjson changes were kept.

@radex
Copy link
Contributor Author

radex commented Jun 5, 2023

@fbmal7 Thanks for the answer! I suspected that this would be your decision, and if I were in Hermes team's shoes, I'd probably also be reluctant to take on simdjson as a dependency.

Regardless, I am also curious to see how the other perf improvements (and some ideas not fully fleshed out) will stack up in the future :)

@tmikov
Copy link
Contributor

tmikov commented Jun 5, 2023

@radex have you considered publishing your work as an independent library (I have thought of a great name: fastjson.npm)?

My feeling is that simdjson and simdutf are amazing libraries that provide a real performance improvement, but they are targeted more towards server and desktop use cases. So making them the default for Hermes is not necessarily a good tradeoff, but it would be nice if people still had the option to choose it.

@radex
Copy link
Contributor Author

radex commented Jun 5, 2023

@tmikov I have considered it, but I don't think I can achieve much of a perf win from simdjson via JSI, access to Hermes-internal API seems essential

@tmikov
Copy link
Contributor

tmikov commented Jun 10, 2023

@radex good point. I assumed that it would still use the internal APIs, but I realize now that it is impossible in practice - they aren't exported. Hmm. I need to think more about cases like this.

@tmikov
Copy link
Contributor

tmikov commented Jun 30, 2023

@radex some further thoughts that may be relevant in this comment: #684 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed Do not delete this pull request or issue due to inactivity.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants