-
Notifications
You must be signed in to change notification settings - Fork 647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proof of concept: new JSON parser #933
base: main
Are you sure you want to change the base?
Conversation
Wow Thanks for publishing that! I haven't even looked at the code yet, but I started looking at simdjson, and it appears to only support 64-bit CPUs https://github.com/simdjson/simdjson/blob/master/doc/implementation-selection.md . Since lots of ARM32 devices are running Hermes, it would be interesting to know whether it is supported and what the performance is there. EDIT: I really enjoyed reading your blog post! We will start the internal discussion right away and will try to keep as much of it as possible public here in the comments. |
Thanks! This seems incorrect, I believe simdjson will use EDIT: Okay, I think |
Some additional information...
|
@lemire Hermes is intended primarily for mobile devices, many of which are still running underpowered ARM32 CPUs. Do you have an intuition of the performance there, or do you know of existing usages in a similar environment? |
Hey @radex, thanks for this PR! I have just got around to reading your blogpost and slowly starting to poke around at some of the code. That was a great write-up, and the team is super appreciative of all the effort you put in to it! There's a lot to respond to, but here are some of my initial thoughts. V8's JSON.parse is still >2x faster than Hermes without pulling in an external library like simdjson. I think, as you note in your blogpost, there is probably a lot more room for optimization aside from the direct scanning. A very rich area to explore is the concept of speculation, as you say. V8 implements an iterative scanner, and interestingly they do not build their VM data structures in-place as opposed to Hermes. This allows them to do some things like allocating VM objects and arrays with the proper sizes. Here is a relevant code pointer if you are interested. Just to get a taste of some of the things V8 does, here is one cool optimization that I actually have a prototype diff for in Hermes. Basically, they check the 'next' transition on the current hidden class, and compare that to the string they just finished parsing. If it matches, no need to perform a hash computation+lookup into the identifier table. I 100% agree with you on a fast-path API for setting properties on a JSON object. I also noticed you even added some missing If we did go the route of pulling in simdjson/some version of it, I wonder if the absolute best-performance would mean including some VM concepts inside of the parsing itself. For example: constructing the Symbol hash incrementally, which is an optimization I landed a couple weeks ago. Maybe we wouldn't have to write another separate pass over the data structure that simdjson produces. I will continue reviewing the PR and start the discussion with the team of the best way of incorporating these ideas into Hermes. Thanks again, it has been very fun so far reading the blog and code :) |
@fbmal7 Thanks for insight on V8 parser, very interesting! |
Hey @radex, sorry we have not been able to get back sooner. We have still been discussing this PR as a team. I have been looking through your code more recently and have some more thoughts to share. I think I would lean in on the opinion I shared in my initial reaction: simdjson / simdutf are not the best ways to improve our JSON parser. Potential perf gains aside, this is also a dependency the team would have to take on which would come with binary size increase and also maintenance overhead. All in all, I don't think we have the capacity right now to be able to import these libraries. However, I do think your PR shows scattered inefficiencies across the parser and also how we deal with strings, specifically with regards to how we treat UTF16/UTF8 strings. I still really like your ideas on speculation and I absolutely think we should use fast paths for setting values on objects and arrays. I would be very interested to see how much faster the parser would be if all the non-simdjson changes were kept. |
@fbmal7 Thanks for the answer! I suspected that this would be your decision, and if I were in Hermes team's shoes, I'd probably also be reluctant to take on simdjson as a dependency. Regardless, I am also curious to see how the other perf improvements (and some ideas not fully fleshed out) will stack up in the future :) |
@radex have you considered publishing your work as an independent library (I have thought of a great name: fastjson.npm)? My feeling is that simdjson and simdutf are amazing libraries that provide a real performance improvement, but they are targeted more towards server and desktop use cases. So making them the default for Hermes is not necessarily a good tradeoff, but it would be nice if people still had the option to choose it. |
@tmikov I have considered it, but I don't think I can achieve much of a perf win from simdjson via JSI, access to Hermes-internal API seems essential |
@radex good point. I assumed that it would still use the internal APIs, but I realize now that it is impossible in practice - they aren't exported. Hmm. I need to think more about cases like this. |
@radex some further thoughts that may be relevant in this comment: #684 (comment) |
Hi, I've been playing around for fun with the JSON parser, and made a new proof of concept implementation, which in my measurements is ~2x faster to the current implementation. Some more detail on it here: https://radex.io/react-native/json-parse/ , but the most important points are:
The question to Hermes team: is there any interest in possibly adding simdjson and simdutf as dependencies to Hermes? I can imagine that given the project's main goals, this might be a hard sell, as they are quite large. They added ~180KB (~4%) to bin/hermes in release mode, but TBH I haven't messed with build/linker settings to see if they're really that large, or if unused code isn't properly pruned.
Please treat this PR as just a proof of concept, the diff is full of junk and duplicated (rather than changed) code, for my own convenience while experimenting. If there is any interest in using any of this code, this would be submitted in multiple clean PRs.