-
-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check if using simd-json
to parse rustdoc produces any speedup
#885
Comments
super quick first hacky test:
|
Assuming compile times don't get drastically worse, and that we wouldn't have some platform-related limitations by using |
I will like to tackle this issue with @Licenser, as to ensure its not being worked on by multiple people:) |
Hi, so along with @Licenser , we created a repo that extracted the key functionalities to allow easy benchmarking for others. Just a My results for last run (didnt include ouput of all of them since it would be too long) Gnuplot not found, using plotters backend
async_impl_future_equivalence/SIMD/parse
time: [1.2377 ms 1.2453 ms 1.2545 ms]
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) high mild
7 (7.00%) high severe
Benchmarking async_impl_future_equivalence/SERDE/parse: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.9s, enable flat sampling, or reduce sample count to 50.
async_impl_future_equivalence/SERDE/parse
time: [1.3678 ms 1.3811 ms 1.3938 ms]
Found 11 outliers among 100 measurements (11.00%)
9 (9.00%) high mild
2 (2.00%) high severe
auto_trait_impl_removed/SIMD/parse
time: [1.4148 ms 1.4170 ms 1.4192 ms]
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
auto_trait_impl_removed/SERDE/parse
time: [1.5048 ms 1.5095 ms 1.5150 ms]
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) high mild
6 (6.00%) high severe
Benchmarking broken_rustdoc/SIMD/parse: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.4s, enable flat sampling, or reduce sample count to 50.
broken_rustdoc/SIMD/parse
time: [1.1311 ms 1.1615 ms 1.1874 ms]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
broken_rustdoc/SERDE/parse
time: [1.3656 ms 1.3718 ms 1.3805 ms]
Found 6 outliers among 100 measurements (6.00%)
6 (6.00%) high severe
constructible_struct_adds_field/SIMD/parse
time: [1.2849 ms 1.2906 ms 1.2973 ms]
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) high mild
8 (8.00%) high severe
constructible_struct_adds_field/SERDE/parse
time: [1.8033 ms 1.8195 ms 1.8392 ms]
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe
Benchmarking constructible_struct_adds_private_field/SIMD/parse: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.2s, enable flat sampling, or reduce sample count to 50.
constructible_struct_adds_private_field/SIMD/parse
time: [1.3073 ms 1.3102 ms 1.3140 ms]
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) low mild
2 (2.00%) high mild
4 (4.00%) high severe
constructible_struct_adds_private_field/SERDE/parse
time: [1.8016 ms 1.8033 ms 1.8050 ms]
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) low mild
1 (1.00%) high mild
Sometimes results vary slightly more but maybe also @Licenser can share some of his results as well. I thought it would also be nice for others to try to share what they get, and then we can (depending on results) start following the next steps like maybe creating CI runs for it. As far as the crate you wanted to run
|
Am I reading the data correctly, that the EC2 crate parses half a second slower with SIMD than without? 👀 |
Well apparently so, but maybe something with my system as with Licenser's system it was not the case if i recall, so maybe he can chime in later. Thats why I wanted others to also test on that |
Yes, the context here is important. Those tests are run on a laptop, where CPUs tend to thermal throttle massively when more power-hungry instructions (aka SIMD) are run. That's part of why we set up a reproduction repository to get a better feeling about how the people running the tool will be affected; @dmatos2012 can share some numbers from a server too and I'll run them on my own machine later today as well |
Here the results from a Digital Oceans (2 core, 4GB memory) server results-server.txt |
The test crates aren't really the best representative for this kind of performance work. It's good to have a few of them for "just in case" reasons, but they are generally a few orders of magnitude too small to be a realistic stand-in for most real-world Rust crates. The crates where we do want to demonstrate consistent speedups are things like An additional axis I'd love to explore is how GitHub Actions runners fare with SIMD, on Linux / macOS / Windows. GitHub Actions is where a ton of our users will run |
ja copying the 500MB file to the server takes time 😂 , it's a WIP |
but FWIW results from my local machine are already in for the aws crate
I'll upload a full run of all tests when they're dune |
Thanks for being super thorough on this, I appreciate it! I'm a bit surprised the SIMD benefit is only about ~12%. Is there something we might be doing that is pessimizing for SIMD? Could we adapt our approach in some way to make it better for SIMD decoding? |
I will get later the results from the bigger crates you mentioned later, to see what the results are. It's good to know where the end goal is rather than extracting couple of We intially thought to start with the smaller ones as those easier to run and debug. |
A initial one would be using simd-json-derive instead of serde derive but I think that'd mean having a paralell crate for the rustdoc-types which could be challenging but there is a good bit of gains to be had usually - or even going untyped and using the I could see some optimisations to be made to simd-json too but I'm not sure how feasable they are a big thing is that simd-json copies the entire data in a seperate buffer and keeps an additional string buffer (that's lots of size) on a server the go-to way would be to re-use the buffers which armortizes that to a degree for a one of tool that's not an option, but it might be possible to avoid some of that allocation and copying (but there are benchmarks to be had to validate that as we'd trade branches for memory consumption) that'd allos allow us to use borrowd strings and for big crates that would avoid a ton of copying then there is the option to in-line the utf8 checking better but we'd need to validate first if we catually copy the data for the utf8 checks or if the compiler is smart enough to re-use registers (it often is) |
If we were able to show a concrete experiment that demonstrates significant benefits across a range of crates, I'd be happy to bring this up with the rustdoc maintainers as an option for upstreaming into the |
If there is a chance, it's worth doing the research :) if @dmatos2012 we can go through that experiment together. |
Yes, for sure @Licenser, if we do like with this initial task and I have kinda some guidance looks like something interesting to experiment indeed :) |
Many of the rustdoc JSON files we parse are large — from a few MB to ~500MB in size. In the largest cases, we spend ~5s parsing JSON per
cargo-semver-checks
run.Speeding up JSON parsing by switching from
serde_json
tosimd-json
might be an easy win. Let's check!simd-json
aws-sdk-ec2
cd
into the crate's directory, and use theRUSTDOCFLAGS="-Z unstable-options --document-private-items --document-hidden-items --output-format=json --cap-lints=allow" cargo doc --lib --no-deps
shell command which will putaws-sdk-ec2.json
into thetarget/doc
directory of the workspacedivan
orcriterion
to ensure the perf delta is real and not a measurement artifactcargo test
loads dozens of small JSON files, so you can check thoseThis is a good first issue for someone who is already familiar with Rust and wants to start contributing to this project. It might not be a good first issue for folks new to Rust.
The text was updated successfully, but these errors were encountered: