Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialization performance regression #38021

Closed
dtolnay opened this issue Nov 26, 2016 · 10 comments
Closed

Serialization performance regression #38021

dtolnay opened this issue Nov 26, 2016 · 10 comments
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@dtolnay
Copy link
Member

dtolnay commented Nov 26, 2016

Between rustc 1.14.0-nightly (5665bdf3e 2016-11-02) and rustc 1.14.0-nightly (cae6ab1c4 2016-11-05) all of my serialization benchmarks showed a ~50% regression. It affected all the libraries (Serde, json-rust, rustc-serialize). Deserialization was not affected. So far I have narrowed it down to the following code:

#![feature(test)]

extern crate test;

#[bench]
fn bench(b: &mut test::Bencher) {
    const N: usize = 65536;
    let mut buf = Vec::with_capacity(N);
    b.iter(|| {
        buf.clear();
        for _ in 0..N {
            w(&mut buf, b'[').unwrap();
        }
    });
}

#[inline(never)]
fn w(buf: &mut Vec<u8>, b: u8) -> Result<(), ::std::io::Error> {
    buf.extend_from_slice(&[b]);
    Ok(())
}

On my linux laptop this regressed from 160μs to 220μs, on my mac laptop 215μs to 300μs.

cc @bluss who seems to know about extend_from_slice.

@bluss
Copy link
Member

bluss commented Nov 26, 2016

The TrustedLen PR is in that range, so it's quite possibly that. It changed the implementation of extend_from_slice.

git log --oneline --first-parent 5665bdf..cae6ab1
cae6ab1 Auto merge of #37470 - arthurprs:sip-smaller, r=alexcrichton
0883996 Auto merge of #37427 - nnethercote:opt-IchHasher, r=michaelwoerister
e96b9d2 Auto merge of #37422 - bluss:wrapping-offset, r=alexcrichton
713a360 Auto merge of #37356 - cristicbz:wrapsum, r=alexcrichton
81601cd Auto merge of #37306 - bluss:trusted-len, r=alexcrichton
ccfc38f Auto merge of #37167 - nikomatsakis:jroesch-issue-18937, r=pnkfelix
d2bc30b Auto merge of #37037 - Mark-Simulacrum:stack-error, r=alexcrichton
49c36bf Auto merge of #36306 - nagisa:mir-local-cleanup, r=eddyb
ac919fc Auto merge of #37541 - nikomatsakis:issue-37291, r=brson
f9f45c6 Auto merge of #36993 - nnethercote:obligation, r=nikomatsakis

@bluss bluss added A-libs I-slow Issue: Problems and improvements with respect to performance of generated code. labels Nov 26, 2016
@bluss
Copy link
Member

bluss commented Nov 26, 2016

I don't see much difference, the new code should be almost equivalent. One uses a counted loop and the other a pointer = pointer.offset(1) loop, though.

@bluss
Copy link
Member

bluss commented Nov 26, 2016

Ok just picking a recent rust commit to work off of (127a83d).

Command line:

cargo run --bin json-benchmark  --release --no-default-features --features="lib-json-rust performance file-canada"

With that compiler I get the benchmark result

     Running `target/release/json-benchmark`
                                DOM                STRUCT
======= json-rust ======== parse|stringify === parse|stringify ===
data/canada.json          10.5ms     4.3ms

If I put in the old implementation for extend_from_slice, I get this result

                                DOM                STRUCT
======= json-rust ======== parse|stringify === parse|stringify ===
data/canada.json          10.4ms     3.4ms

Do you think that could be it? That's the kind of difference? The test case in the original issue is not affected by the change, though.

@dtolnay
Copy link
Member Author

dtolnay commented Nov 26, 2016

That could be it. For me:

$ rustup run nightly-2016-11-03 cargo run --bin json-benchmark --release --no-default-features --features="lib-json-rust performance file-canada" -- -n 256
                                DOM                STRUCT
======= json-rust ======== parse|stringify === parse|stringify ===
data/canada.json           9.1ms     3.1ms
$ rustup run nightly-2016-11-06 cargo run --bin json-benchmark --release --no-default-features --features="lib-json-rust performance file-canada" -- -n 256
                                DOM                STRUCT
======= json-rust ======== parse|stringify === parse|stringify ===
data/canada.json           9.1ms     3.9ms

Can you try this case? For me this showed a 70% regression from 995μs to 1685μs.

#![feature(test)]

extern crate serde;
extern crate serde_json;
extern crate test;

#[derive(Clone, Default)]
struct Area(u32);

impl serde::Serialize for Area {
    fn serialize<S>(&self, serializer: &mut S) -> Result<(), S::Error>
        where S: serde::Serializer
    {
        let mut state = try!(serializer.serialize_struct("Area", 1));
        try!(serializer.serialize_struct_elt(&mut state, "blockIds", &[(); 0]));
        serializer.serialize_struct_end(state)
    }
}

#[bench]
fn bench(b: &mut test::Bencher) {
    let areas = vec![Area::default(); 65536];
    let len = serde_json::to_string(&areas).unwrap().len();
    let mut buf = Vec::with_capacity(len);
    b.iter(|| {
        buf.clear();
        serde_json::to_writer(&mut buf, &areas).unwrap();
    });
}

@bluss
Copy link
Member

bluss commented Nov 26, 2016

oh wow.

before test bench ... bench: 2,358,534 ns/iter (+/- 51,417)
after test bench ... bench: 1,641,021 ns/iter (+/- 33,637)

@bluss
Copy link
Member

bluss commented Nov 26, 2016

well something is regressing to a byte-by-byte loop there, and it's definitely .extend(slice.iter().cloned()), I thought SetLenOnDrop had worked around those (we used to have those for extend_from_slice too).

@bluss
Copy link
Member

bluss commented Nov 26, 2016

Ok it's not the same thing.

@bluss
Copy link
Member

bluss commented Nov 27, 2016

We've verified that putting back extend_from_slice's old implementation would work well, but I don't have a reduced test case. Would be ideal if .extend() could be fixed instead of the duplication.

@bluss
Copy link
Member

bluss commented Nov 27, 2016

This is enough to reproduce I think. The .extend() loop is byte-by-byte (albeit unrolled in this case). https://gist.github.com/bluss/61c32daae1bd31d9d7605983df3ace5f#file-extend_from_slice-ll-L146 Rust code is below the llvm ir.

@bluss
Copy link
Member

bluss commented Dec 5, 2016

The codegen issue is #38175, if this issue is resolved by other means (by reverting for example).

bors added a commit that referenced this issue Dec 8, 2016
Specialization for Extend<&T> for vec

Specialize to use copy_from_slice when extending a Vec with &[T] where
T: Copy.

This specialization results in `.clone()` not being called in `extend_from_slice` and `extend` when the element is `Copy`.

Fixes #38021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants