Improve IpAddr/SocketAddr serialization by avoiding Display #2001

saethlin · 2021-03-16T14:24:54Z

I'm opening a draft PR because I'd appreciate feedback, but this is definitely not done.

In some profiling, I've found that serializing Ipv4Addr to JSON is disproportionately slow. This (currently crude) patch is ~4x faster. I'm no expert on integer formatting, but I think I can improve the implementation by running cargo expand on itoa then pasting in the chunks I need. As far as I can tell, going through itoa's public interface is not faster than this code because LLVM keeps the itoa::Buffer in memory. Edit: done. The algorithm from itoa is faster.

Is there any concern with not using std::fmt::Display? As far as I can tell from experimenting, there's basically no opportunity to make the standard library formatting code faster because it uses so much dynamic dispatch.
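For readers following along, here is a minimal sketch of what formatting an Ipv4Addr without std::fmt::Display can look like. It is not the code from this PR: the names format_u8_simple and format_ipv4 are hypothetical, the digits are computed with plain division rather than itoa's lookup table, and it keeps the safe UTF-8 validation.

```rust
use std::net::Ipv4Addr;

/// Write the decimal digits of `n` at the start of `out`.
/// Returns the number of bytes written (1 to 3).
/// Hypothetical helper for illustration only.
fn format_u8_simple(n: u8, out: &mut [u8]) -> usize {
    if n >= 100 {
        out[0] = b'0' + n / 100;
        out[1] = b'0' + (n / 10) % 10;
        out[2] = b'0' + n % 10;
        3
    } else if n >= 10 {
        out[0] = b'0' + n / 10;
        out[1] = b'0' + n % 10;
        2
    } else {
        out[0] = b'0' + n;
        1
    }
}

/// Format `addr` into a caller-provided stack buffer and return the
/// written prefix as a string slice. 15 bytes is the length of the
/// longest possible address, "255.255.255.255".
fn format_ipv4<'a>(addr: &Ipv4Addr, buf: &'a mut [u8; 15]) -> &'a str {
    let mut len = 0;
    for (i, octet) in addr.octets().iter().enumerate() {
        if i > 0 {
            buf[len] = b'.';
            len += 1;
        }
        len += format_u8_simple(*octet, &mut buf[len..]);
    }
    // Safe variant: validate the bytes as UTF-8. Only ASCII digits and
    // dots were written, so this check always succeeds.
    std::str::from_utf8(&buf[..len]).expect("only ASCII was written")
}
```

No formatter, trait object, or heap allocation is involved, which is the property the question above is about. The result is intended to match what Display produces for the same address.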

dtolnay

Would you be able to share your benchmark showing 4x faster serialization to JSON?

saethlin · 2021-03-21T22:58:54Z

🤦 Yes

use std::net::Ipv4Addr;
fn main() {
    let mut buf = [0u8; 15];
    for _ in 0..10_000_000 {
        ser_ip(&mut buf, &Ipv4Addr::new(127, 10, 0, 1));
    }
}

#[inline(never)]
fn ser_ip(buf: &mut [u8], thing: &Ipv4Addr) {
    serde_json::to_writer(buf, thing).unwrap();
}

$ cargo build --release
$ perf stat -r10 ./target/release/scratch

I'm aware this is not a typical way to do benchmarking in the Rust community, but I find it is much more stable than using cargo bench or criterion, probably because it averages over a few process startups.

dtolnay · 2021-03-21T23:31:50Z

Would it be worth comparing against an unsafe version that removes the zero initialization of the whole buffer and the utf8 validation? My expectation from benchmarking itoa in the past is that those unnecessary things account for 30-40% of this code still. We can also consider removing all those bounds checks in format_u8 but that tends not to be measurable in a microbenchmark, since they're 100% correctly predicted, and the performance implication is only in polluting the branch predictor.

saethlin · 2021-03-22T22:16:25Z

> Would it be worth comparing against an unsafe version that removes the zero initialization of the whole buffer and the utf8 validation? My expectation from benchmarking itoa in the past is that those unnecessary things account for 30-40% of this code still.

In this case, it appears the utf8 validation is ~30% of the benchmark's runtime. Adding #[inline] to the function in the standard library causes it to be inlined but not cleaned up very much, but even so this takes off half the overhead. Still seems like a silly thing to be spending cycles and code size on.

As far as I can tell, leaving the buffer uninitialized doesn't help at all. But initializing it with b'.' instead of 0u8 and skipping over writing each b'.' knocks a few percent off the runtime. The annotated assembly from perf suggests the initialization instructions are ~0.10% of total runtime. Maybe things look different in a real program but they can't be that much different. Maybe this will matter more for Ipv6 addresses.
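A rough sketch of the b'.' idea described above, under stated assumptions: the names write_octet and with_ipv4_str are hypothetical, the digit routine uses plain division instead of the itoa algorithm, and the closure stands in for whatever consumes the string (for example a serializer's serialize_str). The buffer starts out full of dots, so the loop only steps over each separator instead of storing it, and the UTF-8 check is skipped.

```rust
use std::net::Ipv4Addr;

/// Length of the longest textual IPv4 address, "255.255.255.255".
const MAX_LEN: usize = 15;

/// Write the decimal digits of `n` at the start of `out` and return
/// how many bytes were written. Hypothetical helper for illustration.
fn write_octet(n: u8, out: &mut [u8]) -> usize {
    if n >= 100 {
        out[0] = b'0' + n / 100;
        out[1] = b'0' + (n / 10) % 10;
        out[2] = b'0' + n % 10;
        3
    } else if n >= 10 {
        out[0] = b'0' + n / 10;
        out[1] = b'0' + n % 10;
        2
    } else {
        out[0] = b'0' + n;
        1
    }
}

/// Format `addr` into a dot-filled stack buffer and hand the resulting
/// string slice to `f`.
fn with_ipv4_str<R>(addr: &Ipv4Addr, f: impl FnOnce(&str) -> R) -> R {
    // Every byte starts as a separator, so separators never need a store.
    let mut buf = [b'.'; MAX_LEN];
    let octets = addr.octets();
    let mut written = write_octet(octets[0], &mut buf);
    for &octet in &octets[1..] {
        // buf[written] already holds b'.'; the +1 skips over it.
        written += 1 + write_octet(octet, &mut buf[written + 1..]);
    }
    // SAFETY: the buffer contains only ASCII digits and b'.', which is
    // always valid UTF-8.
    let s = unsafe { std::str::from_utf8_unchecked(&buf[..written]) };
    f(s)
}
```

Calling with_ipv4_str(&addr, |s| s.len()) or passing a closure that forwards to a serializer avoids any heap allocation; the offset arithmetic is the part most prone to off-by-one mistakes.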

> We can also consider removing all those bounds checks in format_u8

As far as I can tell, there are no bounds checks emitted. It looks like LLVM correctly figures out that the buffer is large enough.

After these improvements, the big hog in the benchmark is calling serialize_str which is ~57% of runtime, and the time is spent in the JSON escaping loop and the memset calls that LLVM emits to write the string delimiters (which I've reported here). It would be nifty to skip over the escaping, but it's unclear to me if that's possible.
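To illustrate why skipping the escaping looks tempting: per RFC 8259, a JSON string must escape the quotation mark, the backslash, and control characters below U+0020, and a formatted IPv4 address contains none of these. The sketch below is only an illustration of that rule with hypothetical names (needs_json_escape, first_escape); it is not how serde_json implements its escaping loop.

```rust
/// Returns true if `byte` must be escaped inside a JSON string:
/// the quotation mark, the backslash, or a control character below 0x20.
fn needs_json_escape(byte: u8) -> bool {
    byte == b'"' || byte == b'\\' || byte < 0x20
}

/// Scan `s` and return the index of the first byte that needs escaping,
/// or None if the string can be copied into the output verbatim.
/// A generic string serializer has to do a scan like this for every
/// input, even when the caller knows the answer is always None.
fn first_escape(s: &str) -> Option<usize> {
    s.bytes().position(needs_json_escape)
}
```

For any string made only of ASCII digits and dots, first_escape returns None, so the scan is pure overhead for this type.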

dtolnay · 2021-03-22T22:22:49Z

Thanks -- LGTM once tests pass.

saethlin · 2021-03-22T22:48:55Z

I've made the tests pass; do we want to give the same treatment to SocketAddrV4 or the Ipv6 types? Or save that for another PR?

dtolnay

Thanks, looks good.

Yes I am open to optimizing SocketAddrV4 and the V6 types too, if it matters to anyone.

Faster Ipv4 serialization prototype

857a805

saethlin force-pushed the optimize-ipaddr branch from 6639830 to 857a805 on March 16, 2021 14:41

use the algorithm from itoa

ba8c1d6

dtolnay reviewed Mar 21, 2021

skip UTF8 checking and initialize with b'.'

8bb07b0

Fix off-by-one mistake, explain the offset

4114e90

dtolnay approved these changes Mar 22, 2021

dtolnay merged commit 9be4c96 into serde-rs:master Mar 22, 2021

lrazovic mentioned this pull request Sep 10, 2021

Add more const fn functions svartalf/rust-macaddr#4

Open

jonasbb mentioned this pull request Sep 30, 2021

remove unsafe from serde and add #![forbid(unsafe_code)] #2096

Closed

dtolnay mentioned this pull request Sep 6, 2024

Remove 'useless' unsafe? #2819

Closed
