Skip to content

Conversation

@h-a-n-a
Copy link
Contributor

@h-a-n-a h-a-n-a commented Aug 15, 2024

Background:

When I was migrating PathBuf to Utf8PathBuf, etc, I found out some regression in our benchmarks. Then I found out as_str is actually not cost-free as in the older version of rustc there's no way to get the underlying bytes out of an OsStr until 1.74.0.

In this PR, with the help of OsStr::as_encoded_bytes was stabilized in 1.74.0, We can perform a cost-free conversion from &OsStr to &str with constraint of it's underlying bytes are UTF-8 encoded.

Benchmark:

I did an ad-hoc benchmark with the following code and turned out the time cost is a constant now.

code
use std::ffi::OsStr;
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};

fn bench_path(c: &mut Criterion) {
    let mut group = c.benchmark_group("match UTF-8 validation check");
    for i in [10, 100, 1000, 10000] {
        let p = "i".repeat(i);
        group.bench_with_input(BenchmarkId::new("osstr to_str", i), &p, |b, i| {
            b.iter(|| {
                let a = OsStr::new(black_box(i));
                let _ = unsafe { black_box(a).to_str().unwrap_unchecked() };
            })
        });

        group.bench_with_input(BenchmarkId::new("osstr as_encoded_bytes", i), &p, |b, i| {
            b.iter(|| {
                let a = OsStr::new(black_box(i));
                let _ = unsafe { std::str::from_utf8_unchecked(black_box(a).as_encoded_bytes()) };
            })
        });
    }
}
criterion_group!(benches, bench_path);
criterion_main!(benches);

Result:

// String length of 10
osstr to_str/10 
                        time:   [5.9769 ns 5.9913 ns 6.0060 ns]
osstr as_encoded_bytes/10 
                        time:   [554.90 ps 558.32 ps 562.19 ps]

// String length of 100
osstr to_str/100
                        time:   [6.6113 ns 6.6250 ns 6.6404 ns]
osstr as_encoded_bytes/100
                        time:   [553.18 ps 557.33 ps 561.68 ps]

// String length of 1000
osstr to_str/1000
                        time:   [26.990 ns 27.033 ns 27.086 ns]
osstr as_encoded_bytes/1000
                        time:   [553.66 ps 560.67 ps 570.42 ps]

// String length of 10000
osstr to_str/10000
                        time:   [310.17 ns 310.77 ns 311.32 ns]
osstr as_encoded_bytes/10000
                        time:   [550.98 ps 555.16 ps 559.53 ps]

///
/// * On Unix, a path is absolute if it starts with the root, so
/// `is_absolute` and [`has_root`] are equivalent.
/// `is_absolute` and [`has_root`] are equivalent.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sunshowers
Copy link
Collaborator

sunshowers commented Aug 15, 2024

Thank you for the fantastic contribution! I'd also love to get your benchmark into the repo -- would you be willing to update this PR with it, otherwise is it okay if I pull it in? Using criterion as you've done is great.

There is some history here, which is that we previously did pointer casting from OsStr to str. But it was privately flagged to me that there's no repr(transparent) relationship between the two so that wasn't sound. as_encoded_bytes solves this, though.

@h-a-n-a
Copy link
Contributor Author

h-a-n-a commented Aug 15, 2024

would you be willing to update this PR with it

I would love to! Would you also like to integrate codspeed into the repo? If the answer is a yes, I can add a feature "codspeed" to enable the bench case running with codspeed on CI (This requires additional CODSPEED_TOKEN to get it running). Otherwise, I can just add the bench case.

Pre-requisites for codspeed running in GitHub Actions: https://docs.codspeed.io/ci/github-actions

@sunshowers sunshowers merged commit 44dd11c into camino-rs:main Aug 15, 2024
@sunshowers
Copy link
Collaborator

Thanks again! Will get a release out shortly.

@sunshowers
Copy link
Collaborator

This is now out in camino 1.1.8.

@h-a-n-a
Copy link
Contributor Author

h-a-n-a commented Aug 16, 2024

Thanks for the help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants