Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Macro construction without syn #556

Closed
Plecra opened this issue Nov 2, 2021 · 22 comments · Fixed by #566
Closed

Macro construction without syn #556

Plecra opened this issue Nov 2, 2021 · 22 comments · Fixed by #566

Comments

@Plecra
Copy link

Plecra commented Nov 2, 2021

Motivation
Macros are notoriously slow at compile time to include, it'd be nice to remove the tradeoff for users

Solution
The uuid macro can be implemented with const fn (Playground). This makes the decision easier for users - just move the call to a const context.

If we wish, we can also have the uuid macro to force the parsing to be const.

@QnnOkabayashi
Copy link
Contributor

I think this comes down to preference. On one hand, a proc-macro allows us to have extremely fine grain diagnostics, which I think are important. On the other, 95% of the time these string literals will be correct and so diagnostics aren't important. That being said, keep in mind most projects using uuid are probably using proc-macros from other crates (serde, tracing...), so the added time is marginal.

I'm in support of const fns once const panic is stabilized but until then, I think the tradeoff for a marginally increased compile time is worth it for the substantially better error diagnostics.

@Nugine
Copy link
Contributor

Nugine commented Nov 3, 2021

I have optimized my const parser implementation. It's now 2x faster than Uuid::parse_str at runtime. In exchange, error details are dropped.
I can not easily port this algorithm to uuid crate now as it will break a lot of things.

https://github.com/Nugine/const-str/blob/80037ecd47e2e2d77d1f68881d9f343a5aa2f2fb/const-str/src/__ctfe/uuid.rs

benchmark violin plot

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use uuid::Uuid;

#[inline(never)]
fn parse_v1(s: &str) -> Uuid {
    Uuid::parse_str(s).unwrap()
}

#[inline(never)]
fn parse_v2(s: &str) -> Uuid {
    const_str::__ctfe::parse_uuid(s)
}

pub fn criterion_benchmark(c: &mut Criterion) {
    let mut group = c.benchmark_group("uuid_parser");

    let inputs = [
        ("simple", "67e5504410b1426f9247bb680e5fe0c8"),
        ("hyphenated", "67e55044-10b1-426f-9247-bb680e5fe0c8"),
        ("guid", "{67e55044-10b1-426f-9247-bb680e5fe0c8}"),
        ("urn", "urn:uuid:67e55044-10b1-426f-9247-bb680e5fe0c8"),
    ];

    for (tag, input) in inputs {
        group.bench_with_input(BenchmarkId::new("uuid-latest", tag), input, |b, s| {
            b.iter(|| parse_v1(s))
        });
        group.bench_with_input(BenchmarkId::new("const_str-latest", tag), input, |b, s| {
            b.iter(|| parse_v2(s))
        });
    }
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
GitHub
compile-time string operations. Contribute to Nugine/const-str development by creating an account on GitHub.

@Nugine
Copy link
Contributor

Nugine commented Nov 3, 2021

I'm in support of const fns once const panic is stabilized but until then, I think the tradeoff for a marginally increased compile time is worth it for the substantially better error diagnostics.

I think we don't have to wait for const panic because the parser should never panic.

fuzz_target!(|data: &[u8]| {
if let Ok(uuid) = str::from_utf8(data) {
// Ensure the parser doesn't panic
let _ = Uuid::parse_str(uuid);
}
});

@KodrAus
Copy link
Member

KodrAus commented Nov 3, 2021

I can not easily port this algorithm to uuid crate now as it will break a lot of things.

What kinds of things do you imagine would break @Nugine?

@QnnOkabayashi
Copy link
Contributor

I'm in support of const fns once const panic is stabilized but until then, I think the tradeoff for a marginally increased compile time is worth it for the substantially better error diagnostics.

I think we don't have to wait for const panic because the parser should never panic.

fuzz_target!(|data: &[u8]| {
if let Ok(uuid) = str::from_utf8(data) {
// Ensure the parser doesn't panic
let _ = Uuid::parse_str(uuid);
}
});

The fuzzer never panics because the parser returns a Result (if I'm reading that correctly). The whole point is that the macro returns the Uuid type, or fails to compile, which involves panicking in a const environment.

@Nugine
Copy link
Contributor

Nugine commented Nov 3, 2021

I can not easily port this algorithm to uuid crate now as it will break a lot of things.

What kinds of things do you imagine would break @Nugine?

Error definition, parser tests, ui tests and merge uuid_macro into uuid. I'm not sure whether a big change is suitable.

@KodrAus
Copy link
Member

KodrAus commented Nov 3, 2021

I can see a const Uuid::parse_str as complimentary to a uuid! macro. There are benefits to the global scoping of macros (I still use #[macro_use] extern crate x; readily myself). Since uuid! would use the same parser implementation we wouldn't end up with multiple implementations to maintain either.

The main loss holding us back at the moment would be nice diagnostics in the error and their reporting to end-users, but I don't think making parse_str const would necessarily have to be considered a big change. Eventually it really should be const, along with the formatting APIs too.

@Nugine
Copy link
Contributor

Nugine commented Nov 3, 2021

I'm in support of const fns once const panic is stabilized but until then, I think the tradeoff for a marginally increased compile time is worth it for the substantially better error diagnostics.

I think we don't have to wait for const panic because the parser should never panic.

fuzz_target!(|data: &[u8]| {
if let Ok(uuid) = str::from_utf8(data) {
// Ensure the parser doesn't panic
let _ = Uuid::parse_str(uuid);
}
});

The fuzzer never panics because the parser returns a Result (if I'm reading that correctly). The whole point is that the macro returns the Uuid type, or fails to compile, which involves panicking in a const environment.

You are right. I missed the point. But there is a way to emit a compile error without feature(const_panic). That means we don't have to bump MSRV to 1.57.

https://docs.rs/static_assertions/1.1.0/src/static_assertions/const_assert.rs.html#52-57

Source to the Rust file `src/const_assert.rs`.

@QnnOkabayashi
Copy link
Contributor

What do the diagnostics look like? I'm thinking we use this implementation in a const parser to return a Result<[u8; 16], Error>, where the proc macro can then use the Error diagnostic info. This would give us great runtime parsing, and maintain good diagnostics. Thoughts?

@Nugine
Copy link
Contributor

Nugine commented Nov 3, 2021

The const parser provides ambiguous information. Good diagnostics require a second parsing to generate more detailed error reports.

#[derive(Debug, PartialEq, Eq)]
pub struct Error(ErrorKind);

#[derive(Debug, PartialEq, Eq)]
enum ErrorKind {
    InvalidLength,
    InvalidCharacter,
}

const fn invalid_length() -> Error {
    Error(ErrorKind::InvalidLength)
}

const fn invalid_character() -> Error {
    Error(ErrorKind::InvalidCharacter)
}

@KodrAus
Copy link
Member

KodrAus commented Nov 3, 2021

Running a second pass on error, even just in the macro and not in Uuid::parse_str would probably be a reasonable compromise if it were still possible to share some of the implementation.

Is it possible to write const parsers without needing any additional dependencies?

@Nugine
Copy link
Contributor

Nugine commented Nov 3, 2021

Running a second pass on error, even just in the macro and not in Uuid::parse_str would probably be a reasonable compromise if it were still possible to share some of the implementation.

Is it possible to write const parsers without needing any additional dependencies?

Yes. The only necessary dependency is rustc. We need to test MSRV in CI (currently 1.46).

@KodrAus
Copy link
Member

KodrAus commented Nov 3, 2021

If the MSRV needs to bump too that is something we can do. As a guide I tend to look at whatever the current version of rustc available as a package for the current LTS of Ubuntu is, which is 1.51.0, released in March. I think we could safely consider bumping to it if need be.

@Plecra
Copy link
Author

Plecra commented Nov 3, 2021

The nice thing about this is that the feature flag can be kept, but doesn't need to change the API - Users can use uuid!("...") whenever they want compile-time construction, and when the macros feature is enabled, we can transparently use a proc_macro implementation with better errors.

(I'd also say that removing syn will make the biggest impact on compile times, and won't make parsing such a simple macro any harder)

@QnnOkabayashi
Copy link
Contributor

QnnOkabayashi commented Nov 3, 2021

I really like the idea of having a second parser that can go back over and provide fine-grain diagnostics. I think it's very reasonable for developers to have use cases where they know it won't fail at runtime and speed is more important, as well as cases where if something goes wrong, the developer doesn't particularly care what because they have their own error handling.

I'm currently working on making some very minor modifications to the const_str optimized version that will provide us with hints (like index of invalid byte) so that if we opt in for more detailed diagnostics, that function can be a bit faster.

Of course, the core parsing function will be entirely const, and we can have another const fn that can do the unwrap const equivalent and panic at compile time if we want.

@Nugine
Copy link
Contributor

Nugine commented Nov 4, 2021

I try to find the fastest way to parse a uuid text. So I write another implementation using AVX2.
The parsing takes less than 10ns, which is 11x~13x faster than Uuid::parse_str.

violin plot

#[inline]
pub fn parse_uuid(s: &str) -> Uuid {
    match try_parse_uuid(s) {
        Ok(b) => Uuid::from_bytes(b),
        Err(_) => panic!(),
    }
}

#[inline]
pub fn try_parse_uuid(s: &str) -> Result<[u8; 16], Error> {
    let mut s = s.as_bytes();
    let n = s.len();
    if n == 32 {
        return unsafe { parse_uuid_simple(s) };
    }
    match n {
        // hyphenated UUID
        36 => {}
        // URN prefixed UUID
        45 => match s.strip_prefix(b"urn:uuid:") {
            Some(val) => s = val,
            None => return Err(invalid_length()),
        },
        // Microsoft GUID
        38 => match s {
            [b'{', xs @ .., b'}'] => s = xs,
            _ => return Err(invalid_length()),
        },
        _ => return Err(invalid_length()),
    }
    unsafe { parse_uuid_hyphenated(s) }
}

unsafe fn parse_uuid_simple(s: &[u8]) -> Result<[u8; 16], Error> {
    parse(_mm256_loadu_si256(s.as_ptr().cast()))
}

unsafe fn parse_uuid_hyphenated(s: &[u8]) -> Result<[u8; 16], Error> {
    // b'-' is 0x2d
    let x: u32 = mem::transmute([
        *s.get_unchecked(8),
        *s.get_unchecked(13),
        *s.get_unchecked(18),
        *s.get_unchecked(23),
    ]);
    if x != 0x2d2d2d2d {
        return Err(invalid_character());
    }

    #[repr(C, align(32))]
    struct Buf {
        p1: u64,
        p2: u32,
        p3: u32,
        p4: u32,
        p5: u32,
        p6: u64,
    }

    let base: *const u8 = s.as_ptr();
    let buf = Buf {
        p1: base.cast::<u64>().read_unaligned(),
        p2: base.add(9).cast::<u32>().read_unaligned(),
        p3: base.add(14).cast::<u32>().read_unaligned(),
        p4: base.add(19).cast::<u32>().read_unaligned(),
        p5: base.add(24).cast::<u32>().read_unaligned(),
        p6: base.add(28).cast::<u64>().read_unaligned(),
    };
    parse(_mm256_load_si256(&buf as *const Buf as *const _))
}

unsafe fn parse(a: __m256i) -> Result<[u8; 16], Error> {
    if !hex_check(a) {
        return Err(invalid_character());
    }
    let ans = hex_decode(a);
    Ok(mem::transmute(ans))
}

/// Rewritten from <https://github.com/nervosnetwork/faster-hex/blob/0114ee6bcf8a02718bc19e769f9de076ce119726/src/decode.rs#L108-L143>
unsafe fn hex_check(a: __m256i) -> bool {
    let m_0 = _mm256_set1_epi8((b'0' - 1) as i8);
    let ge_0 = _mm256_cmpgt_epi8(a, m_0); // a > b'0' - 1
    let m_9 = _mm256_set1_epi8((b'9' + 1) as i8);
    let le_9 = _mm256_cmpgt_epi8(m_9, a); // b'9' + 1 > a
    let in_0_9 = _mm256_and_si256(ge_0, le_9);

    let m_ua = _mm256_set1_epi8((b'A' - 1) as i8);
    let ge_ua = _mm256_cmpgt_epi8(a, m_ua); // a > b'A - 1
    let m_uf = _mm256_set1_epi8((b'F' + 1) as i8);
    let le_uf = _mm256_cmpgt_epi8(m_uf, a); // b'F' + 1 > a
    let in_ua_uf = _mm256_and_si256(ge_ua, le_uf);

    let m_la = _mm256_set1_epi8((b'a' - 1) as i8);
    let ge_la = _mm256_cmpgt_epi8(a, m_la); // a > b'a' - 1
    let m_lf = _mm256_set1_epi8((b'f' + 1) as i8);
    let le_lf = _mm256_cmpgt_epi8(m_lf, a); // b'f' + 1 > a
    let in_la_lf = _mm256_and_si256(ge_la, le_lf);

    let ans = _mm256_or_si256(_mm256_or_si256(in_la_lf, in_ua_uf), in_0_9);
    _mm256_movemask_epi8(ans) as u32 == 0xffffffff
}

unsafe fn hex_decode(a: __m256i) -> __m128i {
    // epi16
    // a1 = c & 0x0f0f;
    // a2 = a1 >> 4;
    // a3 = a1 + a2;
    // a4 = c & 0x4040;
    // a5 = a4 >> 6;
    // a6 = a4 >> 10;
    // a7 = a5 + a6;
    // a8 = a7 * 9;
    // a9 = a3 + a8;
    // a10 = a9 & 0x00ff

    #[repr(align(32))]
    struct Align32I8x32([i8; 32]);
    const SHUFFLE_MASK: Align32I8x32 = Align32I8x32([
        1, 0, 3, 2, 5, 4, 7, 6, //
        9, 8, 11, 10, 13, 12, 15, 14, //
        17, 16, 19, 18, 21, 20, 23, 22, //
        25, 24, 27, 26, 29, 28, 31, 30, //
    ]);

    let m = _mm256_load_si256(SHUFFLE_MASK.0.as_ptr().cast());
    let a = _mm256_shuffle_epi8(a, m); // big endian -> little endian

    let a1 = _mm256_and_si256(a, _mm256_set1_epi16(0x0f0f));
    let a2 = _mm256_srai_epi16::<4>(a1);
    let a3 = _mm256_add_epi16(a1, a2);

    let a4 = _mm256_and_si256(a, _mm256_set1_epi16(0x4040));
    let a5 = _mm256_srai_epi16::<6>(a4);
    let a6 = _mm256_srai_epi16::<10>(a4);
    let a7 = _mm256_add_epi16(a5, a6);
    let a8 = _mm256_mullo_epi16(a7, _mm256_set1_epi16(0x9));

    let a9 = _mm256_add_epi16(a3, a8);
    let a10 = _mm256_and_si256(a9, _mm256_set1_epi16(0x00ff));

    avx2_mm256_cvtepi16_epi8(a10)
}

#[inline(always)]
unsafe fn avx2_mm256_cvtepi16_epi8(a: __m256i) -> __m128i {
    let a1 = _mm256_castsi256_si128(a);
    let a2 = _mm256_extractf128_si256::<1>(a);
    _mm_packus_epi16(a1, a2)
}

@KodrAus
Copy link
Member

KodrAus commented Nov 4, 2021

Wow that’s really awesome @Nugine! 😁 I wonder if it’s possible to write with std::simd, even if it turns out a little slower. I’ll have to spend a bit more time studying it, maybe by trying to port to Neon or WASM (neither of which support 256bit wide vectors AFAIK).

@Nugine
Copy link
Contributor

Nugine commented Nov 4, 2021

Wow that’s really awesome @Nugine! 😁 I wonder if it’s possible to write with std::simd, even if it turns out a little slower. I’ll have to spend a bit more time studying it, maybe by trying to port to Neon or WASM (neither of which support 256bit wide vectors AFAIK).

I believe it's possible to write with std::simd. But that is unstable now. The faster-hex crate inspires me.

@QnnOkabayashi
Copy link
Contributor

QnnOkabayashi commented Nov 4, 2021

That's awesome! Is this architecture specific though? Also, is there any way we can make this const friendly?

I also did some more thinking, and now I propose that we have some ultra-fast parsing function that returns something like Result<[u8; 16], ParseError<'a> where the err variant is a wrapper struct containing the original string, which is always guaranteed to contain an invalid uuid string. Another function can accept a ParseError<'a> and obtain some basic diagnostics for the error. Finally, a third function contained purely in the proc macro would accept a ParseError<'a> and provide super detailed diagnostics, like exactly which bytes are invalid so we can have nice error highlighting.

@Nugine
Copy link
Contributor

Nugine commented Nov 4, 2021

That's awesome! Is this architecture specific though? Also, is there any way we can make this const friendly?

AVX2 is widely used but not all CPUs support it.
The feature can be detected at compile-time or runtime.

Unfortunately, SIMD operations are not available at const context now.

@KodrAus
Copy link
Member

KodrAus commented Nov 4, 2021

We did have some designs for fallbacks in std::simd that should be naturally const friendly, but am not sure where it all landed yet.

@Nugine
Copy link
Contributor

Nugine commented Nov 7, 2021

https://github.com/Nugine/uuid-simd
Probably the fastest uuid parser in the world 🚀

  1. Add a feature simd. When simd and std are enabled, use runtime cpu feature detection to dispatch the parser. When simd is enabled, use conditional compilation to dispatch the parser. Otherwise, use the fallback parser (it's const).
  2. The uuid! macro can use the proc-macro parser when the arg is literal and use the const parser when the arg is expr.

Drawbacks

  1. It's too complex. Maybe we don't need so many parsers.
  2. Uuid::parse_str can not be const if it internally calls simd functions.
GitHub
My SIMD playground. Contribute to Nugine/uuid-simd development by creating an account on GitHub.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants