Figure out endianness story #45

Manishearth · 2021-05-11T21:51:07Z

Right now TinyStr contains an integer that can be interpreted as a string. There are also ways of looking at TinyStr as an integer, both by looking at the integer in the native endianness, and by converting it to a little-endian representation that can be sent across machines.

The conversion methods are kinda haphazard. #43 completes the set, but really we should figure out which utilities we want to expose for viewing the TinyStr as an integer and stick to those.

Furthermore, it's unclear to me whether the tinystr!() macro is sound when cross-compiling: if you build your code on a little-endian system targeting a big-endian one, will the codegen use the little-endian representation? It's kinda tricky to investigate, and our constructors from numbers should probably be exceedingly clear about this.

Ideally we can add dedicated constructors for the ULE use case (#44)


sffc · 2021-05-11T23:12:49Z

The key insight is that the bit pattern is constant whether you are on LE or BE systems. The thing that differs is the numerical representation of those bytes. This is different from most LE/BE problems, where you want to represent the same numerical value but need to do so using different bit patterns.
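To make the distinction concrete, here is a minimal sketch using only standard-library integer conversions. The names `BYTES`, `as_le`, `as_be`, and `roundtrip_native` are made up for illustration and are not part of tinystr. The same four bytes decode to different numbers depending on the byte order chosen, while a round trip through the native-endian integer hands back the bytes unchanged on any target.

```rust
/// The four bytes stored for the ASCII string "abcd" (illustrative constant).
pub const BYTES: [u8; 4] = *b"abcd";

/// Interpret the bytes as a little-endian integer.
pub fn as_le(bytes: [u8; 4]) -> u32 {
    u32::from_le_bytes(bytes)
}

/// Interpret the same bytes as a big-endian integer.
pub fn as_be(bytes: [u8; 4]) -> u32 {
    u32::from_be_bytes(bytes)
}

/// Go bytes -> native-endian integer -> bytes.
/// The numeric value in the middle depends on the target,
/// but the bytes that come back out do not.
pub fn roundtrip_native(bytes: [u8; 4]) -> [u8; 4] {
    u32::from_ne_bytes(bytes).to_ne_bytes()
}
```

Here `as_le(BYTES)` is 0x64636261 and `as_be(BYTES)` is 0x61626364, yet `roundtrip_native(BYTES)` equals `BYTES` everywhere.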

Therefore, when interfacing with external systems (incl. serialization, ULE, etc.), the type we should be using is [u8; N].

The constructors could look like, e.g.,

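impl TinyStr4 {
    // Safety: the caller must uphold TinyStr4's invariants (at minimum, the bytes are not all zero).
    const unsafe fn new_unchecked(bytes: &[u8; 4]) -> Self {
        // from_ne_bytes takes the array by value, so copy it out of the reference.
        Self(NonZeroU32::new_unchecked(u32::from_ne_bytes(*bytes)))
    }
    // Returned by value: get() yields a temporary u32, so a reference into it
    // cannot be returned, and as_ne_bytes is not a stable API.
    const fn as_raw(&self) -> [u8; 4] {
        self.0.get().to_ne_bytes()
    }
}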

If we care about the alignment of the bytes, we could use something like https://docs.rs/aligned/0.3.4/aligned/ instead of returning a raw byte array.

To be clear: I don't think there's a reason to change the internal representation of TinyStr. I just think that using byte arrays instead of integers on interchange will make things a lot easier to reason about.
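As a rough sketch of what byte-array interchange could look like while the internal representation stays an integer: `Tiny4`, `from_raw`, and `to_raw` below are hypothetical stand-ins, not the real TinyStr4 API, and the validation is simplified (it only rejects non-ASCII and all-zero input; interior NUL bytes are not checked).

```rust
use std::num::NonZeroU32;

/// Hypothetical stand-in for TinyStr4: a non-zero integer holding up to 4 ASCII bytes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Tiny4(NonZeroU32);

impl Tiny4 {
    /// Build from raw bytes. Returns None for non-ASCII or all-zero input.
    /// (Simplified check: a real implementation would validate more.)
    pub fn from_raw(bytes: [u8; 4]) -> Option<Self> {
        if !bytes.is_ascii() {
            return None;
        }
        // Native-endian conversion keeps the in-memory byte order as given.
        NonZeroU32::new(u32::from_ne_bytes(bytes)).map(Self)
    }

    /// Return the bytes exactly as they were passed to from_raw.
    pub fn to_raw(self) -> [u8; 4] {
        self.0.get().to_ne_bytes()
    }
}
```

Because both directions use the native-endian conversions, `Tiny4::from_raw(b).map(Tiny4::to_raw)` gives back `Some(b)` for any accepted `b`, regardless of the target's endianness; the integer never leaves the type.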

sffc · 2021-05-11T23:26:16Z

Related: rust-lang/rust#76976

Manishearth · 2021-05-12T01:35:55Z

Yeah, I like this idea!

sffc · 2021-05-18T23:29:47Z

@zbraniecki, thoughts? Would you approve a PR that does what #45 (comment) proposes?

zbraniecki · 2021-06-09T05:57:54Z

I'm ok with that change, but I'd be curious what the performance impact may be.

Manishearth · 2021-06-09T15:37:16Z

I don't think there will be one; we're just changing the types used, not the runtime behavior.

sffc · 2022-01-07T21:29:43Z

From a discussion between Manish and me: we can have AsciiULE<N> and TinyStrN, where N is 1 to 16, and TinyStrN is a wrapper around AsciiULE<N> but with repr(align(...)).
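A minimal sketch of that layout idea, assuming const generics and a 4-byte alignment for the N = 4 case. The type names come from the comment above, but the field layout, derives, and methods are assumptions for illustration and not the code that landed in icu4x; validation of the ASCII invariant is omitted.

```rust
/// Unaligned byte-array form, suitable for interchange (alignment 1).
/// The public field is for illustration only; no ASCII validation is done here.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AsciiULE<const N: usize>(pub [u8; N]);

/// Aligned wrapper for the N = 4 case: same bytes, integer-like alignment.
#[repr(C, align(4))]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TinyStr4(AsciiULE<4>);

impl TinyStr4 {
    /// Wrap the unaligned form; the bytes are copied unchanged.
    pub const fn from_ule(ule: AsciiULE<4>) -> Self {
        Self(ule)
    }

    /// Borrow the unaligned form back out.
    pub const fn as_ule(&self) -> &AsciiULE<4> {
        &self.0
    }
}
```

With this shape, both types are 4 bytes wide, but only `TinyStr4` has an alignment of 4, so the wrapper can be compared or hashed like an integer while the ULE form can sit at any offset in a byte buffer.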

Manishearth · 2022-02-18T20:52:59Z

Fixed by unicode-org/icu4x#1508

Manishearth · 2022-02-18T20:53:13Z

@zbraniecki should we mark this repo as archived and point to icu4x?

Manishearth mentioned this issue May 11, 2021

Add zerovec ULE impls for TinyStr #44

Merged

sffc mentioned this issue Jul 22, 2021

Clean up endianness in TinyStr unicode-org/icu4x#881

Closed

sffc mentioned this issue Dec 31, 2021

Add icu4x-key-extract for Static Data Slicing unicode-org/icu4x#1460

Closed

Manishearth mentioned this issue Jan 14, 2022

Add tinystr-neo to experimental/ unicode-org/icu4x#1508

Merged

Manishearth closed this as completed Feb 18, 2022
