Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add native uint24 support #1932

Open
awelzel opened this issue Nov 29, 2024 · 7 comments
Open

Add native uint24 support #1932

awelzel opened this issue Nov 29, 2024 · 7 comments
Labels
Feature Request Request for new functionality Language

Comments

@awelzel
Copy link
Contributor

awelzel commented Nov 29, 2024

Working on DHCPv6, there's a 24bit transaction ID:
https://datatracker.ietf.org/doc/html/rfc8415#section-8

My current "hack" for parsing is as follows:

    transaction_id: uint8[3] &convert=uint32((uint32($$[0]) << 16) + (uint32($$[1]) << 8) + uint32($$[2]));

MySQL also has a 24bit packet length field:
https://dev.mysql.com/doc/dev/mysql-server/8.4.3/page_protocol_basic_packets.html#sect_protocol_basic_packets_packet

Zeek's Spicy SSL parser apparently also has a need for uint24:
https://github.com/zeek/zeek/blob/master/src/analyzer/protocol/ssl/spicy/SSL.spicy#L968

type uint24 = unit {
    a: bytes &size=3;
} &convert=$$.a.to_uint(spicy::ByteOrder::Big);

This version likey has quite some overhead due to "bytes" and the to_uint() call.

The SMB protocol's message length is also using 24bits:
https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-smb2/1dfacde4-b5c7-4494-8a14-a09d3ab4cc83

This seems sufficient examples in popular protocols to motivate native uint24 support in Spicy. Thoughts?

@evantypanski
Copy link
Contributor

What about arbitrary bit-width integers? I've been using a lot of Zig and I've loved that feature, specifically for cases like this (for example I used u56 here)

it's not absolutely necessary, but I could see that being really nice for Spicy too.

I have no idea how I'd implement that, though :)

@awelzel
Copy link
Contributor Author

awelzel commented Nov 30, 2024

What about arbitrary bit-width integers?

:-)

For arbitrary width integer parsing, bitfields might be a reasonable and existing mechanism, actually.

As a clarification, this isn't about supporting 24bit or arbitrary-width integer arithmetic (seems esoteric), but having a short-cut to parse 3 bytes into an integer.

@rsmmr
Copy link
Member

rsmmr commented Dec 2, 2024

For reference, we've also had a request for uint128 (but no ticket yet I believe).

That said, I don't think I want to do arbitrary bit-width integers. That'd be lot of work (in particular making sure they interact nicely with current std sizes, and performance doesn't suffer), with not that much actual demand afaics.

but having a short-cut to parse 3 bytes into an integer.

I would indeed be reluctant to introduce a full uint24 as well, but could see a short-cut for unpacking it. Do you have an idea for syntax? It doesn't fit nicely with the current unpack unfortunately as that expects a known target type (see https://docs.zeek.org/projects/spicy/en/latest/programming/language/packing.html#packing).

@awelzel
Copy link
Contributor Author

awelzel commented Dec 2, 2024

Do you have an idea for syntax?

Grml, I guess my hope was to allow just x: uint24 as a field specification, but then promoting it to uint32 behind the scenes. But yeah, that's not "clean" if there's no actual uint24 type.

Maybe some lifted bitfield'ish notation instead?

   x: uint24;  # for comparison
   x: uint32:24`  # parse 24bits (number needs to be a multiple of 8 and less-or-equal of given type) with the current endian setting into a uint32
   x: uint32::24`
   x: uint32[24]`
   x: uint32{24}`
   x: uint32 &width=24  # Benjamin mentioned something like that and it might just be the best fit outside of having uint24.

@bbannier
Copy link
Member

bbannier commented Dec 2, 2024

Do you have an idea for syntax?

Maybe some lifted bitfield'ish notation instead?

The "bitfield'ish notation" looks like a new thing to me, would maybe an optional &width attribute which takes the number of bits (bytes?) for parsing of integers integrate better, e.g.,

: uint8 &width=8;
# equivalent to
: uint8;

or for the original issue

transaction_id: uint32 &width=24;

I think this should also integrate nicely with &byte-order.

EDIT:

Actually we already have &size (which takes number of bytes). Currently we require that an integer consumes all bytes specified in that attribute for its "natural width", but we might be able to relax that, e.g.,

: uint32 &size=3; # ParseError: expecting 4 bytes for unpacking value (3 bytes available)

@rsmmr
Copy link
Member

rsmmr commented Dec 2, 2024

: uint32 &size=3; # ParseError: expecting 4 bytes for unpacking value (3 bytes available)

I like that, that looks most natural to me. The size couldn't be more than sizeof(type), but anything smaller would be fine. Well, except that I'm not sure how byte order would be applied in general ...

@awelzel
Copy link
Contributor Author

awelzel commented Dec 2, 2024

I really like, &size, too!

Well, except that I'm not sure how byte order would be applied in general ...

Wouldn't filling the int from left to right and right-shift by (sizeof(type)-&size)*8 bits before after byte-order-flipping be reasonable?

@rsmmr rsmmr added Feature Request Request for new functionality Language labels Dec 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature Request Request for new functionality Language
Projects
None yet
Development

No branches or pull requests

4 participants