Use simpler dissection model #5

Open
wants to merge 78 commits into
base: main


immanuelhume
Contributor

@immanuelhume immanuelhume commented Aug 8, 2023

First, sorry for the big PR. But this is a major improvement (and simplification) to the code which I simply cannot pass up 😬. This PR does not introduce any breaking changes - it adds new APIs but the old ones remain usable.

Motivation

As mentioned in #4, the existing code deeply parses the syntax tree of the struct/enum and tries to form a full picture of each field. This results in a) "stringy" and brittle parsing, and b) an unwieldy data model. This patch flips the approach completely. We no longer care what the type of each field is; we simply assume it is dissectable via the new Dissect trait. If the field's type does not implement Dissect, we get a compile error, as expected.

The new code, using well-defined traits, is much easier to reason about. We also shed ~2500 lines of old data modelling code, which was rather hard to grok, plus probably ~1000 lines of other helper code. Although this PR adds ~3000 lines, I expect the total code size to fall slightly once we remove the old stuff.

New traits

Four traits are introduced in the wsdf crate.

  1. Dissect - a type which can be registered and dissected
  2. Primitive - an extension of Dissect for basic types (integers, bytes)
  3. Subdissect - a type which can be given to subdissectors (bytes)
  4. SubdissectKey - a type which can be used to search for a subdissector and invoke it

The wsdf crate also provides impls of these traits for some types (integers, bytes). In the future, if we want to be fully generic, we could also implement them for wrappers like Box<T>, since from a protocol description perspective a Box<T> is just a T.
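
To make the Box<T> point concrete, here is a minimal sketch of a forwarding impl. It uses a toy stand-in for Dissect: the single `dissect` method and its signature are assumptions made for illustration, not the real wsdf trait.

```rust
// Toy stand-in for the `Dissect` trait. The real trait has more methods
// and different signatures; this shape is assumed for illustration only.
trait Dissect {
    /// Reads a value from the front of `buf`.
    /// Returns a description and the number of bytes consumed.
    fn dissect(buf: &[u8]) -> Option<(String, usize)>;
}

impl Dissect for u16 {
    fn dissect(buf: &[u8]) -> Option<(String, usize)> {
        // `get` returns None if fewer than two bytes are available.
        let b = buf.get(..2)?;
        let value = u16::from_be_bytes([b[0], b[1]]);
        Some((format!("u16 = {}", value), 2))
    }
}

// On the wire a Box<T> looks exactly like a T, so the impl just forwards.
impl<T: Dissect> Dissect for Box<T> {
    fn dissect(buf: &[u8]) -> Option<(String, usize)> {
        T::dissect(buf)
    }
}
```

With this in place, `<Box<u16> as Dissect>::dissect(..)` behaves exactly like the `u16` impl, so a field could switch between `T` and `Box<T>` without changing how it is dissected.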

For a sense of how this works, see this (simplified) example.

#[derive(Dissect)]
struct Data {
  src: u16,
  data: [u8; 32],
}

will expand to (with many details omitted)

impl Dissect for Data {
  fn add_to_tree() {
    // recursively dissect each field, as they should implement Dissect
    <u16 as Dissect>::add_to_tree();
    <[u8; 32] as Dissect>::add_to_tree();
  }
  // ...other methods
}

This is the main idea behind this update. Everything else is just stuff to make it happen.
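
For readers who want something they can compile, the recursion can be sketched with a hand-written impl in place of the derive. Everything here is a simplified assumption: the `add_to_tree` signature, the `Vec<String>` standing in for the Wireshark protocol tree, and the shortened `[u8; 4]` field are all hypothetical.

```rust
/// Toy stand-in for `Dissect`. Appends lines to `tree` (a stand-in for the
/// protocol tree) and returns the number of bytes consumed from `buf`.
trait Dissect {
    fn add_to_tree(name: &str, buf: &[u8], tree: &mut Vec<String>) -> usize;
}

impl Dissect for u16 {
    fn add_to_tree(name: &str, buf: &[u8], tree: &mut Vec<String>) -> usize {
        let value = u16::from_be_bytes([buf[0], buf[1]]);
        tree.push(format!("{}: {}", name, value));
        2
    }
}

impl<const N: usize> Dissect for [u8; N] {
    fn add_to_tree(name: &str, buf: &[u8], tree: &mut Vec<String>) -> usize {
        let hex: Vec<String> = buf[..N].iter().map(|b| format!("{:02x}", b)).collect();
        tree.push(format!("{}: {}", name, hex.join(" ")));
        N
    }
}

#[allow(dead_code)]
struct Data {
    src: u16,
    data: [u8; 4],
}

// Hand-written version of what a derive could generate: the struct never
// inspects its field types, it only requires that each one implements Dissect.
impl Dissect for Data {
    fn add_to_tree(name: &str, buf: &[u8], tree: &mut Vec<String>) -> usize {
        tree.push(name.to_string());
        let mut offset = 0;
        offset += <u16 as Dissect>::add_to_tree("src", &buf[offset..], tree);
        offset += <[u8; 4] as Dissect>::add_to_tree("data", &buf[offset..], tree);
        offset
    }
}
```

If a field's type had no Dissect impl, the corresponding `<T as Dissect>::add_to_tree` call would fail to compile, which is the compile error mentioned in the motivation.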

Other notable changes

  • A new get_variant attribute to replace dispatch_field for decoding enums, following a tap-like interface and returning a static string instead of an integer. See the DNS dissector for an example. (Closes #7, "Feature requests: a (more) dynamic enum variant decoder".)
  • The DNS, UDP, and MoldUDP examples are updated to the new API and verified to be working
  • In model.rs, new StructInnards and Enum types to handle codegen for structs and enums. These ~1000 lines can replace ~2500 lines of old data model logic.
  • Protocols are now registered via the protocol!() macro, and multiple protocols can be placed in one dylib (e.g. protocol!(Udp, UdpLite, Tcp) would register three protocols in one dylib).
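
The get_variant idea of selecting an enum variant by name can be sketched roughly as follows. This is not the actual attribute interface: the function signature, the `Payload` enum, and the `decode` helper are hypothetical, and the real decoder follows the tap-like interface rather than taking the flags directly.

```rust
#[derive(Debug, PartialEq)]
enum Payload {
    Query(Vec<u8>),
    Response(Vec<u8>),
}

// Hypothetical variant decoder: returns the *name* of the variant to use.
// Here it inspects the QR bit (top bit) of the DNS header flags.
fn get_variant(flags: u16) -> &'static str {
    if flags & 0x8000 == 0 {
        "Query"
    } else {
        "Response"
    }
}

// Roughly what generated code could do with the returned name.
// An unknown name yields None here instead of a panic.
fn decode(flags: u16, body: &[u8]) -> Option<Payload> {
    match get_variant(flags) {
        "Query" => Some(Payload::Query(body.to_vec())),
        "Response" => Some(Payload::Response(body.to_vec())),
        _ => None,
    }
}
```

Returning a name rather than an integer index means the decoder does not silently break when variants are reordered.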

Tradeoffs

A consequence of not having a "full picture of each field" is that for vector types, we can no longer validate that users have provided a length field. The following struct would not have compiled previously, as the size of the bytes is unknown even at runtime. With the new trait-based approach, we lose the ability to check this at compile time, since the derive code no longer knows that the field's type is Vec<u8>.

struct Bytes(Vec<u8>);

Under the new API, this would be a runtime panic (this can be refined in the future to raise a flag in Wireshark instead of panicking). However, given that the new code is vastly simpler than the old one, this tradeoff is okay in my view.
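
A rough sketch of why the check moves to runtime, again with a toy trait. The `size` method and the `len_hint` parameter are assumptions for illustration, and the sketch returns a `Result` where the current code would panic.

```rust
// Toy trait: each type reports how many bytes it occupies, optionally
// using a length supplied by the user (e.g. from a length field).
trait Dissect {
    fn size(len_hint: Option<usize>) -> Result<usize, &'static str>;
}

impl Dissect for u16 {
    fn size(_len_hint: Option<usize>) -> Result<usize, &'static str> {
        Ok(2) // fixed size, no hint needed
    }
}

impl Dissect for Vec<u8> {
    fn size(len_hint: Option<usize>) -> Result<usize, &'static str> {
        // The missing length can only be noticed here, at runtime.
        len_hint.ok_or("Vec<u8> field has no length information")
    }
}

#[allow(dead_code)]
struct Bytes(Vec<u8>);

// The wrapper only knows that its field implements Dissect. It cannot tell
// at compile time that the field is a Vec<u8> that needs a length.
impl Dissect for Bytes {
    fn size(len_hint: Option<usize>) -> Result<usize, &'static str> {
        <Vec<u8> as Dissect>::size(len_hint)
    }
}
```

Here `<Bytes as Dissect>::size(None)` compiles fine and only fails when called, which mirrors the tradeoff described above.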

Next steps

Once this is merged, we can proceed to mark the old API as deprecated and make a release (v0.2.0?). If there are no major issues, we can proceed with deleting the old code so that everything is cleaner and easier to maintain.

Some methods truly only concern numeric and bytes types. We'll house them in a separate trait.
This will be used to mark a Vec<u8>, [u8; _], &[u8] as proper bytes types, rather than lists of u8s.
We also check with wireshark's proto_registrar
Many things are marked with todo!() for now.
Just to make clippy happy
@immanuelhume immanuelhume marked this pull request as ready for review August 8, 2023 13:50
@immanuelhume
Contributor Author

@oswal-dheeraj how do you feel about the general approach?

Successfully merging this pull request may close these issues.

Feature requests: a (more) dynamic enum variant decoder