Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propose design for packed conditional fields #192

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
205 changes: 205 additions & 0 deletions doc/design_docs/packed_conditional_fields.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,205 @@
# Design Sketch: Packed Conditional Fields

## Motivation

The [`$next` keyword](archive/next_keyword.md) was introduced to facilitate
writing Emboss definitions for packed or mostly-packed structs, while avoiding

- the potential error of unintentionally introducing a gap or overlap by
using the wrong start location,
- complexity of computing the start location of variable sized fields, and
- including "redundant" and potentially hardcoded values that must be updated
separately if the dependent values are updated.

However, one use case not covered by the existing `$next` keyword is that it
does not keep a structure tightly packed if there are conditional fields. For
instance, the structure in
[Example 1](#example-1-conditional-structure-with-next) will have a 4 byte gap
if `type_code` is `1`, even though the `metadata` field is not present.

This tends to be non-intuitive, the author of this design sketch has seen
multiple authors of emboss definitions write `$next` after conditional fields
in this manner and it always results in surprise when it does not work how they
intended, which in this example is to tightly pack `payload` to follow after
`type_code` when `metadata` is not present.

###### Example 1. Conditional structure with $next
```
struct ConditionalExample:
0 [+2] UInt type_code
if type_code == 0x01:
$next [+4] UInt:8[4] metadata
$next [+10] Uint:8[4] payload
```

## Considerations

There are a few considerations for implementing the behavior that developers
intend when they write code like
[Example 1](#example-1-conditional-structure-with-next) that are covered in
this section.

### Backwards Compatibility
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two major thoughts on this:

  1. There probably are not any (correct) uses of $next after a conditional field in the wild (I did not turn up any in an internal audit).
  2. In principle, a breaking change should break noisily. In this case, that means turning any use of $next immediately after a conditional field into an error, and either using a new keyword ($pack is fine) or having users opt into the new behavior ([next_keyword = "Pack"]). Compilation errors are annoying, but not nearly as problematic as silently changing behavior.


Updating the functionality of `$next` in this manner could be considered to
break the backwards-compatibility guarantees of Emboss. An existing Emboss
definition may rely on the current behavior of `$next` to produce an intended
layout. However, it's unlikely to be the case, as the goal of `$next` is
explicitly to facilitate writing packed structures. From that point-of-view it
may be more appropriate to consider the current behavior of `$next` to be a
bug.

However, should we need to maintain the existing behavior of `$next` we could
use an alternative symbol (specifics TBD, but possibly `$pack`) for the new
behavior and retain `$next` for backward-compatibility.

For simplicity the rest of this design sketch will use `$next` as if it has the
new behavior in examples, regardless of what decision is made about
compatibility.

### Partially Packed Structures

Another consideration is that any potential solution must work for partially
packed structures, not just fully and tightly packed structures. For instance,
in [Example 2](#example-2-conditional-partially-packed-structure), when the
conditional block is `true`, the `$next` value for `e` should be `14` and `2`
when the block is `false`. This may seem fairly straightforward, but the part
to note is that the implementation must not rely on successive `$next` fields
to provide the correct packing.

###### Example 2. Conditional partially packed structure
```
struct ConditionalPartiallyPackedExample:
0 [+2] UInt a
if a == 0x01:
$next [+2] UInt b
10 [+2] UInt c
$next [+2] UInt d
$next [+2] UInt e
```

### Future Features

This proposal also considers two potential future features, one of which is
mentioned in the [`$next` keyword design sketch](archive/next_keyword.md), and
one is mentioned in the emboss code comments.

#### Alignment Function

Another form of (potentially) partially-packed structure would be one where
fields are aligned or otherwise computed using `$next` as a value. To borrow
an example from the [`$next` keyword design sketch](archive/next_keyword.md)
with a slight modification to make it conditional, and using a hypothetical
future `$align(start_location, alignment)` feature, see
[Example 3](#example-3-conditional-alignment-packed).

In this case, `body` should start at the next 4-byte aligned boundary after
the end of `header` if the `header` indicates that the structure doesn't have
a `body_length` field, but start at the next 4-byte aligned boundary after
`body_length` if it is present.

The implementation of the new `$next` should not need to take any of this into
account specifically, but the implementation should be written in such a way
that it will "just work" in the same way that it would work currently other
than the conditional field aspect.

###### Example 3. Conditional alignment-packed
```
struct ConditionalWithAlignment:
0 [+2] UInt header_length (h)
$align($next, 4) [+h] Header header
if !header.basic_flag:
$align($next, 4) [+2] UInt body_length
$align($next, 4) [+b] Body body
$align($next, 4) [+4] UInt crc
let b = $present(body_length) ? body_length : 16
```

#### Conditional Block Improvements

Currently Emboss does not support nested conditional blocks, nor does it
support `else` or `elif` conditional blocks. That said, these are mentioned in
the Emboss code comments as potential future features and are common in
languages that offer conditionals, so this proposal considers how they may
impact the new `$next` behavior.

## Proposed Implementation

A high level description of the proposed algorithm for implementing the new
behavior in `$next` (or a new symbol chosen for backward-compatibility) is
provided below, with some error and edge condition checking omitted for
brevity:

- Similar to the existing algorithm, traverse the IR for `Structure`s
(`struct`s and `bit`s), but for each structure encountered:
- Initialize the `lookback` to an empty stack.
- Initialize `last` to `None`.
- Iterate over expressions (looking for `$next`):
- As an incidental action, for each non-virtual and non-synthetic field
encountered:
- If `last.existence_condition` is constant and
`last.existence_condition` is `true`:
- Clear `lookback` entirely.
- Push `last` onto `lookback`.
- Otherwise, if `last.existence_condition` is not the same expression
as the current `existence_condition`:
- If any fields in `lookback` have the same `existence_condition` as
the current field, pop all fields up to and including that field.
- Push `last` onto `lookback`.
- Set `last` to the current field.
- For each `$next` encountered:
- Initialize `replacement` to be `None`.
- For each reference in `lookback` as `field`, starting from the bottom
of the stack and iterating towards the top:
- If `replacement` is `None`
- Set `replacement` to `field.start_location + field.size`
- Note: This branch is equivalent to the existing implementation.
- Otherwise:
- Copy `$present(f) ? next : previous` as an expression skeleton
- Replace
- `f` with a `FieldRerference` to `field`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Rerference => Reference

- `next` with `field.start_location + field.size`.
- `previous` with `replacement`.
- Set `replacement` to this updated expression.
- Replace `$next` with `replacement`.

The purpose of using a stack for `lookback` is twofold, firstly it ensures
correct operation with chained conditional blocks one-after-another, such as in
[Example 4](#example-4-consecutive-conditional-blocks), as well as future
Comment on lines +166 to +168
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lookback stack and some of the condition checking are optimizations that, I think, make this section harder to understand.

I think the new (unoptimized) implementation of $next would just be:

  1. For each structure:
    1. Initialize next to 0.
    2. For each physical field:
      1. Replace $next in appropriate contexts with the current value of next.
      2. Update the definition of next to field_condition ? field_start + field_size : (previous_next).

It may even be better to use the unoptimized version in the desugaring stage, then let a later optimization pass strip out the unreachable branches -- I have some related code that I was working on recently that is intended to optimize the $size_in_bytes expression, which has many of the same problems.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair, you're right that is an optimization now that I think about it, and the next value can be propagated forward through non-present fields. I'll update this to make it simpler.

I still think there would be value to doing the optimization in the actual code though, I know that every C++ compiler out there would optimize away chained conditionals with constant values or repeated conditions (with only non-volatile and non-aliased components), but it wouldn't be optimized for something like a potential future Python backend, for instance, which would make accessing the nth field of a struct an O(n) operation, as it would process the conditionals for all n-1 fields before it to determine the start location, even if said location is statically known.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's probably best for some of the optimization to happen inside Emboss. I'm just thinking it would be better in a separate optimization stage, instead of trying to implement it in the desugaring stage -- separation of concerns, and it would let any similar constructs get optimized.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it, as in a separate optimization pass that resolves conditionals, then it's more general and applies to other designating like $size_in_bytes too?

You're right I like that idea a lot. I'll consider it out of scope of this proposal since it won't matter for any decent C++ compiler and we don't currently support any other backend, but I see now that we don't need to preemptively optimize in the desugaring step.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. The two specific optimizations here are:

  • If a CHOICE has a constant condition, replace it with the appropriate branch; i.e. true ? y : z => y and false ? : y : z => z.
  • For any expression x ? y : (x ? z : q), replace the inner (x ? z : q) with q. There is a larger possible optimization around propagating constraints into choice operator branches, but this would cover a lot of cases and would be fairly easy to implement.

I'll see about getting the prototype optimizer I hacked together onto GitHub for reference.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually now that I took a second to look through it I think the former is already handled by constant_value in ir_utils.

Copy link
Collaborator

@reventlov reventlov Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only partially handled by constant_value -- that works for true ? constant : variable and false ? variable : constant, but not for true ? variable : variable2 or false ? variable : variable2, so it would cover:

struct Foo:
  0     [+4]  UInt  x
  $next [+4]  UInt y
  $next [+4]  UInt z

But not:

struct Foo:
  0     [+4]    UInt      x
  $next [+4*x]  UInt:32[] y
  $next [+4]    UInt      z

support for nested conditionals. In this example, the `lookback` stack for
`crc` should be:

- `(TOP)`
- `baz_data`
- `bar_data`
- `foo_data`
- `code`
- `(BOTTOM)`

And the resulting `$next` expression for `crc` would be (using a fictional
`$end` field for brevity, equivalent to the field offset + field size):

```
($present(baz_data) ? baz_data.$end :
($present(bar_data) ? bar_data.$end :
($present(foo_data) ? foo_data.$end :
code.$end)))
```

###### Example 4. Consecutive Conditional Blocks
```
struct ConditionalWithAlignment:
0 [+1] bits:
0 [+1] Flag foo
1 [+1] Flag bar
2 [+1] Flag baz
$next [+1] UInt code
if foo:
$next [+FooData.$size_in_bytes] FooData foo_data
if bar:
$next [+BarHeader.$size_in_bytes] BarHeader bar_header
$next [+BarData.$size_in_bytes] BarData bar_data
if baz:
$next [+BazData.$size_in_bytes] BazData baz_data
$next [+2] UInt crc
```