Nail down the semantics for `min`/`max` and `min_precise`/ `max_precise`

Different architectures have different semantics for floating-point minimum and maximum, particularly around signed zero and NaN. We currently provide the `min`/`max` operations (which are assumed to map to the "fastest" operations on the target platform), and `precise_min`/`precise_max` (which our tests currently say should behave like ARM's [FPMinNum](https://developer.arm.com/documentation/ddi0596/2021-03/Shared-Pseudocode/Shared-Functions?lang=en#shared.functions.float.fpminnum.FPMinNum)/FPMaxNum).

There are a variety of minimum/maximum semantics to choose from, not all of which are available on all platforms:

## IEEE 754-2019

The IEEE 754-2019 standard defines two sets of min/max operations: `minimum`/`maximum` and `minimumNumber`/`maximumNumber`.

The first set, `minimum`/`maximum`, returns NaN if either operand is NaN. The latter, `minimumNumber`/`maximumNumber`, returns NaN only if both operands are NaN. If only one operand is NaN, it returns the other operand.

Both operations treat negative zero as less than zero.

`minimumNumber`/`maximumNumber` seem like the most "correct" operations, but they have the downside of not being implemented on all platforms.

## x86

x86 provides the `minps`/`maxps`/`minpd`/`maxpd` instructions. As far as I can tell, they map to the following pseudocode:

```rust
fn minps(a: f32, b: f32) -> f32 {
  if a < b { a } else { b }
}

fn maxps(a: f32, b: f32) -> f32 {
  if a > b { a } else { b }
}
```

This implies the following semantics:
- If either operand is NaN, it returns the second operand *b* (since the comparison will always be false).
- If both operands are zero, it returns the second operand *b* regardless of either operand's sign (since negative and positive zero compare as equal).
- Otherwise, it returns the smallest operand as expected.

## WebAssembly

WebAssembly provides the `fmin`/`fmax` and `fpmin`/`fpmax` abstract operations.

The first set,  `fmin`/`fmax`, implement IEEE 754-2019's `minimum` and `maximum` operations exactly. That is, they return NaN if either operand is NaN, and treat negative zero as less than zero.

The second set, `fpmin`/`fpmax`, are similar to Intel's instructions except with the operands flipped:
```rust
fn fpmin(a: f32, b: f32) -> f32 {
  if b < a { b } else { a }
}

fn fpmax(a: f32, b: f32) -> f32 {
  if b > a { b } else { a }
}
```

That is, if either operand is NaN or both operands are zero, they return the *first* operand. https://github.com/linebender/fearless_simd/pull/23 updated our WebAssembly `min_precise`/`max_precise` implementations to flip the operand order and match Intel's behavior.

## AArch64 w/ NEON

NEON actually implements both sets of IEEE 754-2019 operations. In [the list of abstract operations](https://developer.arm.com/documentation/ddi0596/2021-03/Shared-Pseudocode/Shared-Functions), there are `FPMin`/`FPMax` and `FPMinNum`/`FPMaxNum`. The former implements IEEE 754's `minimum`/`maximum` and the latter implements `minimumNumber`/`maximumNumber`.

## Rust

Rust exposes min/max operations that choose the non-NaN operand if possible, but are non-deterministic if both inputs are zero. For instance, [the `f32::min` documentation](https://doc.rust-lang.org/std/primitive.f32.html#method.min) states:

> In particular, if the inputs compare equal (such as for the case of `+0.0` and `-0.0`), either input may be returned non-deterministically.

[In practice, on x86 this compiles down to a `minps` plus a `cmpunordps` to select the non-NaN operand. On AArch64, it compiles down to an `fminnm` (the `FPMinNum` abstract operation described above).](https://godbolt.org/z/vcxj115rz)

## fearless_simd?

Now we need to decide which semantics to expose.

Ideally, we would just expose IEEE 754-2019's operations. That's what the current tests assume we do, at least. Unfortunately, neither x86 nor WebAssembly implement `minimumNumber`/`maximumNumber` semantics. In fact, x86 doesn't even implement `minimum`/`maximum` semantics!

As #23 notes, there is value in min/max operations that can "massage" a NaN into some other value. For example, if you're keeping a running tally of "lowest/highest value seen so far", it's helpful to be able to ignore NaNs.

We *could* implement the following operations:

- "Logical" minimum/maximum. These behave like `if a < b { a } else { b }`/`if a > b { a } else { b }`. The asymmetry seems a bit ugly, but most min/max use cases have some inherent asymmetry anyway (the "running tally", for instance, or the Vello use case that prompted #23). These are optimal on x86, but compile to a compare/select on AArch64.

- "Precise" minimum/maximum. These are the IEEE 754 `minimumNumber`/`maximumNumber` operations. These are optimal on AArch64, but will require quite a bit of polyfilling on x86, especially with regards to signed zero. These will also be difficult in the fallback implementation, since Rust doesn't expose the operations we need. [WebAssembly doesn't expose these either.](https://github.com/WebAssembly/design/issues/1548)

- "Precise with NaN" minimum/maximum. These are the IEEE 754 `minimum`/`maximum` operations. These are optimal on AArch64, but will require quite a bit of polyfilling on x86, especially with regards to signed zero. These will also be difficult in the fallback implementation, since Rust doesn't expose the operations we need.

- "Rust-y" minimum/maximum. These match Rust's semantics of returning whichever operand is non-NaN if the other is NaN, but makes no guarantees about signed zero. These are optimal on AArch64, but require an additional `cmpunord` and `blendv` on x86. These will be very easy to implement in the fallback implementation.

- "Relaxed" minimum/maximum. These behave like the "logical" "precise", or "Rust-y" operations, depending on whichever is fastest on the target architecture. That means that if the *second* operand is non-NaN, the result is guaranteed to never be NaN. These are optimal on x86, requiring only a `minps`/`maxps` operation, but won't let us use `fmin`/`fmax` on AArch64. I'm not sure if that's a big deal; we'll probably need to benchmark `fmin`/`fmax` versus `fminnm`/`fmaxnm`.

- "Fast" minimum/maximum. These behave like the "logical", "precise with NaN", or "Rust-y" operations, depending on whichever is fastest on the target architecture. These are optimal on both x86 and AArch64, but NaN values can easily pollute the result.

Implementing all of those would be tedious and result in quite an unwieldy API surface. We therefore need to decide which subset is most useful. It's unfortunate that each fully-deterministic operation (logical, precise, and precise with NaN) is slow on at least one target architecture.

I think that the best set of operations to expose is "logical" and "relaxed". The "logical" min/max is deterministic, and the "relaxed" min/max is fast(er). This doesn't give us *any* operations that map to `fmin`/`fmax` on AArch64, but I don't know if `fminnm`/`fmaxnm` are actually slower.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Nail down the semantics for `min`/`max` and `min_precise`/ `max_precise` #133

IEEE 754-2019

x86

WebAssembly

AArch64 w/ NEON

Rust

fearless_simd?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Nail down the semantics for min/max and min_precise/ max_precise #133

Description

IEEE 754-2019

x86

WebAssembly

AArch64 w/ NEON

Rust

fearless_simd?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Nail down the semantics for `min`/`max` and `min_precise`/ `max_precise` #133