[API Proposal]: FMA for Vector2/3/4 specifically #87377

BreyerW · 2023-06-11T14:02:45Z

Background and motivation

Currently FMA is exposed for primitives (double and float) and for full-blown SIMD vectors, but AFAIK not for the convenience types Vector2/3/4 that sit between them. FMA isn't only about perf (on hardware with built-in scalar or SIMD FMA, of course) but also about avoiding intermediate rounding.

API Proposal

namespace System.Numerics;

public struct Vector2
{
+    public static Vector2 MultiplyAddEstimate(Vector2 x, Vector2 y, Vector2 z);
}
public struct Vector3
{
+    public static Vector3 MultiplyAddEstimate(Vector3 x, Vector3 y, Vector3 z);
}
public struct Vector4
{
+    public static Vector4 MultiplyAddEstimate(Vector4 x, Vector4 y, Vector4 z);
}

Under the hood, Vector2 could use Vector64<float> or MathF.FusedMultiplyAdd, whichever is applicable and faster. I imagine Vector3 would widen to Vector128<float> with the last element set to 0, since that element is discarded on return, while Vector4 would be used as-is as Vector128<float>.

The software fallback would be a simple component-wise (a * b) + c for perf reasons, hence the Estimate suffix: the fallback rounds twice per component, so its results can differ from a true FMA, for example for very large components.
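To make the dispatch described above concrete, here is a minimal sketch for Vector3 only. The class name `Vector3FmaSketch` is hypothetical and not part of the proposal; the intrinsics used (`Fma.MultiplyAdd`, `AdvSimd.FusedMultiplyAdd`, `AsVector128`, `AsVector3`) are existing System.Runtime.Intrinsics APIs. Whether a real implementation would branch this way is an assumption.

```csharp
using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper for illustration; not an existing BCL type.
public static class Vector3FmaSketch
{
    public static Vector3 MultiplyAddEstimate(Vector3 x, Vector3 y, Vector3 z)
    {
        if (Fma.IsSupported)
        {
            // x86/x64 FMA: (x * y) + z per lane with a single rounding.
            // AsVector128 widens Vector3 and zeroes the fourth lane.
            return Fma.MultiplyAdd(x.AsVector128(), y.AsVector128(), z.AsVector128()).AsVector3();
        }

        if (AdvSimd.IsSupported)
        {
            // Arm AdvSimd takes the addend as the first argument.
            return AdvSimd.FusedMultiplyAdd(z.AsVector128(), x.AsVector128(), y.AsVector128()).AsVector3();
        }

        // Software fallback: plain multiply then add, two roundings per component.
        return (x * y) + z;
    }
}
```

On hardware with FMA this rounds once per component; on the fallback path it rounds twice, which is the difference the Estimate suffix is meant to signal.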

API Usage

var x = Vector3.UnitX;
var y = Vector3.UnitY;
var z = Vector3.UnitZ;

var fma = Vector3.MultiplyAddEstimate(x, y, z);

Alternative Designs

An alternative would be to write a platform-agnostic SIMD FMA (which would currently use S.R.I.x86.FMA and S.R.I.ARM plus a software fallback under the hood), at which point hand-rolling FMA for Vector2/3/4 wouldn't be too bad.

Another alternative is to hand-roll FMA yourself for each component, but that gets uglier the more components there are, and adding SIMD FMA support for perf makes it even worse, especially since there's no platform-agnostic SIMD FMA AFAIK.

Risks

The Estimate behaviour could be surprising given differing hardware support for FMA, but that's mostly a documentation exercise, and the Estimate suffix already signals that it isn't exactly FusedMultiplyAdd.
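As an illustration of the kind of difference at stake, the sketch below compares an unfused (a * b) + c with `MathF.FusedMultiplyAdd` on inputs chosen so that the product is not exactly representable as a float. The class and method names are hypothetical; the explicit `(float)` cast is there to force the intermediate product to be rounded to float precision.

```csharp
using System;

// Hypothetical demo type for illustration only.
public static class FmaRoundingDemo
{
    public static (float Unfused, float Fused) Compare()
    {
        float a = MathF.BitIncrement(1f);          // 1 + 2^-23, the next float above 1
        float c = -(1f + MathF.ScaleB(1f, -22));   // -(1 + 2^-22)

        // Exact product is 1 + 2^-22 + 2^-46; rounding it to float drops the 2^-46 term.
        float unfused = (float)(a * a) + c;

        // The fused form keeps the exact product until the single final rounding.
        float fused = MathF.FusedMultiplyAdd(a, a, c);

        return (unfused, fused);
    }
}
```

Here the unfused path loses the low-order term of the product before the add, while the fused path keeps it, so the two results differ even though the inputs are close to 1 rather than very large.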


ghost · 2023-06-11T14:02:53Z

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

Currently FMA is exposed for primitives (double and float) and for full-blown SIMD vectors, but AFAIK not for the convenience types Vector2/3/4 that sit between them. FMA isn't only about perf (on hardware with built-in scalar or SIMD FMA, of course) but also about avoiding intermediate rounding.

API Proposal

namespace System.Numerics;

public struct Vector2
{
+    public static Vector2 FusedMultiplyAdd(Vector2 x,Vector2 y, Vector2 z);
}
public struct Vector3
{
+    public static Vector3 FusedMultiplyAdd(Vector3 x,Vector3 y, Vector3 z);
}
public struct Vector4
{
+    public static Vector4 FusedMultiplyAdd(Vector4 x,Vector4 y, Vector4 z);
}

Under the hood, Vector2 could use Vector64<float> or MathF.FusedMultiplyAdd, whichever is applicable and faster. I imagine Vector3 would widen to Vector128<float> with the last element set to 0, since that element is discarded on return, while Vector4 would be used as-is as Vector128<float>.

Note: for a first version it would be fine to just expose FMA as component-wise MathF.FusedMultiplyAdd without fancy SIMD support. The idea here is to enable simple FMA for System.Numerics.

API Usage

var x = Vector3.UnitX;
var y = Vector3.UnitY;
var z = Vector3.UnitZ;

var fma = Vector3.FusedMultiplyAdd(x,y,z);

Alternative Designs

An alternative would be to write a platform-agnostic SIMD FMA (which would currently use S.R.I.x86.FMA and S.R.I.ARM plus a software fallback under the hood), at which point hand-rolling FMA for Vector2/3/4 wouldn't be too bad.

Another alternative is to hand-roll FMA yourself for each component, but that gets uglier the more components there are, and adding SIMD FMA support for perf makes it even worse, especially since there's no platform-agnostic SIMD FMA AFAIK.

Risks

None AFAIK

Author:	BreyerW
Assignees:	-
Labels:	`api-suggestion`, `area-System.Numerics`
Milestone:	-

tannergooding · 2023-07-21T18:47:13Z

This API is very difficult to provide since not all hardware has FMA support and it needs to behave the same whether that support exists or not. Therefore, it would resolve to a very slow implementation on older hardware which may be unexpected.

It would likely be better to expose MultiplyAddEstimate which is then free to do (a * b) + c -or- fma(a, b, c) depending on what the hardware supports. Such a name follows the existing convention we've established.

If the proposal is updated to follow that, we should consider exposing a similar API to float/double and the corresponding INumberBase interface.

ghost · 2023-07-21T18:47:25Z

This issue has been marked needs-author-action and may be missing some important information.

BreyerW · 2023-07-23T10:22:20Z

@tannergooding done, let me know if I need to tweak the proposal further.

BTW, is there an API that checks for FMA support specifically (not SIMD), or a good enough approximate check via SIMD? Some folks may want to know that MAE is going to differ for very large inputs on unsupported hardware.

Also, maybe we should add FusedMultiplyAdd alongside the Estimate variant anyway, since I'm pretty sure there will be cases where correctness trumps any perf concerns (I'm referring to the difference in rounding behaviour). The software fallback would just be component-wise MathF.FusedMultiplyAdd, which already has the proper semantics but executes slowly without hardware support, no?
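For reference, the component-wise fallback described here could look roughly like the sketch below. The class name `VectorFmaFallbackSketch` is hypothetical; only `MathF.FusedMultiplyAdd` and the Vector2/3/4 constructors are existing APIs. It makes no attempt at SIMD.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration; not an existing BCL type.
public static class VectorFmaFallbackSketch
{
    // Each component is computed as (x * y) + z with a single rounding.
    public static Vector2 FusedMultiplyAdd(Vector2 x, Vector2 y, Vector2 z) =>
        new Vector2(
            MathF.FusedMultiplyAdd(x.X, y.X, z.X),
            MathF.FusedMultiplyAdd(x.Y, y.Y, z.Y));

    public static Vector3 FusedMultiplyAdd(Vector3 x, Vector3 y, Vector3 z) =>
        new Vector3(
            MathF.FusedMultiplyAdd(x.X, y.X, z.X),
            MathF.FusedMultiplyAdd(x.Y, y.Y, z.Y),
            MathF.FusedMultiplyAdd(x.Z, y.Z, z.Z));

    public static Vector4 FusedMultiplyAdd(Vector4 x, Vector4 y, Vector4 z) =>
        new Vector4(
            MathF.FusedMultiplyAdd(x.X, y.X, z.X),
            MathF.FusedMultiplyAdd(x.Y, y.Y, z.Y),
            MathF.FusedMultiplyAdd(x.Z, y.Z, z.Z),
            MathF.FusedMultiplyAdd(x.W, y.W, z.W));
}
```

This keeps the single-rounding semantics on every platform, at the cost of one scalar call per component.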

And just food for thought: what's the newest hardware that does NOT support FMA? I'm not a hardware expert, but maybe the last such hardware is old enough that it's no longer a real concern?

BreyerW added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jun 11, 2023

dotnet-issue-labeler bot added the area-System.Numerics label Jun 11, 2023

ghost added the untriaged New issue has not been triaged by the area owner label Jun 11, 2023

tannergooding added needs-author-action An issue or pull request that requires more info or actions from the author. and removed untriaged New issue has not been triaged by the area owner labels Jul 21, 2023

ghost added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Jul 23, 2023

BreyerW mentioned this issue Feb 7, 2024

[API Proposal]: MultiplyAddEstimate #98053

Closed

stephentoub added this to the Future milestone Jul 19, 2024
