[API Proposal]: FMA for Vector2/3/4 specifically #87377

Open
BreyerW opened this issue Jun 11, 2023 · 4 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Numerics needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Milestone

Comments

@BreyerW

BreyerW commented Jun 11, 2023

Background and motivation

Currently FMA is exposed for primitives (double and float) and for full-blown SIMD vectors, but AFAIK there is nothing for the convenience types Vector2/3/4, which sit between them. FMA isn't only about perf (on hardware with built-in scalar or SIMD FMA, of course) but also about avoiding intermediate rounding.

API Proposal

namespace System.Numerics;

public struct Vector2
{
+    public static Vector2 MultiplyAddEstimate(Vector2 x, Vector2 y, Vector2 z);
}
public struct Vector3
{
+    public static Vector3 MultiplyAddEstimate(Vector3 x, Vector3 y, Vector3 z);
}
public struct Vector4
{
+    public static Vector4 MultiplyAddEstimate(Vector4 x, Vector4 y, Vector4 z);
}

Under the hood, Vector2 could use Vector64<float> or MathF.FusedMultiplyAdd, whichever is applicable and faster. I imagine Vector3 would be widened to a Vector128<float> with the last element set to 0, since that element is discarded on return, while Vector4 would be used as-is as a Vector128<float>.

The software fallback would be a simple component-wise (a * b) + c for perf reasons, hence the Estimate suffix: the fallback rounds the product before adding, so its results can differ from a true FMA, noticeably so for very large components.
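A minimal sketch of how the widening described above could look, written as hypothetical free-standing helpers (VectorMaeSketch is not a real type, and a shipped API could well differ). It assumes .NET 5 or later for the Fma and AdvSimd intrinsics, and for brevity it widens Vector2 to 128 bits as well instead of using Vector64<float>.

```csharp
using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical sketch of the proposed API as free-standing helpers (not System.Numerics).
public static class VectorMaeSketch
{
    public static Vector4 MultiplyAddEstimate(Vector4 x, Vector4 y, Vector4 z)
    {
        if (Fma.IsSupported || AdvSimd.IsSupported)
        {
            // Hardware path: one fused instruction, a single rounding per lane.
            Vector128<float> r = Fused(ToVector128(x), ToVector128(y), ToVector128(z));
            return new Vector4(r.GetElement(0), r.GetElement(1), r.GetElement(2), r.GetElement(3));
        }

        // Software fallback: x * y is rounded before z is added (hence "Estimate").
        return (x * y) + z;
    }

    public static Vector3 MultiplyAddEstimate(Vector3 x, Vector3 y, Vector3 z)
    {
        // Widen to four lanes with W = 0; the extra lane is discarded on return.
        Vector4 r = MultiplyAddEstimate(new Vector4(x, 0f), new Vector4(y, 0f), new Vector4(z, 0f));
        return new Vector3(r.X, r.Y, r.Z);
    }

    public static Vector2 MultiplyAddEstimate(Vector2 x, Vector2 y, Vector2 z)
    {
        // Same widening trick; an Arm-specific version could use Vector64<float> instead.
        Vector4 r = MultiplyAddEstimate(new Vector4(x, 0f, 0f), new Vector4(y, 0f, 0f), new Vector4(z, 0f, 0f));
        return new Vector2(r.X, r.Y);
    }

    private static Vector128<float> ToVector128(Vector4 v) => Vector128.Create(v.X, v.Y, v.Z, v.W);

    // Only called when Fma or AdvSimd is supported.
    private static Vector128<float> Fused(Vector128<float> a, Vector128<float> b, Vector128<float> c) =>
        Fma.IsSupported
            ? Fma.MultiplyAdd(a, b, c)           // x86/x64 FMA3: (a * b) + c
            : AdvSimd.FusedMultiplyAdd(c, a, b); // Arm: the addend is the first argument
}
```

With the usage example from this proposal, VectorMaeSketch.MultiplyAddEstimate(Vector3.UnitX, Vector3.UnitY, Vector3.UnitZ) returns Vector3.UnitZ on either path, because every product there is exact; the two paths only diverge when rounding the product matters.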

API Usage

var x = Vector3.UnitX;
var y = Vector3.UnitY;
var z = Vector3.UnitZ;

var fma = Vector3.MultiplyAddEstimate(x, y, z);

Alternative Designs

An alternative would be to expose a platform-agnostic SIMD FMA (which would currently use System.Runtime.Intrinsics.X86.Fma and System.Runtime.Intrinsics.Arm, plus a software fallback, under the hood), at which point hand-rolling FMA for Vector2/3/4 wouldn't be too bad.

Another alternative is to hand-roll FMA for each component yourself, but that gets uglier the more components there are, and adding SIMD FMA for perf makes it even worse, especially since AFAIK there's no platform-agnostic SIMD FMA.

Risks

Estimate behaviour under differing hardware support for FMA could be surprising, but that's mostly a documentation exercise, and the Estimate suffix already signals that it's not exactly FusedMultiplyAdd.
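To show the kind of difference that documentation would need to describe, here is a small sketch with inputs chosen purely for illustration (RoundingDemo is a hypothetical name). a = 1 + 2^-23 and b = 1 - 2^-23 are exact floats whose exact product is 1 - 2^-46; rounding that product to float gives 1, so the unfused (a * b) + c with c = -1 yields 0, while MathF.FusedMultiplyAdd keeps the -2^-46. Differences like this also show up with ordinary magnitudes whenever the add cancels most of the product.

```csharp
using System;

// Hypothetical demo: one input where unfused (a * b) + c and a true FMA disagree.
public static class RoundingDemo
{
    // Exact floats: 1 + 2^-23 and 1 - 2^-23. Their exact product is 1 - 2^-46.
    // static readonly (not const) so the arithmetic below runs at runtime.
    public static readonly float A = 1f + 1f / (1 << 23);
    public static readonly float B = 1f - 1f / (1 << 23);
    public static readonly float C = -1f;

    // Unfused: the explicit cast forces A * B to be rounded to float (1.0f) before the add.
    public static float Unfused() => (float)(A * B) + C;

    // Fused: the exact A * B + C is rounded once, giving -2^-46.
    public static float Fused() => MathF.FusedMultiplyAdd(A, B, C);
}
```

Here Unfused() returns 0 while Fused() returns -2^-46 (about -1.42e-14), regardless of whether the machine has FMA hardware, since MathF.FusedMultiplyAdd always has fused semantics.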

@BreyerW BreyerW added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jun 11, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jun 11, 2023
@ghost

ghost commented Jun 11, 2023

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

Currently FMA is exposed for primitives (double and float) and for full-blown SIMD vectors, but AFAIK there is nothing for the convenience types Vector2/3/4, which sit between them. FMA isn't only about perf (on hardware with built-in scalar or SIMD FMA, of course) but also about avoiding intermediate rounding.

API Proposal

namespace System.Numerics;

public struct Vector2
{
+    public static Vector2 FusedMultiplyAdd(Vector2 x, Vector2 y, Vector2 z);
}
public struct Vector3
{
+    public static Vector3 FusedMultiplyAdd(Vector3 x, Vector3 y, Vector3 z);
}
public struct Vector4
{
+    public static Vector4 FusedMultiplyAdd(Vector4 x, Vector4 y, Vector4 z);
}

Under the hood, Vector2 could use Vector64<float> or MathF.FusedMultiplyAdd, whichever is applicable and faster. I imagine Vector3 would be widened to a Vector128<float> with the last element set to 0, since that element is discarded on return, while Vector4 would be used as-is as a Vector128<float>.

Note: for the first version it would be fine to just expose FMA as a component-wise MathF.FusedMultiplyAdd without fancy SIMD support. The idea here is to enable simple FMA for System.Numerics.

API Usage

var x = Vector3.UnitX;
var y = Vector3.UnitY;
var z = Vector3.UnitZ;

var fma = Vector3.FusedMultiplyAdd(x, y, z);

Alternative Designs

An alternative would be to expose a platform-agnostic SIMD FMA (which would currently use System.Runtime.Intrinsics.X86.Fma and System.Runtime.Intrinsics.Arm, plus a software fallback, under the hood), at which point hand-rolling FMA for Vector2/3/4 wouldn't be too bad.

Another alternative is to hand-roll FMA for each component yourself, but that gets uglier the more components there are, and adding SIMD FMA for perf makes it even worse, especially since AFAIK there's no platform-agnostic SIMD FMA.

Risks

None AFAIK

Author: BreyerW
Assignees: -
Labels:

api-suggestion, area-System.Numerics

Milestone: -

@tannergooding
Member

This API is very difficult to provide since not all hardware has FMA support and it needs to behave the same whether that support exists or not. Therefore, it would resolve to a very slow implementation on older hardware which may be unexpected.

It would likely be better to expose MultiplyAddEstimate which is then free to do (a * b) + c -or- fma(a, b, c) depending on what the hardware supports. Such a name follows the existing convention we've established.

If the proposal is updated to follow that, we should consider exposing a similar API to float/double and the corresponding INumberBase interface.

@tannergooding tannergooding added needs-author-action An issue or pull request that requires more info or actions from the author. and removed untriaged New issue has not been triaged by the area owner labels Jul 21, 2023
@ghost

ghost commented Jul 21, 2023

This issue has been marked needs-author-action and may be missing some important information.

@BreyerW
Author

BreyerW commented Jul 23, 2023

@tannergooding done, let me know if I need to tweak the proposal further.

BTW, is there an API that checks for FMA support specifically (not SIMD in general), or a good-enough approximate check via SIMD? Some folks may want to know when MAE (MultiplyAddEstimate) is going to differ for very large inputs on unsupported hardware.
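For reference, a minimal sketch of such a check, assuming .NET 5 or later. Fma.IsSupported and AdvSimd.IsSupported are existing properties in System.Runtime.Intrinsics; the FmaSupport class and HasHardwareFma property are hypothetical wrappers. Counting AdvSimd as FMA support rests on Arm64 Advanced SIMD always including fused multiply-add (FMLA). This only hints at which path a hypothetical MultiplyAddEstimate would likely take; it is not a documented guarantee.

```csharp
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper: reports whether a fused multiply-add instruction is available.
public static class FmaSupport
{
    // x86/x64: the FMA3 instruction set.
    // Arm64: AdvSimd (NEON) includes fused multiply-add (FMLA), so it counts as support.
    public static bool HasHardwareFma => Fma.IsSupported || AdvSimd.IsSupported;
}
```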

Also, maybe we should add FusedMultiplyAdd alongside the Estimate variant anyway, since I'm pretty sure there would be cases where correctness (I mean the rounding behaviour difference) trumps any perf concerns. Its software fallback would just be a component-wise MathF.FusedMultiplyAdd, which already has the proper semantics but is slow without hardware support, no?
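A minimal sketch of that component-wise fallback, assuming plain MathF.FusedMultiplyAdd per component is acceptable; VectorFusedSketch and its methods are hypothetical, not System.Numerics API. Each component is rounded exactly once, so results match a true FMA on any hardware, at the cost of speed where the runtime has to compute the fused result in software.

```csharp
using System;
using System.Numerics;

// Hypothetical helpers (not System.Numerics API): exact FMA, one component at a time.
public static class VectorFusedSketch
{
    public static Vector2 FusedMultiplyAdd(Vector2 x, Vector2 y, Vector2 z) =>
        new Vector2(
            MathF.FusedMultiplyAdd(x.X, y.X, z.X),
            MathF.FusedMultiplyAdd(x.Y, y.Y, z.Y));

    public static Vector3 FusedMultiplyAdd(Vector3 x, Vector3 y, Vector3 z) =>
        new Vector3(
            MathF.FusedMultiplyAdd(x.X, y.X, z.X),
            MathF.FusedMultiplyAdd(x.Y, y.Y, z.Y),
            MathF.FusedMultiplyAdd(x.Z, y.Z, z.Z));

    public static Vector4 FusedMultiplyAdd(Vector4 x, Vector4 y, Vector4 z) =>
        new Vector4(
            MathF.FusedMultiplyAdd(x.X, y.X, z.X),
            MathF.FusedMultiplyAdd(x.Y, y.Y, z.Y),
            MathF.FusedMultiplyAdd(x.Z, y.Z, z.Z),
            MathF.FusedMultiplyAdd(x.W, y.W, z.W));
}
```

The repetition per component is exactly the ugliness mentioned under Alternative Designs, and this version has no SIMD path at all.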

And just food for thought: what's the newest hardware that does NOT support FMA? I'm not a hardware expert, but maybe the last such hardware is old enough that it's no longer a real concern?

@ghost ghost added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Jul 23, 2023
@stephentoub stephentoub added this to the Future milestone Jul 19, 2024