[API Proposal]: FMA for Vector2/3/4 specifically #87377
Tagging subscribers to this area: @dotnet/area-system-numerics
This API is very difficult to provide since not all hardware has [FMA support]. It would likely be better to expose [an Estimate-suffixed variant instead]. If the proposal is updated to follow that, we should consider exposing a similar API to […].
This issue has been marked [needs-author-action].
@tannergooding done, let me know if I need to tweak the proposal further. BTW, is there an API that checks FMA support specifically (not SIMD), or a good-enough approximate check in SIMD? Because some folks may want to know that MAE is going to differ for very large inputs on unsupported hardware. Also, maybe we should add […]. And just food for thought: what's the newest hardware that does NOT support FMA? I'm not a hardware expert, but maybe the last such hardware is old enough that it's no longer a real concern?
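There is no single cross-platform "is FMA supported" flag today; the closest approximation is to combine the per-ISA IsSupported checks from System.Runtime.Intrinsics, roughly as in this sketch (the FmaSupport helper name is made up for illustration):

using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

static class FmaSupport
{
    // Approximate check: true when hardware fused multiply-add is reachable
    // through the platform intrinsics (x86/x64 FMA3 or Arm AdvSimd FMLA).
    public static bool IsProbablySupported => Fma.IsSupported || AdvSimd.IsSupported;
}

Note that MathF.FusedMultiplyAdd returns the fused, singly-rounded result even without hardware support; a check like the one above only indicates whether that path is fast.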
Background and motivation
Currently FMA is exposed for primitives (double & float) and full-blown SIMD vectors, but AFAIK there is nothing for the convenience primitives like Vector2/3/4, which sit between them. FMA isn't only about perf (on hardware that has built-in FMA / SIMD FMA, of course) but also about avoiding intermediate rounding.

API Proposal

namespace System.Numerics;

public struct Vector2
{
+   public static Vector2 FusedMultiplyAdd(Vector2 x, Vector2 y, Vector2 z);
}

public struct Vector3
{
+   public static Vector3 FusedMultiplyAdd(Vector3 x, Vector3 y, Vector3 z);
}

public struct Vector4
{
+   public static Vector4 FusedMultiplyAdd(Vector4 x, Vector4 y, Vector4 z);
}
Under the hood

Vector2 could use Vector64<float> or MathF.FusedMultiplyAdd where applicable and faster. Vector3 I imagine would likely widen to Vector128<float> and set the last element to 0, since it will be discarded when returning, while Vector4 would be used as-is as Vector128<float>.
The software fallback would be a simple component-wise (a * b) + c for perf reasons, hence the Estimate suffix, since the software fallback would differ in rounding behaviour for very large components.
Note: for the 1st version it would be fine to just expose FMA as component-wise.
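A minimal sketch of the widening approach described above, assuming a standalone helper rather than the actual BCL implementation (Vector3FmaSketch and its layout are illustrative only); the hardware paths use Fma.MultiplyAdd on x86/x64 and AdvSimd.FusedMultiplyAdd on Arm, and the fallback is the plain component-wise (a * b) + c the proposal describes:

using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

static class Vector3FmaSketch
{
    public static Vector3 FusedMultiplyAdd(Vector3 x, Vector3 y, Vector3 z)
    {
        if (Fma.IsSupported || AdvSimd.IsSupported)
        {
            // Widen to Vector128<float>; the 4th lane is zeroed and discarded on the way back.
            Vector128<float> vx = Vector128.Create(x.X, x.Y, x.Z, 0f);
            Vector128<float> vy = Vector128.Create(y.X, y.Y, y.Z, 0f);
            Vector128<float> vz = Vector128.Create(z.X, z.Y, z.Z, 0f);

            Vector128<float> r = Fma.IsSupported
                ? Fma.MultiplyAdd(vx, vy, vz)            // (vx * vy) + vz with a single rounding per lane
                : AdvSimd.FusedMultiplyAdd(vz, vx, vy);  // the addend comes first on Arm

            return new Vector3(r.GetElement(0), r.GetElement(1), r.GetElement(2));
        }

        // Software fallback as described above: plain (a * b) + c per component,
        // which is what motivates the Estimate suffix; an exact fallback would
        // call MathF.FusedMultiplyAdd per component instead.
        return x * y + z;
    }
}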
API Usage

var x = Vector3.UnitX;
var y = Vector3.UnitY;
var z = Vector3.UnitZ;
var fma = Vector3.FusedMultiplyAdd(x, y, z);

Alternative Designs
An alternative would be to write a platform-agnostic SIMD FMA (which currently would use S.R.I.x86.FMA and S.R.I.ARM plus a software fallback under the hood), at which point hand-rolling FMA for Vector2/3/4 wouldn't be too bad.
Another alternative is to hand-roll FMA for each component on your own, but that becomes uglier the more components there are, and adding SIMD FMA support for perf makes it even worse, especially since there's no platform-agnostic SIMD FMA AFAIK.
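For illustration, this is roughly what the hand-rolled alternative looks like per component via MathF.FusedMultiplyAdd (the HandRolledFma helper is hypothetical); the Vector4 overload already shows how noisy it gets, and it is still scalar, with no SIMD FMA:

using System;
using System.Numerics;

static class HandRolledFma
{
    // Tolerable for Vector2, noisy for Vector4, and each extra component adds another line.
    public static Vector2 FusedMultiplyAdd(Vector2 x, Vector2 y, Vector2 z) => new Vector2(
        MathF.FusedMultiplyAdd(x.X, y.X, z.X),
        MathF.FusedMultiplyAdd(x.Y, y.Y, z.Y));

    public static Vector4 FusedMultiplyAdd(Vector4 x, Vector4 y, Vector4 z) => new Vector4(
        MathF.FusedMultiplyAdd(x.X, y.X, z.X),
        MathF.FusedMultiplyAdd(x.Y, y.Y, z.Y),
        MathF.FusedMultiplyAdd(x.Z, y.Z, z.Z),
        MathF.FusedMultiplyAdd(x.W, y.W, z.W));
}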
Risks
Estimate behaviour in the face of different hardware support for FMA could be surprising, but that's mostly a documentation exercise, and the Estimate suffix already points out it's not exactly FusedMultiplyAdd.
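To make the risk concrete, a small demo of the difference between a fused and a non-fused multiply-add for large inputs (the exact residual depends on how 1e16 rounds to float):

using System;

class FmaRoundingDemo
{
    static void Main()
    {
        float a = 1e8f;   // exactly 100000000 as a float
        float b = 1e8f;
        float c = -1e16f; // the float nearest to -1e16

        float product = a * b;                          // the exact product 1e16 rounds to the same magnitude as c
        float separate = product + c;                   // the rounded product cancels with c: 0
        float fused = MathF.FusedMultiplyAdd(a, b, c);  // a single rounding keeps the residual (roughly -2.7e8)

        Console.WriteLine($"separate: {separate}, fused: {fused}");
    }
}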