[API Proposal]: MultiplyAddEstimate #98053

tannergooding · 2024-02-06T19:34:19Z

Summary

Today users have access to the direct operators (x * y) + z and the fused named method FusedMultiplyAdd(x, y, z). On some hardware using one over the other can provide significant perf advantages, however not all hardware provides acceleration for fma and further due to the difference in result (which may be 1 ULP difference for any standalone operation), it is not something that the runtime can freely optimize to use.

As such, it is proposed we provide an explicit Estimate function (matching the convention we've used for other operation where the result can differ across machines but the operation is not itself unsafe) that will implicitly pick the "best" of the two operations. Users who want a specific behavior can still get it by using the direct functions and users who want to opt-in can.

API Proposal

namespace System
{
    public partial struct Double
    {
        public static double MultiplyAddEstimate(double x, double y, double addend);
    }

    public partial struct Half
    {
        public static Half MultiplyAddEstimate(Half x, Half y, Half addend);
    }

    public partial struct Single
    {
        public static float MultiplyAddEstimate(float x, float y, float addend);
    }
}

namespace System.Numerics
{
    public partial interface IFloatingPointIeee754<TSelf>
    {
        static virtual TSelf MultiplyAddEstimate(TSelf x, TSelf y, TSelf addend);
    }

    public partial class Vector
    {
        static Vector<double> MultiplyAddEstimate(Vector<double> x, Vector<double> y, Vector<double> addend);
        static Vector<float> MultiplyAddEstimate(Vector<float> x, Vector<float> y, Vector<float> addend);
    }
}

namespace System.Numerics.Tensors
{
    public partial class TensorPrimitives
    {
        public static void MultiplyAddEstimate<T>(ReadOnlySpan<T> x, ReadOnlySpan<T> y, ReadOnlySpan<T> addend, Span<T> destination) where T : IFloatingPointIeee754<T>;
    }
}

namespace System.Runtime.InteropServices
{
    public partial struct NFloat
    {
        public static NFloat MultiplyAddEstimate(NFloat x, NFloat y, NFloat addend);
    }
}

namespace System.Runtime.Intrinsics
{
    public partial class Vector64
    {
        static Vector64<double> MultiplyAddEstimate(Vector64<double> x, Vector64<double> y, Vector64<double> addend);
        static Vector64<float> MultiplyAddEstimate(Vector64<float> x, Vector64<float> y, Vector64<float> addend);
    }

    public partial class Vector128
    {
        static Vector128<double> MultiplyAddEstimate(Vector128<double> x, Vector128<double> y, Vector128<double> addend);
        static Vector128<float> MultiplyAddEstimate(Vector128<float> x, Vector128<float> y, Vector128<float> addend);
    }

    public partial class Vector256
    {
        static Vector256<double> MultiplyAddEstimate(Vector256<double> x, Vector256<double> y, Vector256<double> addend);
        static Vector256<float> MultiplyAddEstimate(Vector256<float> x, Vector256<float> y, Vector256<float> addend);
    }

    public partial class Vector512
    {
        static Vector512<double> MultiplyAddEstimate(Vector512<double> x, Vector512<double> y, Vector512<double> addend);
        static Vector512<float> MultiplyAddEstimate(Vector512<float> x, Vector512<float> y, Vector512<float> addend);
    }
}

The text was updated successfully, but these errors were encountered:

ghost · 2024-02-06T19:34:24Z

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

Summary

Today users have access to the direct operators (x * y) + z and the fused named method FusedMultiplyAdd(x, y, z). On some hardware using one over the other can provide significant perf advantages, however not all hardware provides acceleration for fma and further due to the difference in result (which may be 1 ULP difference for any standalone operation), it is not something that the runtime can freely optimize to use.

As such, it is proposed we provide an explicit Estimate function (matching the convention we've used for other operation where the result can differ across machines but the operation is not itself unsafe) that will implicitly pick the "best" of the two operations. Users who want a specific behavior can still get it by using the direct functions and users who want to opt-in can.

API Proposal

namespace System
{
    public partial struct Double
    {
        public static double MultiplyAddEstimate(double x, double y, double addend);
    }

    public partial struct Half
    {
        public static Half MultiplyAddEstimate(Half x, Half y, Half addend);
    }

    public partial struct Single
    {
        public static float MultiplyAddEstimate(float x, float y, float addend);
    }
}

namespace System.Numerics
{
    public partial interface IFloatingPointIeee754<TSelf>
    {
        static virtual TSelf MultiplyAddEstimate(TSelf x, TSelf y, TSelf addend);
    }

    public partial class Vector
    {
        static Vector<double> MultiplyAddEstimate(Vector<double> x, Vector<double> y, Vector<double> addend);
        static Vector<float> MultiplyAddEstimate(Vector<float> x, Vector<float> y, Vector<float> addend);
    }
}

namespace System.Numerics.Tensors
{
    public partial class TensorPrimitives
    {
        public static void MultiplyAddEstimate<T>(ReadOnlySpan<T> x, ReadOnlySpan<T> y, ReadOnlySpan<T> addend, Span<T> destination) where T : IFloatingPointIeee754<T>;
    }
}

namespace System.Runtime.InteropServices
{
    public partial struct NFloat
    {
        public static NFloat MultiplyAddEstimate(NFloat x, NFloat y, NFloat addend);
    }
}

namespace System.Runtime.Intrinsics
{
    public partial class Vector64
    {
        static Vector64<double> MultiplyAddEstimate(Vector64<double> x, Vector64<double> y, Vector64<double> addend);
        static Vector64<float> MultiplyAddEstimate(Vector64<float> x, Vector64<float> y, Vector64<float> addend);
    }

    public partial class Vector128
    {
        static Vector128<double> MultiplyAddEstimate(Vector128<double> x, Vector128<double> y, Vector128<double> addend);
        static Vector128<float> MultiplyAddEstimate(Vector128<float> x, Vector128<float> y, Vector128<float> addend);
    }

    public partial class Vector256
    {
        static Vector256<double> MultiplyAddEstimate(Vector256<double> x, Vector256<double> y, Vector256<double> addend);
        static Vector256<float> MultiplyAddEstimate(Vector256<float> x, Vector256<float> y, Vector256<float> addend);
    }

    public partial class Vector512
    {
        static Vector512<double> MultiplyAddEstimate(Vector512<double> x, Vector512<double> y, Vector512<double> addend);
        static Vector512<float> MultiplyAddEstimate(Vector512<float> x, Vector512<float> y, Vector512<float> addend);
    }
}

Author:	tannergooding
Assignees:	-
Labels:	`area-System.Numerics`, `api-ready-for-review`
Milestone:	-

bartonjs · 2024-02-06T20:04:55Z

Video

Looks good as proposed. For anyone who wants to hear Jeremy be way too nerdy about linguistics, you need to listen to the API Review video, because Estimate and Estimate are different words.

namespace System
{
    public partial struct Double
    {
        public static double MultiplyAddEstimate(double x, double y, double addend);
    }

    public partial struct Half
    {
        public static Half MultiplyAddEstimate(Half x, Half y, Half addend);
    }

    public partial struct Single
    {
        public static float MultiplyAddEstimate(float x, float y, float addend);
    }
}

namespace System.Numerics
{
    public partial interface IFloatingPointIeee754<TSelf>
    {
        static virtual TSelf MultiplyAddEstimate(TSelf x, TSelf y, TSelf addend);
    }

    public partial class Vector
    {
        static Vector<double> MultiplyAddEstimate(Vector<double> x, Vector<double> y, Vector<double> addend);
        static Vector<float> MultiplyAddEstimate(Vector<float> x, Vector<float> y, Vector<float> addend);
    }
}

namespace System.Numerics.Tensors
{
    public partial class TensorPrimitives
    {
        public static void MultiplyAddEstimate<T>(ReadOnlySpan<T> x, ReadOnlySpan<T> y, ReadOnlySpan<T> addend, Span<T> destination) where T : IFloatingPointIeee754<T>;
    }
}

namespace System.Runtime.InteropServices
{
    public partial struct NFloat
    {
        public static NFloat MultiplyAddEstimate(NFloat x, NFloat y, NFloat addend);
    }
}

namespace System.Runtime.Intrinsics
{
    public partial class Vector64
    {
        static Vector64<double> MultiplyAddEstimate(Vector64<double> x, Vector64<double> y, Vector64<double> addend);
        static Vector64<float> MultiplyAddEstimate(Vector64<float> x, Vector64<float> y, Vector64<float> addend);
    }

    public partial class Vector128
    {
        static Vector128<double> MultiplyAddEstimate(Vector128<double> x, Vector128<double> y, Vector128<double> addend);
        static Vector128<float> MultiplyAddEstimate(Vector128<float> x, Vector128<float> y, Vector128<float> addend);
    }

    public partial class Vector256
    {
        static Vector256<double> MultiplyAddEstimate(Vector256<double> x, Vector256<double> y, Vector256<double> addend);
        static Vector256<float> MultiplyAddEstimate(Vector256<float> x, Vector256<float> y, Vector256<float> addend);
    }

    public partial class Vector512
    {
        static Vector512<double> MultiplyAddEstimate(Vector512<double> x, Vector512<double> y, Vector512<double> addend);
        static Vector512<float> MultiplyAddEstimate(Vector512<float> x, Vector512<float> y, Vector512<float> addend);
    }
}

BreyerW · 2024-02-07T12:07:29Z

any chance of adding same functions to Vec2/3/4 as convenience? like i suggested here a while ago #87377 (originally my proposal was for FMA proper but changed to Estimate per @tannergooding feedback)

tannergooding · 2024-02-07T15:04:46Z

Vector2/3/4 would need to be considered separately, they are themselves "out of sync" as compared to Vector128<float> and some of that is "by design" since they aren't general purpose vectors, but rather specifically Euclidean vectors.

As is, dev's aren't blocked from the APIs as they can simply Vector128.MultiplyAddEstimate(x.AsVector128(), y.AsVector128(), z.AsVector128()).AsVector4() or similar.

stephentoub · 2024-02-07T21:49:47Z

Just FYI, I'm adding the TensorPrimitives APIs as part of fixing the FusedMultiplyAdd APIs.

tannergooding · 2024-02-08T03:30:25Z

Half was approved here already, it likely won't be accelerated immediately.

MineCake147E · 2024-02-08T06:21:45Z

Half was approved here already, it likely won't be accelerated immediately.

I missed reading that. Sorry about that.

MineCake147E · 2024-02-08T06:25:03Z

By the way, should I also include MultiplyAddEstimate in #62416?

tannergooding · 2024-02-08T06:59:30Z

For explicit consistency, yes.

vladd · 2024-02-11T17:33:18Z

I wonder why can’t jitter automatically use this function instead of requiring to specify it explicitly.

tannergooding · 2024-02-11T17:45:44Z

I wonder why can’t jitter automatically use this function instead of requiring to specify it explicitly.

Because it isn't safe. fma(x, y, z) and (x * y) + z can produce different results (for normal inputs, fma(x, y, z) can be 1ulp more accurate). Automatically doing this leads to hard to diagnose bugs that may reproduce on some but not all hardware.

Some native compilers offer a feature switch such as /fp:fast, but that itself leads to major problems as well, especially in components that aren't setup to handle the difference or which may be already carefully tuned for precision assuming 2 rounding operations. -- A common problem with fp:fast like switches can also be how it impacts external libraries/functions that are then linked or inlined into the library which opted into fp:fast. This can lead to components you never intended to change to now subtly break, and there have been some decently sized bugs that have cropped up from this from all native compilers in the past.

When it comes to non-deterministic features like this, the best practice is to require the user to opt-in at a per call site basis. This minimizes the risk and helps ensure that it's only done where it would actually be beneficial and the user understands the impact of what they're opting into.

tannergooding added area-System.Numerics api-ready-for-review API is ready for review, it is NOT ready for implementation labels Feb 6, 2024

tannergooding added this to the 9.0.0 milestone Feb 6, 2024

ghost added the untriaged New issue has not been triaged by the area owner label Feb 6, 2024

bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Feb 6, 2024

tannergooding removed the untriaged New issue has not been triaged by the area owner label Feb 6, 2024

stephentoub mentioned this issue Feb 7, 2024

Fix TensorPrimitives.FusedMultiplyAdd #98131

Merged

This comment was marked as resolved.

Sign in to view

MineCake147E mentioned this issue Feb 9, 2024

[API Proposal]: VectorXXX<Half> support, and related various Hardware Intrinsics(F16C, AVX-512 FP16, and AdvSimd.Arm64) #62416

Open

tannergooding mentioned this issue May 14, 2024

Expose the FusedMultiplyAdd and MultiplyAddEstimate APIs on relevant vector and scalar types #102181

Merged

dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label May 14, 2024

tannergooding closed this as completed in #102181 May 16, 2024

github-actions bot locked and limited conversation to collaborators Jun 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[API Proposal]: MultiplyAddEstimate #98053

[API Proposal]: MultiplyAddEstimate #98053

tannergooding commented Feb 6, 2024

ghost commented Feb 6, 2024

Summary

API Proposal

bartonjs commented Feb 6, 2024 •

edited by dotnet-api-review bot

Loading

BreyerW commented Feb 7, 2024 •

edited

Loading

tannergooding commented Feb 7, 2024

stephentoub commented Feb 7, 2024

This comment was marked as resolved.

tannergooding commented Feb 8, 2024

MineCake147E commented Feb 8, 2024

MineCake147E commented Feb 8, 2024

tannergooding commented Feb 8, 2024

vladd commented Feb 11, 2024

tannergooding commented Feb 11, 2024 •

edited

Loading

[API Proposal]: MultiplyAddEstimate #98053

[API Proposal]: MultiplyAddEstimate #98053

Comments

tannergooding commented Feb 6, 2024

Summary

API Proposal

ghost commented Feb 6, 2024

Summary

API Proposal

bartonjs commented Feb 6, 2024 • edited by dotnet-api-review bot Loading

BreyerW commented Feb 7, 2024 • edited Loading

tannergooding commented Feb 7, 2024

stephentoub commented Feb 7, 2024

This comment was marked as resolved.

tannergooding commented Feb 8, 2024

MineCake147E commented Feb 8, 2024

MineCake147E commented Feb 8, 2024

tannergooding commented Feb 8, 2024

vladd commented Feb 11, 2024

tannergooding commented Feb 11, 2024 • edited Loading

bartonjs commented Feb 6, 2024 •

edited by dotnet-api-review bot

Loading

BreyerW commented Feb 7, 2024 •

edited

Loading

tannergooding commented Feb 11, 2024 •

edited

Loading