Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: MultiplyAddEstimate #98053

Closed
tannergooding opened this issue Feb 6, 2024 · 12 comments · Fixed by #102181
Closed

[API Proposal]: MultiplyAddEstimate #98053

tannergooding opened this issue Feb 6, 2024 · 12 comments · Fixed by #102181
Labels
api-approved API was approved in API review, it can be implemented area-System.Numerics in-pr There is an active PR which will close this issue when it is merged
Milestone

Comments

@tannergooding
Copy link
Member

Summary

Today users have access to the direct operators (x * y) + z and the fused named method FusedMultiplyAdd(x, y, z). On some hardware using one over the other can provide significant perf advantages, however not all hardware provides acceleration for fma and further due to the difference in result (which may be 1 ULP difference for any standalone operation), it is not something that the runtime can freely optimize to use.

As such, it is proposed we provide an explicit Estimate function (matching the convention we've used for other operation where the result can differ across machines but the operation is not itself unsafe) that will implicitly pick the "best" of the two operations. Users who want a specific behavior can still get it by using the direct functions and users who want to opt-in can.

API Proposal

namespace System
{
    public partial struct Double
    {
        public static double MultiplyAddEstimate(double x, double y, double addend);
    }

    public partial struct Half
    {
        public static Half MultiplyAddEstimate(Half x, Half y, Half addend);
    }

    public partial struct Single
    {
        public static float MultiplyAddEstimate(float x, float y, float addend);
    }
}

namespace System.Numerics
{
    public partial interface IFloatingPointIeee754<TSelf>
    {
        static virtual TSelf MultiplyAddEstimate(TSelf x, TSelf y, TSelf addend);
    }

    public partial class Vector
    {
        static Vector<double> MultiplyAddEstimate(Vector<double> x, Vector<double> y, Vector<double> addend);
        static Vector<float> MultiplyAddEstimate(Vector<float> x, Vector<float> y, Vector<float> addend);
    }
}

namespace System.Numerics.Tensors
{
    public partial class TensorPrimitives
    {
        public static void MultiplyAddEstimate<T>(ReadOnlySpan<T> x, ReadOnlySpan<T> y, ReadOnlySpan<T> addend, Span<T> destination) where T : IFloatingPointIeee754<T>;
    }
}

namespace System.Runtime.InteropServices
{
    public partial struct NFloat
    {
        public static NFloat MultiplyAddEstimate(NFloat x, NFloat y, NFloat addend);
    }
}

namespace System.Runtime.Intrinsics
{
    public partial class Vector64
    {
        static Vector64<double> MultiplyAddEstimate(Vector64<double> x, Vector64<double> y, Vector64<double> addend);
        static Vector64<float> MultiplyAddEstimate(Vector64<float> x, Vector64<float> y, Vector64<float> addend);
    }

    public partial class Vector128
    {
        static Vector128<double> MultiplyAddEstimate(Vector128<double> x, Vector128<double> y, Vector128<double> addend);
        static Vector128<float> MultiplyAddEstimate(Vector128<float> x, Vector128<float> y, Vector128<float> addend);
    }

    public partial class Vector256
    {
        static Vector256<double> MultiplyAddEstimate(Vector256<double> x, Vector256<double> y, Vector256<double> addend);
        static Vector256<float> MultiplyAddEstimate(Vector256<float> x, Vector256<float> y, Vector256<float> addend);
    }

    public partial class Vector512
    {
        static Vector512<double> MultiplyAddEstimate(Vector512<double> x, Vector512<double> y, Vector512<double> addend);
        static Vector512<float> MultiplyAddEstimate(Vector512<float> x, Vector512<float> y, Vector512<float> addend);
    }
}
@tannergooding tannergooding added area-System.Numerics api-ready-for-review API is ready for review, it is NOT ready for implementation labels Feb 6, 2024
@ghost
Copy link

ghost commented Feb 6, 2024

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

Summary

Today users have access to the direct operators (x * y) + z and the fused named method FusedMultiplyAdd(x, y, z). On some hardware using one over the other can provide significant perf advantages, however not all hardware provides acceleration for fma and further due to the difference in result (which may be 1 ULP difference for any standalone operation), it is not something that the runtime can freely optimize to use.

As such, it is proposed we provide an explicit Estimate function (matching the convention we've used for other operation where the result can differ across machines but the operation is not itself unsafe) that will implicitly pick the "best" of the two operations. Users who want a specific behavior can still get it by using the direct functions and users who want to opt-in can.

API Proposal

namespace System
{
    public partial struct Double
    {
        public static double MultiplyAddEstimate(double x, double y, double addend);
    }

    public partial struct Half
    {
        public static Half MultiplyAddEstimate(Half x, Half y, Half addend);
    }

    public partial struct Single
    {
        public static float MultiplyAddEstimate(float x, float y, float addend);
    }
}

namespace System.Numerics
{
    public partial interface IFloatingPointIeee754<TSelf>
    {
        static virtual TSelf MultiplyAddEstimate(TSelf x, TSelf y, TSelf addend);
    }

    public partial class Vector
    {
        static Vector<double> MultiplyAddEstimate(Vector<double> x, Vector<double> y, Vector<double> addend);
        static Vector<float> MultiplyAddEstimate(Vector<float> x, Vector<float> y, Vector<float> addend);
    }
}

namespace System.Numerics.Tensors
{
    public partial class TensorPrimitives
    {
        public static void MultiplyAddEstimate<T>(ReadOnlySpan<T> x, ReadOnlySpan<T> y, ReadOnlySpan<T> addend, Span<T> destination) where T : IFloatingPointIeee754<T>;
    }
}

namespace System.Runtime.InteropServices
{
    public partial struct NFloat
    {
        public static NFloat MultiplyAddEstimate(NFloat x, NFloat y, NFloat addend);
    }
}

namespace System.Runtime.Intrinsics
{
    public partial class Vector64
    {
        static Vector64<double> MultiplyAddEstimate(Vector64<double> x, Vector64<double> y, Vector64<double> addend);
        static Vector64<float> MultiplyAddEstimate(Vector64<float> x, Vector64<float> y, Vector64<float> addend);
    }

    public partial class Vector128
    {
        static Vector128<double> MultiplyAddEstimate(Vector128<double> x, Vector128<double> y, Vector128<double> addend);
        static Vector128<float> MultiplyAddEstimate(Vector128<float> x, Vector128<float> y, Vector128<float> addend);
    }

    public partial class Vector256
    {
        static Vector256<double> MultiplyAddEstimate(Vector256<double> x, Vector256<double> y, Vector256<double> addend);
        static Vector256<float> MultiplyAddEstimate(Vector256<float> x, Vector256<float> y, Vector256<float> addend);
    }

    public partial class Vector512
    {
        static Vector512<double> MultiplyAddEstimate(Vector512<double> x, Vector512<double> y, Vector512<double> addend);
        static Vector512<float> MultiplyAddEstimate(Vector512<float> x, Vector512<float> y, Vector512<float> addend);
    }
}
Author: tannergooding
Assignees: -
Labels:

area-System.Numerics, api-ready-for-review

Milestone: -

@tannergooding tannergooding added this to the 9.0.0 milestone Feb 6, 2024
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Feb 6, 2024
@bartonjs
Copy link
Member

bartonjs commented Feb 6, 2024

Video

Looks good as proposed. For anyone who wants to hear Jeremy be way too nerdy about linguistics, you need to listen to the API Review video, because Estimate and Estimate are different words.

namespace System
{
    public partial struct Double
    {
        public static double MultiplyAddEstimate(double x, double y, double addend);
    }

    public partial struct Half
    {
        public static Half MultiplyAddEstimate(Half x, Half y, Half addend);
    }

    public partial struct Single
    {
        public static float MultiplyAddEstimate(float x, float y, float addend);
    }
}

namespace System.Numerics
{
    public partial interface IFloatingPointIeee754<TSelf>
    {
        static virtual TSelf MultiplyAddEstimate(TSelf x, TSelf y, TSelf addend);
    }

    public partial class Vector
    {
        static Vector<double> MultiplyAddEstimate(Vector<double> x, Vector<double> y, Vector<double> addend);
        static Vector<float> MultiplyAddEstimate(Vector<float> x, Vector<float> y, Vector<float> addend);
    }
}

namespace System.Numerics.Tensors
{
    public partial class TensorPrimitives
    {
        public static void MultiplyAddEstimate<T>(ReadOnlySpan<T> x, ReadOnlySpan<T> y, ReadOnlySpan<T> addend, Span<T> destination) where T : IFloatingPointIeee754<T>;
    }
}

namespace System.Runtime.InteropServices
{
    public partial struct NFloat
    {
        public static NFloat MultiplyAddEstimate(NFloat x, NFloat y, NFloat addend);
    }
}

namespace System.Runtime.Intrinsics
{
    public partial class Vector64
    {
        static Vector64<double> MultiplyAddEstimate(Vector64<double> x, Vector64<double> y, Vector64<double> addend);
        static Vector64<float> MultiplyAddEstimate(Vector64<float> x, Vector64<float> y, Vector64<float> addend);
    }

    public partial class Vector128
    {
        static Vector128<double> MultiplyAddEstimate(Vector128<double> x, Vector128<double> y, Vector128<double> addend);
        static Vector128<float> MultiplyAddEstimate(Vector128<float> x, Vector128<float> y, Vector128<float> addend);
    }

    public partial class Vector256
    {
        static Vector256<double> MultiplyAddEstimate(Vector256<double> x, Vector256<double> y, Vector256<double> addend);
        static Vector256<float> MultiplyAddEstimate(Vector256<float> x, Vector256<float> y, Vector256<float> addend);
    }

    public partial class Vector512
    {
        static Vector512<double> MultiplyAddEstimate(Vector512<double> x, Vector512<double> y, Vector512<double> addend);
        static Vector512<float> MultiplyAddEstimate(Vector512<float> x, Vector512<float> y, Vector512<float> addend);
    }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Feb 6, 2024
@tannergooding tannergooding removed the untriaged New issue has not been triaged by the area owner label Feb 6, 2024
@BreyerW
Copy link

BreyerW commented Feb 7, 2024

any chance of adding same functions to Vec2/3/4 as convenience? like i suggested here a while ago #87377 (originally my proposal was for FMA proper but changed to Estimate per @tannergooding feedback)

@tannergooding
Copy link
Member Author

Vector2/3/4 would need to be considered separately, they are themselves "out of sync" as compared to Vector128<float> and some of that is "by design" since they aren't general purpose vectors, but rather specifically Euclidean vectors.

As is, dev's aren't blocked from the APIs as they can simply Vector128.MultiplyAddEstimate(x.AsVector128(), y.AsVector128(), z.AsVector128()).AsVector4() or similar.

@stephentoub
Copy link
Member

Just FYI, I'm adding the TensorPrimitives APIs as part of fixing the FusedMultiplyAdd APIs.

@MineCake147E

This comment was marked as resolved.

@tannergooding
Copy link
Member Author

Half was approved here already, it likely won't be accelerated immediately.

@MineCake147E
Copy link
Contributor

Half was approved here already, it likely won't be accelerated immediately.

I missed reading that. Sorry about that.

@MineCake147E
Copy link
Contributor

By the way, should I also include MultiplyAddEstimate in #62416?

@tannergooding
Copy link
Member Author

For explicit consistency, yes.

@vladd
Copy link

vladd commented Feb 11, 2024

I wonder why can’t jitter automatically use this function instead of requiring to specify it explicitly.

@tannergooding
Copy link
Member Author

tannergooding commented Feb 11, 2024

I wonder why can’t jitter automatically use this function instead of requiring to specify it explicitly.

Because it isn't safe. fma(x, y, z) and (x * y) + z can produce different results (for normal inputs, fma(x, y, z) can be 1ulp more accurate). Automatically doing this leads to hard to diagnose bugs that may reproduce on some but not all hardware.

Some native compilers offer a feature switch such as /fp:fast, but that itself leads to major problems as well, especially in components that aren't setup to handle the difference or which may be already carefully tuned for precision assuming 2 rounding operations. -- A common problem with fp:fast like switches can also be how it impacts external libraries/functions that are then linked or inlined into the library which opted into fp:fast. This can lead to components you never intended to change to now subtly break, and there have been some decently sized bugs that have cropped up from this from all native compilers in the past.

When it comes to non-deterministic features like this, the best practice is to require the user to opt-in at a per call site basis. This minimizes the risk and helps ensure that it's only done where it would actually be beneficial and the user understands the impact of what they're opting into.

@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label May 14, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jun 16, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Numerics in-pr There is an active PR which will close this issue when it is merged
Projects
None yet
6 participants