a74nh
Contributor

@a74nh a74nh commented Jun 20, 2025

Fixes #116847

When folding, allow arg1 to be a constant mask

@a74nh a74nh marked this pull request as ready for review June 20, 2025 10:46
@a74nh
Contributor Author

a74nh commented Jun 20, 2025

@dotnet/arm64-contrib @kunalspathak

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jun 20, 2025
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 20, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@kunalspathak
Contributor

/azp run Antigen, Fuzzlyn


Azure Pipelines successfully started running 2 pipeline(s).

Contributor

@kunalspathak kunalspathak left a comment

There is a bug in EvaluateSimdVectorToPattern where we are not validating it against the contents of arg0.

@a74nh
Contributor Author

a74nh commented Jun 23, 2025

The simd.h changes in this PR are out of date. We should wait for #116854 to be merged first.

@kunalspathak
Contributor

I will hold off on reviewing this until #116854 is merged.

@a74nh
Contributor Author

a74nh commented Jun 24, 2025

Fuzzlyn was showing some errors, such as:

Assert failure(PID 1845356 [0x001c286c], Thread: 1845356 [0x1c286c]): Assertion failed 'OperIs(GT_CNS_VEC)' in 'Program:M1(byref,int,System.Runtime.Intrinsics.Vector64`1[ushort]):ushort' during 'Morph - Global' (IL size 56; hash 0xc2f1cf02; FullOpts)

    File: /mnt/sdb/home/alahay01/dotnet/runtime_madv/src/coreclr/jit/gtstructs.h:64
    Image: /mnt/sdb/home/alahay01/dotnet/runtime_madv/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun

This is due to EvalHWIntrinsicFunTernary() not treating arg1 of NI_Sve_ConditionalSelect as a mask.

The fix is closely related to the fix in this PR, so I've put it into this PR with a test.

I've also updated simd.h to the latest from #116854
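
As a rough illustration of the folding problem described above, the following sketch folds a select whose selector may arrive either as a vector constant or as a mask constant. It is not the JIT's code: `Vec16`, `Mask16`, `Selector`, `SelectorToMask` and `FoldConditionalSelect` are hypothetical names, the vector is fixed at 16 byte lanes, and treating a nonzero lane as active is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for JIT constants: a 16-byte vector constant and a
// mask constant holding one bit per byte lane. These are not the JIT's types.
using Vec16  = std::array<std::uint8_t, 16>;
using Mask16 = std::uint16_t;

// The selector may arrive as either kind of constant.
struct Selector
{
    bool   isMask; // true: use `mask`; false: use `vec`
    Mask16 mask;
    Vec16  vec;
};

// Assumption: a byte lane counts as active when it is nonzero.
inline Mask16 SelectorToMask(const Selector& sel)
{
    if (sel.isMask)
    {
        return sel.mask;
    }

    Mask16 mask = 0;
    for (std::size_t i = 0; i < sel.vec.size(); i++)
    {
        if (sel.vec[i] != 0)
        {
            mask = static_cast<Mask16>(mask | (1u << i));
        }
    }
    return mask;
}

// Folds a select whose three inputs are all constants.
inline Vec16 FoldConditionalSelect(const Selector& sel, const Vec16& ifTrue, const Vec16& ifFalse)
{
    const Mask16 mask = SelectorToMask(sel);

    Vec16 result{};
    for (std::size_t i = 0; i < result.size(); i++)
    {
        const bool active = ((mask >> i) & 1u) != 0;
        result[i] = active ? ifTrue[i] : ifFalse[i];
    }
    return result;
}
```

The point of the sketch is only that the fold has to accept both selector representations; code that assumes the selector is always a vector constant would trip an assertion like the one above when handed a mask constant.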

Contributor

@kunalspathak kunalspathak left a comment

LGTM

@a74nh
Contributor Author

a74nh commented Jul 2, 2025

I've switched EvaluateSimdCvtVectorToMask() to check against 0 on Arm64, as per this discussion: #116991 (comment)

I put it in this PR as it's all related to constant vectors/masks.

With these changes, I'm not currently seeing any Fuzzlyn failures.
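
To make the "check against 0" rule concrete, here is a small sketch of two ways a constant byte vector could be turned into a mask. The function names and the fixed 16-lane `Vec16` type are hypothetical. The nonzero test follows the description above; the most-significant-bit test is shown only for contrast and is an assumption about the alternative rule, not something stated in this thread.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec16 = std::array<std::uint8_t, 16>; // hypothetical 16 x byte constant

// Rule described above for Arm64: a lane is active when it is nonzero.
inline std::uint16_t CvtVectorToMaskNonZero(const Vec16& v)
{
    std::uint16_t mask = 0;
    for (std::size_t i = 0; i < v.size(); i++)
    {
        if (v[i] != 0)
        {
            mask = static_cast<std::uint16_t>(mask | (1u << i));
        }
    }
    return mask;
}

// Assumed alternative, for contrast: a lane is active when its top bit is set.
inline std::uint16_t CvtVectorToMaskMsb(const Vec16& v)
{
    std::uint16_t mask = 0;
    for (std::size_t i = 0; i < v.size(); i++)
    {
        if ((v[i] & 0x80u) != 0)
        {
            mask = static_cast<std::uint16_t>(mask | (1u << i));
        }
    }
    return mask;
}
```

For a vector holding 1 in every lane except lane 0, the nonzero rule yields 0xfffe while the top-bit rule yields 0, so the two rules disagree on exactly the kind of small constants a fuzzer tends to generate.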

@tannergooding
Member

Couple minor changes needed, but otherwise LGTM

@a74nh
Contributor Author

a74nh commented Jul 2, 2025

@tannergooding

Fuzzlyn isn't completely clean: I'm seeing one issue per hour.

    public static int M12()
    {
        for (int lvar0 = 2; lvar0 > 0; lvar0--)
        {
            var vr3 = Sve.ConditionalSelect(Vector128.CreateScalar((byte)1).AsVector(), Vector.Create<byte>(0), Vector.Create<byte>(1));
            uint var1 = (uint)Sve.SaturatingIncrementByActiveElementCount(s_11, vr3);
            Consume(var1);
        }

        return 0;
    }

Assert failure(PID 3174564 [0x003070a4], Thread: 3174564 [0x3070a4]): Assertion failed 'use.User()->OperIsHWIntrinsic()' in 'Program:M12():int' during 'Lowering nodeinfo' (IL size 60; hash 0x9314a4ed; FullOpts)

    File: /home/alahay01/dotnet/runtime_sve/src/coreclr/jit/lowerarmarch.cpp:1184
    Image: /home/alahay01/dotnet/runtime_sve/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun
------------ BB01 [0000] [000..03C) (return), preds={} succs={}
               [000077] -----------                            IL_OFFSET void   INLRT @ 0x021[E-]
N001 (  3, 12) [000041] H----------                   t41 =    CNS_INT(h) long   0xf2451404b1a8 static Fseq[s_11] $c0
                                                            /--*  t41    long   
N002 (  3, 12) [000069] DA---------                         *  STORE_LCL_VAR long   V03 cse0         d:1 $VN.Void
N003 (  1,  1) [000070] -----------                   t70 =    LCL_VAR   long   V03 cse0         u:1 $c0
                                                            /--*  t70    long   
N005 (  8, 16) [000040] nA--G------                   t40 = *  IND       ushort <l:$100, c:$140>
N006 (  3,  2) [000042] -----------                   t42 =    CNS_MSK   mask  <0x0000fffe, 0x00000000> $180
                                                            /--*  t42    mask   
N007 (  3,  3) [000073] DA---------                         *  STORE_LCL_VAR mask   V04 cse1         d:1 $VN.Void
N008 (  1,  1) [000074] -----------                   t74 =    LCL_VAR   mask   V04 cse1         u:1 $180
                                                            /--*  t40    ushort 
                                                            +--*  t74    mask   
N010 ( 13, 21) [000039] -A--G------                   t39 = *  HWINTRINSIC long   16 ubyte SaturatingIncrementByActiveElementCount <l:$200, c:$201>
                                                            /--*  t39    long   
N011 ( 14, 23) [000038] -A--G------                   t38 = *  CAST      int <- uint <- long <l:$240, c:$241>
                                                            /--*  t38    int    arg0 x0
N012 ( 28, 26) [000037] -ACXG------                         *  CALL      void   Program:Consume[uint](uint) $VN.Void
               [000078] -----------                            IL_OFFSET void   INLRT @ 0x021[E-]
N001 (  1,  1) [000072] -----------                   t72 =    LCL_VAR   long   V03 cse0         u:1 $c0
                                                            /--*  t72    long   
N002 (  4,  3) [000056] n---G------                   t56 = *  IND       ushort <l:$101, c:$141>
N003 (  1,  1) [000076] -----------                   t76 =    LCL_VAR   mask   V04 cse1         u:1 $180
                                                            /--*  t56    ushort 
                                                            +--*  t76    mask   
N004 (  6,  5) [000055] ----G------                   t55 = *  HWINTRINSIC long   16 ubyte SaturatingIncrementByActiveElementCount <l:$202, c:$203>
                                                            /--*  t55    long   
N005 (  7,  7) [000054] ----G------                   t54 = *  CAST      int <- uint <- long <l:$242, c:$243>
                                                            /--*  t54    int    arg0 x0
N006 ( 21, 10) [000053] --CXG------                         *  CALL      void   Program:Consume[uint](uint) $VN.Void
               [000079] -----------                            IL_OFFSET void   INLRT @ 0x03A[E-]
N001 (  1,  2) [000028] -----+-----                   t28 =    CNS_INT   int    0 $40
                                                            /--*  t28    int    
N002 (  2,  3) [000029] -----+-----                         *  RETURN    int    $VN.Void

What's happening is that the ConditionalSelect has been optimised down to a constant vector, but it's only used as a mask, so it gets replaced with a constant mask. This mask is then stored as a local variable.

When lowering the mask, it fails to find an SVE mask pattern, so it tries to convert it back to a constant vector. To do that it needs the vector type of the user of the mask. But the user is a lcl store, so it asserts.

I don't think there is any obvious way of fixing this inside lowering. Is it safe to have lcl vars stored as masks? Can we get the required base types from tree of the LCL_VAR (urgh) ? Should we just prevent constant vectors being converted to masks when being stored as local vars?
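
A simplified sketch of why no mask pattern is found for this constant: the fixed-count SVE predicate patterns (VL1 to VL8, VL16 and so on) describe a contiguous run of active elements starting at element 0. The helper below is hypothetical and assumes a fixed view of 16 byte lanes; it does not reproduce the actual lowering code and ignores the other patterns (POW2, MUL3, MUL4, ALL).

```cpp
#include <cstdint>

// Returns n when `mask` has exactly its lowest n lanes active and n is a
// count a fixed VLn pattern can express for 16 byte lanes; otherwise -1.
// Hypothetical helper, not the JIT's implementation.
inline int FindPrefixPatternCount(std::uint16_t mask)
{
    static const int kCounts[] = {1, 2, 3, 4, 5, 6, 7, 8, 16};

    for (int n : kCounts)
    {
        const std::uint32_t prefix = (1u << n) - 1u; // n == 16 gives 0xffff
        if (mask == prefix)
        {
            return n;
        }
    }
    return -1;
}
```

Under these assumptions 0x00ff maps to a count of 8 and 0xffff to 16, but the 0xfffe mask from the dump has lane 0 inactive, so it is not a prefix and the lookup fails, which is what forces the fallback to a constant vector.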

@tannergooding
Member

tannergooding commented Jul 2, 2025

> Is it safe to have lcl vars stored as masks?

Yes, this is intentional to allow CSE and other core optimizations to light up.

> Can we get the required base types from tree of the LCL_VAR (urgh) ?

Not trivially today, and it should generally be assumed there may be cases that it is not possible.

> Should we just prevent constant vectors being converted to masks when being stored as local vars?

I don't think this is desirable. Rather you want to ensure that the HWIntrinsicInfo::CanBenefitFromConstantProp and GenTree::ShouldConstantProp methods are aware of Sve.ConditionalSelect and that there are cases where constants or key constant values are containable and therefore always profitable to propagate rather than CSE: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/assertionprop.cpp#L3121-L3136. We have a similar check in morph here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/morph.cpp#L11226-L11232 (although one or both of these should be updated to also handle CNS_MSK rather than just CNS_VEC).

But, we may still get non-propagatable scenarios, so really this should be just loading a mask constant, rather than doing the vector constant -> mask conversion so the issue can be avoided altogether. Short of that, however, this can always be done safely presuming byte as the base type, as that will ensure each bit of the predicate is correctly initialized. I believe it can even just use pmov (to predicate) which allows this to only load 16-bits and can effectively be CreateScalarUnsafe<ushort>(...) followed by pmov pg.b, zn
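
The reason a byte base type is always safe can be sketched as a round trip. An SVE predicate holds one bit per byte of vector, so a byte-lane vector has a lane for every predicate bit. The helpers below are hypothetical, assume a 16-byte vector, and model only the values, not the instruction sequence that would be emitted.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec16 = std::array<std::uint8_t, 16>; // hypothetical 16 x byte constant

// Expands each mask bit into one byte lane: 0xFF when set, 0x00 when clear.
inline Vec16 ExpandMaskToBytes(std::uint16_t mask)
{
    Vec16 v{};
    for (std::size_t i = 0; i < v.size(); i++)
    {
        v[i] = ((mask >> i) & 1u) != 0 ? 0xFF : 0x00;
    }
    return v;
}

// Converts back, treating a nonzero byte lane as active (an assumption).
inline std::uint16_t BytesToMask(const Vec16& v)
{
    std::uint16_t mask = 0;
    for (std::size_t i = 0; i < v.size(); i++)
    {
        if (v[i] != 0)
        {
            mask = static_cast<std::uint16_t>(mask | (1u << i));
        }
    }
    return mask;
}
```

With byte lanes the round trip reproduces every bit, including the 0xfffe mask from the dump above, without needing to know the element type of whichever node eventually consumes the mask.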

@a74nh
Contributor Author

a74nh commented Jul 2, 2025

> I believe it can even just use pmov (to predicate) which allows this to only load 16-bits and can effectively be CreateScalarUnsafe<ushort>(...) followed by pmov pg.b, zn

Yes it can, but that is SVE2.1 and there aren't any machines that support it yet.

@a74nh
Contributor Author

a74nh commented Jul 2, 2025

> I don't think this is desirable. Rather you want to ensure that the HWIntrinsicInfo::CanBenefitFromConstantProp and GenTree::ShouldConstantProp methods are aware of Sve.ConditionalSelect and that there are cases where constants or key constant values are containable and therefore always profitable to propagate rather than CSE: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/assertionprop.cpp#L3121-L3136. We have a similar check in morph here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/morph.cpp#L11226-L11232 (although one or both of these should be updated to also handle CNS_MSK rather than just CNS_VEC).

If the PR is functionally OK as is, could we push this to another PR? That way we can get Fuzzlyn working again with this PR.

@tannergooding
Member

> If the PR is functionally ok as is, could we push this to another PR. That way we can get fuzzlyn working again with this PR.

Yeah, for sure. That isn't a functionality issue just something that can help improve codegen.

Was more calling it out as what you'd want to do to avoid the more expensive path if we have places that can otherwise optimize.

@a74nh
Contributor Author

a74nh commented Jul 3, 2025

This PR is looking good now. Fuzzlyn looks happy (except for hitting the known #113940). CI failures aren't related to this PR. @amanasifkhalid

Contributor

@amanasifkhalid amanasifkhalid left a comment

@tannergooding any other comments?

@tannergooding tannergooding merged commit 4a06ac3 into dotnet:main Jul 3, 2025
116 of 118 checks passed
@a74nh a74nh deleted the cndselcns_github branch July 4, 2025 08:20
@github-actions github-actions bot locked and limited conversation to collaborators Aug 3, 2025

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Development

Successfully merging this pull request may close these issues.

Arm64 SVE: Error when ConditionalSelect has all constant arguments

4 participants