[IR] Add llvm.masked.load.first.fault intrinsic #156470
Open
huntergr-arm wants to merge 7 commits into llvm:main from huntergr-arm:speculative-load-intrinsic
21688bd [IR] Add llvm.masked.load.first.fault intrinsic (huntergr-arm)
c1492bb Reword langref entry based on comments (huntergr-arm)
ca5d3a6 Try to fix windows build (huntergr-arm)
c517f9f Renamed intrinsic (huntergr-arm)
dcb8f37 Moved alignment to parameter attribute (huntergr-arm)
15491cf Replace semantics section with link to vp.load.ff + differences (huntergr-arm)
bed61d9 Update tests after rebase (huntergr-arm)
@@ -27809,6 +27809,55 @@ The '``llvm.masked.compressstore``' intrinsic is designed for compressing data i

Other targets may support this intrinsic differently, for example, by lowering it into a sequence of branches that guard scalar store operations.

.. _int_mloadff:

'``llvm.masked.load.ff.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""
This is an overloaded intrinsic. The loaded data is a vector of any integer,
floating-point or pointer data type.

::

      declare { <16 x float>, <16 x i1> } @llvm.masked.load.ff.v16f32.p0(ptr <ptr>, <16 x i1> <mask>)
      declare { <2 x double>, <2 x i1> } @llvm.masked.load.ff.v2f64.p0(ptr <ptr>, <2 x i1> <mask>)
      ;; The data is a vector of pointers
      declare { <8 x ptr>, <8 x i1> } @llvm.masked.load.ff.v8p0.p0(ptr align 8 <ptr>, <8 x i1> <mask>)

Overview:
"""""""""

Reads a vector from memory according to the provided mask, suppressing faults
for any lane beyond the first. The mask holds a bit for each vector lane, and
is used to prevent memory accesses to the masked-off lanes.

Returns the loaded data and a mask indicating which lanes are valid, which may
not be the same as the input mask depending on whether the processor encountered
a reason to avoid loading from that address. Invalid lanes contain poison
values.

Arguments:
""""""""""

The first argument is the base pointer for the load. The second argument, mask,
is a vector of boolean values with the same number of elements as the return
type.

The :ref:`align <attr_align>` parameter attribute can be provided for the first
argument.

Semantics:
""""""""""

The '``llvm.masked.load.ff``' intrinsic is very similar to the
'``llvm.vp.load.ff``' intrinsic, with the differences being the lack of an EVL
parameter and the second returned value being a mask instead of an updated EVL
value.

If the processor suppresses a fault for any lane, then the returned mask will
indicate that lane and all subsequent lanes are inactive.

Comment on lines +27859 to +27860 (Contributor):
Does this mean that the mask is a prefix mask? I.e. after the first set bit, every bit after will be set. I think we will need to depend on that property in the loop vectorizer. Does SVE guarantee that?

Memory Use Markers
------------------

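The review question about the shape of the returned mask can be explored with a small executable model. The Python sketch below is not part of the patch: `masked_load_ff`, `is_truncation_of`, the dict-based memory and the use of `None` for poison are all hypothetical names and conventions chosen for illustration. It assumes that "the first lane" means the first active lane (as with SVE first-faulting loads), that a missing address models a fault, and that consecutive lanes are one address apart.

```python
def masked_load_ff(memory, base, mask):
    """Toy model of a first-faulting masked load (illustrative only).

    memory: dict mapping address -> value; a missing address models a fault.
    Returns (data, out_mask); None in data stands in for poison.
    """
    data = [None] * len(mask)
    out_mask = [False] * len(mask)
    loaded_any = False
    for lane, active in enumerate(mask):
        if not active:
            continue  # masked-off lanes never touch memory
        addr = base + lane
        if addr not in memory:
            if not loaded_any:
                # Assumption: a fault on the first active lane is not suppressed.
                raise MemoryError(f"fault at address {addr}")
            break  # suppressed fault: this lane and all later lanes stay inactive
        data[lane] = memory[addr]
        out_mask[lane] = True
        loaded_any = True
    return data, out_mask


def is_truncation_of(in_mask, out_mask):
    """True if out_mask equals in_mask with every lane >= k cleared, for some k."""
    return any(
        out_mask == [m and i < k for i, m in enumerate(in_mask)]
        for k in range(len(in_mask) + 1)
    )


memory = {100: 1, 101: 2, 102: 3}  # address 103 is unmapped in this model
print(masked_load_ff(memory, 100, [True, True, True, True]))
print(masked_load_ff(memory, 100, [True, False, True, True]))
```

In this model the returned mask is always the input mask with a suffix cleared. It is a contiguous run of leading set bits only when the input mask is one as well (for example all-true); the second call keeps the hole at lane 1. Whether the real intrinsic guarantees more than that is exactly what the comment asks, and the sketch does not answer it.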
llvm/test/CodeGen/AArch64/masked-load-first-faulting.ll (108 changes: 108 additions & 0 deletions)
@@ -0,0 +1,108 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc -O3 -mtriple=aarch64-linux-gnu < %s | FileCheck %s --check-prefix=NEON
; RUN: llc -O3 -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s --check-prefix=SVE
; RUN: llc -O3 -mtriple=aarch64-linux-gnu -mattr=+sme -force-streaming < %s | FileCheck %s --check-prefix=SME_STREAMING

define { <4 x i32>, <4 x i1> } @load_ff_v4i32(ptr %p, <4 x i1> %mask) {
; NEON-LABEL: load_ff_v4i32:
; NEON: // %bb.0:
; NEON-NEXT: // kill: def $d0 killed $d0 def $q0
; NEON-NEXT: umov w8, v0.h[0]
; NEON-NEXT: tbz w8, #0, .LBB0_2
; NEON-NEXT: // %bb.1: // %load.ff.first.lane
; NEON-NEXT: mov w8, #1 // =0x1
; NEON-NEXT: ldr s0, [x0]
; NEON-NEXT: fmov d1, x8
; NEON-NEXT: // kill: def $d1 killed $d1 killed $q1
; NEON-NEXT: ret
; NEON-NEXT: .LBB0_2:
; NEON-NEXT: movi v1.2d, #0000000000000000
; NEON-NEXT: // implicit-def: $q0
; NEON-NEXT: // kill: def $d1 killed $d1 killed $q1
; NEON-NEXT: ret
;
; SVE-LABEL: load_ff_v4i32:
; SVE: // %bb.0:
; SVE-NEXT: // kill: def $d0 killed $d0 def $q0
; SVE-NEXT: umov w8, v0.h[0]
; SVE-NEXT: tbz w8, #0, .LBB0_2
; SVE-NEXT: // %bb.1: // %load.ff.first.lane
; SVE-NEXT: mov w8, #1 // =0x1
; SVE-NEXT: ldr s0, [x0]
; SVE-NEXT: fmov d1, x8
; SVE-NEXT: // kill: def $d1 killed $d1 killed $q1
; SVE-NEXT: ret
; SVE-NEXT: .LBB0_2:
; SVE-NEXT: movi v1.2d, #0000000000000000
; SVE-NEXT: // implicit-def: $q0
; SVE-NEXT: // kill: def $d1 killed $d1 killed $q1
; SVE-NEXT: ret
;
; SME_STREAMING-LABEL: load_ff_v4i32:
; SME_STREAMING: // %bb.0:
; SME_STREAMING-NEXT: fmov w8, s0
; SME_STREAMING-NEXT: tbz w8, #0, .LBB0_2
; SME_STREAMING-NEXT: // %bb.1: // %load.ff.first.lane
; SME_STREAMING-NEXT: ptrue p0.s
; SME_STREAMING-NEXT: adrp x8, .LCPI0_1
; SME_STREAMING-NEXT: ldr d1, [x8, :lo12:.LCPI0_1]
; SME_STREAMING-NEXT: ld1rw { z0.s }, p0/z, [x0]
; SME_STREAMING-NEXT: ret
; SME_STREAMING-NEXT: .LBB0_2:
; SME_STREAMING-NEXT: mov z1.h, #0 // =0x0
; SME_STREAMING-NEXT: adrp x8, .LCPI0_0
; SME_STREAMING-NEXT: ldr q0, [x8, :lo12:.LCPI0_0]
; SME_STREAMING-NEXT: ret
  %res = call { <4 x i32>, <4 x i1> } @llvm.masked.load.ff(ptr align 16 %p, <4 x i1> %mask)
  ret { <4 x i32>, <4 x i1> } %res
}

define { <2 x double>, <2 x i1> } @load_ff_v2f64_all_true_fully_aligned(ptr %p) {
; NEON-LABEL: load_ff_v2f64_all_true_fully_aligned:
; NEON: // %bb.0: // %load.ff.first.lane
; NEON-NEXT: mov w8, #1 // =0x1
; NEON-NEXT: ldr d0, [x0]
; NEON-NEXT: fmov d1, x8
; NEON-NEXT: ret
;
; SVE-LABEL: load_ff_v2f64_all_true_fully_aligned:
; SVE: // %bb.0: // %load.ff.first.lane
; SVE-NEXT: ldr d0, [x0]
; SVE-NEXT: index z1.s, #1, #-1
; SVE-NEXT: // kill: def $d1 killed $d1 killed $z1
; SVE-NEXT: ret
;
; SME_STREAMING-LABEL: load_ff_v2f64_all_true_fully_aligned:
; SME_STREAMING: // %bb.0: // %load.ff.first.lane
; SME_STREAMING-NEXT: ptrue p0.d
; SME_STREAMING-NEXT: index z1.s, #1, #-1
; SME_STREAMING-NEXT: ld1rd { z0.d }, p0/z, [x0]
; SME_STREAMING-NEXT: ret
  %res = call { <2 x double>, <2 x i1> } @llvm.masked.load.ff(ptr align 16 %p, <2 x i1> <i1 true, i1 true>)
  ret { <2 x double>, <2 x i1> } %res
}

define { <2 x double>, <2 x i1> } @load_ff_v2f64_all_true_partially_aligned(ptr %p) {
; NEON-LABEL: load_ff_v2f64_all_true_partially_aligned:
; NEON: // %bb.0: // %load.ff.first.lane
; NEON-NEXT: mov w8, #1 // =0x1
; NEON-NEXT: ldr d0, [x0]
; NEON-NEXT: fmov d1, x8
; NEON-NEXT: ret
;
; SVE-LABEL: load_ff_v2f64_all_true_partially_aligned:
; SVE: // %bb.0: // %load.ff.first.lane
; SVE-NEXT: ldr d0, [x0]
; SVE-NEXT: index z1.s, #1, #-1
; SVE-NEXT: // kill: def $d1 killed $d1 killed $z1
; SVE-NEXT: ret
;
; SME_STREAMING-LABEL: load_ff_v2f64_all_true_partially_aligned:
; SME_STREAMING: // %bb.0: // %load.ff.first.lane
; SME_STREAMING-NEXT: ptrue p0.d
; SME_STREAMING-NEXT: index z1.s, #1, #-1
; SME_STREAMING-NEXT: ld1rd { z0.d }, p0/z, [x0]
; SME_STREAMING-NEXT: ret
  %res = call { <2 x double>, <2 x i1> } @llvm.masked.load.ff(ptr align 8 %p, <2 x i1> <i1 true, i1 true>)
  ret { <2 x double>, <2 x i1> } %res
}
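The constant result masks in the two all-true functions can be sanity-checked with a little arithmetic. The Python sketch below is an illustration, not part of the test: `sve_index` and `u32_lanes` are hypothetical helpers, and it assumes the `<2 x i1>` result is returned in `d1` as two 32-bit lanes with lane 0 in the low bits (suggested by the `.s` element size in the SVE output). The immediate form of SVE `INDEX` writes `start + i * step` into element `i`.

```python
def sve_index(start, step, lanes):
    """Element i of SVE INDEX (immediate form) is start + i * step."""
    return [start + i * step for i in range(lanes)]


def u32_lanes(value64):
    """Split a 64-bit register value into two 32-bit lanes, lane 0 lowest."""
    return [value64 & 0xFFFFFFFF, (value64 >> 32) & 0xFFFFFFFF]


sve_lanes = sve_index(1, -1, 2)  # low two .s elements of z1, i.e. d1
neon_lanes = u32_lanes(1)        # x8 holds 1 after `mov w8, #1`
print(sve_lanes, neon_lanes)
```

Both come out as lanes `[1, 0]`, so under these assumptions `index z1.s, #1, #-1` and the NEON `mov w8, #1` / `fmov d1, x8` pair encode the same result mask, `<i1 true, i1 false>`: only the first lane is reported as loaded even though the input mask was all-true.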
llvm/test/Transforms/ScalarizeMaskedMemIntrin/expand-masked-load-first-fault.ll (82 changes: 82 additions & 0 deletions)
@@ -0,0 +1,82 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
; RUN: opt -p scalarize-masked-mem-intrin -S < %s | FileCheck %s

define { <4 x i32>, <4 x i1> } @load_ff_v4i32(ptr %p, <4 x i1> %mask) {
; CHECK-LABEL: define { <4 x i32>, <4 x i1> } @load_ff_v4i32(
; CHECK-SAME: ptr [[P:%.*]], <4 x i1> [[MASK:%.*]]) {
; CHECK-NEXT: [[FIRST_ACTIVE:%.*]] = extractelement <4 x i1> [[MASK]], i64 0
; CHECK-NEXT: br i1 [[FIRST_ACTIVE]], label %[[LOAD_FF_FIRST_LANE:.*]], label %[[LOAD_FF_RESULT:.*]]
; CHECK: [[LOAD_FF_FIRST_LANE]]:
; CHECK-NEXT: [[TMP1:%.*]] = load i32, ptr [[P]], align 16
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[TMP1]], i64 0
; CHECK-NEXT: br label %[[LOAD_FF_RESULT]]
; CHECK: [[LOAD_FF_RESULT]]:
; CHECK-NEXT: [[TMP3:%.*]] = phi <4 x i32> [ poison, [[TMP0:%.*]] ], [ [[TMP2]], %[[LOAD_FF_FIRST_LANE]] ]
; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i1> [ zeroinitializer, [[TMP0]] ], [ <i1 true, i1 false, i1 false, i1 false>, %[[LOAD_FF_FIRST_LANE]] ]
; CHECK-NEXT: [[TMP5:%.*]] = insertvalue { <4 x i32>, <4 x i1> } poison, <4 x i32> [[TMP3]], 0
; CHECK-NEXT: [[RES_FIRST_LANE_ONLY:%.*]] = insertvalue { <4 x i32>, <4 x i1> } [[TMP5]], <4 x i1> [[TMP4]], 1
; CHECK-NEXT: ret { <4 x i32>, <4 x i1> } [[RES_FIRST_LANE_ONLY]]
;
  %res = call { <4 x i32>, <4 x i1> } @llvm.masked.load.ff(ptr align 16 %p, <4 x i1> %mask)
  ret { <4 x i32>, <4 x i1> } %res
}

;; We can 'scalarize' first faulting loads for scalable vectors, since we only
;; need to insert a single element into the start of a poison splat vector.
define { <vscale x 4 x i32>, <vscale x 4 x i1> } @load_ff_nxv4i32(ptr %p, <vscale x 4 x i1> %mask) {
; CHECK-LABEL: define { <vscale x 4 x i32>, <vscale x 4 x i1> } @load_ff_nxv4i32(
; CHECK-SAME: ptr [[P:%.*]], <vscale x 4 x i1> [[MASK:%.*]]) {
; CHECK-NEXT: [[FIRST_ACTIVE:%.*]] = extractelement <vscale x 4 x i1> [[MASK]], i64 0
; CHECK-NEXT: br i1 [[FIRST_ACTIVE]], label %[[LOAD_FF_FIRST_LANE:.*]], label %[[LOAD_FF_RESULT:.*]]
; CHECK: [[LOAD_FF_FIRST_LANE]]:
; CHECK-NEXT: [[TMP1:%.*]] = load i32, ptr [[P]], align 16
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[TMP1]], i64 0
; CHECK-NEXT: br label %[[LOAD_FF_RESULT]]
; CHECK: [[LOAD_FF_RESULT]]:
; CHECK-NEXT: [[TMP3:%.*]] = phi <vscale x 4 x i32> [ poison, [[TMP0:%.*]] ], [ [[TMP2]], %[[LOAD_FF_FIRST_LANE]] ]
; CHECK-NEXT: [[TMP4:%.*]] = phi <vscale x 4 x i1> [ zeroinitializer, [[TMP0]] ], [ insertelement (<vscale x 4 x i1> zeroinitializer, i1 true, i64 0), %[[LOAD_FF_FIRST_LANE]] ]
; CHECK-NEXT: [[TMP5:%.*]] = insertvalue { <vscale x 4 x i32>, <vscale x 4 x i1> } poison, <vscale x 4 x i32> [[TMP3]], 0
; CHECK-NEXT: [[RES:%.*]] = insertvalue { <vscale x 4 x i32>, <vscale x 4 x i1> } [[TMP5]], <vscale x 4 x i1> [[TMP4]], 1
; CHECK-NEXT: ret { <vscale x 4 x i32>, <vscale x 4 x i1> } [[RES]]
;
  %res = call { <vscale x 4 x i32>, <vscale x 4 x i1> } @llvm.masked.load.ff(ptr align 16 %p, <vscale x 4 x i1> %mask)
  ret { <vscale x 4 x i32>, <vscale x 4 x i1> } %res
}

define { <2 x double>, <2 x i1> } @load_ff_v2f64_all_true(ptr %p) {
; CHECK-LABEL: define { <2 x double>, <2 x i1> } @load_ff_v2f64_all_true(
; CHECK-SAME: ptr [[P:%.*]]) {
; CHECK-NEXT: br i1 true, label %[[LOAD_FF_FIRST_LANE:.*]], label %[[LOAD_FF_RESULT:.*]]
; CHECK: [[LOAD_FF_FIRST_LANE]]:
; CHECK-NEXT: [[TMP1:%.*]] = load double, ptr [[P]], align 16
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[TMP1]], i64 0
; CHECK-NEXT: br label %[[LOAD_FF_RESULT]]
; CHECK: [[LOAD_FF_RESULT]]:
; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x double> [ poison, [[TMP0:%.*]] ], [ [[TMP2]], %[[LOAD_FF_FIRST_LANE]] ]
; CHECK-NEXT: [[TMP4:%.*]] = phi <2 x i1> [ zeroinitializer, [[TMP0]] ], [ <i1 true, i1 false>, %[[LOAD_FF_FIRST_LANE]] ]
; CHECK-NEXT: [[TMP5:%.*]] = insertvalue { <2 x double>, <2 x i1> } poison, <2 x double> [[TMP3]], 0
; CHECK-NEXT: [[RES_FIRST_LANE_ONLY:%.*]] = insertvalue { <2 x double>, <2 x i1> } [[TMP5]], <2 x i1> [[TMP4]], 1
; CHECK-NEXT: ret { <2 x double>, <2 x i1> } [[RES_FIRST_LANE_ONLY]]
;
  %res = call { <2 x double>, <2 x i1> } @llvm.masked.load.ff(ptr align 16 %p, <2 x i1> <i1 true, i1 true>)
  ret { <2 x double>, <2 x i1> } %res
}

define { <16 x i16>, <16 x i1> } @load_ff_v16i16_all_false(ptr %p) {
; CHECK-LABEL: define { <16 x i16>, <16 x i1> } @load_ff_v16i16_all_false(
; CHECK-SAME: ptr [[P:%.*]]) {
; CHECK-NEXT: br i1 false, label %[[LOAD_FF_FIRST_LANE:.*]], label %[[LOAD_FF_RESULT:.*]]
; CHECK: [[LOAD_FF_FIRST_LANE]]:
; CHECK-NEXT: [[TMP1:%.*]] = load i16, ptr [[P]], align 32
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i16> poison, i16 [[TMP1]], i64 0
; CHECK-NEXT: br label %[[LOAD_FF_RESULT]]
; CHECK: [[LOAD_FF_RESULT]]:
; CHECK-NEXT: [[TMP3:%.*]] = phi <16 x i16> [ poison, [[TMP0:%.*]] ], [ [[TMP2]], %[[LOAD_FF_FIRST_LANE]] ]
; CHECK-NEXT: [[TMP4:%.*]] = phi <16 x i1> [ zeroinitializer, [[TMP0]] ], [ <i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>, %[[LOAD_FF_FIRST_LANE]] ]
; CHECK-NEXT: [[TMP5:%.*]] = insertvalue { <16 x i16>, <16 x i1> } poison, <16 x i16> [[TMP3]], 0
; CHECK-NEXT: [[RES_FIRST_LANE_ONLY:%.*]] = insertvalue { <16 x i16>, <16 x i1> } [[TMP5]], <16 x i1> [[TMP4]], 1
; CHECK-NEXT: ret { <16 x i16>, <16 x i1> } [[RES_FIRST_LANE_ONLY]]
;
  %res = call { <16 x i16>, <16 x i1> } @llvm.masked.load.ff(ptr align 32 %p, <16 x i1> zeroinitializer)
  ret { <16 x i16>, <16 x i1> } %res
}
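The expansion checked in these tests follows one pattern for every vector type: test lane 0 of the mask, perform a single scalar load if it is set, and report only that lane as valid. A minimal Python model of that control flow, inferred from the CHECK lines rather than taken from the pass itself, might look like the following. `scalarized_load_ff`, the dict-based memory and `None` as a stand-in for poison are hypothetical.

```python
def scalarized_load_ff(memory, base, mask):
    """Model of the first-lane-only expansion suggested by the CHECK lines."""
    lanes = len(mask)
    data = [None] * lanes        # phi input from the entry block: poison
    out_mask = [False] * lanes   # phi input from the entry block: zeroinitializer
    if mask[0]:                  # extractelement + conditional branch
        data[0] = memory[base]   # ordinary scalar load; a KeyError models a fault
        out_mask[0] = True       # <i1 true, i1 false, ...>
    return data, out_mask


print(scalarized_load_ff({0x1000: 42}, 0x1000, [True, True, True, True]))
print(scalarized_load_ff({}, 0x1000, [False] * 16))
```

Returning fewer valid lanes than the mask requested appears consistent with the LangRef wording that the result mask "may not be the same as the input mask", which is presumably what makes this conservative fallback acceptable on targets without a first-faulting load.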
Given that VP intrinsics are usually defined as a "layer above" masked intrinsics, should we instead modify the documentation of llvm.vp.load.ff to say that it is similar to llvm.masked.load.ff?