-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support opaque pointers #177
Comments
I did some more research into opaque pointers recently. Here are some highlights:
|
Continuing from #177 (comment) ...
|
Given the complexity, is there any way to test the opaque pointer support? |
I'm not quite sure I understand your question, but we can test instructions that use opaque pointers just like any other instruction. We can test the parsing aspects of opaque pointers in |
I just meant that it sounds like you are changing a fair bit of code for this, so I was prompting you to think about how to test all that new code. I think you're right, the most important part is testing the semantics, and we need some test cases in |
The best thing to do would be to figure out all of the instructions that intersect with opaque pointers in some way, such as |
Another challenge that I've encountered while trying to make a simple test case involving opaque pointers work: how do we handle mixing opaque pointers with non-opaque pointers? At the LLVM level, this is forbidden: a bitcode file can either have solely opaque pointers or solely non-opaque pointers, but it is an error to combine the two. It's not so straightforward to enforce this in
There are sometimes situations where define void @f() { ... }
define void @g() {
%1 = alloca ptr, align 8
store ptr @f, ptr %1, align 8
...
} The type of After discussing this with others, the approach that I am going to use to work around this issue is to define a coarser notion of type equality. This will behave like the
There are also some trickier cases to consider, such as this code, which uses |
TODO RGS: Add test case This is necessary in order to `load` from an opaque pointer. See #177.
TODO RGS: Test cases This is necessary in order to use `getelementptr` on an opaque pointer. See #177.
In the same vein as GaloisInc/llvm-pretty#102 (comment), I did a quick audit of the code in
|
One surprising thing that I recently discovered about mixing opaque and non-opaque pointers is that LLVM will often optimize opaque-pointer bitcode in a way that would make it ill-typed in a non-opaque-pointer setting. Here is an example of this in action: #include <stdint.h>
#include <stdio.h>
void __attribute__((noinline)) f(uint8_t x[2][2]) {
x[1][0] = 42;
}
int main(void) {
uint8_t arr[2][2] = {{0, 0}, {0, 0}};
f(arr);
printf("%d\n", arr[1][0]);
return 0;
} If you compile this with @.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1
; Function Attrs: argmemonly mustprogress nofree noinline norecurse nosync nounwind willreturn writeonly uwtable
define dso_local void @f(ptr nocapture noundef writeonly %0) local_unnamed_addr #0 {
%2 = getelementptr inbounds [2 x i8], ptr %0, i64 1
store i8 42, ptr %2, align 1, !tbaa !5
ret void
}
; Function Attrs: nofree nounwind uwtable
define dso_local i32 @main() local_unnamed_addr #1 {
%1 = alloca [2 x [2 x i8]], align 4
call void @llvm.lifetime.start.p0(i64 4, ptr nonnull %1) #4
store i32 0, ptr %1, align 4
call void @f(ptr noundef nonnull %1)
%2 = getelementptr inbounds [2 x [2 x i8]], ptr %1, i64 0, i64 1
%3 = load i8, ptr %2, align 2, !tbaa !5
%4 = zext i8 %3 to i32
%5 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull @.str, i32 noundef %4)
call void @llvm.lifetime.end.p0(i64 4, ptr nonnull %1) #4
ret i32 0
} This bitcode is fine on its own, but it interacts strangely with how %1 = alloca [2 x [2 x i8]], align 4
...
store i32 0, ptr %1, align 4 The call to
That is, the second argument's pointee type had to match the type of the first argument. Now let's look at the preceding call to While prototyping a fix for #177, I kept these old typing rules in place, thinking that opaque and non-opaque pointers could fully coexist without issues. But this is not quite true, and the example above proves it. Let's walk through how this example would be typechecked under the previous typing rules:
Why didn't this issue arise before opaque pointers were a thing? To see why, compare the bitcode when the example above is compiled with @.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1
; Function Attrs: mustprogress nofree noinline norecurse nosync nounwind uwtable willreturn writeonly
define dso_local void @f([2 x i8]* nocapture noundef writeonly %0) local_unnamed_addr #0 {
%2 = getelementptr inbounds [2 x i8], [2 x i8]* %0, i64 1, i64 0
store i8 42, i8* %2, align 1, !tbaa !3
ret void
}
; Function Attrs: nofree nounwind uwtable
define dso_local i32 @main() local_unnamed_addr #1 {
%1 = alloca i32, align 4
%2 = bitcast i32* %1 to [2 x [2 x i8]]*
%3 = bitcast i32* %1 to i8*
call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %3) #4
store i32 0, i32* %1, align 4
%4 = bitcast i32* %1 to [2 x i8]*
call void @f([2 x i8]* noundef nonnull %4)
%5 = getelementptr inbounds [2 x [2 x i8]], [2 x [2 x i8]]* %2, i64 0, i64 1, i64 0
%6 = load i8, i8* %5, align 2, !tbaa !3
%7 = zext i8 %6 to i32
%8 = call i32 (i8*, ...) @printf(i8* noundef nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 noundef %7)
call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %3) #4
ret i32 0
} In particular, the first lines of %1 = alloca i32, align 4
%2 = bitcast i32* %1 to [2 x [2 x i8]]*
%3 = bitcast i32* %1 to i8*
...
store i32 0, i32* %1, align 4 This time, LLVM is very careful to have The moral of the story: pointee types are a convenient fiction, and with opaque pointers, LLVM is free to optimize code in a way that doesn't make sense in a non-opaque-pointer setting. What should we do about this? As noted above, we don't want to outright drop support for non-opaque pointers for various reasons. Yet this issue exposes a rather fundamental tension that arises when mixing opaque and non-opaque-pointers. My initial reaction is that the most straightforward solution would be to relax some of the non-opaque-pointer checks that |
This bumps the `llvm-pretty` submodule to bring in the changes to the `Load` data constructor from GaloisInc/llvm-pretty#110 and adapts the code in `llvm-pretty-bc-parser` accordingly. This is necessary in order to `load` from an opaque pointer. See #177. A test case will be added in a subsequent commit.
This bumps the `llvm-pretty` submodule to bring in the changes to the `GEP`/`ConstGEP` data constructors from GaloisInc/llvm-pretty#110 and adapts the code in `llvm-pretty-bc-parser` accordingly. Because `ConstGEP` now stores the basis type for calculations explicitly, I needed to fix #218 in order to ensure that the basis type is always parsed properly. In the process of fixing this issue, I refactored the `parseCeGep` to make the code clearer and more closely mirror the structure of LLVM's own bitcode parser. This is necessary in order to use `getelementptr` on an opaque pointer. See #177. A test case will be added in a subsequent commit.
This adds the bare minimum needed to parse `PtrOpaque` (opaque pointer) types. See #177. Other instructions will need to be tweaked in order to account for the possibility of opaque pointer arguments, but this will happen in subsequent commits.
As explained in the new `Note [Pointers and pointee types]`, we cannot inspect `PtrTo` pointee types if we simultaneously support opaque pointers. The `ptrTo` and `baseType` functions fundamentally rely on this, and as such, they have been removed. They are ultimately used in service of implementing assertions, so removing them is fairly straightforward. See #177. The `elimPtrTo` and `elimPtrTo_` functions also inspect pointee types, but they are required to support old versions of LLVM that do not store the necessary type information in the instructions that need them. In subsequent commits, I will ensure that all uses of `elimPtrTo`/`elimPtrTo_` are appropriately guarded such that they will not be used on modern versions of LLVM bitcode.
See `Note [Pointers and pointee types]` for the rationale. See also #177.
See `Note [Pointers and pointee types]` for the rationale. See also #177.
See `Note [Pointers and pointee types]` for the rationale. See also #177.
Recent versions of the `FUNC_CODE_INST_ATOMICRMW` instruction code directly store the type corresponding to the pointer argument, which avoids the need to pattern-match on the pointer type. This is required to support opaque pointers. See #177. Older versions of the instruction (`FUNC_CODE_INST_ATOMICRMW_OLD`) do not store this type directly, so there were have no choice but to inspect the pointee type using `elimPtrTo`.
Because `llvm-pretty` permits opaque and non-opaque pointers to coexist, it is possible for the first argument of `cmpxchg` to be an opaque pointer and the second argument to be a non-opaque pointer (or vice versa). We don't want to reject such scenarios, so we compare the types of the argument using `eqTypeModuloOpaquePtrs`, a special form of type equality that treats opaque and non-opaque pointers as being the same. See #177. This requires bumping the `llvm-pretty` submodule to bring in the corresponding changes from GaloisInc/llvm-pretty#110.
…ointers This bumps the `llvm-pretty` submodule to bring in the `fixupOpaquePtrs` function from GaloisInc/llvm-pretty#110 and use it in the `disasm-test` test suite. This is needed because we must give pretty-printed `llvm-pretty` ASTs to `llvm-as`, which strictly forbids mixing opaque and non-opaque pointers. See #177.
Recent versions of the `FUNC_CODE_INST_ATOMICRMW` instruction code directly store the type corresponding to the pointer argument, which avoids the need to pattern-match on the pointer type. This is required to support opaque pointers. See #177. Older versions of the instruction (`FUNC_CODE_INST_ATOMICRMW_OLD`) do not store this type directly, so there were have no choice but to inspect the pointee type using `elimPtrTo`.
Because `llvm-pretty` permits opaque and non-opaque pointers to coexist, it is possible for the first argument of `cmpxchg` to be an opaque pointer and the second argument to be a non-opaque pointer (or vice versa). We don't want to reject such scenarios, so we compare the types of the argument using `eqTypeModuloOpaquePtrs`, a special form of type equality that treats opaque and non-opaque pointers as being the same. See #177. This requires bumping the `llvm-pretty` submodule to bring in the corresponding changes from GaloisInc/llvm-pretty#110.
…ointers This bumps the `llvm-pretty` submodule to bring in the `fixupOpaquePtrs` function from GaloisInc/llvm-pretty#110 and use it in the `disasm-test` test suite. This is needed because we must give pretty-printed `llvm-pretty` ASTs to `llvm-as`, which strictly forbids mixing opaque and non-opaque pointers. See #177.
As explained in the new `Note [Pointers and pointee types]`, we cannot inspect `PtrTo` pointee types if we simultaneously support opaque pointers. The `ptrTo` and `baseType` functions fundamentally rely on this, and as such, they have been removed. They are ultimately used in service of implementing assertions, so removing them is fairly straightforward. See #177. The `elimPtrTo` and `elimPtrTo_` functions also inspect pointee types, but they are required to support old versions of LLVM that do not store the necessary type information in the instructions that need them. In subsequent commits, I will ensure that all uses of `elimPtrTo`/`elimPtrTo_` are appropriately guarded such that they will not be used on modern versions of LLVM bitcode.
See `Note [Pointers and pointee types]` for the rationale. See also #177.
Recent versions of the `FUNC_CODE_INST_ATOMICRMW` instruction code directly store the type corresponding to the pointer argument, which avoids the need to pattern-match on the pointer type. This is required to support opaque pointers. See #177. Older versions of the instruction (`FUNC_CODE_INST_ATOMICRMW_OLD`) do not store this type directly, so there were have no choice but to inspect the pointee type using `elimPtrTo`.
Because `llvm-pretty` permits opaque and non-opaque pointers to coexist, it is possible for the first argument of `cmpxchg` to be an opaque pointer and the second argument to be a non-opaque pointer (or vice versa). We don't want to reject such scenarios, so we compare the types of the argument using `eqTypeModuloOpaquePtrs`, a special form of type equality that treats opaque and non-opaque pointers as being the same. See #177. This requires bumping the `llvm-pretty` submodule to bring in the corresponding changes from GaloisInc/llvm-pretty#110.
…ointers This bumps the `llvm-pretty` submodule to bring in the `fixupOpaquePtrs` function from GaloisInc/llvm-pretty#110 and use it in the `disasm-test` test suite. This is needed because we must give pretty-printed `llvm-pretty` ASTs to `llvm-as`, which strictly forbids mixing opaque and non-opaque pointers. See #177.
This bumps the `llvm-pretty` submodule to bring in the changes to the `Load` data constructor from GaloisInc/llvm-pretty#110 and adapts the code in `llvm-pretty-bc-parser` accordingly. This is necessary in order to `load` from an opaque pointer. See #177. A test case will be added in a subsequent commit.
This bumps the `llvm-pretty` submodule to bring in the changes to the `GEP`/`ConstGEP` data constructors from GaloisInc/llvm-pretty#110 and adapts the code in `llvm-pretty-bc-parser` accordingly. Because `ConstGEP` now stores the basis type for calculations explicitly, I needed to fix #218 in order to ensure that the basis type is always parsed properly. In the process of fixing this issue, I refactored the `parseCeGep` to make the code clearer and more closely mirror the structure of LLVM's own bitcode parser. This is necessary in order to use `getelementptr` on an opaque pointer. See #177. A test case will be added in a subsequent commit.
This adds the bare minimum needed to parse `PtrOpaque` (opaque pointer) types. See #177. Other instructions will need to be tweaked in order to account for the possibility of opaque pointer arguments, but this will happen in subsequent commits.
As explained in the new `Note [Pointers and pointee types]`, we cannot inspect `PtrTo` pointee types if we simultaneously support opaque pointers. The `ptrTo` and `baseType` functions fundamentally rely on this, and as such, they have been removed. They are ultimately used in service of implementing assertions, so removing them is fairly straightforward. See #177. The `elimPtrTo` and `elimPtrTo_` functions also inspect pointee types, but they are required to support old versions of LLVM that do not store the necessary type information in the instructions that need them. In subsequent commits, I will ensure that all uses of `elimPtrTo`/`elimPtrTo_` are appropriately guarded such that they will not be used on modern versions of LLVM bitcode.
See `Note [Pointers and pointee types]` for the rationale. See also #177.
Recent versions of the `FUNC_CODE_INST_ATOMICRMW` instruction code directly store the type corresponding to the pointer argument, which avoids the need to pattern-match on the pointer type. This is required to support opaque pointers. See #177. Older versions of the instruction (`FUNC_CODE_INST_ATOMICRMW_OLD`) do not store this type directly, so there were have no choice but to inspect the pointee type using `elimPtrTo`.
Because `llvm-pretty` permits opaque and non-opaque pointers to coexist, it is possible for the first argument of `cmpxchg` to be an opaque pointer and the second argument to be a non-opaque pointer (or vice versa). We don't want to reject such scenarios, so we compare the types of the argument using `eqTypeModuloOpaquePtrs`, a special form of type equality that treats opaque and non-opaque pointers as being the same. See #177. This requires bumping the `llvm-pretty` submodule to bring in the corresponding changes from GaloisInc/llvm-pretty#110.
…ointers This bumps the `llvm-pretty` submodule to bring in the `fixupOpaquePtrs` function from GaloisInc/llvm-pretty#110 and use it in the `disasm-test` test suite. This is needed because we must give pretty-printed `llvm-pretty` ASTs to `llvm-as`, which strictly forbids mixing opaque and non-opaque pointers. See #177.
This bumps the `llvm-pretty` submodule to bring in the changes to the `Load` data constructor from GaloisInc/llvm-pretty#110 and adapts the code in `llvm-pretty-bc-parser` accordingly. This is necessary in order to `load` from an opaque pointer. See #177. A test case will be added in a subsequent commit.
This bumps the `llvm-pretty` submodule to bring in the changes to the `GEP`/`ConstGEP` data constructors from GaloisInc/llvm-pretty#110 and adapts the code in `llvm-pretty-bc-parser` accordingly. Because `ConstGEP` now stores the basis type for calculations explicitly, I needed to fix #218 in order to ensure that the basis type is always parsed properly. In the process of fixing this issue, I refactored the `parseCeGep` to make the code clearer and more closely mirror the structure of LLVM's own bitcode parser. This is necessary in order to use `getelementptr` on an opaque pointer. See #177. A test case will be added in a subsequent commit.
This adds the bare minimum needed to parse `PtrOpaque` (opaque pointer) types. See #177. Other instructions will need to be tweaked in order to account for the possibility of opaque pointer arguments, but this will happen in subsequent commits.
As explained in the new `Note [Pointers and pointee types]`, we cannot inspect `PtrTo` pointee types if we simultaneously support opaque pointers. The `ptrTo` and `baseType` functions fundamentally rely on this, and as such, they have been removed. They are ultimately used in service of implementing assertions, so removing them is fairly straightforward. See #177. The `elimPtrTo` and `elimPtrTo_` functions also inspect pointee types, but they are required to support old versions of LLVM that do not store the necessary type information in the instructions that need them. In subsequent commits, I will ensure that all uses of `elimPtrTo`/`elimPtrTo_` are appropriately guarded such that they will not be used on modern versions of LLVM bitcode.
See `Note [Pointers and pointee types]` for the rationale. See also #177.
Recent versions of the `FUNC_CODE_INST_ATOMICRMW` instruction code directly store the type corresponding to the pointer argument, which avoids the need to pattern-match on the pointer type. This is required to support opaque pointers. See #177. Older versions of the instruction (`FUNC_CODE_INST_ATOMICRMW_OLD`) do not store this type directly, so there were have no choice but to inspect the pointee type using `elimPtrTo`.
Because `llvm-pretty` permits opaque and non-opaque pointers to coexist, it is possible for the first argument of `cmpxchg` to be an opaque pointer and the second argument to be a non-opaque pointer (or vice versa). We don't want to reject such scenarios, so we compare the types of the argument using `eqTypeModuloOpaquePtrs`, a special form of type equality that treats opaque and non-opaque pointers as being the same. See #177. This requires bumping the `llvm-pretty` submodule to bring in the corresponding changes from GaloisInc/llvm-pretty#110.
…ointers This bumps the `llvm-pretty` submodule to bring in the `fixupOpaquePtrs` function from GaloisInc/llvm-pretty#110 and use it in the `disasm-test` test suite. This is needed because we must give pretty-printed `llvm-pretty` ASTs to `llvm-as`, which strictly forbids mixing opaque and non-opaque pointers. See #177.
LLVM is in the process of migrating all of its pointer types from
i8*
,i32*
, etc. to an opaqueptr
type, as described here. LLVM 13 takes an important step in that direction, as it is the first LLVM release to includeptr
in the LLVM AST. See llvm/llvm-project@2155dc5.Note, however, that ordinary
clang
invocations do not yet produce LLVM ASTs that useptr
yet. Becauseptr
is still an experimental feature, one must explicitly opt in to it. Here is one way to do this:llvm-pretty-bc-parser
is currently not equipped to handle this, unsurprisingly:While opaque pointers are still a work in progress, we may want to add basic support for parsing them in the meantime.
The text was updated successfully, but these errors were encountered: