Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate codegen to operate on orc::ThreadSafeModule #44440

Merged
merged 7 commits into from
Apr 1, 2022

Conversation

pchintalapudi
Copy link
Member

@pchintalapudi pchintalapudi commented Mar 4, 2022

orc::ThreadSafeModule and orc::ThreadSafeContext use reference-counted contexts to allow multiple contexts to be created and destroyed safely during codegen. Moving TSModule to be our API boundary gets us closer to parallelizing codegen without leaking contexts.

Depends on #43770 to remove the global debug information with associated global context #44454 to refactor a few globals.

@pchintalapudi pchintalapudi marked this pull request as draft March 4, 2022 02:45
@pchintalapudi pchintalapudi added the compiler:codegen Generation of LLVM IR and native code label Mar 4, 2022
@pchintalapudi
Copy link
Member Author

There's a few indentation TODOs left in this PR, which I've deliberately left out for now as they cause large whitespace diffs that confuse the diff viewer.

@pchintalapudi
Copy link
Member Author

Per an offline discussion with Valentin, we will change the interface of jl_create_native to accept a TSModule to codegen into rather than creating its own TSModule with a provided TSContext. This allows users of jl_create_native to specify their own data layout and targets for the destination module.

src/aotcompile.cpp Show resolved Hide resolved
@@ -132,16 +132,6 @@ GlobalValue* jl_get_llvm_function_impl(void *native_code, uint32_t idx)
return NULL;
}

extern "C" JL_DLLEXPORT
LLVMContext* jl_get_llvm_context_impl(void *native_code)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Victory

src/aotcompile.cpp Show resolved Hide resolved
src/aotcompile.cpp Outdated Show resolved Hide resolved
src/aotcompile.cpp Outdated Show resolved Hide resolved
@vtjnash vtjnash added the status:merge me PR is reviewed. Merge when all tests are passing label Mar 31, 2022
@giordano giordano merged commit 4422a1d into JuliaLang:master Apr 1, 2022
@giordano giordano removed the status:merge me PR is reviewed. Merge when all tests are passing label Apr 1, 2022
@aviatesk
Copy link
Sponsor Member

@gbaraldi bisected that this commit introduced the error reported at #45302.

@pchintalapudi
Copy link
Member Author

That's interesting, I wouldn't be surprised that there's a hidden bug given the size of the changeset but the changes to jl_merge_module aren't actually meaningful.

@pchintalapudi
Copy link
Member Author

Looks like at least the error in jl_merge_module during LoopVectorization tests is:

Attempting to replace ; Function Attrs: alwaysinline
declare linkonce_odr i32 @"julia_readraw_turbo!_14765u14814"(i32, i32) #4

with global function  ; Function Attrs: alwaysinline
define linkonce_odr i32 @"julia_readraw_turbo!_14765u14814"(i32 %0, i32 %1, i32 %2) #0 {
top:
  %res = call i32 @llvm.x86.bmi.bzhi.32(i32 %0, i32 %1)
  ret i32 %res
}

@pchintalapudi
Copy link
Member Author

And another one:

Attempting to replace ; Function Attrs: alwaysinline
declare linkonce_odr i32 @"julia_rshift_i_avx!_47472u47499"(i32, i32) #3

with global function  ; Function Attrs: alwaysinline
define linkonce_odr i32 @"julia_rshift_i_avx!_47472u47499"(i32 %0, i32 %1, i32 %2) #0 {
top:
  %res = call i32 @llvm.x86.bmi.bzhi.32(i32 %0, i32 %1)
  ret i32 %res
}

Looks like there's a common x86.bmi.bzhi intrinsic that's causing some issues between function definition/declaration

@pchintalapudi
Copy link
Member Author

Reduced to

using LoopVectorization
img1 = Matrix{UInt8}(undef, 10, 10)
raw = rand(UInt8, (30 * 10) ÷ 4)
function readraw_turbo!(img, raw)
         npack = length(raw) ÷ 3
         @turbo for i = 0:npack-1
           img[1+4i] = raw[2+3i] << 4
           img[2+4i] = raw[1+3i]
           img[3+4i] = raw[2+3i]
           img[4+4i] = raw[3+3i]
         end
         img
       end
readraw_turbo!(img1,raw)

@pchintalapudi
Copy link
Member Author

And again to

using LoopVectorization;img1 = Matrix{UInt8}(undef, 10, 10);raw = rand(UInt8, (30 * 10) ÷ 4);function readraw_turbo!(img, raw)
         npack = length(raw) ÷ 3
         @turbo for i = 0:npack-1
           img[1+4i] = raw[2+3i] << 4
         end
         img
       end;readraw_turbo!(img1,raw)

@pchintalapudi
Copy link
Member Author

I am now slightly suspicious of this line: https://github.com/JuliaSIMD/VectorizationBase.jl/blob/9ae47c5c12c7bec9b57c132c24f67c4f14090c9e/src/llvm_intrin/masks.jl#L336

llvmcall_expr(decl, instrs, T, :(Tuple{$T,$T}), typ, [typ, typ, typ], [:a, :b])

I'm not sure, but I'm guessing that the list should only have 2 typ, not 3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:codegen Generation of LLVM IR and native code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants