Transpiler having troubles with broadcast #41

Closed
ChrisRackauckas opened this issue Aug 8, 2017 · 4 comments

Comments

@ChrisRackauckas
Member

ChrisRackauckas commented Aug 8, 2017

using GPUArrays
uprev = GPUArray(rand(Float32, 32, 32))
k1 = GPUArray(rand(Float32, 32, 32))
k2 = GPUArray(rand(Float32, 32, 32))
k3 = GPUArray(rand(Float32, 32, 32))
k4 = GPUArray(rand(Float32, 32, 32))
dt = 1.2f0
b1 = 1.3f0
b2 = 1.4f0
b3 = 1.5f0
b4 = 1.6f0
utilde = similar(uprev)
@. utilde = uprev + dt*(b1*k1 + b2*k2 + b3*k3 + b4*k4)

This is with the OpenCL backend.

@ChrisRackauckas
Member Author

ChrisRackauckas commented Aug 8, 2017

Here's another expression where it has trouble:

@. u = uprev + dt*duprev + dt^2*(1//2*ku)

where dt is a scalar and the rest are matching GPUArrays.

@SimonDanisch
Member

The first one is pretty weird...
Everything seems to work fine, but the return type of the broadcast lambda doesn't seem to propagate.

using Transpiler, Sugar, GPUArrays
lambda = getfield(Main, Symbol("##7#8")) # taken from backtrace
m = Transpiler.CLMethod((GPUArrays.broadcast_kernel!, Tuple{
    Float32, lambda,
    Transpiler.CLIntrinsics.CLArray{Float32,2}, Tuple{UInt32,UInt32}, UInt32,
    Tuple{GPUArrays.BroadcastDescriptorN{Array,2},GPUArrays.BroadcastDescriptorN{Any,0},
    GPUArrays.BroadcastDescriptorN{Any,0},GPUArrays.BroadcastDescriptorN{Array,2},
    GPUArrays.BroadcastDescriptorN{Any,0},GPUArrays.BroadcastDescriptorN{Array,2},
    GPUArrays.BroadcastDescriptorN{Any,0},GPUArrays.BroadcastDescriptorN{Array,2},
    GPUArrays.BroadcastDescriptorN{Any,0},GPUArrays.BroadcastDescriptorN{Array,2}},
    Transpiler.CLIntrinsics.CLArray{Float32,2},
    Float32,
    Float32,
    Transpiler.CLIntrinsics.CLArray{Float32,2},
    Float32,
    Transpiler.CLIntrinsics.CLArray{Float32,2},
    Float32, Transpiler.CLIntrinsics.CLArray{Float32,2},
    Float32, Transpiler.CLIntrinsics.CLArray{Float32,2}
}))

ci = code_typed(m.signature..., optimize=false)[1][1]
println(ci.code)

Variables:
  #self#::GPUArrays.#broadcast_kernel!
  state::Float32
  func::##7#8
  B::Transpiler.CLIntrinsics.CLArray{Float32,2}
  shape::Tuple{UInt32,UInt32}
  len::UInt32
  descriptor::Tuple{GPUArrays.BroadcastDescriptorN{Array,2},GPUArrays.BroadcastDescriptorN{Any,0},GPUArrays.BroadcastDescriptorN{Any,0},GPUArrays.BroadcastDescriptorN{Array,2},GPUArrays.BroadcastDescriptorN{Any,0},GPUArrays.BroadcastDescriptorN{Array,2},GPUArrays.BroadcastDescriptorN{Any,0},GPUArrays.BroadcastDescriptorN{Array,2},GPUArrays.BroadcastDescriptorN{Any,0},GPUArrays.BroadcastDescriptorN{Array,2}}
  A_1::Transpiler.CLIntrinsics.CLArray{Float32,2}
  A_2::Float32
  A_3::Float32
  A_4::Transpiler.CLIntrinsics.CLArray{Float32,2}
  A_5::Float32
  A_6::Transpiler.CLIntrinsics.CLArray{Float32,2}
  A_7::Float32
  A_8::Transpiler.CLIntrinsics.CLArray{Float32,2}
  A_9::Float32
  A_10::Transpiler.CLIntrinsics.CLArray{Float32,2}
  ilin::UInt32

Body:
  begin 
      SSAValue(2) = $(Expr(:invoke, MethodInstance for ret(::Type{UInt32}), :(Transpiler.CLIntrinsics.ret), :(Transpiler.CLIntrinsics.Cuint)))

      SSAValue(1) = (Base.checked_trunc_uint)(UInt32, 1)::UInt32
      ilin::UInt32 = (Base.add_int)(SSAValue(2), SSAValue(1))::UInt32 # line 149:
      unless (Base.ule_int)(ilin::UInt32, len::UInt32)::Bool goto 15 # line 150:

      SSAValue(0) = $(Expr(:invoke, MethodInstance for apply_broadcast(::UInt32, ::Float32, ::##7#8, ::Tuple{UInt32,UInt32}, ::UInt32, ::Tuple{GPUArrays.BroadcastDescriptorN{Array,2},GPUArrays.BroadcastDescriptorN{Any,0},GPUArrays.BroadcastDescriptorN{Any,0},GPUArrays.BroadcastDescriptorN{Array,2},GPUArrays.BroadcastDescriptorN{Any,0},GPUArrays.BroadcastDescriptorN{Array,2},GPUArrays.BroadcastDescriptorN{Any,0},GPUArrays.BroadcastDescriptorN{Array,2},GPUArrays.BroadcastDescriptorN{Any,0},GPUArrays.BroadcastDescriptorN{Array,2}}, ::Transpiler.CLIntrinsics.CLArray{Float32,2}, ::Float32, ::Float32, ::Transpiler.CLIntrinsics.CLArray{Float32,2}, ::Float32, ::Transpiler.CLIntrinsics.CLArray{Float32,2}, ::Float32, ::Transpiler.CLIntrinsics.CLArray{Float32,2}, ::Float32, ::Transpiler.CLIntrinsics.CLArray{Float32,2}), :(GPUArrays.apply_broadcast), :(ilin), :(state), :(func), :(shape), :(len), :(descriptor), :(A_1), :(A_2), :(A_3), :(A_4), :(A_5), :(A_6), :(A_7), :(A_8), :(A_9), :(A_10)))

      (GPUArrays.setindex!)(B::Transpiler.CLIntrinsics.CLArray{Float32,2}, SSAValue(0), ilin::UInt32)::ANY
      return
  end::Void

You can see that everything infers nicely, but:

call = ci.code[8].args[2]
m2 = Transpiler.CLMethod((GPUArrays.apply_broadcast,
    Tuple{map(x-> Sugar.expr_type(m, x), call.args[2:end])...}
))
Base.Core.Inference.return_type(m2.signature...) == Float32
# BUT:
Sugar.expr_type(m, SSAValue(0)) == Any # The variable that gets the call assigned

I have no idea why SSAValue(0) is inferred as Any.
I need to figure out whether that's a problem with GPUArrays, or maybe even a corner case in type inference.
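
For reference, the same kind of return-type check can be sketched on a current Julia version without Transpiler or Sugar. Here `f` is a hypothetical stand-in for the broadcast lambda of the first example (the real one is the generated `##7#8`), and `Base.return_types` is used in place of the `Core.Inference` call above. This only checks what Julia infers for the scalar function; it is not a reproduction of the `SSAValue(0)` problem.

```julia
# Hypothetical stand-in for the lambda that `@.` generates in the first example.
# The argument names mirror the variables from the original snippet.
f(uprev, dt, b1, k1, b2, k2, b3, k3, b4, k4) =
    uprev + dt * (b1 * k1 + b2 * k2 + b3 * k3 + b4 * k4)

# Per element, every argument is a Float32 on the device.
argtypes = ntuple(_ -> Float32, 10)

# Ask inference for the possible return types of `f` with these argument types.
inferred = Base.return_types(f, argtypes)
println(inferred)
```

If the scalar function infers to a concrete type here but the value assigned from the call is still `Any` in the kernel, the problem is more likely in how the call is wrapped than in the lambda itself.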

The second one is a classic breakdown of Transpiler.jl and can be worked around with:

u .= uprev .+ dt .* duprev .+ Float32(dt ^ 2) .* (Float32(Cint(1)//Cint(2)) .* ku)

I will see if I can fix that in Transpiler in the next few days!
Btw, you should do these promotions/conversions before the broadcast call anyway, so that they don't happen in the hot loop.
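
A minimal sketch of that hoisting, assuming plain CPU `Array`s as stand-ins for the GPUArrays so it runs without a GPU backend. The names `dt2` and `half` are made up for illustration; the other names come from the expression above.

```julia
# Plain Arrays stand in for GPUArrays here (an assumption, for runnability).
uprev  = rand(Float32, 32, 32)
duprev = rand(Float32, 32, 32)
ku     = rand(Float32, 32, 32)
u      = similar(uprev)
dt     = 1.2f0

# Compute the scalar coefficients once, as Float32, before broadcasting.
dt2  = dt^2            # stays Float32 because dt is Float32
half = Float32(1//2)   # Rational converted to Float32 up front

# The fused kernel now only sees Float32 scalars and Float32 arrays.
@. u = uprev + dt * duprev + dt2 * (half * ku)
```

With the scalars prepared this way, no `Rational` or integer power appears inside the broadcast expression, so nothing has to be converted per element.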

@SimonDanisch
Member

Okay, the second one should be fixed, while I'm pretty sure that the first one is JuliaLang/julia#22255.

@maleadt
Member

maleadt commented Jan 28, 2020

Transpiler is not maintained anymore.

@maleadt maleadt closed this as completed Jan 28, 2020