Cannot bit shift vec of bool #53
Comments
Booleans do not support bitshift operations. When you call
You can convert the vector yourself, choosing the target integer size:
f8(x::Vec{4, Bool}) = convert(Vec{4, Int8}, x) << Vec((0x01,0x02,0x03,0x04))
f16(x::Vec{4, Bool}) = convert(Vec{4, Int16}, x) << Vec((0x01,0x02,0x03,0x04))
The first (using Int8) generates:
define { <4 x i8> } @julia_f8_16841({ <4 x i8> } addrspace(11)* nocapture nonnull readonly dereferenceable(4)) {
top:
%1 = getelementptr inbounds { <4 x i8> }, { <4 x i8> } addrspace(11)* %0, i64 0, i32 0
%2 = load <4 x i8>, <4 x i8> addrspace(11)* %1, align 1
%3 = and <4 x i8> %2, <i8 1, i8 1, i8 1, i8 1>
%tmp.i = shl <4 x i8> %3, <i8 1, i8 2, i8 3, i8 4>
%.fca.0.insert = insertvalue { <4 x i8> } undef, <4 x i8> %tmp.i, 0
ret { <4 x i8> } %.fca.0.insert
}
The second (using Int16) generates:
define { <4 x i16> } @julia_f16_16842({ <4 x i8> } addrspace(11)* nocapture nonnull readonly dereferenceable(4)) {
%1 = getelementptr inbounds { <4 x i8> }, { <4 x i8> } addrspace(11)* %0, i64 0, i32 0, i64 0
%2 = load i8, i8 addrspace(11)* %1, align 1
%3 = zext i8 %2 to i16
%4 = getelementptr inbounds { <4 x i8> }, { <4 x i8> } addrspace(11)* %0, i64 0, i32 0, i64 1
%5 = load i8, i8 addrspace(11)* %4, align 1
%6 = zext i8 %5 to i16
%7 = getelementptr inbounds { <4 x i8> }, { <4 x i8> } addrspace(11)* %0, i64 0, i32 0, i64 2
%8 = load i8, i8 addrspace(11)* %7, align 1
%9 = zext i8 %8 to i16
%10 = getelementptr inbounds { <4 x i8> }, { <4 x i8> } addrspace(11)* %0, i64 0, i32 0, i64 3
%11 = load i8, i8 addrspace(11)* %10, align 1
%12 = zext i8 %11 to i16
%13 = insertelement <4 x i16> undef, i16 %3, i32 0
%14 = insertelement <4 x i16> %13, i16 %6, i32 1
%15 = insertelement <4 x i16> %14, i16 %9, i32 2
%16 = insertelement <4 x i16> %15, i16 %12, i32 3
%tmp.i = shl <4 x i16> %16, <i16 1, i16 2, i16 3, i16 4>
%.fca.0.insert = insertvalue { <4 x i16> } undef, <4 x i16> %tmp.i, 0
ret { <4 x i16> } %.fca.0.insert
}
One could argue that SIMD should auto-convert
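The explicit-conversion approach can be illustrated without SIMD.jl, using plain tuples in place of vectors. This is only a scalar sketch of the per-lane semantics, not how SIMD.jl implements it; `shift_bools` is a hypothetical helper introduced here for illustration.

```julia
# Hypothetical helper (not part of SIMD.jl): widen each Bool to the chosen
# integer type T, then shift it left by the matching per-lane amount.
shift_bools(::Type{T}, bools::NTuple{N, Bool}, shifts::NTuple{N, UInt8}) where {T<:Integer, N} =
    map((b, s) -> T(b) << s, bools, shifts)

bools  = (true, false, true, true)
shifts = (0x01, 0x02, 0x03, 0x04)

r8  = shift_bools(Int8, bools, shifts)   # elements are Int8
r16 = shift_bools(Int16, bools, shifts)  # elements are Int16
```

As with `f8` and `f16`, the caller picks the result width explicitly, so there is no ambiguity about how many bits are available before a shifted bit falls off the end.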
I understand why explicit is better than implicit here. Fortunately I am using UInt8 and in my updated example the codegen looks very good. Thank you for the explanation.
You should be able to shave off the
Yeah, I tried the
julia> code_llvm(f8, Tuple{Vec{4, Bool}}; debuginfo=:none)
define { <4 x i8> } @julia_f8_17139({ <4 x i8> } addrspace(11)* nocapture nonnull readonly dereferenceable(4)) {
top:
%1 = getelementptr inbounds { <4 x i8> }, { <4 x i8> } addrspace(11)* %0, i64 0, i32 0
%2 = load <4 x i8>, <4 x i8> addrspace(11)* %1, align 4
%3 = shl <4 x i8> %2, <i8 1, i8 2, i8 3, i8 4>
%.fca.0.insert = insertvalue { <4 x i8> } undef, <4 x i8> %3, 0
ret { <4 x i8> } %.fca.0.insert
}
julia> code_llvm(f16, Tuple{Vec{4, Bool}}; debuginfo=:none)
define { <4 x i16> } @julia_f16_17156({ <4 x i8> } addrspace(11)* nocapture nonnull readonly dereferenceable(4)) {
top:
%1 = getelementptr inbounds { <4 x i8> }, { <4 x i8> } addrspace(11)* %0, i64 0, i32 0
%2 = load <4 x i8>, <4 x i8> addrspace(11)* %1, align 4
%3 = sext <4 x i8> %2 to <4 x i16>
%4 = shl <4 x i16> %3, <i16 1, i16 2, i16 3, i16 4>
%.fca.0.insert = insertvalue { <4 x i16> } undef, <4 x i16> %4, 0
ret { <4 x i16> } %.fca.0.insert
}
Julia scalar:
SIMD.jl:
I have tried this with all Int types and it does not seem to work. It does however work when converting, but this generates less than ideal code.
I am just starting to look at the code here and I do not know if this is related:
SIMD.jl/src/SIMD.jl
Line 227 in 1245797
It seems that the shift operator calls this llvmconst, where it is initialized as i1 rather than i8, whereas Julia uses i8 for Bool.
SIMD.jl/src/SIMD.jl
Lines 266 to 270 in 1245797
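The point about Julia using i8 for Bool can be checked from plain Julia, independent of SIMD.jl. A minimal sketch using only Base functions:

```julia
# Bool occupies one full byte in Julia, which matches the i8 lanes
# seen in the IR for Vec{4, Bool}.
bool_size  = sizeof(Bool)                # size in bytes
true_bits  = reinterpret(UInt8, true)    # raw byte behind `true`
false_bits = reinterpret(UInt8, false)   # raw byte behind `false`
```

Here `reinterpret` exposes the stored byte without any conversion, so a Bool already has the same width as a UInt8.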