Refactor package into one part dealing with LLVM and one part that builds a Vec on top of that #63
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##           master      #63      +/-   ##
==========================================
+ Coverage   86.02%   88.61%   +2.58%
==========================================
  Files           1        4       +3
  Lines         866      404     -462
==========================================
- Hits          745      358     -387
+ Misses        121       46      -75
```

Continue to review full report at Codecov.
This is fantastic. I started a similar refactor twice, and those stalled out for lack of time!

Made some comments inline; the pattern `@eval @generated` can just be replaced with `@eval`, with the spliced-in piece being hoisted.

I'm unsure about this, but I feel the burden would be on me to provide the test cases where …
Does this mean that Julia 1.4 will be required for these changes? I know many people who still use Julia 1.2, and in particular, if there is also an API change, being backward compatible across multiple Julia versions would be a benefit.
I forgot the most important comment: thanks for the work! I agree that this is the right direction.
The only thing that requires 1.4 is the new horizontal reduction intrinsics, I think. So if we conditionally use the old hand-rolled one, it should be possible to support pre-1.4. On the other hand, it isn't as if the package was getting much development (the last functional change was about a year ago), so those who use it on older Julia versions are probably just happy to keep using it as is and not too keen on upgrading packages at all.
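As a rough sketch of what such a conditional fallback could look like (all names here are hypothetical, not SIMD.jl's actual code — on 1.4+ the `@static` branch would call the LLVM reduction intrinsic via `llvmcall`; the hand-rolled reducer stands in here to keep the sketch self-contained):

```julia
# Pairwise ("hand-rolled") horizontal sum, usable on Julia versions that
# lack the llvm.experimental.vector.reduce.* intrinsics (pre-1.4).
function reduce_add(v::NTuple{N,T}) where {N,T}
    N == 1 && return v[1]
    h = N ÷ 2
    lo = ntuple(i -> v[i], h)
    hi = ntuple(i -> v[h + i], N - h)
    return reduce_add(lo) + reduce_add(hi)
end

# Version gate: pick the implementation at load time.
@static if VERSION >= v"1.4"
    vsum(v::NTuple) = reduce_add(v)  # placeholder for the intrinsic path
else
    vsum(v::NTuple) = reduce_add(v)
end
```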
Regarding the indexing API, what about …? That's the API I settled on for LoopVectorization's code generation, since it means I can generate the same code largely independent of the broader context (when it comes to loads/stores) and have multiple dispatch make it do the right thing. Perhaps that is less desirable for code people are actively writing, because they will be more intentional about what they're doing; they're likely to know whether an index will be a scalar, …
@KristofferC In principle yes, but given that Julia 1.4 isn't released yet, that would be a bit harsh. It would be nice (if it isn't too much effort) to give people a bit of time to upgrade (in the sense that Julia and SIMD can be upgraded separately).
@chriselrod I modelled the …

I need to fix the "codegen" for e.g. …, and in some cases methods that are defined on … Should the … just be considered an …?
In SIMD code, changing the vector width is often expensive. Some systems therefore have different bool sizes, such as …

I don't know whether or how we can make LLVM generate efficient code here. I think the best would be to represent booleans as … I recommend against automatic conversion to … We could define a type …
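A minimal sketch of the kind of representation being discussed, in plain tuples — booleans stored at full lane width, all-bits-set for true (x86 `sext` style), so compares and blends stay at the same vector width. All names (`truemask`, `tomask`, `vblend`) are hypothetical illustrations, not SIMD.jl API:

```julia
# All-ones lane for "true", e.g. 0xff for UInt8.
truemask(::Type{T}) where {T<:Integer} = ~zero(T)

# Widen a tuple of Bools into full-width mask lanes.
tomask(b::NTuple{N,Bool}, ::Type{T}) where {N,T<:Integer} =
    ntuple(i -> b[i] ? truemask(T) : zero(T), N)

# Blend: pick lanes of x where the mask is set, else lanes of y,
# using only bitwise ops (no per-lane branch).
vblend(m::NTuple{N,T}, x::NTuple{N,T}, y::NTuple{N,T}) where {N,T} =
    ntuple(i -> (x[i] & m[i]) | (y[i] & ~m[i]), N)
```

The bitwise blend is the payoff of the all-bits convention: selection never leaves the vector domain.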
Yeah, agreed. FWIW, I ran the tests on https://github.com/KristofferC/Tensors.jl (which uses SIMD.jl) and they passed. If anyone else is actually using SIMD.jl in their code, it would be nice if you could run your tests with this PR to see that it is non-breaking.
This PR pretty much rewrites the package from scratch (with the exception of some of the indexing implemented by tkf) while keeping the API intact. The reason for this is that I felt that the code could gain a lot of clarity by clearly separating the parts that deal with LLVM/`llvmcall` and then building a `Vec` on top of that. The number of lines of code has also been reduced from ~1600 to ~1000.

The code is structured as follows:

- `LLVM_Intrinsics.jl` is pretty much a direct mapping of Julia vectors (`NTuple{N, VecElement{T}}`) to the operators and intrinsics defined in https://llvm.org/docs/LangRef.html. It contains almost no higher-level logic.
- `simdvec.jl` contains the `Vec` (wrapping the tuple of `VecElement`s) with definitions defined on it that map to the intrinsics defined in `LLVM.jl`. In some cases this is pretty automatic, but some cases require some logic (like in the bitshifts, partly to avoid undefined behavior, or in the different conversions).
- `arrayops.jl` is the stuff that deals with Julia `Array`s, like `vload`, `vstore`, `vgather`.

Things that have been added to the API:

- The `count_ones`, `count_zeros`, `leading_ones`, `leading_zeros`, `trailing_ones`, `trailing_zeros` family of functions.
- Type conversions and different kinds of reinterprets from scalar to vector and back, and between vectors of different size:

```jl
julia> v = Vec((Int32(2), Int32(4)))
<2 x Int32>[2, 4]

julia> reinterpret(Int64, v)
17179869186

julia> reinterpret(Vec{4, Int16}, v)
<4 x Int16>[2, 0, 4, 0]

julia> reinterpret(Vec{2, Int32}, 4)
<2 x Int32>[4, 0]

julia> convert(Vec{2, Float32}, v)
<2 x Float32>[2.0, 4.0]
```

- Uses the LLVM vector reduction intrinsics (https://llvm.org/docs/LangRef.html#experimental-vector-reduction-intrinsics) instead of a hand-rolled reducer.

Things that have been removed from the API:

- Removed the `Val` arguments from many functions (`setindex`, `>>`, etc.). Julia's constant propagation plus LLVM's optimizations are enough for these not to be needed; things are specialized on the constant just as well as if using `Val`.
- Removed the `Val{}` arguments and just use `Val()` consistently everywhere.
- Removed `exp10`. This used to just call `10^v`, but the reason you would use `exp10` is that there is a more efficient implementation for it than the naive one. I feel that providing `exp10` gives the false impression that it provides a benefit over the naive version.

Co-Authored-By: Valentin Churavy <[email protected]>
fixup: fix supported element types
From my point of view, it is almost ready to be merged. The last feature I added runs into a bug on x86 (https://ci.appveyor.com/project/eschnett/simd-jl/builds/30989111/job/p0g8uhmkamy7ut8n#L56), which is similar to JuliaLang/julia#29447. I think Julia needs to ship compiler-rt for it to work, so I will just disable that feature (the overflow intrinsics) on x86. This PR requires Julia 1.4 (not 1.5), but the overflow intrinsics are only available on 1.5. Regarding wanting to support previous Julia versions, one often splits out a …
The error we get is "LLVM ERROR: Symbols not found: { __mulodi4 }", which seems like it would require compiler-rt support.
But before we tag, I want to go through the existing issues and see which ones are no longer relevant or which ones can be easily fixed.
From my point of view, this could be merged whenever. I still have a few tests to add (like making sure there are no spurious bounds checks inside …).
Just read through it again. LGTM
Super cool!
```jl
###########
# Bitcast #
###########
```
We also need a trunc-and-bitcast instruction to represent `vpmovmskb`. That would take `<n x i8>`, truncate it to `<n x i1>`, zero-pad it, and cast it to e.g. `i32` (or avx2). This can AFAIK not be reproduced with the other intrinsics here.

Likewise, we probably want an operation that takes `<n x i8>`, truncates to `<n x i1>`, and `sext`s to `<n x i8>`, possibly extending even more. This is in order to reduce the mismatch between Julia/LLVM and x86 semantics (x86 tends to set all the bits, i.e. `sext`; Julia has no representation of `<n x i1>`, and Julia's `Bool` uses `zext`-style `<n x i8>`).
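As a scalar model of what these two proposed operations compute (the names `movemask` and `sextmask` are hypothetical, not SIMD.jl or LLVM API), written on plain tuples:

```julia
# vpmovmskb-style "movemask": keep only the low bit of each byte lane
# (the trunc to i1) and pack those bits into an integer.
movemask(v::NTuple{N,UInt8}) where {N} =
    foldl((acc, i) -> acc | (Int(v[i] & 0x01) << (i - 1)), 1:N; init = 0)

# The sext-style companion: broadcast the low bit back to all 8 bits,
# matching x86's all-bits-set convention for "true".
sextmask(v::NTuple{N,UInt8}) where {N} =
    ntuple(i -> (v[i] & 0x01) == 0x01 ? 0xff : 0x00, N)
```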
```jl
# See https://github.com/JuliaLang/julia/blob/7426625b5c07b0d93110293246089a259a0a677d/src/intrinsics.cpp#L1179-L1196
# Shifting with a value larger than the number of bits in the type is undefined behavior
# so set to zero in those cases.
@inline function shl_int(x::Vec{N, T1}, y::Vec{N, T2}) where {N, T1<:IntegerTypes, T2<:IntegerTypes}
```
I think it would be much better to use LLVM semantics and document that they differ from Julia semantics. People who use SIMD.jl care about speed and can deal with the fact that shifts by more than the width in bits behave weirdly.
Not "weirdly": it gives you a poison value, which gives you undefined behavior in many cases. I don't think we should expose undefined behavior so easily from the `Vec` type. You can always just call `Intrinsics.shl`?
We could also do the bitshifts modulo the size of the integer (which would match e.g. Rust). That would be cheaper, since you can just `and` it.
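A minimal sketch of that modulo-width convention (the name `shl_mod` is hypothetical; the lanewise method uses a plain tuple standing in for a `Vec`):

```julia
# Shift count taken modulo the bit width (Rust/OpenCL-style): one `and`
# on the count instead of a compare-and-select against the width.
shl_mod(x::T, y::T) where {T<:Unsigned} = x << (y & (8 * sizeof(T) - 1))

# Lanewise version on a tuple of lanes.
shl_mod(x::NTuple{N,T}, y::NTuple{N,T}) where {N,T<:Unsigned} =
    ntuple(i -> shl_mod(x[i], y[i]), N)
```

For `UInt8`, a count of 9 is masked to 1, so `shl_mod(0x01, 0x09)` shifts by one lane bit rather than zeroing the value as Julia's `<<` would.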
Yes, this is a good idea. OpenCL does the same.
What do we do about negative bitshifts?
Julia distinguishes between signed and unsigned bitshifts. For signed ones, it handles negative values as positive shifts in the opposite direction.
If we use the same function/operator name, we should use the same semantics. For full performance, people have to specify unsigned shift counts. If we think this is confusing, then we can reject signed shift counts.
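Julia's signed-count behavior can be modeled branchlessly; this mirrors what Base already does for integers (the name `jshl` is hypothetical):

```julia
# Julia-style signed shift count: a negative count shifts in the
# opposite direction. Both ifelse arms are safe to evaluate because
# Julia's << and >> already zero out-of-range counts.
jshl(x::T, y::Integer) where {T<:Unsigned} = ifelse(y >= 0, x << y, x >> (-y))
```

The cost relative to an unsigned count is exactly this extra select, which is why full performance needs unsigned shift counts.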
> If we use the same function/operator name, we should use the same semantics.

But in #63 (comment) we basically said that we would not keep the same semantics? In Julia, a shift larger than the bit size does not wrap the shift amount; it sets the value to zero, which to me is a very different semantics.
Right. I thought Julia was undefined here, and we'd only be tightening things. But that's LLVM's semantics, not Julia's.

Should we keep Julia's semantics, and add new functions (`shl`/`shr`) that reinterpret the count as unsigned and wrap around?

In many cases, the shift count will be known at compile time, and in that case things will be efficient anyway.
So, I did some more reading on https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=AVX,AVX2&cats=Shift

For some hare-brained reason, `_mm256_slli_epi64` et al. have Julia semantics, while non-vectorized shifts ignore the upper bits. LLVM often fails to properly optimize code that enforces Julia/AVX semantics; but if we emit x86_64 semantics, we have a mismatch with the AVX instructions we actually want. ARM does something else, because why not.

I see your point that the poison-value variant is slightly too poisonous for the default. So I guess we should document that the poisonous `shl` is preferable to `<<` when known safe, and consider it our offering to ill-thought-out LLVM semantics.
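A tiny REPL-style illustration of the two scalar conventions disagreeing at the edge (Julia zeroes out-of-range shifts; x86 scalar shifts instead mask the count to the low bits):

```julia
# Julia semantics: a 64-bit shift of a UInt64 is out of range, result is 0.
julia_style = UInt64(1) << 64

# x86-scalar-style semantics: the count is masked to its low 6 bits,
# so 64 & 63 == 0 and the value is unchanged.
x86_scalar_style = UInt64(1) << (64 & 63)
```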
I hereby volunteer you to write the doc string for these functions.
I added `@fastmath` support:

```jl
julia> f(a,b,c,d) = @fastmath a * b + c - d;

julia> v = Vec(1.0, 2.0, 3.0, 4.0);

julia> f(v, 1.0, v, 2.0)
<4 x Float64>[0.0, 2.0, 4.0, 6.0]

julia> @code_llvm debuginfo=:none f(v, 1.0, v, 2.0)
define void @julia_f_17933([1 x <4 x double>]* noalias nocapture sret, [1 x <4 x double>]
...
  %5 = insertelement <4 x double> undef, double %2, i32 0
  %res.i = shufflevector <4 x double> %5, <4 x double> undef, <4 x i32> zeroinitializer
  %6 = getelementptr inbounds [1 x <4 x double>], [1 x <4 x double>] addrspace(11)* %1, i64 0, i64 0
  %7 = load <4 x double>, <4 x double> addrspace(11)* %6, align 16
  %8 = fmul fast <4 x double> %7, %res.i
  %9 = getelementptr inbounds [1 x <4 x double>], [1 x <4 x double>] addrspace(11)* %3, i64 0, i64 0
  %10 = load <4 x double>, <4 x double> addrspace(11)* %9, align 16
  %11 = insertelement <4 x double> undef, double %4, i32 0
  %12 = fsub fast <4 x double> <double -0.000000e+00, double undef, double undef, double undef>, %11
  %res.i1.neg = shufflevector <4 x double> %12, <4 x double> undef, <4 x i32> zeroinitializer
  %13 = fadd fast <4 x double> %8, %res.i1.neg
  %14 = fadd fast <4 x double> %13, %10
  %15 = getelementptr inbounds [1 x <4 x double>], [1 x <4 x double>]* %0, i64 0, i64 0
  store <4 x double> %14, <4 x double>* %15, align 32
  ret void
}

julia> @code_native debuginfo=:none f(v, 1.0, v, 2.0)
	.section	__TEXT,__text,regular,pure_instructions
	movq	%rdi, %rax
	vbroadcastsd	%xmm0, %ymm0
	vbroadcastsd	%xmm1, %ymm1
	vfmsub231pd	(%rsi), %ymm0, %ymm1  ## ymm1 = (ymm0 * mem) - ymm1
	vaddpd	(%rdx), %ymm1, %ymm0
	vmovapd	%ymm0, (%rdi)
	vzeroupper
	retq
	nop

julia> g(a,b,c,d) = a * b + c - d; # no @fastmath

julia> @code_native debuginfo=:none g(v, 1.0, v, 2.0)
	.section	__TEXT,__text,regular,pure_instructions
	movq	%rdi, %rax
	vbroadcastsd	%xmm0, %ymm0
	vmulpd	(%rsi), %ymm0, %ymm0
	vaddpd	(%rdx), %ymm0, %ymm0
	vbroadcastsd	%xmm1, %ymm1
	vsubpd	%ymm1, %ymm0, %ymm0
	vmovapd	%ymm0, (%rdi)
	vzeroupper
	retq
	nopw	%cs:(%rax,%rax)
	nopl	(%rax,%rax)
```
Added the fastmath commit to this PR.
I just noticed that … It calls … Does it make sense to use … when it's …? Alternatively, if …
Added some docs for the …
1.4 is now released. Any thoughts on how to progress here, @eschnett?
@KristofferC In general, once your pull request is ready, it should be applied. Are you referring to a particular question or suggestion (shifts, alignment, ...)?
This PR pretty much rewrites the package from scratch (with the exception of the indexing implemented by @tkf) while keeping the API intact. The reason for this is that I felt that the code could gain a lot of clarity by clearly separating the parts that deal with LLVM/`llvmcall` and then building a `Vec` on top of that. The number of lines of code has also been reduced from ~1600 to ~1000, giving some support to this claim.

The code is structured as follows:

- `LLVM_intrinsics.jl` is pretty much a direct mapping of Julia vectors (`NTuple{N, VecElement{T}}`) to the operators and intrinsics defined in https://llvm.org/docs/LangRef.html. It contains almost no higher-level logic.
- `simdvec.jl` contains the `Vec` (wrapping the tuple of `VecElement`s) with definitions defined on it that map to the intrinsics defined in `LLVM.jl`. In some cases this is pretty automatic, but some cases require some logic (like in the bitshifts, partly to avoid undefined behavior, or in the different conversions).
- `arrayops.jl` is the stuff that deals with Julia `Array`s, like `vload`, `vstore`, `vgather`.

Things that have been added to the API:

- The `count_ones`, `count_zeros`, `leading_ones`, `leading_zeros`, `trailing_ones`, `trailing_zeros` family of functions.
- Type conversions and different kinds of reinterprets from scalar to vector and back, and between vectors of different size:

```jl
julia> v = Vec((Int32(2), Int32(4)))
<2 x Int32>[2, 4]

julia> reinterpret(Int64, v)
17179869186

julia> reinterpret(Vec{4, Int16}, v)
<4 x Int16>[2, 0, 4, 0]

julia> reinterpret(Vec{2, Int32}, 4)
<2 x Int32>[4, 0]

julia> convert(Vec{2, Float32}, v)
<2 x Float32>[2.0, 4.0]
```

Things that have been removed from the API:

- Removed the `Val` arguments from many functions (`setindex`, `>>`, etc.). Julia's constant propagation plus LLVM's optimizations are enough for these not to be needed; things are specialized on the constant just as well as if using `Val`.
- Removed the `Val{}` arguments and just use `Val()` consistently everywhere.
- Removed `exp10`. This used to just call `10^v`, but the reason you would use `exp10` is that there is a more efficient implementation for it than the naive one. I feel that providing `exp10` gives a false impression that it provides a benefit over the naive version.
- Removed `all` on `Vec` of `Int`. There is no such correspondence for Julia numbers (`all` should operate on `Bool`s).

For the future, we should also think a bit about how we could allow one to hook into the fast-math flags defined in https://llvm.org/docs/LangRef.html#fast-math-flags. I guess we could try to hook into the functionality provided by `@fastmath`.

I think a weak spot right now is all the different indexing. The combination of being able to use `Vec{N, T}` or `VecRange{N}` as the first argument, as well as the combination of alignments and non-temporal settings, creates a huge number of method combinations. In SIMD.jl, many hundreds of lines are just defining similar methods with different orders of arguments and default values. Somehow the abstraction doesn't feel right here. I would want to at least make `VecRange{N}(i)` the default way to index, because it feels unnecessary to have to pass the `T` in `Vec{N,T}` when it doesn't add any information.

Also, the `Aligned` flag sets the alignment to `N * sizeof(T)`. Can that really be right?

Tagging some people that might be interested / can review: @tkf, @eschnett, @nlw0, @vchuravy, @chethega, @chriselrod

Fixes #65
Fixes #54
Fixes #51
Fixes #20
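To illustrate why a `VecRange`-style index needs no element type: the lane count `N` can live in the index type alone, with `T` inferred from the array. This is a hedged sketch on plain `Vector`s and tuples — `VRange` is a hypothetical stand-in, not SIMD.jl's actual `VecRange`:

```julia
# A contiguous block of N lanes starting at index i; T is never mentioned.
struct VRange{N}
    i::Int
end

# Load N lanes as a tuple; the element type comes from the array.
Base.getindex(a::AbstractVector, r::VRange{N}) where {N} =
    ntuple(j -> a[r.i + j - 1], N)

# Store N lanes back.
function Base.setindex!(a::AbstractVector, v::NTuple{N}, r::VRange{N}) where {N}
    for j in 1:N
        a[r.i + j - 1] = v[j]
    end
    return a
end
```

With this shape, `a[VRange{4}(2)]` reads lanes 2:5 and multiple dispatch picks the lane width from the index and the element type from the array, which is the consolidation argued for above.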