
change type parameter range for AbstractStridedPointer #49

Merged · 3 commits · Feb 25, 2021

Conversation

GiggleLiu
Contributor

This is for supporting different number types.

JuliaSIMD/LoopVectorization.jl#201

@chriselrod
Member

There is an ambiguity error and an error in the memory tests. The memory test error should be fixed on master.

Mind merging in master and trying to fix the ambiguity?

 vstore!(ptr::Union{Ptr, VectorizationBase.AbstractStridedPointer}, v::Number) in VectorizationBase at /home/runner/work/VectorizationBase.jl/VectorizationBase.jl/src/llvm_intrin/memory_addr.jl:850
vstore!(ptr::VectorizationBase.AbstractStridedPointer{T, N, C, B, R, X, O} where {N, C, B, R, X<:Tuple{Vararg{Any, N}}, O<:Tuple{Vararg{Any, N}}}, v::T) where T in VectorizationBase at /home/runner/work/VectorizationBase.jl/VectorizationBase.jl/src/strided_pointers/stridedpointers.jl:121

Previously, the second method was always more specific, as all `NativeTypes <: Number`, and `T <: NativeTypes` implicitly in the second definition.

So, perhaps try changing the first definition to just `vstore!(ptr::Union{Ptr, VectorizationBase.AbstractStridedPointer}, v)`?
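For reference, the same kind of ambiguity can be reproduced with plain dummy types; the names below (`AbsPtr`, `MyPtr`, `store!`, `store2!`) are hypothetical illustrations, not the real VectorizationBase definitions:

```julia
# Minimal standalone sketch of the ambiguity, with dummy types.
abstract type AbsPtr{T} end
struct MyPtr{T} <: AbsPtr{T} end

# Analogue of the first definition: constrains the *value* to Number.
store!(p::Union{Nothing,AbsPtr}, v::Number) = :union_method
# Analogue of the second: ties the pointer's eltype to the value's type.
store!(p::AbsPtr{T}, v::T) where {T} = :parametric_method

# For store!(MyPtr{Float64}(), 1.0), neither method is more specific:
# the first wins on the second argument (Number vs unconstrained T),
# the second wins on the first. Calling it throws an ambiguity MethodError.

# Loosening the first signature (drop ::Number) restores a clear ordering,
# since the parametric method is now more specific in both arguments:
store2!(p::Union{Nothing,AbsPtr}, v) = :union_method
store2!(p::AbsPtr{T}, v::T) where {T} = :parametric_method
```

With the loosened signature, `store2!(MyPtr{Float64}(), 1.0)` dispatches cleanly to the parametric method.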

@GiggleLiu
Contributor Author

Thanks for noticing, I just fixed it.

"Less Interface"ism

I notice that there are 57 methods called `vstore!`. Since I am a "less interface"-ist, I have to preach the benefits of keeping a smaller interface,

e.g.

  1. When I can curry a function

f(b) = x -> f(x, b)

I choose not to.

  2. When I can specify default parameters like this

f(x, b) = ...
f(x) = f(x, 1.0)

I choose not to.

The "less interface" rule does not apply when the interface decreases user-input redundancy.

e.g. when

f(x) = f(x, x^2)

is good, because people are less likely to make an error from repeated input.

Just sharing my coding aesthetics; I am by no means asking to change the current design, since it is already thoroughly tested.
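To make the three cases concrete, here is a small sketch with hypothetical functions `f`, `g`, and `h` (my own illustration, not code from this package):

```julia
# 1. Currying: callers could write f(b)(x) instead of f(x, b).
f(x, b) = x + b
f(b) = x -> f(x, b)   # the curried form I would choose *not* to add

# 2. Default argument: a second method that exists only to supply a default.
g(x, b) = x * b
g(x) = g(x, 1.0)      # the default-supplying method I would choose *not* to add

# 3. The exception: the extra method removes redundant user input, so a
#    caller cannot accidentally pass an inconsistent second argument.
h(x, y) = x + y
h(x) = h(x, x^2)      # here the convenience method is worth having
```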

@codecov-io

Codecov Report

Merging #49 (2800fa0) into master (c79e5d4) will decrease coverage by 1.45%.
The diff coverage is 0.00%.


@@            Coverage Diff             @@
##           master      #49      +/-   ##
==========================================
- Coverage   60.88%   59.43%   -1.46%     
==========================================
  Files          31       31              
  Lines        3981     3976       -5     
==========================================
- Hits         2424     2363      -61     
- Misses       1557     1613      +56     
Impacted Files Coverage Δ
src/strided_pointers/stridedpointers.jl 26.92% <0.00%> (ø)
src/llvm_intrin/verf.jl 60.00% <0.00%> (-13.08%) ⬇️
src/promotion.jl 23.75% <0.00%> (-12.50%) ⬇️
src/llvm_intrin/masks.jl 58.36% <0.00%> (-6.83%) ⬇️
src/llvm_intrin/intrin_funcs.jl 54.63% <0.00%> (-6.02%) ⬇️
src/llvm_intrin/conversion.jl 63.20% <0.00%> (-4.00%) ⬇️
src/llvm_intrin/nonbroadcastingops.jl 97.46% <0.00%> (-1.27%) ⬇️
src/VectorizationBase.jl 67.83% <0.00%> (-0.70%) ⬇️
src/early_definitions.jl 63.15% <0.00%> (+0.65%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@chriselrod chriselrod merged commit 088edc8 into JuliaSIMD:master Feb 25, 2021
@chriselrod
Member

chriselrod commented Feb 25, 2021

The interface is much smaller than the number of methods.
Many of the methods come from the equivalent of:

f(x::Float64, b) = ...
f(x::Int, b) = ...
f(x::String, b) = ...
f(x::Tuple, b) = ...

That is, many of the methods do not represent a different API.
They're specializations for different input types, to make different inputs act the same way.
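A toy sketch of the distinction (hypothetical `describe` function, not part of VectorizationBase): four methods, but only one conceptual API.

```julia
# One conceptual operation, several dispatch specializations: the API is just
# `describe(x)`; the method count is an implementation detail.
describe(x::Float64) = "a 64-bit float"
describe(x::Int)     = "an integer"
describe(x::String)  = "a string"
describe(x::Tuple)   = join(describe.(x), ", ")
```

`length(methods(describe))` reports 4, yet from the caller's perspective there is a single interface: every specialization just makes a different input type behave the same way.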

There is also a processing pipeline from the surface API to the final code generation.
You should never specify the ::StaticInt{RS} parameter yourself, for example.
Every vstore! method where you see ::StaticInt{RS} as one of the arguments is an internal method that you shouldn't be manually calling.

So a change I probably should make is renaming all those vstore! and vloads (probably to _vstore! and _vload) to make that obvious.

For vload, the API is simply:

vload(p::AbstractStridedPointer, i::Tuple)
vload(p::AbstractStridedPointer, i::Tuple, m::Mask)
vload(p::AbstractStridedPointer, i::Unroll)
vload(p::AbstractStridedPointer, i::Unroll, m::Mask)

p is the stridedpointer you want to load from, i is the index, and m is an optional mask to avoid loading where the mask is off.
There are also vloada equivalents that assume the memory is aligned, but you should use vload in general. So there are 4, or maybe 8 variants if you want to count vloada. Yet we have 41 vload methods:

julia> vload
vload (generic function with 41 methods)

Example usage, and what all that internal processing is for:

julia> colors = [(R = rand(), G = rand(), B = rand()) for i in 1:100];

julia> colormat = reinterpret(reshape, Float64, colors)
3×100 reinterpret(reshape, Float64, ::Vector{NamedTuple{(:R, :G, :B), Tuple{Float64, Float64, Float64}}}) with eltype Float64:
 0.101462  0.566472  0.395725   0.156595  0.247472  0.58609   0.451675  0.329288  0.628678  0.0161756    0.457912  0.929442  0.209048  0.225449  0.570513    0.373047   0.0599544  0.487379  0.48549   0.219519    0.758767  0.349387  0.575183  0.0668234  0.204329  0.371181   0.197348  0.652758  0.630789
 0.919857  0.852129  0.0622284  0.479014  0.596842  0.600178  0.188863  0.411865  0.909364  0.000210617  0.254854  0.469283  0.552034  0.202351  0.948111     0.537816   0.65176    0.619699  0.520092  0.515346    0.791153  0.326027  0.104903  0.888952   0.459569  0.0474623  0.161321  0.59407   0.0856169
 0.980903  0.841869  0.525539   0.933114  0.766957  0.116385  0.167431  0.964176  0.636977  0.491324     0.584939  0.135019  0.58517   0.467258  0.654256     0.0479113  0.162333   0.363111  0.162873  0.00662285  0.204877  0.789972  0.536066  0.0866196  0.753488  0.279345   0.556696  0.625233  0.364667

julia> colors[1]
(R = 0.10146162439576822, G = 0.9198573051020194, B = 0.980902779143441)

julia> vload(stridedpointer(colormat), (2,3)) # load element [2,3]
0.0622284186540214

julia> vload(stridedpointer(colormat), (2,MM{8}(3))) # load elements colormat[2,3:10]
Vec{8, Float64}<0.0622284186540214, 0.47901368495584085, 0.596842316190509, 0.6001775663920466, 0.18886316876982878, 0.41186519614364814, 0.9093641942999946, 0.00021061724559756634>

julia> colormat[2,3:10]'
1×8 adjoint(::Vector{Float64}) with eltype Float64:
 0.0622284  0.479014  0.596842  0.600178  0.188863  0.411865  0.909364  0.000210617

julia> vload(stridedpointer(colormat), (MM{4}(1),3), mask(Val(4), 3)) # load elements colormat[1:3,3]
Vec{4, Float64}<0.39572490568820506, 0.0622284186540214, 0.5255392424137211, 0.0>

julia> colormat[1:3,3]'
1×3 adjoint(::Vector{Float64}) with eltype Float64:
 0.395725  0.0622284  0.525539

julia> vload(stridedpointer(colormat), (MM{4}(1),MM{4}(3)), mask(Val(4), 3)) # load elements colormat[(1,3), (2,4), (3,5)]
Vec{4, Float64}<0.39572490568820506, 0.47901368495584085, 0.7669572786918826, 0.0>

julia> getindex.(Ref(colormat), 1:3, 3:5)'
1×3 adjoint(::Vector{Float64}) with eltype Float64:
 0.395725  0.479014  0.766957

MMs represent a vector of indices. Passing them as arguments basically means to take a slice. Adding a mask lets you avoid reading out of bounds.

The Unroll indices are for when we want to load multiple vectors at one time. The API is really awkward (I always have to check `?Unroll` myself) and I'm open to suggestions for improvement. Example:

julia> vload(stridedpointer(colormat), Unroll{1,1,3,2,8,zero(UInt),1}((1,1))) # axis 1 unrolled w/ step of 1, unrolled 3x, axis 2 vectorized, loading 8 elements, mask 0 of the loads, loaded vector elements are 1 apart
3 x Vec{8, Float64}
Vec{8, Float64}<0.10146162439576822, 0.5664720363768123, 0.39572490568820506, 0.1565950460890433, 0.24747228039940494, 0.586089973376654, 0.4516746290804887, 0.3292882151577483>
Vec{8, Float64}<0.9198573051020194, 0.852129002305116, 0.0622284186540214, 0.47901368495584085, 0.596842316190509, 0.6001775663920466, 0.18886316876982878, 0.41186519614364814>
Vec{8, Float64}<0.980902779143441, 0.8418685033011257, 0.5255392424137211, 0.9331143002012057, 0.7669572786918826, 0.11638477303011419, 0.16743089573675807, 0.964175916870037>

Why do this instead of just using the first API 3 times?

julia> vload(stridedpointer(colormat), (1,MM{8}(1))) # load [1,1:8]
Vec{8, Float64}<0.10146162439576822, 0.5664720363768123, 0.39572490568820506, 0.1565950460890433, 0.24747228039940494, 0.586089973376654, 0.4516746290804887, 0.3292882151577483>

julia> vload(stridedpointer(colormat), (2,MM{8}(1))) # load [2,1:8]
Vec{8, Float64}<0.9198573051020194, 0.852129002305116, 0.0622284186540214, 0.47901368495584085, 0.596842316190509, 0.6001775663920466, 0.18886316876982878, 0.41186519614364814>

julia> vload(stridedpointer(colormat), (3,MM{8}(1))) # load [3,1:8]
Vec{8, Float64}<0.980902779143441, 0.8418685033011257, 0.5255392424137211, 0.9331143002012057, 0.7669572786918826, 0.11638477303011419, 0.16743089573675807, 0.964175916870037>

Because the Unroll is sometimes much faster:

julia> @benchmark (vload(stridedpointer($colormat), (1,MM{8}(1))), vload(stridedpointer($colormat), (2,MM{8}(1))),  vload(stridedpointer($colormat), (3,MM{8}(1))))
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     5.940 ns (0.00% GC)
  median time:      6.040 ns (0.00% GC)
  mean time:        6.076 ns (0.00% GC)
  maximum time:     14.404 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark vload(stridedpointer($colormat), Unroll{1,1,3,2,8,zero(UInt),1}((1,1)))
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     3.080 ns (0.00% GC)
  median time:      3.097 ns (0.00% GC)
  mean time:        3.108 ns (0.00% GC)
  maximum time:     8.722 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

It's often possible to optimize groups of loads and stores that are done together, and that is what the Unroll type enables.
Note that the master branch of LoopVectorization doesn't use the Unroll type yet, but the vecunroll branch does, and I will merge it with master in the next few weeks or perhaps days.
While Unroll looks awkward for us, it's straightforward for LoopVectorization to plug in the correct parameters.

The case for vstore! is similar, but it also accepts a function as a first argument to tell it to possibly use that function to "reduce" a vector into a scalar. This is again something that is possible to optimize when done in Unrolled batches.
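The idea can be sketched in plain Julia (hypothetical `store_reduce!`, an analogue rather than the real `vstore!` signature):

```julia
# Hypothetical plain-Julia analogue of the reducing-store idea (not the real
# VectorizationBase API): the first argument is a function used to "reduce"
# a vector of values into the scalar that actually gets stored.
function store_reduce!(op, A::AbstractVector, i::Integer, v::AbstractVector)
    A[i] = reduce(op, v)   # e.g. op = + collapses the vector to a sum
    return A
end

A = zeros(3)
store_reduce!(+, A, 1, [1.0, 2.0, 3.0])   # A[1] == 6.0
```

In the real library, batching such reducing stores via `Unroll` is what opens up the optimization opportunities.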

I'm definitely open to suggestions for cleaning things up, and do think the vload/vstore! and also the type promotion code could use a lot of cleaning up. At the very least, renaming all the internal vload/vstore!s is an easy thing to do.
The code could almost certainly be orthogonalized better.

But the actual API of these functions is pretty straightforward, with most of the methods coming from determining the behavior from the types.

I should probably document this somewhere more clearly!
But I hope the explanation helps.

@GiggleLiu
Contributor Author

Thanks for your very detailed explanation. The reason I was confused is that the list is very long when I type `?vload` and `?vstore!` in a REPL. Now I am completely convinced.

Writing many docstrings at too early a stage is generally not good for future design changes, but minimal documentation of the most important functions would help people understand the design. Let me see if I can help add some documentation.

Cheers.

chriselrod added a commit that referenced this pull request Mar 3, 2021