
Allow batched_mul to work through PermutedDimsArray, II #191

Merged — 15 commits into FluxML:master from fix2, Nov 11, 2020

Conversation

mcabbott
Member

@mcabbott mcabbott commented Apr 3, 2020

This is an alternative to #187.

It similarly allows batched_mul to work on many kinds of PermutedDimsArray, but does so simply by calling strides(A) and branching on the result. It also extends batched_mul! to accept α, β scaling factors, like 5-arg mul!.
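To illustrate the idea (a minimal sketch, not NNlib's implementation): a batch-wise transpose expressed as a PermutedDimsArray is still strided, so each batch slice can be fed to mul! without copying, and 5-arg mul! supplies the α, β scaling described above.

```julia
using LinearAlgebra

A = randn(3, 4, 5)
At = PermutedDimsArray(A, (2, 1, 3))   # batch-wise transpose, no copy
B = randn(3, 2, 5)
C = zeros(4, 2, 5)

for k in axes(A, 3)
    # 5-arg mul! computes C .= α*At*B .+ β*C for each batch slice
    mul!(view(C, :, :, k), view(At, :, :, k), view(B, :, :, k), 1.0, 0.0)
end
```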

It adds two functions, storage_type and is_strided, both of which recursively unwrap wrapper types. This avoids trying to dispatch on types like BatchedAdjoint{PermutedDimsArray{...,CuArray}}; instead, it can separately check the underlying storage, and whether it should be safe to call strides(A).
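The recursive unwrapping can be sketched like this (the names mirror the PR's storage_type and is_strided, but this is an illustration under simplified rules, not the actual NNlib code):

```julia
using LinearAlgebra

# Peel off wrappers until we hit the underlying storage type:
storage_type(A::AbstractArray) = typeof(A)
storage_type(A::PermutedDimsArray) = storage_type(parent(A))
storage_type(A::SubArray) = storage_type(parent(A))
storage_type(A::LinearAlgebra.Adjoint) = storage_type(parent(A))

# Separately decide whether strides(A) is safe to call:
is_strided(A::DenseArray) = true
is_strided(A::PermutedDimsArray) = is_strided(parent(A))
is_strided(A::AbstractArray) = false   # e.g. ranges, sparse arrays

P = PermutedDimsArray(randn(2, 3, 4), (2, 1, 3))
storage_type(P)   # Array{Float64, 3}, however deep the wrapping
is_strided(P)     # true, so a strides-based branch is safe
```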

It also improves batched_gemm! to multi-thread over the batch index (using JuliaLang/julia#36360 to save and restore the number of BLAS threads), and to allow size(A, 3) == 1, in which case only B and C are batched.
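A loop-level sketch of that behaviour, under the assumption that size(A, 3) == 1 means the single A slice is re-used against every slice of B (NNlib's batched_gemm! does this in place via BLAS; here mul! stands in):

```julia
using LinearAlgebra

A = randn(4, 3, 1)   # batch dimension of length 1
B = randn(3, 2, 5)
C = zeros(4, 2, 5)

# Thread over the batch index; in the PR the BLAS thread count is
# first set to 1 and later restored (via get/set_num_threads from
# JuliaLang/julia#36360), which is omitted here for brevity.
Threads.@threads for k in axes(B, 3)
    # min(k, size(A, 3)) picks slice 1 of A for every k
    mul!(view(C, :, :, k), view(A, :, :, min(k, size(A, 3))), view(B, :, :, k))
end
```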

@mcabbott
Member Author

Bump. Who has merge permissions here? @CarloLucibello, @DhairyaLGandhi?

mcabbott pushed a commit to mcabbott/CUDA.jl that referenced this pull request Oct 24, 2020
@CarloLucibello
Member

Oops, didn't see this. Thanks!

@CarloLucibello CarloLucibello merged commit 9780c29 into FluxML:master Nov 11, 2020
@mcabbott mcabbott deleted the fix2 branch November 11, 2020 08:54
@mcabbott
Member Author

Thanks!
