
Allow batched_mul to work through PermutedDimsArray, II #191

Merged — 15 commits into FluxML:master from fix2, Nov 11, 2020

Conversation

mcabbott
Member

@mcabbott mcabbott commented Apr 3, 2020

This is an alternative to #187.

It similarly allows batched_mul to work on many kinds of PermutedDimsArray, but does so simply by calling strides(A) and branching on the result. It also extends batched_mul! to accept α, β scaling factors, like 5-arg mul!.
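To illustrate the idea (a minimal sketch, not NNlib's implementation): a batch-wise transpose expressed as a PermutedDimsArray is still strided, so each batch slice can be fed to mul! without copying, and 5-arg mul! supplies the α, β scaling described above.

```julia
using LinearAlgebra

A = randn(3, 4, 5)
At = PermutedDimsArray(A, (2, 1, 3))   # batch-wise transpose, no copy
B = randn(3, 2, 5)
C = zeros(4, 2, 5)

for k in axes(A, 3)
    # 5-arg mul! computes C .= α*At*B .+ β*C for each batch slice
    mul!(view(C, :, :, k), view(At, :, :, k), view(B, :, :, k), 1.0, 0.0)
end
```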

It adds two functions, storage_type and is_strided, both of which recursively unwrap wrapper types. This avoids trying to dispatch on types like BatchedAdjoint{PermutedDimsArray{...,CuArray}}; instead, it can separately check the underlying storage, and whether it should be safe to call strides(A).
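The recursive unwrapping can be sketched like this (the names mirror the PR's storage_type and is_strided, but this is an illustration under simplified rules, not the actual NNlib code):

```julia
using LinearAlgebra

# Peel off wrappers until we hit the underlying storage type:
storage_type(A::AbstractArray) = typeof(A)
storage_type(A::PermutedDimsArray) = storage_type(parent(A))
storage_type(A::SubArray) = storage_type(parent(A))
storage_type(A::LinearAlgebra.Adjoint) = storage_type(parent(A))

# Separately decide whether strides(A) is safe to call:
is_strided(A::DenseArray) = true
is_strided(A::PermutedDimsArray) = is_strided(parent(A))
is_strided(A::AbstractArray) = false   # e.g. ranges, sparse arrays

P = PermutedDimsArray(randn(2, 3, 4), (2, 1, 3))
storage_type(P)   # Array{Float64, 3}, however deep the wrapping
is_strided(P)     # true, so a strides-based branch is safe
```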

It also improves batched_gemm! to multi-thread over the batch index (using JuliaLang/julia#36360 to save and restore the number of BLAS threads), and to allow size(A, 3) == 1, in which case only B and C are batched.
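A loop-level sketch of that behaviour, under the assumption that size(A, 3) == 1 means the single A slice is re-used against every slice of B (NNlib's batched_gemm! does this in place via BLAS; here mul! stands in):

```julia
using LinearAlgebra

A = randn(4, 3, 1)   # batch dimension of length 1
B = randn(3, 2, 5)
C = zeros(4, 2, 5)

# Thread over the batch index; in the PR the BLAS thread count is
# first set to 1 and later restored (via get/set_num_threads from
# JuliaLang/julia#36360), which is omitted here for brevity.
Threads.@threads for k in axes(B, 3)
    # min(k, size(A, 3)) picks slice 1 of A for every k
    mul!(view(C, :, :, k), view(A, :, :, min(k, size(A, 3))), view(B, :, :, k))
end
```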

@mcabbott
Member Author

Bump. Who has merge permissions here? @CarloLucibello, @DhairyaLGandhi?

mcabbott pushed a commit to mcabbott/CUDA.jl that referenced this pull request Oct 24, 2020
@CarloLucibello
Member

Oops, didn't see this. Thanks!

@CarloLucibello CarloLucibello merged commit 9780c29 into FluxML:master Nov 11, 2020
@mcabbott mcabbott deleted the fix2 branch November 11, 2020 08:54
@mcabbott
Member Author

Thanks!
