-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance degradation when reducing SparseMatrixCSC #27836
Comments
Wow that is a lot of allocation for a simple operation. Is there an easy way to create the input sparse matrix? Probably you want to work with the transpose - but that is a different matter. |
I made this matrix from an HDF5 file downloaded from https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.3.0/1M_neurons (the "aggr - Gene / cell matrix HDF5 (filtered)" file, 3.93 GB). The following script can load the file into a
It requires ~30 GiB of RAM to load but it may be reproducible with smaller matrices. |
From the following profiling, the problem is around these lines: https://github.com/JuliaLang/julia/blob/v0.7.0-beta/stdlib/SparseArrays/src/sparsematrix.jl#L1689-L1692.
|
I didn't know but the reduction operation now returns a sparse matrix for a given sparse matrix, which causes the performance degradation in the tight loop of reduction. This change may be reasonable but the performance degradation is not acceptable to me.
|
I agree the performance is not acceptable. |
This is an unintended consequence of #23227. |
Thank you, @andreasnoack! Let me know if I need to benchmark your fix. |
It appears that even in the "before" version, things are too slow - based on the amount of allocation. This operation should literally be possible with a single scan of the data structure. |
Ideally add the benchmark to BaseBenchmarks, so it's tracked. |
* Adjust initialization in maximum and minimum Fixes #27836 * Reduce to first element in index range for OffsetArrays * Adjust mapreduce to use first element in offset range as well. Adjust tests * Avoid defining global Areduc variable in reducedim tests * Add comment in mapslices to explain the wrapping of scalars
* Adjust initialization in maximum and minimum Fixes #27836 * Reduce to first element in index range for OffsetArrays * Adjust mapreduce to use first element in offset range as well. Adjust tests * Avoid defining global Areduc variable in reducedim tests * Add comment in mapslices to explain the wrapping of scalars * Change reduced_index for Slices and add test for previously failing maximum of OffsetArray example * Add checks that reductions along dimensions are inferred
I've found an extreme performance degradation (~30x slower) on Julia 0.7.0-beta vs. on Julia 0.6.2 when reducing a large
SparseMatrixCSC
matrix usingmaximum
. I couldn't find differences in the data, so I think this is a problem inSparseArrays
.Julia 0.6.2
Julia 0.7.0-beta:
The text was updated successfully, but these errors were encountered: