-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
faster sprand, new randsubseq (fixes #6714 and #6708) #6726
Conversation
Okay, should be fixed now:
|
cc: @alanedelman, our resident random-matrix expert |
I have always been confused by MATLAB's I think it is a big improvement over MATLAB's implementation that you, in addition to have the right expected number of non-zeros, are explicit about how the random sparsity structure is chosen. As I understand your explanation, your new A minor comment is that I think the label |
@andreasnoackjensen, that's right. I will update the docs. |
LGTM |
Hmm, Travis error seems unrelated:
Another one of the usual random Travis failures. |
Seems like a very positive change. It would be good to have @alanedelman's input on the new behavior. Is this behavior something that's useful? Are random matrices generated like this handy in practice? Do they have well understood statistical structure? Is there some other kind of random sparse matrix that's particularly useful? Should this be in base at all? |
There seem to be a large number of papers on sparse random matrices (google "random sparse matrix"). Many (though not all) seem to use a distribution similar to my Aside from models of physical systems, random networks etc., and compressive-sensing applications, random sparse matrices are also pretty useful for tests and benchmarks of sparse-matrix algorithms (for which application the precise distribution probably doesn't matter much, but it is useful to have a well-defined distribution in any case). I'll try to bug Alan about it tomorrow if I run into him. |
Seems to compare well to Matlab in terms of performance:
vs.
on the same machine with a single thread. |
The random matrix literature on sparse matrices may not yet be ready to be relevant One could ask for random matrices given a specific sparsity pattern, or an approximate nnz number. |
Any comments on whether this particular sparse random matrix generation is a decent default choice? |
I just spoke to Alan. Matlab's sprand has three variants: |
If we are going to have an |
Ok by me. |
Go for it.
|
This patch implements a much faster (at least 5× on my laptop, but on an older machine I tried it was only 2×)
sprand
(fixing #6708), by filling in the matrix one (nonempty) column at a time. It also changes the meaning of the specifieddensity
to something that is (I think) easier to explain: the probability of any element of the matrix being nonzero is (independently) given bydensity
.A subroutine used by both this
sprand
and the previoussprand
is a function to compute a random subsequence of a givenAbstractArray
(e.g.1:n
). Since this is generally useful functionality, it seemed sensible to export it as functionsrandsubseq(A, p)
andrandsubseq!(S, A, p)
. What these functions do is to create a random (ordered) subsequence ofA
by selecting each element ofA
with independent probabilityp
. (This is different from the problem of finding a random subset of a fixed size.)randsubseq
is implemented using a new algorithm different from the previous one (which had the same mean (modulo the bug in #6714) but a more complicated, and I think less useful, distribution). The new algorithm is somewhat faster, requires no temporary arrays, has Θ(p * length(A)
) complexity, and avoids the statistical bug of #6714; it is inspired by an algorithm by Vitter for a related problem.(As discussed in #6714, it might be nice to also have a
randsubseq(S, A)
that finds a random subsequence of fixed length== length(S)
. The algorithm inStatsBase
for this problem is currently quite suboptimal according to my benchmarks as well as theoretically. However, that may go intoStatsBase
rather thanBase
and in any case should be a separate PR.)cc: @timholy, @StefanKarpinski, @lindahua