Implement normalize and normalize! #13681
Conversation
Maybe it should also return the norm. E.g., when using a …
Considering that writing this function yourself takes all of one line, I would think that any specialized usages that require e.g. both the norm and the vector can just write their own implementation.
```julia
`normalize`
"""
function normalize!(v::AbstractVector, p::Real=2)
```
Why not just a one-liner `normalize!(v::AbstractVector, p::Real=2) = scale!(v, inv(norm(v, p)))`?
Unfortunately the one-liner (and the equivalent two-line version here) is vulnerable to overflow, since `inv(1e-310) == Inf`. In-place division is safe though, so I've used that in the next iteration of this PR.
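A quick demonstration of that failure mode, using the values from this thread (a sketch; nothing beyond plain broadcasting is needed):

```julia
x = [1e-310, 1e-310]   # subnormal entries, norm(x) ≈ 1.4e-310
inv(norm(x))           # Inf: the inverse of the norm overflows Float64
x .* inv(norm(x))      # [Inf, Inf], so the one-liner silently breaks
x ./ norm(x)           # ≈ [0.7071, 0.7071], per-element division stays finite
```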
We could define QR on an …
Good idea! I think we should consider that.
""" | ||
function normalize!(v::AbstractVector, p::Real=2) | ||
nrm = norm(v, p) | ||
scale!(v, inv(nrm)) |
Try `x = [1e-310, 1e-310]`.
I don't want to... :/
Would it make sense to define

```julia
normalize(v, p=2)  = v/norm(v, p)
normalize!(v, p=2) = scale!(v, inv(norm(v, p)))
```
IMHO, the fact that you need to discuss the best implementation shows that these are trickier than they seem to write correctly, and deserve being in Base.
@nalimilan, finding the fastest implementation of anything is tricky — computers are complicated beasts. All of these algorithms are O(n), and we are just fiddling with small constant factors. I don't think that is a great argument for putting a one-liner into Base, because it would apply to almost anything. The argument, as I understand it, is that people need this function so often that saving the few extra characters is worth it, especially since … (Note that neither Matlab nor NumPy seems to bother to include this function, however.)
@Jutho no, I tried that the first time. `scale!(v, inv(nrm))` is prone to overflow: for tiny `nrm` (e.g. `1e-310`), `inv(nrm)` is `Inf`. Furthermore it's only BLAS1; there's effectively no speedup from calling out to BLAS.
Could scaling with `inv(nrm)` … Even though BLAS doesn't matter, the fact that you avoid the division speeds things up by ~60%.
@jiahao Really? Not that I would trust normalising a vector with such a small norm anyway, but surely the following (overly) simple test succeeds (though I trust your tests are saying otherwise):

```julia
a = randn(100);
b = a*1e-200;
scale!(b, inv(norm(b)));
scale!(a, inv(norm(a)));
isapprox(a, b) # -> true
```
@nalimilan It's also moderately annoying to write the library function because it has to work for the general case. Most people are probably not going to normalize a tiny vector with norm less than epsilon, but as the library writer I have to worry about the edge cases. If necessary, I will have to sacrifice speed for correctness and inevitably some users will whine about the library code being slow.
@Jutho see the second test case I put in. @KristofferC it makes no sense to do run-time algorithm selection on the value. As I said, BLAS1 is just the naive for loop underneath; you are not gaining much calling out to BLAS, and furthermore you have to pay the overhead of an external library call.
Also, the naive for loop with division is a little more accurate in my testing because of fiddly details about roundoff. The first test block I put in fails using `inv(norm)` because the first component is `nextfloat(0.6)`, not `0.6`.
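To make that concrete, a small sketch; the exact test vector is an assumption (`[3.0, 4.0]` is simply a vector whose unit form should be `[0.6, 0.8]`), and only the `nextfloat(0.6)` symptom comes from the comment above:

```julia
v = [3.0, 4.0]
nrm = norm(v)    # exactly 5.0
v[1] / nrm       # 0.6: a single correctly rounded division
v[1] * inv(nrm)  # 0.6000000000000001 == nextfloat(0.6): inv(5.0) is
                 # already rounded, and the multiply rounds a second time
```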
@jiahao, Hmm, I clearly acknowledged that BLAS didn't matter but avoiding the division did. Anyway, adding …
I also agree that BLAS will not give any significant speedup; my main concern was minimal code. The second test (the overflow) runs fine with …
@KristofferC sorry, I missed that in the first reading. If you have benchmarked both the /nrm and *inv(nrm) versions and the latter produces a loop that is 60% faster, then it might be worth paying the penalty for a branch.
These are my benchmarks comparing the versions:

```
# /nrm
0.513842 seconds (4 allocations: 160 bytes)
# /nrm and @inbounds --> SIMD
0.308682 seconds (4 allocations: 160 bytes)
# *inv(nrm) + scale
0.199295 seconds (4 allocations: 160 bytes)
```

Edit: I removed the one with …
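For reference, a sketch of what the `/nrm` variant being timed might look like; the original harness and vector size were lost in extraction, so the name and size here are assumptions:

```julia
function normalize_div!(v)  # hypothetical "/nrm" variant
    nrm = norm(v)
    for i in eachindex(v)
        @inbounds v[i] /= nrm   # drop @inbounds to reproduce the slower first row
    end
    return v
end

v = randn(10^8)   # size chosen only for illustration
@time normalize_div!(v)
```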
I think that the usual approach in BLAS/LAPACK is to scale the vector with some power of two if the norm is very small and then normalize by scaling with the inverse of the scaled norm. In that way, you can save a lot of divisions when the norm is really small. Some of the BLAS functions are threaded so that might give a speedup (until we have threading).
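A sketch of that two-stage idea in Julia (the threshold choice and all names are assumptions, not LAPACK's code nor this PR's final implementation; `scale!` is the Base name of this era):

```julia
# Pre-scale only when inv(nrm) would overflow to Inf.
function normalize_twostage!(v::AbstractVector{Float64})
    nrm = norm(v)
    δ = inv(prevfloat(typemax(Float64)))  # smallest norm with a finite inverse, ≈ 5.6e-309
    if nrm ≥ δ
        scale!(v, inv(nrm))     # common case: one multiply pass
    else
        c = eps(Float64)/δ      # huge but safe rescaling factor
        scale!(v, c)            # lift entries out of the subnormal range
        scale!(v, inv(nrm*c))   # then multiply by the adjusted inverse
    end
    return v
end
```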
Thanks @KristofferC, will investigate the overflow case, especially given that @Jutho cannot reproduce my second test case which I observed to overflow.
@jiahao The values have to be smaller than …
@KristofferC There are some "thread" related statements in https://github.com/xianyi/OpenBLAS/blob/develop/interface/scal.c so I think it is threaded.
I can confirm a factor 2 or 3 difference between

```julia
function normalize!(v)
    inrm = inv(norm(v))
    @simd for i in eachindex(v)
        @inbounds v[i] *= inrm
    end
end
```

for a vector of size …
@andreasnoack yep, I was wrong:

```julia
julia> b = copy(a); @time scale!(b, 2.0);
  0.085937 seconds (4 allocations: 160 bytes)

julia> blas_set_num_threads(1)

julia> b = copy(a); @time scale!(b, 2.0);
  0.128238 seconds (4 allocations: 160 bytes)
```

Using a julia loop with …
@Jutho make sure you don't run the benchmark on already normalized arrays. OpenBLAS has: `if (alpha == ONE) return;`
You're correct; I should take more time before posting something. My apologies.
@andreasnoack @Jutho Ah I see, looks like I changed the test from 1e-310 to 1e-300 last night in updating? Will fix. @KristofferC I was able to reproduce the speedup using …
Maybe use …
Sure. It looks like …
For me the code got vectorized even without the macro. It doesn't for you? Did you check with …
No, I got a significant time difference in the …
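In case it helps reconstruct the truncated exchange above: one assumed way to check whether the loop vectorized (the thread does not say which tool was actually used):

```julia
v = randn(1000)
@code_llvm normalize!(v)   # look for <2 x double>/<4 x double> ops in the IR
```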
Simple helper functions for normalizing vectors. Closes #12047.
Compute polar decomposition of vector: `qr[!]` is equivalent to `v->(normalize[!](v), norm(v))` but is convenient for reusing the norm calculation, e.g. for Krylov subspace algorithms. Also refactors the in-place normalization code into a separate routine, `__normalize!()`, so that it can be reused by `qr!`.
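A minimal sketch of what that vector polar decomposition amounts to (the name `qr_vec` and its body are assumptions; the actual commit extends `qr[!]` and reuses `__normalize!`):

```julia
# Return (unit vector, norm), computing the norm only once so callers
# such as Krylov subspace methods can reuse it.
function qr_vec(v::AbstractVector)
    nrm = norm(v)
    return v/nrm, nrm
end
```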
Implement normalize and normalize!
Simple helper functions for normalizing vectors
Closes JuliaLang/LinearAlgebra.jl#226