Implement normalize and normalize! #13681
Conversation
Maybe it should also return the norm. E.g., when using a …
Considering that writing this function yourself takes all of one line, I would think that any specialized usages that require e.g. both the norm and the vector can just write their own implementation.
```julia
`normalize`
"""
function normalize!(v::AbstractVector, p::Real=2)
```
Why not just a one-liner `normalize!(v::AbstractVector, p::Real=2) = scale!(v, inv(norm(v, p)))`?
Unfortunately the one-liner (and the equivalent two-line version here) is vulnerable to overflow, since `inv(1e-310) == Inf`. In-place division is safe though, so I've used that in the next iteration of this PR.
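A quick demonstration of that failure mode, using the values from this thread (a sketch; nothing beyond plain broadcasting is needed):

```julia
x = [1e-310, 1e-310]   # subnormal entries, norm(x) ≈ 1.4e-310
inv(norm(x))           # Inf: the inverse of the norm overflows Float64
x .* inv(norm(x))      # [Inf, Inf], so the one-liner silently breaks
x ./ norm(x)           # ≈ [0.7071, 0.7071], per-element division stays finite
```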
We could define QR on an …
Good idea! I think we should consider that.
""" | ||
function normalize!(v::AbstractVector, p::Real=2) | ||
nrm = norm(v, p) | ||
scale!(v, inv(nrm)) |
Try `x = [1e-310, 1e-310]`.
I don't want to... :/
Would it make sense to define

```julia
normalize(v, p=2)  = v/norm(v, p)
normalize!(v, p=2) = scale!(v, inv(norm(v, p)))
```
IMHO, the fact that you need to discuss the best implementation shows that these are trickier than they seem to write correctly, and deserve being in Base.
@nalimilan, finding the fastest implementation of anything is tricky — computers are complicated beasts. All of these algorithms are O(n), and we are just fiddling with small constant factors. I don't think that is a great argument for putting a one-liner into Base, because it would apply to almost anything. The argument, as I understand it, is that people need this function so often that saving the few extra characters is worth it, especially since … (Note that neither Matlab nor NumPy seems to bother to include this function, however.)
@Jutho no, I tried that the first time. `scale!(v, inv(nrm))` is prone to overflow: for tiny `nrm` (e.g. `1e-310`), `inv(nrm)` is `Inf`. Furthermore it's only BLAS1; there's effectively no speedup from calling out to BLAS.
Could scaling with `inv(nrm)` … Even though BLAS doesn't matter, the fact that you avoid the division speeds things up by ~60%.
@jiahao Really? Not that I would trust normalising a vector with such a small norm anyway, but surely the following (overly) simple test succeeds (though I trust your tests are saying otherwise):

```julia
a = randn(100);
b = a*1e-200;
scale!(b, inv(norm(b)));
scale!(a, inv(norm(a)));
isapprox(a, b) # -> true
```
@nalimilan It's also moderately annoying to write the library function because it has to work for the general case. Most people are probably not going to normalize a tiny vector with norm less than epsilon, but as the library writer I have to worry about the edge cases. If necessary, I will have to sacrifice speed for correctness and inevitably some users will whine about the library code being slow.
@Jutho see the second test case I put in. @KristofferC it makes no sense to do run-time algorithm selection on the value. As I said, BLAS1 is just the naive for loop underneath; you are not gaining much calling out to BLAS, and furthermore you have to pay the overhead of an external library call.
Also, the naive for loop with division is a little more accurate in my testing because of fiddly details about roundoff. The first test block I put in fails using `inv(norm)` because the first component is `nextfloat(0.6)`, not `0.6`.
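To make that concrete, a small sketch; the exact test vector is an assumption (`[3.0, 4.0]` is simply a vector whose unit form should be `[0.6, 0.8]`), and only the `nextfloat(0.6)` symptom comes from the comment above:

```julia
v = [3.0, 4.0]
nrm = norm(v)    # exactly 5.0
v[1] / nrm       # 0.6: a single correctly rounded division
v[1] * inv(nrm)  # 0.6000000000000001 == nextfloat(0.6): inv(5.0) is
                 # already rounded, and the multiply rounds a second time
```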
@jiahao, Hmm, I clearly acknowledged that BLAS didn't matter but avoiding the division did. Anyway, adding …
I also agree that BLAS will not give any significant speedup; my main concern was minimal code. The second test (the overflow) runs fine with …
@KristofferC sorry, I missed that in the first reading. If you have benchmarked both the /nrm and *inv(nrm) versions and the latter produces a loop that is 60% faster, then it might be worth paying the penalty for a branch.
These are my benchmarks comparing the versions:

```
# /nrm
0.513842 seconds (4 allocations: 160 bytes)
# /nrm and @inbounds --> SIMD
0.308682 seconds (4 allocations: 160 bytes)
# *inv(nrm) + scale
0.199295 seconds (4 allocations: 160 bytes)
```

Edit: I removed the one with …
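For reference, a sketch of what the `/nrm` variant being timed might look like; the original harness and vector size were lost in extraction, so the name and size here are assumptions:

```julia
function normalize_div!(v)  # hypothetical "/nrm" variant
    nrm = norm(v)
    for i in eachindex(v)
        @inbounds v[i] /= nrm   # drop @inbounds to reproduce the slower first row
    end
    return v
end

v = randn(10^8)   # size chosen only for illustration
@time normalize_div!(v)
```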
I think that the usual approach in BLAS/LAPACK is to scale the vector with some power of two if the norm is very small and then normalize by scaling with the inverse of the scaled norm. In that way, you can save a lot of divisions when the norm is really small. Some of the BLAS functions are threaded so that might give a speedup (until we have threading).
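A sketch of that two-stage idea in Julia (the threshold choice and all names are assumptions, not LAPACK's code nor this PR's final implementation; `scale!` is the Base name of this era):

```julia
# Pre-scale only when inv(nrm) would overflow to Inf.
function normalize_twostage!(v::AbstractVector{Float64})
    nrm = norm(v)
    δ = inv(prevfloat(typemax(Float64)))  # smallest norm with a finite inverse, ≈ 5.6e-309
    if nrm ≥ δ
        scale!(v, inv(nrm))     # common case: one multiply pass
    else
        c = eps(Float64)/δ      # huge but safe rescaling factor
        scale!(v, c)            # lift entries out of the subnormal range
        scale!(v, inv(nrm*c))   # then multiply by the adjusted inverse
    end
    return v
end
```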
Thanks @KristofferC, will investigate the overflow case, especially given that @Jutho cannot reproduce my second test case which I observed to overflow.
@jiahao The values have to be smaller than …
@KristofferC There are some "thread" related statements in https://github.com/xianyi/OpenBLAS/blob/develop/interface/scal.c so I think it is threaded.
I can confirm a factor 2 or 3 difference between

```julia
function normalize!(v)
    inrm = inv(norm(v))
    @simd for i in eachindex(v)
        @inbounds v[i] *= inrm
    end
end
```

for a vector of size …
@andreasnoack yep, I was wrong:

```julia
julia> b = copy(a); @time scale!(b, 2.0);
  0.085937 seconds (4 allocations: 160 bytes)

julia> blas_set_num_threads(1)

julia> b = copy(a); @time scale!(b, 2.0);
  0.128238 seconds (4 allocations: 160 bytes)
```

Using a julia loop with …
@Jutho make sure you don't run the benchmark on already normalized arrays. OpenBLAS has: `if (alpha == ONE) return;`
You're correct; I should take more time before posting something. My apologies.
@andreasnoack @Jutho Ah I see, looks like I changed the test from 1e-310 to 1e-300 last night in updating? Will fix. @KristofferC I was able to reproduce the speedup using …
Maybe use …
Sure. It looks like …
For me the code got vectorized even without the macro. It doesn't for you? Did you check with …
No, I got a significant time difference in the …
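In case it helps reconstruct the truncated exchange above: one assumed way to check whether the loop vectorized (the thread does not say which tool was actually used):

```julia
v = randn(1000)
@code_llvm normalize!(v)   # look for <2 x double>/<4 x double> ops in the IR
```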
Simple helper functions for normalizing vectors. Closes #12047.
Compute polar decomposition of vector: `qr[!]` is equivalent to `v->(normalize[!](v), norm(v))` but is convenient for reusing the norm calculation, e.g. for Krylov subspace algorithms. Also refactors the in-place normalization code into a separate routine, `__normalize!()`, so that it can be reused by `qr!`.
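A minimal sketch of what that vector polar decomposition amounts to (the name `qr_vec` and its body are assumptions; the actual commit extends `qr[!]` and reuses `__normalize!`):

```julia
# Return (unit vector, norm), computing the norm only once so callers
# such as Krylov subspace methods can reuse it.
function qr_vec(v::AbstractVector)
    nrm = norm(v)
    return v/nrm, nrm
end
```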
Implement normalize and normalize!
Simple helper functions for normalizing vectors
Closes JuliaLang/LinearAlgebra.jl#226