Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Improve primes performance. #11927

Closed
wants to merge 1 commit into from
Closed

WIP: Improve primes performance. #11927

wants to merge 1 commit into from

Conversation

Ismael-VC
Copy link
Contributor

There was a lot of interest to improve the current Base.primes function in #11594, but no PRs, I hope this gets the ball rolling even more.

References:

More complete test using JuliaBox and Julia version 0.4.0-dev+5491:

@Ismael-VC Ismael-VC changed the title Add sieve of atkin function. WIP: Add sieve of Atkin function. Jun 29, 2015
@Ismael-VC
Copy link
Contributor Author

I didn't intend to replace Base.primes in this PR because I don't understand how it works and the usefulness of Base.primesmask, which my implementation doesn't use.

Also Base.primes has this method:

julia> methods(primes)
# 1 method for generic function "primes":
primes(n::Union{AbstractArray{Bool,1},Integer}) at primes.jl:33

And I don't know how to support n::Union{AbstractArray{Bool,1},Integer} for atkin.

@StefanKarpinski
Copy link
Member

The implementation of primes already is an implementation of this sieve. Also, the interface shouldn't really care about what algorithm is used – it should just be a fast way to get a bunch of primes.

@pabloferz
Copy link
Contributor

I tested you version and I see no performance difference between primes from Base and your implementation, actually your version is more or less the same as the function primes from base even when they might look different.

I was reading that the algorithm from https://en.wikipedia.org/wiki/Sieve_of_Atkin#Pseudocode , should perform better that the one in base or yours for that matter. Actually, from the discussion in http://stackoverflow.com/questions/19388106/the-sieve-of-atkin/20572948#20572948 it seems that the following algorithm should be better if you want to use a Sieve of Atkin.

// arbitrary search limit
limit ← 1000000000

// the set of base wheel primes
r ∈ {23,29,31,37,41,43,47,53, 59,61,67,71,73,79,83, //positions + 19
     89,97,101,103,107,109,113,121,127, 131,137,139,143,149,151,157,163,
     167,169,173,179,181,187,191,193, 197,199,209,211,221,223,227,229}

// an array of length 11 times 13 times 17 times 19 = 46189 wheels initialized
// so that it doesn't contain multiples of the large wheel primes
for n where n ← 210 × w + x where w ∈ {0,...46189}, x in r: // already
    if (n mod cp) not equal to 0 where cp ∈ {11,13,17,19}: // no 2,3,5,7
        mstr(n) ← true else mstr(n) ← false                // factors

// Initialize the sieve as an array of the smaller wheels with
// enough wheels to include the representation for limit
for n where n ← 210 × w + x, w ∈ {0,...(limit - 19) ÷ 210}, x in r:
    sieve(n) ← mstr(n mod (210 × 46189))    // init pre-culled primes.

// Eliminate composites by sieving, only for those occurrences on the
// wheel using wheel factorization version of the Sieve of Eratosthenes
for n² ≤ limit when n ← 210 × k + x where k ∈ {0..}, x in r
    if sieve(n):
        // n is prime, cull its multiples
        s ← n² - n × (x - 23)     // zero'th modulo cull start position
        while c0 ≤ limit when c0 ← s + n × m where m in r:
            c0d ← (c0 - 23) ÷ 210, cm ← (c0 - 23) mod 210 + 23 //means cm in r
            while c ≤ limit for c ← 210 × (c0d + n × j) + cm
                    where j ∈ {0,...}:
                sieve(c) ← false       // cull composites of this prime

output 2, 3, 5, 7, 11, 13, 17, 19,
for n ≤ limit when n ← 210 × k + x where k ∈ {0..}, x in r:
    if sieve(n): output n

It also seems from the last link that an optimized Sieve of Eratosthenes whit a higher degree of wheel factorization might perform even better.

@pabloferz
Copy link
Contributor

Although, the asymptotic behaviour of your algorithm or the one in base should be better as indicated in Atkin's paper (http://www.ams.org/mcom/2004-73-246/S0025-5718-03-01501-1/S0025-5718-03-01501-1.pdf)

Here http://stackoverflow.com/questions/19388106/the-sieve-of-atkin/20572948#20572948 it is claimed that this Sieve of Atkin is probably better for very high values of the number of primes.

@Ismael-VC
Copy link
Contributor Author

I don't know why I'm getting a BoundsError non deterministically.

This CI test (and others) failed:

ERROR: LoadError: LoadError: test error in expression: Base.atkin(10000) == Base.primes(10000) == [
2,3,5,7,11,13,17,19,23, ...... ,9901,9907,9923,9929,9931,9941,9949,9967,9973
]
BoundsError: attempt to access 10000-element BitArray{1}:
 false
 false
 false
 false
  true
 false
  true
 false
 false
 false
     ⋮
 false
 false
 false
 false
 false
 false
 false
 false
 false
  at index [10003]

But this one passed.

At first I thought it was because of the use of @inbounds, so I commented it, but that doesn't seems to be the problem, I also tested before as shown in the reference links and never got this issue.

@pabloferz
Copy link
Contributor

You are testing if n <= n, which is always true, instead of n <= limit

@Ismael-VC Ismael-VC changed the title WIP: Add sieve of Atkin function. WIP: Improve primes performance. Jun 30, 2015
primes = Int[]

elseif limit == 2
primes = [2]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

limit < 0 && error("limit can't be negative (found $(limit))")
limit < 2 && return primes = Int[]
limit == 2 && return primes = [2]

@Ismael-VC
Copy link
Contributor Author

@StefanKarpinski thanks got it!

@pabloferz thanks! I've updated the PR.

How did you tested?, atkin has been faster everywhere I've tested so far. I've added a temporary test to see if it's indeed faster than primes and so worth profiling, etc. Is this not the way to test this? Should I add something else?

I could also try to implement the wheel and fragmented versions and also a sieve of Eratosthenes and we could pick from both: primes(limit, Eratosthenes::Algorithm) and primes(limit, Atkin::Algotirhtm) or something like that, would that be ok?

I'm not much of a math guy, do you know what does mstr(n) means?

@Ismael-VC
Copy link
Contributor Author

@StefanKarpinski how does BitVector works? Why are they equal to falses of the same size only bellow n ≤ 65 but not above that?

julia> for n in 1:100
           @show n
           @show BitVector(n) == falses(n)
           println()
       end
...
n = 64
BitVector(n) == falses(n) = true

n = 65
BitVector(n) == falses(n) = false
...

@quinnj
Copy link
Member

quinnj commented Jun 30, 2015

Using the BitVector(n) constructor is the same as using the raw Array(T, n) constructors, which grab chunks of memory directly. falses is grabbing a chunk of memory, but zeroing it out. If you need to ensure the memory is initialized to falses or trues, use those constructors (falses and trues). See #9147

@ScottPJones
Copy link
Contributor

Does julia have any "compressed" bit vector format? (Ie. that can handle sparse arrays of bits efficiently?)

@pabloferz
Copy link
Contributor

@Ismael-VC I checked again and can confirm the performance improvement. However, you should really improve the primesmask function instead of adding a new one which does the same thing (@StefanKarpinski already mentioned this). If you look at your implementation and the one in primesmask you should notice that all the work done is almost the same.

I found that the performance hit in the current implementation comes from this lines

i, j, k = 4xx+yy, 3xx+yy, 3xx-yy
i <= n && (s[i] $= (i%12==1)|(i%12==5))
j <= n && (s[j] $= (j%12==7))
1 <= k <= n && (s[k] $= (x>y)&(k%12==11))

By changing this to something similar to what you have, let say

(k = 4xx+yy)  n && (k % 12 == 1 || k % 12 == 5) && (s[k] = !s[k])
(k -=  xx  )  n && (k % 12 == 7               ) && (s[k] = !s[k])
(k -= 2yy); 1  k  n && x > y && (k % 12 == 11) && (s[k] = !s[k])

you get the same performance as in your atkin implementation.

k = 3x² - y²
if x > y && k ≤ n && k % 12 == 11
@inbounds s[k] = !s[k]
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the same variable instead of the three i, j and k. Also, you can move the @inbounds to the beginning of the loop to have

@inbounds for x = 1:r; x² = x*x
    # Rest of the code
end

instead putting one every time you set an index

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it makes no difference in the expanded expression and I have seen both styles being used, that's why at the start I used:

function foo()
    @inbounds begin
        # ...
    end
end

But something felt wrong about it. I think it could hide bugs.

I dream every night that I have the official™ julian style guide in my hands ...but then I wake up! 😟

@IainNZ IainNZ added the performance Must go faster label Jun 30, 2015
@Ismael-VC
Copy link
Contributor Author

@quinnj thanks! I didn't know about that.

@pabloferz yes I know, but I wanted to make sure that it was indeed better even if it was for a bit. I wil try to implement the wheel next, but what does mstr means in mstr(n) ← true else mstr(n) ← false?

@all I know you guys love the short && and || syntax instead of using if ... elseif ... else ... end, but I would like everyone to think on readability (and people who are not expert programmers).

Now that the documentation is being redesigned, is there a way we could produce a low level (just for devs, for both exported and non exported objects) API documentation?, or at least comment with a mnemonic name all the short variables, ie. s # sieve and so on?

for x = 1:r; x² = x*x
    for y = 1:r; y² = y*y
        i = 4+if i  n && (i % 12 == 1 || i % 12 == 5)
            sieve[i] = !sieve[i]
        end
        j = 3+if j  n && j % 12 == 7
            sieve[j] = !sieve[j]
        end
        k = 3-if x > y && k  n && k % 12 == 11
            sieve[k] = !sieve[k]
        end
    end
end

Is sieve really too much for a variable name? I've found lots of one to tree character variables names all over Julia, C and Scheme code. It doesn't help that the algorithm is concise, at least for me.

IMHO the last snippet is better than:

for x = 1:r
    xx = x*x
    for y = 1:r
        yy = y*y
        i, j, k = 4xx+yy, 3xx+yy, 3xx-yy
        i <= n && (s[i] $= (i%12==1)|(i%12==5))
        j <= n && (s[j] $= (j%12==7))
        1 <= k <= n && (s[k] $= (x>y)&(k%12==11))
    end
end

What does r stands for?

And certainly better than:

(k = 4xx+yy)  n && (k % 12 == 1 || k % 12 == 5) && (s[k] = !s[k])
(k -=  xx  )  n && (k % 12 == 7               ) && (s[k] = !s[k])
(k -= 2yy); 1  k  n && x > y && (k % 12 == 11) && (s[k] = !s[k])

Even if it weren't faster!

Please consider this 🙏, thanks!

@ScottPJones
Copy link
Contributor

I'm not sure why there are so many 1 character names, I find them very difficult to deal with, both visually and trying to find things with the editor. Except for i,j,k in for loops, I believe in having a minimum of 3 characters, and hopefully not a substring of something else in the function.

@ScottPJones
Copy link
Contributor

I'm not sure I like though the , because to me, that looks like you are doing the squaring there,
but really is a variable, set up above. For a non-mathematician, I'd have said use something like x_square.

@StefanKarpinski
Copy link
Member

It's hard to meaningfully name numbers.

@ScottPJones
Copy link
Contributor

Yes, that's true, but I still think avoiding visual confusion can be useful.

@Ismael-VC
Copy link
Contributor Author

@ScottPJones I'm trying to test this in an objective way; by showing the code to my classmates (Java is all they have seen mostly), and also to the students at other careers in order to have their non technical opinions.

We all agree that since it's year 2015, it's ok to use unicode if we can, I would understand if the x² = x*x and y² = y*y where many lines above and out of context, and also every julian should eventually know is a variable not the expression y^2.

Isn't this why we have unicode for identifiers? IMHO 4xx+yy looks so 70s, while 4x² + y² looks just right. Isn't this something we've wanted even before Lisp and Fortran?

I want Julia to go forward into the future in every aspect not backwards! (that's my greed) The old languages should be the ones trying to catch up (I don't care about them) and new ones too ie, Swift.

I think that the prevalence of short identifiers is in part because of this, some like: 4xx+yy and some others like: 4*x_square + y_square ( is just a x\^2<TAB> away). I believe virtue is in the middle so: 4x² + y². And this is just to avoid computing it twice, else I would've used 4x^2 + y^2.

You seem to think a lot about the dogmas, status quo and paradigms of the current and past generations of programmers, why not think in the future ones also?

There is no spoon! ✨

@StefanKarpinski
Copy link
Member

For what it's worth, I'm fine with the thing – this kind of naming is used fairly extensively elsewhere.

@StefanKarpinski
Copy link
Member

In particular, you can infer from just the local code that it is a variable name, not an operation, since it gets assigned to (although technically, it could be a method definition, but then using it without calling it would be pretty weird).

@ScottPJones
Copy link
Contributor

@Ismael-VC I'm not against the use of Unicode in julia, it was simply that to me, looked like
julia had defined ² as being a square operator. That's all. I'm actually surprised that it hadn't been
defined as a square operator. I noticed that the square root symbol is also allowed as an initial in a variable name, so you can do ✓5=2 and drive people crazy later in the program.

I do like to think about future paradigms, that's why I'm so into julia now. I think it really has quite a lot going for it.

About inferring from the local code, yes, but in practice, that all depends on how many pages of local code you have (and the short names make it hard to find all the places a particular variable is used)
What happens when a few pages after the initial assignment, somebody updates x, not realizing that they also need to update ?

@Ismael-VC
Copy link
Contributor Author

@ScottPJones, I used it because that lives within a well delimited scope and one that certainly fits in less than one page, I wouldn't have used it otherwise. I hate Java, VB.NET, etc styles in which you declare a variable before use at the beginning of the file without regard to context (and it's even considered good practice! 😟 ).

Perhaps in this cases a comment would be nice?

# x² is a variable NOT an expr.

@Ismael-VC
Copy link
Contributor Author

I like the spirit of the Quorum Language, The world's first evidence-oriented programming language:

Bringing randomized controlled trials to human factors decisions in programming language design. You know ... science.

@ScottPJones
Copy link
Contributor

Yes, in that example, it's small enough, I'm not sure we'd need to have comments all over if this is going to be standard julia practice, but I do think that square and square root not being operators, but rather parts of variable names, should at the very least be well documented, especially in the "Noteworthy differences" sections. Once you learn that they are part of identifiers and not operators, you can survive... (I still wonder why somebody didn't make them operators, as that seems to be more the usually julian practice)
Also, @Ismael-VC, that's very nasty to drop a new interesting language on me 😀, how will I ever get my real work done???

@pabloferz
Copy link
Contributor

@Ismael-VC I believe that the mstr(n) in the pseudo code is an array of length 46189. It is not really clear.

@ScottPJones
Copy link
Contributor

I just looked the square and square root symbols up, and I think it may be a mistake that they are allowed to be used for general variable names.
They are marked as "Mathematical Operator" in the Unicode standard.
It seems strange that an "Operator" can be part of a name.

@jiahao
Copy link
Member

jiahao commented Jul 1, 2015

@ScottPJones is not allowed in an identifier name.

julia> 4
2.0

julia> p = 3
ERROR: syntax: extra token "" after end of expression

Names like and the like are incredibly useful for caching powers of x. A classic example is in computing the Lennard-Jones 12-6 potential energy in computational chemistry and physics: to compute A/x^12 - B/x^6, one can write:

function lj(x, A, B)
    x⁻² = x^-2
    x⁻⁶ = (x⁻²)^3
    x⁻¹² = (x⁻⁶)^2
    A*x⁻¹² - B*x⁻⁶
end

to ensure that intermediate products are reused optimally in time-critical code. Computations like these are often very difficult for compilers to optimize to the level that humans can.

@jiahao
Copy link
Member

jiahao commented Jul 1, 2015

Trying to use at the start of a variable name doesn't throw an error, but instead leads to a new method definition for sqrt:

julia> a = 4
sqrt (generic function with 13 methods)

@Ismael-VC
Copy link
Contributor Author

@jiahao that's a good one, nice!

sieve[n] = !sieve[n]
end
n = 3x² - y²
if x > y && n ≤ limit && n % 12 == 11
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test if this line gives a BoundsError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pabloferz the test passed as is, only build 1.0.6386 failed but it's not related I think ...putting the @inbounds back.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I wasn't really paying attention to the x > y condition in there (which guaranties n is not negative). So the index will be always in bounds given the conditionals. In the case of the previous implementation is was really necessary.

@ScottPJones
Copy link
Contributor

@jiahao I got confused because both √x and are both accepted, in quite different (and inconsistent) ways.
I really don't think that should be allowed to be an identifier, instead, should be equivalent to square(x), which I think would make more sense and looks more mathematical and julian than x^2.

@ScottPJones
Copy link
Contributor

@jiahao The names used have nothing at all to do with whether you cache something or not.
People have been caching values in other languages like Fortran forever.
The problem with those names that look like calculations, is the likelihood that somebody might update x, but forget to update , leading to incredibly hard to find bugs.
It is a software engineering issue, not to allow such confusable syntax.

@pabloferz
Copy link
Contributor

LGTM now!

One of the AppVeyor tests timed out. But I think this is OK otherwise.

throw(ArgumentError("requested number of primes must be ≤ $(typemax(Int)), got $limit"))

"Returns a collection of the prime numbers ≤ `limit`."
primes(limit::Union{Integer,AbstractVector{Bool}}) = find(primesmask(limit))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the function accepts either an Integer (a limit) or an array of Bools to mask so I think maybe n is better than limit here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pabloferz I we're going to export primesmask I think we wouldn't need the method:

  • primes(limit::Union{Integer,AbstractVector{Bool}})

Since there is already: primesmask(sieve::AbstractVector{Bool}). It could end up like this:

  • primes(limit::Integer)
  • primesmask(sieve::AbstractVector{Bool})

Which makes the interface better IMHO and easier to document. So that the people that find primesmask(sieve::AbstractVector{Bool}) useful, can explicitly use it instead. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also imply removing:

  • primesmask(limit::Int) = primesmask(falses(limit))

And:

primesmask(limit::Integer) = limit <= typemax(Int) ? primesmask(Int(limit)) :
    throw(ArgumentError("requested number of primes must be ≤ $(typemax(Int)), got $limit"))

In order to move that logic into primes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with those changes. But I guess we need someone else to agree (@StefanKarpinski for example).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this:

"Returns a collection of the prime numbers ≤ `limit`."
primes(limit::Integer) = limit <= typemax(Int) ? find(primesmask(falses(limit))) :
    throw(ArgumentError("requested number of primes must be ≤ $(typemax(Int)), got $limit"))

@Ismael-VC
Copy link
Contributor Author

@pabloferz If you or anyone else have some time to look at my attempt to understand the pseudocode you posted:

And give me a hint I would be very grateful! Obviously I'm lost.

If I run it, it gets stuck at the fourth phase. 😟

@pabloferz
Copy link
Contributor

@Ismael-VC I guess can check http://stackoverflow.com/questions/19388106/the-sieve-of-atkin/20572948#20572948 to get more insight on the algorithm. But in the long run I guess it is better to implement an optimized Sieve of Eratosthenes instead (I'm trying to do that).

@Ismael-VC
Copy link
Contributor Author

@pabloferz the pseudocode you posted is for an Eratosthenes sieve not an Atkin one if I understand correctly:

// Eliminate composites by sieving, only for those occurrences on the
// wheel using wheel factorization version of the Sieve of Eratosthenes

I also understand that Atkin's sieve is faster for larger primes Int64 specially, If we en up doing a segmented sieve, wouldn't it make sense to use an Atkin one for the upper bounds and two Eratosthenes ones for the medium and lower bounds (we already cache primes up to 2^16 limit): http://git.io/vqebv

@Ismael-VC
Copy link
Contributor Author

Is primesieve BSD 2-Clause License not compatible with MIT License? I'm reading about it but I'm not sure if I understand quite right:

These links may serve as a better reference:

@Ismael-VC
Copy link
Contributor Author

I wish GitHub could let me edit several files from the online interface and commit them all at once instead of forcing me to commit one by one. Is there a way to cancel unwanted CI tests from my side? 😟

@yuyichao
Copy link
Contributor

yuyichao commented Jul 2, 2015

Is there a way to cancel unwanted CI tests from my side?

This is the way to do it.

I cancelled one of them and apparently someone else did the other three at the same time.

@Ismael-VC
Copy link
Contributor Author

@yuyichao thanks!

@Ismael-VC
Copy link
Contributor Author

This LGTM, AppVeyor build seems to have timed out.

@ironman353
Copy link

@Ismael-VC, can you check my code for prime generation in the thread #11594? It is very fast. Here are few results:
Pi(10^7) = 664579 in 0.108 seconds
Pi(10^8) = 5761455 in 0.329 seconds
Pi(10^9) = 50847534 in 2.726 seconds
Pi(10^10) = 455052511 in 29.697 seconds
Pi(10^11) = 4118054813 in 387.659 seconds
Pi(10^12) = 37607912018 in 5783.603 seconds

@Ismael-VC
Copy link
Contributor Author

@ironman353 yes I will! Is this the one you ported form Java that you where mentioning? I'll test each and every implementation in #11594 and try to implement something that reflects the advantages of all this codes in hope it's faster than primesieve.

@aiorla @danaj @VicDrastik @pabloferz @ironman353

By the way which is a good point of comparison? I'm using primesieve (implemented in C++) currently to compare them. It seems to be the most recently updated, and also the one that has better documentation. It also claims to be faster than primegen and yafu.

@danaj
Copy link

danaj commented Jul 5, 2015

@Ismael-VC I'd be quite surprised if you passed primesieve. If you're 1/2 to 1/4 its speed you're doing very well. This is generating and counting primes -- of course there are much faster ways to just count. For sieving, it's the fastest out there (yafu beats it by a few percent in some ranges).

You'll want a segmented sieve for controlling memory and for performance if you want to compare times for large sizes. For reference, primesieve running single threaded:

0.12s 10^9
1.66s 10^10
23.5s 10^11
4m 56s 10^12
58m 18s 10^13

These use very little memory. This is in C rather than Julia, but that gives you some idea of the times. yafu is relatively similar in time. Perl/ntheory is about 1/2 the speed. primegen (Bernstein's segmented Sieve of Atkin) is about 1.5x the speed up to 10^11, then gets relatively slower (Perl/ntheory passes it at 10^12 and is almost 2x faster at 10^13).

@ironman353
Copy link

@Ismael-VC, yes I converted my code in JAVA to Julia. It is in that thread.

@aiorla
Copy link
Contributor

aiorla commented Jul 6, 2015

Wow, I didn't remember the primes issue! @ironman353 results were much better than master, I think they are worth a try. (although it's Eratosthenes and this PR seems to be more Atkin "oriented")

@pabloferz
Copy link
Contributor

Now that #12025 is ready, I believe we can close this in favour of that one.

@IainNZ
Copy link
Member

IainNZ commented Aug 10, 2015

Closing in favor of #12025

@IainNZ IainNZ closed this Aug 10, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.