So after going through the book I needed something to practice on, not wanting to throw myself into an active project I decided to head over to Project Euler. A great resource if you want to practice using a language without resorting to made up scenarios or over elaborate "Hello World" applications. There are a number of problems (399 at the time of writing) which are mathematical/programming based in nature. If you're not too good at maths or it's been a while since you were in full time education then you may need to Google a bit to get an idea of the problem but they're not beyond the average person.
The Problem
The one I want to look at here specifically is problem 7 which (at the time of writing) is a problem to find the 10,001st prime number. There are two obvious ways to solve this, the first being a prime number generator. This is a function which keeps on incrementing a count, checks to see if the next number is prime and if it is the number is returned from the function. The second method is to use a sieve, this takes a sequence of numbers and "removes" all of the ones which are not prime numbers, it does this by:
1. Create a sequence of numbers, marking each as a "prime candidate"
2. Starting from 2, check to see if the entry is marked as a prime candidate
a. If the entry is a prime candidate then mark each multiple in the sequence as non-prime
3. Return the collection of prime of numbers still marked as prime candidates as the final set of prime numbers
This is an over-simplified explanation of the process, for a more detailed explanation see the Wikipedia entry on the Sieve of Eratosthenes.
The issue with creating a sieve is that you must specify an upper limit, that is to say "find all prime numbers less than N". The problem however poses the question as "find the first N prime numbers". This means that in order to use a sieve we have to know roughly where the Nth prime will be, or use a high enough limit to generate enough primes to get N. Because of this it would seem that using a generator would be a better option. One thing to keep in mind is that, although there is no time limit to the problems, generally speaking if the solution takes longer than a minute then it may not be the best solution.
The Solution
Before I start
One thing I should probably mention here is that I'm only checking to see if a number is divisible by prime numbers instead of checking for all possible divisors. The theory here is Prime Factorization. So a prime number is a number which is only divisible by 1 and itself, other positive integers (other than 1 which is special) are composite numbers, meaning that they are divisible by integers other than 1 and themselves. Consider the number 20, this can be decomposed as follows:
20 / 2 = 10
10 / 2 = 5
5 / 5 = 1
So the number 20 can be represented as 2 x 2 x 5 (or 2^2 x 5).
Because numbers which are not primes must be evenly divisible by at least one prime number less than itself then this reduces the number of numbers to check for to see if a number is prime.
Solution 1
Because I wanted to play with Go I took the opportunity to create a prime number generator using goroutines (think light-weight threads). This was done by creating a number "emitter" at one end and a "listener" at another end, when a number leaves the emitter it passes through a series of checkers, these check to see if the number is evenly divisible by a given prime number. So the first checker will check to see if the number is evenly divisible by 2, if not then it is passed to the next checker which checks to see if it is evenly divisible by 3. The "listener" watches the output from the last checker, if a number appears here then it is not divisible by any of the currently known primes and so must be a prime number itself. In this event, the listener creates a new checker for the new prime number which listens to the output from the last checker, the listener then listens for output from this new checker, it also puts the new prime number into a channel and waits for it to be read by the client.
Emitter > Check For 2 > Check For 3 > Check For 5 > Listener
In implementation I thought this to be a rather elegant solution, however for production use this turns out to be quite a time consuming exercise. It allowed me however to play with channels and goroutines and so was incredibly useful. Next up I decided to implement a similar generator again, but without goroutines and channels.
Solution 2
This solution is more simple as it simply builds up a list of prime numbers and runs in a single goroutine (the main one). The theory is almost identical though; keep on checking numbers to see if they are evenly divisible by the current list of known prime numbers, if not then it is a prime number and add it to the list. The implementation this time takes a number which is the Nth prime to find, this means that the function will keep on running until it has found that many prime numbers and will return the last one found.
This was straight forward to implement and ran so much faster than the first solution (something along the lines of 38 seconds for solution 1 to 6 seconds for solution 2). But I still felt that it was too slow.
Solution 3 - The Prime Sieve
So I investigated the sieve solution. I gave a brief outline of what is required above but there are a couple of optimizations to be made. The first is that for a given sequence of numbers you only need to look for prime numbers up to the square root of the limit, numbers after this point are either multiples of the prime numbers discovered to that point, or are prime numbers themselves. Say we wanted to find all of the prime numbers up to 16 then we would limit ourselves to only looking up to 4 (the square root of 16):
1 | N/A |
2 | Prime Number |
3 | Prime Number |
4 | Divisible by 2 |
Search limit | |
5 | Prime Number |
6 | Divisible by 3 |
7 | Prime Number |
8 | Divisible by 2 |
9 | Divisible by 3 |
10 | Divisible by 2 |
11 | Prime Number |
12 | Divisible by 2 |
13 | Prime Number |
14 | Divisible by 2 |
The second optimization is fairly obvious from the table above, every even number with the exception of 2 is non-prime. This means that we can initially mark every even candidate as non-prime (except 2), then starting from 3 check every odd number (e.g. 3, 5, 7, 9...), this halves the number of candidates which need checking.
In order to get around the "known limit" problem I implemented the solution to sieve out all primes up to 1,000,000 (a suitably large number I thought). There are ways in which you can determine the upper limit, but the math is a bit beyond me right now, this is Prime Number Theorem, feel free to take a look. If this were implemented then we would be checking fewer potential candidates meaning less work again.
(Edit, thanks to compoasso from the comments I implemented a function to get the upper limit so that it is no longer checking from primes up to 1 million)
I didn't hold out much hope for this solution as I am sieving for primes in a large data set to find 1 specific number, so I figured I that I wasn't going to do much better than the ~6 seconds from solution 2. So the result surprised me a little. After putting together the solution and running it with the "time" command on my HP Mini 210 (Intel Atom N450, dual core 1.6Ghz) running Ubuntu Linux I got the correct result in 98 milliseconds!
So, what now?
Well, if you want to check out the full source files you can find them on GitHub along with the above snippets.
I absolutely love playing with Go, it is very much like creating C++ application but with the convenience of something like Python or Ruby. Project Euler is a great playground for experimenting with new languages as well, it also allows you to think about how to create a solution which is appropriate to the language (instead of making a generic solution work across all languages).
Now I just need to remember to keep on posting :-)
About the part "finding a high enough limit" you can use x / ln(x) as describle in wikipedia.
ReplyDeleteThis formula is true for x approaches infinity, here x is low so you can multiply the result by a small delta for security.
In your case : 10,001st prime number give -> 10^4/ln(10^4) = 1085. Here 10^4 is a very very low limit so you should multiply by a "big delta" for example 1.2 to have good limit.
For following power (5, 6, 7,...) delta can be a bit less (1.1) and give a good compromise between memory space and limit.
Thanks, I'll have a go at implementing that and see if I can shave a bit more time off :-)
DeleteI changed the code to get the limit using n * ln(n) * 1.2 which gave an limit of 110536 (not far off the answer). It also took the execution time down from 0.098s to 0.021s :)
DeleteThanks for pointing me in the right direction