Thursday, January 15, 2009

Erlang solution for Euler problem 10

In Euler problem 10 we are asked to compute the sum of all prime numbers below N, where N is 2 million.

I've implemented this in Erlang using four different approaches.

The first approach (see module euler10a below) visits every odd number between two and two million. It checks whether a number n is a prime by trying to divide it by every odd number between 2 and sqrt(n).

The second approach (see module euler10b below) uses an implementation which is remarkably elegant in Erlang and which is often incorrectly described as being an implementation of the sieve of Eratosthenes. The code for this approach is taken from an answer by Andreas Jacobsen to this stack overflow question. The excellent article "The Genuine Sieve of Eratosthenes" by Melissa O'Neill describes why this approach is actually not an implementation of the sieve of Eratosthenes, how that causes it to be extremely inefficient, and how it can be modified to be more efficient. I've included this implementation because of the sheer elegance of the code.

The third approach (see module 10c below) is a simple implementation of the sieve of Eratosthenes using the the Erlang array module.

The fourth approach (see module 10e below) is an optimized implementation of the sieve of Eratosthenes. The main difference with the third approach is that it only stores odd numbers in the array.

All of these approaches are single threaded. I would like to do some more research into distributed algorithms for finding prime numbers and implement a concurrent solution in Erlang.

The following shows the run-time of each of these algorithms (on an Apple MacBook Pro) in seconds as a function of the size of the problem N.



Source code for approach A:


%% Project Euler, problem 10
%%
%% Find the sum of all the primes below two million.
%%
%% Approach a:
%% - Try all odd numbers from below two million.
%% - Check each number n for being a prime, by trying to devide it by all odd number 3 ... sqrt(n).

-module(euler10a).
-author('Cayle Spandon').

-export([solve/0, solve/1]).

isPrime(N) ->
MaxTry = trunc(math:sqrt(N)),
isPrime(N, 3, MaxTry).

isPrime(N, Try, MaxTry) ->
if
Try > MaxTry ->
true;
true ->
if
N rem Try == 0 ->
false;
true ->
isPrime(N, Try+2, MaxTry)
end
end.

sumPrimes(Max) ->
sumPrimes(3, 2, Max).

sumPrimes(Current, Sum, Max) ->
if
Current >= Max ->
Sum;
true ->
case isPrime(Current) of
true -> NewSum = Sum + Current;
false -> NewSum = Sum
end,
sumPrimes(Current+2, NewSum, Max)
end.

solve() ->
solve(2000000).

solve(N) ->
sumPrimes(N).


Source code for approach B:


%% Project Euler, problem 10
%%
%% Find the sum of all the primes below two million.
%%
%% Approach b:
%% - Generate all primes below two million, using an extremely elegant but highly inefficient algorithm which is
%% similar but not quite identical to the sieve of Eratosthenes.
%% - Sum the generated primes.

-module(euler10b).
-author('Cayle Spandon').

-export([solve/0, solve/1, sieve/1]).

sieve([]) ->
[];

sieve([H|T]) ->
List = lists:filter(fun(N) -> N rem H /= 0 end, T),
[H|sieve(List)];

sieve(N) ->
sieve(lists:seq(2, N)).

sum(L) ->
lists:foldl(fun (N, Acc) -> N + Acc end, 0, L).

solve() ->
solve(2000000).

solve(N) ->
sum(sieve(N)).


Source code for approach C:


%% Project Euler, problem 10
%%
%% Find the sum of all the primes below two million.
%%
%% Approach c:
%% - Generate all primes below two million, using the sieve of Eratosthenes implemented using arrays.
%% - Sum the generated primes.

-module(euler10c).
-author('Cayle Spandon').

-export([solve/0, solve/1, sieve/1]).

sieve(N) ->
Array1 = array:new([{size, N+1}, {default, is_prime}]), % +1 because array is 0-based
Array2 = array:set(0, is_not_prime, Array1), % handle 0 as a special case
Array3 = array:set(1, is_not_prime, Array2), % handle 1 as a special case
MaxTry = trunc(math:sqrt(N)),
Try = 2, % start with 2 as the first candidate prime
sieve(Array3, MaxTry, Try).

sieve(Array, MaxTry, Try) ->
case Try > MaxTry of
true ->
Array;
false ->
case array:get(Try, Array) of
is_not_prime ->
sieve(Array, MaxTry, Try + 1);
is_prime ->
NewArray = remove_multiples_of_prime(Array, Try),
sieve(NewArray, MaxTry, Try + 1)
end
end.

remove_multiples_of_prime(Array, Prime) ->
Multiple = 2 * Prime,
remove_multiples_of_prime(Array, Prime, Multiple).

remove_multiples_of_prime(Array, Prime, Multiple) ->
case Multiple >= array:size(Array) of
true ->
Array;
false ->
NewArray = array:set(Multiple, is_not_prime, Array),
remove_multiples_of_prime(NewArray, Prime, Multiple + Prime)
end.

sum(Array) ->
array:foldl(
fun (Index, Value, Acc) ->
if
Value == is_prime ->
Acc + Index;
true ->
Acc
end
end,
0,
Array
).

solve() ->
solve(2000000).

solve(N) ->
sum(sieve(N)).


Source code for approach E:


%% Project Euler, problem 10
%%
%% Find the sum of all the primes below two million.
%%
%% Approach e:
%% - Generate all primes below two million, using the sieve of Eratosthenes implemented using arrays.
%% - Only store odd numbers in the sieve array.
%% - Sum the generated primes.

-module(euler10e).
-author('Cayle Spandon').

-export([solve/0, solve/1, sieve/1]).

init_sieve_array(N) ->
Size = (N + 1) div 2, % only store odd numbers (0->1, 1->3, 2->5, etc.)
Array1 = array:new([{size, Size}, {default, is_prime}]),
Array2 = array:set(0, is_not_prime, Array1), % handle 1 as a special case
Array2.

get_value_from_sieve_array(Nr, Array) ->
1 = Nr rem 2, % Nr must be odd
Index = Nr div 2,
array:get(Index, Array).

set_value_in_sieve_array(Nr, Value, Array) ->
1 = Nr rem 2, % Nr must be odd
Index = Nr div 2,
array:set(Index, Value, Array).

sieve(N) ->
Array = init_sieve_array(N),
MaxTry = trunc(math:sqrt(N)),
Try = 3, % start with 3 as the first real prime
sieve(N, Array, MaxTry, Try).

sieve(N, Array, MaxTry, Try) ->
case Try > MaxTry of
true ->
Array;
false ->
case get_value_from_sieve_array(Try, Array) of
is_not_prime ->
sieve(N, Array, MaxTry, Try + 2);
is_prime ->
NewArray = remove_multiples_of_prime(N, Array, Try),
sieve(N, NewArray, MaxTry, Try + 2)
end
end.

remove_multiples_of_prime(N, Array, Prime) ->
Multiple = 3 * Prime,
remove_multiples_of_prime(N, Array, Prime, Multiple).

remove_multiples_of_prime(N, Array, Prime, Multiple) ->
case Multiple > N of
true ->
Array;
false ->
NewArray = set_value_in_sieve_array(Multiple, is_not_prime, Array),
remove_multiples_of_prime(N, NewArray, Prime, Multiple + 2*Prime)
end.

sum(Array) ->
Sum = array:foldl(
fun (Index, Value, Acc) ->
if
Value == is_prime ->
Acc + (2 * Index) + 1;
true ->
Acc
end
end,
0,
Array
),
Sum + 2. % 2 is the only even number and is not in the array

solve() ->
solve(2000000).

solve(N) ->
sum(sieve(N)).

3 comments:

Klaus said...

Primes time Assembler Erlang(approach E)
2 Mill 10 ms approx. 1000 ms
10 Mill 230 ms 18000 ms

performs 78 times and 100 times faster(on slower machine), uses optimized sieve with bitfield-array:
C:\Asm>euler10

Project Euler - finde die Summe der Primzahlen bis: 2000000
125.004 Bytes Arrayspeicher

Suche nach Primzahlen von 1 bis 2.000.000 - bitte warten...

0 ms init Array
0 ms im Sieb
10 ms Primzahlen zaehlen
10 ms Gesamtzeit.

148.933 Primzahlen gefunden
Summe aller Primzahlen: 142913828922 Hex 00000021:4653F83A
----------------------------------
C:\Asm>euler10

Project Euler - finde die Summe der Primzahlen bis: 10000000
625.004 Bytes Arrayspeicher

Suche nach Primzahlen von 1 bis 10.000.000 - bitte warten...

0 ms init Array
210 ms im Sieb
20 ms Primzahlen zaehlen
230 ms Gesamtzeit.

664.579 Primzahlen gefunden
Summe aller Primzahlen: 3203324994356 Hex 000002E9:D50C6334

CayleSpandon said...

@Klaus: I am not surprised your implementation in (what appears to be) assembler code is 78 times faster. I'm just using the Euler problems to learn Erlang. It did not take long to discover that Erlang clearly isn't the best language for the kind of problems that the Euler project favors. Erlang seems to be targeted more at high reliability distributed systems with lots of processes and lots of messaging between those processes.

Klaus said...

@Cayle: I didn't want to pick on you. Perhaps I did not express clearly what I meant:

Does Erlang support bitfields?
My first implementation of the sieve in assembler needed 1.2 seconds to count the primes up to 10,000,000 and it used a byte-array.
Rather I wanted to tell you that using a bitfield (if Erlang supports) you could save memory AND time.