Improving Prime Number Sieves


With the elementary knowledge about the distribution of prime numbers, we are finally able to characterize the time complexity of different prime number sieves precisely. This will help us reasonably understand and argue about our improvements of prime number sieves over the classical sieve of Eratosthenes.

Linear Sieve

All of us might be very familiar with the sieve of Eratosthenes: to sieve the primes below $n$, we keep a bitset and scan integers from $2$ to $n$; if the currently scanned $m$ has not been marked, then it's a prime, and we mark its multiples $2m, 3m, \ldots$ up to $n$ as composite; if it's a composite, we just skip it.
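For concreteness, here is a minimal sketch of the process just described; the bound N = 100 is an arbitrary placeholder:

import math

N = 100 # sieve primes below this scale
is_composite = [False]*(N+1) # the bitset
primes = list()
for m in range(2, N+1):
	if not is_composite[m]:
		# m was never marked by a smaller integer, hence prime
		primes.append(m)
		for multiple in range(2*m, N+1, m):
			is_composite[multiple] = True

print(primes)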

What's the time complexity of the sieve of Eratosthenes? Well, the process of marking composites based on each prime dominates the time complexity. For sieving primes below $n$, it takes approximately $\sum_{p \le n} \frac{n}{p} = n \sum_{p \le n} \frac{1}{p}$ steps to be done. And with analytic number theory, we are able to expand the prime harmonic sum $\sum_{p \le n} \frac{1}{p}$ asymptotically:

Theorem (Asymptotic Expansion of Prime Harmonic Series)

$$\sum_{p \le x} \frac{1}{p} = \ln\ln x + M + O\left(\frac{1}{\ln x}\right)$$

where $M \approx 0.2615$ is the Meissel–Mertens constant.

Proof. Actually, proving this theorem requires only Abel's identity and Shapiro's tauberian theorem.

First, by Shapiro's tauberian theorem we know there's $\sum_{n \le x} \frac{\Lambda(n)}{n} = \ln x + O(1)$, and we can let $A(x) = \sum_{p \le x} \frac{\ln p}{p}$ so that there's $0 \le \sum_{n \le x} \frac{\Lambda(n)}{n} - A(x) = \sum_{p^k \le x,\, k \ge 2} \frac{\ln p}{p^k} \le \sum_{p} \frac{\ln p}{p(p-1)}$, and thus $A(x) = \ln x + R(x)$ by some $R(x)$ so that $R(x) = O(1)$, since $\sum_{p} \frac{\ln p}{p(p-1)}$ is finite.

Then, by applying Abel's identity with $f(t) = \frac{1}{\ln t}$ we have:

$$\sum_{p \le x} \frac{1}{p} = \frac{A(x)}{\ln x} + \int_2^x \frac{A(t)}{t \ln^2 t}\,\mathrm{d}t$$

Where $A(x) = \ln x + R(x)$ with $R(x) = O(1)$ as above.

It's clear that:

$$\frac{A(x)}{\ln x} = 1 + O\left(\frac{1}{\ln x}\right), \qquad \int_2^x \frac{\ln t}{t \ln^2 t}\,\mathrm{d}t = \int_2^x \frac{\mathrm{d}t}{t \ln t} = \ln\ln x - \ln\ln 2

Meantime, we have:

$$\int_x^\infty \frac{|R(t)|}{t \ln^2 t}\,\mathrm{d}t \le \sup_{t \ge 2}|R(t)| \cdot \int_x^\infty \frac{\mathrm{d}t}{t \ln^2 t} = \frac{\sup_{t \ge 2}|R(t)|}{\ln x}$$

Therefore $\int_2^\infty \frac{R(t)}{t \ln^2 t}\,\mathrm{d}t$ converges absolutely to some constant $C_R$, within $O\left(\frac{1}{\ln x}\right)$ of the truncation $\int_2^x \frac{R(t)}{t \ln^2 t}\,\mathrm{d}t$.

Finally, by letting $M = 1 - \ln\ln 2 + C_R$, we have:

$$\sum_{p \le x} \frac{1}{p} = \ln\ln x + M + O\left(\frac{1}{\ln x}\right)$$

And we are done with the proof.
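We can also sanity-check the expansion numerically; a small sketch, taking the truncated value $M \approx 0.2614972$ of the Meissel–Mertens constant as an assumption:

import math

M = 0.2614972 # Meissel-Mertens constant, truncated
N = 1000000
is_composite = [False]*(N+1)
s = 0.0 # running sum of 1/p over primes p <= N
for m in range(2, N+1):
	if not is_composite[m]:
		s += 1.0 / m
		for multiple in range(2*m, N+1, m):
			is_composite[multiple] = True

# the two printed values should agree up to the O(1/ln N) error term
print(s, math.log(math.log(N)) + M)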

Corollary
The sieve of Eratosthenes runs in time complexity $O(n \ln\ln n)$, plus a space complexity of $O(n)$ to store the bitset of discovered primes.

Although compared to a common comparison-based sorting algorithm's $O(n \log n)$ time complexity, the factor $\ln\ln n$ might not seem a big deal, given that $\ln\ln n > 3$ once $n > e^{e^3} \approx 5.3 \times 10^8$, it starts to take more than three times the scale $n$ to sieve the primes below $n$.
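To get a feel for how slowly (yet unboundedly) the factor grows, we can tabulate $\ln\ln n$ directly:

import math
for n in (10**6, 10**9, 10**12, 10**18):
	print(n, math.log(math.log(n))) # roughly 2.63, 3.03, 3.32, 3.72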

In fact, we are able to do much better than that. A sieve can be done in time complexity $O(n)$ and space complexity $O(n)$, yet remain just as simple as the sieve of Eratosthenes, once you have grasped its essence.

First, we are able to abstract the marking process into a graph, with vertices being integers, and if vertex $u$ marks $v$, there's an edge between $u$ and $v$. The graph is finite for each scale $n$ we are going to sieve. Here's the graph for a small scale:

There are many composites adjacent to more than one prime, so generally it's not a tree.

In contrast, a map defined on integers from $2$ to $n$ can also be viewed as a simple digraph with an edge from each input to its output. The good thing about a map is that every vertex has out-degree exactly $1$.

Consider such a map: each integer $n \ge 2$ can be uniquely factorized into a product of positive integers, let it be $n = p \cdot m$, with $p$ being the least prime factor; clearly both $p$ and $m$ are unique, so $f : n \mapsto m$ is a map. Let's draw the digraph of $f$ for a small scale, with each edge labeled with the $p$ corresponding to the current $n$:

Clearly such a digraph is always a tree for each scale $n$, which means there are exactly $n - 1$ edges. Conversely, imagine if we are able to mark each integer from $2$ to $n$ guided by the edges, the process of marking will definitely be completed within $O(n)$ steps. In fact, this guides us to the essence of the linear sieve (aka Euler's sieve):

Example (Linear Sieve)
In the digraph of $f$ for scale $n$, there's an edge to $v$ from some $u \le n$ iff:

  1. $v = 1$ and $u$ is prime.
  2. $u = p \cdot v$ for some prime $p \le \operatorname{lpf}(v)$, for $\operatorname{lpf}(v)$ the least prime factor of $v$.

Proof. For the statement #1, by our definition, each prime $p$ can be written as $p = p \cdot 1$, and if there's $u = q \cdot 1$ for some prime $q$, then $u = q$ is a prime.

For the statement #2, record the edge labels in the path from $u$ to $1$ as $p_1, p_2, \ldots, p_k$. Since $p_i$ is always the least prime factor of the current vertex, there has to be $p_1 \le p_2 \le \cdots \le p_k$. In our case, it's $u = p \cdot v$ and $p = p_1$, so there's an edge from $u$ to $v$ only if $p \le \operatorname{lpf}(v) = p_2$.

Conversely, by mathematical induction on the length of the path to $1$, starting with statement #1 as the case of length $1$: for a path of length $k$ with labels $p_1 \le \cdots \le p_k$ defined for $v$, the path with labels $p \le p_1 \le \cdots \le p_k$ defined for $u = p \cdot v$ is a valid path of length $k + 1$ with $p \le p_1 = \operatorname{lpf}(v)$, and there's a vertex for $u$ in the digraph, given that $u \le n$.
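We can cross-check this characterization against the map $f$ for a small scale; a quick sketch, where the helper lpf is our own hypothetical naming:

N = 20 # a small scale to inspect

def lpf(n):
	# least prime factor by trial division; fine for tiny n
	d = 2
	while d * d <= n:
		if n % d == 0:
			return d
		d += 1
	return n

# edges of the digraph of f: u -> f(u) = u / lpf(u)
by_map = {(u, u // lpf(u)) for u in range(2, N+1)}
# edges generated by the two statements above
by_rules = set()
for u in range(2, N+1):
	if lpf(u) == u:
		by_rules.add((u, 1)) # statement #1
for v in range(2, N+1):
	for p in range(2, N // v + 1):
		if lpf(p) == p and p <= lpf(v):
			by_rules.add((p * v, v)) # statement #2
print(by_map == by_rules, len(by_map) == N - 1) # True True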

In the implementation, for each visited integer $n$, we just need to record the prime $p$ that $n$ is sieved by, since it has to be the least prime factor. And in the record storing the least primes, if there's no record for $m$ yet, then it's not sieved by any integer below $m$, which means $m$ is a prime. We also require an extra array to record the primes, occupying approximately $\frac{n}{\ln n}$ entries by the prime number theorem. Here is a possible implementation:

N = 1000000 # set it to your scale
dp = [0]*(N+1) # dp[n] stores the least prime factor of n (0 = not visited yet)
primes = list()
for m in range(2, N+1):
	if dp[m] == 0:
		# m = m * 1, so m is prime
		dp[m] = m
		primes.append(m)
	for p in primes:
		n = m * p
		if n > N:
			# p' * m >= p * m > N for any later prime p'
			break
		dp[n] = p
		if p >= dp[m]:
			# p == dp[m] here; any larger p would violate p <= lpf(m)
			break

print(primes)

Integer factorization based on linear sieve

Since for an integer $c$, if it's not a prime, there must be a prime factor below $\sqrt{c}$, it suffices to factorize integers up to $N^2$ for a linear sieve run up to $N$. Here is an integer factorization algorithm based on the linear sieve:

# WARN: the code must be following the linear sieve code above
import random
c = random.randint(2, N**2) # roll today's lucky number
# c = 123456 # or set to your own number
assert (1 <= c) and (c <= N**2)

n = c
factors = list()
if n > N:
	# we must try the divisibility.
	for p in primes:
		if n == 1:
			break
		if n % p == 0:
			k = 0
			while n % p == 0:
				n = n // p
				k += 1
			factors.append((p, k))
		if n <= N:
			# we can lookup table directly
			break

if n <= N:
	# lookup table directly now.
	while n != 1:
		p = dp[n]
		k = 0
		while n % p == 0:
			n = n // p
			k += 1
		factors.append((p, k))

if n != 1:
	# n is a prime by now
	factors.append((n, 1))

# note factors is empty for c = 1, so this prints just '1 = '
print(f'{c} = ' + ' * '.join(f'({factor[0]} ** {factor[1]})' for factor in factors))

Undoubtedly, looking up the table generated by the linear sieve is much more efficient than iterating through primes and testing divisibility; the complexity of the lookup is related to the number of distinct prime factors of $n$ and the maximum power of a prime in $n$.

The latter can be coarsely estimated to be $O(\log_2 n)$, since $p^k \le n$ forces $k \le \log_2 n$. And for the former, we have:

Lemma
There are at most approximately $\frac{\ln n}{\ln\ln n}$ distinct prime factors in a positive integer less than $n$.

Proof. It suffices to consider the worst case: to maximize the number of distinct prime factors within $n$, we multiply primes in ascending order, which is:

$$n_k = \prod_{i=1}^{k} p_i$$

Taking the natural logarithm of both sides, we then have

$$\ln n_k = \sum_{i=1}^{k} \ln p_i = \theta(p_k)$$

By the prime number theorem, we know $\theta(p_k) \sim p_k$ and $p_k \sim k \ln k$, therefore $\ln n_k \sim k \ln k$.

Multiplying $k$ distinct primes yields the maximum number approximating $e^{k \ln k}$, and thus multiplying $\frac{\ln n}{\ln\ln n}$ distinct primes yields the maximum number approximating $e^{\ln n} = n$.

We can verify the conclusion by programming to calculate how much our estimate deviates from the actual value:

# WARN: the code must be following the linear sieve code above
l = 10000 # number of sample points to test
assert l <= len(primes), "please sieve more primes"
prime_product = [0]*l
prime_product[0] = primes[0]
for i in range(1, l):
	prime_product[i] = prime_product[i-1] * primes[i]

import math
print([(math.log(n) / math.log(math.log(n))) / (i+1) for i, n in enumerate(prime_product)])

The ratio between the approximation and the actual value slowly approaches $1$ as $k$ grows larger and larger.

Please notice that attaining the maximum prime power and attaining the maximum number of distinct prime factors are mutually exclusive. And again, we can coarsely interpolate them by geometric average, claiming the time complexity of the lookup to be $O\left(\sqrt{\log n \cdot \frac{\log n}{\log\log n}}\right) = O\left(\frac{\log n}{\sqrt{\log\log n}}\right)$.

"Lord Du Sieve" with Precomputation

We have introduced the "Lord Du Sieve", which provides a fast computation within $O(n^{3/4})$ of the prefix sum of certain number theoretic functions. And undoubtedly, it's faster than using the linear sieve to calculate all partial sums up to $n$, which will require time complexity $O(n)$.

However, if we take a closer look, we soon find it's based on two dynamic programmings on the function values at the quotients $\lfloor n/k \rfloor$, split at the square root $\sqrt{n}$; and computing the values from $1$ to $\sqrt{n}$ inside the recursion already takes time complexity $O(n^{3/4})$, which can be done within $O(\sqrt{n})$ using the linear sieve quickly.

In fact, we can even find a trade-off point $B$ within the portion from $\sqrt{n}$ to $n$, so that if we precompute the portion from $1$ to $B$ by the linear sieve, and compute the portion from $B$ to $n$ using the Lord Du sieve, the algorithm can be done within time complexity $O(n^{2/3})$.

By tweaking the upper bound of the time complexity analysis of the Lord Du sieve, for the portion from $B$ to $n$, we get:

Lemma

$$\sum_{k=1}^{\lfloor n/B \rfloor} \sqrt{\frac{n}{k}} = O\left(\frac{n}{\sqrt{B}}\right)$$

Proof. $\sum_{k=1}^{\lfloor n/B \rfloor} \sqrt{\frac{n}{k}} \le \sqrt{n} \int_0^{n/B} \frac{\mathrm{d}x}{\sqrt{x}} = 2\sqrt{n} \cdot \sqrt{\frac{n}{B}} = \frac{2n}{\sqrt{B}}$.
Theorem
The algorithm of the Lord Du sieve with precomputation using the linear sieve from $1$ to $B = n^{2/3}$ runs in time complexity $O(n^{2/3})$, if the precomputation of the number theoretic function can be done within time complexity $O(B)$.
Proof. Note that the lemma evaluates the time complexity of the Lord Du sieve to compute the values from $B$ to $n$, which is in $O\left(\frac{n}{\sqrt{B}}\right) = O\left(\frac{n}{\sqrt{n^{2/3}}}\right) = O(n^{2/3})$. The precomputation from $1$ to $B$ is done within $O(B) = O(n^{2/3})$ if each function value can be precomputed within amortized $O(1)$ time complexity.
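To make the theorem concrete, here is a sketch of the Lord Du sieve with linear-sieve precomputation for the Mertens function $M(n) = \sum_{m \le n} \mu(m)$, using the identity $\sum_{k \le n} M(\lfloor n/k \rfloor) = 1$; the Möbius function is merely our own choice of example here:

n = 10**8
B = int(n ** (2/3)) # the trade-off point B = n^(2/3)

# linear sieve of the Mobius function up to B, plus its prefix sums
mu = [0]*(B+1)
mu[1] = 1
dp = [0]*(B+1)
primes = list()
for m in range(2, B+1):
	if dp[m] == 0:
		dp[m] = m
		mu[m] = -1
		primes.append(m)
	for p in primes:
		k = m * p
		if k > B:
			break
		dp[k] = p
		if p >= dp[m]:
			mu[k] = 0 # p * p divides k
			break
		mu[k] = -mu[m]

M_small = [0]*(B+1)
for i in range(1, B+1):
	M_small[i] = M_small[i-1] + mu[i]

cache = dict()
def mertens(x):
	# M(x) = 1 - sum_{k=2}^{x} M(x // k), grouping terms with equal quotients
	if x <= B:
		return M_small[x]
	if x in cache:
		return cache[x]
	res = 1
	k = 2
	while k <= x:
		q = x // k
		k2 = x // q # the largest k' with x // k' == q
		res -= (k2 - k + 1) * mertens(q)
		k = k2 + 1
	cache[x] = res
	return res

print(mertens(n)) # runs in roughly O(n^(2/3)) time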

In order to guarantee the precomputation to be done within $O(B)$, one must try their best to avoid complete factorization of the current $n$, which might introduce a factor of $O\left(\frac{\log n}{\sqrt{\log\log n}}\right)$.

There're many ways to guarantee the precomputation if we make good use of the linear sieve. For example, consider multiplicative functions $f$: when visiting $m$ and sieving $n = p \cdot m$, if we keep track of an extra map $g$ (in an array), such that $g(n) = p \cdot g(m)$ if $p \mid m$, and $g(n) = p$ if $p \nmid m$ (so $g(n)$ is the largest power of the least prime factor dividing $n$), the value of $f(n)$ can be precomputed by $f(n) = f(g(n)) \cdot f\left(\frac{n}{g(n)}\right)$ for $g(n) \ne n$ (and directly at the prime powers $g(n) = n$), which is done within $O(1)$.
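As an illustration, here is a sketch of this trick instantiated for Euler's totient $\varphi$, our own choice of multiplicative function; only the values at primes and prime powers are special-cased, exactly as the recurrence above prescribes:

N = 1000000
dp = [0]*(N+1) # least prime factor
g = [0]*(N+1) # largest power of the least prime factor dividing n
f = [0]*(N+1) # the multiplicative function, here Euler's phi
f[1] = 1
primes = list()
for m in range(2, N+1):
	if dp[m] == 0:
		dp[m] = m
		g[m] = m
		f[m] = m - 1 # phi(p) = p - 1 at a prime p
		primes.append(m)
	for p in primes:
		n = m * p
		if n > N:
			break
		dp[n] = p
		if p == dp[m]: # p divides m
			g[n] = g[m] * p
			if g[n] == n:
				f[n] = f[m] * p # n = p**k, and phi(p**k) = phi(p**(k-1)) * p
			else:
				f[n] = f[g[n]] * f[n // g[n]] # coprime split by multiplicativity
			break
		# p < dp[m], so p does not divide m
		g[n] = p
		f[n] = f[p] * f[m] # coprime split again

assert f[12] == 4 # phi(12) = 4, a quick sanity check

Any multiplicative function whose values at prime powers are cheap to derive fits the same skeleton.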

March 24, 2024