Improving Prime Number Sieves
With some elementary knowledge about the distribution of prime numbers, we are finally able to characterize the time complexity of different prime number sieves precisely. This will help us reasonably understand, and argue about, our improvement over the classical sieve of Eratosthenes.
Linear Sieve
All of us might be very familiar with the sieve of Eratosthenes: to sieve the primes below $N$, we scan the integers from $2$ upward, and whenever we meet an unmarked number $p$ (which must be prime), we mark every multiple of $p$ up to $N$ as composite.
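A direct implementation might look like the following sketch (the bound and variable names are our own choices):

```python
N = 100  # sieve the primes up to this bound
composite = [False] * (N + 1)
primes = []
for i in range(2, N + 1):
    if not composite[i]:
        # i was never marked by a smaller number, so it is prime
        primes.append(i)
        for j in range(i * i, N + 1, i):
            composite[j] = True  # mark every multiple of i
print(primes)
```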
What's the time complexity of the sieve of Eratosthenes? Well, the process of marking composites based on each prime $p$ takes about $\frac{N}{p}$ steps, so the total cost is $\sum_{p \le N} \frac{N}{p}$, which we can bound with the following theorem.

Theorem (Mertens). There exists a constant $M$ such that

$$\sum_{p \le x} \frac{1}{p} = \ln\ln x + M + O\!\left(\frac{1}{\ln x}\right).$$
Proof. Actually proving this theorem requires only elementary tools. First, by Shapiro's Tauberian theorem we know there's a bounded remainder $R(x) = O(1)$ such that

$$A(x) := \sum_{p \le x} \frac{\ln p}{p} = \ln x + R(x).$$

Then, by applying Abel's identity with $f(t) = \frac{1}{\ln t}$ we have:

$$\sum_{p \le x} \frac{1}{p} = \sum_{p \le x} \frac{\ln p}{p} \cdot \frac{1}{\ln p} = \frac{A(x)}{\ln x} + \int_2^x \frac{A(t)}{t \ln^2 t}\,\mathrm{d}t,$$

where $A(x)$ is the partial sum defined above. It's clear that:

$$\frac{A(x)}{\ln x} = 1 + \frac{R(x)}{\ln x} = 1 + O\!\left(\frac{1}{\ln x}\right).$$

Meantime, we have:

$$\int_2^x \frac{A(t)}{t \ln^2 t}\,\mathrm{d}t = \int_2^x \frac{\mathrm{d}t}{t \ln t} + \int_2^x \frac{R(t)}{t \ln^2 t}\,\mathrm{d}t = \ln\ln x - \ln\ln 2 + \int_2^\infty \frac{R(t)}{t \ln^2 t}\,\mathrm{d}t - \int_x^\infty \frac{R(t)}{t \ln^2 t}\,\mathrm{d}t,$$

where the improper integral converges absolutely since $R(t) = O(1)$, and its tail beyond $x$ is $O\!\left(\frac{1}{\ln x}\right)$. Therefore, with the constant $M = 1 - \ln\ln 2 + \int_2^\infty \frac{R(t)}{t \ln^2 t}\,\mathrm{d}t$,

$$\sum_{p \le x} \frac{1}{p} = \ln\ln x + M + O\!\left(\frac{1}{\ln x}\right).$$

Finally, by letting $x = N$ and multiplying through by $N$, we conclude that the sieve of Eratosthenes runs in $\sum_{p \le N} \frac{N}{p} = N \ln\ln N + O(N) = O(N \log\log N)$ time. And we are done with the proof.
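We can check the estimate numerically (the bound $10^6$ is arbitrary, and the value $M \approx 0.2615$, known as Mertens' constant, is taken as given):

```python
import math

X = 10**6  # an arbitrary test bound
composite = bytearray(X + 1)
s = 0.0
for i in range(2, X + 1):
    if not composite[i]:
        s += 1.0 / i  # accumulate 1/p over the primes
        for j in range(i * i, X + 1, i):
            composite[j] = 1
approx = math.log(math.log(X)) + 0.2614972128  # ln ln X + Mertens' constant
print(s, approx)
```

The two values should agree to within the $O(1/\ln x)$ error term.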
Although compared to a common comparison-based sorting algorithm's $O(N \log N)$ running time, $O(N \log\log N)$ already looks nearly linear, in fact we are able to do much better than that. A sieve method can be done in time complexity $O(N)$.
First, we are able to abstract the marking process into a graph, with vertices being integers, and with an edge between a prime $p$ and a composite $n$ whenever $n$ gets marked during the iteration for $p$, that is, whenever $p$ divides $n$. There are many composites adjacent to more than one prime, so generally it's not a tree; the redundant edges are precisely the redundant markings.
Contrarily, a map defined on integers from $2$ to $N$, assigning each integer a unique parent, yields a digraph in which every composite is reached, and hence marked, exactly once. Consider such a map: for each integer $n \in [2, N]$, map $n$ to $m = n / \mathrm{lpf}(n)$, where $\mathrm{lpf}(n)$ denotes the least prime factor of $n$. Clearly such a digraph is always a tree for each scale $N$, rooted at $1$, and:

1. The parent of $n$ is the root $1$ if and only if $n$ is prime.
2. The edges on the path from $n$ to the root, i.e. the primes divided out along the way, list the prime factorization of $n$ in non-decreasing order, starting with $\mathrm{lpf}(n)$, for $\mathrm{lpf}(n)$ the least prime factor of $n$.
Proof. For the statement #1, by our definition, each prime $p$ can be written as $p = p \cdot 1$ with $\mathrm{lpf}(p) = p$, so its parent is $p / \mathrm{lpf}(p) = 1$; conversely, a composite $n$ satisfies $n / \mathrm{lpf}(n) > 1$.

For the statement #2, record the edges in the path from $n$ to the root: at each step we divide out the current least prime factor, and since every remaining prime factor is at least as large, the labels form a non-decreasing sequence whose product is $n$, i.e. exactly the prime factorization of $n$.

Conversely, by mathematical induction on the length of the path, the non-decreasing sequence of primes multiplying to $n$ is unique by the uniqueness of factorization, so each integer in $[2, N]$ occurs at exactly one vertex of the tree.
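We can brute-force this uniqueness on a small scale: generating $n = m \cdot p$ for every prime $p \le \mathrm{lpf}(m)$ (and every prime when $m$ is the root $1$) should reach each integer in $[2, B]$ exactly once. The bound and names below are our own:

```python
from collections import Counter

B = 10000  # a small bound for the brute-force check
lpf = [0] * (B + 1)  # least prime factor, via plain marking
for i in range(2, B + 1):
    if lpf[i] == 0:
        for j in range(i, B + 1, i):
            if lpf[j] == 0:
                lpf[j] = i
primes_list = [i for i in range(2, B + 1) if lpf[i] == i]

hits = Counter()
for m in range(1, B + 1):
    bound = B if m == 1 else lpf[m]  # the root generates every prime
    for p in primes_list:
        if p > bound or m * p > B:
            break
        hits[m * p] += 1  # m generates the child n = m * p
# every integer in [2, B] is generated exactly once
assert all(hits[n] == 1 for n in range(2, B + 1))
print("every integer in [2, B] has exactly one parent")
```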
In the implementation, for each visited integer $m$, we enumerate the sieved primes $p \le \mathrm{lpf}(m)$ in increasing order and mark each child $n = m \cdot p$ as composite, recording $\mathrm{lpf}(n) = p$ in a table:
N = 1000000  # set it to your scale
dp = [0] * (N + 1)  # dp[n] will hold lpf(n); 0 means not visited yet
primes = []
for m in range(2, N + 1):
    if dp[m] == 0:
        # m is prime: m = m * 1
        dp[m] = m
        primes.append(m)
    for p in primes:
        n = m * p
        if n > N:
            # every later prime p' also gives p' * m > p * m > N
            break
        dp[n] = p
        if p >= dp[m]:
            # p == lpf(m); larger primes would violate p <= lpf(m)
            break
print(primes)
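As a quick sanity check, the linear sieve (wrapped as a function here; the wrappers are our own) must produce exactly the same primes as a plain sieve of Eratosthenes:

```python
def linear_sieve(limit):
    # the same algorithm as above, wrapped as a function
    dp = [0] * (limit + 1)
    primes = []
    for m in range(2, limit + 1):
        if dp[m] == 0:
            dp[m] = m
            primes.append(m)
        for p in primes:
            if m * p > limit:
                break
            dp[m * p] = p
            if p >= dp[m]:
                break
    return primes

def eratosthenes(limit):
    composite = bytearray(limit + 1)
    primes = []
    for i in range(2, limit + 1):
        if not composite[i]:
            primes.append(i)
            for j in range(i * i, limit + 1, i):
                composite[j] = 1
    return primes

assert linear_sieve(100000) == eratosthenes(100000)
print("the two sieves agree below 100000")
```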
Integer Factorization Based on Linear Sieve
Since for each integer $n \in [2, N]$ the linear sieve has recorded its least prime factor in $\mathrm{dp}[n]$, we can factorize any $n \le N$ by repeatedly dividing out $\mathrm{dp}[n]$. For a larger $n \le N^2$, we first trial-divide by the sieved primes until the remaining cofactor is at most $N$ (after which the table takes over) or turns out to be a prime larger than $N$:
# WARN: the code must be following the linear sieve code above
import random

c = random.randint(2, N**2)  # roll today's lucky number
# c = 123456  # or set to your own number
assert 1 <= c <= N**2

n = c
factors = []
if n > N:
    # we must try the divisibility by the sieved primes first.
    for p in primes:
        if n == 1:
            break
        if n % p == 0:
            k = 0
            while n % p == 0:
                n = n // p
                k += 1
            factors.append((p, k))
            if n <= N:
                # we can look up the table directly
                break
if n <= N:
    # look up the table directly now.
    while n != 1:
        p = dp[n]
        k = 0
        while n % p == 0:
            n = n // p
            k += 1
        factors.append((p, k))
if n != 1:
    # n is a prime larger than N by now
    factors.append((n, 1))
# note len(factors) == 0 for c = 1
print(f'{c} = ' + ' * '.join(f'({p} ** {k})' for p, k in factors))
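To gain confidence in the scheme, we can wrap both steps in a function and property-test it against multiplication; the wrapper names and the small bound are our own choices:

```python
import random

def linear_sieve(limit):
    # lpf table and prime list, as in the linear sieve above
    dp = [0] * (limit + 1)
    primes = []
    for m in range(2, limit + 1):
        if dp[m] == 0:
            dp[m] = m
            primes.append(m)
        for p in primes:
            if m * p > limit:
                break
            dp[m * p] = p
            if p >= dp[m]:
                break
    return dp, primes

def factorize(c, dp, primes, limit):
    n, factors = c, []
    if n > limit:
        # trial division until the cofactor fits in the table
        for p in primes:
            if n <= limit:
                break
            if n % p == 0:
                k = 0
                while n % p == 0:
                    n //= p
                    k += 1
                factors.append((p, k))
    if n <= limit:
        # table lookup on the remaining cofactor
        while n != 1:
            p = dp[n]
            k = 0
            while n % p == 0:
                n //= p
                k += 1
            factors.append((p, k))
    if n != 1:
        factors.append((n, 1))  # leftover prime larger than limit
    return factors

LIMIT = 10**4
dp, primes = linear_sieve(LIMIT)
for _ in range(1000):
    c = random.randint(2, LIMIT**2)
    prod = 1
    for p, k in factorize(c, dp, primes, LIMIT):
        prod *= p**k
    assert prod == c  # the factors must multiply back to c
print("all random factorizations verified")
```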
Undoubtedly, looking up the table generated by the linear sieve is much more efficient than iterating through primes and testing divisibility: the complexity of a lookup is proportional to the number of prime factors of $n$. The number of distinct prime factors can be coarsely estimated to be $O\!\left(\frac{\ln n}{\ln\ln n}\right)$.
Proof. It suffices to consider the worst case: to maximize the number of distinct prime factors within a given bound, take $n$ to be the product of the first $k$ primes:

$$n = p_1 p_2 \cdots p_k.$$

Take the natural logarithm of both sides and then we have

$$\ln n = \sum_{i=1}^{k} \ln p_i = \vartheta(p_k).$$

By the prime number theorem, we know $\vartheta(p_k) \sim p_k \sim k \ln k$, hence $\ln n \sim k \ln k$ and $\ln\ln n \sim \ln k$. Combining the two asymptotics gives

$$k \sim \frac{\ln n}{\ln k} \sim \frac{\ln n}{\ln\ln n}.$$
We can verify the conclusion by writing a program to calculate how far our estimate deviates from the actual count:
# WARN: the code must be following the linear sieve code above
import math

l = 10000  # number of sample points to test
assert l <= len(primes), "please sieve more primes"
prime_product = [0] * l
prime_product[0] = primes[0]
for i in range(1, l):
    prime_product[i] = prime_product[i - 1] * primes[i]
# ratio of the estimate ln(n)/ln(ln(n)) to the true count i+1
ratios = [(math.log(n) / math.log(math.log(n))) / (i + 1)
          for i, n in enumerate(prime_product)]
print(ratios[::1000])  # print every 1000th ratio to keep the output readable
The ratio between the approximation and the actual value slowly approaches $1$. Please notice that having more repeated prime powers and having more distinct prime factors are mutually exclusive goals: $n = 2^k$ maximizes the count with multiplicity at $\log_2 n$, while the primorials above maximize the distinct count at about $\frac{\ln n}{\ln\ln n}$. And again, we can coarsely interpolate between them by the geometric average, claiming the time complexity of a table-lookup factorization to be $O\!\left(\sqrt{\log n \cdot \frac{\log n}{\log\log n}}\right) = O\!\left(\frac{\log n}{\sqrt{\log\log n}}\right)$.
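To get a feel for the two extremes, we can tabulate both counts from a least-prime-factor table and inspect their maxima; the bound $10^5$ and the names below are our own:

```python
B = 10**5  # an arbitrary small bound
lpf = [0] * (B + 1)  # least prime factor, via plain marking
for i in range(2, B + 1):
    if lpf[i] == 0:  # i is prime
        for j in range(i, B + 1, i):
            if lpf[j] == 0:
                lpf[j] = i
omega = [0] * (B + 1)      # distinct prime factors
big_omega = [0] * (B + 1)  # prime factors counted with multiplicity
for n in range(2, B + 1):
    m = n // lpf[n]
    big_omega[n] = big_omega[m] + 1
    # count lpf(n) only if it does not still divide the quotient
    omega[n] = omega[m] + (0 if m % lpf[n] == 0 else 1)
print(max(omega), max(big_omega))  # compare with ln(B)/ln(ln(B)) and log2(B)
```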
"Lord Du Sieve" with Precomputation
We have introduced the "Lord Du Sieve", which provides a fast computation of prefix sums of certain multiplicative functions within $O(n^{3/4})$ time.
However, if we take a closer look, we soon find it's based on dynamic programming on the function values at the points $\lfloor n/d \rfloor$, and the recursion spends the bulk of its time on the many small arguments, exactly where a sieve could enumerate the values directly.
In fact, we can even find a trade-off point $m$: precompute the prefix sums for all arguments up to $m$ with a linear sieve, and let the dynamic programming recurse only on arguments above $m$. By tweaking the upper bound of the time complexity analysis of the Lord Du sieve, the portion for the arguments from $m$ to $n$ costs $O\!\left(\frac{n}{\sqrt{m}}\right)$.
In order to guarantee the precomputation to be done within the same time bound, we balance the $O(m)$ sieve cost against $O\!\left(\frac{n}{\sqrt{m}}\right)$, which yields $m = \Theta(n^{2/3})$ and an overall complexity of $O(n^{2/3})$.
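As an illustration of this trade-off, here is a minimal sketch of the scheme for the Mertens function $M(n) = \sum_{k \le n} \mu(k)$, assuming the standard identity $\sum_{d \le n} M(\lfloor n/d \rfloor) = 1$; the function names and the exact threshold are our own choices:

```python
def mertens(n):
    # precompute M(x) for x up to the trade-off point ~ n^(2/3)
    K = max(2, int(round(n ** (2.0 / 3.0))))
    mu = [0] * (K + 1)
    mu[1] = 1
    lp = [0] * (K + 1)  # least prime factor, as in the linear sieve
    ps = []
    for i in range(2, K + 1):
        if lp[i] == 0:
            lp[i] = i
            ps.append(i)
            mu[i] = -1
        for p in ps:
            if i * p > K:
                break
            lp[i * p] = p
            if p == lp[i]:
                mu[i * p] = 0  # a squared prime factor kills mu
                break
            mu[i * p] = -mu[i]
    small = [0] * (K + 1)  # prefix sums of mu: the precomputed portion
    for i in range(1, K + 1):
        small[i] = small[i - 1] + mu[i]

    cache = {}
    def M(x):
        if x <= K:
            return small[x]  # precomputed below the threshold
        if x in cache:
            return cache[x]
        res = 1  # from the identity: sum_{d <= x} M(x // d) = 1
        d = 2
        while d <= x:
            q = x // d
            d2 = x // q  # the largest d' with x // d' == q
            res -= (d2 - d + 1) * M(q)
            d = d2 + 1
        cache[x] = res
        return res
    return M(n)

print(mertens(10**6))
```

Arguments below the threshold hit the precomputed prefix sums directly; only the few larger arguments of the form $\lfloor n/d \rfloor$ ever enter the memoized recursion, which is exactly the trade-off described above.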
There're many ways to guarantee
the