Sieve in C: Complete Solution & Deep Dive Guide

a close up of a computer screen with code on it

Mastering Prime Numbers: The Complete Guide to the Sieve of Eratosthenes in C

The Sieve of Eratosthenes is a highly efficient, ancient algorithm for finding all prime numbers up to a specified integer. This guide provides a deep dive into its implementation in C, covering the core logic, memory management, performance optimizations, and real-world applications for modern computing challenges.


The Quest for Peak Performance

Imagine you've just assembled a custom computer from a box of random parts acquired at a garage sale. The machine boots, the fans whir, and the LEDs glow. But how powerful is it really? To push your new creation to its limits and get a real measure of its processing power, you need a benchmark—a task that is computationally intensive yet elegant in its design.

You decide against generic benchmarking software. Instead, you turn to a classic, an algorithm devised over two thousand years ago by a Greek mathematician named Eratosthenes. The "Sieve of Eratosthenes" is not just a relic of history; it's a brilliant and surprisingly efficient method for finding prime numbers, and its implementation is a perfect stress test for a CPU's integer and memory operations.

This article will guide you through implementing this famous algorithm in C. We won't just write the code; we will understand its soul—why it works, how to optimize it, and where it fits in the landscape of modern programming. This is a core challenge from the exclusive kodikra.com C learning path, designed to sharpen your problem-solving and memory management skills.


What Exactly Is the Sieve of Eratosthenes?

The Sieve of Eratosthenes is an algorithm for finding all prime numbers up to a given limit, n. A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself. The algorithm works by iteratively marking as composite (i.e., not prime) the multiples of each prime, starting with the first prime number, 2.

Think of it like a process of elimination. You start with a list of all integers from 2 up to your limit. First, you take the smallest number, 2, which you know is prime. You then "sieve out" all of its multiples (4, 6, 8, ...) by marking them. You move to the next unmarked number, which is 3. You declare it prime and then sieve out all of its multiples (6, 9, 12, ...). You repeat this process, and the numbers that remain unmarked at the end are all the prime numbers within your limit.

This method is incredibly efficient for finding all primes in a range, as it avoids the costly division operations required by more naive methods like trial division. Its elegance lies in its simplicity and its reliance on basic arithmetic and memory manipulation, making it a cornerstone algorithm in both number theory and computer science education.

    ● Start (limit = n)
    │
    ▼
  ┌───────────────────────────┐
  │ Create boolean array      │
  │ `is_prime[0...n]`         │
  │ Initialize all to `true`  │
  └────────────┬──────────────┘
               │
               ▼
  ┌───────────────────────────┐
  │ Mark 0 and 1 as `false`   │
  └────────────┬──────────────┘
               │
               ▼
  ┌───────────────────────────┐
  │ Loop `p` from 2 to sqrt(n)│
  └────────────┬──────────────┘
               │
  ╭────────────▼────────────╮
  │ If `is_prime[p]` is true │
  ╰────────────┬────────────╯
               │
               ▼
  ┌───────────────────────────┐
  │ Loop `i` from `p*p` to `n`│
  │ in steps of `p`           │
  │   Mark `is_prime[i]`      │
  │   as `false`              │
  └────────────┬──────────────┘
               │
               ▼
  ┌───────────────────────────┐
  │ Collect all `i` where     │
  │ `is_prime[i]` is `true`   │
  └────────────┬──────────────┘
               │
               ▼
    ● End (Return primes)

Why Use the Sieve for Finding Primes?

When tasked with finding prime numbers, a programmer's first instinct might be to use "trial division." This method involves taking each number n and checking for divisibility by all integers from 2 up to the square root of n. While simple to implement for a single number, it becomes brutally inefficient when you need to find all primes up to a large limit.

The Sieve of Eratosthenes offers a fundamentally different and superior approach for this specific problem. Instead of checking each number individually, it eliminates entire classes of non-prime numbers in single sweeps. This shift from division-based checks to index-based marking is the source of its power.

Let's compare the two approaches to see why the Sieve is the preferred method for generating primes within a range.

Pros & Cons of the Sieve of Eratosthenes

Aspect Sieve of Eratosthenes Trial Division (for a range)
Time Complexity Excellent: O(n log log n). It's very close to linear time, making it extremely fast for finding all primes up to n. Poor: Roughly O(n * sqrt(n)). This becomes prohibitively slow as n increases.
Core Operation Memory access and boolean marking (addition/array indexing). These are very fast CPU operations. Modulo/Division. These are significantly slower CPU operations compared to addition.
Memory Usage High: Requires an array of size O(n), which can be a limitation for extremely large limits (e.g., billions). Very Low: Requires minimal memory, only storing the number being tested.
Problem Suitability Ideal for finding all primes in a given range [2, n]. Suitable for testing if a single, specific number is prime. Inefficient for ranges.
Implementation Slightly more complex due to array management and nested loops. Very simple and straightforward to implement.

The table makes it clear: if your goal is to generate a list of all primes up to a limit, the Sieve's superior time complexity far outweighs its memory cost for most practical applications. It's a classic trade-off of space for time, and in this case, the time savings are enormous.


How to Implement the Sieve of Eratosthenes in C

Now, let's translate the algorithm's logic into a robust C implementation. This process involves careful memory allocation, precise loop boundaries, and efficient data structures. The solution presented here is part of the kodikra.com C curriculum, focusing on clean, readable, and efficient code.

The Complete C Solution

Here is a well-structured C implementation that finds all primes up to a given limit and stores them in a pre-allocated array primes.

#include "sieve.h"
#include <stdbool.h>
#include <stdlib.h>
#include <string.h> // For memset

// An optimized implementation of the Sieve of Eratosthenes
uint32_t sieve(uint32_t limit, uint32_t *primes, size_t max_primes) {
    if (limit < 2) {
        return 0; // No primes less than 2
    }

    // We need a boolean array for numbers up to 'limit'.
    // `calloc` initializes memory to zero (false), which is perfect.
    // We will use 'false' to mean "is potentially prime" and 'true' for "is composite".
    bool *is_composite = calloc(limit + 1, sizeof(bool));
    if (!is_composite) {
        return 0; // Memory allocation failed
    }

    // Optimization: Start marking from the square of the prime.
    // Any smaller multiple would have been marked by a smaller prime.
    // For example, for p=5, 2*5 and 3*5 were already marked by 2 and 3.
    for (uint32_t p = 2; p * p <= limit; p++) {
        // If p has not been marked as composite, it is a prime.
        if (!is_composite[p]) {
            // Mark all multiples of p as composite, starting from p*p.
            for (uint32_t i = p * p; i <= limit; i += p) {
                is_composite[i] = true;
            }
        }
    }

    uint32_t prime_count = 0;
    // Collect all prime numbers (those not marked as composite).
    for (uint32_t p = 2; p <= limit; p++) {
        if (!is_composite[p]) {
            if (prime_count < max_primes) {
                primes[prime_count++] = p;
            } else {
                // Stop if the output array is full.
                break;
            }
        }
    }

    free(is_composite); // Don't forget to free the allocated memory!
    return prime_count;
}

Detailed Code Walkthrough

Let's dissect the code line by line to understand every decision made.

1. Header Includes and Function Signature

#include "sieve.h"
#include <stdbool.h>
#include <stdlib.h>

uint32_t sieve(uint32_t limit, uint32_t *primes, size_t max_primes) {
  • sieve.h: A custom header file likely containing the function prototype, a common practice in C projects.
  • <stdbool.h>: Provides the bool type and true/false macros for better readability than using integers.
  • <stdlib.h>: Essential for memory management functions like calloc and free.
  • The function sieve takes three arguments: the upper limit for the search, a pointer primes to an array where the results will be stored, and max_primes to prevent buffer overflows. It returns the count of primes found.

2. Edge Case Handling

    if (limit < 2) {
        return 0; // No primes less than 2
    }

This is a crucial sanity check. The definition of a prime number starts from 2. If the limit is less than 2, no primes can be found, so we immediately return 0.

3. Memory Allocation for the Sieve

    bool *is_composite = calloc(limit + 1, sizeof(bool));
    if (!is_composite) {
        return 0; // Memory allocation failed
    }
  • We allocate a dynamic array of booleans. The size is limit + 1 so that we can use the number itself as the index (e.g., to check the status of number 10, we access is_composite[10]).
  • calloc is used instead of malloc for a key reason: it initializes the allocated memory to zero. In C, a zero value for a boolean is interpreted as false. This means our entire array starts with every number being considered "potentially prime" (not composite) without needing an extra initialization loop.
  • The if (!is_composite) check is vital for robust code. If calloc fails to allocate the requested memory (e.g., for a very large limit), it returns NULL. We must handle this gracefully to prevent a crash.

4. The Sieving Process (Marking Multiples)

    for (uint32_t p = 2; p * p <= limit; p++) {
        if (!is_composite[p]) {
            for (uint32_t i = p * p; i <= limit; i += p) {
                is_composite[i] = true;
            }
        }
    }

This is the heart of the algorithm. It consists of two nested loops.

  • Outer Loop: for (uint32_t p = 2; p * p <= limit; p++)
    • It iterates from p = 2 upwards.
    • The condition p * p <= limit is a critical optimization. Any composite number n must have a prime factor less than or equal to sqrt(n). Therefore, we only need to check for primes and mark their multiples up to the square root of our limit.
  • The Prime Check: if (!is_composite[p])
    • Before starting to mark multiples, we check if the current number p has already been marked as composite. If it hasn't (its value is still false), it must be a prime number.
  • Inner Loop (Marking): for (uint32_t i = p * p; i <= limit; i += p)
    • This loop marks all multiples of our newly found prime p.
    • Key Optimization: We start marking from p * p, not 2 * p. Why? Consider p = 5. The multiples 2*5, 3*5, and 4*5 have already been marked by the smaller primes 2 and 3. The first multiple that hasn't been marked is 5*5. This simple change provides a noticeable performance boost.
    • i += p efficiently jumps to the next multiple.

Let's visualize the state of the is_composite array for limit = 15.

Initial state (all `false`):
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
 F  F  F  F  F  F  F  F  F  F  F   F   F   F   F   F

After p = 2 (marks multiples of 2 from 4):
[_, _, F, F, T, F, T, F, T, F, T,  F,  T,  F,  T,  F]
             ▲     ▲     ▲     ▲       ▲       ▲

After p = 3 (marks multiples of 3 from 9):
[_, _, F, F, T, F, T, F, T, T, T,  F,  T,  F,  T,  T]
                            ▲           ▲          ▲

Outer loop stops since next p=4 is composite. Next prime p=5, but 5*5=25 > 15.

5. Collecting the Results

    uint32_t prime_count = 0;
    for (uint32_t p = 2; p <= limit; p++) {
        if (!is_composite[p]) {
            if (prime_count < max_primes) {
                primes[prime_count++] = p;
            } else {
                break;
            }
        }
    }
  • After the sieving is complete, we iterate through our boolean array one last time.
  • Any index p where is_composite[p] is still false corresponds to a prime number.
  • We add these prime numbers to the output primes array, carefully checking against max_primes to avoid writing past the end of the buffer.
  • The prime_count is incremented for each prime found.

6. Cleanup and Return

    free(is_composite);
    return prime_count;
  • Crucially, we must call free(is_composite) to release the memory we allocated with calloc. Forgetting to do this results in a memory leak, a serious bug in C programming.
  • Finally, we return the total count of primes discovered.

Where Is the Sieve Algorithm Used Today?

While its origins are ancient, the Sieve of Eratosthenes is far from obsolete. Its principles and implementation appear in various modern contexts.

1. Cryptography: Many cryptographic systems, like RSA, rely on the difficulty of factoring large numbers that are the product of two large primes. Generating these candidate prime numbers often involves algorithms that are variations or extensions of the Sieve, used to quickly eliminate a vast number of composite numbers before applying more rigorous primality tests.

2. Competitive Programming: The Sieve is a standard tool in the arsenal of any competitive programmer. Problems related to number theory, divisors, or prime factorization frequently require pre-computing all primes up to a certain limit (e.g., 10^6 or 10^7), and the Sieve is the go-to algorithm for doing this quickly and efficiently.

3. Mathematical Research: In number theory research, sieving methods are fundamental. While more advanced sieves like the Sieve of Atkin or the large sieve are used for cutting-edge problems, they are all built upon the foundational concepts introduced by Eratosthenes.

4. Performance Benchmarking: As demonstrated in our opening story, the algorithm's heavy reliance on memory access and simple integer arithmetic makes it a great way to benchmark CPU cache performance and memory subsystem speed. Running a sieve over a large limit can reveal bottlenecks in how a processor handles tight loops and sequential memory access.

Here is a visual representation of the data flow during the prime collection phase.

    ┌─────────────────────────┐
    │ `is_composite` array    │
    │ (Sieving complete)      │
    └───────────┬─────────────┘
                │
                ▼
  ┌─────────────────────────────┐
  │ Loop `p` from 2 to `limit`  │
  └───────────┬─────────────────┘
              │
    ╭─────────▼─────────╮
    │ `!is_composite[p]`? │
    ╰─────────┬─────────╯
              │ Yes
              ▼
    ╭─────────▼─────────╮
    │ `count < max_primes`? │
    ╰─────────┬─────────╯
              │ Yes
              ▼
  ┌─────────────────────────────┐
  │ `primes[count++] = p`       │
  └───────────┬─────────────────┘
              │
              ▼
    ● Continue Loop

Frequently Asked Questions (FAQ)

What is the time complexity of the Sieve of Eratosthenes?
The time complexity is approximately O(n log log n). For each prime p, the inner loop runs n/p times. The sum over all primes up to n (which is a harmonic series of primes) converges to log log n. This is extremely efficient and very close to linear time, O(n).
Why do we only need to iterate the outer loop up to the square root of the limit?
This is a core optimization. Any composite number c can be factored into a * b. If both a and b were greater than sqrt(c), then their product a * b would be greater than c, which is a contradiction. Therefore, at least one prime factor of any composite number must be less than or equal to its square root. By the time our outer loop reaches sqrt(limit), we have already found the smallest prime factor for every composite number up to limit and marked them.
Can the Sieve of Eratosthenes handle very large numbers (e.g., up to 10^12)?
Not directly with this implementation. The primary limitation is memory. A sieve up to 10^12 would require a boolean array of that size, which translates to a terabyte of RAM—far beyond what is available on consumer hardware. For such large limits, programmers use more advanced techniques like a "segmented sieve," which sieves the range in smaller, manageable chunks or blocks, drastically reducing memory requirements.
Is the Sieve of Eratosthenes the fastest prime-finding algorithm?
For finding all primes up to a limit n, it is one of the fastest known practical algorithms. The Sieve of Atkin is theoretically faster, with a complexity of O(n / log log n), but it is significantly more complex to implement and its constant factors often make it slower in practice for typical limits (e.g., up to 10^8).
How does calloc provide an advantage over malloc here?
calloc(count, size) does two things: it allocates memory for count elements of size, and it initializes all bytes in that memory block to zero. malloc(count * size) only allocates the memory, leaving its contents indeterminate. Since a boolean false is represented by 0 in C, using calloc allows us to skip a manual initialization loop (e.g., a for loop or memset) to set the entire array to false, making the code cleaner and potentially faster.
What is the purpose of the max_primes parameter?
It's a safety feature to prevent a buffer overflow. The caller provides an array (primes) of a certain size (max_primes). Our function must respect this boundary. If we find more primes than the array can hold, we must stop adding them. This prevents writing to memory that doesn't belong to us, which is a common source of crashes and security vulnerabilities in C programs.

Conclusion: An Ancient Algorithm for Modern Challenges

The Sieve of Eratosthenes is a beautiful testament to algorithmic efficiency. Its core idea—eliminating composites by marking multiples of primes—is a powerful strategy that has stood the test of time. By implementing it in C, you not only solve a classic computational problem but also gain invaluable experience with fundamental programming concepts: dynamic memory management with calloc and free, array manipulation, and performance optimization through intelligent loop boundaries.

This challenge, drawn from the kodikra.com C learning roadmap, is more than just a coding exercise. It's an exploration of how a simple, elegant idea can outperform brute-force methods by orders of magnitude. Understanding this trade-off between algorithmic design, time complexity, and space complexity is what separates a novice programmer from an expert.

As you continue your journey, remember the lessons of the Sieve. Often, the most efficient solution isn't about raw processing power but about finding a smarter way to approach the problem. To continue honing these skills, explore the full collection of challenges and tutorials in our comprehensive C language section.

Disclaimer: All code examples are based on modern C standards (C11/C17) and assume a standard library implementation. The logic is timeless, but specific function availability and behavior may vary with older, non-compliant compilers.


Published by Kodikra — Your trusted C learning resource.