Sieve in Csharp: Complete Solution & Deep Dive Guide


Mastering Prime Numbers: A Deep Dive into the Sieve of Eratosthenes with C#

The Sieve of Eratosthenes is a highly efficient ancient algorithm for finding all prime numbers up to a specified integer. This C# guide demonstrates how to implement it by iteratively marking the multiples of each prime, starting with the first prime number, 2, leaving only the prime numbers unmarked.

Imagine you've just acquired a treasure trove of random computer parts from a garage sale. Your goal is to assemble custom-built machines and push them to their limits. To measure their performance, you need a reliable, CPU-intensive benchmark. You decide against canned solutions and opt to build your own, choosing a timeless and elegant algorithm: the Sieve of Eratosthenes. It's an ancient method, yet its computational demands make it a perfect stress test for modern hardware.

But you quickly hit a wall. A naive approach of checking every number for divisibility is painfully slow, especially as the target number grows into the millions. This is the exact problem the Sieve was designed to solve. This guide will walk you through not just the theory but the practical, high-performance implementation of this algorithm in modern C#, transforming a complex mathematical challenge into elegant and efficient code. You'll learn why it's superior to brute-force methods and how to optimize it for maximum speed.


What Is the Sieve of Eratosthenes?

The Sieve of Eratosthenes is an ancient algorithm for finding all prime numbers within a given range. Attributed to the Greek mathematician Eratosthenes of Cyrene, who lived in the 3rd century BC, its brilliance lies in its simplicity and efficiency. Instead of testing each number for primality individually (a process called trial division), the Sieve works by elimination.

The core idea is to create a list of all integers from 2 up to a specified limit. You then start with the first prime number, 2, and systematically "sieve out" or mark all of its multiples (4, 6, 8, and so on). You then move to the next unmarked number, which is 3, and repeat the process, marking all of its multiples (6, 9, 12, ...).

This continues with the next unmarked number (5), and so on. Any number that survives this process without being marked is a prime number. It's an algorithm of exclusion, not verification, which is what makes it so fast for finding primes within a contiguous block.

How the Algorithm Works: A Conceptual Flow

The logical progression of the Sieve is a beautiful example of computational thinking. It follows a clear, repeatable pattern that is perfect for implementation in code. Let's visualize the high-level steps involved in finding primes up to a limit, say 30.

    ● Start (Limit = 30)
    │
    ▼
  ┌─────────────────────────────────┐
  │ Create list: [2, 3, 4, ..., 30] │
  └───────────────┬─────────────────┘
                  │
    Let p = 2 (first prime)
                  │
                  ▼
  ┌─────────────────────────────────┐
  │ Mark multiples of p=2:          │
  │ [4, 6, 8, 10, 12, ..., 30]      │
  └───────────────┬─────────────────┘
                  │
    Find next unmarked number: p = 3
                  │
                  ▼
  ┌─────────────────────────────────┐
  │ Mark multiples of p=3:          │
  │ [6, 9, 12, 15, ..., 30]         │
  └───────────────┬─────────────────┘
                  │
    Find next unmarked number: p = 5
                  │
                  ▼
  ┌─────────────────────────────────┐
  │ Mark multiples of p=5:          │
  │ [10, 15, 20, 25, 30]            │
  └───────────────┬─────────────────┘
                  │
                  ▼
    ◆ p*p > limit? (5*5=25 < 30) -> No
                  │
    Find next unmarked number: p = 7
                  │
                  ▼
    ◆ p*p > limit? (7*7=49 > 30) -> Yes, Stop.
                  │
                  ▼
  ┌─────────────────────────────────┐
  │ Collect all remaining unmarked  │
  │ numbers: [2,3,5,7,11,13,17,...] │
  └───────────────┬─────────────────┘
                  │
                  ▼
    ● End

This flow demonstrates the process of elimination. We don't check if 29 is prime by dividing it by numbers. Instead, 29 is proven to be prime simply because it was never marked as a multiple of a smaller prime number.


Why Use the Sieve Over Simpler Methods?

For a beginner, the most intuitive way to check if a number n is prime is to try dividing it by every integer from 2 up to n-1. This is called trial division. While it works for small numbers, it becomes incredibly inefficient when you need to find all primes up to a large limit.

Imagine finding all primes up to 1,000,000. Using trial division, you would perform a massive number of division operations. The Sieve of Eratosthenes, by contrast, uses simple addition and marking, which are computationally much cheaper operations. This leads to a significant performance difference.

The time complexity of the Sieve of Eratosthenes is approximately O(n log log n), which is remarkably close to linear time. Trial division for all numbers up to n is significantly slower. Let's compare them directly.

Pros and Cons of the Sieve of Eratosthenes

Aspect Sieve of Eratosthenes Trial Division (Naive Method)
Performance Extremely fast for finding all primes in a range. Time complexity is near-linear: O(n log log n). Very slow for a range of numbers. Checking a single number k is O(sqrt(k)), so checking all numbers up to n is much worse.
Complexity Slightly more complex to implement correctly, requires careful state management (the list of marked numbers). Very simple to understand and implement for checking a single number.
Memory Usage Requires an array or list of size n+1 to store the state of each number. This can be a limitation for extremely large limits. Requires very little memory, as it only needs to check one number at a time.
Use Case Ideal for generating a list of all primes up to a specific limit (e.g., 10 million). Suitable for checking the primality of a few, sparse numbers, especially if they are small.

The trade-off is clear: the Sieve uses more memory to achieve its incredible speed. For most applications running on modern hardware, this is a worthwhile exchange. It's the standard algorithm used in competitive programming and many mathematical software packages for this very reason.


Where Is It Implemented in C#? A Detailed Code Walkthrough

Now, let's dissect a C# implementation from the exclusive kodikra.com learning path. This particular solution uses a Queue<int> to manage the candidate numbers, which is an interesting, though not the most common, approach. Understanding it provides valuable insights into data structures and algorithm design.

The Initial Solution Code


public static class Sieve
{
    public static int[] Primes(int limit)
    {
        if (limit < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(limit), "Limit cannot be negative.");
        }

        if (limit < 2)
        {
            return Array.Empty<int>();
        }

        var primes = new List<int>();
        var candidates = new Queue<int>(Enumerable.Range(2, limit - 1));

        do
        {
            int p = candidates.Dequeue();
            primes.Add(p);

            int count = candidates.Count;
            for (int i = 0; i < count; i++)
            {
                int num = candidates.Dequeue();
                if (num % p != 0)
                {
                    candidates.Enqueue(num);
                }
            }
        } while (candidates.Count > 0);

        return primes.ToArray();
    }
}

Line-by-Line Code Explanation

Let's break down this code piece by piece to understand its logic, data structures, and flow.

1. Method Signature and Initial Checks


public static int[] Primes(int limit)
{
    if (limit < 0)
    {
        throw new ArgumentOutOfRangeException(nameof(limit), "Limit cannot be negative.");
    }

    if (limit < 2)
    {
        return Array.Empty<int>();
    }
  • public static int[] Primes(int limit): The method is public and static, meaning it can be called directly on the Sieve class without creating an instance. It accepts an integer limit and is expected to return an array of prime numbers.
  • ArgumentOutOfRangeException: This is robust error handling. Since prime numbers are positive, a negative limit is invalid input and should throw an exception.
  • if (limit < 2): The smallest prime number is 2. If the limit is less than 2, there are no primes to find. Returning an empty array with Array.Empty<int>() is the correct and most efficient way to handle this edge case in modern C#.

2. Data Structure Initialization


var primes = new List<int>();
var candidates = new Queue<int>(Enumerable.Range(2, limit - 1));
  • var primes = new List<int>(): A List<int> is created to store the prime numbers as they are found. A list is chosen because its size can grow dynamically.
  • var candidates = new Queue<int>(...): This is the core of this specific implementation. A Queue is a First-In, First-Out (FIFO) data structure.
  • Enumerable.Range(2, limit - 1): This LINQ method generates a sequence of integers starting from 2, with a total count of limit - 1 elements. For a limit of 10, this generates [2, 3, 4, 5, 6, 7, 8, 9, 10]. This sequence is used to initialize the queue.

3. The Main Processing Loop


do
{
    int p = candidates.Dequeue();
    primes.Add(p);

    int count = candidates.Count;
    for (int i = 0; i < count; i++)
    {
        int num = candidates.Dequeue();
        if (num % p != 0)
        {
            candidates.Enqueue(num);
        }
    }
} while (candidates.Count > 0);
  • do { ... } while (candidates.Count > 0): The loop continues as long as there are numbers left in the candidates queue.
  • int p = candidates.Dequeue(): In each iteration, the number at the front of the queue is removed. Because of how the algorithm works, this number is guaranteed to be the next prime. The first time this runs, p will be 2.
  • primes.Add(p): The discovered prime p is added to our results list.
  • int count = candidates.Count;: This is a crucial detail. The loop needs to iterate over all elements currently in the queue. If we used i < candidates.Count directly in the for loop condition, it would lead to an infinite loop or incorrect behavior because we are modifying the queue's size inside the loop.
  • for (int i = 0; i < count; i++): This loop processes every remaining candidate.
  • int num = candidates.Dequeue(): It takes the next candidate from the front of the queue.
  • if (num % p != 0): This is the "sieving" step. It checks if the candidate number is a multiple of the current prime p.
  • candidates.Enqueue(num): If the number is not a multiple of p, it survives this round and is added back to the end of the queue to be processed in the next iteration. If it is a multiple, it's simply discarded.

4. Returning the Result


return primes.ToArray();
  • The method signature requires an int[], so the List<int> is converted to an array before being returned.

While clever, this queue-based approach has a significant performance drawback. For each prime found, it iterates through and re-queues every single remaining candidate number. This involves a lot of object manipulation and is less efficient than the classic array-based approach.


When to Optimize: A High-Performance C# Implementation

The most common and performant way to implement the Sieve of Eratosthenes is by using a boolean array (bool[]). This approach leverages direct memory access via array indices, which is significantly faster than the enqueue/dequeue operations of a queue.

The idea is to create a boolean array, let's call it isPrime, of size limit + 1. The index of the array corresponds to a number, and the value at that index (true or false) indicates whether the number is considered prime. Initially, we assume all numbers are prime (set all values to true).

The Optimized Code with `bool[]`


public static class Sieve
{
    public static int[] Primes(int limit)
    {
        if (limit < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(limit), "Limit cannot be negative.");
        }

        if (limit < 2)
        {
            return Array.Empty<int>();
        }

        // Step 1: Initialize the boolean array.
        // isPrime[i] = true means we currently think 'i' is prime.
        var isPrime = new bool[limit + 1];
        for (int i = 2; i <= limit; i++)
        {
            isPrime[i] = true;
        }

        // Step 2: The Sieve process.
        // We only need to check for factors up to sqrt(limit).
        for (int p = 2; p * p <= limit; p++)
        {
            // If isPrime[p] is true, then p is a prime.
            if (isPrime[p])
            {
                // Mark all multiples of p as not prime.
                // Start from p*p, as smaller multiples would have been marked already.
                for (int i = p * p; i <= limit; i += p)
                {
                    isPrime[i] = false;
                }
            }
        }

        // Step 3: Collect the results.
        var primes = new List<int>();
        for (int p = 2; p <= limit; p++)
        {
            if (isPrime[p])
            {
                primes.Add(p);
            }
        }

        return primes.ToArray();
    }
}

Why This Version Is Better

This implementation introduces several key optimizations that make it vastly superior for large limits.

1. Use of `bool[]` Array:

  • Direct Access: Accessing isPrime[i] is an O(1) operation. It's incredibly fast because it's a direct memory lookup.
  • Cache Locality: Arrays store data in a contiguous block of memory. When the CPU processes the array sequentially, it can pre-fetch nearby memory into its cache (L1/L2/L3), drastically speeding up subsequent accesses. The queue-based approach scatters objects in memory, leading to more cache misses.

2. Outer Loop Optimization:


for (int p = 2; p * p <= limit; p++)
  • The loop that finds the next prime to sieve with only needs to go up to the square root of the limit. Any composite number n must have at least one prime factor less than or equal to sqrt(n). If it didn't, all its factors would be greater than sqrt(n), and their product would be greater than n, which is a contradiction. This simple mathematical trick dramatically reduces the number of iterations.

3. Inner Loop Optimization:


for (int i = p * p; i <= limit; i += p)
  • When marking multiples of a prime p, we can start marking from p * p. Why? Because any smaller multiple, like 2 * p or 3 * p, would have already been marked by a smaller prime factor (2 or 3, in this case). For example, when p=5, we don't need to mark 10 (2*5) or 15 (3*5) because they were already marked when we processed p=2 and p=3. The first new multiple to mark is 5*5 = 25.

Visualizing the Optimized Flow

This diagram illustrates the logic of the `bool[]` implementation, which is far more direct and efficient.

    ● Start (limit)
    │
    ▼
  ┌───────────────────────────┐
  │ Create bool[] isPrime     │
  │ of size limit+1           │
  └────────────┬──────────────┘
               │
               ▼
  ┌───────────────────────────┐
  │ Initialize all to `true`  │
  │ (from index 2 to limit)   │
  └────────────┬──────────────┘
               │
               ▼
  Loop p from 2 to sqrt(limit)
               │
    ╭──────────▼──────────╮
    │ If isPrime[p] is true │
    ╰──────────┬──────────╯
               │
               ├─── Yes ───▶ Loop i from p*p to limit (step p)
               │           │    │
               │           │    ▼
               │           │  ┌──────────────────┐
               │           │  │ isPrime[i]=false │
               │           │  └──────────────────┘
               │           │
    No ────────╯           ▼
    (Continue p loop)      (Continue i loop)
               │
               ▼
  ┌───────────────────────────┐
  │ Loop through isPrime[]    │
  │ Collect indices where     │
  │ value is `true`           │
  └────────────┬──────────────┘
               │
               ▼
    ● End (Return primes)

Who Uses This Algorithm? Real-World Relevance

While it might seem like a purely academic exercise, the Sieve of Eratosthenes and the concepts behind prime number generation are fundamental in computer science and mathematics.

  • Cryptography: Modern encryption, such as the RSA algorithm, relies on the difficulty of factoring very large numbers that are the product of two large primes. While the Sieve itself isn't used to find these specific primes (other algorithms like Miller-Rabin are used), it's a foundational algorithm for understanding prime number properties.
  • Competitive Programming: In programming contests, problems often require pre-calculating all primes up to a certain limit (e.g., 10^7) to solve other problems efficiently. The Sieve is the go-to algorithm for this task due to its speed.
  • Mathematical Research: Number theory is an active area of research. Efficiently generating primes is a prerequisite for studying their distribution and properties, such as the Riemann hypothesis.
  • Benchmarking: As seen in our original story, prime generation is a classic way to benchmark CPU performance, as it is integer-heavy and computationally intensive.

Learning to implement it effectively, as shown in the optimized C# example, is a testament to your ability to handle classic algorithms and optimize code based on a deep understanding of data structures and computational complexity. For more challenges like this, you can explore our C# Learning Roadmap which is full of practical modules.


Frequently Asked Questions (FAQ)

What is the time complexity of the Sieve of Eratosthenes?
The time complexity is approximately O(n log log n), where n is the limit. This is derived from the sum of reciprocals of primes. For practical purposes, it is very close to O(n), making it extremely efficient for generating a list of primes.
Can the Sieve of Eratosthenes find primes up to a very large number?
The primary limitation is memory. The standard implementation requires an array of size n (the limit). If you want to find primes up to 1 billion, you would need a boolean array of 1 billion elements, which would consume about 1 GB of RAM. This is feasible on modern machines, but it becomes a problem for much larger limits.
Is the Sieve of Eratosthenes the fastest algorithm for finding prime numbers?
For finding all primes up to a moderately large limit (e.g., up to 10^8 or 10^9), it is one of the fastest and simplest to implement. More advanced algorithms like the Sieve of Atkin are theoretically faster (O(n / log log n)), but they are significantly more complex to implement correctly and often have higher constant factors, making the Sieve of Eratosthenes faster in practice for many common ranges.
Why does the outer loop in the optimized version only need to go up to the square root of the limit?
Any composite number c can be factored into a * b. If both a and b were greater than sqrt(c), then their product a * b would be greater than c, which is a contradiction. Therefore, at least one prime factor of any composite number must be less than or equal to its square root. This means that by the time we have sieved using all primes up to sqrt(limit), all composite numbers up to limit will have been marked.
How can I handle even larger limits that don't fit in memory?
For limits that are too large for a single array, you can use a technique called a "segmented sieve". The idea is to break the full range (e.g., 0 to 10^12) into smaller, manageable segments (e.g., of size 10^6). You first find all primes up to the square root of the total limit. Then, for each segment, you use those pre-calculated primes to sieve out the multiples within that smaller segment, drastically reducing memory requirements.
What's the difference between the `Queue` implementation and the `bool[]` array implementation?
The `Queue` implementation is a "destructive" filtering process where non-primes are discarded and survivors are re-queued. This involves a lot of object overhead. The `bool[]` implementation uses a "marking" process on a static data structure. It's much faster due to direct memory access (O(1) index lookup) and better CPU cache performance.
Is `Array.Empty<int>()` better than returning `new int[0]` in C#?
Yes. `Array.Empty<int>()` returns a cached, static instance of an empty array. This avoids allocating a new empty array object on the heap every time the method is called with a limit less than 2. It's a small but important micro-optimization that reduces memory allocation and garbage collection pressure.

Conclusion: From Ancient Theory to Modern Code

The Sieve of Eratosthenes is more than just a historical curiosity; it's a powerful demonstration of algorithmic efficiency. By shifting the problem from "how to prove a number is prime" to "how to eliminate all numbers that are not," it achieves remarkable performance. As we've seen, the choice of data structure—a simple bool[] array versus a Queue<int>—can have a profound impact on real-world speed, highlighting the importance of understanding the tools at your disposal in C#.

You've learned the theory, walked through two different C# implementations, and understood the critical optimizations that make the Sieve a go-to solution for generating primes. This knowledge is a valuable asset, applicable in performance-critical applications, technical interviews, and any scenario where computational efficiency matters.

Disclaimer: All code examples are based on modern C# (.NET 8+) and reflect current best practices as taught in the exclusive kodikra.com curriculum. The C# language and .NET framework are continuously evolving, so always refer to the latest official documentation for future versions.

Ready to continue your journey and master more algorithms and C# features? See the complete C# guide on our platform for more in-depth tutorials and challenges.


Published by Kodikra — Your trusted Csharp learning resource.