Sieve in Bash: Complete Solution & Deep Dive Guide

Tabs labeled

Mastering Prime Numbers in Bash: The Sieve of Eratosthenes Explained

The Sieve of Eratosthenes is an ancient, highly efficient algorithm for finding all prime numbers up to a specified integer. This comprehensive guide demonstrates how to implement this algorithm in Bash script, using arrays to mark composite numbers and efficiently filter out primes, making it a classic benchmark.

You've just stumbled upon a treasure trove: a big box of random computer parts at a garage sale. The tinkerer in you sees endless possibilities—custom-built machines, each with a unique combination of CPUs, RAM, and storage. But how do you measure their raw performance? You need a reliable, CPU-intensive benchmark to push these custom rigs to their limits.

The challenge isn't just running a pre-made program; it's about creating your own tool from scratch, something that gives you deep insight into the machine's processing power. You decide on a classic, elegant, and surprisingly demanding algorithm: the Sieve of Eratosthenes. This ancient method for finding prime numbers is the perfect candidate. This guide will walk you through implementing it in Bash, turning a simple shell script into a powerful benchmarking tool and an incredible learning experience.


What is the Sieve of Eratosthenes?

The Sieve of Eratosthenes is an algorithm attributed to the Greek mathematician Eratosthenes of Cyrene. It provides a highly efficient way to find all prime numbers up to a specified limit. At its core, the algorithm works by iteratively marking as composite (i.e., not prime) the multiples of each prime, starting with the first prime number, 2.

Imagine you have a list of numbers from 2 up to a limit, say 30. The process is one of elimination, much like sifting flour to remove lumps. You take the first "un-sifted" number (which is 2), keep it because it's prime, and then eliminate all of its multiples (4, 6, 8, ...). Then you move to the next "un-sifted" number (which is 3), keep it, and eliminate all of its multiples (6, 9, 12, ...). You repeat this process until you've covered all the numbers.

The numbers that remain—those that were never eliminated—are the prime numbers. This method avoids costly trial division for every single number, making it significantly faster for finding all primes within a given range.

The Core Logic: A Step-by-Step Visualization

Let's break down the logical flow of the algorithm. It's a beautifully simple sequence of initialization, iteration, and filtration. The following diagram illustrates the high-level process.

    ● Start (Input: limit `n`)
    │
    ▼
  ┌──────────────────────────────────┐
  │ Create a boolean list `is_prime` │
  │ from 2 to `n`, all set to `true` │
  └─────────────────┬────────────────┘
                    │
                    ▼
  ┌──────────────────────────────────┐
  │ Initialize pointer `p` = 2       │
  └─────────────────┬────────────────┘
                    │
                    ▼
        ◆ Is `p * p` <= `n`?
       ╱              ╲
      Yes              No
      │                │
      ▼                ▼
┌──────────────┐   (Go to Collection)
│ Is `is_prime[p]` true? │
└───────┬──────┘
        │
       ╱ ╲
      Yes No ──────────┐
      │                │
      ▼                ▼
┌───────────────────┐  ┌────────────────┐
│ Mark all multiples│  │ Increment `p`  │
│ of `p` as `false` │  │ `p = p + 1`    │
└─────────┬─────────┘  └───────┬────────┘
          │                    │
          └─────────┬──────────┘
                    │
                    ▼
              (Loop back to `p*p <= n` check)
                    │
                    ▼
  ┌──────────────────────────────────┐
  │ Collect all numbers `i` where    │
  │ `is_prime[i]` is still `true`    │
  └─────────────────┬────────────────┘
                    │
                    ▼
               ● End (Output: list of primes)

Why Use Bash for This Algorithm?

Implementing a computationally intensive algorithm like the Sieve of Eratosthenes in Bash might seem counterintuitive. Bash is an interpreted shell scripting language, not a compiled language like C++ or Rust, which are designed for high-performance computing. However, this very limitation makes it a fascinating and valuable exercise for several reasons.

Firstly, it pushes the boundaries of what you can do with shell scripting. It forces you to understand Bash's features, such as array manipulation, arithmetic expansion, and process control, on a much deeper level. You learn about the performance costs associated with certain operations and how to write more efficient scripts.

Secondly, it serves as an excellent educational tool. By building the Sieve in Bash, you gain a tangible understanding of the algorithm's mechanics without the boilerplate code often required by other languages. It's a direct, hands-on way to see the logic in action. For anyone on the kodikra learning path for Bash, this module is a crucial step in mastering advanced scripting concepts.

Pros and Cons of a Bash Implementation

Like any technical decision, choosing Bash comes with trade-offs. It's important to understand where it shines and where it falls short for this particular task.

Pros Cons
Excellent for Learning: Provides a clear, direct way to learn the algorithm and advanced Bash features like array handling. Performance Overhead: Being an interpreted language, Bash is significantly slower than compiled languages for arithmetic-heavy tasks.
Universal Availability: Bash is pre-installed on virtually every Linux, macOS, and Unix-like system, making scripts highly portable. Memory Inefficiency: Bash arrays can be memory-intensive, especially for large limits, as they are not as optimized as native data structures in other languages.
Rapid Prototyping: Writing and testing the script is fast and requires no compilation step, ideal for quick experiments. Limited Data Structures: Lacks more advanced data structures like bitsets, which can make the Sieve implementation more memory-efficient.
Great for Automation: The script can be easily integrated into larger automation workflows or system administration tasks. Integer Limitations: Bash's built-in arithmetic has limits, and handling very large numbers (beyond 64-bit integers) requires external tools like bc.

How the Bash Script Works: A Detailed Code Walkthrough

Now, let's dissect the Bash script from the kodikra.com curriculum. We will go through it section by section to understand the purpose of every line and how it contributes to the final result. This hands-on analysis is key to truly grasping the implementation.

The Complete Initial Script

Here is the full solution code we will be analyzing. This script takes one command-line argument: the upper limit for finding primes.


#!/usr/bin/env bash
# Sieve of Eratosthenes
# Finds all prime numbers up to a given limit.

# --- Input Validation ---
if [[ $# -ne 1 ]] || ! [[ "$1" =~ ^[0-9]+$ ]] || [[ "$1" -lt 2 ]]; then
    echo "Usage: $0 LIMIT"
    echo "LIMIT must be an integer greater than or equal to 2."
    exit 1
fi

# --- Initialization ---
declare -i limit=$1
declare -a is_prime

# Initialize all numbers from 2 to limit as potentially prime.
for ((i = 2; i <= limit; i++)); do
    is_prime[i]=1 # Using 1 for true
done

# --- Sieving Process ---
for ((p = 2; p * p <= limit; p++)); do
    # If p is still marked as prime...
    if [[ ${is_prime[$p]} -eq 1 ]]; then
        # ...then mark all of its multiples as not prime.
        # We start marking from p*p.
        for ((m = p * p; m <= limit; m += p)); do
            is_prime[m]=0 # Using 0 for false
        done
    fi
done

# --- Result Collection ---
declare -a primes=()
for ((p = 2; p <= limit; p++)); do
    if [[ ${is_prime[$p]} -eq 1 ]]; then
        primes+=("$p")
    fi
done

# --- Output ---
echo "${primes[*]}"

Section 1: Shebang and Input Validation


#!/usr/bin/env bash

if [[ $# -ne 1 ]] || ! [[ "$1" =~ ^[0-9]+$ ]] || [[ "$1" -lt 2 ]]; then
    echo "Usage: $0 LIMIT"
    echo "LIMIT must be an integer greater than or equal to 2."
    exit 1
fi
  • #!/usr/bin/env bash: This is the "shebang." It tells the operating system to execute this script using the bash interpreter found in the user's environment path. It's a more portable way than hardcoding /bin/bash.
  • Input Validation: This block is crucial for robust scripting.
    • $# -ne 1: Checks if the number of command-line arguments ($#) is not equal to one. The script requires exactly one argument.
    • ! [[ "$1" =~ ^[0-9]+$ ]]: Checks if the first argument ($1) is not a sequence of one or more digits. This ensures the input is a valid non-negative integer.
    • [[ "$1" -lt 2 ]]: Checks if the input number is less than 2. The concept of primality begins at 2, so any lower limit is invalid.
    • If any of these conditions are true, it prints a usage message and exits with a non-zero status code, indicating an error.

Section 2: Initialization


declare -i limit=$1
declare -a is_prime

for ((i = 2; i <= limit; i++)); do
    is_prime[i]=1 # Using 1 for true
done
  • declare -i limit=$1: This command declares the variable limit as an integer (-i) and assigns it the value of the first command-line argument. Using declare -i enforces integer context for arithmetic operations.
  • declare -a is_prime: This explicitly declares is_prime as an array (-a). While Bash often creates arrays implicitly, explicit declaration is good practice.
  • The First Loop: This for loop is responsible for setting up our "sieve." It iterates from 2 up to the user-provided limit.
    • Inside the loop, is_prime[i]=1 sets the element at index i of the array to 1. We use 1 to represent true (is potentially prime) and will later use 0 for false. This populates our array, creating a list where every number is initially considered a prime candidate.

Section 3: The Sieving Process


for ((p = 2; p * p <= limit; p++)); do
    if [[ ${is_prime[$p]} -eq 1 ]]; then
        for ((m = p * p; m <= limit; m += p)); do
            is_prime[m]=0 # Using 0 for false
        done
    fi
done

This is the heart of the algorithm. It's where the "sifting" happens.

  • The Outer Loop: for ((p = 2; p * p <= limit; p++))
    • It starts with the first prime, p = 2.
    • The condition p * p <= limit is a critical optimization. We only need to check for factors up to the square root of the limit. Any composite number n will have at least one prime factor less than or equal to sqrt(n). This avoids a lot of redundant work.
  • The Prime Check: if [[ ${is_prime[$p]} -eq 1 ]]
    • Before sieving, it checks if our current number p is still marked as prime (i.e., its value in the array is 1). If it has already been marked as a multiple of a smaller prime (e.g., 4 was marked by 2), this check will fail, and we'll skip to the next p.
  • The Inner Loop (Marking Multiples): for ((m = p * p; m <= limit; m += p))
    • If p is prime, this loop iterates through all of its multiples within the limit and marks them as not prime.
    • m = p * p: Another key optimization. We can start marking from p squared because any smaller multiple of p (like 2*p, 3*p) would have already been marked by a smaller prime factor (2, 3, etc.). For example, when p=5, we don't need to mark 10 (already marked by 2), or 15 (already marked by 3). The first multiple we need to worry about is 5*5 = 25.
    • m += p: The loop increments by p to hit every subsequent multiple (e.g., p*p, p*p + p, p*p + 2p, etc.).
    • is_prime[m]=0: This is the "sieving" action. It sets the array element for the multiple m to 0 (false), effectively eliminating it.

Section 4: Result Collection and Output


declare -a primes=()
for ((p = 2; p <= limit; p++)); do
    if [[ ${is_prime[$p]} -eq 1 ]]; then
        primes+=("$p")
    fi
done

echo "${primes[*]}"
  • declare -a primes=(): We declare a new, empty array called primes to store our final results.
  • The Collection Loop: This final loop iterates through our is_prime boolean array one last time, from 2 to the limit.
  • if [[ ${is_prime[$p]} -eq 1 ]]: It checks which indices are still marked with 1 (true).
  • primes+=("$p"): If an index p is still marked as prime, its value (the number p itself) is appended to the primes array.
  • echo "${primes[*]}": Finally, this command prints the results. "${primes[*]}" expands to a single string containing all elements of the array, separated by the first character of the IFS (Internal Field Separator) variable, which is a space by default.

Where and When to Apply This Script

While you wouldn't use a Bash script for prime number generation in a high-performance cryptography application, it has several practical and educational uses.

Practical Applications

  • System Benchmarking: As explored in our story, this script is a simple yet effective way to stress-test a CPU. You can time its execution for a large limit to compare the performance of different machines or configurations.
    
    # Time the script to find primes up to 100,000
    time ./sieve.sh 100000
        
  • Educational Tool: It's an excellent resource for teaching algorithms, shell scripting, and computational thinking. The script's clarity makes it easy to follow the logic. For more advanced topics, check out the full Bash guide on kodikra.com.
  • Number Theory Exploration: For hobbyists or students exploring number theory, this script provides a quick way to generate lists of primes for further analysis without needing a full-fledged programming environment.

When to Choose an Alternative

The primary limitation of this script is performance. Bash is an interpreter, and each arithmetic operation and array access carries significant overhead compared to a compiled language. For finding primes up to a few hundred thousand, it's perfectly fine. However, if your limit is in the millions or billions, the script will become prohibitively slow and memory-intensive.

In such cases, you should turn to a systems programming language like C, C++, Go, or Rust. These languages compile to native machine code, offer highly optimized data structures (like bit vectors), and can leverage multi-threading to achieve performance several orders of magnitude greater than the Bash script.

The following diagram contrasts the decision-making process for choosing between Bash and a compiled language for this task.

    ● Start (Need to find primes)
    │
    ▼
  ┌──────────────────────────────┐
  │ Define Requirements:         │
  │ - Max Limit (N)              │
  │ - Performance Needs          │
  │ - Development Time           │
  └──────────────┬───────────────┘
                 │
                 ▼
    ◆ Is N large (> 1,000,000) OR
      is performance critical?
   ╱                           ╲
  Yes                           No
  │                              │
  ▼                              ▼
┌──────────────────┐      ┌─────────────────────────┐
│ Choose a Compiled│      │ Bash Script is a        │
│ Language         │      │ good choice for:        │
│ e.g., C++, Rust, Go│      │ - Learning & Prototyping│
└──────────────────┘      │ - Small N               │
                          │ - System Administration │
                          └─────────────────────────┘

Frequently Asked Questions (FAQ)

1. What exactly is a prime number?
A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself. The first few prime numbers are 2, 3, 5, 7, 11, and 13. The number 2 is the only even prime number.

2. Why is the Sieve of Eratosthenes considered efficient?
Its efficiency comes from eliminating the need for trial division. Instead of checking each number for divisibility by all smaller numbers, it systematically eliminates multiples in a single pass. Its time complexity is approximately O(n log log n), which is very close to linear time and far superior to trial division's O(n√n).

3. Can Bash handle very large numbers for this algorithm?
Standard Bash arithmetic is limited to the system's signed integer size (usually 64-bit). This means it can handle numbers up to roughly 9 x 10^18. However, the practical limitation is memory and speed. An array with millions of elements will consume significant RAM, and the script's execution time will become very long well before you hit the integer limit.

4. What does declare -i do in the script?
The command declare -i variable sets the "integer" attribute for a variable. This tells Bash to treat the variable as an integer in arithmetic contexts, which can sometimes provide minor performance benefits and helps enforce type safety within the script. Any string assigned to it that doesn't evaluate to a number will result in a value of 0.

5. Why do optimizations start marking multiples from p*p?
This is a crucial optimization. Any composite number less than p*p that is a multiple of p must also have a prime factor smaller than p. For example, when our prime p is 7, the multiples are 14, 21, 28, 35, 42, 49... The multiples 14, 21, 28, 35, and 42 have already been "sieved" by smaller primes (2, 3, and 5). The first multiple that is guaranteed not to have been marked yet is 7 * 7 = 49. Starting at p*p avoids redundant work.

6. How do I run this Bash script?
First, save the code to a file, for example, sieve.sh. Then, make it executable using the terminal command chmod +x sieve.sh. Finally, run it by providing a limit, like this: ./sieve.sh 100.

7. Is there a more memory-efficient way to implement the Sieve?
Yes. The most common method is using a bit array (or bitset), where each bit represents a number, rather than a full array element. A single bit can store the true/false state, reducing memory usage by a factor of 8 or more. While this is straightforward in languages like C++ or Python, implementing a true bit array in pure Bash is complex and not idiomatic.

Conclusion and Future-Proofing Your Skills

You have successfully implemented one of the most elegant algorithms in computer science using Bash. This exercise not only provides a powerful tool for benchmarking but also deepens your understanding of core scripting concepts, from array manipulation to performance optimization. You've seen how a simple idea—eliminating multiples—can be translated into an effective script and learned the trade-offs of using a shell language for a CPU-bound task.

The principles of the Sieve of Eratosthenes are timeless. As technology evolves, the need for efficient algorithms remains constant. Looking ahead, the trend in scripting and automation is toward languages with stronger performance characteristics and better concurrency models, like Go and Rust, especially for infrastructure and high-performance tasks. However, Bash remains the undisputed lingua franca of the command line, and mastering it is a foundational skill that will serve you for years to come.

Continue exploring the fascinating world of scripting and algorithms by checking out the complete Bash learning path on kodikra.com or diving into our extensive collection of expert Bash guides.

Disclaimer: The code in this article is written for modern Bash versions (4.0+). While largely compatible with older versions, some behaviors, especially concerning array handling, may differ. Always test scripts in your target environment.


Published by Kodikra — Your trusted Bash learning resource.