Binary Search in Cairo: Complete Solution & Deep Dive Guide

Pyramids visible over buildings and street traffic

The Ultimate Guide to Mastering Binary Search in Cairo

Binary search is a highly efficient algorithm for finding an item within a sorted collection. It operates on a "divide and conquer" principle, repeatedly halving the search interval to quickly narrow down the location of a target value. This guide provides a deep dive into its implementation in Cairo, analyzing its logic, performance, and critical applications within Starknet smart contracts for optimized, gas-efficient data retrieval.

Imagine you're building a decentralized application (dApp) on Starknet that manages a whitelist of thousands of user addresses. A user attempts to interact with a protected function, and you need to verify their eligibility instantly. Searching through the list one by one—a linear search—would be painfully slow and, more importantly, prohibitively expensive in terms of gas fees. Every comparison, every step, costs resources. This is a common bottleneck that can render a smart contract impractical.

This is where the elegance of binary search shines. Instead of a brute-force check, you can pinpoint the user's address in a handful of steps, even in a list containing millions of entries. By mastering this fundamental algorithm in Cairo, you unlock the ability to build scalable, efficient, and cost-effective smart contracts. This article will guide you through the theory, implementation, and strategic importance of binary search, transforming it from an abstract concept into a powerful tool in your Cairo development arsenal.

What Exactly Is Binary Search?

Binary search is a search algorithm that finds the position of a target value within a sorted array. The "binary" in its name refers to its core strategy: at each step, it compares the target value with the middle element of the array and eliminates half of the remaining elements from consideration. This process is repeated on the remaining sub-array until the value is found or the sub-array is empty.

The single most important prerequisite for binary search is that the collection must be sorted. If the data is not in order, the fundamental assumption that allows us to discard half the data at each step breaks down. For example, if we are looking for the number 25 in an unsorted list like [40, 10, 25, 5], checking the middle element gives us no reliable information about which half might contain our target.

Think of it like finding a word in a physical dictionary. You don't start at the first page and read every word. Instead, you open it roughly to the middle. If the words on that page come alphabetically after your target word, you know your word must be in the first half of the dictionary. You then take that first half and repeat the process, again opening to its middle. This intuitive, efficient method is the real-world equivalent of a binary search.

The Core Logic: Divide and Conquer

The algorithm's power comes from its logarithmic time complexity, denoted as O(log n). This means the time it takes to find an element grows very slowly as the size of the array (n) increases. In contrast, a linear search, which checks every element one by one, has a time complexity of O(n). For large datasets, the difference in performance is monumental.

Number of Elements (n)	Max Steps for Linear Search (n)	Max Steps for Binary Search (log₂ n)
100	100	~7
1,000	1,000	~10
1,000,000	1,000,000	~20
1,000,000,000	1,000,000,000	~30

As you can see, even for a list with a billion elements, binary search can find any item in about 30 steps. This level of efficiency is not just a nice-to-have in blockchain development; it's often a necessity for managing computational resources (gas) effectively.

Why Binary Search is a Game-Changer for Cairo and Starknet

In the world of blockchain and smart contracts, every computation has a cost. The Starknet network, like other blockchains, requires gas to pay for the computational effort of executing transactions. Inefficient algorithms lead to high gas consumption, making dApps expensive for users and potentially too slow to be practical.

Cairo is a language designed for creating STARK-provable programs, where computational efficiency is paramount. An algorithm that minimizes steps, like binary search, directly translates to a more optimized and cheaper program to run on-chain. This is why understanding and implementing efficient algorithms is a core competency for any serious Cairo developer.

Key Use Cases in Smart Contracts:

Whitelist/Allowlist Verification: Quickly checking if a user's address is present in a large, sorted list of whitelisted addresses for a token sale or special access.
On-Chain Data Lookups: Searching for a specific record ID, user ID, or other unique identifier in a large, sorted dataset stored within a contract's storage.
State Verification: Efficiently finding a specific state or version number from a sorted history of states.
Oracle Data Management: If an oracle pushes a large, sorted list of data on-chain (e.g., prices, election results), binary search allows contracts to query it efficiently.

By using binary search instead of linear search in these scenarios, you can reduce the computational complexity from O(n) to O(log n), resulting in a massive reduction in gas costs, especially as the dataset grows.

How the Binary Search Algorithm Works: A Step-by-Step Breakdown

Let's formalize the logic. The algorithm maintains a "search space" within the array, which initially is the entire array. At each step, this space is halved until the target is found or the space becomes empty.

Here are the steps, which we'll illustrate with a diagram:

Initialize Pointers: Start with two pointers, low pointing to the first index (0) and high pointing to the last index (n-1). These define the current search space.
Loop Condition: Continue the search as long as low is less than or equal to high. If low becomes greater than high, it means the search space is empty and the element is not in the array.
Calculate the Middle Index: Find the middle index of the current search space: mid = low + (high - low) / 2. This method avoids potential overflow issues that can arise from (low + high) / 2 with very large indices.
Compare and Conquer:
- Match Found: If the element at array[mid] is equal to the target value, the search is successful. Return the index mid.
- Target is Smaller: If the target value is less than array[mid], it means the target must be in the left half of the current search space. Discard the right half by updating the high pointer: high = mid - 1.
- Target is Larger: If the target value is greater than array[mid], it means the target must be in the right half. Discard the left half by updating the low pointer: low = mid + 1.
Repeat: Go back to step 2 and repeat the process with the new, smaller search space.
Element Not Found: If the loop finishes without finding the element, it does not exist in the array. The function typically returns a special value like -1 or, in Cairo's case, Option::None.

Visualizing the Flow

This ASCII diagram illustrates the decision-making process within each iteration of the binary search loop.

    ● Start with Search Space [low...high]
    │
    ▼
  ┌──────────────────┐
  │ low <= high ?    │
  └─────────┬────────┘
            │ Yes
            ▼
  ┌──────────────────┐
  │ mid = (low+high)/2 │
  └─────────┬────────┘
            │
            ▼
  ◆ array[mid] == target?
   ╱         |         ╲
  ╱          |          ╲
Yes          | No        |
 │           ▼           ▼
 │   ◆ target < array[mid]?   
 │    ╱           ╲
 │   Yes           No
 │    │             │
 ▼    ▼             ▼
[Found!] │ high=mid-1    │ low=mid+1
 │    │             │
 │    └─────┬───────┘
 │          │
 └──────────┴─Loop back to Start

The Implementation: A Deep Dive into the Cairo Code

Now, let's analyze a complete, idiomatic Cairo implementation for binary search from the kodikra.com learning path. This function is generic, meaning it can work with any data type that supports comparison, and it leverages Cairo's powerful features like Span for memory-efficient operations.

The Solution Code

pub fn find<T, +Drop<T>, +Copy<T>, +PartialOrd<T>>(
    search_array: @Array<T>,
    value: T,
) -> Option<usize> {
    let mut base = 0_usize;
    let mut slice = search_array.span();

    loop {
        if slice.len() == 0 {
            break;
        }

        let head = slice.slice(0, slice.len() / 2);
        let mut tail = slice.slice(slice.len() / 2, slice.len() - head.len());

        if let Option::Some(middle_element) = tail.pop_front() {
            if *middle_element == value {
                return Option::Some(base + head.len());
            } else if *middle_element < value {
                base += head.len() + 1;
                slice = tail.span();
            } else {
                slice = head.span();
            }
        }
    }

    return Option::None;
}

Line-by-Line Code Walkthrough

This implementation is clever. Instead of managing low and high indices, it directly manipulates a Span, which is a non-owning view into a contiguous sequence of memory (like an array). This avoids copying data and is highly efficient.

Function Signature

pub fn find<T, +Drop<T>, +Copy<T>, +PartialOrd<T>>(
    search_array: @Array<T>,
    value: T,
) -> Option<usize>

pub fn find: Defines a public function named find.
<T, +Drop<T>, +Copy<T>, +PartialOrd<T>>: This makes the function generic over a type T.
- +Drop<T>: The type T can be dropped (memory can be reclaimed).
- +Copy<T>: The type T can be copied by value. This is useful for simple types like integers or felts.
- +PartialOrd<T>: This is the most critical trait here. It means the type T supports partial ordering, allowing us to use comparison operators like <, >, and ==.
search_array: @Array<T>: The input is a snapshot (read-only view) of an Array of type T. The @ symbol indicates a snapshot.
value: T: The target value we are searching for.
-> Option<usize>: The function returns an Option. It will be Some(index) if the value is found, where index is a usize, or None if the value is not present. This is a safe and idiomatic way to handle cases where a value might not be found.

Initialization

let mut base = 0_usize;
let mut slice = search_array.span();

let mut base = 0_usize;: We initialize a mutable variable base. This variable tracks the absolute starting index of our current search window (the slice) relative to the original search_array.
let mut slice = search_array.span();: We create a mutable Span from the input array. A span is a lightweight view into the array's data. We will narrow this `slice` in each iteration instead of creating new arrays.

The Main Loop

loop {
    if slice.len() == 0 {
        break;
    }
    // ... loop body ...
}

This is an infinite loop that contains the core search logic. The loop is broken explicitly when the search space (the slice) becomes empty, which means the element was not found.

Dividing the Slice

let head = slice.slice(0, slice.len() / 2);
let mut tail = slice.slice(slice.len() / 2, slice.len() - head.len());

let head = ...: We create a new slice representing the first half of the current slice.
let mut tail = ...: We create another slice for the second half. The middle element will be the first element of the tail.

Checking the Middle Element

if let Option::Some(middle_element) = tail.pop_front() {
    // ... comparison logic ...
}

tail.pop_front(): This method attempts to remove and return the first element of the tail slice. It returns an Option because the tail could be empty. This first element is our "middle" element for the current search space.
if let Option::Some(...): This is pattern matching. If pop_front() successfully returns an element, it is bound to the middle_element variable, and the code inside the if block is executed.

Comparison and State Update

if *middle_element == value {
    return Option::Some(base + head.len());
} else if *middle_element < value {
    base += head.len() + 1;
    slice = tail.span();
} else {
    slice = head.span();
}

if *middle_element == value: We dereference middle_element with * to get its value and compare it to our target value. If they match, we've found it! The final index is our current base plus the length of the head part we skipped over. We return this index wrapped in Some.
else if *middle_element < value: If the middle element is smaller than our target, we know the target must be in the remaining part of the tail.
- base += head.len() + 1;: We update our base index to reflect that we are discarding the head and the middle_element.
- slice = tail.span();: Our new search space becomes the rest of the tail.
else: This case handles *middle_element > value. The target must be in the head.
- slice = head.span();: We update our search space to be just the head. The base index does not need to be changed.

Return Value

return Option::None;

If the loop finishes (because the slice became empty), it means the value was never found. We exit the loop and return Option::None as promised by the function signature.

Visualizing the Slice and Base Logic

Let's trace the search for value = 77 in [10, 21, 34, 55, 68, 77, 89, 101].

● Initial State
  ├── array: [10, 21, 34, 55, 68, 77, 89, 101]
  ├── base: 0
  └── slice: [10, 21, 34, 55, 68, 77, 89, 101]
                 │
                 ▼
● Iteration 1
  ├── head: [10, 21, 34, 55]
  ├── tail: [68, 77, 89, 101]
  ├── middle: 68
  ├── Logic: 68 < 77, so search in tail.
  └── New State:
      ├── base: 0 + 4 + 1 = 5
      └── slice: [77, 89, 101]
                 │
                 ▼
● Iteration 2
  ├── head: [77]
  ├── tail: [89, 101]
  ├── middle: 89
  ├── Logic: 89 > 77, so search in head.
  └── New State:
      ├── base: 5 (unchanged)
      └── slice: [77]
                 │
                 ▼
● Iteration 3
  ├── head: []
  ├── tail: [77]
  ├── middle: 77
  ├── Logic: 77 == 77. Found!
  └── Return: Some(base + head.len()) ⟶ Some(5 + 0) ⟶ Some(5)

Pros, Cons, and Strategic Considerations

While binary search is incredibly powerful, it's not a silver bullet for every search problem. Understanding its trade-offs is key to being an effective engineer.

Advantages of Binary Search

Extreme Efficiency: O(log n) time complexity is its greatest strength, making it suitable for very large datasets.
Reduced Gas Costs: In the context of Cairo and Starknet, fewer computational steps directly translate to lower gas fees for users.
Predictable Performance: The maximum number of comparisons is fixed and known for a given dataset size, which helps in gas estimation.

Disadvantages and Risks

Requires Sorted Data: The most significant limitation. If the data is not sorted, the algorithm will produce incorrect results. The cost of sorting the data initially (typically O(n log n)) must be considered.
Insertion/Deletion Overhead: If the dataset changes frequently, maintaining a sorted array can be costly. Inserting an element into a sorted array requires shifting elements, which is an O(n) operation.
Not Ideal for Small Datasets: For very small arrays (e.g., fewer than 10-20 elements), the overhead of the binary search logic might make a simple linear search just as fast or even slightly faster, and certainly simpler to implement.

Frequently Asked Questions (FAQ)

What is the time and space complexity of binary search?

The time complexity is O(log n) because the search space is halved at each step. The space complexity for the iterative version shown here is O(1), as it only uses a few variables to keep track of state, regardless of the input array's size.

Why exactly must the array be sorted for binary search?

The algorithm's core "divide and conquer" strategy relies on being able to make a definitive choice after one comparison. When we compare our target to the middle element of a sorted array, we know with certainty that if our target is smaller, it *must* be in the left half, and if it's larger, it *must* be in the right half. This guarantee is lost if the array is unsorted, making it impossible to discard any part of the array safely.

What happens if the target value appears multiple times in the array?

The standard binary search algorithm does not guarantee which of the duplicate elements it will find. It will return the index of *any* of the occurrences. Modified versions of the algorithm exist to specifically find the first or last occurrence of a value if needed.

Is binary search always better than linear search?

Not always. For small, unsorted datasets, a linear search is often more practical. If the data is frequently modified, the cost of keeping it sorted for binary search might outweigh the search performance benefits. The choice depends on the size of the data, how often it changes, and the performance requirements of the application.

How does binary search significantly impact gas costs in a Cairo smart contract?

Gas fees on Starknet are tied to the number of computational steps. A linear search on an array of 1,000 elements could take up to 1,000 comparison steps in the worst case. A binary search would take a maximum of 10 steps. This dramatic reduction in operations leads to a correspondingly dramatic reduction in the transaction's gas cost, making the contract more affordable and scalable.

What is the difference between Array and Span in this Cairo code?

An Array owns its data; it's the actual container for the elements. A Span is a lightweight, non-owning "view" or "slice" of an array's data. Using spans is highly efficient because when we "slice" a span, we are just creating a new view with different start/end pointers, not copying the underlying data. This avoids unnecessary memory allocation and data copying, which is crucial for performance.

Can binary search be used on data structures other than arrays?

Yes, the principle of binary search can be applied to any sorted, random-access data structure. The most notable example is a Binary Search Tree (BST), which is a node-based data structure specifically designed to leverage this search method, providing O(log n) average time for search, insertion, and deletion.

Conclusion: An Essential Algorithm for the Modern Cairo Developer

Binary search is more than just a classic computer science algorithm; it's a fundamental technique for writing efficient, scalable, and cost-effective code. In the resource-constrained environment of the blockchain, its ability to find data with logarithmic efficiency is not just an optimization—it's a critical enabler for complex applications.

By understanding its mechanics, implementing it idiomatically in Cairo using features like generics and spans, and knowing when to apply it, you elevate your capabilities as a smart contract engineer. You move from simply writing code that works to engineering solutions that are performant, scalable, and economically viable on the decentralized web.

Ready to continue your journey and tackle more advanced concepts? Explore the complete Cairo Learning Roadmap on kodikra.com to follow our structured curriculum. To reinforce your foundational knowledge of the language, be sure to visit our comprehensive Cairo language guide.

Disclaimer: The Cairo code and concepts discussed in this article are based on recent versions of the language. As Cairo continues to evolve, syntax and best practices may change. Always refer to the official documentation for the most current information.

Published by Kodikra — Your trusted Cairo learning resource.

kodikra

Search this blog