Sublist in Cairo: Complete Solution & Deep Dive Guide

Pyramids visible over buildings and street traffic

Mastering List Comparison in Cairo: The Complete Guide to Sublist, Superlist, and Equal

Determining if a list is a sublist, superlist, or equal to another in Cairo involves comparing list lengths and iterating through potential sub-sequences. This guide covers efficient algorithms using spans, pattern matching, and core library functions to solve this common data structure challenge.

You’ve just pulled two sets of transaction data from a Starknet block. One seems to be a subset of the other, but how can you be sure? This scenario, where you need to compare sequences of data, is not just common; it's a fundamental operation in everything from data analysis and string searching to validating complex state transitions in smart contracts.

In a language like Cairo, designed for provable computation and efficiency, solving this "sublist problem" isn't just about getting the right answer—it's about doing it in a clear, verifiable, and gas-conscious way. This guide will take you from zero to hero, breaking down the logic, providing a complete Cairo implementation, and exploring the deeper concepts that make this skill essential for any serious Cairo developer.


What Exactly is the Sublist Problem?

At its core, the sublist problem is a classification task. Given two lists, which we'll call A and B, we need to determine their relationship based on their contents and order. There are four possible outcomes, forming a hierarchy of comparisons.

  • Equal: List A and list B are identical. They must have the same length and the same elements in the exact same order. For example, [1, 2, 3] is equal to [1, 2, 3].
  • Superlist: List A contains list B as a contiguous block. This means B appears somewhere inside A without any gaps. For example, [1, 2, 3, 4, 5] is a superlist of [2, 3, 4].
  • Sublist: List A is contained within list B as a contiguous block. This is the inverse of a superlist. For example, [2, 3, 4] is a sublist of [1, 2, 3, 4, 5].
  • Unequal: None of the above relationships are true. The lists might share some elements, but not in a way that satisfies any of the other conditions. For example, [1, 2, 4] and [1, 2, 3, 4] are unequal.

It's crucial to distinguish a sublist from a subsequence. A sublist must be a contiguous block of elements. A subsequence can have elements from the original list with gaps. For instance, [1, 3, 5] is a subsequence of [1, 2, 3, 4, 5], but it is not a sublist.


Why This Logic is Critical for Cairo and Starknet Developers

While this might seem like a generic computer science problem, its application in the Cairo and Starknet ecosystem is particularly important. The determinism and resource constraints (gas fees) of the blockchain environment demand efficient and precise data manipulation.

Here’s why mastering this concept is non-negotiable:

  • Data Validation: Smart contracts often receive arrays or lists of data as input. A contract might need to verify if a user's provided signature sequence is a valid sublist of a master key set.
  • Event Log Analysis: When processing transaction logs, you might search for specific event sequences. Determining if a smaller sequence (a specific interaction pattern) is a sublist of the full event log is a direct application.
  • * State Verification: In complex state machines, you might need to check if a proposed state transition (represented as a list of changes) is a valid subset of all possible transitions from the current state.
  • Optimized String and Byte Array Manipulation: In Cairo, strings are often handled as `ByteArray` or a span of `felt252`. The sublist logic is the foundation for `contains`, `startsWith`, and `endsWith` operations, which are vital for parsing and handling text data on-chain.

An inefficient implementation can lead to excessive computation, resulting in higher gas costs for users and potentially hitting block gas limits. Therefore, understanding how to implement this logic cleanly is a hallmark of a proficient Cairo programmer.


How to Implement the Sublist Algorithm in Cairo

Let's build a complete solution in Cairo. Our approach will be methodical and structured, using helper functions to keep the code clean and readable. The core data structure we'll use is Span, which is Cairo's modern and efficient way to represent a slice or view of an array.

First, we define an enum to represent the four possible outcomes. This is much cleaner than returning magic numbers or strings.


use core::option::OptionTrait;

#[derive(Drop, PartialEq, Debug)]
pub enum Comparison {
    Equal,
    Sublist,
    Superlist,
    Unequal,
}

Now, let's lay out the logic in a primary function called sublist. This function will act as a controller, delegating the heavy lifting to more specialized helper functions based on the lengths of the input lists.

The Complete Cairo Solution

Here is the full, commented code from the exclusive kodikra.com learning path. This solution is designed for clarity and correctness, following modern Cairo best practices.


use core::option::OptionTrait;
use core::traits::PartialEq;

#[derive(Drop, PartialEq, Debug)]
pub enum Comparison {
    Equal,
    Sublist,
    Superlist,
    Unequal,
}

/// Helper function to check if `small_list` is a sublist of `large_list`.
/// This function assumes len(small_list) <= len(large_list).
fn is_contained_within(small_list: Span<felt252>, large_list: Span<felt252>) -> bool {
    let small_len = small_list.len();
    let large_len = large_list.len();

    // If the small list is empty, it's considered a sublist of any list.
    if small_len == 0 {
        return true;
    }

    // If the large list is empty but the small one isn't, it can't be a sublist.
    if large_len == 0 {
        return false;
    }

    // The number of possible starting positions for `small_list` within `large_list`.
    let windows = large_len - small_len + 1;
    let mut i = 0;
    while i < windows {
        // Create a "sliding window" or slice of the large list.
        let slice = large_list.slice(i, small_len);
        
        // The `==` operator on Spans compares them element by element.
        if slice == small_list {
            return true; // Found a match!
        }
        i += 1;
    };

    // If the loop completes without finding a match.
    false
}

/// Determines the relationship between two lists: Equal, Sublist, Superlist, or Unequal.
pub fn sublist(a: Span<felt252>, b: Span<felt252>) -> Comparison {
    let len_a = a.len();
    let len_b = b.len();

    if len_a == len_b {
        // If lengths are equal, they can only be Equal or Unequal.
        // The `==` operator on Span<T> handles element-wise comparison.
        if a == b {
            Comparison::Equal
        } else {
            Comparison::Unequal
        }
    } else if len_a < len_b {
        // If A is shorter, it can be a Sublist of B or Unequal.
        if is_contained_within(a, b) {
            Comparison::Sublist
        } else {
            Comparison::Unequal
        }
    } else { // This case covers len_a > len_b
        // If A is longer, it can be a Superlist of B or Unequal.
        if is_contained_within(b, a) {
            Comparison::Superlist
        } else {
            Comparison::Unequal
        }
    }
}

High-Level Logic Flow

This diagram illustrates the decision-making process our main sublist function follows. It prioritizes the cheapest checks (length comparison) first before moving to more computationally intensive element-wise comparisons.

    ● Start: sublist(A, B)
    │
    ▼
  ┌──────────────────┐
  │ Get len(A), len(B) │
  └─────────┬────────┘
            │
            ▼
  ◆ len(A) == len(B)?
  ├─ Yes ───▶ ◆ Is A == B?
  │           ├─ Yes ──▶ [Return Equal]
  │           └─ No  ───▶ [Return Unequal]
  │
  └─ No
     │
     ▼
  ◆ len(A) < len(B)?
  ├─ Yes ───▶ ◆ is_contained_within(A, B)?
  │           ├─ Yes ──▶ [Return Sublist]
  │           └─ No  ───▶ [Return Unequal]
  │
  └─ No (Implies len(A) > len(B))
     │
     ▼
  ◆ is_contained_within(B, A)?
  ├─ Yes ───▶ [Return Superlist]
  │
  └─ No  ───▶ [Return Unequal]
     │
     ▼
    ● End

Code Walkthrough: A Deep Dive into the Logic

Understanding the code requires breaking it down piece by piece. Let's analyze the purpose and mechanics of each function.

The `Comparison` Enum

#[derive(Drop, PartialEq, Debug)]
pub enum Comparison {
    Equal,
    Sublist,
    Superlist,
    Unequal,
}
  • #[derive(Drop, PartialEq, Debug)]: These are attributes that automatically implement certain traits for our enum.
    • Drop: Allows values of this type to be automatically cleaned up when they go out of scope. Essential for all custom types in Cairo.
    • PartialEq: Implements the equality operator (==), which is useful for testing our function's output.
    • Debug: Allows the enum to be printed in a human-readable format, invaluable for debugging.

The Helper Function: `is_contained_within`

This is the workhorse of our algorithm. It implements a "sliding window" technique to find a smaller list within a larger one.

fn is_contained_within(small_list: Span<felt252>, large_list: Span<felt252>) -> bool {
    let small_len = small_list.len();
    let large_len = large_list.len();

    if small_len == 0 {
        return true;
    }
  • Edge Case Handling: The function first handles a critical edge case: an empty list. An empty list is mathematically considered a sublist of any other list, including another empty list. So, if small_list is empty, we immediately return true.
    let windows = large_len - small_len + 1;
    let mut i = 0;
    while i < windows {
        let slice = large_list.slice(i, small_len);
        
        if slice == small_list {
            return true;
        }
        i += 1;
    };

    false
}
  • Sliding Window Logic:
    • let windows = large_len - small_len + 1;: This calculates how many possible starting positions the small_list could have within the large_list. For example, finding [3, 4] (len 2) in [1, 2, 3, 4, 5] (len 5) gives 5 - 2 + 1 = 4 possible windows: [1, 2], [2, 3], [3, 4], and [4, 5].
    • while i < windows: The loop iterates through each possible starting position.
    • large_list.slice(i, small_len): This is the core of the window. It creates a temporary Span (a view, not a copy, which is efficient) of the large_list. The slice starts at index i and has the same length as small_list.
    • if slice == small_list: Cairo's core library provides an implementation of PartialEq for Span<T>, which performs an element-by-element comparison. This is both convenient and efficient. If a match is found, we can stop searching and return true immediately.
  • Failure Case: If the while loop completes without finding any matching slice, it means the small_list is not contained within the large_list, so we return false.

Visualizing the `is_contained_within` Loop

This diagram shows the step-by-step process of the sliding window inside the `is_contained_within` function when searching for `[3, 4]` in `[1, 2, 3, 4, 5]`.

    ● Start: is_contained_within([3,4], [1,2,3,4,5])
    │
    ▼
  ┌───────────────────────────┐
  │ i = 0, windows = 4        │
  │ slice = large.slice(0, 2) │
  │       = [1, 2]            │
  └────────────┬──────────────┘
               │
               ▼
      ◆ Is [1, 2] == [3, 4]?
     ╱                      ╲
   Yes (Not in this case)    No
    │                        │
    ▼                        ▼
 [Return true]             ┌───────────────────────────┐
                           │ i = 1                     │
                           │ slice = large.slice(1, 2) │
                           │       = [2, 3]            │
                           └────────────┬──────────────┘
                                        │
                                        ▼
                               ◆ Is [2, 3] == [3, 4]?
                              ╱                      ╲
                            No                        Yes
                             │                         │
                             ▼                         ▼
                           ┌───────────────────────────┐ [Return true]
                           │ i = 2                     │
                           │ slice = large.slice(2, 2) │
                           │       = [3, 4]            │
                           └────────────┬──────────────┘
                                        │
                                        ▼
                               ◆ Is [3, 4] == [3, 4]?
                              ╱                      ╲
                            Yes                       No
                             │                         │
                             ▼                         ▼
                       [Return true]             (Continue Loop)
                             │
                             ▼
                            ● End

The Main Function: `sublist`

This function orchestrates the entire process by making smart, high-level decisions first.

pub fn sublist(a: Span<felt252>, b: Span<felt252>) -> Comparison {
    let len_a = a.len();
    let len_b = b.len();

    if len_a == len_b {
        if a == b { Comparison::Equal } else { Comparison::Unequal }
    } else if len_a < len_b {
        if is_contained_within(a, b) { Comparison::Sublist } else { Comparison::Unequal }
    } else { // len_a > len_b
        if is_contained_within(b, a) { Comparison::Superlist } else { Comparison::Unequal }
    }
}
  • Length-First Comparison: The logic first compares the lengths of a and b. This is the cheapest operation and immediately narrows down the possibilities.
    • If len_a == len_b, the only possibilities are Equal or Unequal. We can perform a direct comparison.
    • If len_a < len_b, a can only potentially be a Sublist of b. We call is_contained_within(a, b) to verify.
    • If len_a > len_b, a can only potentially be a Superlist of b. We check this by seeing if b is contained within a using is_contained_within(b, a).

This structured approach ensures we don't waste computation. We never try to find a larger list inside a smaller one, and we use the most efficient checks first.


Alternative Approaches and Performance Considerations

The "sliding window" approach we implemented is often called the naive string-searching algorithm. It's easy to understand, implement, and verify. For most on-chain use cases where lists are reasonably small, this is perfectly adequate.

However, it's important to be aware of its performance characteristics and potential alternatives.

Performance Analysis

Let n be the length of the larger list and m be the length of the smaller list. The `is_contained_within` function has a time complexity of O(m * (n-m+1)), which simplifies to O(m*n) in the worst case. This happens when there are many partial matches, for example, searching for "AAAAAB" in "AAAAAAAAAAAAAAAAAB".

For every window in the larger list, we might have to compare up to m elements. While generally fast, this can become a gas concern for very large lists.

Advanced Algorithms (for future exploration)

For scenarios requiring extreme optimization (e.g., searching within massive byte arrays), more advanced algorithms exist, though their implementation complexity is much higher:

  • Knuth-Morris-Pratt (KMP) Algorithm: Pre-processes the smaller list (the pattern) to create a lookup table. This table allows the algorithm to avoid re-comparing characters that have already been matched after a mismatch, achieving a linear time complexity of O(n + m).
  • Boyer-Moore Algorithm: Often faster in practice than KMP. It starts matching from the end of the pattern rather than the beginning and uses two heuristics to make large jumps through the text, skipping large sections that cannot match.

Pros & Cons of Our Implemented Approach

Aspect Naive Sliding Window (Our Solution) Advanced Algorithms (e.g., KMP)
Performance O(m*n) - Generally good for small to medium lists. O(n+m) - Significantly faster for very large lists.
Implementation Complexity Low. The logic is straightforward and easy to audit. High. Requires complex pre-processing and state management.
Code Size / Gas Cost Smaller code footprint, leading to lower contract deployment costs. Larger code size due to pre-processing logic, increasing deployment costs.
Best For Most typical smart contract use cases, data validation, and problems from the kodikra learning path. Off-chain data processing or on-chain applications with extremely large datasets where runtime gas is the primary constraint.

Frequently Asked Questions (FAQ)

What is a `Span<T>` in Cairo and why use it?

A Span<T> is a non-owning "view" or "slice" of a contiguous sequence of data, typically an Array<T>. It's extremely efficient because it doesn't copy the underlying data; it just holds a pointer to the start of the sequence and a length. This is ideal for our sublist function, as we can create many slices of the larger list without incurring the gas cost of data duplication.

How are empty lists handled in this problem?

Our solution correctly follows the standard convention: an empty list is considered a sublist of any other list. If list A is empty and list B is [1, 2, 3], A is a sublist of B. If both A and B are empty, they are considered Equal.

Is this sublist implementation gas-efficient?

For the vast majority of on-chain applications, yes. The use of Span<T> avoids costly data copies, and the logic is direct. The O(m*n) complexity is only a concern for exceptionally large lists, which are often impractical to process on-chain anyway due to block gas limits. For more insights into Cairo efficiency, explore our complete Cairo language guide.

Can this logic be adapted for `ByteArray` or strings?

Absolutely. A ByteArray in Cairo is conceptually similar to a list of bytes. The same sliding window logic can be applied to search for substrings within a larger string or byte array. You would simply adapt the function to work with a Span<byte> or the appropriate type.

What's the key difference between a sublist and a subsequence again?

A sublist must be a contiguous block. The elements must appear next to each other in the original list without any gaps. A subsequence can be formed by deleting zero or more elements from the original list, so the elements can be non-contiguous. For [1, 2, 3, 4], [2, 3] is a sublist, but [1, 3] is only a subsequence.

Why use a custom `Comparison` enum instead of returning an integer or string?

Using an enum provides type safety and clarity. It makes the function's return value explicit and self-documenting. A caller can use a match statement to handle each case cleanly, preventing bugs that might arise from mistyping a string (e.g., "sublist" vs "SubList") or misremembering which integer corresponds to which outcome.

How can I test this Cairo code effectively?

You should write unit tests covering all four outcomes and important edge cases. This includes testing with empty lists, lists of one element, lists where the sublist appears at the beginning, middle, and end, and lists with no match. The `#[test]` attribute in Cairo is used to define test functions where you can `assert` that `sublist(a, b)` returns the expected `Comparison` variant.


Conclusion: Building Foundational Skills

Mastering the sublist problem in Cairo is more than just solving a coding challenge; it's about building a foundational understanding of sequence manipulation, algorithmic thinking, and resource management in a constrained environment. By prioritizing length checks and using efficient, non-owning data structures like Span<T>, we created a solution that is readable, verifiable, and practical for real-world Starknet development.

This pattern of breaking down a problem, handling edge cases, and choosing the right tools is a skill you will use continuously on your journey. As you progress through the kodikra.com Cairo learning modules, you'll see this logic reappear in more complex forms, from parsing data streams to implementing sophisticated cryptographic checks.

Disclaimer: The code and explanations in this article are based on Cairo v2.6.3 and the Starknet ecosystem as of its writing. The Cairo language is under active development, so always consult the latest official documentation for the most current syntax and best practices.


Published by Kodikra — Your trusted Cairo learning resource.