Sublist in Arm64-assembly: Complete Solution & Deep Dive Guide

white and black abstract illustration

The Ultimate Guide to Sublist Detection in Arm64 Assembly

Implementing list comparison logic like checking for a sublist is a fundamental task. In Arm64 assembly, this involves direct memory manipulation, pointer arithmetic, and careful control flow, offering a raw view into how high-level language features are built. This guide covers creating a robust sublist detection function from scratch, comparing two lists to determine if they are equal, one is a sublist of the other, a superlist, or simply unequal.


The Challenge: Replicating High-Level Logic at the Lowest Level

Imagine you're working in a high-level language like Python or Java. Checking if one list is contained within another is often a one-liner: if sub_list in super_list:. It's simple, expressive, and you rarely think about the complex machinery operating under the hood. But what happens when you're stripped of these abstractions? When you're writing code for a bootloader, an operating system kernel, or a performance-critical embedded system, you don't have a standard library to lean on.

This is the world of assembly language. You're faced with raw memory addresses, registers, and a limited set of instructions. The task of finding a contiguous sequence of bytes (a sublist) inside a larger one becomes a manual process of looping, comparing, and pointer juggling. It can feel daunting, like being asked to build a car engine using only a wrench and a screwdriver. The logic is clear in your head, but translating it into the precise, unforgiving syntax of Arm64 assembly is a true test of a programmer's skill.

This guide promises to bridge that gap. We will deconstruct the sublist problem and build a complete, efficient solution in Arm64 assembly. You will not only get a working piece of code but also gain a profound understanding of memory access patterns, algorithmic implementation at the CPU level, and the AArch64 calling convention, turning a complex problem into a clear, manageable process.


What Exactly Is the Sublist Problem?

Before diving into assembly code, it's crucial to define the problem with precision. Given two lists, which we'll call List A and List B, our function must determine their relationship and return one of four distinct results. For our implementation, we'll assume the lists contain 32-bit integer values.

The four possible outcomes are:

  • EQUAL: List A and List B have the exact same elements in the exact same order. This implies they must also have the same length. For example, [1, 2, 3] is equal to [1, 2, 3].
  • SUBLIST: List A is a sublist of List B. This means the entire sequence of elements in List A appears contiguously within List B. For example, [2, 3] is a sublist of [1, 2, 3, 4].
  • SUPERLIST: List A is a superlist of List B. This is the inverse of the sublist relationship; List B appears contiguously within List A. For example, [1, 2, 3, 4] is a superlist of [2, 3].
  • UNEQUAL: None of the above relationships are true. The lists are different, and neither is a sublist of the other. For example, [1, 2, 4] and [1, 2, 3, 5] are unequal.

An important edge case is the empty list. An empty list is considered a sublist of any other list, including another empty list. Two empty lists are also considered equal.

The Logical Flow of Comparison

A naive approach might involve random checks, but an efficient algorithm follows a specific order to minimize comparisons. The most logical sequence is to check the most restrictive conditions first and move to the more general ones.

● Start with List A and List B
│
▼
┌─────────────────────────┐
│ Are lengths equal?      │
└────────────┬────────────┘
             │
             ▼
        ◆ Both Zero?
       ╱            ╲
     Yes             No
      │               │
      ▼               ▼
┌───────────┐   ┌──────────────────────────┐
│  EQUAL    │   │ Compare all elements     │
└───────────┘   └───────────┬──────────────┘
                            │
                            ▼
                       ◆ Match?
                      ╱        ╲
                    Yes         No
                     │           │
                     ▼           ▼
               ┌───────────┐   (Proceed to
               │  EQUAL    │    Sublist Check)
               └───────────┘

This diagram shows the first step. If the lists are of equal length, we perform a direct element-by-element comparison. If they match, the result is EQUAL. If their lengths differ or the elements don't match, we must proceed to the sublist/superlist checks.


Why Arm64 Assembly for This Task?

In a world dominated by high-level languages, choosing to implement an algorithm in assembly might seem like an academic exercise. However, there are compelling, real-world reasons why this skill remains invaluable.

Unmatched Performance and Control

The primary reason to use assembly is performance. By writing instructions directly for the CPU, you bypass compiler abstractions and can create highly optimized code. For tasks involving tight loops and extensive memory access, like searching for a sub-sequence, a hand-tuned assembly routine can outperform compiler-generated code. You have direct control over register allocation, instruction scheduling, and memory access patterns, allowing you to squeeze every last drop of performance from the hardware.

System-Level Programming

Assembly is the language of the machine. It's indispensable in environments where no operating system or standard library exists. This includes:

  • Operating System Kernels: The core of an OS that manages hardware requires assembly for context switching, interrupt handling, and memory management.
  • Bootloaders: The very first code that runs when a computer starts is written in assembly to initialize the hardware.
  • Device Drivers: Interfacing directly with hardware components often requires precise instruction sequences that can only be guaranteed with assembly.

Deepening Your Computer Science Foundation

Writing assembly code forces you to understand how a computer truly works. You learn about the CPU architecture, memory hierarchy, instruction sets, and calling conventions. This knowledge makes you a better programmer, even when you return to high-level languages, as you can write code that is more sympathetic to the underlying hardware, leading to better performance and fewer bugs.


How to Implement the Sublist Algorithm in Arm64

Our strategy is to build a main function, sublist, that orchestrates the logic by calling specialized helper functions. This modular approach makes the code cleaner, easier to debug, and more reusable. We will adhere to the AArch64 Procedure Call Standard (AAPCS64), where the first few arguments are passed in registers x0 through x7, and the return value is placed in x0.

Our function signature will be: sublist(list_a_ptr, len_a, list_b_ptr, len_b)

  • x0: Pointer to the start of List A.
  • x1: Length of List A (number of elements).
  • x2: Pointer to the start of List B.
  • x3: Length of List B.

We will define constants for our four possible return values:

  • EQUAL = 0
  • SUBLIST = 1 (A is a sublist of B)
  • SUPERLIST = 2 (A is a superlist of B)
  • UNEQUAL = 3

The High-Level Plan

  1. Check for Equality: First, compare the lengths of A and B. If they are equal, call a helper function, are_equal, to compare them element by element. If they match, we're done.
  2. Check for Sublist: If they are not equal, we check if A is a sublist of B. This is only possible if len_a <= len_b. We'll use a helper function, is_sublist, for this.
  3. Check for Superlist: If A is not a sublist of B, we check the reverse: is B a sublist of A? This is only possible if len_b <= len_a. We can reuse our is_sublist function with the arguments swapped.
  4. Declare Unequal: If none of the above conditions are met, the lists are unequal.

ASCII Diagram: The Nested Loop Logic for is_sublist

The core of the problem lies in the is_sublist check. This requires a nested loop. The outer loop iterates through the potential starting positions in the larger list, while the inner loop performs the element-by-element comparison.

// Check if B is a sublist of A
// A = [1, 2, 3, 4, 5], B = [3, 4]

● Start: Outer loop pointer `i` at A[0]
│
▼
┌──────────────────────────┐
│ `i` points to A[0] (val 1) │
└───────────┬──────────────┘
            │
            ▼
     ◆ A[i..i+len(B)-1] == B?
    ╱   (Compare [1, 2] with [3, 4])
   No
   │
   ▼
┌──────────────────────────┐
│ `i` increments, points to A[1] (val 2) │
└───────────┬──────────────┘
            │
            ▼
     ◆ A[i..i+len(B)-1] == B?
    ╱   (Compare [2, 3] with [3, 4])
   No
   │
   ▼
┌──────────────────────────┐
│ `i` increments, points to A[2] (val 3) │
└───────────┬──────────────┘
            │
            ▼
     ◆ A[i..i+len(B)-1] == B?
    ╱   (Compare [3, 4] with [3, 4])
  Yes
   │
   ▼
┌───────────┐
│  MATCH!   │
│ Return true │
└───────────┘

This flow visualizes how the outer loop slides a "window" of the sublist's length across the superlist, performing a full comparison at each step until a match is found or all possibilities are exhausted.


The Complete Arm64 Assembly Solution

Here is the full, commented source code. This code is structured into a main function and two helpers. It assumes the list elements are 32-bit integers, so we multiply lengths by 4 (the size of a word) for pointer arithmetic.


// sublist.s
// Solution from the exclusive kodikra.com learning curriculum.

.data
// Define constants for the return values, making the code more readable.
EQUAL:      .word 0
SUBLIST:    .word 1
SUPERLIST:  .word 2
UNEQUAL:    .word 3

.text
.global sublist

// -----------------------------------------------------------------------------
// are_equal(ptr1, len1, ptr2, len2) -> returns 1 if equal, 0 otherwise
// A helper function to check if two lists are identical.
// It's a specialized case of is_sublist but simpler.
// x0: ptr1, x1: len1, x2: ptr2, x3: len2
// Returns result in x0.
// Clobbers: x0, x1, x2, x3, x4, x5
// -----------------------------------------------------------------------------
are_equal:
    // First, check if lengths are equal. If not, they can't be equal.
    cmp x1, x3
    b.ne .are_equal_false // Branch if not equal

    // If both lengths are zero, they are equal.
    cbz x1, .are_equal_true // If len1 is zero, branch to true

    // Set up loop
    mov x4, x0          // x4 = current pointer for list 1
    mov x5, x2          // x5 = current pointer for list 2
    mov x6, x1          // x6 = loop counter

.are_equal_loop:
    // Load one element from each list (32-bit word)
    ldr w7, [x4], #4    // Load from ptr1 and post-increment pointer by 4 bytes
    ldr w8, [x5], #4    // Load from ptr2 and post-increment pointer by 4 bytes

    // Compare the elements
    cmp w7, w8
    b.ne .are_equal_false // If elements differ, branch to false

    // Decrement counter and loop if not zero
    subs x6, x6, #1
    b.ne .are_equal_loop

.are_equal_true:
    mov x0, #1          // Return 1 (true)
    ret

.are_equal_false:
    mov x0, #0          // Return 0 (false)
    ret

// -----------------------------------------------------------------------------
// is_sublist(sub_ptr, sub_len, super_ptr, super_len) -> 1 if sublist, 0 otherwise
// The core logic to check if one list is a sublist of another.
// x0: sub_ptr, x1: sub_len, x2: super_ptr, x3: super_len
// Returns result in x0.
// Clobbers: x0-x10
// -----------------------------------------------------------------------------
is_sublist:
    // If sublist length is 0, it's always a sublist (even of an empty list).
    cbz x1, .is_sublist_true

    // If sublist length > superlist length, it's impossible.
    cmp x1, x3
    b.gt .is_sublist_false

    // Save callee-saved registers we will modify
    stp x19, x20, [sp, #-16]!
    stp x21, x22, [sp, #-16]!

    // Setup registers for the loops
    mov x19, x0         // x19 = sub_ptr (callee-saved)
    mov x20, x1         // x20 = sub_len (callee-saved)
    mov x21, x2         // x21 = super_ptr (callee-saved)
    mov x22, x3         // x22 = super_len (callee-saved)

    // Calculate the number of possible start positions for the sublist
    // in the superlist. This is (super_len - sub_len + 1).
    sub x4, x22, x20    // x4 = super_len - sub_len
    add x4, x4, #1      // x4 = outer loop count

    mov x5, x21         // x5 = current pointer in superlist (outer loop iterator)

.outer_loop:
    // At the start of each outer loop iteration, we check if the sublist
    // starting from the current position in the superlist is equal to our sublist.
    // We can reuse the are_equal logic here, but inline it for efficiency.
    mov x6, x19         // x6 = current pointer in sublist
    mov x7, x5          // x7 = current pointer in superlist slice
    mov x8, x20         // x8 = inner loop counter (sub_len)

.inner_loop:
    // If the inner loop counter is zero, we've successfully matched all elements.
    cbz x8, .is_sublist_true_cleanup

    ldr w9, [x6], #4    // Load from sublist
    ldr w10, [x7], #4   // Load from superlist slice
    cmp w9, w10
    b.ne .inner_loop_fail // If elements don't match, this attempt failed.

    subs x8, x8, #1
    b.ne .inner_loop    // Continue inner loop

    // If we get here, it means inner loop finished without failure (cbz was taken)
    b .is_sublist_true_cleanup

.inner_loop_fail:
    // The match failed. We need to advance the outer loop pointer and try again.
    add x5, x5, #4      // Move to the next element in the superlist
    subs x4, x4, #1     // Decrement outer loop counter
    b.ne .outer_loop    // If there are more positions to check, loop again

    // If the outer loop finishes, no match was found.
    b .is_sublist_false_cleanup

.is_sublist_true_cleanup:
    mov x0, #1          // Set return value to 1 (true)
    ldp x21, x22, [sp], #16
    ldp x19, x20, [sp], #16
    ret

.is_sublist_false_cleanup:
    mov x0, #0          // Set return value to 0 (false)
    ldp x21, x22, [sp], #16
    ldp x19, x20, [sp], #16
    ret

// Convenience labels for the boolean returns
.is_sublist_true:
    mov x0, #1
    ret
.is_sublist_false:
    mov x0, #0
    ret


// -----------------------------------------------------------------------------
// sublist(list_a_ptr, len_a, list_b_ptr, len_b) -> result code
// The main function that determines the relationship between two lists.
// x0: list_a_ptr, x1: len_a, x2: list_b_ptr, x3: len_b
// Returns: 0 for EQUAL, 1 for SUBLIST, 2 for SUPERLIST, 3 for UNEQUAL.
// -----------------------------------------------------------------------------
sublist:
    // Per AAPCS64, we must save any callee-saved registers we use.
    // We will use x19-x22 to store the initial arguments.
    stp x19, x20, [sp, #-16]!
    stp x21, x22, [sp, #-16]!

    // Store original arguments
    mov x19, x0         // list_a_ptr
    mov x20, x1         // len_a
    mov x21, x2         // list_b_ptr
    mov x22, x3         // len_b

    // STEP 1: Check for equality.
    // This is the most specific case.
    mov x0, x19
    mov x1, x20
    mov x2, x21
    mov x3, x22
    bl are_equal
    cbz x0, .check_sublist // If not equal (result is 0), proceed to next check

    // If we are here, the lists are equal.
    ldr x0, =EQUAL
    ldr w0, [x0]
    b .done

.check_sublist:
    // STEP 2: Check if A is a sublist of B.
    // Restore original arguments for the call
    mov x0, x19
    mov x1, x20
    mov x2, x21
    mov x3, x22
    bl is_sublist
    cbz x0, .check_superlist // If not a sublist, proceed

    // A is a sublist of B.
    ldr x0, =SUBLIST
    ldr w0, [x0]
    b .done

.check_superlist:
    // STEP 3: Check if B is a sublist of A (i.e., A is a superlist of B).
    // We swap the arguments to reuse the is_sublist function.
    mov x0, x21         // sub_ptr = list_b_ptr
    mov x1, x22         // sub_len = len_b
    mov x2, x19         // super_ptr = list_a_ptr
    mov x3, x20         // super_len = len_a
    bl is_sublist
    cbz x0, .is_unequal // If not a superlist, they must be unequal

    // A is a superlist of B.
    ldr x0, =SUPERLIST
    ldr w0, [x0]
    b .done

.is_unequal:
    // STEP 4: If all other checks fail, the lists are unequal.
    ldr x0, =UNEQUAL
    ldr w0, [x0]

.done:
    // Restore the callee-saved registers from the stack
    ldp x21, x22, [sp], #16
    ldp x19, x20, [sp], #16
    ret                 // Return to caller

How to Compile and Run

To use this assembly code, you would typically write a C/C++ wrapper to call it and test its functionality. First, you assemble the code into an object file.


# Using GNU Assembler
as sublist.s -o sublist.o

# Or using Clang
clang -c sublist.s -o sublist.o

Next, you would link this object file with your C/C++ test harness.


# Example C test file (test.c)
#include <stdio.h>

// Declare the assembly function prototype
int sublist(int* list_a, int len_a, int* list_b, int len_b);

int main() {
    int list_a[] = {1, 2, 3};
    int list_b[] = {1, 2, 3, 4, 5};
    int result = sublist(list_a, 3, list_b, 5);
    // Expected result: 1 (SUBLIST)
    printf("Result: %d\n", result);
    return 0;
}

Finally, link them together into an executable.


# Using GCC/Clang to link
gcc test.c sublist.o -o test_runner
./test_runner

Detailed Code Walkthrough

Understanding every line of assembly is key. Let's break down the logic of our main function, sublist, and its helpers.

The sublist Main Function

This function acts as the orchestrator. Its primary job is to call the helper functions in the correct order and interpret their results.

  1. Stack and Register Preservation: The function begins by saving callee-saved registers (x19-x22) on the stack using stp (Store Pair). This is required by the AAPCS64 to ensure that functions we call don't overwrite register values we need to preserve. We use these registers to hold the original function arguments so we can reuse them for multiple helper calls.
  2. Equality Check: The first logical step is to check for equality. We move the original arguments from x19-x22 back into x0-x3 and use bl are_equal (Branch with Link) to call our helper. cbz x0, .check_sublist checks if the result (in x0) is zero. If it is (meaning they are not equal), we branch to the next check. Otherwise, we load the EQUAL constant and jump to the .done label.
  3. Sublist Check: If the lists aren't equal, we check if A is a sublist of B. We again load the original arguments into x0-x3 and call bl is_sublist. If the result is non-zero (true), we load the SUBLIST constant and finish.
  4. Superlist Check: If the sublist check failed, we try the reverse. This time, we cleverly swap the arguments: List B's pointer and length go into x0, x1, and List A's go into x2, x3. We then call is_sublist again. This is a great example of code reuse.
  5. Unequal and Cleanup: If all checks fail, we load the UNEQUAL constant. The .done label is our single exit point. Here, we restore the saved registers from the stack with ldp (Load Pair) and execute ret to return to the caller.

The is_sublist Helper Function

This is where the most complex logic resides. It implements the sliding window search.

  1. Edge Cases: The function first handles two simple cases. cbz x1, .is_sublist_true checks if the sublist length is zero. If so, it's always a sublist, and we return true immediately. cmp x1, x3 followed by b.gt .is_sublist_false checks if the sublist is longer than the superlist, which is impossible.
  2. Outer Loop Setup: We save registers and then calculate the number of iterations for the outer loop: super_len - sub_len + 1. This value is stored in x4. The current search position in the superlist is maintained in x5.
  3. Outer Loop (.outer_loop): This loop iterates through all possible starting positions of the sublist within the superlist.
  4. Inner Loop (.inner_loop): For each starting position, this loop compares the elements of the sublist against the corresponding slice of the superlist. It loads a 32-bit word (w9, w10) from each list using ldr with post-increment addressing ([reg], #4). If a mismatch is found (b.ne .inner_loop_fail), this attempt has failed, and we break out of the inner loop.
  5. Match Failure (.inner_loop_fail): On failure, we advance the superlist pointer (add x5, x5, #4), decrement the outer loop counter, and jump back to the start of the .outer_loop to try the next position.
  6. Match Success: If the inner loop completes without a mismatch (the counter x8 reaches zero), it means we've found a complete match. We branch to the cleanup routine, set the return value to 1, restore registers, and return.

Alternative Approaches and Performance Considerations

While our scalar, element-by-element comparison is clear and correct, it's not the only way to solve this problem. For performance-critical applications, other techniques could be employed.

Using SIMD (NEON) Instructions

Modern ARM processors include a powerful SIMD (Single Instruction, Multiple Data) engine called NEON. NEON allows you to perform operations on multiple data elements simultaneously. Instead of comparing one 32-bit integer at a time, you could load 128 bits (four 32-bit integers) into NEON registers and compare them all with a single instruction (e.g., CMEQ). This can provide a significant speedup, especially for large lists, by reducing the number of loop iterations and instructions executed.

However, this adds complexity. You need to handle list lengths that aren't multiples of the vector size and manage the more complex NEON register set.

Pros and Cons of the Assembly Approach

Aspect Pros (Arm64 Assembly) Cons (Arm64 Assembly)
Performance Direct control over hardware leads to potentially the fastest possible execution. No overhead from interpreters or high-level abstractions. A modern compiler is often very good at optimization. A poorly written assembly routine can be slower than optimized C code.
Development Time Allows for precise, low-level implementation. Significantly longer and more complex to write, debug, and maintain. Prone to subtle bugs like off-by-one errors.
Memory Control Explicit and total control over every byte of memory accessed. No safety nets. Buffer overflows, incorrect pointer arithmetic, and memory corruption are easy mistakes to make.
Portability Optimized for a specific architecture (AArch64). Completely non-portable. The code must be rewritten for other architectures like x86-64.

Frequently Asked Questions (FAQ)

What is the AArch64 Procedure Call Standard (AAPCS64)?

The AAPCS64 is a set of rules that governs how functions call each other on the Arm64 architecture. It defines which registers are used to pass arguments (x0-x7), which register holds the return value (x0), and which registers a function must preserve before returning (callee-saved registers like x19-x30) versus which it can freely modify (caller-saved registers). Adhering to this standard is essential for interoperability between code compiled from different languages (like C and assembly).

Why use CMP instead of SUBS for most comparisons?

Both CMP reg1, reg2 and SUBS dest, reg1, reg2 perform a subtraction (reg1 - reg2) and update the condition flags (N, Z, C, V). The key difference is that CMP discards the result of the subtraction, only setting the flags. SUBS, on the other hand, stores the result in the destination register. When you only need to know the relationship between two values (equal, greater, less), CMP is more efficient as it doesn't need to write to a destination register. We use SUBS specifically when we need both the comparison and the result, such as in a decrement-and-branch loop counter.

How does this code handle empty lists?

The code handles empty lists correctly due to explicit checks. In is_sublist, the first instruction is cbz x1, .is_sublist_true, which checks if the sublist length (in x1) is zero. If it is, the function immediately returns true, correctly identifying an empty list as a sublist of any other list. The are_equal function also handles two empty lists correctly by branching to .are_equal_true if the length register is zero.

What does the .global sublist directive do?

The .global directive (or .globl) makes a symbol visible to the linker. In our case, .global sublist tells the assembler that the sublist label should be accessible from other object files. Without this, the linker wouldn't be able to find the sublist function when linking our test.c file with sublist.o, resulting in an "undefined reference" error.

How is memory for the lists managed in this example?

Our assembly function is agnostic about memory management. It simply accepts pointers (memory addresses) and lengths as arguments. It assumes the caller is responsible for allocating and deallocating the memory for the lists. In our C test harness example, the lists are created as arrays on the stack, and their memory is automatically managed by the C runtime.

What are the key registers used in the is_sublist function?

The function uses several registers for specific purposes: x19-x22 are used to preserve the original arguments. x4 serves as the outer loop counter. x5 is the "sliding window" pointer in the superlist. x6 and x7 are pointers for the inner loop comparison, and x8 is the inner loop counter. w9 and w10 are temporary registers for holding the 32-bit values loaded from memory for comparison.

Could this logic be adapted for strings instead of integer lists?

Absolutely. The core logic is identical. The main change would be how elements are loaded and compared. For null-terminated C-style strings, you would load bytes (ldrb) instead of words (ldr) and your loop conditions would check for the null terminator (\0) instead of relying on a pre-calculated length. The pointer arithmetic would also change from increments of 4 to increments of 1.


Conclusion: From High-Level Concepts to Machine-Level Mastery

We have successfully navigated the journey from a high-level algorithmic concept—the sublist problem—to a complete, working implementation in Arm64 assembly. By breaking the problem into smaller, manageable parts (are_equal, is_sublist) and carefully managing registers and memory, we've built a solution that is both robust and efficient. This exercise from the kodikra.com curriculum is more than just a coding challenge; it's a deep dive into the fundamentals of computing.

You've learned how to translate control flow like nested loops into conditional branches, how to perform pointer arithmetic to traverse memory, and how to adhere to a standard calling convention to create modular, reusable code. This low-level knowledge is a superpower. It demystifies what compilers do and provides you with the tools to write exceptionally performant code when it matters most.

Disclaimer: The code and explanations in this article are based on the AArch64 instruction set architecture and the AAPCS64 calling convention. The assembly syntax is compatible with the GNU Assembler (as) and Clang. Behavior may vary on different architectures or with different toolchains.

Ready to tackle the next challenge and further hone your low-level programming skills? Continue your journey in our Arm64 Assembly Learning Path and master the art of speaking the machine's native language. Or, for a broader view, explore more advanced Arm64 assembly concepts in our comprehensive guides.


Published by Kodikra — Your trusted Arm64-assembly learning resource.