Sublist in Arm64-assembly: Complete Solution & Deep Dive Guide
The Ultimate Guide to Sublist Detection in Arm64 Assembly
Implementing list comparison logic like checking for a sublist is a fundamental task. In Arm64 assembly, this involves direct memory manipulation, pointer arithmetic, and careful control flow, offering a raw view into how high-level language features are built. This guide covers creating a robust sublist detection function from scratch, comparing two lists to determine if they are equal, one is a sublist of the other, a superlist, or simply unequal.
The Challenge: Replicating High-Level Logic at the Lowest Level
Imagine you're working in a high-level language like Python or Java. Checking if one list is contained within another is often a one-liner: if sub_list in super_list:. It's simple, expressive, and you rarely think about the complex machinery operating under the hood. But what happens when you're stripped of these abstractions? When you're writing code for a bootloader, an operating system kernel, or a performance-critical embedded system, you don't have a standard library to lean on.
This is the world of assembly language. You're faced with raw memory addresses, registers, and a limited set of instructions. The task of finding a contiguous sequence of bytes (a sublist) inside a larger one becomes a manual process of looping, comparing, and pointer juggling. It can feel daunting, like being asked to build a car engine using only a wrench and a screwdriver. The logic is clear in your head, but translating it into the precise, unforgiving syntax of Arm64 assembly is a true test of a programmer's skill.
This guide promises to bridge that gap. We will deconstruct the sublist problem and build a complete, efficient solution in Arm64 assembly. You will not only get a working piece of code but also gain a profound understanding of memory access patterns, algorithmic implementation at the CPU level, and the AArch64 calling convention, turning a complex problem into a clear, manageable process.
What Exactly Is the Sublist Problem?
Before diving into assembly code, it's crucial to define the problem with precision. Given two lists, which we'll call List A and List B, our function must determine their relationship and return one of four distinct results. For our implementation, we'll assume the lists contain 32-bit integer values.
The four possible outcomes are:
- EQUAL: List A and List B have the exact same elements in the exact same order. This implies they must also have the same length. For example,
[1, 2, 3]is equal to[1, 2, 3]. - SUBLIST: List A is a sublist of List B. This means the entire sequence of elements in List A appears contiguously within List B. For example,
[2, 3]is a sublist of[1, 2, 3, 4]. - SUPERLIST: List A is a superlist of List B. This is the inverse of the sublist relationship; List B appears contiguously within List A. For example,
[1, 2, 3, 4]is a superlist of[2, 3]. - UNEQUAL: None of the above relationships are true. The lists are different, and neither is a sublist of the other. For example,
[1, 2, 4]and[1, 2, 3, 5]are unequal.
An important edge case is the empty list. An empty list is considered a sublist of any other list, including another empty list. Two empty lists are also considered equal.
The Logical Flow of Comparison
A naive approach might involve random checks, but an efficient algorithm follows a specific order to minimize comparisons. The most logical sequence is to check the most restrictive conditions first and move to the more general ones.
● Start with List A and List B
│
▼
┌─────────────────────────┐
│ Are lengths equal? │
└────────────┬────────────┘
│
▼
◆ Both Zero?
╱ ╲
Yes No
│ │
▼ ▼
┌───────────┐ ┌──────────────────────────┐
│ EQUAL │ │ Compare all elements │
└───────────┘ └───────────┬──────────────┘
│
▼
◆ Match?
╱ ╲
Yes No
│ │
▼ ▼
┌───────────┐ (Proceed to
│ EQUAL │ Sublist Check)
└───────────┘
This diagram shows the first step. If the lists are of equal length, we perform a direct element-by-element comparison. If they match, the result is EQUAL. If their lengths differ or the elements don't match, we must proceed to the sublist/superlist checks.
Why Arm64 Assembly for This Task?
In a world dominated by high-level languages, choosing to implement an algorithm in assembly might seem like an academic exercise. However, there are compelling, real-world reasons why this skill remains invaluable.
Unmatched Performance and Control
The primary reason to use assembly is performance. By writing instructions directly for the CPU, you bypass compiler abstractions and can create highly optimized code. For tasks involving tight loops and extensive memory access, like searching for a sub-sequence, a hand-tuned assembly routine can outperform compiler-generated code. You have direct control over register allocation, instruction scheduling, and memory access patterns, allowing you to squeeze every last drop of performance from the hardware.
System-Level Programming
Assembly is the language of the machine. It's indispensable in environments where no operating system or standard library exists. This includes:
- Operating System Kernels: The core of an OS that manages hardware requires assembly for context switching, interrupt handling, and memory management.
- Bootloaders: The very first code that runs when a computer starts is written in assembly to initialize the hardware.
- Device Drivers: Interfacing directly with hardware components often requires precise instruction sequences that can only be guaranteed with assembly.
Deepening Your Computer Science Foundation
Writing assembly code forces you to understand how a computer truly works. You learn about the CPU architecture, memory hierarchy, instruction sets, and calling conventions. This knowledge makes you a better programmer, even when you return to high-level languages, as you can write code that is more sympathetic to the underlying hardware, leading to better performance and fewer bugs.
How to Implement the Sublist Algorithm in Arm64
Our strategy is to build a main function, sublist, that orchestrates the logic by calling specialized helper functions. This modular approach makes the code cleaner, easier to debug, and more reusable. We will adhere to the AArch64 Procedure Call Standard (AAPCS64), where the first few arguments are passed in registers x0 through x7, and the return value is placed in x0.
Our function signature will be: sublist(list_a_ptr, len_a, list_b_ptr, len_b)
x0: Pointer to the start of List A.x1: Length of List A (number of elements).x2: Pointer to the start of List B.x3: Length of List B.
We will define constants for our four possible return values:
EQUAL = 0SUBLIST = 1(A is a sublist of B)SUPERLIST = 2(A is a superlist of B)UNEQUAL = 3
The High-Level Plan
- Check for Equality: First, compare the lengths of A and B. If they are equal, call a helper function,
are_equal, to compare them element by element. If they match, we're done. - Check for Sublist: If they are not equal, we check if A is a sublist of B. This is only possible if
len_a <= len_b. We'll use a helper function,is_sublist, for this. - Check for Superlist: If A is not a sublist of B, we check the reverse: is B a sublist of A? This is only possible if
len_b <= len_a. We can reuse ouris_sublistfunction with the arguments swapped. - Declare Unequal: If none of the above conditions are met, the lists are unequal.
ASCII Diagram: The Nested Loop Logic for is_sublist
The core of the problem lies in the is_sublist check. This requires a nested loop. The outer loop iterates through the potential starting positions in the larger list, while the inner loop performs the element-by-element comparison.
// Check if B is a sublist of A
// A = [1, 2, 3, 4, 5], B = [3, 4]
● Start: Outer loop pointer `i` at A[0]
│
▼
┌──────────────────────────┐
│ `i` points to A[0] (val 1) │
└───────────┬──────────────┘
│
▼
◆ A[i..i+len(B)-1] == B?
╱ (Compare [1, 2] with [3, 4])
No
│
▼
┌──────────────────────────┐
│ `i` increments, points to A[1] (val 2) │
└───────────┬──────────────┘
│
▼
◆ A[i..i+len(B)-1] == B?
╱ (Compare [2, 3] with [3, 4])
No
│
▼
┌──────────────────────────┐
│ `i` increments, points to A[2] (val 3) │
└───────────┬──────────────┘
│
▼
◆ A[i..i+len(B)-1] == B?
╱ (Compare [3, 4] with [3, 4])
Yes
│
▼
┌───────────┐
│ MATCH! │
│ Return true │
└───────────┘
This flow visualizes how the outer loop slides a "window" of the sublist's length across the superlist, performing a full comparison at each step until a match is found or all possibilities are exhausted.
The Complete Arm64 Assembly Solution
Here is the full, commented source code. This code is structured into a main function and two helpers. It assumes the list elements are 32-bit integers, so we multiply lengths by 4 (the size of a word) for pointer arithmetic.
// sublist.s
// Solution from the exclusive kodikra.com learning curriculum.
.data
// Define constants for the return values, making the code more readable.
EQUAL: .word 0
SUBLIST: .word 1
SUPERLIST: .word 2
UNEQUAL: .word 3
.text
.global sublist
// -----------------------------------------------------------------------------
// are_equal(ptr1, len1, ptr2, len2) -> returns 1 if equal, 0 otherwise
// A helper function to check if two lists are identical.
// It's a specialized case of is_sublist but simpler.
// x0: ptr1, x1: len1, x2: ptr2, x3: len2
// Returns result in x0.
// Clobbers: x0, x1, x2, x3, x4, x5
// -----------------------------------------------------------------------------
are_equal:
// First, check if lengths are equal. If not, they can't be equal.
cmp x1, x3
b.ne .are_equal_false // Branch if not equal
// If both lengths are zero, they are equal.
cbz x1, .are_equal_true // If len1 is zero, branch to true
// Set up loop
mov x4, x0 // x4 = current pointer for list 1
mov x5, x2 // x5 = current pointer for list 2
mov x6, x1 // x6 = loop counter
.are_equal_loop:
// Load one element from each list (32-bit word)
ldr w7, [x4], #4 // Load from ptr1 and post-increment pointer by 4 bytes
ldr w8, [x5], #4 // Load from ptr2 and post-increment pointer by 4 bytes
// Compare the elements
cmp w7, w8
b.ne .are_equal_false // If elements differ, branch to false
// Decrement counter and loop if not zero
subs x6, x6, #1
b.ne .are_equal_loop
.are_equal_true:
mov x0, #1 // Return 1 (true)
ret
.are_equal_false:
mov x0, #0 // Return 0 (false)
ret
// -----------------------------------------------------------------------------
// is_sublist(sub_ptr, sub_len, super_ptr, super_len) -> 1 if sublist, 0 otherwise
// The core logic to check if one list is a sublist of another.
// x0: sub_ptr, x1: sub_len, x2: super_ptr, x3: super_len
// Returns result in x0.
// Clobbers: x0-x10
// -----------------------------------------------------------------------------
is_sublist:
// If sublist length is 0, it's always a sublist (even of an empty list).
cbz x1, .is_sublist_true
// If sublist length > superlist length, it's impossible.
cmp x1, x3
b.gt .is_sublist_false
// Save callee-saved registers we will modify
stp x19, x20, [sp, #-16]!
stp x21, x22, [sp, #-16]!
// Setup registers for the loops
mov x19, x0 // x19 = sub_ptr (callee-saved)
mov x20, x1 // x20 = sub_len (callee-saved)
mov x21, x2 // x21 = super_ptr (callee-saved)
mov x22, x3 // x22 = super_len (callee-saved)
// Calculate the number of possible start positions for the sublist
// in the superlist. This is (super_len - sub_len + 1).
sub x4, x22, x20 // x4 = super_len - sub_len
add x4, x4, #1 // x4 = outer loop count
mov x5, x21 // x5 = current pointer in superlist (outer loop iterator)
.outer_loop:
// At the start of each outer loop iteration, we check if the sublist
// starting from the current position in the superlist is equal to our sublist.
// We can reuse the are_equal logic here, but inline it for efficiency.
mov x6, x19 // x6 = current pointer in sublist
mov x7, x5 // x7 = current pointer in superlist slice
mov x8, x20 // x8 = inner loop counter (sub_len)
.inner_loop:
// If the inner loop counter is zero, we've successfully matched all elements.
cbz x8, .is_sublist_true_cleanup
ldr w9, [x6], #4 // Load from sublist
ldr w10, [x7], #4 // Load from superlist slice
cmp w9, w10
b.ne .inner_loop_fail // If elements don't match, this attempt failed.
subs x8, x8, #1
b.ne .inner_loop // Continue inner loop
// If we get here, it means inner loop finished without failure (cbz was taken)
b .is_sublist_true_cleanup
.inner_loop_fail:
// The match failed. We need to advance the outer loop pointer and try again.
add x5, x5, #4 // Move to the next element in the superlist
subs x4, x4, #1 // Decrement outer loop counter
b.ne .outer_loop // If there are more positions to check, loop again
// If the outer loop finishes, no match was found.
b .is_sublist_false_cleanup
.is_sublist_true_cleanup:
mov x0, #1 // Set return value to 1 (true)
ldp x21, x22, [sp], #16
ldp x19, x20, [sp], #16
ret
.is_sublist_false_cleanup:
mov x0, #0 // Set return value to 0 (false)
ldp x21, x22, [sp], #16
ldp x19, x20, [sp], #16
ret
// Convenience labels for the boolean returns
.is_sublist_true:
mov x0, #1
ret
.is_sublist_false:
mov x0, #0
ret
// -----------------------------------------------------------------------------
// sublist(list_a_ptr, len_a, list_b_ptr, len_b) -> result code
// The main function that determines the relationship between two lists.
// x0: list_a_ptr, x1: len_a, x2: list_b_ptr, x3: len_b
// Returns: 0 for EQUAL, 1 for SUBLIST, 2 for SUPERLIST, 3 for UNEQUAL.
// -----------------------------------------------------------------------------
sublist:
// Per AAPCS64, we must save any callee-saved registers we use.
// We will use x19-x22 to store the initial arguments.
stp x19, x20, [sp, #-16]!
stp x21, x22, [sp, #-16]!
// Store original arguments
mov x19, x0 // list_a_ptr
mov x20, x1 // len_a
mov x21, x2 // list_b_ptr
mov x22, x3 // len_b
// STEP 1: Check for equality.
// This is the most specific case.
mov x0, x19
mov x1, x20
mov x2, x21
mov x3, x22
bl are_equal
cbz x0, .check_sublist // If not equal (result is 0), proceed to next check
// If we are here, the lists are equal.
ldr x0, =EQUAL
ldr w0, [x0]
b .done
.check_sublist:
// STEP 2: Check if A is a sublist of B.
// Restore original arguments for the call
mov x0, x19
mov x1, x20
mov x2, x21
mov x3, x22
bl is_sublist
cbz x0, .check_superlist // If not a sublist, proceed
// A is a sublist of B.
ldr x0, =SUBLIST
ldr w0, [x0]
b .done
.check_superlist:
// STEP 3: Check if B is a sublist of A (i.e., A is a superlist of B).
// We swap the arguments to reuse the is_sublist function.
mov x0, x21 // sub_ptr = list_b_ptr
mov x1, x22 // sub_len = len_b
mov x2, x19 // super_ptr = list_a_ptr
mov x3, x20 // super_len = len_a
bl is_sublist
cbz x0, .is_unequal // If not a superlist, they must be unequal
// A is a superlist of B.
ldr x0, =SUPERLIST
ldr w0, [x0]
b .done
.is_unequal:
// STEP 4: If all other checks fail, the lists are unequal.
ldr x0, =UNEQUAL
ldr w0, [x0]
.done:
// Restore the callee-saved registers from the stack
ldp x21, x22, [sp], #16
ldp x19, x20, [sp], #16
ret // Return to caller
How to Compile and Run
To use this assembly code, you would typically write a C/C++ wrapper to call it and test its functionality. First, you assemble the code into an object file.
# Using GNU Assembler
as sublist.s -o sublist.o
# Or using Clang
clang -c sublist.s -o sublist.o
Next, you would link this object file with your C/C++ test harness.
# Example C test file (test.c)
#include <stdio.h>
// Declare the assembly function prototype
int sublist(int* list_a, int len_a, int* list_b, int len_b);
int main() {
int list_a[] = {1, 2, 3};
int list_b[] = {1, 2, 3, 4, 5};
int result = sublist(list_a, 3, list_b, 5);
// Expected result: 1 (SUBLIST)
printf("Result: %d\n", result);
return 0;
}
Finally, link them together into an executable.
# Using GCC/Clang to link
gcc test.c sublist.o -o test_runner
./test_runner
Detailed Code Walkthrough
Understanding every line of assembly is key. Let's break down the logic of our main function, sublist, and its helpers.
The sublist Main Function
This function acts as the orchestrator. Its primary job is to call the helper functions in the correct order and interpret their results.
- Stack and Register Preservation: The function begins by saving callee-saved registers (
x19-x22) on the stack usingstp(Store Pair). This is required by the AAPCS64 to ensure that functions we call don't overwrite register values we need to preserve. We use these registers to hold the original function arguments so we can reuse them for multiple helper calls. - Equality Check: The first logical step is to check for equality. We move the original arguments from
x19-x22back intox0-x3and usebl are_equal(Branch with Link) to call our helper.cbz x0, .check_sublistchecks if the result (inx0) is zero. If it is (meaning they are not equal), we branch to the next check. Otherwise, we load theEQUALconstant and jump to the.donelabel. - Sublist Check: If the lists aren't equal, we check if A is a sublist of B. We again load the original arguments into
x0-x3and callbl is_sublist. If the result is non-zero (true), we load theSUBLISTconstant and finish. - Superlist Check: If the sublist check failed, we try the reverse. This time, we cleverly swap the arguments: List B's pointer and length go into
x0, x1, and List A's go intox2, x3. We then callis_sublistagain. This is a great example of code reuse. - Unequal and Cleanup: If all checks fail, we load the
UNEQUALconstant. The.donelabel is our single exit point. Here, we restore the saved registers from the stack withldp(Load Pair) and executeretto return to the caller.
The is_sublist Helper Function
This is where the most complex logic resides. It implements the sliding window search.
- Edge Cases: The function first handles two simple cases.
cbz x1, .is_sublist_truechecks if the sublist length is zero. If so, it's always a sublist, and we return true immediately.cmp x1, x3followed byb.gt .is_sublist_falsechecks if the sublist is longer than the superlist, which is impossible. - Outer Loop Setup: We save registers and then calculate the number of iterations for the outer loop:
super_len - sub_len + 1. This value is stored inx4. The current search position in the superlist is maintained inx5. - Outer Loop (
.outer_loop): This loop iterates through all possible starting positions of the sublist within the superlist. - Inner Loop (
.inner_loop): For each starting position, this loop compares the elements of the sublist against the corresponding slice of the superlist. It loads a 32-bit word (w9,w10) from each list usingldrwith post-increment addressing ([reg], #4). If a mismatch is found (b.ne .inner_loop_fail), this attempt has failed, and we break out of the inner loop. - Match Failure (
.inner_loop_fail): On failure, we advance the superlist pointer (add x5, x5, #4), decrement the outer loop counter, and jump back to the start of the.outer_loopto try the next position. - Match Success: If the inner loop completes without a mismatch (the counter
x8reaches zero), it means we've found a complete match. We branch to the cleanup routine, set the return value to 1, restore registers, and return.
Alternative Approaches and Performance Considerations
While our scalar, element-by-element comparison is clear and correct, it's not the only way to solve this problem. For performance-critical applications, other techniques could be employed.
Using SIMD (NEON) Instructions
Modern ARM processors include a powerful SIMD (Single Instruction, Multiple Data) engine called NEON. NEON allows you to perform operations on multiple data elements simultaneously. Instead of comparing one 32-bit integer at a time, you could load 128 bits (four 32-bit integers) into NEON registers and compare them all with a single instruction (e.g., CMEQ). This can provide a significant speedup, especially for large lists, by reducing the number of loop iterations and instructions executed.
However, this adds complexity. You need to handle list lengths that aren't multiples of the vector size and manage the more complex NEON register set.
Pros and Cons of the Assembly Approach
| Aspect | Pros (Arm64 Assembly) | Cons (Arm64 Assembly) |
|---|---|---|
| Performance | Direct control over hardware leads to potentially the fastest possible execution. No overhead from interpreters or high-level abstractions. | A modern compiler is often very good at optimization. A poorly written assembly routine can be slower than optimized C code. |
| Development Time | Allows for precise, low-level implementation. | Significantly longer and more complex to write, debug, and maintain. Prone to subtle bugs like off-by-one errors. |
| Memory Control | Explicit and total control over every byte of memory accessed. | No safety nets. Buffer overflows, incorrect pointer arithmetic, and memory corruption are easy mistakes to make. |
| Portability | Optimized for a specific architecture (AArch64). | Completely non-portable. The code must be rewritten for other architectures like x86-64. |
Frequently Asked Questions (FAQ)
What is the AArch64 Procedure Call Standard (AAPCS64)?
The AAPCS64 is a set of rules that governs how functions call each other on the Arm64 architecture. It defines which registers are used to pass arguments (x0-x7), which register holds the return value (x0), and which registers a function must preserve before returning (callee-saved registers like x19-x30) versus which it can freely modify (caller-saved registers). Adhering to this standard is essential for interoperability between code compiled from different languages (like C and assembly).
Why use CMP instead of SUBS for most comparisons?
Both CMP reg1, reg2 and SUBS dest, reg1, reg2 perform a subtraction (reg1 - reg2) and update the condition flags (N, Z, C, V). The key difference is that CMP discards the result of the subtraction, only setting the flags. SUBS, on the other hand, stores the result in the destination register. When you only need to know the relationship between two values (equal, greater, less), CMP is more efficient as it doesn't need to write to a destination register. We use SUBS specifically when we need both the comparison and the result, such as in a decrement-and-branch loop counter.
How does this code handle empty lists?
The code handles empty lists correctly due to explicit checks. In is_sublist, the first instruction is cbz x1, .is_sublist_true, which checks if the sublist length (in x1) is zero. If it is, the function immediately returns true, correctly identifying an empty list as a sublist of any other list. The are_equal function also handles two empty lists correctly by branching to .are_equal_true if the length register is zero.
What does the .global sublist directive do?
The .global directive (or .globl) makes a symbol visible to the linker. In our case, .global sublist tells the assembler that the sublist label should be accessible from other object files. Without this, the linker wouldn't be able to find the sublist function when linking our test.c file with sublist.o, resulting in an "undefined reference" error.
How is memory for the lists managed in this example?
Our assembly function is agnostic about memory management. It simply accepts pointers (memory addresses) and lengths as arguments. It assumes the caller is responsible for allocating and deallocating the memory for the lists. In our C test harness example, the lists are created as arrays on the stack, and their memory is automatically managed by the C runtime.
What are the key registers used in the is_sublist function?
The function uses several registers for specific purposes: x19-x22 are used to preserve the original arguments. x4 serves as the outer loop counter. x5 is the "sliding window" pointer in the superlist. x6 and x7 are pointers for the inner loop comparison, and x8 is the inner loop counter. w9 and w10 are temporary registers for holding the 32-bit values loaded from memory for comparison.
Could this logic be adapted for strings instead of integer lists?
Absolutely. The core logic is identical. The main change would be how elements are loaded and compared. For null-terminated C-style strings, you would load bytes (ldrb) instead of words (ldr) and your loop conditions would check for the null terminator (\0) instead of relying on a pre-calculated length. The pointer arithmetic would also change from increments of 4 to increments of 1.
Conclusion: From High-Level Concepts to Machine-Level Mastery
We have successfully navigated the journey from a high-level algorithmic concept—the sublist problem—to a complete, working implementation in Arm64 assembly. By breaking the problem into smaller, manageable parts (are_equal, is_sublist) and carefully managing registers and memory, we've built a solution that is both robust and efficient. This exercise from the kodikra.com curriculum is more than just a coding challenge; it's a deep dive into the fundamentals of computing.
You've learned how to translate control flow like nested loops into conditional branches, how to perform pointer arithmetic to traverse memory, and how to adhere to a standard calling convention to create modular, reusable code. This low-level knowledge is a superpower. It demystifies what compilers do and provides you with the tools to write exceptionally performant code when it matters most.
Disclaimer: The code and explanations in this article are based on the AArch64 instruction set architecture and the AAPCS64 calling convention. The assembly syntax is compatible with the GNU Assembler (as) and Clang. Behavior may vary on different architectures or with different toolchains.
Ready to tackle the next challenge and further hone your low-level programming skills? Continue your journey in our Arm64 Assembly Learning Path and master the art of speaking the machine's native language. Or, for a broader view, explore more advanced Arm64 assembly concepts in our comprehensive guides.
Published by Kodikra — Your trusted Arm64-assembly learning resource.
Post a Comment