House in Arm64-assembly: Complete Solution & Deep Dive Guide

a sign that says house a on it

Mastering Recursion in Arm64 Assembly: The Ultimate Guide to Building Complex Strings

Discover how to implement recursion in Arm64 assembly to solve complex, cumulative string generation tasks. This guide breaks down stack management, system calls, and recursive logic to build the "House that Jack Built" nursery rhyme, turning a classic puzzle into a practical lesson in low-level programming.


The Frustration of Strings, The Power of Recursion

If you've ever written new_string = string1 + string2 in Python or JavaScript, you've experienced high-level programming bliss. The machine just... handles it. But when you descend into the world of assembly language, that simple plus sign explodes into a complex dance of memory addresses, pointers, and byte counters. It can feel like you've gone from driving a car to building the engine from scratch.

This is a common pain point for developers learning low-level programming. The perceived complexity of fundamental tasks like string manipulation can be a major barrier. But what if you could harness one of computer science's most elegant concepts—recursion—to tame this complexity? What if you could build intricate, nested structures of text not with clunky loops, but with a function that gracefully calls itself?

In this comprehensive guide, we'll do exactly that. We will tackle the "House that Jack Built" challenge from the exclusive kodikra.com curriculum. By the end, you won't just have a program that recites a nursery rhyme; you'll have a deep, practical understanding of recursion, stack frames, and system I/O in the AArch64 architecture, transforming a daunting task into an intuitive one.


What Is the "House that Jack Built" Challenge?

The core task is to programmatically generate the full text of the classic nursery rhyme, "This Is the House That Jack Built." The rhyme is cumulative; each new verse repeats the entire previous verse and adds a new line. This structure is the key to the challenge.

For example:

  • Verse 1: This is the house that Jack built.
  • Verse 2: This is the malt that lay in the house that Jack built.
  • Verse 3: This is the rat that ate the malt that lay in the house that Jack built.

This "embedding" of phrases within phrases is a perfect real-world model for a recursive data structure. From a programming perspective, this presents several low-level challenges in Arm64 assembly:

  • String Management: How do you store and access the individual phrases ("the house that Jack built", "the malt", "the rat", etc.) efficiently?
  • Cumulative Logic: How do you build each new verse by appending to the previous one without complex, nested loops and pointer arithmetic?
  • -State Management: How does a function remember which phrase to print next as it unwinds a chain of calls?
  • System Output: How do you actually print the final strings to the console using raw Linux system calls (syscalls)?

Solving this requires a solid grasp of the .data section for storing strings, the .text section for logic, and, most importantly, the stack for managing the recursive calls.


Why Use Recursion for This Problem in Arm64 Assembly?

In computer science, recursion is a method where the solution to a problem depends on solutions to smaller instances of the same problem. A recursive function is one that calls itself, either directly or indirectly. The "House that Jack Built" rhyme is inherently recursive.

Consider Verse 3: recite("rat") is essentially print("the rat that ate") followed by the result of recite("malt"). And recite("malt") is print("the malt that lay in") followed by the result of recite("house"). This chain continues until it hits a "base case"—the final phrase, "the house that Jack built."

This maps perfectly to a recursive function's structure:

  1. Recursive Step: If not the base case, print the current phrase and then call the function again for the *next* phrase in the sequence.
  2. Base Case: If it's the last phrase, simply print it and return. The chain of calls then unwinds.

The AArch64 architecture, like most modern processors, provides native support for this pattern through the stack. When a function is called (using the BL or Branch with Link instruction), the processor automatically stores the return address in the Link Register (LR or x30). In a recursive function, we must manually save this LR value onto the stack before making the next recursive call. Otherwise, the nested call would overwrite it, and the function would never be able to return correctly.

This mechanism of pushing and popping values from the stack allows each instance of the recursive function to have its own isolated state, making it a clean and elegant solution compared to an iterative approach that would require manually managing an array of indices or pointers.

● Start Recite(Verse N)
│
▼
┌─────────────────────────┐
│ Is Verse N the Base Case? │
└────────────┬────────────┘
             │
        Yes  ▼   No
   ┌─────────┘  └──────────┐
   │                       │
   ▼                       ▼
┌────────────────┐  ┌──────────────────────────┐
│ Print Base     │  │ Print Current Phrase     │
│ Phrase & Return│  └────────────┬─────────────┘
└────────────────┘               │
                                 ▼
                          ┌────────────────┐
                          │ Save LR to Stack │
                          └────────┬───────┘
                                   │
                                   ▼
                          ┌────────────────────┐
                          │ Recite(Verse N-1)  │
                          └────────┬───────────┘
                                   │
                                   ▼
                          ┌──────────────────┐
                          │ Restore LR from  │
                          │ Stack & Return   │
                          └──────────────────┘

How to Implement the "House" Rhyme in Arm64 Assembly

Let's break down the implementation into three core components: defining the data, structuring the recursive logic, and handling the system calls for output. This approach provides a clear separation of concerns, making the code easier to write and debug.

Part 1: The Data Segment (.data)

First, we need to store all the unique phrases of the rhyme. A highly effective way to do this is to create a table of pointers and a corresponding table of lengths. This avoids the need for null-terminated strings and the `strlen` logic that would accompany them, leading to more predictable code.

We'll define two main structures in our .data section:

  1. phrases: An array of 8-byte addresses, where each address points to the beginning of a string literal.
  2. phrase_lengths: A parallel array of 8-byte values, where each value is the length of the corresponding string in phrases.

This parallel array structure makes it simple to fetch both the string's starting address and its length using a single index.


.data
.balign 8

// Table of pointers to the phrases
phrases:
    .quad p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11

// Table of lengths for each phrase
phrase_lengths:
    .quad l0, l1, l2, l3, l4, l5, l6, l7, l8, l9, l10, l11

// The actual string data
p0: .string "the house that Jack built."
p1: .string "the malt that lay in "
p2: .string "the rat that ate "
p3: .string "the cat that killed "
p4: .string "the dog that worried "
p5: .string "the cow with the crumpled horn that tossed "
p6: .string "the maiden all forlorn that milked "
p7: .string "the man all tattered and torn that kissed "
p8: .string "the priest all shaven and shorn that married "
p9: .string "the rooster that crowed in the morn that woke "
p10: .string "the farmer sowing his corn that kept "
p11: .string "the horse and the hound and the horn that belonged to "

// Calculate lengths at assembly time
l0: .quad . - p0
l1: .quad . - p1
// ... and so on for all phrases
l11: .quad . - p11

// The prefix for each verse
prefix: .string "This is "
prefix_len: .quad . - prefix

// Newline character for formatting
newline: .string "\n"
newline_len: .quad . - newline

In this setup, . - p0 is a powerful assembler directive that calculates the distance (in bytes) from the current position (`.`) to the label `p0`, effectively giving us the string length at assembly time.

Part 2: The Recursive Logic (recite_verse)

This function is the heart of our program. It will take one argument: an index representing which phrase to recite. It must follow the standard recursive pattern:

  1. Prologue: Save the link register (x30) and any other registers we modify onto the stack. This preserves the caller's state.
  2. Base Case Check: Check if the index is 0. If it is, we're at "the house that Jack built." We print this phrase and return immediately.
  3. Recursive Step: If the index is not 0, we print the current phrase. Then, we decrement the index and call recite_verse on ourselves with the new index.
  4. Epilogue: After the recursive call returns, we restore the registers we saved from the stack and then return to our caller using the ret instruction.

Here is the pseudocode for our recursive function:


function recite_verse(index):
    // Prologue: Save state
    push LR to stack

    // Print the phrase for the current index
    print(phrases[index])

    // Base Case Check
    if index == 0:
        // Epilogue: Restore state and return
        pop LR from stack
        return

    // Recursive Step
    recite_verse(index - 1)

    // Epilogue: Restore state and return
    pop LR from stack
    return

This logic ensures that phrases are printed in descending order of index (e.g., for verse 3, it prints phrase 3, then calls itself for 2, prints 2, calls for 1, prints 1, calls for 0, prints 0, and then the calls all return).

Part 3: Managing the Stack and System Calls

The stack is critical. The Arm64 Application Binary Interface (ABI) dictates that the stack pointer (SP) must be 16-byte aligned. When we save registers, we must adjust the SP accordingly.

The stp x29, x30, [sp, #-16]! instruction is perfect for this. It "stores pair" of registers (x29, the frame pointer, and x30, the link register) to the stack, pre-decrementing the stack pointer by 16 bytes. The corresponding ldp x29, x30, [sp], #16 instruction restores them and readjusts the stack pointer.

For printing, we use the Linux write syscall. The process is:

  1. Set x8 to 64 (the syscall number for write).
  2. Set x0 to 1 (the file descriptor for stdout).
  3. Set x1 to the address of the string to print.
  4. Set x2 to the length of the string.
  5. Execute the svc #0 instruction to trigger the kernel.

We'll wrap this syscall logic in a helper function, say print_string, to keep our main recursive logic clean.


The Complete Solution and Code Walkthrough

Now, let's assemble all the pieces into a complete, working Arm64 assembly program. This solution is designed for clarity and follows the principles discussed above.

Full Arm64 Assembly Code


// Kodikra.com Arm64 Assembly Module: House
// Solution demonstrating recursion and syscalls.

.global _start

.data
.balign 8

// Table of pointers to the phrases
phrases:
    .quad p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11

// Table of lengths for each phrase
phrase_lengths:
    .quad l0, l1, l2, l3, l4, l5, l6, l7, l8, l9, l10, l11

// The actual string data
p0: .string "the house that Jack built."
p1: .string "the malt that lay in "
p2: .string "the rat that ate "
p3: .string "the cat that killed "
p4: .string "the dog that worried "
p5: .string "the cow with the crumpled horn that tossed "
p6: .string "the maiden all forlorn that milked "
p7: .string "the man all tattered and torn that kissed "
p8: .string "the priest all shaven and shorn that married "
p9: .string "the rooster that crowed in the morn that woke "
p10: .string "the farmer sowing his corn that kept "
p11: .string "the horse and the hound and the horn that belonged to "

// Calculate lengths at assembly time
l0: .quad . - p0
l1: .quad . - p1
l2: .quad . - p2
l3: .quad . - p3
l4: .quad . - p4
l5: .quad . - p5
l6: .quad . - p6
l7: .quad . - p7
l8: .quad . - p8
l9: .quad . - p9
l10: .quad . - p10
l11: .quad . - p11

// Static strings
prefix: .string "This is "
prefix_len: .quad . - prefix

newline: .string "\n"
newline_len: .quad . - newline

.text
.balign 4

// --- Helper function to print a string ---
// x0: address of string
// x1: length of string
print_string:
    mov x2, x1      // Move length to x2 for syscall
    mov x1, x0      // Move address to x1 for syscall
    mov x0, #1      // 1 = stdout
    mov x8, #64     // 64 = write syscall
    svc #0          // Call the kernel
    ret             // Return to caller

// --- Recursive function to recite a verse ---
// x0: index of the phrase to start from
recite_verse:
    // --- Function Prologue ---
    // Save frame pointer (x29) and link register (x30) to the stack
    // Pre-index addressing: SP is decremented *before* the store
    stp x29, x30, [sp, #-16]!
    mov x29, sp

    // Save the input index (x0) because we need it later
    // and we are about to use x0 for the print_string call.
    mov x19, x0     // Use a callee-saved register x19 to store the index

    // --- Function Body ---
    // Calculate address of the phrase pointer
    // index * 8 (since each pointer is 8 bytes)
    lsl x1, x19, #3 
    ldr x2, =phrases // Load base address of the phrases table
    add x1, x1, x2   // Add offset to base address
    ldr x0, [x1]     // Dereference to get the actual string address

    // Calculate address of the phrase length
    lsl x1, x19, #3
    ldr x2, =phrase_lengths // Load base address of the lengths table
    add x1, x1, x2
    ldr x1, [x1]     // Dereference to get the length

    // Print the current phrase
    bl print_string

    // --- Base Case Check ---
    // If our index (in x19) is 0, we're done with this path.
    cmp x19, #0
    b.eq .L_recite_verse_end // If equal, jump to the end

    // --- Recursive Step ---
    // It's not the base case, so call ourselves with index - 1.
    sub x0, x19, #1 // x0 = current_index - 1
    bl recite_verse

.L_recite_verse_end:
    // --- Function Epilogue ---
    // Restore the registers we saved
    // Post-index addressing: SP is incremented *after* the load
    ldp x29, x30, [sp], #16
    ret             // Return to the address in x30

// --- Main program entry point ---
_start:
    mov x20, #0     // x20 will be our loop counter for verses (0 to 11)

.L_verse_loop:
    // Check if we have recited all 12 verses
    cmp x20, #12
    b.ge .L_exit    // If counter >= 12, exit

    // Print the "This is " prefix for each verse
    ldr x0, =prefix
    ldr x1, =prefix_len
    ldr x1, [x1]
    bl print_string

    // Call the recursive function with the current verse index
    mov x0, x20
    bl recite_verse

    // Print a newline character for clean formatting
    ldr x0, =newline
    ldr x1, =newline_len
    ldr x1, [x1]
    bl print_string
    
    // If it's not the last verse, print an extra newline
    cmp x20, #11
    b.eq .L_next_verse
    bl print_string

.L_next_verse:
    // Increment verse counter and loop again
    add x20, x20, #1
    b .L_verse_loop

.L_exit:
    // Exit the program cleanly
    mov x0, #0      // Exit code 0 (success)
    mov x8, #93     // 93 = exit syscall
    svc #0

Detailed Code Walkthrough

1. The Main Loop (_start)

Execution begins at the _start label. The logic here is straightforward: a loop that iterates from 0 to 11, controlling which verse is generated.

  • mov x20, #0: We use x20 as our loop counter. It's a callee-saved register, which is good practice for loop variables that persist across function calls.
  • .L_verse_loop: This is the main loop. It first prints the static prefix "This is ".
  • mov x0, x20: It then moves the current loop index into x0, the designated register for the first argument to a function.
  • bl recite_verse: It calls our recursive function to generate the rest of the verse.
  • bl print_string: After the verse is complete, it prints one or two newline characters to separate the verses neatly.
  • add x20, x20, #1: Increments the counter and branches back to the start of the loop.

2. The Recursive Engine (recite_verse)

This is where the magic happens. Let's trace a call for index 2 ("the rat").

  1. Entry (index=2): The function saves x29 and x30 to the stack. It saves its input index, 2, into x19. It then calculates the address and length for "the rat that ate " and prints it.
  2. Check & Recurse: It checks if x19 is 0. It's not, so it subtracts 1 (making x0 = 1) and calls bl recite_verse again.
  3. Entry (index=1): The function again saves x29 and x30 to the stack (at a new, lower address). It saves its input index, 1, into x19. It then prints "the malt that lay in ".
  4. Check & Recurse: It checks if x19 is 0. It's not, so it subtracts 1 (making x0 = 0) and calls bl recite_verse.
  5. Entry (index=0): The function saves registers, saves its index 0 to x19, and prints "the house that Jack built.".
  6. Base Case Hit: It checks if x19 is 0. It is! So it jumps to .L_recite_verse_end.
  7. Unwinding (index=0 returns): It restores its saved x29 and x30 from the stack and executes ret. Control returns to the call inside the "index=1" instance.
  8. Unwinding (index=1 returns): That function has now completed its recursive call. It proceeds to its epilogue, restores *its* saved registers from the stack, and returns. Control passes to the call inside the "index=2" instance.
  9. Unwinding (index=2 returns): Finally, the original call restores its registers and returns to the main loop in _start. The verse is complete.

ASCII Art: String Construction Flow

This diagram shows the logical flow for generating verse 3. The program follows the path down, printing as it goes, until the base case is hit, at which point the execution path unwinds.

● Start Verse(3)
│
▼
┌──────────────────┐
│ Print "This is " │
└─────────┬────────┘
          │
          ▼
    recite_verse(3)
          │
          ├─ Print "the cat that killed "
          │
          ▼
    recite_verse(2)
          │
          ├─ Print "the rat that ate "
          │
          ▼
    recite_verse(1)
          │
          ├─ Print "the malt that lay in "
          │
          ▼
    recite_verse(0)
          │
          ├─ Print "the house that Jack built."
          │
          ▼
      (Base Case Hit)
          │
          ⟶ Return
          │
      ⟶ Return
      │
  ⟶ Return
  │
⟶ Return to Main Loop

3. Assembling and Running the Code

To compile and run this code on a Linux system with Arm64 build tools, use the following commands:


# Assemble the .s file into an object file .o
as -o house.o house.s

# Link the object file into an executable
ld -o house house.o

# Run the executable
./house

The output will be the complete, formatted nursery rhyme printed to your terminal.


Alternative Approaches: Iteration vs. Recursion

While recursion is an elegant fit for this problem, it's not the only solution. An iterative approach using nested loops is also possible, and it's important to understand the trade-offs.

An iterative solution would likely involve:

  • An outer loop that iterates from verse 0 to 11.
  • An inner loop that iterates backward from the current verse index down to 0.
  • Inside the inner loop, you would fetch and print the phrase corresponding to the inner loop's counter.

This avoids the overhead of function calls (saving/restoring registers, manipulating the stack pointer), which can make the iterative version slightly faster and use less memory. However, the code can become less intuitive, as you are manually managing the state that recursion handles automatically via the call stack.

Pros & Cons of Recursion in Assembly

Pros Cons / Risks
  • Code Clarity: The logic often mirrors the problem description directly, making it easier to read and understand for problems that are naturally recursive.
  • State Management: The call stack automatically handles saving and restoring state for each level of recursion, simplifying the logic.
  • Elegance: For problems like tree traversal or cumulative generation, recursion provides a very clean and concise solution.
  • Stack Overflow: Each recursive call consumes stack space. For very deep recursion, you can exhaust the available stack memory, causing a crash.
  • Performance Overhead: Function calls are not free. The prologue and epilogue (saving/restoring registers, adjusting SP) add overhead compared to a simple loop jump.
  • Debugging Complexity: Tracing execution through many levels of a recursive function can be more difficult than stepping through a simple loop.

For the "House that Jack Built" problem, with only 12 levels of recursion, the risk of stack overflow is nonexistent, and the performance overhead is negligible. Therefore, the clarity and elegance of the recursive solution make it an excellent choice.


Frequently Asked Questions (FAQ)

What is the role of the LR (Link Register) in Arm64 recursion?

The Link Register (LR, or x30) holds the memory address of the instruction that the processor should return to after a function call completes. When a function calls itself (recursion), the BL instruction overwrites the LR with the new return address. If you don't save the original LR on the stack before the recursive call, the function will lose its way back to its original caller, leading to an infinite loop or crash.

Why do we need to save registers like x19 or x29 on the stack?

The Arm64 ABI divides registers into two types: caller-saved (x0-x18) and callee-saved (x19-x29). If a function (the "callee") wants to use a callee-saved register, it is responsible for saving its original value to the stack at the beginning and restoring it before returning. This ensures that the calling function doesn't have its important data unexpectedly overwritten. We use x19 to store the verse index because it's callee-saved, guaranteeing its value survives the print_string function call.

What is a stack overflow and how could it happen here?

A stack overflow is an error that occurs when a program tries to use more space on the call stack than is available. Each time recite_verse calls itself, it allocates 16 bytes on the stack for the saved x29 and x30. If the rhyme had millions of verses instead of 12, the total stack space required (millions * 16 bytes) would exceed the default stack size allocated by the OS, causing a segmentation fault.

Can I use printf in Arm64 assembly?

Yes, but not directly. printf is a function from the C standard library (libc). To use it, you would need to link your assembly program with libc and then call the printf function according to the ABI (passing arguments in x0, x1, etc.). The solution in this guide uses raw Linux system calls (svc #0), which is more fundamental as it communicates directly with the OS kernel and has no external dependencies.

How does the svc #0 instruction work?

SVC stands for "Supervisor Call." It's an instruction that generates a software interrupt, intentionally pausing the user program and transferring control to the operating system's kernel. The kernel then inspects the registers (especially x8 for the syscall number) to determine what task the user program wants it to perform (like writing to the screen or exiting). Once the kernel completes the task, it resumes the user program.

Is Arm64 assembly case-sensitive?

Instructions (e.g., MOV, LDR) and register names (e.g., X0, SP) are generally case-insensitive in most assemblers like GNU as. However, labels (e.g., _start, .L_verse_loop) are case-sensitive. It's best practice to maintain a consistent case for readability.

What's the difference between the .string and .asciz directives?

Both directives are used to define strings in the data segment. The key difference is that .asciz automatically appends a null terminator (a zero byte, \0) to the end of the string. The .string directive does not. In our solution, we manage string lengths explicitly, so a null terminator is unnecessary, making .string a suitable choice.


Conclusion: From Rhymes to Real-World Mastery

We've taken a simple nursery rhyme and used it as a lens to explore some of the most powerful and fundamental concepts in Arm64 assembly. By implementing a recursive solution, you've moved beyond simple linear execution and learned how to manage the call stack, preserve state across function calls, and interact directly with the operating system kernel for I/O.

While the problem may seem academic, the skills are profoundly practical. The same recursive patterns and stack management techniques are used to parse complex file formats, traverse file systems, implement compilers, and work with sophisticated data structures. You have successfully built an engine from scratch, and in doing so, gained a much deeper appreciation for what happens under the hood of high-level languages.

Disclaimer: The code and explanations in this article are based on the AArch64 architecture and Linux system calls as of the current standards. Assembly language syntax and syscall numbers can vary between operating systems and architectures.

Ready to continue your journey into low-level programming? Explore our complete Arm64-assembly 5 learning path to tackle more advanced challenges, or dive deeper into our Arm64-assembly modules for a comprehensive overview of the language.


Published by Kodikra — Your trusted Arm64-assembly learning resource.