Proverb in Arm64-assembly: Complete Solution & Deep Dive Guide


From Zero to Hero: Building the Proverb Generator in Arm64 Assembly

Master the art of string manipulation and control flow in Arm64-assembly by building a classic proverb generator. This guide provides a deep dive into iterating through data, formatting dynamic output, and managing memory at the hardware level, transforming a simple rhyme into a profound learning experience.

The Haunting Challenge of Low-Level String Handling

You've mastered loops and logic in Python or JavaScript, where concatenating strings is as simple as using the + operator. But have you ever wondered what's happening under the hood? When you descend into the world of assembly language, the safety nets are gone. There are no built-in string types, no automatic memory management, and no convenient print functions.

You're faced with raw memory addresses, byte-by-byte manipulation, and the need to tell the CPU exactly how to perform every single operation. The task of generating a simple, repetitive proverb like "For want of a nail the shoe was lost" suddenly becomes a complex dance of pointers, registers, and system calls. This is the ultimate challenge, but also the ultimate reward. By conquering it, you will gain an unparalleled understanding of how software truly interacts with hardware.

This comprehensive guide will walk you through every step of solving the Proverb challenge from the exclusive kodikra learning path. We will dissect the logic, write clean and efficient Arm64-assembly code, and demystify the core concepts that power all modern computing.


What is the Proverb Generation Problem?

The core task is to dynamically generate a well-known proverbial rhyme based on a given list of words. The rhyme follows a specific "domino effect" pattern where the loss of one item leads to the loss of the next.

Given an input list of strings, for example: ["nail", "shoe", "horse", "rider"], the program must produce the following output:

For want of a nail the shoe was lost.
For want of a shoe the horse was lost.
For want of a horse the rider was lost.
And all for the want of a nail.

The logic can be broken down into two parts:

  1. The Chain: For every adjacent pair of words (word[i], word[i+1]) in the list, a line is generated in the format: "For want of a word[i] the word[i+1] was lost."
  2. The Conclusion: The final line always refers back to the very first item in the original list, formatted as: "And all for the want of a word[0]."

While trivial in a high-level language, this requires careful planning in assembly. We need to manage pointers to each string, iterate correctly, and construct each output line piece by piece before sending it to the operating system to be displayed.
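For comparison, the two rules can be written as a short Python sketch. It is not part of the kodikra solution. It is only a reference model whose output the assembly program should reproduce byte for byte. Returning an empty string for an empty list is our own assumption, since the rules above do not cover that case.

```python
def proverb(words: list[str]) -> str:
    """Reference model: one chain line per adjacent pair, then the conclusion."""
    if not words:  # assumption: no words means no output at all
        return ""
    chain = [
        f"For want of a {a} the {b} was lost."
        for a, b in zip(words, words[1:])  # pairs (word[i], word[i+1])
    ]
    conclusion = f"And all for the want of a {words[0]}."
    return "\n".join(chain + [conclusion]) + "\n"


print(proverb(["nail", "shoe", "horse", "rider"]), end="")
```

A single-word list produces only the concluding line, because `zip` yields no pairs.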


Why Use Arm64-assembly for This Task?

Choosing Arm64-assembly (also known as AArch64) for a string-based problem might seem like using a sledgehammer to crack a nut. However, the purpose here is not efficiency of development, but depth of understanding. The Arm architecture powers the vast majority of mobile devices, and is increasingly found in servers and laptops, making it a crucial instruction set to learn.

Solving this problem in assembly forces you to confront fundamental concepts head-on:

  • Memory Management: You will directly manipulate memory addresses and understand how data is laid out. There's no garbage collector or abstract string object to help you.
  • CPU Registers: You'll learn the purpose of general-purpose registers like x0-x30, the stack pointer (sp), and the link register (lr) for function calls.
  • System Calls (Syscalls): You'll interface directly with the Linux kernel to perform I/O operations (like writing to the console), bypassing standard libraries like libc. This is the most fundamental way a program communicates with its host operating system.
  • Algorithmic Thinking: You must translate a simple high-level loop into a sequence of low-level compare and branch instructions, gaining a true appreciation for how compilers work.

This module from the kodikra.com Arm64-assembly curriculum is designed to solidify these core skills, providing a practical foundation for more advanced topics like systems programming, embedded development, and performance optimization.


How to Implement the Proverb Generator in Arm64-assembly

Our strategy involves setting up our data, creating a main loop to generate the "chain" verses, and then handling the final concluding line. We will also need a helper function to calculate string lengths, as this is a prerequisite for the write system call.

The Complete Solution Code

Here is the full, commented source code. We will break it down in detail in the following sections. Save this file as proverb.s.

/*
 * Proverb Generator in Arm64 Assembly for Linux
 * This program generates the "For want of a nail..." proverb.
 *
 * Assembling and Linking:
 * as -o proverb.o proverb.s
 * ld -o proverb proverb.o
 *
 * Running:
 * ./proverb
 */

.section .rodata
    // The list of words for the proverb
    word1: .asciz "nail"
    word2: .asciz "shoe"
    word3: .asciz "horse"
    word4: .asciz "rider"
    word5: .asciz "message"
    word6: .asciz "battle"
    word7: .asciz "kingdom"

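    // An array of pointers to the words. The strings above total 45 bytes,
    // so pad to an 8-byte boundary first: the entries are 64-bit values, and
    // the PC-relative `ldr x20, words_count` in _start can only reach
    // 4-byte-aligned addresses.
    .balign 8
    words_list: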
        .quad word1
        .quad word2
        .quad word3
        .quad word4
        .quad word5
        .quad word6
        .quad word7
    
    // Store the count of words
    words_count: .quad 7

    // String templates for building the output
    prefix:     .asciz "For want of a "
    infix:      .asciz " the "
    suffix:     .asciz " was lost.\n"
    final_prefix: .asciz "And all for the want of a "
    final_suffix: .asciz ".\n"

.section .text
.global _start

// Helper function to print a string to stdout
// Input: x0 = address of string, x1 = length of string
// Clobbers: x0, x1, x2, x8
_print:
    mov x2, x1      // x2 <- length
    mov x1, x0      // x1 <- address
    mov x0, #1      // x0 <- file descriptor (1 for stdout)
    mov x8, #64     // x8 <- syscall number for write
    svc #0          // Make the system call
    ret             // Return to caller

// Helper function to calculate the length of a null-terminated string
// Input: x0 = address of string
// Output: x0 = length of string
// Clobbers: x0, x1, x2
_strlen:
    mov x1, x0      // x1 <- start address
_strlen_loop:
    ldrb w2, [x1], #1 // Load byte and post-increment pointer
    cmp w2, #0      // Compare with null terminator
    b.eq _strlen_end // If null, we're done
    b _strlen_loop
_strlen_end:
    sub x0, x1, x0  // Calculate length (end - start)
    sub x0, x0, #1  // Adjust for the null byte
    ret

_start:
    // Setup registers
    ldr x19, =words_list // x19 <- base address of the words list
    ldr x20, words_count // x20 <- number of words
    mov x21, #0          // x21 <- loop counter (i)
    cbz x20, exit_program // Empty list: print nothing and exit

proverb_loop:
    // Loop condition: while (i < words_count - 1)
    sub x22, x20, #1
    cmp x21, x22
    b.ge generate_final_line // If i >= count-1, exit loop

    // --- Print "For want of a " ---
    ldr x0, =prefix
    bl _strlen
    mov x1, x0
    ldr x0, =prefix
    bl _print

    // --- Print word[i] ---
    ldr x0, [x19, x21, lsl #3] // Load address of current word
    bl _strlen
    mov x1, x0
    ldr x0, [x19, x21, lsl #3]
    bl _print

    // --- Print " the " ---
    ldr x0, =infix
    bl _strlen
    mov x1, x0
    ldr x0, =infix
    bl _print

    // --- Print word[i+1] ---
    add x23, x21, #1 // i + 1
    ldr x0, [x19, x23, lsl #3] // Load address of next word
    bl _strlen
    mov x1, x0
    ldr x0, [x19, x23, lsl #3]
    bl _print

    // --- Print " was lost.\n" ---
    ldr x0, =suffix
    bl _strlen
    mov x1, x0
    ldr x0, =suffix
    bl _print

    // Increment loop counter and continue
    add x21, x21, #1
    b proverb_loop

generate_final_line:
    // --- Print "And all for the want of a " ---
    ldr x0, =final_prefix
    bl _strlen
    mov x1, x0
    ldr x0, =final_prefix
    bl _print

    // --- Print word[0] (the first word) ---
    ldr x0, [x19] // Load address of the first word
    bl _strlen
    mov x1, x0
    ldr x0, [x19]
    bl _print

    // --- Print ".\n" ---
    ldr x0, =final_suffix
    bl _strlen
    mov x1, x0
    ldr x0, =final_suffix
    bl _print

exit_program:
    mov x0, #0      // Exit code 0 (success)
    mov x8, #93     // Syscall number for exit
    svc #0          // Make the system call

Assembly, Linking, and Execution

To run this code on an AArch64 (64-bit Arm) Linux system with the GNU toolchain, open your terminal and execute the following commands:

# 1. Assemble the .s file into an object file .o
as -o proverb.o proverb.s

# 2. Link the object file into an executable
ld -o proverb proverb.o

# 3. Run the executable
./proverb

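You should see the proverb printed to your console. It consists of six "For want of a ..." lines, one for each adjacent pair among the seven words, ending with "For want of a battle the kingdom was lost.". These are followed by "And all for the want of a nail." Assembling and linking is what converts your human-readable source into machine code the CPU can execute directly.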


Code Walkthrough: A Deep Dive into the Logic

Let's dissect the code section by section to understand how it works. Our program is structured into a read-only data section (.rodata) and a code section (.text).

1. The .rodata Section: Defining Our Data

This section holds all our constant data—the strings that won't change during program execution.

.section .rodata
    word1: .asciz "nail"
    // ... more words ...
    word7: .asciz "kingdom"

    words_list:
        .quad word1
        // ... more pointers ...
        .quad word7
    
    words_count: .quad 7

    prefix:     .asciz "For want of a "
    // ... other template strings ...

  • .asciz: This directive declares a null-terminated string. The assembler automatically adds a \0 byte at the end, which is crucial for our _strlen function.
  • words_list: This is an array of pointers. Each .quad directive emits one 8-byte (64-bit) value, the size of a memory address on AArch64. GAS calls it a "quad", while Arm's own manuals call a 64-bit quantity a doubleword. We store the address (label) of each word string here. Because the strings before it have arbitrary lengths, the table should start on an 8-byte boundary, for example via .balign 8.
  • words_count: Storing the count separately makes the loop logic cleaner and more adaptable if we decide to change the number of words.
  • Template Strings: We define all the static parts of the proverb (like " the " and " was lost.\n") as separate strings, so every verse becomes the same fixed sequence of print calls.
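To see why alignment matters, the Python sketch below models how these directives lay out bytes. It makes three simplifying assumptions:

  • The section itself starts on an 8-byte boundary.
  • The base address is made up; the linker picks the real one.
  • `build_rodata` is a purely illustrative helper.

The seven strings take 45 bytes including their NUL terminators. Without padding, the pointer table would therefore start at an offset that is not a multiple of 8.

```python
import struct

WORDS = ["nail", "shoe", "horse", "rider", "message", "battle", "kingdom"]


def build_rodata(words, base, align):
    """Model the layout: .asciz strings, optional padding, then a .quad table.

    `base` is a hypothetical load address for the section.
    Returns (section bytes, offset of the pointer table).
    """
    blob = bytearray()
    addresses = []
    for word in words:
        addresses.append(base + len(blob))    # address the label would get
        blob += word.encode("ascii") + b"\0"  # .asciz appends a NUL byte
    blob += b"\0" * (-len(blob) % align)      # zero padding up to `align`
    table_offset = len(blob)
    for address in addresses:
        blob += struct.pack("<Q", address)    # .quad: 8 bytes, little-endian
    return bytes(blob), table_offset


for align in (1, 8):
    _, offset = build_rodata(WORDS, base=0x400000, align=align)
    print(f"align={align}: words_list at offset {offset} (offset % 8 = {offset % 8})")
```

With `align=8` the model inserts three zero bytes, which is the padding GAS's `.balign 8` directive would add in a data section.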

2. The Helper Functions: _strlen and _print

Since we are not linking against the C standard library, we must implement our own helper functions for basic tasks.

_strlen (String Length):

_strlen:
    mov x1, x0      // Copy start address to x1
_strlen_loop:
    ldrb w2, [x1], #1 // Load byte from address in x1, then increment x1
    cmp w2, #0      // Is it the null terminator?
    b.eq _strlen_end // If yes, exit loop
    b _strlen_loop
_strlen_end:
    sub x0, x1, x0  // length = (end_address - start_address)
    sub x0, x0, #1  // Adjust because x1 is one past the null byte
    ret

This function implements a classic C-style string length calculation. It takes a pointer in x0 and walks forward one byte at a time until it finds the null character. Because ldrb post-increments x1 before the comparison, x1 ends up one byte past the terminator. The length is therefore (end address - start address) - 1.
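The `sub x0, x0, #1` step is easy to miss. Because `ldrb` post-increments x1, the pointer already sits one byte past the NUL when the loop exits. This Python model replays the same instructions on a `bytes` object, using indices as stand-ins for addresses. That is an illustrative simplification, and `strlen` here is our own helper, not a library call.

```python
def strlen(memory: bytes, start: int) -> int:
    """Mirror _strlen: walk with a post-incremented pointer until a NUL byte."""
    x1 = start                 # mov x1, x0
    while True:
        w2 = memory[x1]        # ldrb w2, [x1], #1 (load the byte...)
        x1 += 1                # ...then post-increment the pointer
        if w2 == 0:            # cmp w2, #0 / b.eq _strlen_end
            break
    return x1 - start - 1      # sub x0, x1, x0 / sub x0, x0, #1


mem = b"nail\0shoe\0horse\0"
print(strlen(mem, 0), strlen(mem, 5), strlen(mem, 10))  # 4 4 5
```

Leaving out the final `- 1` would count the terminator too, and `write` would then emit a stray NUL byte after every fragment.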

_print (System Call Wrapper):

_print:
    mov x2, x1      // x2 <- length
    mov x1, x0      // x1 <- address
    mov x0, #1      // x0 <- stdout file descriptor
    mov x8, #64     // x8 <- write syscall number
    svc #0          // Trigger the syscall
    ret

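This is a wrapper around the Linux write syscall. On AArch64, Linux expects the syscall number in x8 and up to six arguments in x0-x5. This is the kernel's syscall convention, separate from the AAPCS64 rules for ordinary function calls. We set up the registers as the kernel requires (fd in x0, buffer address in x1, byte count in x2) and then execute svc #0 to ask the kernel to perform the write. The kernel returns the number of bytes written, or a negative error code, in x0; this simple program ignores that value.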
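Python's `os.write(fd, data)` is a thin wrapper over the same `write` system call, so it makes a convenient model of `_print`. The sketch below assumes a POSIX system, and `print_bytes` and `write_all` are hypothetical helpers. The second one shows what a more defensive caller would do with the return value the assembly discards. Because `write` may transfer fewer bytes than requested, `write_all` loops until everything is out.

```python
import os


def print_bytes(buf: bytes, length: int, fd: int = 1) -> int:
    """Model of _print: write(fd, buf, length); returns the byte count (x0).

    Python raises OSError where the raw syscall would return a negative errno.
    """
    return os.write(fd, buf[:length])


def write_all(fd: int, data: bytes) -> None:
    """Keep calling write until every byte is out (write may be partial)."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]  # drop what the kernel accepted, retry the rest


print_bytes(b"nail\n", 5)  # writes "nail\n" to stdout (fd 1)
```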

3. The _start Entry Point and Main Loop

This is where the program's execution begins.

_start:
    ldr x19, =words_list // Load address of our pointer array
    ldr x20, words_count // Load the word count
    mov x21, #0          // Initialize loop counter i = 0

proverb_loop:
    sub x22, x20, #1
    cmp x21, x22
    b.ge generate_final_line // if i >= (count - 1), jump to the end

  • We keep our loop state in callee-saved registers (x19-x28). The AAPCS64 calling convention requires any called function to preserve them, and our own helpers never touch them, so their values survive every bl call.
  • x19 holds the base address of our words_list.
  • x20 holds the total number of words (7).
  • x21 is our loop index, i.
  • The loop continues as long as i is less than count - 1, because we always need a pair of words (word[i] and word[i+1]).
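Two details hide in the loop test: `sub` wraps around modulo 2^64, and `b.ge` is a signed comparison. The hypothetical model below counts how many verses the loop would print; `verse_count` is our own helper.

  • With seven words it gives six.
  • With zero words, the signed branch exits at once.
  • The unsigned alternative `b.hs` would see count - 1 as an enormous positive number and keep looping. The model stops at an arbitrary cap of 100.

```python
MASK64 = (1 << 64) - 1


def as_signed(value: int) -> int:
    """Interpret a 64-bit register value as two's complement."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def verse_count(count: int, signed: bool = True, cap: int = 100) -> int:
    """Loop passes for: sub x22, x20, #1 / cmp x21, x22 / b.ge (or b.hs) out."""
    limit = (count - 1) & MASK64  # sub wraps around modulo 2**64
    i = 0                         # x21
    while i < cap:                # cap only exists to stop a runaway loop
        if signed:
            done = as_signed(i) >= as_signed(limit)  # b.ge: signed >=
        else:
            done = i >= limit                         # b.hs: unsigned >=
        if done:
            break
        i += 1                    # one verse printed per pass
    return i


print(verse_count(7), verse_count(0), verse_count(0, signed=False))  # 6 0 100
```

Exiting the loop is not the same as exiting the program. Without a separate check, a zero count would still reach the final line, which reads word[0].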

4. Inside the Loop: Constructing a Verse

The body of the loop is a sequence of calls to our helper functions to print each part of a line.

    // --- Print word[i] ---
    ldr x0, [x19, x21, lsl #3] // Load address of current word
    bl _strlen
    mov x1, x0
    ldr x0, [x19, x21, lsl #3]
    bl _print

The most important instruction here is ldr x0, [x19, x21, lsl #3]. Let's break it down:

  • ldr x0, [...]: Load a value from memory into register x0.
  • x19: The base address (the start of words_list).
  • x21: The index register (our counter i).
  • lsl #3: Logical Shift Left by 3. This is equivalent to multiplying the index by 8 (2^3). We do this because each pointer in our list is 8 bytes wide (one .quad entry). This scaling gives us the correct offset from the base address to find the pointer for word[i].

This single instruction calculates the address of the i-th pointer (base_address + i * 8), fetches the pointer from that address, and places it into x0, ready to be passed to our functions. The same logic is applied to get word[i+1] using an adjusted index.
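The same addressing can be modelled in Python with `struct.unpack_from`. The table below holds three made-up 8-byte pointers in little-endian order, the byte order AArch64 Linux uses. `ldr_indexed` is a hypothetical helper that mimics the register-offset load.

```python
import struct

# Hypothetical pointer table: three 8-byte little-endian addresses.
table = struct.pack("<3Q", 0x400000, 0x400005, 0x40000A)


def ldr_indexed(memory: bytes, base: int, index: int) -> int:
    """Mimic ldr x0, [x19, x21, lsl #3]: fetch 8 bytes at base + (index << 3)."""
    address = base + (index << 3)  # lsl #3 scales the index by 8
    return struct.unpack_from("<Q", memory, address)[0]


for i in range(3):
    print(i, hex(ldr_indexed(table, 0, i)))
```

Dropping the `<< 3` would make index 1 read eight bytes starting in the middle of entry 0, splicing two pointers together. This is exactly the scaling pitfall listed in the FAQ below.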

5. Generating the Final Line and Exiting

After the loop finishes, the code jumps to generate_final_line.

generate_final_line:
    // ... print final_prefix ...

    // --- Print word[0] (the first word) ---
    ldr x0, [x19] // Load address of the first word
    // ... call _strlen and _print ...

    // ... print final_suffix ...

exit_program:
    mov x0, #0      // Exit code 0
    mov x8, #93     // exit syscall number
    svc #0

The logic is similar, but simpler. We load the pointer to the first word using ldr x0, [x19], which fetches the value from the base address itself (an offset of 0). Finally, we use the exit syscall (number 93) to terminate the program cleanly.


Visualizing the Logic: ASCII Flow Diagrams

To better understand the program flow, here are two diagrams. The first shows the high-level loop structure, and the second illustrates the memory access for printing a single line.

Main Program Flow

This diagram shows the overall control flow, from initialization to the final exit.

    ● Start
    │
    ▼
  ┌───────────────────┐
  │ Initialize        │
  │ x19 = &words_list │
  │ x20 = count       │
  │ x21 = 0 (counter) │
  └─────────┬─────────┘
            │
            ▼
    ◆ Loop Condition?
    │ (x21 < x20 - 1)
    │
   ╱           ╲
 Yes (continue) No (loop done)
  │              │
  ▼              ▼
┌────────────────┐  ┌──────────────────┐
│ Get word[i]    │  │ Print Final Line │
│ Get word[i+1]  │  └─────────┬────────┘
│ Print Verse    │            │
│ Increment x21  │            │
└───────┬────────┘            │
        │                     │
        └─────────⟶───────────┘
                  │
                  ▼
              ┌─────────┐
              │ Exit(0) │
              └─────────┘
                  │
                  ▼
                ● End

Memory Access for One Verse

This diagram shows how the program accesses memory to construct the line "For want of a nail the shoe was lost."

    ● Print Verse (i=0)
    │
    ▼
  ┌──────────────────────────┐
  │ Print static `prefix`    │
  │ ("For want of a ")       │
  └────────────┬─────────────┘
               │
               ▼
  ┌──────────────────────────┐
  │ Access `words_list`      │
  │ addr = x19 + (0 * 8)     │
  │ ldr x0, [x19] ⟶ "nail"   │
  │ Print content of x0      │
  └────────────┬─────────────┘
               │
               ▼
  ┌──────────────────────────┐
  │ Print static `infix`     │
  │ (" the ")                │
  └────────────┬─────────────┘
               │
               ▼
  ┌──────────────────────────┐
  │ Access `words_list`      │
  │ addr = x19 + (1 * 8)     │
  │ ldr x0,[x19,#8] ⟶ "shoe" │
  │ Print content of x0      │
  └────────────┬─────────────┘
               │
               ▼
  ┌──────────────────────────┐
  │ Print static `suffix`    │
  │ (" was lost.\n")         │
  └────────────┬─────────────┘
               │
               ▼
            ● Done

Pros and Cons: Assembly vs. High-Level Languages

This exercise clearly highlights the trade-offs between different levels of programming abstraction.

| Aspect | Arm64-assembly Approach | High-Level Language (e.g., Python) |
|---|---|---|
| Development Speed | Very slow. Requires manual memory management, syscall setup, and implementing basic functions like strlen. | Extremely fast. A few lines of code with a simple for loop and string formatting. |
| Performance | Potentially the highest possible, but only with deliberate effort. You control every instruction and pay no interpreter or runtime cost, yet this straightforward version still makes a separate write syscall for every fragment of every line. | Very good for most cases, but includes overhead from the runtime environment, garbage collection, and abstractions. |
| Code Readability | Low. The logic is obscured by low-level details, and understanding it requires deep knowledge of the architecture. | High. The code reads almost like plain English, making it easy to understand and maintain. |
| Portability | None. The code is specific to the AArch64 architecture and the Linux kernel's syscall interface. | Excellent. The same Python code runs unchanged on Windows, macOS, and Linux across various architectures. |
| Learning Value | Exceptional. Provides a fundamental understanding of how computers work at the lowest software level. | High for application logic and algorithms, but abstracts away the underlying hardware details. |
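The Performance row deserves a caveat. Each `_print` call is one `write` syscall, so the program above makes five per chain line and three for the final line: 33 for seven words. The rough Python sketch below shows the alternative of building the whole text first and writing it once. Both helpers are illustrative, and the count formula assumes at least one word.

```python
import os


def write_calls_per_fragment(n_words: int) -> int:
    """write syscalls made by the assembly program (assumes n_words >= 1)."""
    return 5 * (n_words - 1) + 3  # 5 fragments per chain line, 3 for the last


def proverb_bytes(words: list[str]) -> bytes:
    """Assemble every fragment into one buffer so a single write suffices."""
    parts: list[str] = []
    for a, b in zip(words, words[1:]):
        parts += ["For want of a ", a, " the ", b, " was lost.\n"]
    if words:
        parts += ["And all for the want of a ", words[0], ".\n"]
    return "".join(parts).encode("ascii")


words = ["nail", "shoe", "horse", "rider", "message", "battle", "kingdom"]
print(write_calls_per_fragment(len(words)), flush=True)  # 33
os.write(1, proverb_bytes(words))  # one write for the whole proverb
```

The same buffering idea carries over to assembly: copy the fragments into a scratch buffer in `.bss`, then issue a single `svc #0`.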

Frequently Asked Questions (FAQ)

What are registers in Arm64 and why are they important?
Registers are small, extremely fast storage locations directly inside the CPU. In Arm64, there are 31 general-purpose registers (x0-x30). They are used to hold data for immediate computation, pass arguments to functions, and store local variables. Using registers is much faster than accessing main memory (RAM), so efficient assembly programming involves keeping data in registers as much as possible.
Why use svc #0 for system calls?
svc stands for "Supervisor Call". It's a special instruction that causes the CPU to switch from user mode (where your application runs) to a privileged kernel mode. This is a deliberate security boundary. The kernel then checks the syscall number in register x8 and performs the requested task (like writing to the screen) on behalf of the application, preventing user programs from directly accessing hardware and corrupting the system.
How does this code handle strings of different lengths?
The code is completely dynamic thanks to the _strlen helper function. Before printing any string (whether a template or a word from our list), we first call _strlen to calculate its exact length. This length is then passed to the write syscall in register x2, ensuring that the kernel writes the correct number of bytes to the console every time, regardless of the word's size.
What is the difference between the .rodata and .data sections?
The .rodata section is for "read-only data," like our constant strings. The operating system can place this section in a memory region that is marked as non-writable, preventing accidental modification by bugs in the code. The .data section is for initialized data that can be modified during runtime (e.g., global variables). For this problem, all our data is constant, so .rodata is the appropriate and safer choice.
Can this code be adapted for a different proverb with more or fewer words?
Yes, absolutely. The code is highly adaptable. To change the proverb, you would only need to modify the .rodata section:
  1. Change the word strings (word1, word2, etc.).
  2. Update the words_list to point to your new words.
  3. Most importantly, update the words_count value to reflect the new number of words.
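The loop logic itself works without changes because it relies on words_count to determine its bounds. The one special case is an empty list. The final line reads word[0], so a count of zero has to branch straight to the exit before anything is printed.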
What are common pitfalls when working with pointers in assembly?
The most common pitfalls include:
  • Off-by-One Errors: Incorrectly calculating loop bounds or string lengths, leading to reading past the end of an array or string.
  • Incorrect Pointer Arithmetic: Forgetting to scale an index by the size of the data type (like our lsl #3 to multiply by 8 for 64-bit pointers).
  • Dereferencing Null Pointers: Attempting to read from address zero, which typically causes a segmentation fault.
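  • Stack Corruption: This usually comes from mismatched stp/ldp pairs (AArch64 has no push/pop instructions) or a mismanaged stack pointer (sp), especially in more complex functions. The stack pointer must stay 16-byte aligned whenever it is used as the base of a memory access.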

Conclusion: From Rhyme to Reason

We have successfully built a proverb generator from the ground up in Arm64-assembly. In the process, we moved far beyond simple string concatenation and explored the very foundations of program execution. We manually managed memory with pointers, controlled program flow with branches, and communicated directly with the operating system kernel via system calls.

The skills acquired through this kodikra module—understanding memory layout, register usage, and the syscall interface—are not just academic. They are the bedrock upon which all high-performance computing, operating systems, and embedded systems are built. You now have a deeper appreciation for the complex work that compilers and high-level language runtimes do for you every day.

Continue your journey by exploring other challenges. Try modifying this code to handle different data structures or to perform more complex I/O. The world of low-level programming is vast, and you've just taken a significant and rewarding step into it.

Disclaimer: The provided code is designed for AArch64 (64-bit Arm) Linux using the GNU Assembler (`as`) and Linker (`ld`); it will not assemble for x86-64. System call numbers and conventions differ on other operating systems like macOS or Windows.

Ready for the next challenge? Explore our complete Arm64-assembly learning path or dive deeper into our full catalog of Arm64-assembly modules to continue honing your low-level programming skills.


Published by Kodikra — Your trusted Arm64-assembly learning resource.