Twelve Days in Arm64-assembly: Complete Solution & Deep Dive Guide


The Complete Guide to Building the 'Twelve Days' Song in Arm64 Assembly

Generating the lyrics for 'The Twelve Days of Christmas' in Arm64 assembly is a foundational exercise in low-level programming. This guide provides a complete, commented solution, teaching you how to manage string data, implement nested loops for cumulative logic, and interface with the Linux kernel via system calls to produce the final output.


Have you ever looked at a high-level language like Python or JavaScript and wondered what’s happening underneath? How does a simple print("Hello") command actually make text appear on your screen? The answer lies deep within the architecture of your machine, in the world of assembly language—the most direct way to speak to your processor.

Tackling a creative problem like generating song lyrics in a low-level language like Arm64 assembly can feel like trying to paint a masterpiece with a chisel and stone. It's verbose, unforgiving, and requires meticulous attention to detail. Yet, this is precisely where true understanding is forged. By completing this challenge from the exclusive kodikra.com curriculum, you will demystify the magic behind the curtain, gaining a profound appreciation for the layers of abstraction we use every day.

What is the 'Twelve Days' Challenge in Arm64 Assembly?

The "Twelve Days of Christmas" song is famous for its cumulative structure. Each new verse repeats all the gifts from the previous verses. The challenge is to write a program that algorithmically generates the complete lyrics, from the first day to the twelfth, without hardcoding the entire song as one giant string.

In the context of Arm64 assembly, this task is an excellent vehicle for learning several core concepts:

  • Data Management: You must efficiently store and access numerous pieces of text—the verse templates, the ordinal numbers ("first", "second"), and the list of gifts.
  • Control Flow: The cumulative nature of the song demands a robust nested loop structure. An outer loop will iterate through the twelve days, while an inner loop will iterate backward through the gifts for each day.
  • Memory Addressing: You will need to calculate memory addresses on the fly to retrieve the correct string pointers from your data arrays.
  • System Interaction: Unlike high-level languages with built-in print functions, assembly requires you to manually construct and execute system calls (syscalls) to request services from the operating system kernel, such as writing text to the console.

Solving this problem demonstrates a solid grasp of procedural logic at the machine level, a critical skill for anyone interested in systems programming, embedded systems, or performance optimization.


How to Structure the Arm64 Assembly Solution

A clean solution requires a clear separation between data and logic. In assembly, this is achieved by organizing our code into different sections, primarily .data for static data and .text for executable instructions. The overall logic will revolve around two nested loops that control which verse and which gifts are printed.

The Data Section (`.data`)

First, we must define all the string literals the program will need. A convenient way to handle this is to define each string once and then create arrays of 8-byte pointers to these strings. This avoids data duplication and turns every lookup into a simple indexed load.

We'll need three main groups of strings:

  1. Verse Fragments: The static parts of each verse, like "On the " and " day of Christmas my true love gave to me: ".
  2. Day Ordinals: An array of strings for "first", "second", "third", etc.
  3. Gift Clauses: An array of strings for each gift, like "a Partridge in a Pear Tree." and "three French Hens, ".

Here is how the data section is structured. Note the use of .asciz which creates null-terminated strings, essential for functions that need to calculate string length.


.data

// Verse fragments
verse_start:    .asciz "On the "
verse_middle:   .asciz " day of Christmas my true love gave to me: "
new_line:       .asciz "\n"

// Day ordinal strings
day1_str:       .asciz "first"
day2_str:       .asciz "second"
day3_str:       .asciz "third"
// ... and so on for all 12 days

// Array of pointers to day strings
days:
    .quad day1_str
    .quad day2_str
    .quad day3_str
    // ... and so on for all 12 pointers

// Gift strings
gift1_str:      .asciz "a Partridge in a Pear Tree."
gift2_str:      .asciz "two Turtle Doves, "   // no "and" here: gift1_and_str supplies it
gift3_str:      .asciz "three French Hens, "
// ... and so on for all 12 gifts

// Special case for the first gift when it's not the first day
gift1_and_str:  .asciz "and a Partridge in a Pear Tree."

// Array of pointers to gift strings
gifts:
    .quad gift1_str
    .quad gift2_str
    .quad gift3_str
    // ... and so on for all 12 pointers

Program Logic Flow

The core logic resides in the .text section. It follows a clear, structured path driven by nested loops. We will use callee-saved registers like x19 and x20 for our loop counters, which is good practice to prevent them from being overwritten by function calls.

This ASCII art diagram illustrates the high-level program flow:

    ● Start
    │
    ▼
  ┌──────────────────┐
  │ Init Outer Loop  │
  │ (day = 0 to 11)  │
  │   Register x19   │
  └─────────┬────────┘
            │
     ╭──────▼──────╮
     │ Day < 12 ?  │
     ╰──────┬──────╯
            │ Yes
            ▼
  ┌──────────────────┐
  │ Print Verse Intro│
  │ "On the [day]..."│
  └─────────┬────────┘
            │
            ▼
  ┌──────────────────┐
  │  Init Inner Loop │
  │ (gift = day to 0)│
  │   Register x20   │
  └─────────┬────────┘
            │
     ╭──────▼──────╮
     │ Gift >= 0 ? │
     ╰──────┬──────╯
            │ Yes
            ▼
  ┌──────────────────┐
  │ Print Gift Line  │
  │ (Handle special  │
  │  case for day 1) │
  └─────────┬────────┘
            │
            ╰───> Decrement Gift (x20), Loop
            │ No
            ▼
  ┌──────────────────┐
  │ Print Newline    │
  └─────────┬────────┘
            │
            ╰───> Increment Day (x19), Loop
            │ No (Day >= 12)
            ▼
    ● Exit Program
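
The loop shape in the diagram can be tried in isolation before any printing is involved. The following is a minimal sketch, assuming the same zero-based counters as the full solution (x19 for the day, x20 for the gift). The accumulator in x21 and the idea of returning the count as the exit status are illustrative additions, not part of the original program.

```asm
// loops_demo.s: the nested loop skeleton on its own.
// Instead of printing, it counts how many gift lines the song contains.
.text
.global _start
_start:
    mov x21, #0                // total gift lines (illustrative accumulator)
    mov x19, #0                // day index, 0..11
verse_loop:
    cmp x19, #12
    b.ge done                  // all twelve verses handled
    mov x20, x19               // gift index starts at the current day
gift_loop:
    cmp x20, #0
    b.lt end_verse             // gift index went below 0: verse finished
    add x21, x21, #1           // stand-in for "print one gift"
    sub x20, x20, #1
    b gift_loop
end_verse:
    add x19, x19, #1
    b verse_loop
done:
    mov x0, x21                // exit status = 1 + 2 + ... + 12 = 78
    mov x8, #93                // exit syscall
    svc #0
```

After assembling with as and linking with ld, running the program and then `echo $?` should report 78, which is a quick way to confirm the loop bounds before wiring in the string output.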

The Code Section (`.text`) and System Calls

Execution begins at the _start label, the default entry symbol that the GNU linker (ld) looks for. The logic proceeds as follows:

  1. Outer Loop (`verse_loop`): This loop runs 12 times, controlled by register x19 (our day counter, from 0 to 11).
  2. Printing the Verse Intro: Inside the loop, we print the static parts of the verse and use the x19 counter to calculate the offset into the days pointer array to get the correct ordinal string ("first", "second", etc.).
  3. Inner Loop (`gift_loop`): This loop is responsible for printing the gifts. It's controlled by register x20, which is initialized with the current value of x19. It counts down from the current day to 0.
  4. Handling Special Cases: The song has a lyrical quirk. On day 1, the gift is "a Partridge...". On all subsequent days, the final gift is "and a Partridge...". Our code must include a conditional check to select the correct string.
  5. System Calls for Output: We don't have a printf. Instead, we use the Linux write syscall. For Arm64, this involves:
    • Placing the syscall number for write (64) into register x8.
    • Placing the file descriptor (1 for stdout) into x0.
    • Placing the memory address of the string to print into x1.
    • Placing the length of the string into x2.
    • Executing the svc #0 instruction to trap into the kernel and perform the write operation.
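
The register setup for write can be seen in a program that does nothing else. This is a minimal sketch, assuming GNU as on AArch64 Linux. The names msg and msg_len are hypothetical, and the length is computed by the assembler rather than at run time, which differs from the helper used in the full solution.

```asm
// hello.s: one write syscall followed by exit.
.data
msg:        .ascii "On the first day\n"
.equ msg_len, . - msg          // byte count, computed at assembly time

.text
.global _start
_start:
    mov x0, #1                 // x0 = file descriptor 1 (stdout)
    ldr x1, =msg               // x1 = address of the buffer
    ldr x2, =msg_len           // x2 = number of bytes to write
    mov x8, #64                // x8 = syscall number for write
    svc #0                     // trap into the kernel

    mov x0, #0                 // exit status 0
    mov x8, #93                // syscall number for exit
    svc #0
```

Because the length is known when the file is assembled, no scanning loop is needed here. The trade-off is one extra symbol per string, which is why a run-time length helper is attractive when there are dozens of strings.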

Because calculating the length of each null-terminated string before every print call is repetitive, we can create a helper function, _print_string, to encapsulate this logic.


The Complete Arm64 Assembly Solution

Below is the full, commented source code for the "Twelve Days" challenge. This code is designed to be assembled with the GNU Assembler (as) and linked with ld on an Arm64 Linux system.


/*
 * kodikra.com Arm64 Assembly Module: Twelve Days
 * A complete solution to generate the song lyrics.
 */

.data

// Verse fragments
verse_start:    .asciz "On the "
verse_middle:   .asciz " day of Christmas my true love gave to me: "
new_line:       .asciz "\n"

// Day ordinal strings
day1_str:       .asciz "first"
day2_str:       .asciz "second"
day3_str:       .asciz "third"
day4_str:       .asciz "fourth"
day5_str:       .asciz "fifth"
day6_str:       .asciz "sixth"
day7_str:       .asciz "seventh"
day8_str:       .asciz "eighth"
day9_str:       .asciz "ninth"
day10_str:      .asciz "tenth"
day11_str:      .asciz "eleventh"
day12_str:      .asciz "twelfth"

// Array of pointers to day strings (8 bytes per pointer on Arm64)
days:
    .quad day1_str
    .quad day2_str
    .quad day3_str
    .quad day4_str
    .quad day5_str
    .quad day6_str
    .quad day7_str
    .quad day8_str
    .quad day9_str
    .quad day10_str
    .quad day11_str
    .quad day12_str

// Gift strings
gift1_str:      .asciz "a Partridge in a Pear Tree."
gift2_str:      .asciz "two Turtle Doves, "   // no "and" here: gift1_and_str supplies it
gift3_str:      .asciz "three French Hens, "
gift4_str:      .asciz "four Calling Birds, "
gift5_str:      .asciz "five Gold Rings, "
gift6_str:      .asciz "six Geese-a-Laying, "
gift7_str:      .asciz "seven Swans-a-Swimming, "
gift8_str:      .asciz "eight Maids-a-Milking, "
gift9_str:      .asciz "nine Ladies Dancing, "
gift10_str:     .asciz "ten Lords-a-Leaping, "
gift11_str:     .asciz "eleven Pipers Piping, "
gift12_str:     .asciz "twelve Drummers Drumming, "

// Special case for the first gift when it's not the first day
gift1_and_str:  .asciz "and a Partridge in a Pear Tree."

// Array of pointers to gift strings
gifts:
    .quad gift1_str
    .quad gift2_str
    .quad gift3_str
    .quad gift4_str
    .quad gift5_str
    .quad gift6_str
    .quad gift7_str
    .quad gift8_str
    .quad gift9_str
    .quad gift10_str
    .quad gift11_str
    .quad gift12_str

.text
.global _start

// _print_string: A helper function to print a null-terminated string.
// Input: x0 = address of the string
// Clobbers: x0, x1, x2, x8, x9 (scratch registers)
_print_string:
    mov x1, x0      // Save string start address in x1
    mov x2, #0      // Initialize length counter (x2) to 0

_strlen_loop:
    ldrb w9, [x1], #1 // Load byte from address in x1, then increment x1
    cmp w9, #0        // Compare byte with null terminator
    b.eq _strlen_done // If it's null, we are done
    add x2, x2, #1    // Increment length
    b _strlen_loop    // Loop again

_strlen_done:
    // Now x2 contains the length of the string
    mov x1, x0        // Restore string address into x1 (syscall argument)
    mov x0, #1        // stdout file descriptor
    mov x8, #64       // write syscall number for Arm64
    svc #0            // Make the system call
    ret               // Return to the caller (address in lr/x30)

_start:
    // We use callee-saved registers for loop counters to be safe.
    // x19: outer loop counter (day, 0-11)
    // x20: inner loop counter (gift, day down to 0)
    mov x19, #0       // Initialize day counter to 0 (for "first" day)

verse_loop:
    // Check if we've done all 12 days
    cmp x19, #12
    b.ge exit_program // If day >= 12, exit

    // --- Print verse intro ---
    ldr x0, =verse_start
    bl _print_string

    // Load the correct day string pointer
    ldr x0, =days
    // LSL multiplies by 8 (size of a .quad) to get the correct offset
    ldr x0, [x0, x19, lsl #3] 
    bl _print_string

    ldr x0, =verse_middle
    bl _print_string

    // --- Inner loop for gifts ---
    mov x20, x19      // Initialize gift counter with current day

gift_loop:
    // Check if we are done with gifts for this verse
    cmp x20, #0
    b.lt end_verse // If gift counter < 0, go to the next verse

    // Special case handling for the first gift ("a" vs "and a")
    cmp x19, #0     // Is this the very first day?
    b.eq print_gift // If yes, no special handling needed
    
    cmp x20, #0     // Is this the last gift of a verse (but not day 1)?
    b.ne print_gift // If not the last gift, print normally
    
    // This is the last gift of a verse > 1, use the "and" version
    ldr x0, =gift1_and_str
    bl _print_string
    b next_gift     // Skip the normal gift printing

print_gift:
    // Load the correct gift string pointer
    ldr x0, =gifts
    ldr x0, [x0, x20, lsl #3]
    bl _print_string

next_gift:
    sub x20, x20, #1 // Decrement gift counter
    b gift_loop      // Loop back

end_verse:
    // Print a newline character to end the verse
    ldr x0, =new_line
    bl _print_string

    add x19, x19, #1 // Increment day counter
    b verse_loop     // Loop back for the next verse

exit_program:
    mov x0, #0      // Exit code 0 (success)
    mov x8, #93     // exit syscall number
    svc #0          // Make the system call

How to Compile and Run the Code

To run this code, you'll need an Arm64-based Linux environment with a 64-bit operating system. This could be a Raspberry Pi (4, 5, or newer), an AWS Graviton instance, or a Linux virtual machine on an Apple Silicon Mac.

1. Save the code above into a file named twelve_days.s.

2. Open a terminal and run the following commands:

Assemble the code:

as twelve_days.s -o twelve_days.o

Link the object file into an executable:

ld twelve_days.o -o twelve_days

Execute the program:

./twelve_days

The terminal will then display the full lyrics of "The Twelve Days of Christmas."


Detailed Code Walkthrough

Let's dissect the most critical parts of the code to understand the low-level mechanics.

Memory Addressing and Pointer Arrays

The instruction ldr x0, [x0, x19, lsl #3] is the heart of our data retrieval system. Let's break it down:

    ● Start with base address
    │  (e.g., address of `days` array)
    │  in register x0
    │
    ▼
  ┌───────────────────┐
  │ Get Index         │
  │ (e.g., day number)│
  │ in register x19   │
  └────────┬──────────┘
           │
           ▼
  ┌───────────────────┐
  │ Scale the Index   │
  │ x19, lsl #3       │  // Logical Shift Left by 3
  │ (index * 8)       │  // because each pointer is 8 bytes
  └────────┬──────────┘
           │
           ▼
  ┌───────────────────┐
  │ Calculate Address │
  │ base (x0) +       │
  │ scaled_index      │
  └────────┬──────────┘
           │
           ▼
  ┌───────────────────┐
  │ Dereference       │
  │ Load the 8-byte   │
  │ value (a pointer) │
  │ at the calculated │
  │ address into x0   │
  └────────┬──────────┘
           │
           ▼
    ● x0 now holds the address
      of the target string
      (e.g., "first", "second")

  • ldr x0, =days: This is a pseudo-instruction that loads the memory address of our days array into register x0.
  • [x0, x19, lsl #3]: This is the address calculation.
    • x0 is the base address (the start of the days array).
    • x19 is our index (e.g., 0 for the first day, 1 for the second).
    • lsl #3 stands for "Logical Shift Left by 3 bits". Shifting left by 3 bits is equivalent to multiplying by 2^3, or 8. We do this because each pointer in our array is 8 bytes long (a .quad).
    • The processor calculates address = base_address + (index * 8).
  • ldr x0, ...: The final instruction loads the 8-byte value found at that calculated address and places it back into x0. Now, x0 holds the address of the actual string (e.g., the address of day1_str).
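
To make the address arithmetic concrete, the scaled load can be compared against the same lookup written out step by step. This is a small sketch under stated assumptions: the table of three numbers and the exit-status convention are invented for the demonstration, and only the addressing mode itself comes from the solution.

```asm
// index_demo.s: scaled-register load versus explicit arithmetic.
.data
table:
    .quad 10
    .quad 20
    .quad 30

.text
.global _start
_start:
    mov x19, #2                 // index of the third entry

    // One instruction: address = x0 + (x19 << 3)
    ldr x0, =table
    ldr x1, [x0, x19, lsl #3]   // x1 = table[2]

    // The same lookup, one step at a time
    ldr x0, =table
    lsl x9, x19, #3             // x9 = index * 8
    add x9, x0, x9              // x9 = base + scaled index
    ldr x2, [x9]                // x2 = table[2]

    cmp x1, x2
    cset x0, ne                 // exit status 0 if both loads agree, else 1
    mov x8, #93                 // exit syscall
    svc #0
```

Both paths read the same 8 bytes, so the program should exit with status 0. The single-instruction form simply folds the shift and the add into the load.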

The `_print_string` Helper Function

This function demonstrates function linkage and basic string manipulation. The bl _print_string instruction (Branch with Link) does two things: it jumps to the _print_string label, and it saves the address of the *next* instruction in the Link Register (lr, which is an alias for x30). The ret instruction at the end of the function jumps back to the address stored in lr, resuming execution where it left off.
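
One consequence of this mechanism is worth seeing in code: a function that itself uses bl overwrites lr, so it has to save the old value first. The sketch below assumes a hypothetical wrapper named print_twice and a local copy of the print helper. Neither name appears in the solution, where _print_string is a leaf function and needs no stack frame.

```asm
// linkage_demo.s: a non-leaf function must preserve lr across its own bl.
.data
msg:    .asciz "five Gold Rings\n"

.text
.global _start

// print_string: x0 = address of a null-terminated string (leaf function).
print_string:
    mov x1, x0                 // x1 = buffer address for write
    mov x2, #0                 // x2 = length so far
1:  ldrb w9, [x1, x2]          // read the byte at x1 + x2
    cbz w9, 2f                 // stop at the null terminator
    add x2, x2, #1
    b 1b
2:  mov x0, #1                 // stdout
    mov x8, #64                // write
    svc #0
    ret                        // lr is untouched, so a plain ret works

// print_twice: calls print_string two times, so it saves x30 first.
print_twice:
    stp x29, x30, [sp, #-16]!  // push frame pointer and link register
    ldr x0, =msg
    bl print_string            // this bl overwrites x30
    ldr x0, =msg
    bl print_string
    ldp x29, x30, [sp], #16    // restore the original return address
    ret

_start:
    bl print_twice
    mov x0, #0
    mov x8, #93                // exit
    svc #0
```

Without the stp/ldp pair, the final ret in print_twice would jump back to the instruction after its own second bl and loop forever. Pushing 16 bytes at a time also keeps sp aligned as the architecture expects.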

Inside the function, the ldrb w9, [x1], #1 instruction is a clever way to loop through a string. It loads a single byte (ldrb) into the 32-bit register w9 from the address in x1, and then, as a post-index operation, it increments the address in x1 by 1. This is more efficient than having a separate add instruction inside the loop.
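
The post-indexed load is easy to experiment with on its own. The following sketch reuses the counting loop from the helper but returns the length as the process exit status instead of printing. That reporting trick and the label names are assumptions made for the demo.

```asm
// strlen_demo.s: count bytes with a post-indexed load.
.data
text:   .asciz "twelfth"

.text
.global _start
_start:
    ldr x1, =text              // x1 = cursor into the string
    mov x2, #0                 // x2 = length so far
count:
    ldrb w9, [x1], #1          // load byte at x1, then x1 = x1 + 1
    // Equivalent two-instruction form:
    //     ldrb w9, [x1]
    //     add  x1, x1, #1
    cbz w9, done               // null terminator reached
    add x2, x2, #1
    b count
done:
    mov x0, x2                 // exit status = string length
    mov x8, #93                // exit syscall
    svc #0
```

Running the program and then `echo $?` should print 7, the number of characters in "twelfth".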


Pros and Cons of Using Assembly for This Task

While powerful, choosing assembly is a trade-off. It's crucial to understand when it's appropriate. For a deeper understanding, check out the resources in our complete Arm64-assembly language guide.

| Pros (Advantages) | Cons (Disadvantages) |
| --- | --- |
| Maximum Performance: Code is translated directly into machine instructions, offering unparalleled speed and efficiency. No overhead from interpreters or runtimes. | Extreme Verbosity: Simple tasks require many lines of code, making development slow and error-prone. |
| Total Hardware Control: You have direct access to CPU registers, memory, and hardware peripherals. | Poor Portability: Assembly code is specific to an architecture (Arm64). It will not run on x86 or other CPUs without a complete rewrite. |
| Minimal Footprint: The resulting executable is incredibly small as it contains only the necessary machine code and data, ideal for embedded systems. | High Complexity: The programmer is responsible for manual memory management, register allocation, and system call conventions. |
| Educational Value: Writing assembly provides the deepest possible understanding of how computers actually work. | Difficult to Maintain: The lack of high-level abstractions makes the code harder to read, debug, and modify by other developers. |

Frequently Asked Questions (FAQ)

Why do we use `_start` instead of a `main` function?

In C/C++, the `main` function is an entry point defined by the language's runtime environment. This runtime sets up the stack, initializes global variables, and handles command-line arguments before calling `main`. When writing pure assembly, we are operating without that runtime. We must provide the linker with the absolute first instruction to execute, which by convention in the Linux world is a global symbol named `_start`.
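
One way to see what "no runtime" means in practice is to read the process's initial stack directly. On Linux, the kernel places the argument count at the address in sp when _start begins executing. The sketch below returns that value as the exit status purely for demonstration.

```asm
// argc_demo.s: with no C runtime, _start sees the raw initial stack.
.text
.global _start
_start:
    ldr x0, [sp]               // x0 = argc (the program name counts as 1)
    mov x8, #93                // exit syscall
    svc #0
```

Running `./argc_demo a b` followed by `echo $?` should print 3. In a C program, the startup code performs this same read and passes the result to main as its first parameter.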

What does the `svc #0` instruction actually do?

svc stands for Supervisor Call. It raises a synchronous exception that switches the CPU from user mode (EL0) to the privileged kernel mode (EL1). The `#0` is an immediate operand that Linux on Arm64 does not use; by convention it is always 0. The kernel's exception handler then reads the syscall number from register x8 and its arguments from x0 through x5, performs the requested operation (such as writing to a file or exiting the program), and places the result in x0.

Why use callee-saved registers like `x19` and `x20` for loops?

The Procedure Call Standard for the Arm 64-bit Architecture (AAPCS64) divides the general-purpose registers into groups. x0-x17 are caller-saved: a function (the "callee") may modify them freely. x18 is reserved as a platform register and is best left alone. x19-x28 are callee-saved, as is the frame pointer x29: if a function needs one of them, it MUST save the original value (typically on the stack) and restore it before returning. x30 is the link register, which every bl overwrites. By using x19 and x20 for our main loop counters, we are following good practice. Our `_print_string` function doesn't modify them, and any conforming external function we might call later would preserve them too, so the loop counters remain safe.
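
The callee's side of that contract can be sketched as follows. The function count_down is hypothetical and exists only to show the save and restore; the value 42 is an arbitrary marker used to check that the caller's register survives.

```asm
// callee_saved_demo.s: a function that wants x19 must save and restore it.
.text
.global _start

// count_down: borrows x19 as a scratch counter, preserving the caller's value.
count_down:
    str x19, [sp, #-16]!       // push x19 (16 bytes keeps sp aligned)
    mov x19, #5
1:  subs x19, x19, #1          // count 5, 4, ... down to 0
    b.ne 1b
    ldr x19, [sp], #16         // restore the caller's x19
    ret

_start:
    mov x19, #42               // value the caller cares about
    bl count_down
    mov x0, x19                // exit status is 42 if x19 survived the call
    mov x8, #93                // exit syscall
    svc #0
```

Running the program and then `echo $?` should print 42. Removing the str/ldr pair changes the result to 0, which is exactly the kind of silent corruption the convention is designed to prevent.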

Can I run this Arm64 assembly code on my Intel/AMD (x86) computer?

No, not directly. Arm64 and x86 are completely different instruction set architectures (ISAs). They have different registers, different instructions, and different system call conventions. To run this code on an x86 machine, you would need to use an emulator like QEMU or rewrite the entire program in x86-64 assembly.

My strings are not printing correctly or the program is crashing. What's a common mistake?

The most common errors in this type of assembly code relate to memory addressing and syscall arguments. Double-check that you are correctly calculating pointer offsets (multiplying the index by 8 for .quad pointers). Also, ensure that before every svc #0 call for the write syscall, registers x0, x1, x2, and x8 contain the correct values (file descriptor, buffer address, buffer length, and syscall number, respectively). A bad pointer load, such as a wrong index or a missing lsl #3, typically ends in a segmentation fault. Wrong syscall arguments usually fail more quietly: the kernel returns a negative error code in x0 and nothing is printed.
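
When output goes missing, checking the value that write leaves in x0 is a cheap first diagnostic. This is a minimal sketch, assuming the names msg, msg_len, failed, and quit, none of which appear in the solution. It maps any failure to exit status 1.

```asm
// check_write.s: detect a failed write instead of ignoring it.
.data
msg:        .ascii "three French Hens\n"
.equ msg_len, . - msg

.text
.global _start
_start:
    mov x0, #1                 // stdout
    ldr x1, =msg
    ldr x2, =msg_len
    mov x8, #64                // write
    svc #0                     // x0 = bytes written, or a negative error code

    cmp x0, #0
    b.lt failed                // negative result means the write failed
    mov x0, #0                 // success: exit status 0
    b quit
failed:
    mov x0, #1                 // failure: exit status 1
quit:
    mov x8, #93                // exit
    svc #0
```

Run normally, the program prints the line and exits with 0. Run as `./check_write >&-`, which closes stdout in the shell, the write fails and `echo $?` should show 1.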


Conclusion: From Lyrics to Low-Level Mastery

Successfully generating the 'Twelve Days of Christmas' lyrics in Arm64 assembly is more than just a programming puzzle; it's a significant step towards understanding the fundamental relationship between software and hardware. Through this kodikra module, you've mastered data definition, pointer arithmetic, nested control flow, function linkage, and direct kernel communication via syscalls—all core competencies of a systems programmer.

While you may not write your next web application in assembly, the insights gained here are invaluable. You now have a mental model of what happens "under the hood," enabling you to write more efficient, performant, and robust code in any high-level language. This foundational knowledge is timeless and will serve you throughout your programming career.

Disclaimer: The code and explanations in this article are based on the AArch64 architecture and Linux system call conventions as of late 2024. The syscall numbers used here (64 for write, 93 for exit) are part of the Linux userspace ABI, which the kernel keeps stable, but toolchain commands and package names can differ between distributions and may change over time.

Ready to continue your journey into the world of low-level programming? Explore our comprehensive Arm64-assembly learning roadmap for more challenges and in-depth guides.


Published by Kodikra — Your trusted Arm64-assembly learning resource.