Food Chain in Arm64-assembly: Complete Solution & Deep Dive Guide
The Ultimate Guide to Algorithmic Song Generation in Arm64 Assembly
Discover how to generate the classic "Food Chain" song lyrics using pure Arm64 assembly. This guide breaks down the algorithmic approach, from data structuring and memory management to control flow with nested loops and direct Linux system calls, providing a deep understanding of low-level programming.
You've probably heard that assembly language is the arcane dialect of programming wizards, a cryptic layer of code sitting just above the raw electricity flowing through a CPU. Many developers see it as a relic, something to be studied in a computer architecture class and then promptly forgotten. But what if you could harness its raw power to solve a creative, algorithmic puzzle? What if you could make the machine sing?
The challenge isn't just about getting the right output; it's about thinking like the machine. It’s about manually managing memory, orchestrating loops with registers, and speaking directly to the operating system's kernel. This guide will walk you through exactly that. We will build a program that algorithmically generates the lyrics to a cumulative song, not by copying and pasting, but by designing a logical system in Arm64 assembly. Prepare to transform from a code user into a true machine whisperer.
What is the Food Chain Song Problem?
The "Food Chain" problem, a classic exercise from the exclusive kodikra.com curriculum, challenges you to programmatically generate the lyrics to the cumulative song "I Know an Old Lady Who Swallowed a Fly." A cumulative song is one where each verse builds upon the previous one, creating a growing, repetitive structure.
The core of the song follows a simple but expanding pattern. Each verse introduces a new, larger animal that the old lady swallows, and then recounts all the previous animals she swallowed in reverse order.
Let's look at the first few verses to understand the pattern:
I know an old lady who swallowed a fly.
I don't know why she swallowed the fly. Perhaps she'll die.
I know an old lady who swallowed a spider.
It wriggled and jiggled and tickled inside her.
She swallowed the spider to catch the fly.
I don't know why she swallowed the fly. Perhaps she'll die.
I know an old lady who swallowed a bird.
How absurd to swallow a bird!
She swallowed the bird to catch the spider that wriggled and jiggled and tickled inside her.
She swallowed the spider to catch the fly.
I don't know why she swallowed the fly. Perhaps she'll die.
...and so on.
The key algorithmic components are:
- A unique introductory line for each new animal.
- A unique "reaction" line for each animal (e.g., "It wriggled and jiggled...").
- A cumulative, recursive-like chain of "She swallowed the [animal A] to catch the [animal B]...".
- A consistent closing refrain for most verses.
- A special final verse for the horse, which ends differently.
Our task is to model this logic in Arm64 assembly without hardcoding the entire song. This requires careful data organization and robust control flow.
Why Use Arm64 Assembly for a String Generation Task?
You might be thinking, "Wouldn't Python or JavaScript solve this in ten lines of code?" And you'd be right. A high-level language would abstract away all the complexity. But that's precisely the point. By choosing Arm64 assembly, we peel back those layers of abstraction to gain invaluable insights.
Here's why tackling this problem in assembly is a powerful learning experience:
- Deep Understanding of Memory: You don't just declare a string; you allocate space for it in the
.datasection and manage its memory address directly. You'll learn how pointers and arrays are just addresses and offsets at the hardware level. - Mastery of Control Flow: High-level
forandwhileloops don't exist here. You will build them from the ground up using comparison instructions (cmp) and conditional branches (b.le,b.gt, etc.), giving you a fundamental understanding of how loops actually work. - Direct Kernel Interaction: Instead of calling a simple
print()function, you will prepare registers according to the Application Binary Interface (ABI) and execute a Supervisor Call (svc) to ask the Linux kernel to perform I/O operations for you. This is the bedrock of all program interaction with the OS. - CPU-Level Optimization: While not critical for this specific problem, working in assembly forces you to think about register allocation, instruction pipelines, and minimizing memory access—the core skills of performance engineering.
Solving this problem in assembly is like learning to build a car engine instead of just learning to drive. It's more challenging, but the depth of knowledge you gain is unparalleled. This exercise is a stepping stone from being a programmer to becoming a systems engineer.
How to Structure the Algorithmic Solution in Arm64
A successful assembly program is built on a foundation of clean data structures and logical code flow. We'll divide our solution into two main parts: the data definition in the .data section and the program logic in the .text section.
The Data Strategy: Organizing the Song Lyrics
Hardcoding the entire song is inefficient and misses the point of the algorithm. Instead, we break the song into its constituent, reusable parts and store them as null-terminated strings (using .asciz).
The key is to use arrays of pointers. We will define several arrays in memory, where each element of the array is the 8-byte memory address of a string.
animals: An array of pointers to the names of the animals ("fly", "spider", "bird", etc.).reactions: An array of pointers to the unique second line for each animal ("It wriggled and jiggled...", "How absurd...", etc.).catch_phrases: An array of pointers to the cumulative lines ("She swallowed the spider to catch the fly.", "She swallowed the bird to catch the spider...", etc.).- Common Phrases: Individual strings for recurring parts like "I know an old lady who swallowed a " and the final refrain.
This structure allows us to access any piece of a verse using a simple index. For example, to get the data for the "bird" (the 3rd animal), we can use index 2 (since we start counting from 0) to access animals[2], reactions[2], and so on.
Here is an ASCII diagram illustrating this data structure:
Memory Address
(e.g., 0x400080)
┌────────────────┐
│ animals_ptr │ Array of Pointers
└────────────────┘
│ │ │
0x...0 │ ● │ [0]─────┼───────────► "fly"
│ │ │
0x...8 │ ● │ [1]─────┼────────────────► "spider"
│ │ │
0x...16│ ● │ [2]─────┼─────────────────────► "bird"
│ │ │
│ ... │ │
└──────┴─────────┘
The Logic Strategy: Building the Nested Loops
The program's logic will reside in the .text section, starting from the _start label. The core of the algorithm is a set of nested loops.
Outer Loop (Verse Loop):
This loop will iterate through each animal, from the fly to the horse. We'll use a register, say x19, as our main verse counter (or index).
- Initialization: Set
x19to 0. - Loop Body:
- Print the first line: "I know an old lady who swallowed a ".
- Use the index
x19to get the address of the current animal's name from ouranimalspointer array and print it. - Use
x19to get the address of the unique reaction line from thereactionspointer array and print it. - Check if this is the last verse (the horse). If so, print the special ending and exit the program.
- If not the last verse, start the inner loop.
- Increment: Add 1 to
x19. - Condition: Compare
x19to the total number of verses. If it's less, loop again.
Inner Loop (Cumulative Loop):
This loop is responsible for printing the "She swallowed the X to catch the Y" part. It needs to iterate downwards from the current animal to the fly.
- Initialization: Use another register, say
x20, and set it equal to the current outer loop index (x19). - Loop Body:
- Use the index
x20to get the address of the appropriate catch phrase from ourcatch_phrasesarray. Note that the array indices will need careful management here. For example, `catch_phrases[0]` might be "She swallowed the spider to catch the fly". So the index into `catch_phrases` would be `x20 - 1`. - Print the catch phrase.
- Use the index
- Decrement: Subtract 1 from
x20. - Condition: Loop as long as
x20is greater than 0.
After the inner loop completes, we print the common refrain ("I don't know why...") and then the outer loop continues to the next verse.
This logic is visualized in the following flow diagram:
● Start
│
▼
┌──────────────────┐
│ verse_idx = 0 │
└─────────┬────────┘
│
▼
┌───────────┴───────────┐
│ Loop (verse_idx < 8) │
└───────────┬───────────┘
│
├─► Print "I know an old lady..."
│
├─► Print animal[verse_idx]
│
├─► Print reaction[verse_idx]
│
▼
◆ Is it the horse? ◆
╱ ╲
Yes No
│ │
▼ ▼
┌───────────┐ ┌──────────────────┐
│ Print End │ │ c_idx = verse_idx│
│ Exit │ └─────────┬────────┘
└───────────┘ │
▼
┌─────────────┴──────────────┐
│ Loop (c_idx > 0) │
└─────────────┬──────────────┘
│
├─► Print catch_phrase[c_idx-1]
│
├─► c_idx--
│
▼ Loop until c_idx <= 0
│
├─► Print "I don't know why..."
│
▼
verse_idx++
│
└─ Loop back to top
The Complete Arm64 Assembly Solution
Below is the full, commented source code for generating the Food Chain song. This code is designed for a Linux environment on an AArch64 architecture and can be assembled using as and linked using ld.
// food-chain.s
// Solution for the Food Chain song generation from the kodikra.com learning path.
// Assembled with: as food-chain.s -o food-chain.o
// Linked with: ld food-chain.o -o food-chain
.data
// --- Common Phrases ---
line1_part1: .asciz "I know an old lady who swallowed a "
line1_part2: .asciz ".\n"
line_final_common1: .asciz "I don't know why she swallowed the fly. Perhaps she'll die.\n"
line_final_horse: .asciz "She's dead, of course!\n"
newline: .asciz "\n"
// --- Animal Names ---
animal_fly: .asciz "fly"
animal_spider: .asciz "spider"
animal_bird: .asciz "bird"
animal_cat: .asciz "cat"
animal_dog: .asciz "dog"
animal_goat: .asciz "goat"
animal_cow: .asciz "cow"
animal_horse: .asciz "horse"
// --- Unique Reaction Lines ---
reaction_fly: .asciz "" // The first verse has no unique second line
reaction_spider: .asciz "It wriggled and jiggled and tickled inside her.\n"
reaction_bird: .asciz "How absurd to swallow a bird!\n"
reaction_cat: .asciz "Imagine that, to swallow a cat!\n"
reaction_dog: .asciz "What a hog, to swallow a dog!\n"
reaction_goat: .asciz "Just opened her throat and swallowed a goat!\n"
reaction_cow: .asciz "I don't know how she swallowed a cow!\n"
reaction_horse: .asciz "" // The last verse has a special ending instead
// --- Cumulative Catch Phrases ---
catch_spider: .asciz "She swallowed the spider to catch the fly.\n"
catch_bird: .asciz "She swallowed the bird to catch the spider that wriggled and jiggled and tickled inside her.\n"
catch_cat: .asciz "She swallowed the cat to catch the bird.\n"
catch_dog: .asciz "She swallowed the dog to catch the cat.\n"
catch_goat: .asciz "She swallowed the goat to catch the dog.\n"
catch_cow: .asciz "She swallowed the cow to catch the goat.\n"
// --- Pointer Arrays for Indexed Access ---
animals:
.quad animal_fly, animal_spider, animal_bird, animal_cat, animal_dog, animal_goat, animal_cow, animal_horse
reactions:
.quad reaction_fly, reaction_spider, reaction_bird, reaction_cat, reaction_dog, reaction_goat, reaction_cow, reaction_horse
catch_phrases:
.quad catch_spider, catch_bird, catch_cat, catch_dog, catch_goat, catch_cow
.text
.global _start
// A simple strlen function to calculate string length
// Input: x0 = address of the string
// Output: x0 = length of the string
strlen:
mov x1, x0 // x1 will be the moving pointer
mov x2, #0 // x2 will be the counter (length)
strlen_loop:
ldrb w3, [x1], #1 // Load byte and post-increment pointer
cmp w3, #0 // Check for null terminator
b.eq strlen_end // If null, we're done
add x2, x2, #1 // Increment length
b strlen_loop
strlen_end:
mov x0, x2 // Return length in x0
ret
// A function to print a string
// Input: x0 = address of the string
print_string:
stp x29, x30, [sp, #-16]! // Save frame pointer and link register
mov x29, sp
mov x1, x0 // Save string address in x1
bl strlen // Get length, result is in x0
mov x2, x0 // x2 = length
mov x0, #1 // x0 = file descriptor (1 for stdout)
// x1 is already the address
mov x8, #64 // x8 = syscall number for write
svc #0 // Make the system call
ldp x29, x30, [sp], #16 // Restore registers
ret
_start:
// x19 will be our main verse index (0 to 7)
mov x19, #0
verse_loop:
// Compare index with 8 (total number of verses)
cmp x19, #8
b.ge exit_program // If index >= 8, we are done
// --- Print Verse Header ---
// "I know an old lady who swallowed a [animal]."
ldr x0, =line1_part1
bl print_string
// Load address of the current animal's name string
ldr x0, =animals
mov x1, x19
lsl x1, x1, #3 // Multiply index by 8 (size of a pointer)
ldr x0, [x0, x1]
bl print_string
ldr x0, =line1_part2
bl print_string
// --- Handle Special Last Verse ---
cmp x19, #7 // Is it the horse?
b.eq verse_horse
// --- Print Unique Reaction Line (for verses 1-6) ---
cmp x19, #0 // The fly has no unique reaction line
b.ne print_reaction
b continue_verse
print_reaction:
ldr x0, =reactions
mov x1, x19
lsl x1, x1, #3
ldr x0, [x0, x1]
bl print_string
continue_verse:
// --- Print Cumulative Part (Inner Loop) ---
// x20 will be the inner loop counter, from current verse index down to 1
mov x20, x19
cumulative_loop:
cmp x20, #0
b.le end_cumulative_loop // Loop until counter is 0
// The catch_phrases array is offset.
// When verse is 1 (spider), we need catch_phrases[0].
// So we use index (x20 - 1).
sub x1, x20, #1
ldr x0, =catch_phrases
lsl x1, x1, #3
ldr x0, [x0, x1]
bl print_string
sub x20, x20, #1 // Decrement inner loop counter
b cumulative_loop
end_cumulative_loop:
// --- Print Final Common Lines ---
ldr x0, =line_final_common1
bl print_string
// Add a blank line between verses for readability
ldr x0, =newline
bl print_string
// --- Prepare for Next Verse ---
add x19, x19, #1 // Increment outer loop index
b verse_loop
verse_horse:
// Special handling for the last verse (the horse)
ldr x0, =line_final_horse
bl print_string
b exit_program
exit_program:
mov x0, #0 // Exit code 0 (success)
mov x8, #93 // Syscall number for exit
svc #0 // Make the system call
Detailed Code Walkthrough
Let's dissect the code to understand how each part contributes to the final output.
The .data Section
This is where we declare all our static data—the strings that make up the song. We use .asciz which automatically adds a null terminator (\0) to the end of each string. This is crucial for our strlen function.
The most important structures here are the pointer arrays: animals, reactions, and catch_phrases. We use .quad to define 8-byte (64-bit) memory addresses. When the program is loaded, the linker replaces labels like animal_fly with their actual memory addresses. This setup allows us to fetch the correct string using a calculated offset from the base address of the array, which is the core of our algorithmic approach.
The Helper Functions: strlen and print_string
To avoid repetitive code, we create two helper subroutines.
strlen: This function takes a string's address in registerx0. It iterates byte by byte (ldrb) from that address until it finds the null terminator, counting the bytes as it goes. The final count (the length) is returned inx0.print_string: This is our workhorse for I/O. It first callsstrlento get the length of the string. Then, it sets up the registers for the Linuxwritesyscall (number 64) according to the AArch64 ABI:x0: File descriptor (1 for standard output).x1: Address of the buffer (our string).x2: Number of bytes to write (the length).x8: The syscall number.
svc #0triggers the kernel to perform the write operation. The use ofstpandldpensures we save and restore the link register (x30), making our function calls safe.
The Main Logic: _start and the Loops
The program execution begins at the _start label.
verse_loop (Outer Loop):
We initialize a persistent register, x19, to 0. This register holds the index of the current verse. The loop continues as long as x19 is less than 8. Inside the loop, we perform pointer arithmetic to access our data arrays. The instruction lsl x1, x1, #3 is a fast way to multiply the index by 8 (left shifting by 3 bits), which gives us the correct byte offset into our pointer arrays.
A crucial part is the check for the final verse (cmp x19, #7). If it's the horse, we branch to a special handler verse_horse to print the unique ending and exit.
cumulative_loop (Inner Loop):
This loop is nested inside the outer one. It copies the current verse index from x19 to x20. It then decrements x20, printing the corresponding "catch phrase" at each step. The index into catch_phrases is x20 - 1 because this array is smaller and offset (it starts with catching the spider, not the fly). The loop terminates when x20 reaches 0.
Program Termination
The program ends by calling the exit syscall (number 93). We load x0 with the exit code (0 for success) and execute svc #0. This gracefully terminates the program and returns control to the operating system.
Alternative Approaches and Considerations
While our iterative, loop-based approach is efficient and clear, it's not the only way to solve this problem.
- Recursive Approach: The cumulative nature of the song lends itself to a recursive solution. One could write a function that, for a given verse `N`, calls itself for verse `N-1` before printing its own unique line. However, managing the stack frame (pushing and popping registers and return addresses) in assembly is significantly more complex and error-prone than a simple loop.
- Pre-calculating String Lengths: Our
strlenfunction is called multiple times for the same string. A performance optimization would be to calculate all lengths once at the start or, even better, define them as constants in the.datasection alongside the strings. This trades a tiny bit of binary size for faster execution, though the difference would be negligible for this program. - Comparison with a High-Level Language (Python): It's instructive to see how simple this is in a language like Python, which highlights the abstractions we are working without.
This comparison makes the verbosity and complexity of the assembly version apparent, but also showcases the raw control assembly provides.# A simplified Python equivalent animals = ["fly", "spider", "bird", ...] reactions = ["", "It wriggled...", ...] catch_phrases = ["She swallowed the spider to catch the fly.", ...] for i in range(len(animals)): print(f"I know an old lady who swallowed a {animals[i]}.") if reactions[i]: print(reactions[i]) if i == len(animals) - 1: print("She's dead, of course!") break for j in range(i, 0, -1): print(catch_phrases[j-1]) print("I don't know why she swallowed the fly. Perhaps she'll die.\n")
Pros & Cons of the Assembly Approach
| Pros | Cons |
|---|---|
| Maximum Performance & Efficiency: Direct CPU instructions and no runtime overhead result in the fastest possible execution. | High Complexity & Verbosity: Requires many lines of code for simple tasks like printing a string. |
| Complete Control: You manage every byte of memory and every CPU cycle. There are no hidden abstractions. | Platform Dependent: This code is specific to AArch64 architecture and the Linux ABI. It is not portable. |
| Excellent Learning Tool: Provides unparalleled insight into computer architecture, memory, and OS interaction. | Difficult to Debug: Errors often result in a `Segmentation fault` with little context, requiring tools like `gdb`. |
| Minimal Binary Size: The final executable is extremely small as it links against no external libraries. | Slow Development Time: Writing, testing, and debugging assembly is a meticulous and time-consuming process. |
FAQ: Food Chain in Arm64 Assembly
- Why use syscalls directly instead of linking against a C library like libc?
-
Using syscalls directly creates a "static" executable with zero external dependencies. This makes the binary extremely small and self-contained. While linking against libc would provide convenient functions like
printfandstrlen, it would abstract away the very low-level details we are trying to learn in this kodikra module, such as register conventions and kernel communication. - What exactly does the
svc #0instruction do? -
svcstands for Supervisor Call. It's an instruction that generates a software interrupt, causing the CPU to switch from user mode to the more privileged kernel mode. The operating system's kernel then handles the interrupt, reads the syscall number from registerx8, and executes the requested operation (like writing to the screen or exiting) on behalf of the program. - How is string length handled in this assembly code?
-
We use null-terminated strings, a convention popularized by the C language. Each string in the
.datasection ends with a zero byte (\0). Our customstrlenfunction dynamically calculates the length at runtime by iterating through memory from the string's start address until it encounters this null byte. - Could this code be made more memory-efficient?
-
Yes, slightly. There is some redundancy in the
catch_phrasesstrings. For example, "to catch the fly" appears multiple times. A more advanced solution could construct these lines dynamically from smaller string fragments to reduce the overall size of the.datasection. However, this would significantly increase the complexity of the code logic. - Is this code portable to other architectures like x86-64?
-
No, this code is not portable. Assembly language is, by definition, architecture-specific. The instruction set (
mov,ldr,svc), the number and names of registers (x0,x19), and the system call conventions are all unique to Arm64 (AArch64). Porting this to x86-64 would require a complete rewrite using x86-64 instructions and its different ABI. - What tools are needed to assemble and run this code?
-
On a Linux system with the necessary build tools, you would use the GNU Assembler (
as) and Linker (ld). The commands are typically:
These tools are part of the `binutils` package, which is standard on most Linux distributions.as food-chain.s -o food-chain.o ld food-chain.o -o food-chain ./food-chain - Why is the first verse (the fly) handled differently from the others?
-
The first verse is the base case. It has no unique reaction line and no cumulative "catch" part. Our code handles this implicitly: the inner cumulative loop (
cumulative_loop) starts with its counter at 0, so its condition (cmp x20, #0followed byb.le) fails immediately, causing it to be skipped entirely for the first verse.
Conclusion: From Instructions to Art
You have successfully journeyed from high-level problem-solving to low-level implementation, transforming a simple song's pattern into a precise sequence of Arm64 instructions. By completing this challenge, you've done more than just generate text; you've practiced fundamental computer science concepts like data structures, pointer arithmetic, control flow, and direct operating system interaction.
This exercise from the kodikra Arm64-assembly track demonstrates that assembly language is not just for writing device drivers or optimizing kernels. It is a powerful tool for understanding how software truly works. The discipline it requires—meticulous planning, careful register management, and a deep awareness of the underlying hardware—makes you a stronger, more knowledgeable programmer in any language.
Continue exploring the fascinating world of low-level programming by checking out the rest of the modules in our complete Arm64-assembly learning path. Keep building, keep learning, and keep commanding the machine.
Disclaimer: The code in this article is written for the AArch64 architecture using the Linux ABI for system calls. It is current as of the latest stable Linux kernel standards. Conventions may differ on other operating systems like macOS or Windows on ARM.
Published by Kodikra — Your trusted Arm64-assembly learning resource.
Post a Comment