Minesweeper in Arm64-assembly: Complete Solution & Deep Dive Guide

Mastering Arm64 Assembly: A Deep Dive into Solving Minesweeper

Implementing the classic Minesweeper annotation logic in Arm64 assembly involves iterating through a string-based board representation. For each empty square, the program checks eight adjacent positions for mines ('*'), counts them, and then converts the resulting integer into its ASCII character equivalent to update the board.

You've likely spent hours clicking away at Minesweeper, but have you ever considered the raw, low-level logic that powers it? For many developers, assembly language feels like a cryptic relic of a bygone era. The thought of manually managing registers, memory addresses, and CPU instructions can be intimidating, a steep cliff in the landscape of modern high-level programming languages. You might feel that it's a skill reserved for a select few who build operating systems or embedded firmware.

This guide is here to shatter that illusion. By tackling a familiar problem—annotating a Minesweeper board—we will demystify Arm64 assembly. We'll transform abstract concepts like pointer arithmetic and register allocation into concrete tools you can use to solve a real-world puzzle. This isn't just about theory; it's a hands-on journey that will give you a profound understanding of how software truly interacts with hardware, making you a more effective and knowledgeable programmer in any language.

What is the Minesweeper Annotation Problem?

The core task, drawn from the exclusive kodikra.com learning curriculum, is not to build a full playable game, but to process a completed board state. You are given a board represented as a grid of characters. This grid contains only three types of characters: a mine ('*'), an empty square (' '), and newlines ('\n') to delineate rows.

Your objective is to create a new board where every empty square is replaced with a digit representing the number of mines adjacent to it. Adjacency is defined as any of the eight surrounding squares: horizontally, vertically, and diagonally. If an empty square has no adjacent mines, it should remain an empty square in the output.

Input and Output Example

Consider the following 5x4 input board, represented as a single string:


+-----+
| * * |
|  *  |
|  *  |
|     |
+-----+

The corresponding string input for our Arm64 function would be: "* * \n * \n * \n \n".

After your assembly program processes this input, the expected output string should be:


+-----+
|*3*2 |
|13*2 |
| 2*1 |
| 111 |
+-----+

Notice how each original empty square is now either a digit or remains empty if it had no neighboring mines (though in this specific example, all empty squares have at least one neighbor).

Why Use Arm64 Assembly for This Task?

While you could solve this problem far more quickly in Python or JavaScript, using Arm64 assembly provides unparalleled educational value. This exercise forces you to move beyond the abstractions of high-level languages and engage directly with the machine's architecture.

Fundamental Understanding: You gain a deep appreciation for memory layout. The 2D board is just a 1D sequence of bytes in memory. You'll learn to navigate this sequence using pointer arithmetic, which is the foundation of how all complex data structures are ultimately stored.
CPU Register Management: You are in complete control of the CPU's general-purpose registers (like x0, x1, w3, etc.). This teaches you how data is moved, manipulated, and stored within the processor itself, a process hidden by compilers in other languages.
Performance Insights: While our solution isn't focused on extreme optimization, working in assembly makes you acutely aware of computational cost. Every instruction matters, giving you insights into writing more efficient code, even in high-level languages.
Modern Relevance: Arm64 isn't just for mobile phones anymore. It powers a significant and growing portion of the computing world, from Apple's M-series MacBooks to high-performance servers like AWS Graviton. Understanding its instruction set is a valuable and forward-looking skill.

Tackling this problem is like a musician learning to build their own instrument. It gives you a level of intimacy and control that makes you a master of your craft.

How Does the Algorithm Work? (The Core Logic)

Before diving into the assembly code, it's crucial to understand the algorithm step-by-step. The entire logic revolves around iterating through the input string and, for each character, performing a series of checks and calculations.

Step 1: Board Representation & Dimensions

The board is a single, null-terminated string. The first challenge is to determine the width of the board. Since the rows can vary in length in a generic text representation, a robust approach is to find the first newline character ('\n'). The number of characters before it, plus one (for the newline itself), gives us the "stride" or row width. This stride is essential for calculating the positions of neighbors in adjacent rows.

Step 2: The Main Iteration Loop

The program uses two pointers: one for reading from the input string (input_ptr) and one for writing to the output string (output_ptr). It loops through the input string one character at a time until it hits the null terminator (\0).

Step 3: The Decision Point

Inside the loop, for each character read, the program decides what to do:

If the character is a mine ('*') or a newline ('\n'), it's copied directly from the input to the output. The pointers are then advanced, and the loop continues.
If the character is an empty square (' '), the core mine-counting logic is triggered.

Step 4: Neighbor Checking and Counting

This is the heart of the algorithm. When an empty square is found, we must inspect its eight neighbors. Using the current position (input_ptr) and the pre-calculated row stride, we can find the memory address of each neighbor:

Top-Left: input_ptr - stride - 1
Top: input_ptr - stride
Top-Right: input_ptr - stride + 1
Left: input_ptr - 1
Right: input_ptr + 1
Bottom-Left: input_ptr + stride - 1
Bottom: input_ptr + stride
Bottom-Right: input_ptr + stride + 1

For each of these calculated addresses, we read the byte and check if it's a mine ('*'). A counter is incremented for each mine found. Boundary checks are implicitly handled by the nature of the input; we don't need to worry about reading past the beginning or end of the string if the input board is correctly formatted with boundaries.

Step 5: Writing the Result

After checking all eight neighbors, we look at the final mine count:

If the count is 0, we write an empty space (' ') to the output string.
If the count is greater than 0, we must convert the integer count (e.g., 3) to its ASCII character representation ('3'). This is done by adding the ASCII value of '0'. For example, 3 + '0' results in the ASCII code for '3'. This character is then written to the output string.

This entire process is visualized in the following flow diagram:

    ● Start: annotate(output, input)
    │
    ▼
  ┌─────────────────────────┐
  │  Calculate Row Stride   │
  │ (Find first '\n')       │
  └───────────┬─────────────┘
              │
              ▼
  ┌─────────────────────────┐
  │ Loop through input char │
  └───────────┬─────────────┘
              │
              ▼
    ◆ Is char ' ' ?
   ╱           ╲
  Yes           No
  │              │
  │              ▼
  │            ┌────────────────────┐
  │            │ Copy char to output│
  │            └─────────┬──────────┘
  │                      │
  ▼                      │
┌──────────────────┐     │
│ Reset Mine Count │     │
└────────┬─────────┘     │
         │               │
         ▼               │
┌──────────────────┐     │
│ Check 8 Neighbors│     │
│ (Pointer Math)   │     │
└────────┬─────────┘     │
         │               │
         ▼               │
   ◆ Mine Count > 0?     │
  ╱           ╲          │
 Yes           No        │
 │              │        │
 ▼              ▼        │
┌───────────┐  ┌───────────┐
│ Convert   │  │ Write ' ' │
│ Count to  │  │ to output │
│ ASCII &   │  └─────┬─────┘
│ Write     │        │
└─────┬─────┘        │
      │              │
      └───────┬──────┘
              │
              ▼
    ◆ End of string?
   ╱           ╲
  No            Yes
  │              │
  └──────────────┘
  │
  ▼
 ● End

Where are the Key Components in the Arm64 Code? (Code Walkthrough)

Now, let's dissect the provided Arm64 assembly solution from the kodikra module. This code elegantly implements the algorithm we just described. We will break it down section by section.

Register Allocation

First, understanding the role of each register is key. The comments in the original file provide a great summary:

Register	Usage	Type	Description
`x0`	input/output	address	Pointer to the null-terminated output string.
`x1`	input	address	Pointer to the null-terminated input string.
`x2`	temporary	address	A working pointer into the input string.
`w3`	temporary	byte	Holds the current character being processed.
`w4`	temporary	byte	Holds a neighbor's character.
`w5`	temporary	counter	Counts adjacent mines.
`x6`	local var	address	Stores the row stride (width).
`x7`	temporary	address	Pointer to a neighbor's location.

Note the convention: x registers (e.g., x0) are 64-bit and used for addresses (pointers), while w registers (e.g., w3) are the lower 32 bits of the corresponding x register and are used for smaller data like characters or counters.

The Full Code


.text
.globl annotate

/*
 * # | Register | Usage        | Type    | Description                           |
 * # | -------- | ------------ | ------- | ------------------------------------- |
 * # | `x0`     | input/output | address | null-terminated output string         |
 * # | `x1`     | input        | address | null-terminated input string          |
 * # | `x2`     | temporary    | address | pointer into input                    |
 * # | `w3`     | temporary    | byte    | current char                          |
 * # | `w4`     | temporary    | byte    | neighbor char                         |
 * # | `w5`     | temporary    | counter | adjacent mine count                   |
 * # | `x6`     | local var    | address | row stride                            |
 * # | `x7`     | temporary    | address | neighbor pointer                      |
 */
annotate:
    // Function prologue: save callee-saved registers
    stp x19, x20, [sp, -16]!
    stp x21, x22, [sp, -16]!

    // Copy input pointer to our working pointer x2
    mov x2, x1

    // Find the row stride (width)
find_stride:
    ldrb w3, [x2], 1 // Load byte from x2 into w3, then increment x2
    cmp w3, #'\n'
    b.ne find_stride
    sub x6, x2, x1   // Stride = current_ptr - start_ptr
    mov x2, x1       // Reset working pointer to the beginning

loop_board:
    ldrb w3, [x2]      // Load current character
    cmp w3, #' '
    b.ne copy_char   // If not a space, just copy it

    // It's a space, so we count adjacent mines
    mov w5, #0         // Reset mine counter

    // Check 8 neighbors
    // Neighbor 1: Top-Left
    sub x7, x2, x6
    sub x7, x7, #1
    ldrb w4, [x7]
    cmp w4, #'*'
    b.ne check_n2
    add w5, w5, #1

check_n2: // Top
    sub x7, x2, x6
    ldrb w4, [x7]
    cmp w4, #'*'
    b.ne check_n3
    add w5, w5, #1

check_n3: // Top-Right
    sub x7, x2, x6
    add x7, x7, #1
    ldrb w4, [x7]
    cmp w4, #'*'
    b.ne check_n4
    add w5, w5, #1

check_n4: // Left
    sub x7, x2, #1
    ldrb w4, [x7]
    cmp w4, #'*'
    b.ne check_n5
    add w5, w5, #1

check_n5: // Right
    add x7, x2, #1
    ldrb w4, [x7]
    cmp w4, #'*'
    b.ne check_n6
    add w5, w5, #1

check_n6: // Bottom-Left
    add x7, x2, x6
    sub x7, x7, #1
    ldrb w4, [x7]
    cmp w4, #'*'
    b.ne check_n7
    add w5, w5, #1

check_n7: // Bottom
    add x7, x2, x6
    ldrb w4, [x7]
    cmp w4, #'*'
    b.ne check_n8
    add w5, w5, #1

check_n8: // Bottom-Right
    add x7, x2, x6
    add x7, x7, #1
    ldrb w4, [x7]
    cmp w4, #'*'
    b.ne finish_check
    add w5, w5, #1

finish_check:
    cmp w5, #0
    b.eq copy_char // If count is 0, treat as a normal space
    add w3, w5, #'0' // Convert count to ASCII

copy_char:
    strb w3, [x0], 1 // Store the character (original or count) to output and increment output pointer
    add x2, x2, #1   // Increment input pointer
    cmp w3, #0       // Check if we just wrote the null terminator
    b.ne loop_board

    // Function epilogue: restore registers and return
    ldp x21, x22, [sp], 16
    ldp x19, x20, [sp], 16
    ret

Code Breakdown

1. Function Prologue and Setup


annotate:
    stp x19, x20, [sp, -16]!
    stp x21, x22, [sp, -16]!
    mov x2, x1

The function begins with the annotate: label. The stp (Store Pair) instructions save callee-saved registers (x19-x22) to the stack. This is standard procedure to ensure that our function doesn't overwrite values that the calling function might need. We then copy the input address from x1 into our working pointer x2.

2. Calculating the Row Stride


find_stride:
    ldrb w3, [x2], 1
    cmp w3, #'\n'
    b.ne find_stride
    sub x6, x2, x1
    mov x2, x1

This is a clever loop to find the board's width. ldrb w3, [x2], 1 is a post-index load: it loads the byte at the address in x2 into w3, and then increments x2 by 1. The loop continues until a newline ('\n') is found. At that point, x2 points just after the newline. The stride is calculated by subtracting the start address (x1) from the current address (x2), and the result is stored in x6. Finally, x2 is reset to the start of the input string.

3. The Main Loop


loop_board:
    ldrb w3, [x2]
    cmp w3, #' '
    b.ne copy_char

This is the main loop's entry point. We load the current character into w3. We compare it to a space. If it's not a space (b.ne stands for "Branch if Not Equal"), we jump to the copy_char section to handle it as a mine, newline, or null terminator.

4. Neighbor Checking Logic


    // It's a space, so we count adjacent mines
    mov w5, #0         // Reset mine counter

    // Check 8 neighbors
    // Neighbor 1: Top-Left
    sub x7, x2, x6
    sub x7, x7, #1
    ldrb w4, [x7]
    cmp w4, #'*'
    b.ne check_n2
    add w5, w5, #1

check_n2: // Top
    ... // and so on for all 8 neighbors

If the character was a space, we first reset our mine counter w5 to zero. Then begins a sequential check of all eight neighbors. The logic is identical for each one:

Calculate the neighbor's address using the current pointer x2 and the stride x6. Store it in x7.
Load the byte from that address into w4.
Compare w4 to the mine character '*'.
If it's not a mine, branch to the next check (e.g., b.ne check_n2).
If it is a mine, increment the counter w5.

This chain of checks is straightforward but verbose. It's a classic example of unrolled logic in assembly.

5. Converting and Storing the Result


finish_check:
    cmp w5, #0
    b.eq copy_char
    add w3, w5, #'0'

copy_char:
    strb w3, [x0], 1
    add x2, x2, #1
    cmp w3, #0
    b.ne loop_board

After all neighbors are checked, finish_check determines the output. If the mine count w5 is zero, we branch to copy_char, where the original space character (still in w3) will be written. If the count is non-zero, we convert it to ASCII by adding '0' to it. The result is stored back into w3, overwriting the original space.

The copy_char label is the unified exit point. strb w3, [x0], 1 stores the final character into the output buffer and increments the output pointer x0. We also increment our input pointer x2. Finally, we check if the character we just wrote was the null terminator (ASCII value 0). If not, we loop back to loop_board. If it was, the loop terminates.

6. Function Epilogue


    ldp x21, x22, [sp], 16
    ldp x19, x20, [sp], 16
    ret

The function cleans up by restoring the saved registers from the stack using ldp (Load Pair) and then executes ret to return control to the caller.

Data Flow Visualization

This diagram illustrates how data moves between registers and memory during the neighbor-checking phase for a single empty cell.

    ● Input Ptr `x2` points to ' '
    │
    ▼
  ┌────────────────────────┐
  │ Reset Mine Count `w5`  │
  └───────────┬────────────┘
              │
              ▼
  ┌────────────────────────┐
  │ Calculate Neighbor Addr│
  │ `x7` = `x2` - `x6` - 1 │
  └───────────┬────────────┘
              │
              ▼
  ┌────────────────────────┐
  │ Load Neighbor Char     │
  │ `ldrb w4, [x7]`        │
  └───────────┬────────────┘
              │
              ▼
      ◆ `w4` == '*' ?
     ╱           ╲
    Yes           No
    │              │
    ▼              ▼
  ┌───────────┐  ┌───────────┐
  │ Increment │  │ Branch to │
  │ `w5`      │  │ next check│
  └───────────┘  └───────────┘
    │
    └───────────────┐
                    │
                    ▼
          (Repeat for all 8 neighbors)
                    │
                    ▼
  ┌────────────────────────┐
  │ Convert `w5` to ASCII  │
  │ `w3` = `w5` + '0'      │
  └───────────┬────────────┘
              │
              ▼
    ● Store `w3` to Output Ptr `x0`

When to Optimize and Potential Pitfalls

The provided solution is clear, correct, and functional. It prioritizes readability over absolute performance. However, in the world of assembly, there's always room for optimization and potential traps to be aware of.

Potential Optimizations

Reducing Redundant Calculations: In the neighbor-checking sequence, addresses are recalculated repeatedly. For example, sub x7, x2, x6 is done for all three top neighbors. A more optimized version could calculate the "top row" address once and then check the three neighbors from there.
Looping vs. Unrolling: The current code completely unrolls the neighbor check. An alternative would be to create a loop that iterates 8 times, using a pre-defined array of offsets `[-stride-1, -stride, -stride+1, -1, 1, ...]` to calculate neighbor addresses. This would result in smaller code size but might be slightly slower due to loop overhead.
Conditional Instructions: ARM instructions can be executed conditionally. Instead of a `cmp` followed by a `b.ne`, one could use a conditional increment. For example: cmp w4, #'*'; addeq w5, w5, #1. This can sometimes reduce branching and improve pipeline performance.

Risks and Common Pitfalls

Writing assembly code is unforgiving. A small mistake can lead to segmentation faults or incorrect results.

Risk / Pitfall	Description
Off-by-One Errors	Incorrect pointer arithmetic is the most common bug. Forgetting to account for a newline in the stride or miscalculating an offset can cause the program to read from the wrong memory location.
Register Clobbering	Forgetting to save and restore callee-saved registers (like `x19-x28`) can introduce bizarre bugs in the calling function, which are extremely difficult to trace.
Incorrect Data Size	Using a 64-bit instruction (like `ldr`) when you mean to load a single byte (`ldrb`) can read too much data and corrupt your logic. The distinction between `x` and `w` registers is critical.
Faulty Assumptions	The code assumes the input is well-formed (e.g., rectangular, surrounded by a border that prevents out-of-bounds reads on edges). On malformed input, it could crash. A production-ready version would need more robust boundary checks.

Frequently Asked Questions (FAQ)

Why is the board represented as a single null-terminated string?: This is a common convention in C and assembly programming. It simplifies function interfaces, as you only need to pass a single pointer. The program can then determine the size and dimensions of the data by scanning for delimiters like newlines and the final null character.
How does the code handle the board's width and height without them being passed as arguments?: The code dynamically calculates the width (stride) by finding the first newline character. It doesn't need the height; the main loop simply processes characters until it encounters the null terminator, which marks the end of the entire string, regardless of how many rows there are.
What does the .globl annotate directive do?: .globl (or .global) is an assembler directive that makes a symbol visible to the linker. In this case, it makes the annotate label (the start of our function) accessible from other object files, allowing a C or other program to call this assembly function.
Why use w registers for characters and x registers for addresses?: In the Arm64 architecture, x registers are 64-bit wide, which is necessary to hold a full memory address on a 64-bit system. Characters (like ASCII) are typically 8 bits (1 byte). The w registers are the lower 32 bits of the corresponding x registers. Using them for byte-level operations is efficient and conventional, as instructions like ldrb load a byte and zero-extend it into the 32-bit w register.
How is the integer mine count converted to an ASCII character?: This is achieved with a simple trick based on how character codes are arranged. The ASCII codes for digits '0' through '9' are sequential. Therefore, to convert an integer `N` (where 0 ≤ N ≤ 9) to its character equivalent, you simply add the ASCII value of '0' to it. For example, in ASCII, '0' is 48. The integer 3 + 48 = 51, which is the ASCII code for the character '3'. The instruction add w3, w5, #'0' performs this exact operation.
What are the main challenges when debugging Arm64 assembly code?: The primary challenge is the lack of context. You are working with raw memory and registers. Debuggers like GDB are essential for stepping through instructions one by one, inspecting register values, and examining memory contents. Common issues involve incorrect pointer arithmetic, stack corruption, and misunderstanding instruction side effects (like post-indexing).

Conclusion

You have now walked through a complete, non-trivial problem solved with Arm64 assembly. We've deconstructed the logic, analyzed the code line by line, and explored the nuances of low-level programming. This exercise from the kodikra.com Arm64-assembly track demonstrates that assembly is not an arcane art but a logical and systematic way of instructing a computer at its most fundamental level.

The skills you've observed here—pointer arithmetic, register management, and algorithmic thinking in a resource-constrained environment—are timeless. They build a solid foundation that will enhance your problem-solving abilities and deepen your understanding of how high-level code is ultimately executed by the processor. This knowledge empowers you to write better, more efficient code, no matter which programming language you use for your day-to-day work.

Disclaimer: The code and logic discussed are based on the ARMv8-A architecture (Arm64). The specific instructions and register conventions are applicable to this architecture and may differ on other systems like x86-64.

Ready to continue your journey into low-level mastery? Explore our full Arm64-assembly Learning Path to tackle more challenges and solidify your skills.

Published by Kodikra — Your trusted Arm64-assembly learning resource.

kodikra

Search this blog