Rectangles in Arm64-assembly: Complete Solution & Deep Dive Guide

The Ultimate Guide to Counting Rectangles in Arm64 Assembly

Learn to count rectangles in an ASCII diagram using Arm64 assembly. This guide breaks down the brute-force algorithm, memory manipulation, and nested loop structures required to parse a 2D grid and identify geometric shapes at the lowest level of programming for maximum performance and educational insight.

Ever stared at a block of text art and wondered how a computer truly 'sees' it? Not as a picture, but as a raw grid of bytes in memory. Now, imagine the challenge of teaching a machine to find geometric shapes within that grid, like rectangles, using only the most fundamental CPU instructions. It’s a task that strips away all abstraction, forcing you to think like the processor itself. This is the fascinating world of Arm64 assembly, and it's the journey we're embarking on today.

This deep dive isn't just about solving a puzzle; it's about mastering memory access, register management, and algorithmic thinking at the hardware level. By the end of this guide, you will not only have a working Arm64 assembly solution from the exclusive kodikra.com curriculum but also a profound appreciation for the intricate dance between software logic and silicon.

What is the ASCII Rectangle Counting Problem?

The problem is conceptually simple yet implementation-wise complex, especially in a low-level language. You are given a diagram composed of ASCII characters, representing a grid. The goal is to write a function that counts how many valid rectangles are present in this diagram.

A valid rectangle is defined by specific characters:

Corners: Must be a plus sign (+).
Horizontal Edges: Must be composed of minus signs (-) or plus signs (+).
Vertical Edges: Must be composed of vertical bars (|) or plus signs (+).

Consider this example input diagram:


+--+
|  |
+--+

This diagram contains exactly one rectangle. The corners are at (0,0), (0,3), (2,0), and (2,3). The top and bottom edges are formed by -, and the side edges are formed by |.

The challenge escalates with more complex diagrams where rectangles can overlap or be contained within one another. The algorithm must be robust enough to correctly identify and count every single one, from the smallest 1x1 squares to the largest encompassing shapes.

Why Use Arm64 Assembly for This Task?

In a world dominated by high-level languages like Python or JavaScript, choosing to solve this problem in Arm64 assembly might seem unconventional. However, the reasons are both practical and educational, offering benefits you cannot get from abstracted languages.

Unmatched Performance and Control

Assembly language gives you direct, unmediated control over the processor's resources. For a computationally intensive task like this—which can involve many nested loops—eliminating the overhead of an interpreter or a virtual machine can lead to significant performance gains. Every memory access, every comparison, and every loop is an explicit instruction you write, allowing for micro-optimizations that are impossible at a higher level.

Deepening Your Computer Science Fundamentals

Writing this solution in assembly forces you to confront core computer science concepts head-on. You will manually manage memory pointers, allocate and use registers efficiently, and understand how a 2D data structure is laid out in linear memory. This experience provides an invaluable foundation, making you a better programmer even when you return to high-level languages.

Relevance in Specialized Domains

While not for everyday web development, low-level programming is critical in many fields. This includes embedded systems, operating system development, high-performance computing (HPC), game engine optimization, and security research. The skills honed by solving problems like this are directly transferable to these demanding domains.

How to Solve It: The Algorithmic Approach

To solve this problem, we will employ a robust, if computationally intensive, brute-force algorithm. The strategy is to systematically check every possible pair of corners to see if they form a valid rectangle. This method is clear, logical, and translates well to the procedural nature of assembly language.

The core logic can be broken down into these steps:

Iterate for Top-Left Corner: We'll use two nested loops to iterate through every cell (r1, c1) of the grid. If the character at this cell is a +, we consider it a potential top-left corner of a rectangle.
Iterate for Bottom-Right Corner: For each potential top-left corner, we start another two nested loops to find a potential bottom-right corner (r2, c2). The conditions are that r2 > r1, c2 > c1, and the character at (r2, c2) must also be a +.
Validate Remaining Corners: Once we have a pair of diagonal corners, we must check if the other two corners exist. The character at the potential top-right (r1, c2) and bottom-left (r2, c1) must also be a +. If not, this pair cannot form a rectangle, and we move on.
Validate the Edges: If all four corners are present, the final step is to validate the four edges connecting them.
- The top edge (from c1+1 to c2-1 at row r1) must only contain - or +.
- The bottom edge (from c1+1 to c2-1 at row r2) must only contain - or +.
- The left edge (from r1+1 to r2-1 at column c1) must only contain | or +.
- The right edge (from r1+1 to r2-1 at column c2) must only contain | or +.
Increment Counter: If all four corners and all four edges are valid, we have found a rectangle. We increment our rectangle counter.

This process is repeated until all possible corner pairs have been checked. The final value in our counter is the answer.

High-Level Algorithm Flow

This ASCII diagram illustrates the nested loop structure and validation logic of our approach.

    ● Start
    │
    ▼
  ┌───────────────────────────────┐
  │ Initialize rectangle_count = 0 │
  └───────────────┬───────────────┘
                  │
                  ▼
  Loop r1 from 0 to height-1
  └───────────────┬───────────────┘
                  │
                  ▼
      Loop c1 from 0 to width-1
      └───────────┬───────────┘
                  │
                  ▼
            ◆ grid[r1][c1] == '+' ?
           ╱                     ╲
          Yes                     No
          │                       │
          ▼                       └─────────────────┐
Loop r2 from r1+1 to height-1                       │
└─────────┬───────────────┘                       │
          │                                         │
          ▼                                         │
Loop c2 from c1+1 to width-1                        │
└─────────┬───────────────┘                       │
          │                                         │
          ▼                                         │
    ◆ IsValidRectangle(r1,c1,r2,c2) ?               │
   ╱                               ╲              │
  Yes                               No            │
  │                                 │             │
  ▼                                 ▼             │
┌─────────────────┐             (Continue c2 loop)│
│ Increment count │                 │             │
└─────────────────┘                 │             │
          │                         │             │
          └──────────┬──────────────┘             │
                     ▼                            │
                (Continue c1 loop) ◀──────────────┘
                     │
                     ▼
                (Continue r1 loop)
                     │
                     ▼
    ● End (Return count)

The Complete Arm64 Assembly Solution

Here is the full, commented source code for solving the rectangle counting problem. This solution is designed for the Linux `aarch64` ABI, where arguments are passed in registers `x0`, `x1`, etc., and the result is returned in `w0`.

The function signature in a C-like language would be: int count_rectangles(const char** lines, int height);


// Solution for the Rectangle Counting module from the kodikra.com curriculum.
// Language: Arm64 Assembly (aarch64)

.data
.text
.global count_rectangles

// function: count_rectangles
// arguments:
//   x0: address of an array of strings (char**) representing the grid
//   x1: height of the grid (number of rows)
// returns:
//   w0: the total count of rectangles

count_rectangles:
    // --- Function Prologue ---
    stp x29, x30, [sp, #-112]!   // Store frame pointer and link register
    stp x19, x20, [sp, #16]     // Callee-saved registers for loop counters/pointers
    stp x21, x22, [sp, #32]
    stp x23, x24, [sp, #48]
    stp x25, x26, [sp, #64]
    stp x27, x28, [sp, #80]
    mov x29, sp                 // Set up new frame pointer

    // --- Register Allocation Plan ---
    // x19: char** lines (base address of the grid)
    // x20: height
    // x21: width (we'll calculate this once)
    // x22: r1 (outer loop row)
    // x23: c1 (outer loop col)
    // x24: r2 (inner loop row)
    // x25: c2 (inner loop col)
    // x26: rectangle_count

    mov x19, x0                 // Save lines pointer
    mov x20, x1                 // Save height
    mov x26, xzr                // Initialize rectangle_count = 0

    // --- Calculate Width ---
    // We assume all lines have the same length.
    // We calculate it by calling strlen on the first line.
    cbz x20, .L_exit_early      // If height is 0, no rectangles
    ldr x0, [x19]               // Load address of the first string (lines[0])
    bl strlen                   // Call strlen, result is in x0
    mov x21, x0                 // Store width in x21
    cbz x21, .L_exit_early      // If width is 0, no rectangles

    // --- Main Loops ---
    // for (r1 = 0; r1 < height; r1++)
    mov x22, xzr                // r1 = 0
.L_r1_loop:
    cmp x22, x20                // if r1 >= height, exit loop
    b.ge .L_r1_loop_end

    // for (c1 = 0; c1 < width; c1++)
    mov x23, xzr                // c1 = 0
.L_c1_loop:
    cmp x23, x21                // if c1 >= width, exit loop
    b.ge .L_c1_loop_end

    // Check if grid[r1][c1] is a top-left corner ('+')
    // Get character at (r1, c1)
    ldr x0, [x19, x22, lsl 3]   // x0 = lines[r1]
    ldrb w1, [x0, x23]          // w1 = lines[r1][c1]
    cmp w1, #'+'
    b.ne .L_c1_loop_continue    // If not '+', it can't be a top-left corner

    // It's a potential top-left corner. Now find a bottom-right.
    // for (r2 = r1 + 1; r2 < height; r2++)
    add x24, x22, #1            // r2 = r1 + 1
.L_r2_loop:
    cmp x24, x20                // if r2 >= height, exit loop
    b.ge .L_r2_loop_end

    // for (c2 = c1 + 1; c2 < width; c2++)
    add x25, x23, #1            // c2 = c1 + 1
.L_c2_loop:
    cmp x25, x21                // if c2 >= width, exit loop
    b.ge .L_c2_loop_end

    // Now we have a candidate rectangle defined by (r1,c1) and (r2,c2)
    // We need to validate all four corners and four edges.

    // --- Corner Validation ---
    // 1. Check grid[r1][c2] (top-right)
    ldr x0, [x19, x22, lsl 3]   // x0 = lines[r1]
    ldrb w1, [x0, x25]          // w1 = lines[r1][c2]
    cmp w1, #'+'
    b.ne .L_c2_loop_continue    // If not '+', fail

    // 2. Check grid[r2][c1] (bottom-left)
    ldr x0, [x19, x24, lsl 3]   // x0 = lines[r2]
    ldrb w1, [x0, x23]          // w1 = lines[r2][c1]
    cmp w1, #'+'
    b.ne .L_c2_loop_continue    // If not '+', fail

    // 3. Check grid[r2][c2] (bottom-right)
    // x0 still holds lines[r2]
    ldrb w1, [x0, x25]          // w1 = lines[r2][c2]
    cmp w1, #'+'
    b.ne .L_c2_loop_continue    // If not '+', fail

    // All 4 corners are '+'. Now validate edges.

    // --- Edge Validation ---
    // Validate top edge: for (c = c1 + 1; c < c2; c++)
    ldr x0, [x19, x22, lsl 3]   // x0 = lines[r1]
    add x2, x23, #1             // c = c1 + 1
.L_top_edge_loop:
    cmp x2, x25                 // if c >= c2, loop ends
    b.ge .L_top_edge_valid
    ldrb w1, [x0, x2]           // w1 = char
    cmp w1, #'+'
    b.eq .L_top_edge_continue
    cmp w1, #'-'
    b.ne .L_c2_loop_continue    // Invalid char, fail
.L_top_edge_continue:
    add x2, x2, #1
    b .L_top_edge_loop
.L_top_edge_valid:

    // Validate bottom edge: for (c = c1 + 1; c < c2; c++)
    ldr x0, [x19, x24, lsl 3]   // x0 = lines[r2]
    add x2, x23, #1             // c = c1 + 1
.L_bottom_edge_loop:
    cmp x2, x25
    b.ge .L_bottom_edge_valid
    ldrb w1, [x0, x2]
    cmp w1, #'+'
    b.eq .L_bottom_edge_continue
    cmp w1, #'-'
    b.ne .L_c2_loop_continue
.L_bottom_edge_continue:
    add x2, x2, #1
    b .L_bottom_edge_loop
.L_bottom_edge_valid:

    // Validate left edge: for (r = r1 + 1; r < r2; r++)
    add x2, x22, #1             // r = r1 + 1
.L_left_edge_loop:
    cmp x2, x24
    b.ge .L_left_edge_valid
    ldr x0, [x19, x2, lsl 3]    // x0 = lines[r]
    ldrb w1, [x0, x23]          // w1 = lines[r][c1]
    cmp w1, #'+'
    b.eq .L_left_edge_continue
    cmp w1, #'|'
    b.ne .L_c2_loop_continue
.L_left_edge_continue:
    add x2, x2, #1
    b .L_left_edge_loop
.L_left_edge_valid:

    // Validate right edge: for (r = r1 + 1; r < r2; r++)
    add x2, x22, #1             // r = r1 + 1
.L_right_edge_loop:
    cmp x2, x24
    b.ge .L_right_edge_valid
    ldr x0, [x19, x2, lsl 3]    // x0 = lines[r]
    ldrb w1, [x0, x25]          // w1 = lines[r][c2]
    cmp w1, #'+'
    b.eq .L_right_edge_continue
    cmp w1, #'|'
    b.ne .L_c2_loop_continue
.L_right_edge_continue:
    add x2, x2, #1
    b .L_right_edge_loop
.L_right_edge_valid:

    // --- Success! Found a valid rectangle ---
    add x26, x26, #1            // rectangle_count++

.L_c2_loop_continue:
    add x25, x25, #1            // c2++
    b .L_c2_loop
.L_c2_loop_end:
    add x24, x24, #1            // r2++
    b .L_r2_loop
.L_r2_loop_end:

.L_c1_loop_continue:
    add x23, x23, #1            // c1++
    b .L_c1_loop
.L_c1_loop_end:
    add x22, x22, #1            // r1++
    b .L_r1_loop
.L_r1_loop_end:

.L_exit_early:
    // --- Function Epilogue ---
    mov w0, w26                 // Move final count to return register
    ldp x19, x20, [sp, #16]     // Restore callee-saved registers
    ldp x21, x22, [sp, #32]
    ldp x23, x24, [sp, #48]
    ldp x25, x26, [sp, #64]
    ldp x27, x28, [sp, #80]
    ldp x29, x30, [sp], #112   // Restore frame pointer, link register, and stack
    ret

// Note: A standard 'strlen' implementation is assumed to be available
// for linking. For a self-contained example, it would be included here.

Detailed Code Walkthrough

Understanding assembly code requires breaking it down line by line. Let's dissect the key sections of the solution.

Function Prologue and Setup


stp x29, x30, [sp, #-112]!
stp x19, x20, [sp, #16]
...
mov x29, sp

This is standard procedure. We save the frame pointer (x29) and link register (x30) to the stack. We also save all the callee-saved registers (x19-x28) that we plan to use, as the ARM64 ABI requires us to preserve their original values for the calling function. We then establish our own stack frame.


mov x19, x0                 // Save lines pointer
mov x20, x1                 // Save height
mov x26, xzr                // Initialize rectangle_count = 0

We move the function arguments from x0 and x1 into our callee-saved registers (x19, x20) so they are safe from being overwritten by function calls (like `strlen`). x26 is designated as our rectangle counter and initialized to zero (xzr is the zero register).

Calculating Grid Dimensions


ldr x0, [x19]               // Load address of the first string (lines[0])
bl strlen                   // Call strlen, result is in x0
mov x21, x0                 // Store width in x21

The function only receives the height. We must determine the width. Assuming a rectangular grid, all lines have the same length. We load the address of the first line (lines[0]) into x0 and call the standard C library function `strlen`. The result (the length) is returned in x0, which we then move to x21 for safekeeping.

Navigating the Grid: Memory Access

The most crucial operation is accessing a character at a specific (row, col). This is a two-step process in assembly:


// Get character at (r, c) - e.g., r in x22, c in x23
ldr x0, [x19, x22, lsl 3]   // Step 1: Get line pointer
ldrb w1, [x0, x23]          // Step 2: Get character from line

ldr x0, [x19, x22, lsl 3]: This instruction calculates the address of the pointer to the desired row. x19 is the base address of the `lines` array (a `char**`). x22 holds the row index `r1`. We multiply `r1` by 8 (lsl 3 means "logical shift left by 3 bits," which is equivalent to `* 2^3`) because each pointer on a 64-bit system is 8 bytes. The result, `lines[r1]`, is loaded into x0.
ldrb w1, [x0, x23]: Now x0 holds the address of the string for the correct row. We use `ldrb` (Load Register Byte) to get the character at the column index `c1` (held in x23). The byte is loaded into the 32-bit register w1.

The Validation Logic Flow

Once four potential corners `(r1,c1)`, `(r1,c2)`, `(r2,c1)`, and `(r2,c2)` are identified as `+` characters, the code proceeds to validate the four connecting edges. This process is methodical and repetitive, making it a good fit for assembly.

This diagram shows the flow for the validation part of the algorithm.

    ● Begin Validation for (r1,c1,r2,c2)
    │
    ▼
  ┌──────────────────────────┐
  │ Check grid[r1][c2] == '+' │
  └────────────┬─────────────┘
               │
               ▼
  ┌──────────────────────────┐
  │ Check grid[r2][c1] == '+' │
  └────────────┬─────────────┘
               │
               ▼
  ┌──────────────────────────┐
  │ Check grid[r2][c2] == '+' │
  └────────────┬─────────────┘
               │
               ▼
           ◆ All Corners OK?
          ╱                 ╲
        Yes                  No ⟶ [Fail & Continue Outer Loops]
         │
         ▼
  ┌──────────────────────────┐
  │ Loop & Validate Top Edge   │
  └────────────┬─────────────┘
               │
               ▼
  ┌──────────────────────────┐
  │ Loop & Validate Bottom Edge│
  └────────────┬─────────────┘
               │
               ▼
  ┌──────────────────────────┐
  │ Loop & Validate Left Edge  │
  └────────────┬─────────────┘
               │
               ▼
  ┌──────────────────────────┐
  │ Loop & Validate Right Edge │
  └────────────┬─────────────┘
               │
               ▼
           ◆ All Edges OK?
          ╱                 ╲
        Yes                  No ⟶ [Fail & Continue Outer Loops]
         │
         ▼
  ┌──────────────────┐
  │ Increment Count! │
  └──────────────────┘

Each edge validation is a tight loop. For example, the top edge validation:


.L_top_edge_loop:
    cmp x2, x25                 // Compare current column 'c' with 'c2'
    b.ge .L_top_edge_valid      // If c >= c2, edge is valid
    ldrb w1, [x0, x2]           // Load character
    cmp w1, #'+'
    b.eq .L_top_edge_continue   // It's a '+', which is valid
    cmp w1, #'-'
    b.ne .L_c2_loop_continue    // Not a '-' either, so it's invalid. Fail.
.L_top_edge_continue:
    add x2, x2, #1              // c++
    b .L_top_edge_loop

This pattern of load, compare, and branch is the fundamental building block of algorithms in assembly. If any check fails at any point, we use a branch instruction (like b.ne .L_c2_loop_continue) to immediately abandon the current candidate and move to the next iteration of the `c2` loop, saving countless unnecessary computations.

Alternative Approaches and Performance Considerations

The brute-force O(H² * W²) algorithm (where H is height and W is width) is clear and educational, but it's not the most performant for very large grids. It's important to be aware of other strategies.

Vertical Line Segment Matching: A more optimized approach involves iterating through the grid and identifying all pairs of vertical lines (`|`) that are aligned. For each pair of vertical lines at columns `c1` and `c2`, you would then scan down the rows. When you encounter a row with `+` at both `c1` and `c2`, you have a potential top edge. The next row with `+` at both columns forms the bottom edge, completing a rectangle. This can reduce the complexity significantly.
Dynamic Programming: One could use DP to pre-calculate information about the grid, such as the length of continuous horizontal or vertical segments starting at each cell. This information could then be used to construct rectangles more efficiently.

However, implementing these more complex algorithms in assembly adds a tremendous amount of code and logical complexity, often obscuring the learning objectives related to basic memory access and control flow. For the purpose of mastering assembly fundamentals, the brute-force method is an excellent choice.

Pros and Cons of the Assembly Approach

Choosing your implementation language always involves trade-offs. Here's how our assembly solution stacks up against a high-level equivalent (e.g., in Python).

Aspect	Solving in Arm64 Assembly	Solving in a High-Level Language
Performance	Extremely high. Direct CPU control, no overhead. Ideal for performance-critical scenarios.	Significantly slower due to interpreter/VM overhead, dynamic typing, and garbage collection.
Development Time	Very high. Manual register and memory management is time-consuming and error-prone.	Very low. Expressive syntax and rich libraries allow for rapid development.
Readability & Maintainability	Low. The logic is obscured by low-level details. Requires extensive commenting to be understood.	High. The code is often self-documenting and easy for other developers to understand.
Portability	None. The code is specific to the Arm64 architecture and Linux ABI.	Excellent. The code can run on any platform with the corresponding runtime environment.
Learning Value	Exceptional for understanding computer architecture, memory layout, and how software interacts with hardware.	Excellent for learning algorithmic concepts and general problem-solving without low-level distractions.

Frequently Asked Questions (FAQ)

1. How is the 2D grid (char**) represented in memory?: A char** is a pointer to a pointer. In memory, you have a contiguous block of memory (the array) where each element is an 8-byte pointer. Each of these pointers, in turn, points to another location in memory where the actual null-terminated string for that row is stored. Our code navigates this by first finding the correct row pointer and then accessing the character within that string.
2. What is the difference between the `ldr` and `ldrb` instructions?: ldr stands for "Load Register" and is used to load a full register's worth of data (typically 4 or 8 bytes). We use it to load an 8-byte pointer. ldrb stands for "Load Register Byte" and it loads a single byte (8 bits) from memory into the destination register, zero-extending it. We use it to load individual characters like `+` or `-`.
3. Why are registers `x19` through `x28` used for loops instead of `x0`, `x1`, etc.?: In the Arm64 ABI, registers `x0-x18` are "caller-saved" or temporary registers. This means a function you call (like `strlen`) is free to change their values. Registers `x19-x28` are "callee-saved." This means if our function wants to use them, it must first save their original values to the stack and restore them before returning. By using these for our main loop variables, we ensure their values persist across function calls.
4. Could this assembly code be made even faster?: Yes, micro-optimizations are possible. One could unroll some of the inner loops to reduce branching overhead, or use advanced SIMD (NEON) instructions to validate sections of edges simultaneously. However, these techniques dramatically increase code complexity for diminishing returns on most inputs.
5. How would I compile and run this code?: You would typically write a C "driver" program that prepares the input data (the `char** grid`) and calls your assembly function. You would compile the assembly file (`.s`) with an assembler like `as` or `gcc`, compile the C file with `gcc`, and then link them together into a single executable. For example: `gcc -o program main.c rectangles.s`.
6. Why not just check for four '+' signs and assume it's a rectangle?: Because the problem requires validating the connecting lines. Four `+` signs might form a diamond or be disconnected entirely. The edge validation step is crucial to ensure the shape is a true, contiguous rectangle according to the rules of the problem.
7. Is this algorithm practical for real-time image processing?: For small, text-based grids, absolutely. For large, high-resolution images, this O(N⁴) style algorithm would be too slow. Real-world image processing uses highly optimized algorithms (like the Viola-Jones object detection framework) and often leverages dedicated hardware (GPUs) for parallel processing.

Conclusion: From Logic to Machine Code

We have successfully journeyed from a high-level problem description to a fully functional, low-level Arm64 assembly solution. By meticulously managing registers, calculating memory offsets, and translating algorithmic logic into raw instructions, we've solved the rectangle counting challenge in a way that offers maximum performance and unparalleled insight into the machine's operation.

While you may not write your next web application in assembly, the principles learned here—of memory layout, processor behavior, and the cost of operations—are universally applicable. They build a stronger mental model of how software runs, empowering you to write more efficient and robust code in any language. This module from the kodikra learning path is a testament to the power of understanding computation from the ground up.

Technology Disclaimer: The code and concepts discussed are based on the Armv8-A architecture (aarch64) and the standard Linux ABI. Assembly syntax and system call conventions may differ on other operating systems or architectures.

Continue your journey through the kodikra Arm64-assembly learning path to tackle even more complex challenges. Or, explore more advanced Arm64-assembly concepts in our complete guide.

Published by Kodikra — Your trusted Arm64-assembly learning resource.

kodikra

Search this blog