State Of Tic Tac Toe in Arm64-assembly: Complete Solution & Deep Dive Guide

From Zero to Hero: Building a Tic-Tac-Toe State Analyzer in Arm64 Assembly

Determining the state of a Tic-Tac-Toe game in Arm64 assembly involves representing the 3x3 board in memory and using low-level instructions to systematically check all win conditions. The process requires iterating through rows, columns, and diagonals with pointer arithmetic and conditional branches to identify a winner, a draw, or an ongoing game.

Have you ever looked at a high-level language like Python or JavaScript and wondered what’s happening underneath? How do those simple `if` statements and loops translate into the raw electrical signals that a processor understands? For many developers, assembly language is a mysterious, intimidating realm—a black box of cryptic commands best left to compiler engineers and hardware wizards. This perception creates a gap in our understanding, preventing us from truly grasping how software interacts with hardware.

This article shatters that illusion. We will bridge that gap by taking a familiar, simple game—Tic-Tac-Toe—and building a complete game state analyzer from scratch using Arm64 assembly. You will learn to think like the CPU, manipulating memory directly and orchestrating logic with fundamental instructions. By the end, you won't just have a working program; you'll have a profound new appreciation for the elegant machinery that powers all modern software.

What is Game State Analysis in Computing?

At its core, game state analysis is the process of evaluating a snapshot of a game to determine its current status. For Tic-Tac-Toe, this means answering one of four questions given the 3x3 grid:

Has player 'X' won?
Has player 'O' won?
Is the game a draw (no more moves possible and no winner)?
Is the game still in progress?

In a high-level language, you might use a 2D array and nested loops. In Arm64 assembly, we must perform the same logical steps but with much finer control. We'll define a memory layout for the board, load values into registers, compare them, and branch (jump) to different parts of our code based on the results. This is the essence of computation at the hardware level.

Why Use Arm64 Assembly for a Logic Problem?

Let's be clear: you would almost never choose assembly language to write game logic in a commercial project. The development time is significantly longer, and the code is harder to maintain. However, the purpose of this exercise from the kodikra learning path is not production efficiency but deep, foundational learning.

Writing this program in Arm64 assembly forces you to understand:

Memory Layout: How data, like our game board, is physically arranged in memory and how to access it using addresses and offsets.
CPU Registers: The role of registers as the CPU's high-speed workspace for holding data, pointers, and counters.
Instruction Set Architecture (ISA): How simple instructions like `LDR` (load), `CMP` (compare), and `B.EQ` (branch if equal) are the fundamental building blocks of all complex algorithms.
System Calls (syscalls): The standardized way a program requests services from the operating system kernel, such as exiting the program.

This knowledge is invaluable for anyone in systems programming, embedded systems, performance optimization, or cybersecurity. It provides a mental model that demystifies what compilers do behind the scenes.

How to Implement the Tic-Tac-Toe Analyzer

Our implementation strategy involves breaking the problem down into manageable, testable functions. We'll create a main function that acts as an orchestrator, calling specialized functions to check each type of win condition. The final result will be communicated through the program's exit code—a standard convention in shell environments.

Step 1: Representing the Board in Memory

The first step is deciding how to store the 3x3 grid. A simple and efficient way is a flat, one-dimensional array of 9 bytes. We'll define a convention for the values:

0: Empty square
1: Player 'X'
2: Player 'O'

In the .data section of our assembly code, this looks straightforward:


.data
board:
    .byte 2, 1, 1  // O, X, X
    .byte 0, 1, 2  //  , X, O
    .byte 0, 2, 1  //  , O, X

This layout represents a board where 'X' has a potential winning move. Our code will analyze this static board, but the logic can be applied to any board configuration.

Step 2: The Overall Logic Flow

The program will follow a clear, hierarchical process to avoid incorrect conclusions. For instance, we must check for a win *before* checking for a draw, as a board can be full on the final, winning move.

Here is the high-level algorithm our assembly code will implement:

● Start
│
▼
┌───────────────────────────┐
│ get_game_state(board_addr)│
└────────────┬──────────────┘
             │
             ▼
        ◆ Check Rows for Win?
        ╱         ╲
      Yes          No
      │            │
      ▼            ▼
┌─────────────┐ ◆ Check Cols for Win?
│ Return Winner │╱         ╲
└─────────────┘Yes          No
               │            │
               ▼            ▼
         ┌─────────────┐ ◆ Check Diagonals for Win?
         │ Return Winner │╱         ╲
         └─────────────┘Yes          No
                        │            │
                        ▼            ▼
                  ┌─────────────┐ ◆ Is Board Full?
                  │ Return Winner │╱         ╲
                  └─────────────┘Yes          No
                                 │            │
                                 ▼            ▼
                           ┌───────────┐ ┌───────────────┐
                           │ Return Draw │ │ Return Ongoing  │
                           └───────────┘ └───────────────┘
                                 │            │
                                 └─────┬──────┘
                                       │
                                       ▼
                                    ● End

Step 3: The Complete Arm64 Assembly Solution

Below is the full, commented source code. We use the standard Arm64 Application Procedure Call Standard (AAPCS64), where the first argument to a function is passed in register x0, and the return value is placed in w0 (the lower 32 bits of x0).


// tictactoe.s
// Solution for the State of Tic-Tac-Toe module from kodikra.com

.data
// Board representation: 0 = empty, 1 = X, 2 = O
// This board is an example of an ongoing game.
board:
    .byte 1, 2, 0  // X, O, .
    .byte 0, 1, 0  // ., X, .
    .byte 0, 0, 2  // ., ., O

.text
.global _start

// Main entry point of the program
_start:
    // Load the address of our game board into x0, the first argument register.
    ldr x0, =board
    // Call the function to determine the game state.
    bl get_game_state
    // The result is now in w0. Move it to x0 for the exit syscall.
    mov x0, x0
    // Setup the exit system call (syscall number 93).
    mov x8, #93
    // Execute the syscall to exit the program.
    // The exit code will be the value in x0 (our game state).
    svc #0


// get_game_state
// Determines the state of a Tic-Tac-Toe game.
// Input: x0 = address of the 9-byte board array.
// Output: w0 = game state (0: ongoing, 1: X wins, 2: O wins, 3: draw).
get_game_state:
    // Standard function prologue: save the frame pointer and link register.
    stp x29, x30, [sp, #-16]!
    mov x29, sp

    // Preserve registers we will modify (callee-saved).
    stp x19, x20, [sp, #-16]!
    mov x19, x0 // Save the board address in x19.

    // 1. Check for a winner (rows, columns, diagonals).
    bl check_all_wins
    // If w0 is not 0 after the check, a winner was found.
    cmp w0, #0
    b.ne .return_state // Branch to return if winner found.

    // 2. If no winner, check for a draw (is the board full?).
    mov x0, x19 // Restore board address for the function call.
    bl is_board_full
    // is_board_full returns 1 if full (draw), 0 otherwise.
    cmp w0, #1
    b.ne .set_ongoing // If not full, game is ongoing.

    // Board is full and no winner, so it's a draw.
    mov w0, #3 // Return code for draw.
    b .return_state

.set_ongoing:
    // Game is not won and board is not full, so it's ongoing.
    mov w0, #0 // Return code for ongoing.

.return_state:
    // Standard function epilogue: restore saved registers and return.
    ldp x19, x20, [sp], #16
    ldp x29, x30, [sp], #16
    ret

// check_all_wins
// Checks all 8 winning conditions.
// Input: x19 should contain the board address.
// Output: w0 = winner (1 for X, 2 for O), or 0 if no winner.
check_all_wins:
    stp x29, x30, [sp, #-16]!
    mov x29, sp

    // Check Rows
    mov x0, x19
    bl check_rows
    cmp w0, #0
    b.ne .found_winner

    // Check Columns
    mov x0, x19
    bl check_cols
    cmp w0, #0
    b.ne .found_winner

    // Check Diagonals
    mov x0, x19
    bl check_diagonals
    cmp w0, #0
    b.ne .found_winner

    // No winner found
    mov w0, #0
    b .done_checking

.found_winner:
    // w0 already contains the winner (1 or 2).

.done_checking:
    ldp x29, x30, [sp], #16
    ret

// check_rows
// Checks the 3 rows for a winner.
// Input: x0 = board address.
// Output: w0 = winner, or 0.
check_rows:
    // Loop through 3 rows. i = 0, 1, 2. Base offset = i * 3.
    mov w1, #0 // i = 0 (loop counter)
.row_loop:
    cmp w1, #3
    b.ge .no_row_win

    // Calculate base address for the current row.
    mov w2, #3
    mul w3, w1, w2 // offset = i * 3
    add x2, x0, x3   // row_start_addr = board_addr + offset

    // Load the three cells of the row.
    ldrb w4, [x2, #0]
    ldrb w5, [x2, #1]
    ldrb w6, [x2, #2]

    // Check if they are all the same and not empty.
    cmp w4, #0
    b.eq .next_row // Skip if first cell is empty.
    cmp w4, w5
    b.ne .next_row // Not a win if first two differ.
    cmp w4, w6
    b.ne .next_row // Not a win if first and third differ.

    // We have a winner!
    mov w0, w4 // The winner is the value in the cells.
    ret

.next_row:
    add w1, w1, #1
    b .row_loop

.no_row_win:
    mov w0, #0 // No winner found in any row.
    ret

// check_cols
// Checks the 3 columns for a winner.
// Input: x0 = board address.
// Output: w0 = winner, or 0.
check_cols:
    // Loop through 3 columns. i = 0, 1, 2.
    mov w1, #0 // i = 0 (loop counter)
.col_loop:
    cmp w1, #3
    b.ge .no_col_win

    // Load the three cells of the column.
    // Col i: board[i], board[i+3], board[i+6]
    ldrb w4, [x0, x1] // board[i]
    add x2, x0, #3
    ldrb w5, [x2, x1] // board[i+3]
    add x2, x0, #6
    ldrb w6, [x2, x1] // board[i+6]

    // Check if they are all the same and not empty.
    cmp w4, #0
    b.eq .next_col
    cmp w4, w5
    b.ne .next_col
    cmp w4, w6
    b.ne .next_col

    // We have a winner!
    mov w0, w4
    ret

.next_col:
    add w1, w1, #1
    b .col_loop

.no_col_win:
    mov w0, #0
    ret

// check_diagonals
// Checks the 2 diagonals for a winner.
// Input: x0 = board address.
// Output: w0 = winner, or 0.
check_diagonals:
    // Check main diagonal: board[0], board[4], board[8]
    ldrb w1, [x0, #0]
    ldrb w2, [x0, #4]
    ldrb w3, [x0, #8]
    cmp w1, #0
    b.ne .check_diag1_win
    b .check_anti_diag // If first cell empty, can't be a win.

.check_diag1_win:
    cmp w1, w2
    b.ne .check_anti_diag
    cmp w1, w3
    b.ne .check_anti_diag
    // Winner on main diagonal.
    mov w0, w1
    ret

.check_anti_diag:
    // Check anti-diagonal: board[2], board[4], board[6]
    ldrb w1, [x0, #2]
    ldrb w2, [x0, #4]
    ldrb w3, [x0, #6]
    cmp w1, #0
    b.eq .no_diag_win // If first cell empty, can't be a win.
    cmp w1, w2
    b.ne .no_diag_win
    cmp w1, w3
    b.ne .no_diag_win
    // Winner on anti-diagonal.
    mov w0, w1
    ret

.no_diag_win:
    mov w0, #0
    ret

// is_board_full
// Checks if the board has any empty (0) cells.
// Input: x0 = board address.
// Output: w0 = 1 if full, 0 if not full.
is_board_full:
    mov w1, #0 // i = 0 (loop counter)
.full_loop:
    cmp w1, #9
    b.ge .board_is_full // If we checked all 9 cells, it's full.

    ldrb w2, [x0, x1] // Load board[i]
    cmp w2, #0
    b.eq .board_not_full // If we find an empty cell, it's not full.

    add w1, w1, #1
    b .full_loop

.board_not_full:
    mov w0, #0 // Return 0 (not full)
    ret

.board_is_full:
    mov w0, #1 // Return 1 (full)
    ret

Step 4: Assembling and Running the Code

To compile and run this program on an Arm64 system (like a Raspberry Pi 4 running a 64-bit OS, or an Apple Silicon Mac), you use the GNU Assembler (`as`) and Linker (`ld`).

Save the code as tictactoe.s and execute the following commands in your terminal:


# Assemble the .s file into an object file .o
as -o tictactoe.o tictactoe.s

# Link the object file into an executable
ld -o tictactoe tictactoe.o

# Run the executable
./tictactoe

# Check the exit code to see the result
echo $?

The echo $? command prints the exit code of the last executed program. Based on the board defined in our .data section, the output should be 0, indicating the game is ongoing.

Detailed Code Walkthrough

Understanding assembly code requires a line-by-line approach. Let's dissect the most critical parts of the solution.

The `_start` Label and System Calls

Unlike C or Java, assembly programs don't have a main function by default. The linker needs an entry point, which is conventionally named _start. Our _start function is simple: it loads the address of our board into register x0 and calls our main logic function, get_game_state. After the function returns, the result (the game state) is in w0. We then use the svc instruction to make a system call to the Linux kernel. x8 = 93 is the syscall number for exit, and x0 holds the exit code.

Memory Addressing in `check_cols`

The logic for checking columns demonstrates a key assembly concept: memory addressing. To check the first column, we need to access elements at indices 0, 3, and 6.

    ● Start Column Check (i=0)
    │
    ▼
  ┌───────────────────────┐
  │ Base Address (x0)     │
  │ Points to board[0]    │
  └──────────┬────────────┘
             │
             ├─ ldrb w4, [x0, #0]  ⟶  Read board[0]
             │
             ├─ ldrb w5, [x0, #3]  ⟶  Read board[3]
             │
             └─ ldrb w6, [x0, #6]  ⟶  Read board[6]
             │
             ▼
        ◆ w4 == w5 && w4 == w6?
        ╱         ╲
      Yes          No
      │            │
      ▼            ▼
  ┌───────────┐  ┌──────────────────┐
  │ Found Win │  │ Continue to i=1  │
  └───────────┘  └──────────────────┘

In our implementation, we use an index register x1 inside a loop. The instruction ldrb w4, [x0, x1] loads a byte from the address calculated by `(address in x0) + (value in x1)`. For the next cell in the column, we load from a different base address: add x2, x0, #3 gives us the address of board[3], and we can then use the same index x1 to access board[3+i].

Function Prologue and Epilogue

Notice the lines stp x29, x30, [sp, #-16]! and ldp x29, x30, [sp], #16 at the beginning and end of our functions. This is standard procedure.

x30 is the Link Register (lr), which holds the address of where to return after the function finishes. We must save it to the stack so we can call other functions (which would overwrite x30).
x29 is the Frame Pointer (fp), which points to the base of the current function's stack frame. Saving the old one and setting a new one allows for proper stack unwinding and debugging.
sp is the Stack Pointer. We decrement it to allocate space and increment it to deallocate.

This disciplined stack management is essential for writing correct, non-trivial assembly programs.

Pros & Cons: Assembly vs. High-Level Language

To appreciate the trade-offs, let's compare our approach to solving the same problem in a language like Python.

Aspect	Arm64 Assembly	Python (High-Level Language)
Performance	Extremely fast. Direct CPU instructions with no overhead. Unmatched potential for optimization.	Slower due to interpreter overhead, dynamic typing, and abstractions. Sufficiently fast for this problem.
Control	Absolute control over memory, registers, and CPU cycles.	Minimal direct control. The runtime environment and garbage collector manage memory and resources.
Development Time	Very slow. Requires deep knowledge of the architecture and careful manual implementation of all logic.	Very fast. Expressive syntax, rich libraries, and high-level constructs make implementation trivial.
Readability & Maintainability	Low. Code is verbose, platform-specific, and hard for others to understand without expertise.	High. Code is concise, portable, and easy to read, resembling natural language.
Educational Value	Exceptional. Provides a fundamental understanding of how computers work at the lowest level.	Good for learning algorithms and software design principles, but abstracts away hardware details.

Frequently Asked Questions (FAQ)

What is the difference between Arm64 and x86-64 assembly?

They are completely different instruction set architectures (ISAs). Arm64 is a RISC (Reduced Instruction Set Computer) architecture, characterized by a larger number of general-purpose registers and simpler, fixed-length instructions. x86-64 is a CISC (Complex Instruction Set Computer) architecture with fewer registers and more complex, variable-length instructions. Code written for one will not run on the other.

Why not just use a high-level language for everything?

For most applications, you should! But for specific domains like operating system kernels, device drivers, embedded systems, and high-performance computing, the fine-grained control and performance of assembly (or C/C++) are indispensable.

How do you debug Arm64 assembly code?

You can use a debugger like GDB (GNU Debugger). You assemble and link with debug flags (-g), which allows you to step through the code instruction by instruction, inspect register values, and examine memory contents. This is a powerful way to see your logic execute at the CPU level.

What do registers like `x0`, `w0`, and `sp` mean?

Registers are small, extremely fast storage locations inside the CPU. In Arm64:

x0-x30 are 64-bit general-purpose registers. x0-x7 are typically used for function arguments and return values.
w0-w30 refer to the lower 32 bits of the corresponding x register (e.g., w0 is the lower half of x0).
sp is the Stack Pointer, which keeps track of the top of the current program's stack in memory.

Can this code run on a Raspberry Pi?

Yes, absolutely, provided you are running a 64-bit operating system (like Raspberry Pi OS 64-bit) on a compatible model (Raspberry Pi 3, 4, 5, or Zero 2 W). The Arm64 architecture is dominant in the mobile and embedded world.

How could this code be optimized further?

For this simple problem, the code is already very efficient. However, one could explore alternative win-checking algorithms. For example, using bitboards, where each player's positions are represented by bits in a single integer. A win can then be detected by performing bitwise AND operations with pre-calculated winning masks, which can be faster than loops and branches.

What is the "kodikra learning path"?

The problem and concepts discussed here are part of a structured curriculum from kodikra.com, designed to build programming skills from the ground up. This module is a practical exercise within the Arm64 Assembly learning path, which aims to provide hands-on experience with low-level programming.

Conclusion: Beyond the Game

We've successfully built a complete Tic-Tac-Toe state analyzer in Arm64 assembly. While the game itself is simple, the journey has taken us deep into the core of computer science. You've seen how to represent data in memory, manipulate it with CPU registers, control program flow with branches, and interact with the operating system via syscalls. This is the bedrock upon which all other software is built.

The skills you've touched upon here are not just academic. They are the key to unlocking maximum performance, understanding system vulnerabilities, and building the next generation of efficient software for a world increasingly powered by Arm processors. This module is a significant step in your journey to becoming a truly proficient programmer.

To continue exploring low-level development and other advanced topics, check out our complete Arm64 Assembly guide and the other modules available on the kodikra learning platform.

Disclaimer: The assembly code and techniques described in this article are based on the AArch64 instruction set and standard Linux system calls. Behavior may vary on different operating systems or architectures. Always refer to official documentation for the most accurate information.

Published by Kodikra — Your trusted Arm64-assembly learning resource.

kodikra

Search this blog

State Of Tic Tac Toe in Arm64-assembly: Complete Solution & Deep Dive Guide

From Zero to Hero: Building a Tic-Tac-Toe State Analyzer in Arm64 Assembly

What is Game State Analysis in Computing?

Why Use Arm64 Assembly for a Logic Problem?

How to Implement the Tic-Tac-Toe Analyzer

Step 1: Representing the Board in Memory

Step 2: The Overall Logic Flow

Step 3: The Complete Arm64 Assembly Solution

Step 4: Assembling and Running the Code

Detailed Code Walkthrough

The `_start` Label and System Calls

Memory Addressing in `check_cols`

Function Prologue and Epilogue

Pros & Cons: Assembly vs. High-Level Language

Frequently Asked Questions (FAQ)

Conclusion: Beyond the Game

Post a Comment

The Complete Idris Guide: From Zero to Expert

The Complete Odin Guide: From Zero to Expert

The Complete Vlang Guide: From Zero to Expert

The Complete Roc Guide: From Zero to Expert

The Complete D Guide: From Zero to Expert

Kodikra