Roman Numerals in Arm64-assembly: Complete Solution & Deep Dive Guide


Mastering Roman Numerals in Arm64 Assembly: From Zero to Hero

Discover how to convert Arabic numbers into their Roman numeral counterparts using Arm64 assembly language. This guide breaks down the subtractive conversion algorithm, implementing it with powerful macros to handle symbols like 'M' (1000) and subtractive pairs like 'CM' (900), processing the input number from largest to smallest value.

You’ve seen them etched on the faces of grand clocks, at the end of a monarch’s name, or in the credits of a classic film. Roman numerals, with their elegant composition of letters, are a relic of a powerful empire. But have you ever wondered how a modern computer, built on binary logic, could be taught to speak this ancient numerical language? It feels like a task for a high-level language like Python or Java, not the raw, close-to-the-metal world of assembly.

The challenge of implementing an algorithm like this in Arm64 assembly can seem intimidating. You're not just dealing with logic; you're managing memory pointers and registers directly. This guide is here to dissolve that complexity. We will walk you through the entire process, from understanding the rules of Roman numerals to writing and dissecting a clean, efficient Arm64 assembly solution. By the end, you'll not only have solved the problem but will have gained a much deeper appreciation for how computation works at its core.

What Are Roman Numerals? The Ancient System Explained

Before we can write a single line of code, we must first understand the system we're trying to replicate. Roman numerals are a decimal-based but non-positional numeral system that originated in ancient Rome. Unlike the Arabic numerals we use daily (0-9), where a digit's position determines its weight, this system has no zero and uses a combination of letters from the Latin alphabet to signify values.

The Core Symbols

The system is built upon seven fundamental symbols, each with a specific integer value. All other numbers are constructed using a combination of these seven letters.

Roman Symbol	Arabic Value
I	1
V	5
X	10
L	50
C	100
D	500
M	1000

The Rules of Combination

Numbers are formed by combining these symbols and adding their values. This is typically done by placing symbols from left to right in order of decreasing value. For example, the number 16 is written as XVI (10 + 5 + 1).

However, there's a crucial exception to this additive rule: the subtractive principle. To avoid repeating a symbol four times (like IIII for 4), a smaller value symbol is placed before a larger value symbol. When this happens, the smaller value is subtracted from the larger one.

This subtractive rule only applies to six specific pairs:

IV = 4 (5 - 1)
IX = 9 (10 - 1)
XL = 40 (50 - 10)
XC = 90 (100 - 10)
CD = 400 (500 - 100)
CM = 900 (1000 - 100)

This module from the kodikra learning path focuses on the traditional system, which can represent any integer from 1 to 3,999 (MMMCMXCIX). Understanding these rules is the key to designing our algorithm.
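To make the two rules concrete, here is a minimal C sketch that evaluates a numeral by applying them directly. The helper names symbol_value and roman_value are illustrative and not part of the kodikra solution, and the code assumes a well-formed numeral rather than validating one.

```c
#include <stddef.h>

/* Value of a single Roman symbol, or 0 if the character is not one. */
static int symbol_value(char c) {
    switch (c) {
    case 'I': return 1;
    case 'V': return 5;
    case 'X': return 10;
    case 'L': return 50;
    case 'C': return 100;
    case 'D': return 500;
    case 'M': return 1000;
    default:  return 0;
    }
}

/* Hypothetical helper: evaluates a numeral with the additive rule, and
 * applies the subtractive rule whenever a smaller symbol precedes a
 * larger one. It does not check that the input is well-formed. */
int roman_value(const char *s) {
    int total = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        int cur = symbol_value(s[i]);
        int next = symbol_value(s[i + 1]);  /* 0 at the terminator */
        if (cur < next)
            total -= cur;   /* subtractive: e.g. the I in IV */
        else
            total += cur;   /* additive: e.g. the X in XVI */
    }
    return total;
}
```

Calling roman_value("XVI") adds 10 + 5 + 1, while roman_value("IV") subtracts the I because it sits in front of the larger V.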

Why Use Arm64 Assembly for This Task?

In a world dominated by high-level languages, choosing Arm64 assembly for a numeral conversion task might seem unconventional. However, this choice offers profound educational and practical benefits, especially for aspiring systems programmers, embedded engineers, and performance enthusiasts.

First, it forces a deep understanding of the algorithm. You cannot rely on built-in libraries or complex data structures. You must think in terms of basic arithmetic, comparisons, and memory manipulation. This builds a foundational knowledge that makes you a better programmer in any language.

Second, it provides direct insight into CPU architecture. You will be working with registers—the CPU's immediate workspace—and learning how instructions like mov (move), cmp (compare), and sub (subtract) operate on data. This is invaluable for performance-critical applications where every clock cycle counts.

Finally, the ARM architecture is ubiquitous. From the smartphone in your pocket to Apple's M-series chips and high-performance servers like AWS Graviton, Arm64 is a dominant force in modern computing. Gaining proficiency in its instruction set is a highly relevant and future-proof skill.

How the Conversion Algorithm Works: A Greedy Approach

The most straightforward and effective algorithm for this conversion is a "greedy" subtractive method. The core idea is to process the Arabic number from the largest possible Roman numeral value down to the smallest. At each step, we check if the current Roman numeral's value can be subtracted from our number. If it can, we append the corresponding Roman symbol(s) to our result string and update the number. We repeat this until the number becomes zero.

The key is the order of operations. All thirteen values must be tried in strictly descending order, so each subtractive pair (like 900) is checked before the smaller plain symbols that could otherwise stand in for it (500 and 100). Otherwise, a number like 900 would incorrectly become DCCCC instead of the correct CM.

Here is a logical flow diagram of the algorithm:

    ● Start (Input: Arabic Number `N`, Output: String Buffer `S`)
    │
    ▼
  ┌───────────────────────────┐
  │ Check if N >= 1000 ('M')  │
  └────────────┬──────────────┘
               │
    Yes ╱──────┴──────╲ No
    ▼                  ▼
┌─────────────────┐  ┌────────────────────────┐
│ Append 'M' to S │  │ Check if N >= 900 ('CM') │
│ N = N - 1000    │  └───────────┬────────────┘
│ Loop back to    │              │
│ check for 1000  │   Yes ╱──────┴──────╲ No
└─────────────────┘   ▼                  ▼
                  ┌─────────────────┐  ┌───────────────────────┐
                  │ Append 'CM' to S│  │ Check if N >= 500 ('D') │
                  │ N = N - 900     │  └──────────┬────────────┘
                  └─────────────────┘             │
                                                  ▼
                                                [...]
                                                  │
                                                  ▼
                                        ┌────────────────────┐
                                        │ Check if N >= 1 ('I')  │
                                        └──────────┬───────────┘
                                                   │
                                        Yes ╱──────┴──────╲ No
                                        ▼                  ▼
                                  ┌─────────────────┐    ┌──────────┐
                                  │ Append 'I' to S │    │ N is now 0 │
                                  │ N = N - 1       │    └─────┬────┘
                                  │ Loop back       │          │
                                  └─────────────────┘          ▼
                                                             ● End

This top-down, greedy approach ensures correctness by prioritizing the largest possible chunks, including the special subtractive pairs, at each stage of the conversion.
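The ordering argument can be checked with a small C reference model. This is a sketch, not the assembly solution: greedy_convert, WITH_PAIRS and PLAIN_ONLY are names invented for illustration, and the caller is assumed to pass a buffer large enough for the result.

```c
#include <stddef.h>

struct roman_entry { unsigned value; const char *symbols; };

/* All 13 values in strictly descending order, subtractive pairs included. */
const struct roman_entry WITH_PAIRS[] = {
    {1000, "M"}, {900, "CM"}, {500, "D"}, {400, "CD"}, {100, "C"},
    {90, "XC"},  {50, "L"},   {40, "XL"}, {10, "X"},   {9, "IX"},
    {5, "V"},    {4, "IV"},   {1, "I"},
};

/* Only the seven plain symbols, to show what goes wrong without pairs. */
const struct roman_entry PLAIN_ONLY[] = {
    {1000, "M"}, {500, "D"}, {100, "C"}, {50, "L"},
    {10, "X"},   {5, "V"},   {1, "I"},
};

/* Greedy conversion over any descending table. out must have room for
 * the result plus the terminating NUL. */
void greedy_convert(char *out, unsigned n,
                    const struct roman_entry *table, size_t count) {
    for (size_t i = 0; i < count; i++) {
        while (n >= table[i].value) {
            /* Append this entry's symbol(s), then shrink the number. */
            for (const char *p = table[i].symbols; *p != '\0'; p++)
                *out++ = *p;
            n -= table[i].value;
        }
    }
    *out = '\0';
}
```

With WITH_PAIRS (13 entries), 900 converts to CM. With PLAIN_ONLY (7 entries), the same loop produces DCCCC, which is the failure the ordering rule is there to prevent.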

Where the Logic is Implemented: An Arm64 Code Walkthrough

Now, let's translate this logic into Arm64 assembly code. The solution provided in our exclusive kodikra.com curriculum uses a clever technique involving macros to avoid repetitive code and improve readability.

According to the Arm64 procedure call standard (AAPCS64), the first few arguments to a function are passed in registers. For our function, we'll assume:

x0: The memory address of the output buffer (where we will write the Roman numeral string).
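x1: The input Arabic number to be converted. The code compares the full 64-bit register, so this assumes the caller passes a 64-bit integer; with a 32-bit parameter, AAPCS64 leaves the upper half of x1 unspecified and the comparisons should use w1 instead.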
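Since x0 is only a raw pointer, sizing the buffer is the caller's job. The C sketch below estimates how large it must be for the traditional range of 1 to 3,999. The prototype in the comment is an assumed C-side declaration matching the register assignment above, and roman_length and longest_roman_length are hypothetical helpers.

```c
#include <stddef.h>

/* Assumed C-side view of the assembly routine (x0 = buffer, x1 = number):
 *     void to_roman_numeral(char *buffer, uint64_t number);
 */

/* Number of characters in the numeral for n, using the greedy order. */
size_t roman_length(unsigned n) {
    static const unsigned values[13] =
        {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    static const size_t lengths[13] =
        {1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1};
    size_t len = 0;
    for (int i = 0; i < 13; i++) {
        while (n >= values[i]) {
            len += lengths[i];
            n -= values[i];
        }
    }
    return len;
}

/* Longest numeral in 1..3999, not counting the terminating NUL. */
size_t longest_roman_length(void) {
    size_t best = 0;
    for (unsigned n = 1; n <= 3999; n++) {
        size_t len = roman_length(n);
        if (len > best)
            best = len;
    }
    return best;
}
```

The longest numeral in the range is MMMDCCCLXXXVIII (3,888) at 15 characters, so a 16-byte buffer leaves room for a terminating NUL.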

The Core Building Blocks: Macros

A macro is a template that allows you to define a block of code and reuse it with different parameters. It's a powerful feature of assemblers that helps keep code DRY (Don't Repeat Yourself).

The `SINGLE` Macro

This macro handles the standard Roman numerals that are represented by a single character (M, D, C, L, V, I).


.macro SINGLE name, value, numeral
    mov x2, \value
    mov w3, \numeral
.compare_\name:
    cmp x1, x2
    blt .end_\name
    strb w3, [x0], #1
    sub x1, x1, x2
    b .compare_\name
.end_\name:
.endm

Let's break this down line-by-line:

.macro SINGLE name, value, numeral: Defines a macro named SINGLE that takes three arguments: a unique name for labels, the Arabic value, and the character numeral.
mov x2, \value: Moves the Arabic value (e.g., 1000) into register x2. We use x2 as a temporary register to hold the value we are currently checking against.
mov w3, \numeral: Moves the character numeral (e.g., 'M') into the lower 32 bits of register x3 (w3). We only need a byte for the character, so a 32-bit register is sufficient.
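.compare_\name:: This is a label. The \name argument is substituted into it, which makes the label unique for each macro invocation (e.g., .compare_m, .compare_d). Assuming the GNU assembler on an ELF target, only labels beginning with .L are truly assembler-local; these are ordinary file-scope symbols that simply are not exported.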
cmp x1, x2: Compares the remaining Arabic number in x1 with the current value in x2. This operation sets internal CPU flags based on the result (e.g., is x1 greater than, less than, or equal to x2?).
blt .end_\name: "Branch if Less Than" (a signed comparison, which is fine for inputs in the range 1 to 3,999). If the comparison shows that our number (x1) is less than the current value (x2), we can't subtract it. So, we jump to the end of this macro's logic.
strb w3, [x0], #1: This is the most critical instruction. strb means "Store Byte". It takes the byte from w3 (our Roman numeral character) and stores it at the memory location pointed to by x0. The [x0], #1 part is a post-index addressing mode, which means after storing the byte, it automatically increments the pointer in x0 by 1. This makes our buffer ready for the next character.
sub x1, x1, x2: Subtracts the value in x2 from our number in x1, storing the result back in x1. This updates the remaining value to be converted.
b .compare_\name: "Branch". This is an unconditional jump back to the .compare_\name label. This creates a loop that will continue to subtract the same value (e.g., 1000) as long as the remaining number is large enough.
.end_\name:: The exit point for this macro's logic.
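For readers more comfortable with C, the same loop can be modelled in a few lines. This is only a sketch of one SINGLE expansion: emit_single is a hypothetical helper, out stands in for x0 and *n for x1, and the signed type mirrors the signed blt comparison.

```c
#include <stdint.h>

/* C model of one SINGLE expansion (illustrative, not the original code). */
char *emit_single(char *out, int64_t *n, int64_t value, char numeral) {
    while (*n >= value) {   /* cmp x1, x2 / blt .end_name */
        *out++ = numeral;   /* strb w3, [x0], #1          */
        *n -= value;        /* sub x1, x1, x2             */
    }                       /* b .compare_name            */
    return out;             /* the advanced buffer pointer */
}
```

The post-increment in *out++ plays the role of the post-index addressing mode: the byte is stored first, then the pointer moves on.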

Here is an ASCII diagram illustrating the logic within the `SINGLE` macro:

    ● Macro Start (x1=Number, x0=BufferPtr)
    │
    ▼
  ┌──────────────────┐
  │ mov value -> x2  │
  │ mov char -> w3   │
  └────────┬─────────┘
           │
           ▼
    ◆ Loop: Cmp x1, x2 (Is Number >= value?)
   ╱                  ╲
  Yes (Greater/Equal)  No (Less Than)
  │                     │
  ▼                     ▼
┌───────────────────┐   ● Macro End
│ strb w3, [x0], #1 │
│ (Write char,      │
│  inc BufferPtr)   │
└────────┬──────────┘
         │
         ▼
┌───────────────────┐
│ sub x1, x1, x2    │
│ (Number -= value) │
└────────┬──────────┘
         │
         └───────────⟶ Back to Loop ◆

The `DOUBLE` Macro

This macro is a clever extension for handling the subtractive pairs (CM, CD, XC, XL, IX, IV).


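.macro DOUBLE name, value, first, second
    mov x2, \value              // x2 = value of the pair (e.g. 900)
.compare_\name:
    cmp x1, x2                  // is the remaining number >= value?
    blt .end_\name              // no: this pair does not apply
    mov w3, \first              // first character of the pair
    mov w4, \second             // second character of the pair
    strb w3, [x0], #1           // write first character, advance pointer
    strb w4, [x0], #1           // write second character, advance pointer
    sub x1, x1, x2              // remaining number -= value
    b .compare_\name            // re-check the same value
.end_\name:
.endm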

This is very similar to `SINGLE`, with a few key differences:

It takes four arguments, including two characters: `first` and `second`.
mov w3, \first and mov w4, \second: It loads both characters of the pair (e.g., 'C' and 'M') into two separate registers, w3 and w4.
strb w3, [x0], #1 and strb w4, [x0], #1: It performs two consecutive byte stores, writing the first character and then the second, advancing the buffer pointer each time.

Strictly speaking, this macro doesn't need to loop. A subtractive pair like 'CM' (900) can appear at most once: by the time DOUBLE cm runs, the preceding SINGLE m loop has already reduced the number below 1000, so after one subtraction of 900 the remainder is below 100 and the second comparison always fails. For example, 1800 is MDCCC, never 'CMCM'. The provided solution keeps the loop structure for consistency with the `SINGLE` macro. A more optimized version might remove the loop, but this implementation is clear and functional.
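The claim that a pair never repeats can be checked exhaustively. The following C sketch replays the thirteen subtractions for every input from 1 to 3,999 and records the highest repeat count seen for any pair; drain and max_pair_repeats are names made up for this check.

```c
#include <stdint.h>

/* Subtracts value from *n as often as possible; returns the repeat count. */
static int drain(int64_t *n, int64_t value) {
    int iterations = 0;
    while (*n >= value) {
        *n -= value;
        iterations++;
    }
    return iterations;
}

/* Largest repeat count seen for any subtractive pair across 1..3999. */
int max_pair_repeats(void) {
    static const int64_t values[13] =
        {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    int worst = 0;
    for (int64_t start = 1; start <= 3999; start++) {
        int64_t n = start;
        for (int i = 0; i < 13; i++) {
            int repeats = drain(&n, values[i]);
            /* Odd indices hold the pairs: 900, 400, 90, 40, 9, 4. */
            if (i % 2 == 1 && repeats > worst)
                worst = repeats;
        }
    }
    return worst;
}
```

max_pair_repeats() returns 1, so the backward branch in each DOUBLE expansion is taken, but the comparison after it never succeeds a second time.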

Putting It All Together: The Main Function

The main function body is now incredibly simple and readable. It's just a sequence of macro calls, ordered from the largest value to the smallest, perfectly implementing our greedy algorithm.


.global to_roman_numeral
.text

to_roman_numeral:
    SINGLE m, 1000, 'M'
    DOUBLE cm, 900, 'C', 'M'
    SINGLE d, 500, 'D'
    DOUBLE cd, 400, 'C', 'D'
    SINGLE c, 100, 'C'
    DOUBLE xc, 90, 'X', 'C'
    SINGLE l, 50, 'L'
    DOUBLE xl, 40, 'X', 'L'
    SINGLE x, 10, 'X'
    DOUBLE ix, 9, 'I', 'X'
    SINGLE v, 5, 'V'
    DOUBLE iv, 4, 'I', 'V'
    SINGLE i, 1, 'I'

    mov w3, #0
    strb w3, [x0]

    ret

The execution flows sequentially:

The code first checks for thousands (`M`).
Then, it checks for nine hundreds (`CM`).
Then, five hundreds (`D`), and so on.
mov w3, #0: After all checks are done, this moves the null value (0) into w3.
strb w3, [x0]: This stores the null byte at the end of the string, terminating it. This is crucial for C-style strings so that functions like `printf` know where the string ends.
ret: This instruction returns control to the calling function.
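The C preprocessor offers a rough analogue of this structure, because its macros are also expanded inline at each use. The sketch below mirrors the function body one line per numeral. The macro and function names are illustrative, and like the assembler macros, which implicitly use x0 and x1, these C macros deliberately rely on the local variables out and n.

```c
#include <stdint.h>

/* Expands to a loop that emits one character per subtraction. */
#define SINGLE(value, numeral)          \
    while (n >= (value)) {              \
        *out++ = (numeral);             \
        n -= (value);                   \
    }

/* Expands to a loop that emits two characters per subtraction. */
#define DOUBLE(value, first, second)    \
    while (n >= (value)) {              \
        *out++ = (first);               \
        *out++ = (second);              \
        n -= (value);                   \
    }

/* Hypothetical C model of to_roman_numeral: out ~ x0, n ~ x1. */
void to_roman_model(char *out, int64_t n) {
    SINGLE(1000, 'M')
    DOUBLE(900, 'C', 'M')
    SINGLE(500, 'D')
    DOUBLE(400, 'C', 'D')
    SINGLE(100, 'C')
    DOUBLE(90, 'X', 'C')
    SINGLE(50, 'L')
    DOUBLE(40, 'X', 'L')
    SINGLE(10, 'X')
    DOUBLE(9, 'I', 'X')
    SINGLE(5, 'V')
    DOUBLE(4, 'I', 'V')
    SINGLE(1, 'I')
    *out = '\0';    /* mov w3, #0 / strb w3, [x0] */
}
```

Given a 16-byte buffer, to_roman_model(buf, 2024) leaves "MMXXIV" in buf.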

When to Consider Optimization: Pros, Cons, and Alternatives

The macro-based solution is elegant and highly readable for assembly code. However, it's worth analyzing its trade-offs and considering alternative approaches for different scenarios.

Analysis of the Macro-Based Approach

Pros	Cons
Readability: The main function is very clear and declarative. It reads like a description of the algorithm.	Code Size: Each macro call expands into a block of assembly instructions, which can lead to a larger binary file (code bloat).
Maintainability: If you need to change the logic for handling a numeral, you only need to edit the macro definition once.	Instruction Cache Performance: Larger code size can potentially lead to more instruction cache misses, though for a small function like this, the effect is likely negligible.
Simplicity: It avoids complex data structures or memory lookups, relying purely on sequential logic.	Minor Inefficiency: The `DOUBLE` macro includes a loop that is technically unnecessary, as subtractive pairs are never repeated consecutively.

An Optimized Alternative: The Lookup Table (LUT) Approach

A more common and scalable approach in systems programming is to use a lookup table. We can define a table in memory that stores pairs of (Arabic value, Roman string).

The algorithm would then be:

Iterate through the lookup table from the largest value to the smallest.
For each entry, loop and subtract its value from the input number as many times as possible.
For each successful subtraction, copy the corresponding Roman string to the output buffer.

Here’s what that might look like conceptually in Arm64 assembly:


.section .rodata
// Lookup table: every entry is exactly 16 bytes, an 8-byte value followed
// by the numeral's characters, zero-padded to 8 bytes by .balign.
    .balign 8
roman_map:
    .quad 1000
    .asciz "M"
    .balign 8
    .quad 900
    .asciz "CM"
    .balign 8
    .quad 500
    .asciz "D"
    .balign 8
    // ... and so on for all 13 values
    .quad 1
    .asciz "I"
    .balign 8
    .quad 0, 0 // End of table marker, padded to a full 16-byte entry

.text
.global to_roman_numeral_lut

to_roman_numeral_lut:
    // x0 = output buffer, x1 = input number
    adrp x2, roman_map // Load address of the map
    add x2, x2, :lo12:roman_map

loop_table:
    ldp x3, x4, [x2], #16 // Load value (x3) and packed characters (x4),
                          // then advance table pointer x2 by 16 bytes

    cbz x3, end_function // If value is 0, we're at the end of the table

loop_subtract:
    cmp x1, x3 // Compare remaining number with table value
    blt loop_table // If less, move to next table entry

    // If greater or equal, copy the characters packed in x4.
    // Little-endian: the first character sits in the lowest byte.
    mov x5, x4
copy_chars:
    strb w5, [x0], #1 // Write one character, advance buffer pointer
    lsr x5, x5, #8 // Shift the next character into the low byte
    cbnz x5, copy_chars // Repeat until only zero padding remains

    sub x1, x1, x3 // Subtract the value
    b loop_subtract // Loop back to check again

end_function:
    // Null-terminate the string in x0
    mov w3, #0
    strb w3, [x0]
    ret

This lookup table approach is often more efficient for larger, more complex mapping problems. It centralizes the data, separates it from the logic, and can result in a smaller code footprint compared to macro expansion. For this specific problem, both solutions are excellent, but understanding the trade-offs is key to becoming an expert programmer. You can explore more advanced techniques in our complete Arm64-assembly guide.
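One way to picture a table that a single 16-byte load can walk is the C model below. It is a sketch under stated assumptions: each entry is padded to exactly 16 bytes (an 8-byte value plus up to seven characters and zero padding), the machine is little-endian so the first character lands in the lowest byte, and the names lut_entry, ROMAN_LUT and to_roman_lut_model are invented for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Assumed fixed-size entry: value, then zero-padded characters. */
struct lut_entry {
    uint64_t value;
    char     symbols[8];
};

_Static_assert(sizeof(struct lut_entry) == 16, "entry must be 16 bytes");

const struct lut_entry ROMAN_LUT[] = {
    {1000, "M"}, {900, "CM"}, {500, "D"}, {400, "CD"}, {100, "C"},
    {90, "XC"},  {50, "L"},   {40, "XL"}, {10, "X"},   {9, "IX"},
    {5, "V"},    {4, "IV"},   {1, "I"},
    {0, ""},    /* end-of-table marker */
};

/* out must have room for 16 bytes when n is in 1..3999. */
void to_roman_lut_model(char *out, uint64_t n) {
    for (const struct lut_entry *e = ROMAN_LUT; e->value != 0; e++) {
        while (n >= e->value) {
            /* View the 8 character bytes as one 64-bit word, as a
             * register load would (little-endian assumed). */
            uint64_t packed;
            memcpy(&packed, e->symbols, sizeof packed);
            do {
                *out++ = (char)(packed & 0xFF);  /* lowest byte first */
                packed >>= 8;                    /* next character    */
            } while (packed != 0);               /* padding reached   */
            n -= e->value;
        }
    }
    *out = '\0';
}
```

Keeping every entry the same size is what lets the table pointer advance by a constant stride; the cost is a few bytes of padding per entry.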

Frequently Asked Questions (FAQ)

What is the role of the `x0` and `x1` registers in this function?

According to the Arm64 Procedure Call Standard (AAPCS64), the first eight integer or pointer arguments to a function are passed in registers x0 through x7. In our case, x0 is used to pass the address of the output string buffer (a pointer), and x1 is used to pass the integer value of the Arabic number we need to convert.

Why is the order of macro calls in the function so important?

The order is critical for the greedy algorithm to work correctly. We must check for the largest possible values first. Specifically, a subtractive pair like CM (900) must be checked before the smaller plain symbols D (500) and C (100). If we checked for D first on an input of 900, the algorithm would incorrectly produce DCCCC instead of the correct and more compact CM.

What does the instruction `strb w3, [x0], #1` do exactly?

This is a "Store Byte" instruction using post-index addressing. It breaks down into two steps:

Store: strb takes the lowest byte of the w3 register and writes it to the memory location currently pointed to by x0.
Update: the , #1 after the brackets is the post-index part. Once the store is complete, x0 is incremented by 1, so the pointer moves to the next position in the buffer. Both steps happen in a single instruction.

Can this code handle numbers larger than 3,999?

No. This implementation targets traditional Roman numerals, which stop at 3,999. The code does not reject larger inputs; it would simply keep emitting 'M' (4,000 becomes MMMM), which is not a standard numeral and could overrun a buffer sized for the traditional range. The traditional system has no standard, universally accepted symbol for 5,000. While historical variations exist (like using a bar over a numeral to multiply its value by 1,000), they are not part of this kodikra module's scope.

What is a `.macro` and why is it useful here?

A .macro is an assembler directive that allows you to define a reusable block of code, much like a function in a high-level language but with a key difference: expansion. When the assembler encounters a macro call, it replaces the call with the full body of the macro's code. This is useful for reducing code duplication and improving readability, as seen in our use of SINGLE and DOUBLE macros.

How does Arm64 assembly differ from x86 assembly?

Arm64 and x86 are fundamentally different architectures. Arm64 is a RISC (Reduced Instruction Set Computer) architecture, characterized by a larger number of general-purpose registers (31, x0 through x30, vs. 16 in x86-64), fixed-length 32-bit instructions, and a load/store architecture (memory operations are separate from arithmetic operations). x86 is a CISC (Complex Instruction Set Computer) architecture with variable-length instructions and instructions that can often operate directly on memory. This generally makes Arm64 assembly more verbose but also more uniform and simpler for the CPU to decode.

Conclusion: From Ancient Numerals to Modern Silicon

We have successfully journeyed from the abstract rules of an ancient number system to a concrete, low-level implementation on a modern CPU architecture. By leveraging Arm64 assembly's macros, we created a solution that is not only functional but also remarkably readable and elegant. You've seen how high-level algorithmic concepts—like a greedy, subtractive approach—are translated into a sequence of simple machine instructions: moving data into registers, comparing values, branching based on flags, and storing bytes in memory.

This exercise from the kodikra.com curriculum does more than just solve a puzzle; it builds a bridge between theoretical computer science and practical hardware implementation. Understanding how to manipulate data at this fundamental level is a skill that transcends programming languages and provides a powerful foundation for tackling complex challenges in performance optimization, systems programming, and beyond.

Disclaimer: The assembly code and concepts discussed are based on the AArch64 instruction set architecture. While the principles are stable, always consult the latest official ARM documentation for the most current specifications.

Ready for the next challenge? Continue your journey on our Arm64-assembly learning path to further sharpen your low-level programming skills. Or, for a broader perspective, explore more concepts in our complete Arm64-assembly guide.

Published by Kodikra — Your trusted Arm64-assembly learning resource.
