Perfect Numbers in Arm64-assembly: Complete Solution & Deep Dive Guide

a close up of a sign with a lot of dots on it

The Complete Guide to Perfect Numbers in Arm64 Assembly

Discover how to classify integers as perfect, abundant, or deficient using Arm64 assembly language. This in-depth guide explores the ancient mathematical theory of aliquot sums and provides a meticulously crafted, step-by-step implementation in AArch64, unlocking maximum performance on modern hardware like Apple Silicon and AWS Graviton.


The Ancient Secret Hiding in Your CPU

Thousands of years ago, long before the first silicon chip was ever imagined, Greek mathematicians like Nicomachus were obsessed with the hidden properties of numbers. They saw a certain harmony, a kind of cosmic order, in the way integers related to their own divisors. This quest led them to classify numbers into three distinct categories: perfect, abundant, and deficient.

You might be wondering, "What does ancient number theory have to do with my modern, ultra-fast processor?" The answer is: everything. The fundamental logic—the loops, the divisions, the comparisons—that Nicomachus performed in his mind is the very same logic that powers complex algorithms today. Learning to implement this logic at the lowest level, directly in Arm64 assembly, is like learning the native language of your CPU. It's challenging, but it grants you unparalleled control and performance.

This guide bridges that gap between ancient mathematics and cutting-edge hardware. We will take you from the theoretical foundations of perfect numbers to a practical, line-by-line implementation in Arm64 assembly. You'll not only solve a classic computer science problem but also gain a profound understanding of how your processor actually works. This is a core challenge from the kodikra.com Arm64-assembly learning path, designed to build a solid foundation for systems programming.


What Are Perfect, Abundant, and Deficient Numbers?

The entire classification system hinges on a single concept: the aliquot sum. Understanding this is the first and most crucial step.

The Aliquot Sum: A Number's True Friends

The aliquot sum of a positive integer is the sum of its proper positive divisors. A proper divisor is any divisor of a number, excluding the number itself. Think of them as the number's "building blocks" or "friends."

Let's take the number 12 as an example:

  • The divisors of 12 are: 1, 2, 3, 4, 6, and 12.
  • The proper divisors of 12 are: 1, 2, 3, 4, and 6.
  • The aliquot sum of 12 is: 1 + 2 + 3 + 4 + 6 = 16.

Once you have the aliquot sum, you compare it to the original number. This comparison determines its category.

The Three Classifications of Nicomachus

  1. Perfect Number: A number is perfect if its aliquot sum is equal to the number itself. The smallest perfect number is 6. Its proper divisors are 1, 2, and 3. The sum is 1 + 2 + 3 = 6. The next one is 28 (1 + 2 + 4 + 7 + 14 = 28). These numbers were considered to have perfect harmony.
  2. Abundant Number: A number is abundant if its aliquot sum is greater than the number itself. Our example, 12, is abundant because its aliquot sum (16) is greater than 12. These were seen as having an excess, or an "abundance," of parts.
  3. Deficient Number: A number is deficient if its aliquot sum is less than the number itself. Take the number 10. Its proper divisors are 1, 2, and 5. The sum is 1 + 2 + 5 = 8. Since 8 is less than 10, the number is deficient.

Here is a summary table for quick reference:

Number (n) Proper Divisors Aliquot Sum (s) Comparison Classification
6 1, 2, 3 6 s == n Perfect
12 1, 2, 3, 4, 6 16 s > n Abundant
10 1, 2, 5 8 s < n Deficient
1 (none) 0 s < n Deficient

Why Use Arm64 Assembly for This Problem?

In a world of high-level languages like Python and JavaScript, why would anyone choose to write in assembly? For a problem like this, it's not about getting the answer quickly; it's about understanding how the answer is computed at the most fundamental level.

  • Unmatched Performance: Assembly gives you direct access to the CPU's instruction set. You can write code that is orders of magnitude faster than what a generic compiler might produce, by eliminating all overhead.
  • Hardware Mastery: Writing assembly forces you to understand CPU architecture, registers, memory access, and the instruction pipeline. This knowledge is invaluable for performance tuning, debugging, and systems-level development.
  • Energy Efficiency: On mobile and embedded devices, every CPU cycle consumes power. Highly optimized assembly code can perform a task using fewer instructions, leading to significant battery savings. This is a cornerstone of development for platforms built on ARM architecture.
  • A Foundational Skill: Understanding assembly helps you write better code even in high-level languages, because you know what's happening "under the hood." It demystifies concepts like pointers, memory layout, and function calls.

The Arm64 (AArch64) architecture is particularly relevant. It powers everything from your iPhone and Android device to Apple's M-series Macs and massive server farms like AWS Graviton. Mastering it is a future-proof skill for any serious programmer.


How to Implement the Perfect Number Logic in Arm64

Now we get to the core of the task. We will design an algorithm, visualize it, and then translate it into clean, efficient Arm64 assembly code. Our function will take a 64-bit unsigned integer and return its classification.

The Algorithm: A Step-by-Step Plan

Our goal is to calculate the aliquot sum and compare it to the input number. A straightforward algorithm is as follows:

  1. Function Signature: Define a function that accepts one 64-bit integer, let's call it n.
  2. Edge Case: If n is less than 2, it is deficient by definition (its aliquot sum is 0). Return the "deficient" classification immediately.
  3. Initialization: Create a variable to hold the aliquot sum, initialized to 1. We can start with 1 because 1 is a proper divisor of every integer greater than 1. This lets us start our loop from 2.
  4. The Loop: Iterate with a counter, let's call it i, from 2 up to the square root of n. We'll discuss this optimization later, but for a clear first implementation, we will loop up to n / 2.
  5. Divisibility Check: Inside the loop, check if i is a divisor of n. In assembly, this is done with a division instruction and checking for a remainder of zero.
  6. Summation: If i is a divisor, add it to our aliquot sum.
  7. Final Comparison: After the loop finishes, compare the calculated aliquot sum with the original number n.
  8. Return Value:
    • If sum == n, return 0 (Perfect).
    • If sum > n, return 1 (Abundant).
    • If sum < n, return -1 (Deficient).

High-Level Algorithm Flowchart

This ASCII art diagram illustrates the logical flow of our algorithm before we dive into the code.

    ● Start(n)
    │
    ▼
┌──────────────────┐
│  sum = 0         │
│  i = 1           │
└────────┬─────────┘
    │
    ▼
┌──────────────────┐
│  Loop while i <= n/2 │
└────────┬─────────┘
    │
    ▼
◆ n % i == 0 ?
   ╱           ╲
 Yes           No
  │              │
  ▼              ▼
┌───────────┐    │
│ sum += i  │    │
└───────────┘    │
  │              │
  └──────┬───────┘
         │
         ▼
┌───────────┐
│ i++       │
└───────────┘
    │
    │ ◀─────── Loop back
    ▼
◆ sum == n ?
   ╱      ╲
 Yes       No
  │         │
  ▼         ▼
[Return 0]  ◆ sum > n ?
 (Perfect)   ╱       ╲
           Yes        No
            │          │
            ▼          ▼
         [Return 1] [Return -1]
        (Abundant) (Deficient)
            │          │
            └─────┬────┘
                  ▼
                ● End

The Complete Arm64 Assembly Solution

Here is the full, commented source code. This code adheres to the standard ARM64 Application Binary Interface (ABI), where the first argument is passed in register x0 and the return value is placed in w0 (the lower 32 bits of x0).


// Classifies a number as perfect, abundant, or deficient.
//
// Input:
//   x0: The 64-bit unsigned integer 'n' to classify.
//
// Output:
//   w0: The classification result:
//       -  0 if 'n' is Perfect (aliquot sum == n)
//       -  1 if 'n' is Abundant (aliquot sum > n)
//       - -1 if 'n' is Deficient (aliquot sum < n)
//
// Register Usage:
//   x0: Holds input 'n' and is used for comparisons.
//   x1: Loop counter 'i'.
//   x2: Aliquot sum.
//   x3, x4: Temporary registers for division and modulo.

.global classify_number
.balign 4

classify_number:
    // Function Prologue: Save frame pointer and link register
    stp     x29, x30, [sp, #-16]!
    mov     x29, sp

    // Handle edge case: numbers less than 2 are deficient.
    // The aliquot sum of 1 is 0. The aliquot sum of 0 is 0.
    cmp     x0, #2
    b.lt    .deficient // Branch if n < 2

    // Initialize registers
    mov     x2, #1          // Aliquot sum starts at 1 (since 1 is always a proper divisor)
    mov     x1, #2          // Loop counter 'i' starts at 2

.loop_start:
    // Loop condition: check if 2*i > n. This is more efficient
    // than i > n/2 as it avoids a division in the loop condition.
    lsl     x3, x1, #1      // x3 = i * 2
    cmp     x3, x0          // Compare 2*i with n
    b.gt    .loop_end       // If 2*i > n, exit the loop

    // Perform modulo operation: n % i
    // In ARMv8, there's no single modulo instruction. We use:
    // remainder = n - (n / i) * i
    udiv    x3, x0, x1      // x3 = n / i (unsigned division)
    msub    x4, x3, x1, x0  // x4 = n - (x3 * x1) which is the remainder

    // Check if remainder is zero
    cmp     x4, #0
    b.ne    .skip_add       // If remainder is not 0, 'i' is not a divisor

    // 'i' is a divisor, add it to the sum
    add     x2, x2, x1      // sum = sum + i

.skip_add:
    // Increment loop counter
    add     x1, x1, #1      // i = i + 1
    b       .loop_start     // Jump back to the start of the loop

.loop_end:
    // Compare the final aliquot sum (x2) with the original number (x0)
    cmp     x2, x0

    // Use conditional set instructions to determine the return value based on flags.
    // cset w-reg, condition: sets the destination register to 1 if condition is true, else 0.
    cset    w3, gt          // w3 = 1 if sum > n (Abundant), else 0
    cset    w4, lt          // w4 = 1 if sum < n (Deficient), else 0

    // We want -1 for deficient. We can achieve this by negating w4.
    neg     w4, w4          // w4 is now -1 if deficient, else 0

    // Combine the results. Only one of w3 or w4 can be non-zero.
    orr     w0, w3, w4      // w0 = w3 OR w4. This gives 1, -1, or 0.
    b       .exit

.deficient:
    // Special path for n < 2, which is always deficient.
    mov     w0, #-1

.exit:
    // Function Epilogue: Restore frame pointer and link register, then return
    ldp     x29, x30, [sp], #16
    ret

Detailed Code Walkthrough

Let's break down the assembly code section by section to understand what each instruction does.

1. Function Prologue and Edge Case


stp     x29, x30, [sp, #-16]!
mov     x29, sp

This is standard procedure. stp (Store Pair) saves the frame pointer (x29) and link register (x30, which holds the return address) onto the stack. We then update the frame pointer to the current stack pointer. This allows us to safely call other functions and manage the stack.


cmp     x0, #2
b.lt    .deficient

We immediately handle the simplest cases. Any number less than 2 (i.e., 0 or 1) is classified as deficient. cmp sets the CPU flags based on the result of x0 - 2. b.lt (Branch if Less Than) jumps to the .deficient label if the condition is met.

2. Initialization


mov     x2, #1
mov     x1, #2

We initialize our aliquot sum in register x2 to 1. This is a small optimization; since 1 is a proper divisor for any n > 1, we can pre-load it and start our loop from 2. The loop counter i is initialized in x1 to 2.

3. The Main Loop


.loop_start:
    lsl     x3, x1, #1
    cmp     x3, x0
    b.gt    .loop_end

This is the loop's entry point and condition check. Instead of calculating n / 2, we check if 2 * i > n. This is much faster because a multiplication by 2 can be done with a single lsl (Logical Shift Left) instruction, avoiding a slow division inside the loop condition. If 2*i is greater than n, we know we've checked all possible factors and can exit.


    udiv    x3, x0, x1
    msub    x4, x3, x1, x0

This is the heart of the divisibility test. The AArch64 instruction set provides a powerful msub (Multiply-Subtract) instruction. The operation it performs is x4 = x0 - (x3 * x1). By first calculating x3 = n / i with udiv (Unsigned Divide), this sequence effectively calculates n % i and stores the remainder in x4.


    cmp     x4, #0
    b.ne    .skip_add
    add     x2, x2, x1

We compare the remainder in x4 to zero. If it's not zero (b.ne - Branch if Not Equal), we jump over the addition. If it is zero, it means i is a divisor, and we execute add x2, x2, x1 to add it to our running sum.


.skip_add:
    add     x1, x1, #1
    b       .loop_start

Finally, we increment our loop counter i and unconditionally branch (b) back to the start of the loop to process the next number.

Register and Instruction Flow Diagram

This diagram shows the flow of data between registers during the core loop logic.

    ● Loop Iteration Start
    │
    │  x0: n (original number)
    │  x1: i (current divisor candidate)
    │  x2: sum (aliquot sum)
    │
    ▼
┌───────────────────────────┐
│ udiv x3, x0, x1           │  // x3 ← n / i
└─────────────┬─────────────┘
              │
              ▼
┌───────────────────────────┐
│ msub x4, x3, x1, x0       │  // x4 ← n - (x3 * x1) == n % i
└─────────────┬─────────────┘
              │
              ▼
        ◆ x4 == 0 ?
         ╱         ╲
       Yes         No
        │           │
        ▼           │
┌─────────────────┐ │
│ add x2, x2, x1  │ │  // sum ← sum + i
└─────────────────┘ │
        │           │
        └─────┬─────┘
              │
              ▼
┌─────────────────┐
│ add x1, x1, #1  │      // i ← i + 1
└─────────────────┘
              │
              ▼
    ● Continue to next iteration

4. Final Comparison and Return


.loop_end:
    cmp     x2, x0
    cset    w3, gt
    cset    w4, lt
    neg     w4, w4
    orr     w0, w3, w4

This is a very efficient, branchless way to set the return value.

  • cmp x2, x0: Compares the final sum with n and sets the condition flags.
  • cset w3, gt: "Conditional Set". If the 'greater than' (gt) flag is set from the cmp, it sets w3 to 1. Otherwise, w3 is 0. This handles the Abundant case.
  • cset w4, lt: Similarly, if the 'less than' (lt) flag is set, it sets w4 to 1. This is the first step for the Deficient case.
  • neg w4, w4: "Negate". If w4 was 1 (deficient), it becomes -1. If it was 0, it remains 0.
  • orr w0, w3, w4: "Bitwise OR". We combine the results into our return register w0. Since a number cannot be both abundant and deficient, only one of w3 or w4 can be non-zero.
    • If abundant, w0 = 1 OR 0, so w0 = 1.
    • If deficient, w0 = 0 OR -1, so w0 = -1.
    • If perfect, w0 = 0 OR 0, so w0 = 0.

5. Function Epilogue


.exit:
    ldp     x29, x30, [sp], #16
    ret

This cleans up our work. ldp (Load Pair) restores the original frame pointer and link register from the stack. The ret instruction reads the address from the link register (x30) and jumps back to where the function was called, completing the execution.


Alternative Approaches & Further Optimizations

The provided solution is clear and correct, but it's not the most performant. In low-level programming, there's always a trade-off between readability and raw speed.

Optimization: Looping to the Square Root

A significant optimization is to loop only up to the square root of n. Factors always come in pairs. For example, the factors of 36 are (1, 36), (2, 18), (3, 12), (4, 9), and (6, 6). Notice that once we find a factor i, we automatically know its pair, which is n / i.

The algorithm would change:

  1. Loop i from 2 up to floor(sqrt(n)).
  2. If i is a divisor of n:
    • Add i to the sum.
    • Calculate the pair: p = n / i.
    • If p is not equal to i (to handle perfect squares), add p to the sum as well.

This dramatically reduces the number of iterations for large numbers. However, it requires a square root calculation, which can be complex in assembly, or more divisions inside the loop. It's a classic performance trade-off.

Pros and Cons of Assembly Implementation

Writing this logic in assembly is a powerful learning exercise, but it's important to know when it's appropriate in a real-world project.

Pros Cons
Maximum Performance: Direct control over instructions for optimal speed and energy use. Complexity & Time: Development is significantly slower and more error-prone than in high-level languages.
Minimal Footprint: The resulting binary is extremely small, ideal for embedded systems. Poor Portability: Code is tied to a specific CPU architecture (Arm64). It won't run on x86.
Deep System Understanding: Provides unparalleled insight into how hardware executes code. Maintenance Burden: Assembly code is harder to read, debug, and maintain for other developers.
Access to Special Instructions: Allows use of SIMD, crypto, and other specialized CPU features not always exposed in high-level languages. Compiler Optimizations: Modern compilers (like Clang and GCC) are incredibly good at optimization and can often produce code that is "good enough" or even better than hand-written assembly by a non-expert.

Frequently Asked Questions (FAQ)

What is an aliquot sum again?
The aliquot sum of a number is the sum of all its positive divisors, excluding the number itself. For example, for the number 6, the divisors are 1, 2, 3, and 6. The aliquot sum is 1 + 2 + 3 = 6.
Why do we only need to check for factors up to n/2?
A proper divisor of a number n cannot be larger than half of n. The next integer after n/2 is (n/2) + 1. If you multiply that by the smallest possible integer (2), the result is n + 2, which is already larger than n. Therefore, no factor can exist between n/2 and n.
How do you compile and run this Arm64 assembly code?
On a machine with an Arm64 processor (like a Raspberry Pi 4, an Apple Silicon Mac, or an AWS Graviton instance) and a C toolchain (like GCC or Clang), you would save the code as a .s file. You can then assemble and link it, often by calling it from a simple C wrapper function. For example: gcc main.c perfect.s -o my_program.
Is learning assembly language still relevant today?
Absolutely. While you won't write entire applications in it, it's critical for performance-sensitive domains like game engine development, high-frequency trading, operating system kernels, compiler design, and embedded systems programming. It is also essential for security research and reverse engineering.
Can we optimize the factor-finding loop even further?
Yes, the standard optimization is to loop only up to the square root of the number. When you find a divisor i, you also find its pair n/i. By adding both to the sum in each step, you can significantly reduce the number of loop iterations. This is especially effective for very large numbers.
What is the difference between ARMv8 and ARMv9?
ARMv9 is the next generation of the ARM architecture. It builds upon ARMv8 by adding major new features focused on security (Confidential Compute Architecture) and AI/ML performance (SVE2 - Scalable Vector Extension 2). Code written for ARMv8, like our example, is fully compatible and will run on ARMv9 processors.
Are there any odd perfect numbers?
This is one of the oldest unsolved problems in mathematics! To date, no odd perfect numbers have ever been found. However, mathematicians have not yet been able to prove that one cannot exist. All known perfect numbers are even.

Conclusion: From Ancient Theory to Modern Mastery

We've journeyed from the abstract world of Greek number theory to the concrete reality of CPU registers and instructions. By implementing the perfect number classification in Arm64 assembly, you've done more than just solve a problem; you've gained a deeper appreciation for the elegance and power of low-level code.

You learned how to translate a mathematical algorithm into a sequence of operations the processor can understand, how to manage registers efficiently, and how to use clever tricks like branchless conditional sets to optimize your code. This is the essence of systems programming and a critical skill for anyone serious about software performance.

This exercise is a fundamental building block. The concepts of loops, conditional logic, and integer arithmetic are universal. Mastering them here, in assembly, provides a solid foundation for tackling more complex challenges.

Ready to continue your journey into low-level programming? Explore the next module in our Arm64-assembly learning path or dive deeper into the language with our complete Arm64-assembly guide available on kodikra.com.

Disclaimer: The code in this article is written for the AArch64 instruction set, compatible with ARMv8-A and later architectures. The specific assembler syntax is for the GNU Assembler (as), commonly used with GCC and Clang.


Published by Kodikra — Your trusted Arm64-assembly learning resource.