Secret Handshake in Arm64-assembly: Complete Solution & Deep Dive Guide


Mastering Arm64 Bit Manipulation: The Secret Handshake Algorithm Explained

Mastering the Arm64 Secret Handshake comes down to testing individual bits of an integer with bitwise operations. Bits 0 to 3 each select an action, while bit 4 reverses the order of the actions, and each of these checks maps onto a short sequence of assembly instructions.

You've just joined a secret club of elite programmers. To verify your identity at the door, you don't use a password; you use a number. You whisper a number, say, 19, and the guard performs a specific sequence of actions. This isn't magic; it's a "Secret Handshake," a clever algorithm built on the fundamental principles of binary logic. This challenge, a core part of the kodikra learning path, feels like a puzzle but is actually a masterclass in bit manipulation—the bedrock of low-level systems programming, embedded systems, and high-performance computing.

Many developers, comfortable in the high-level world of Python or JavaScript, find assembly language intimidating. Its syntax seems cryptic, and direct memory management feels unforgiving. But what if you could learn its most powerful concepts through a fun, practical problem? This guide will demystify bitwise operations in Arm64 assembly, using the Secret Handshake algorithm as our map. We will transform you from a spectator into a creator, empowering you to command the CPU at its most fundamental level.


What is the Secret Handshake Challenge?

The Secret Handshake is an algorithm designed to convert a decimal integer into a sequence of predefined actions. The logic hinges entirely on the number's 5-bit binary representation. Each bit, read from right to left (from least significant to most significant), acts as a switch or a flag that determines the final output.

The input is a number, and for this challenge we only care about its lowest five bits. This means our effective input range is from 0 to 31 (since 2⁵ = 32, five bits can represent the numbers 0-31). Each bit position corresponds to a unique action, and one special bit acts as a modifier.

The Rules of Engagement

The conversion follows a strict set of rules based on the binary value of the input number. We examine the bits from right to left (Bit 0 to Bit 4).

  • Bit 0 (Decimal 1, Binary 00001) maps to the action: "wink"
  • Bit 1 (Decimal 2, Binary 00010) maps to the action: "double blink"
  • Bit 2 (Decimal 4, Binary 00100) maps to the action: "close your eyes"
  • Bit 3 (Decimal 8, Binary 01000) maps to the action: "jump"
  • Bit 4 (Decimal 16, Binary 10000) is a special modifier: It reverses the order of the previously generated actions.

For example, if the input number is 19, its binary representation is 10011. Let's decode it:

  1. Binary: 10011
  2. Bit 0 is 1: Add "wink". Current sequence: ["wink"]
  3. Bit 1 is 1: Add "double blink". Current sequence: ["wink", "double blink"]
  4. Bit 2 is 0: Do nothing.
  5. Bit 3 is 0: Do nothing.
  6. Bit 4 is 1: The reverse flag is set! Reverse the current sequence.

The final sequence for the number 19 is ["double blink", "wink"].
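To make the rules concrete before moving to assembly, here is a minimal C sketch that follows them literally: collect the actions for bits 0 to 3 in order, then reverse the list if bit 4 is set. The function name `handshake` and its array-based interface are assumptions made for this illustration, not part of the original exercise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference implementation of the rules above.
   Action names indexed by bit position (bit 0 = "wink"). */
static const char *const ACTIONS[4] = {
    "wink", "double blink", "close your eyes", "jump"
};

/* Writes up to four action names into out[] and returns how many were written. */
size_t handshake(unsigned number, const char *out[4]) {
    assert(out != NULL);
    size_t count = 0;

    /* Bits 0-3: append the matching action when the bit is set. */
    for (unsigned bit = 0; bit < 4; bit++) {
        if (number & (1u << bit)) {
            out[count++] = ACTIONS[bit];
        }
    }

    /* Bit 4: reverse whatever was collected, swapping from both ends. */
    if (number & 16u) {
        for (size_t i = 0, j = count; i + 1 < j; i++, j--) {
            const char *tmp = out[i];
            out[i] = out[j - 1];
            out[j - 1] = tmp;
        }
    }
    return count;
}
```

For 19 this yields "double blink" followed by "wink", matching the walkthrough above.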

Visualizing the Logic

Here is a simple table to solidify the mapping between the bits, their decimal values, and the resulting actions.

Bit Position | Decimal Value | Binary Mask | Action / Modifier
-------------|---------------|-------------|------------------
0            | 1             | 00001       | "wink"
1            | 2             | 00010       | "double blink"
2            | 4             | 00100       | "close your eyes"
3            | 8             | 01000       | "jump"
4            | 16            | 10000       | Reverse sequence

Why Bit Manipulation is a Superpower in Arm64 Assembly

At first glance, the Secret Handshake seems like a quaint puzzle. However, its core—bit manipulation—is one of the most critical skills in systems programming. In high-level languages, we often work with abstractions like objects, strings, and integers. In assembly, we are much closer to the hardware, where data is just a sequence of bits. Understanding how to manipulate these bits efficiently is not just useful; it's essential.

Bitwise operations such as AND, OR (ORR in Arm64), XOR (EOR), and shifts (LSL, LSR) are not library functions; each one is a single CPU instruction. This direct mapping to hardware is why low-level code that must squeeze out every cycle leans on them so heavily.

Real-World Applications

  • Device Drivers: When communicating with hardware like a network card or a GPU, programmers must read and write to specific memory-mapped registers. Often, a single register controls multiple features, with each bit acting as a flag. You might set bit 3 to enable a feature and check bit 7 to see if a buffer is full.
  • Embedded Systems: On microcontrollers with limited memory and processing power (like those in your car or microwave), storing multiple boolean flags in a single byte (an 8-bit integer) is a common memory-saving technique.
  • Cryptography: Encryption algorithms like AES and hashing functions like SHA-256 are built upon a complex series of bitwise operations. Their security and performance rely on the efficient shuffling and transformation of bits.
  • Performance Optimization: In graphics rendering, game development, and scientific computing, certain arithmetic can be replaced with cheaper bitwise equivalents. For example, multiplying or dividing an unsigned value by a power of two can be done with a single shift, which is typically cheaper than a general multiply and considerably cheaper than a divide instruction.
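The flag-register and shift ideas from the list above look like this in C. The register layout used here (an 8-bit status byte with an enable flag in bit 3 and a buffer-full flag in bit 7) is a hypothetical example, not any real device's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag positions in an 8-bit status/control byte. */
#define FLAG_ENABLE      (1u << 3)
#define FLAG_BUFFER_FULL (1u << 7)

/* Set a flag without disturbing the other bits (OR). */
static uint8_t set_flag(uint8_t reg, uint8_t mask)   { return (uint8_t)(reg | mask); }

/* Clear a flag without disturbing the other bits (AND with the inverted mask). */
static uint8_t clear_flag(uint8_t reg, uint8_t mask) { return (uint8_t)(reg & ~mask); }

/* Test a flag (AND, then compare with zero). */
static bool has_flag(uint8_t reg, uint8_t mask)      { return (reg & mask) != 0; }

/* For unsigned values, shifting by k multiplies or divides by 2^k.
   For negative signed values, C division truncates toward zero while an
   arithmetic right shift rounds toward negative infinity, so they differ. */
static uint32_t times_eight(uint32_t x) { return x << 3; }
static uint32_t divide_by_8(uint32_t x) { return x >> 3; }
```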

The Secret Handshake challenge is a perfect training ground. It forces you to think in terms of individual bits, using bitwise AND to check flags and conditional logic to build a result—skills that are directly transferable to these advanced domains.


How the Secret Handshake Algorithm Works

To solve this problem efficiently in Arm64 assembly, we need a clear, step-by-step algorithm. The core idea is to test each of the five bits of the input number and, based on the results, construct the final output string.

The most elegant approach involves using a "bitmask." A bitmask is an integer where only specific bits are set to 1, and the rest are 0. We can use the bitwise AND operation to check if a corresponding bit is set in our input number. For example, to check if bit 0 is set in the number `n`, we compute `n AND 1`. If the result is non-zero, the bit was set.
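A minimal C sketch of that test follows; `bit_is_set` is a hypothetical helper name. It performs the same AND that Arm64's tst performs, but returns the outcome as a value instead of setting condition flags.

```c
#include <assert.h>
#include <stdbool.h>

/* True when bit `bit` (0 = least significant) is set in n.
   Assumes a 32-bit unsigned int. */
static bool bit_is_set(unsigned n, unsigned bit) {
    assert(bit < 32);            /* shifting by >= the type width is undefined in C */
    unsigned mask = 1u << bit;   /* 1, 2, 4, 8, 16, ... */
    return (n & mask) != 0;      /* non-zero AND result => the bit was set */
}
```

For 19 (binary 10011), bits 0, 1, and 4 report true, while bits 2 and 3 report false.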

The Algorithmic Flow

Our logic can be broken down into a clear sequence of operations. We first determine the direction (forwards or backwards) and then iterate through the action bits to build our string.

    ● Start(number, buffer)
    │
    ▼
  ┌──────────────────────────────┐
  │ Sanitize Input (number & 31) │
  └──────────────┬───────────────┘
                 │
                 ▼
   ◆ Is Bit 4 Set? (number & 16)
    ╱                ╲
  Yes                 No
   │                   │
   ▼                   ▼
┌───────────┐     ┌────────────┐
│ Use       │     │ Use        │
│ Backwards │     │ Forwards   │
│ Table     │     │ Table      │
└─────┬─────┘     └─────┬──────┘
      │                 │
      └────────┬────────┘
               ▼
┌───────────────────────────────────┐
│ Loop over table entries i = 0..3  │
└─────────────────┬─────────────────┘
                  │
                  ▼
   ◆ Is the bit for entry i set?
    ╱                ╲
  Yes                 No
   │                   │
   ▼                   ▼
┌───────────┐    (Skip entry)
│ Append    │
│ entry i   │
│ to Buffer │
└─────┬─────┘
      │
      ▼
  (next i; after i = 3)
      │
      ▼
    ● End

This flow is highly efficient. We make one decision upfront (the direction) and then perform a series of simple, independent checks. By using pointers to the action strings, we avoid complex conditional logic for the reversal, making the code cleaner and potentially faster.
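Before reading the assembly, it may help to see the same table-driven flow in C. This is a sketch written for this guide, not the kodikra solution, and the name `commands_c` is hypothetical. One detail matters: choosing the backwards table alone is not enough. Its entry i is the action for bit 3 - i, so the mask tested for entry i must be 8 >> i rather than 1 << i.

```c
#include <assert.h>
#include <string.h>

/* Action strings with a trailing space, mirroring the assembly's .data section. */
static const char *const forwards[4]  = { "wink ", "double blink ", "close your eyes ", "jump " };
static const char *const backwards[4] = { "jump ", "close your eyes ", "double blink ", "wink " };

/* buffer must hold at least 40 bytes (39 characters + the null terminator). */
void commands_c(char *buffer, int number) {
    assert(buffer != NULL);
    buffer[0] = '\0';                                  /* start with an empty string */

    int reversed = (number & 16) != 0;                 /* bit 4 picks the table */
    const char *const *table = reversed ? backwards : forwards;

    for (unsigned i = 0; i < 4; i++) {
        /* forwards: entry i is bit i; backwards: entry i is bit 3 - i. */
        unsigned mask = reversed ? (8u >> i) : (1u << i);
        if ((unsigned)number & mask) {
            strcat(buffer, table[i]);
        }
    }

    /* Every appended string ends in a space, so drop the last character. */
    size_t len = strlen(buffer);
    if (len > 0) {
        buffer[len - 1] = '\0';
    }
}
```

Calling `commands_c(buf, 19)` leaves "double blink wink" in `buf`.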


Where the Magic Happens: A Detailed Arm64 Code Walkthrough

Now, let's translate our algorithm into Arm64 assembly code. This solution, from the kodikra.com curriculum, uses lookup tables (arrays of pointers) to handle both the standard and reversed sequences, which keeps the logic in the executable code section short.

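The function signature in C would be void commands(char *buffer, int number);. Under the AArch64 procedure call standard (AAPCS64), the first argument (buffer) is passed in register x0 and the second argument (number) in w1, the lower 32 bits of x1. The upper 32 bits of x1 are not guaranteed to be zero for an int argument, but that is harmless here because the code only ever tests bits 0 to 4.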

The Complete, Annotated Assembly Code

Here is the full, working solution with line-by-line explanations.


/*
 * Solution for the Secret Handshake module from kodikra.com
 * This code translates a number into a sequence of actions.
 */

// Import the C standard library functions used below (strcat to append, strlen to trim).
.extern strcat
.extern strlen

.data
// Define the strings for each action. We add a space for easy concatenation.
wink:           .string "wink "
double_blink:   .string "double blink "
close_eyes:     .string "close your eyes "
jump:           .string "jump "

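// Define a lookup table of pointers for the forwards sequence.
// .dword allocates 64 bits (8 bytes) for each address. The strings above
// have arbitrary lengths, so align the tables on an 8-byte boundary first.
.balign 8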
forwards:
    .dword wink
    .dword double_blink
    .dword close_eyes
    .dword jump

// Define a lookup table for the backwards (reversed) sequence.
backwards:
    .dword jump
    .dword close_eyes
    .dword double_blink
    .dword wink

.text
.globl commands

/*
 * Function: commands
 * Translates a number into a secret handshake sequence.
 * x0: Pointer to the output character buffer (char *buffer)
 * x1: The input number (int number)
 */
commands:
    // Standard function prologue: save the frame pointer (x29) and link register (x30)
    stp x29, x30, [sp, #-48]!
    mov x29, sp

    // Save callee-saved registers we will be using.
    stp x19, x20, [sp, #16]
    stp x21, x22, [sp, #32]

    // Move arguments to saved registers to preserve them across function calls (like strcat).
    mov x19, x0     // x19 = buffer pointer
    mov x20, x1     // x20 = original number

    // Initialize the buffer to an empty string. A single null byte.
    mov w21, #0
    strb w21, [x19]

    // --- Logic starts here ---

    // 1. Determine which pointer table to use (forwards or backwards)
    // Test if bit 4 (16) is set in the number.
    // TST is a non-destructive AND; it just sets the flags.
    tst x20, #16
    
    // Load the base address of the 'forwards' table by default.
    adrp x21, forwards
    add x21, x21, :lo12:forwards
    
    // If the test was not equal to zero (b.ne), bit 4 was set.
    // So, branch to the 'reverse' label to load the 'backwards' table instead.
    b.ne reverse
    b continue_logic // Otherwise, continue with 'forwards' table.

reverse:
    // Load the base address of the 'backwards' table.
    adrp x21, backwards
    add x21, x21, :lo12:backwards

continue_logic:
    // x21 now holds the base address of the correct pointer table.

    // 2. Iterate through bits 0-3 and append strings
    // We will use a counter (x22) for the loop.
    mov x22, #0 // i = 0

loop_start:
    cmp x22, #4 // Loop while i < 4
    b.ge loop_end

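    // Create the bitmask that belongs to table entry i.
    // forwards lists wink..jump, so entry i is bit i:        mask = 1 << i
    // backwards lists jump..wink, so entry i is bit (3 - i): mask = 8 >> i
    mov x1, #1
    lsl x1, x1, x22     // x1 = 1 << i  (1, 2, 4, 8)
    mov x3, #8
    lsr x3, x3, x22     // x3 = 8 >> i  (8, 4, 2, 1)
    tst x20, #16        // re-check the reverse flag (bit 4)
    csel x1, x3, x1, ne // bit 4 set: use 8 >> i; otherwise keep 1 << i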

    // Test if the bit is set in the original number.
    tst x20, x1
    b.eq loop_increment // If bit is not set, skip to the next iteration.

    // If the bit is set, append the corresponding string.
    // Calculate the address of the pointer in our table: base_address + (i * 8)
    // Since each pointer is 8 bytes (.dword), we multiply the index by 8.
    lsl x2, x22, #3 // x2 = i * 8
    add x2, x21, x2 // x2 = table_base + offset
    ldr x1, [x2]    // x1 = pointer to the string (e.g., address of "wink ")

    // Prepare arguments for strcat(dest, src)
    // dest is our buffer, which is already in x19.
    mov x0, x19     // arg1 = buffer
    // src is the string pointer we just loaded into x1.
    bl strcat       // Call strcat

loop_increment:
    add x22, x22, #1 // i++
    b loop_start

loop_end:
    // After the loop, we might have a trailing space. Let's remove it.
    // Find the length of the string.
    mov x0, x19 // strlen(buffer)
    bl strlen
    
    // If length is 0, do nothing.
    cmp x0, #0
    b.eq cleanup

    // Calculate address of the last character.
    sub x0, x0, #1 // index = length - 1
    add x1, x19, x0 // address = buffer + index

    // Replace the trailing space with a null terminator.
    mov w2, #0
    strb w2, [x1]

cleanup:
    // Standard function epilogue: restore saved registers and return.
    ldp x21, x22, [sp, #32]
    ldp x19, x20, [sp, #16]
    ldp x29, x30, [sp], #48
    ret

Code Breakdown

The .data Section

This is where we define our static data. We declare the four action strings (wink, double_blink, etc.) and, most importantly, two arrays of pointers: forwards and backwards. Using .dword allocates 64 bits (8 bytes) for each entry, which is the size of a memory address in a 64-bit system. This setup is the key to our elegant reversal logic—instead of reordering strings at runtime, we just choose which pre-ordered list of pointers to use.

Function Prologue and Setup

stp x29, x30, [sp, #-48]! reserves a 48-byte stack frame (keeping sp 16-byte aligned) and saves the frame pointer and link register (which holds the return address) at its base. We also save the callee-saved registers x19-x22 that we plan to modify, so we can restore them before returning and leave the caller's state intact.

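We then move our arguments (from x0 and x1) into the callee-saved registers x19 and x20. This is crucial because the argument and scratch registers (x0-x17) are not preserved across a function call, and strcat in particular receives its own arguments in x0 and x1. Registers x19-x28, by contrast, must be preserved by any function we call, so values kept there survive the call.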

The Reversal Logic

The first major step is to check the reverse flag (bit 4). tst x20, #16 performs a bitwise AND between our number and 16 (binary 10000). It doesn't store the result but sets the CPU's condition flags. The code then loads the base address of the forwards table into x21 as the default; adrp and add, the standard way to form the address of a symbol in the .data section, do not change the flags, so the result of tst is still available. The b.ne reverse instruction means "Branch if Not Equal (to zero)": if the AND result was non-zero, bit 4 was set, and the code jumps to the reverse label, which overwrites x21 with the base address of the backwards table. Otherwise, b continue_logic skips that step and x21 keeps the forwards table.

The Main Loop

This is the heart of the function.

  • We initialize a counter x22 to 0.
  • We loop four times (for bits 0, 1, 2, and 3).
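  • Inside the loop, we build the bitmask for table entry i. For the forwards table, entry i belongs to bit i, so the mask is 1 shifted left by i (1, 2, 4, 8). The backwards table lists the actions from jump down to wink, so its entry i belongs to bit 3 - i and the mask must be 8 shifted right by i (8, 4, 2, 1). Pairing the backwards table with the forward masks would, for 19, print "jump close your eyes" instead of "double blink wink".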
  • We use tst again to check if the corresponding bit is set in our number. If not (b.eq loop_increment), we skip the append logic.
  • If the bit is set, we calculate the location of the correct string pointer in our table (base_address + i * 8) and load it into x1.
  • Finally, we call strcat with our buffer (x0 = x19) and the string pointer we just loaded (x1).

Final Cleanup

Our method of appending strings with spaces leaves an unwanted trailing space at the end. The cleanup code finds the end of the string (using the standard C library function strlen), moves back one character, and overwrites the space with a null terminator (\0), effectively trimming it.


How to Compile and Test Your Assembly Code

Writing assembly is only half the battle; you also need to know how to compile, link, and run it. Because our assembly function uses C library functions (strcat, strlen) and follows the C calling convention, the easiest way to test it is by calling it from a simple C program.

Step 1: Create the C Wrapper (e.g., main.c)

This C file will contain the main function, which is the entry point of our program. It will allocate a buffer, call our assembly function, and print the result.


#include <stdio.h>
#include <stdlib.h>

// Declare the external assembly function.
// This tells the C compiler that the function `commands` exists
// but will be provided by another object file at the linking stage.
extern void commands(char *buffer, int number);

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <number>\n", argv[0]);
        return 1;
    }

    int number = atoi(argv[1]);

    // Allocate a buffer large enough to hold all possible actions.
    // All four strings together are 39 characters ("close your eyes " is the longest at 16),
    // plus 1 for the null terminator = 40 bytes; 100 leaves plenty of headroom.
    char buffer[100];

    // Call our assembly function
    commands(buffer, number);

    printf("Number: %d\n", number);
    printf("Handshake: \"%s\"\n", buffer);

    return 0;
}

Step 2: Compile and Link from the Terminal

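Assuming you have your assembly code in handshake.s and the C code in main.c, and that you are working on an AArch64 Linux machine, you can use a standard C compiler like GCC or Clang. These drivers invoke the assembler (as) and the linker (ld) with the correct options. On an x86-64 host, the same steps need a cross toolchain (for example aarch64-linux-gnu-gcc) and an emulator such as qemu-aarch64 to run the result.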

Open your terminal and run the following commands:


# 1. Assemble the .s file into an object file .o
# This translates your assembly mnemonics into machine code.
gcc -c handshake.s -o handshake.o

# 2. Compile the .c file into an object file .o
gcc -c main.c -o main.o

# 3. Link the two object files together to create a final executable
# The linker resolves symbols, like the call to `commands` in main.o
# and the calls to `strcat`/`strlen` in handshake.o.
gcc main.o handshake.o -o handshake_tester

# 4. Run the executable with a number as an argument
./handshake_tester 19

If everything is correct, you should see the following output:


Number: 19
Handshake: "double blink wink"

Try it with other numbers like 3 (wink, double blink), 9 (wink, jump), or 31 (jump, close your eyes, double blink, wink).


Alternative Approaches and Performance Considerations

The provided solution using lookup tables is excellent for its readability and simplicity. However, in assembly programming, there's always more than one way to solve a problem. Let's analyze the pros and cons and consider an alternative.

The diagram below illustrates a slightly different logical flow, one that avoids the pointer tables in the .data section and instead uses conditional branching in the code itself: the loop direction depends on bit 4, and a switch on the bit index picks the string. This trades a little data for more code complexity.

    ● Start(number, buffer)
    │
    ▼
  ┌───────────────────┐
  │ Initialize Buffer │
  └─────────┬─────────┘
            │
            ▼
  ◆ Is Bit 4 Set? (Reverse Flag)
   ╱                  ╲
 Yes                  No
  │                    │
  ▼                    ▼
┌─────────────┐   ┌────────────┐
│ Loop Bits   │   │ Loop Bits  │
│ 3 down to 0 │   │ 0 up to 3  │
└──────┬──────┘   └─────┬──────┘
       │                │
       └───────┬────────┘
               ▼
┌─────────────────────────────────┐
│ Inside Loop (for bit 'i')       │
└────────────────┬────────────────┘
                 │
                 ▼
  ◆ Is Bit 'i' Set in number?
   ╱                  ╲
 Yes                  No
  │                    │
  ▼                    ▼
┌───────────────────┐   (Continue Loop)
│ Switch on 'i'     │
│ ├─ 0: Append Wink │
│ ├─ 1: Append DB   │
│ ├─ 2: Append CEY  │
│ └─ 3: Append Jump │
└─────────┬─────────┘
          │
          ▼
        ● End
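The branching alternative can be sketched in C as follows. Instead of pointer tables, the loop direction changes with bit 4 and a switch chooses the string; `commands_switch` is a hypothetical name for this illustration, not code from the original solution.

```c
#include <assert.h>
#include <string.h>

/* Branch-based variant: no lookup tables, a switch picks the action.
   buffer must hold at least 40 bytes. */
void commands_switch(char *buffer, int number) {
    assert(buffer != NULL);
    buffer[0] = '\0';

    int reversed = (number & 16) != 0;
    for (int step = 0; step < 4; step++) {
        int bit = reversed ? 3 - step : step;   /* walk bits 3..0 or 0..3 */
        if (!(number & (1 << bit))) {
            continue;                           /* bit clear: nothing to append */
        }
        if (buffer[0] != '\0') {
            strcat(buffer, " ");                /* separator between actions */
        }
        switch (bit) {
            case 0: strcat(buffer, "wink"); break;
            case 1: strcat(buffer, "double blink"); break;
            case 2: strcat(buffer, "close your eyes"); break;
            case 3: strcat(buffer, "jump"); break;
        }
    }
}
```

This version inserts a separator before each action after the first, so it needs no trailing-space trim; the cost is an extra branch per appended action.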

Pros and Cons: Lookup Table vs. Conditional Logic

Aspect        | Lookup Table Approach (Our Solution) | Conditional Logic Approach
--------------|--------------------------------------|---------------------------
Code Size     | Smaller .text section (less logic), but a larger .data section (for the tables). | Larger .text section (more branches), but a smaller .data section.
Performance   | Predictable: every iteration does the same test, table load (LDR), and call, and the small tables are likely to stay in the CPU cache. One conditional branch per bit remains. | More dependent on the branch predictor: the extra conditional branches (b.eq, b.ne) of the switch can cause pipeline flushes when mispredicted.
Readability   | Very high. The logic in the .text section maps directly to the algorithm: get pointer, append string. | Can become complex. A series of "test-and-branch" instructions can be harder to follow than a simple loop with a table lookup.
Extensibility | Excellent. To add more actions, you add new strings and extend the tables in the .data section; the core logic in .text barely changes (only the loop bound and the starting bit of the reversed mask). | Poor. Each new action requires more conditional branches and logic in the .text section, making the code more complex.

For this specific problem, the lookup table approach is the better fit. The reversal is decided by data (which table, and which mask order) rather than by separate code paths, and the resulting code is far easier to maintain and extend, a principle that is just as important in assembly as it is in any other language. In raw speed the two approaches are likely close, since the strcat calls dominate the run time of such a short loop.


Frequently Asked Questions (FAQ)

Why is the input number restricted to the range 0-31?

The problem is defined by 5 bits. With 5 bits, you can represent 2⁵ = 32 unique values, from 0 (binary 00000) to 31 (binary 11111). Any number larger than 31 has bits set beyond bit 4, and those bits are ignored by the handshake logic. Some implementations make this explicit by masking the input first (for example, and w1, w1, #31); the solution above does not need to, because it only ever tests bits 0 to 4, so higher bits are never examined.

What does .dword mean in Arm64 assembly?

.dword stands for "define double word." In the context of the AArch64 architecture, a "word" is 32 bits (4 bytes) and a "double word" is 64 bits (8 bytes). We use .dword to allocate 64 bits of space for each entry in our forwards and backwards tables. This is the exact size needed to store a 64-bit memory address (a pointer).

How does the tst instruction differ from and?

Both instructions perform a bitwise AND operation. The key difference is that and stores the result in a destination register (e.g., and x0, x1, x2 calculates x1 & x2 and stores it in x0) and leaves the condition flags untouched. The tst instruction (Test Bits) performs the AND but discards the result; it is in fact an alias of ands with the zero register (xzr or wzr) as its destination. Its only purpose is to set the processor's condition flags (such as the Zero flag) based on what the result would have been. This is perfect for checks where you only need to know whether the result is zero, without destroying any register values.

What is the role of the C wrapper file?

The C wrapper provides a familiar and easy-to-use environment for testing our assembly code. It handles tasks like parsing command-line arguments, allocating memory for the buffer, and printing output to the console using standard library functions. It also defines the main function, which the C runtime calls when the program starts. This allows us to focus purely on the core algorithm in assembly without worrying about OS-level boilerplate.

Could this be solved without calling C library functions like strcat?

Absolutely. You could write your own string concatenation routine in assembly. This would involve two loops: one to find the null terminator at the end of the destination string, and a second to copy bytes from the source string to that position. While this is a great exercise for learning memory manipulation, using battle-tested C library functions is often more practical and less error-prone for application-level logic.

How different is Arm64 assembly from x86 assembly?

They are fundamentally different. Arm64 is a RISC (Reduced Instruction Set Computer) architecture, characterized by a larger register file (31 general-purpose 64-bit registers, versus 16 on x86-64), simple fixed-length 32-bit instructions, and a load/store architecture (meaning operations are generally performed on data in registers, not directly on data in memory). x86 is a CISC (Complex Instruction Set Computer) architecture with fewer registers, variable-length instructions, and instructions that can operate directly on memory. This leads to very different coding styles and optimization strategies for each platform.


Conclusion: From Bits to Behavior

The Secret Handshake challenge is far more than a simple coding puzzle; it's a gateway to understanding the core of computation. By translating abstract rules into concrete bitwise operations, we have seen how a few simple CPU instructions can produce complex, predictable behavior. We have explored how to structure data and code in Arm64 assembly for clarity and performance, leveraging lookup tables to minimize complex logic and make our solution both elegant and extensible.

You have not only solved a problem but have also gained practical experience with the AArch64 calling convention, compiling and linking assembly with C, and the critical trade-offs between different algorithmic approaches at the lowest level. These are the skills that empower you to write hyper-efficient code, interface directly with hardware, and truly understand how software commands the silicon it runs on.

Technology Disclaimer: The code and concepts discussed are based on the AArch64 instruction set architecture as of the current standard. Assembly language is specific to the hardware architecture. The commands and syntax are demonstrated for a Linux-like environment using the GCC toolchain.



Published by Kodikra — Your trusted Arm64-assembly learning resource.