Variable Length Quantity in Arm64-assembly: Complete Solution & Deep Dive Guide


The Complete Guide to Variable Length Quantity (VLQ) in Arm64-assembly

Variable Length Quantity (VLQ) is a universal encoding scheme that compresses integers by using a variable number of bytes. This guide provides a deep dive into its mechanics, benefits, and a complete, from-scratch implementation in Arm64-assembly, empowering you to master low-level data optimization.

You've probably encountered a common engineering problem: how to efficiently store or transmit a large set of numbers where most values are small. Using a fixed-size type like a 32-bit or 64-bit integer feels wasteful. A number like `5` would occupy 4 or 8 bytes, with most of those bytes being zeros. This inefficiency compounds quickly, bloating file sizes and increasing network latency. What if there was a way to use only one byte for small numbers, two for slightly larger ones, and so on? This is precisely the problem that Variable Length Quantity (VLQ) encoding elegantly solves. In this comprehensive guide, we'll demystify VLQ and walk you through building both an encoder and a decoder at the most fundamental level: Arm64-assembly, using the exclusive learning materials from the kodikra.com curriculum.


What is Variable Length Quantity (VLQ)?

Variable Length Quantity (VLQ), also known as Base-128 encoding, is a method for encoding arbitrarily large integers into a sequence of bytes. Its primary goal is space efficiency. Instead of using a fixed number of bytes (like 4 for an `int32`), VLQ uses as few bytes as necessary.

The core mechanism is simple yet brilliant. Each byte in a VLQ sequence is split into two parts:

  • The Continuation Bit (MSB): The Most Significant Bit (bit 7) acts as a flag. If this bit is 1, it signals that more bytes follow as part of the current integer. If it's 0, it marks the final byte for that integer.
  • The Payload (7 bits): The remaining 7 bits (bits 0-6) of the byte store a piece of the integer's data.

By chaining these 7-bit payloads together, we can represent numbers of any size. A number from 0-127 fits in a single byte. A number from 128-16383 requires two bytes, and so on. This makes it exceptionally well-suited for data streams where small numbers are far more common than large ones.
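As a rough illustration of these size ranges, here is a minimal C sketch (C is used because the guide gives its prototypes in C). The helper name vlq_encoded_size is hypothetical and is not part of the assembly solution; it only counts how many 7-bit groups a 32-bit value needs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes needed to encode `value`
 * when each byte carries 7 payload bits. */
size_t vlq_encoded_size(uint32_t value)
{
    size_t bytes = 1;           /* even the value 0 occupies one byte */
    while (value > 0x7F) {      /* more than 7 significant bits remain */
        value >>= 7;
        bytes++;
    }
    return bytes;
}
```

Under this sketch, 127 needs one byte, 128 and 16383 need two, and any value above 2^28 - 1 needs the full five.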


Why is VLQ a Critical Concept in Modern Systems?

While VLQ might seem like a niche, low-level optimization, its applications are widespread and impact technologies you use daily. Its efficiency in representing data makes it a cornerstone of many formats and protocols where size and speed are paramount.

  • Protocol Buffers (Protobuf): Google's high-performance data serialization format uses a variant of VLQ called Varints to encode integers. This is a key reason why Protobuf messages are so compact and fast to parse compared to text-based formats like JSON or XML.
  • MIDI File Format: The Standard MIDI File (SMF) format uses VLQ to encode timestamps (delta-times) between musical events. Since most events happen close together, the delta-times are small, making VLQ a perfect fit.
  • Web Development Source Maps: The source maps that link minified JavaScript back to the original source code use VLQ to encode line and column number mappings in a highly compressed format.
  • Embedded Systems & IoT: In resource-constrained environments where every byte of RAM, flash storage, or network bandwidth is precious, VLQ is an invaluable tool for efficient data representation.

Understanding and implementing VLQ in a low-level language like Arm64-assembly gives you a profound appreciation for the data optimization techniques that power modern, high-performance software.


How Does VLQ Encoding Work? A Step-by-Step Breakdown

Encoding an integer into a VLQ byte sequence is a straightforward process of breaking the number down into 7-bit chunks, starting from the least significant bits. Let's walk through the algorithm with an example: encoding the number 137 (binary 10001001).

The Algorithm:

  1. Start with your integer.
  2. Take the lowest 7 bits of the number. This forms the payload of your first (and potentially only) byte.
  3. Right-shift the original number by 7 bits to discard the bits you just processed.
  4. Check if the remaining number is zero.
    • If it is zero, you are done with this number. The byte you just created is the final byte. Its continuation bit (MSB) should be 0.
    • If it is not zero, more bytes are needed. The byte you just created needs its continuation bit (MSB) set to 1. Return to step 2 with the shifted number.

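With this algorithm the bytes come out ordered from least significant chunk to most significant chunk, the same order used by LEB128 and Protobuf varints. Be aware that the variable-length quantities in Standard MIDI Files use the opposite order, with the most significant chunk first.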
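The steps above can be sketched in C before moving to assembly. This is a minimal model under stated assumptions, not the guide's solution: the name vlq_encode_one is hypothetical, and the caller is assumed to supply an output buffer of at least 5 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one 32-bit value, least significant
 * 7-bit chunk first. `out` must have room for 5 bytes.
 * Returns the number of bytes written. */
size_t vlq_encode_one(uint32_t value, uint8_t *out)
{
    size_t n = 0;
    for (;;) {
        uint8_t chunk = (uint8_t)(value & 0x7F);  /* step 2: lowest 7 bits */
        value >>= 7;                              /* step 3: discard them  */
        if (value == 0) {                         /* step 4: nothing left  */
            out[n++] = chunk;                     /* final byte, MSB = 0   */
            return n;
        }
        out[n++] = (uint8_t)(chunk | 0x80);       /* more to come, MSB = 1 */
    }
}
```

Note that the check happens after the shift, so the value 0 still produces exactly one byte (0x00).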

Example: Encoding 137 (0b10001001)

    ● Start with Integer: 137 (0b10001001)
    │
    ▼
  ┌────────────────────────┐
  │ Is 137 > 0? Yes. Loop. │
  └──────────┬───────────┘
             │
             ▼
    ◆ Extract lowest 7 bits
      137 & 0x7F  →  9 (0b0001001)
             │
             ▼
    ◆ Shift number right by 7
      137 >> 7  →  1 (0b1)
             │
             ▼
  ┌────────────────────────┐
  │ Is remaining (1) > 0?  │
  │ Yes. Set continuation. │
  └──────────┬───────────┘
             │
             ▼
    ● Byte 1: 0x80 | 9  →  0x89 (0b10001001)
             │
             ├─────────────────┐
             │                 │
             ▼                 ▼
  ┌────────────────────────┐   [Output Stream]
  │ Is 1 > 0? Yes. Loop.   │   [0x89]
  └──────────┬───────────┘
             │
             ▼
    ◆ Extract lowest 7 bits
      1 & 0x7F  →  1 (0b1)
             │
             ▼
    ◆ Shift number right by 7
      1 >> 7  →  0 (0b0)
             │
             ▼
  ┌────────────────────────┐
  │ Is remaining (0) == 0? │
  │ Yes. Final byte.       │
  └──────────┬───────────┘
             │
             ▼
    ● Byte 2: 0x00 | 1  →  0x01 (0b00000001)
             │
             ├─────────────────┐
             │                 │
             ▼                 ▼
    ● End                    [Output Stream]
                             [0x89, 0x01]

So, the integer 137 is encoded as the byte sequence 0x89, 0x01.


Implementing VLQ Encoding in Arm64-assembly

Now, let's translate this logic into efficient Arm64-assembly code. This implementation is part of an exclusive kodikra.com module designed to sharpen your low-level programming skills. The function will take an array of 32-bit integers and write the encoded byte stream to an output buffer.

Our function signature is: int encode_vlq(const uint32_t* values, size_t values_length, uint8_t* result). Under the Procedure Call Standard for the Arm 64-bit Architecture (AAPCS64), the arguments and return value use the following registers:

  • x0: Pointer to the input array (values).
  • x1: Number of integers in the input array (values_length).
  • x2: Pointer to the output buffer (result). The routine performs no bounds checking on this buffer, so the caller must provide up to 5 bytes per input value.
  • Return Value (x0): Number of bytes written to the output buffer.

.global encode_vlq

// C signature: int encode_vlq(const uint32_t* values, size_t values_length, uint8_t* result)
// x0: values (pointer to input array of u32)
// x1: values_length (number of integers to encode)
// x2: result (pointer to output buffer)
// Returns in x0: number of bytes written to the result buffer.

encode_vlq:
    // Standard function prologue
    stp     x29, x30, [sp, #-32]!
    mov     x29, sp

    // Save callee-saved registers we will use
    stp     x19, x20, [sp, #16]
    mov     x19, x0           // x19 = current read pointer for values
    mov     x20, x2           // x20 = current write pointer for result
    mov     x9, x1            // x9 = counter for values_length

    // Check if there's anything to encode
    cbz     x9, .L_encode_exit

.L_encode_outer_loop:
    // Load the next 32-bit integer from the input array
    ldr     w10, [x19], #4    // w10 = current_value, post-increment pointer

.L_encode_inner_loop:
    // Extract the lowest 7 bits
    and     w11, w10, #0x7f   // w11 = 7-bit chunk

    // Shift the original value to prepare for the next iteration
    lsr     w10, w10, #7      // current_value >>= 7

    // Check if there are more chunks to process
    cbnz    w10, .L_set_continuation_bit

    // This is the last byte for this integer. Store it as is.
    strb    w11, [x20], #1    // Store the final chunk, post-increment write pointer
    b       .L_check_next_value

.L_set_continuation_bit:
    // Not the last byte, so set the continuation bit (MSB)
    orr     w11, w11, #0x80   // chunk |= 0x80
    strb    w11, [x20], #1    // Store the chunk, post-increment write pointer
    b       .L_encode_inner_loop

.L_check_next_value:
    // Decrement the outer loop counter and check if we are done
    subs    x9, x9, #1
    bne     .L_encode_outer_loop

.L_encode_exit:
    // Calculate the total number of bytes written
    sub     x0, x20, x2       // return value = final_write_ptr - initial_write_ptr

    // Restore callee-saved registers
    ldp     x19, x20, [sp, #16]
    
    // Standard function epilogue
    ldp     x29, x30, [sp], #32
    ret

Code Walkthrough

  1. Prologue & Setup: The function begins by saving the frame pointer (x29) and link register (x30) to the stack, adhering to the AAPCS64 standard. It also saves callee-saved registers (x19, x20) that will be used to hold pointers. We initialize our read pointer (x19), write pointer (x20), and loop counter (x9).
  2. Outer Loop (.L_encode_outer_loop): This loop iterates through each 32-bit integer in the input array. The instruction ldr w10, [x19], #4 loads a 32-bit value into register w10 and simultaneously increments the read pointer x19 by 4 bytes.
  3. Inner Loop (.L_encode_inner_loop): This is the core of the VLQ logic.
    • and w11, w10, #0x7f: We mask the current value in w10 with 0x7F (binary 01111111) to isolate the lowest 7 bits. The result is stored in w11.
    • lsr w10, w10, #7: We perform a logical shift right by 7 on the original value, effectively discarding the bits we just processed and preparing for the next iteration.
    • cbnz w10, .L_set_continuation_bit: We check if the value in w10 is now non-zero. If it is, it means there's more data to encode, and we must jump to set the continuation bit.
    • If w10 is zero, it's the last chunk. We store the 7-bit payload from w11 directly using strb w11, [x20], #1 and branch to check for the next integer.
  4. Continuation Bit (.L_set_continuation_bit): If more bytes are needed, orr w11, w11, #0x80 sets the most significant bit of our 7-bit chunk. We then store this byte and loop back to the inner loop to process the rest of the number.
  5. Exit & Return Value: Once all integers are processed, sub x0, x20, x2 calculates the total number of bytes written by subtracting the initial write pointer from the final one. This result is placed in x0, the return register. Finally, we restore the saved registers and return.
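One way to gain confidence in the assembly routine is to compare its output against a small C model. The sketch below is such a model under stated assumptions: encode_vlq_ref and ENCODE_VLQ_MAX_BYTES_PER_VALUE are hypothetical names, and the loop mirrors the structure of the listing above rather than reproducing it instruction for instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Worst case for a 32-bit input: ceil(32 / 7) = 5 output bytes. */
#define ENCODE_VLQ_MAX_BYTES_PER_VALUE 5

/* Hypothetical C model of encode_vlq. `result` must hold at least
 * values_length * ENCODE_VLQ_MAX_BYTES_PER_VALUE bytes.
 * Returns the number of bytes written. */
size_t encode_vlq_ref(const uint32_t *values, size_t values_length,
                      uint8_t *result)
{
    uint8_t *write = result;                 /* plays the role of x20 */
    for (size_t i = 0; i < values_length; i++) {
        uint32_t v = values[i];              /* plays the role of w10 */
        do {
            uint8_t chunk = (uint8_t)(v & 0x7F);
            v >>= 7;
            if (v != 0)
                chunk |= 0x80;               /* continuation bit */
            *write++ = chunk;
        } while (v != 0);
    }
    return (size_t)(write - result);         /* final ptr - initial ptr */
}
```

Feeding the same input array to both routines and comparing the byte counts and buffers is a cheap differential test.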

How Does VLQ Decoding Work? The Reverse Process

Decoding a VLQ byte stream back into an integer is the inverse operation. You read bytes one by one, accumulating the 7-bit payloads until you encounter a byte with its continuation bit set to 0.

The Algorithm:

  1. Initialize a result variable to 0.
  2. Read one byte from the input stream.
  3. Extract the 7-bit payload by masking the byte with 0x7F.
  4. Shift the payload left by an appropriate amount and add it to your result. For the first byte, you shift by 0; for the second, by 7; for the third, by 14, and so on.
  5. Check the continuation bit (MSB) of the byte you just read.
    • If it is 1, loop back to step 2 to read the next byte.
    • If it is 0, this was the final byte. The accumulated value in your result variable is the final decoded integer.
  6. Handle potential errors, such as an incomplete sequence (running out of bytes before finding a final byte) or an overflow (the decoded number exceeds the capacity of the target integer type).
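The decoding steps, including the two error cases from step 6, can be sketched in C as follows. This is an illustrative model only: vlq_decode_one is a hypothetical name, it assumes least-significant-chunk-first input as produced by the encoder in this guide, and it treats any sequence longer than five bytes as an overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one value from `in`.
 * Returns the number of bytes consumed, or 0 on error
 * (incomplete sequence or value too large for uint32_t). */
size_t vlq_decode_one(const uint8_t *in, size_t in_len, uint32_t *value)
{
    uint32_t result = 0;                       /* step 1 */
    unsigned shift = 0;
    for (size_t i = 0; i < in_len; i++) {
        uint8_t byte = in[i];                  /* step 2 */
        /* Fifth byte: only payload bits 0-3 fit into bits 28-31, and a
         * set continuation bit would require a sixth byte. */
        if (shift == 28 && (byte & 0xF0) != 0)
            return 0;                          /* overflow */
        result |= (uint32_t)(byte & 0x7F) << shift;   /* steps 3 and 4 */
        if ((byte & 0x80) == 0) {              /* step 5: final byte */
            *value = result;
            return i + 1;
        }
        shift += 7;
    }
    return 0;                                  /* ran out of bytes */
}
```

Because the fifth-byte check either fails or ends the value, the shift amount never exceeds 28, which keeps the left shift well defined in C.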

Decoding Flow Diagram

    ● Start
    │
    ▼
  ┌────────────────────────┐
  │ Initialize result = 0  │
  │ Initialize shift = 0   │
  └──────────┬───────────┘
             │
             ▼
.L_loop:
    ◆ Read byte from stream
             │
             ▼
    ◆ Extract payload (data)
      data = byte & 0x7F
             │
             ▼
    ◆ Shift and accumulate
      result |= (data << shift)
             │
             ▼
    ◆ Increment shift
      shift += 7
             │
             ▼
  ┌────────────────────────┐
  │ Check continuation bit │
  │ (byte & 0x80) == 0?    │
  └──────────┬───────────┘
             │
        No (bit is 1)
             ├──────────→ goto .L_loop
             │
        Yes (bit is 0)
             │
             ▼
    ● End (result holds the integer)

Implementing VLQ Decoding in Arm64-assembly

The decoding function is slightly more complex due to the need to manage the bit-shifting and accumulation. This function will read a VLQ byte stream and write the decoded 32-bit integers to an output array.

Our function signature is: int decode_vlq(const uint8_t* input, size_t input_length, uint32_t* result, size_t* result_length). According to AAPCS64:

  • x0: Pointer to the input buffer (input).
  • x1: Length of the input buffer (input_len).
  • x2: Pointer to the output array (output).
  • x3: Pointer to a variable to store the number of decoded values.
  • Return Value (x0): Status code (0 for success, -1 for error).

.global decode_vlq

// C signature: int decode_vlq(const uint8_t* input, size_t input_length, uint32_t* result, size_t* result_length)
// x0: input (pointer to VLQ byte stream)
// x1: input_length
// x2: result (pointer to output array for u32)
// x3: result_length (pointer to a size_t to store decoded count)
// Returns in x0: 0 on success, -1 on error (incomplete sequence or overflow).

decode_vlq:
    // Standard function prologue
    stp     x29, x30, [sp, #-48]!
    mov     x29, sp
    stp     x19, x20, [sp, #16]
    stp     x21, x22, [sp, #32]

    // Setup registers
    mov     x19, x0           // x19 = current read pointer for input
    mov     x20, x2           // x20 = current write pointer for result
    add     x21, x0, x1       // x21 = end of input buffer (input + input_length)
    mov     x22, #0           // x22 = count of decoded numbers

.L_decode_outer_loop:
    // Check if we are at the end of the input buffer
    cmp     x19, x21
    b.eq    .L_decode_success // If no bytes left, we're done

    // Initialize for decoding one integer
    mov     w10, #0           // w10 = current decoded value (result)
    mov     w11, #0           // w11 = current shift amount

.L_decode_inner_loop:
    // Check for end of buffer before reading (guards against incomplete sequences)
    cmp     x19, x21
    b.eq    .L_error_incomplete

    // Load one byte from the stream
    ldrb    w12, [x19], #1    // w12 = current_byte, post-increment pointer

    // Extract 7-bit payload
    and     w13, w12, #0x7f   // w13 = payload

    // Overflow check for the fifth byte (shift == 28). Only the low 4 payload
    // bits fit into bits 28-31 of a u32, so payload bits 4-6 must be clear.
    // The continuation bit (bit 7) must be clear too: a sixth byte could only
    // contribute bits beyond bit 31, and its shift would wrap in a w register.
    cmp     w11, #28
    b.ne    .L_no_overflow_check
    tst     w12, #0xf0        // bits 4-6: overflowing payload, bit 7: continuation
    b.ne    .L_error_overflow
.L_no_overflow_check:

    // Shift payload into position and add to the result
    lsl     w13, w13, w11     // payload <<= shift
    orr     w10, w10, w13     // result |= payload

    // Check the continuation bit
    tst     w12, #0x80
    b.eq    .L_value_decoded  // If MSB is 0, we are done with this number

    // MSB is 1, so continue. Increment shift amount.
    add     w11, w11, #7
    b       .L_decode_inner_loop

.L_value_decoded:
    // Store the fully decoded 32-bit integer
    str     w10, [x20], #4    // Store result, post-increment write pointer
    add     x22, x22, #1      // Increment decoded count
    b       .L_decode_outer_loop

.L_decode_success:
    str     x22, [x3]         // Store the final count in *result_length
    mov     x0, #0            // Return 0 for success
    b       .L_decode_exit

.L_error_incomplete:
    // Reached end of input mid-sequence
    mov     x0, #-1           // Return -1 for error
    b       .L_decode_exit

.L_error_overflow:
    // Value would be too large for a u32
    mov     x0, #-1           // Return -1 for error
    // Fallthrough to exit

.L_decode_exit:
    // Restore callee-saved registers
    ldp     x21, x22, [sp, #32]
    ldp     x19, x20, [sp, #16]
    
    // Standard function epilogue
    ldp     x29, x30, [sp], #48
    ret

Code Walkthrough

  1. Setup: We set up pointers for reading (x19), writing (x20), and an end-of-buffer marker (x21) for bounds checking. x22 will count the number of integers we successfully decode.
  2. Outer Loop (.L_decode_outer_loop): This loop controls the decoding of multiple integers from the stream. It first checks if we've consumed the entire input buffer.
  3. Inner Loop Init: Before decoding a new number, we reset the result register (w10) and the shift amount (w11) to zero.
  4. Inner Loop (.L_decode_inner_loop): This is the core decoding logic.
    • Bounds Check: Critically, we check cmp x19, x21 *before* loading a byte. This prevents reading past the end of the buffer if a sequence is malformed (i.e., unterminated).
    • ldrb w12, [x19], #1: Load a byte into w12 and advance the read pointer.
    • and w13, w12, #0x7f: Isolate the 7-bit payload.
    • Overflow Check: This is a crucial safety feature. A 32-bit integer needs at most 5 VLQ bytes, but 5 bytes carry 5 * 7 = 35 payload bits, 3 more than a uint32_t can hold. When we process the 5th byte (shift amount is 28), only the low 4 payload bits land inside bits 28-31. If the byte carries anything beyond those 4 bits, the value does not fit and we flag an overflow error.
    • lsl w13, w13, w11: Shift the payload left by the current shift amount.
    • orr w10, w10, w13: Merge the shifted payload into our result register.
    • tst w12, #0x80: Test the continuation bit. If it's zero, the number is complete, and we branch to .L_value_decoded.
    • If the bit is one, we add 7 to our shift amount (w11) and loop back to read the next byte.
  5. Storing the Value (.L_value_decoded): Once a number is fully assembled in w10, we store it to the output array and increment our decoded-value counter.
  6. Success and Error Handling: On a clean exit, we store the final count and return 0. If an incomplete sequence or overflow is detected, we return -1 to signal failure.
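For comparison with the assembly, here is a C model of the intended stream-decoding behaviour. It is a sketch under stated assumptions: decode_vlq_ref is a hypothetical name, values up to 2^32 - 1 are accepted, sequences longer than five bytes are rejected as overflow, and *result_length is written only on success.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of decode_vlq. `result` must have room for one
 * uint32_t per input byte in the worst case.
 * Returns 0 on success, -1 on incomplete sequence or overflow. */
int decode_vlq_ref(const uint8_t *input, size_t input_length,
                   uint32_t *result, size_t *result_length)
{
    size_t i = 0;        /* read index, plays the role of x19  */
    size_t count = 0;    /* decoded values, plays the role of x22 */

    while (i < input_length) {               /* outer loop */
        uint32_t value = 0;
        unsigned shift = 0;
        for (;;) {                           /* inner loop */
            if (i == input_length)
                return -1;                   /* incomplete sequence */
            uint8_t byte = input[i++];
            if (shift == 28 && (byte & 0xF0) != 0)
                return -1;                   /* overflow */
            value |= (uint32_t)(byte & 0x7F) << shift;
            if ((byte & 0x80) == 0)
                break;                       /* value complete */
            shift += 7;
        }
        result[count++] = value;
    }
    *result_length = count;
    return 0;
}
```

An empty input is treated as success with zero decoded values, matching the early exit at the top of the outer loop.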

When to Use VLQ: Pros, Cons, and Risks

VLQ is a powerful tool, but it's not a silver bullet. Understanding its trade-offs is key to using it effectively.

Space Efficiency

  • Pros (Advantages): Excellent for data streams where small integer values are frequent. Significant space savings over fixed-width integers.
  • Cons & Risks: For uniformly distributed or consistently large numbers, VLQ can incur a slight overhead (e.g., a 32-bit value might take 5 bytes instead of 4).

Performance

  • Pros (Advantages): Encoding and decoding are computationally cheap, involving only simple bitwise operations and shifts.
  • Cons & Risks: Slower than direct memory access for fixed-width integers. You cannot randomly access the Nth number in a VLQ stream; you must decode from the beginning.

Flexibility

  • Pros (Advantages): Can represent integers of any size, not limited to 32 or 64 bits. This future-proofs data formats.
  • Cons & Risks: The variable length complicates buffer management and requires careful parsing to avoid errors.

Robustness

  • Pros (Advantages): The algorithm is simple and well-understood.
  • Cons & Risks: Highly susceptible to data corruption. A single flipped continuation bit can merge two numbers or truncate one, desynchronizing the rest of the stream. Requires robust error handling (e.g., overflow and bounds checks).

Frequently Asked Questions (FAQ)

Is VLQ the same as LEB128?

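They are nearly identical in concept, and the names are often used loosely. LEB128 (Little-Endian Base 128) is the name used in the DWARF debugging format and WebAssembly, and it emits the least significant 7-bit group first, which is what the implementation in this guide does. VLQ is the term used in the context of MIDI files, where the most significant group comes first. The core mechanism of a continuation bit plus 7-bit payloads is the same in both; only the group order differs.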

What is the maximum value that can be encoded with VLQ in our implementation?

Our Arm64-assembly implementation decodes into a 32-bit unsigned integer (uint32_t). Therefore, the maximum value it can correctly represent is 2^32 - 1, or 4,294,967,295. A 64-bit value would require a maximum of 10 bytes to encode, and our registers and logic would need to be adjusted (using x registers instead of w registers for the result).
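To illustrate the 10-byte figure, here is a minimal C sketch of the same encoding loop widened to 64 bits. The name vlq_encode_u64 is hypothetical, and the caller is assumed to provide a 10-byte buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one 64-bit value, least significant
 * chunk first. `out` must have room for 10 bytes (ceil(64 / 7)).
 * Returns the number of bytes written. */
size_t vlq_encode_u64(uint64_t value, uint8_t *out)
{
    size_t n = 0;
    do {
        uint8_t chunk = (uint8_t)(value & 0x7F);
        value >>= 7;
        if (value != 0)
            chunk |= 0x80;      /* more chunks follow */
        out[n++] = chunk;
    } while (value != 0);
    return n;
}
```

For UINT64_MAX the first nine bytes carry 63 bits and the tenth carries the single remaining bit.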

How does VLQ handle signed integers?

Standard VLQ does not directly encode a sign bit. To handle signed integers efficiently, it's often paired with another technique called ZigZag encoding. ZigZag encoding maps signed integers to unsigned integers in a way that small negative numbers become small positive numbers (e.g., -1 becomes 1, 1 becomes 2, -2 becomes 3, etc.). This re-ordered set of unsigned integers can then be efficiently compressed with VLQ.
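The ZigZag mapping described above can be sketched in a few lines of C. The function names zigzag_encode32 and zigzag_decode32 are hypothetical, and the decode step relies on the two's complement conversion from uint32_t to int32_t that mainstream compilers provide.

```c
#include <stdint.h>

/* Hypothetical helper: map a signed value to an unsigned one so that
 * small magnitudes stay small: 0→0, -1→1, 1→2, -2→3, ... */
uint32_t zigzag_encode32(int32_t n)
{
    /* Shift in unsigned arithmetic, then flip all bits for negatives. */
    return ((uint32_t)n << 1) ^ (n < 0 ? 0xFFFFFFFFu : 0u);
}

/* Hypothetical helper: inverse of zigzag_encode32. */
int32_t zigzag_decode32(uint32_t z)
{
    /* (0u - (z & 1u)) is all ones when the low bit is set, else zero. */
    return (int32_t)((z >> 1) ^ (0u - (z & 1u)));
}
```

The unsigned result can then be passed to the VLQ encoder, so -1 costs one byte instead of five.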

Why implement this in Arm64-assembly instead of a higher-level language?

While you would typically use a library implementation in languages like C++, Rust, or Go, building it from scratch in assembly provides unparalleled insight. You learn about register allocation, memory access, bitwise manipulation, and function calling conventions at the hardware level. This deep understanding is invaluable for performance-critical programming and debugging. This exercise is a core part of the kodikra.com Arm64-assembly learning path for this very reason.

What happens if a VLQ sequence is malformed or incomplete?

A robust decoder must handle this. Our implementation includes two key checks: 1) It verifies it hasn't reached the end of the input buffer in the middle of decoding a number (incomplete sequence error). 2) It checks for values that would overflow a 32-bit integer. Without these checks, a decoder could enter an infinite loop, read out of bounds memory, or produce garbage data.

Is VLQ big-endian or little-endian?

This question is a bit of a category error, but a good one. Endianness typically refers to the byte order of a multi-byte primitive type in memory, whereas VLQ is a byte *stream* format. What does matter is the order of the 7-bit chunks. The implementation in this guide emits them from least significant to most significant, so it is "little-endian" in spirit, like LEB128 and Protobuf varints. The VLQ used in MIDI files orders its chunks the other way, most significant first.
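To make the chunk-order point concrete, the following C sketch emits the chunks most significant first, which is the order the Standard MIDI File format uses for its variable-length quantities. The function name vlq_encode_msb_first is hypothetical and the code is not part of the assembly solution.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one 32-bit value with the most
 * significant 7-bit chunk first. `out` must have room for 5 bytes.
 * Returns the number of bytes written. */
size_t vlq_encode_msb_first(uint32_t value, uint8_t *out)
{
    uint8_t chunks[5];
    size_t n = 0;

    /* Collect chunks least significant first, as before. */
    do {
        chunks[n++] = (uint8_t)(value & 0x7F);
        value >>= 7;
    } while (value != 0);

    /* Emit them in reverse; every byte except the last one written
     * gets the continuation bit. */
    for (size_t i = 0; i < n; i++) {
        uint8_t chunk = chunks[n - 1 - i];
        out[i] = (i + 1 < n) ? (uint8_t)(chunk | 0x80) : chunk;
    }
    return n;
}
```

With this ordering 137 becomes 0x81, 0x09, whereas the least-significant-first encoder in this guide produces 0x89, 0x01 for the same value.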


Conclusion

Variable Length Quantity is more than just a clever bit-twiddling hack; it's a fundamental data compression technique that underpins many of the efficient systems we rely on. By stripping away the abstractions and implementing it directly in Arm64-assembly, you gain a practical, deep understanding of how data is manipulated at the processor level.

You've seen how to break down an integer into 7-bit chunks for encoding and how to carefully reassemble them during decoding, complete with essential error handling for overflow and malformed sequences. This knowledge is not just academic—it's a practical skill that applies to network protocol design, file format creation, and any domain where data density is a primary concern.

To continue your journey into low-level mastery, explore the complete Arm64-assembly learning path on kodikra.com, where you can tackle more challenges that build on these foundational concepts. For a broader view, you can also see our complete guide to Arm64-assembly fundamentals.

Disclaimer: The Arm64-assembly code provided follows the AAPCS64 calling convention and is written in GNU AS (GAS) syntax, as used on AArch64 Linux. On macOS, C symbol names carry a leading underscore, so the exported labels would need adjusting (for example, _encode_vlq) before linking against C code. Technology and conventions are current as of the time of writing.


Published by Kodikra — Your trusted Arm64-assembly learning resource.