Forth in Bash: Complete Solution & Deep Dive Guide

a close up of a computer screen with code on it

Mastering Bash Scripting: Build a Functional Forth Evaluator from Scratch

Implementing a Forth evaluator in Bash is a powerful exercise that involves managing a stack with an array, parsing input, and handling built-in and user-defined words. This guide walks you through building a complete interpreter from the ground up, using core Bash features to simulate this elegant stack-based language.

Have you ever looked at a complex shell script and wondered how it juggles so many states and variables? Or perhaps you've been intrigued by unconventional programming paradigms, like the stack-based languages that power everything from bootloaders to creative coding environments. It often feels like a magic trick, where commands operate on invisible data, producing results with minimal syntax.

This feeling of complexity is a common hurdle for developers looking to deepen their understanding of how languages work under the hood. The good news is that you can unravel this mystery. In this comprehensive guide, we will build a functional subset of a Forth interpreter using nothing but Bash. You will not only solve a fascinating challenge from the kodikra.com curriculum but also gain a profound understanding of interpreters, stack mechanics, and advanced Bash scripting techniques.


What is Forth and Why is it Significant?

Forth is a procedural, stack-based programming language and interactive environment. Unlike mainstream languages like Python or Java that use named variables and complex syntax for operations (e.g., c = a + b), Forth operates on a data structure called a "stack."

Think of a stack like a pile of plates. You can only add a new plate to the top (a "push" operation) or remove the top plate (a "pop" operation). In Forth, numbers are pushed onto the stack, and operators (called "words") pop the numbers they need, perform a calculation, and push the result back onto the stack.

For example, to add 5 and 10, you wouldn't write 5 + 10. Instead, you'd write 5 10 +. Here’s how the interpreter processes it:

  • 5: The number 5 is pushed onto the stack. Stack: `[5]`
  • 10: The number 10 is pushed onto the stack. Stack: `[5, 10]`
  • +: The `+` word is executed. It pops two values (10, then 5), adds them (5 + 10 = 15), and pushes the result back. Stack: `[15]`

This method of notation is known as Reverse Polish Notation (RPN), and it eliminates the need for parentheses and complex operator precedence rules. This simplicity and efficiency are why Forth has been historically popular in resource-constrained environments like embedded systems, firmware, and space probes.


Why Build a Forth Evaluator in Bash?

At first glance, Bash might seem like an odd choice for building a language interpreter. It's primarily designed for shell automation and command-line interaction. However, tackling this project within the kodikra learning path offers several unique and powerful learning advantages:

  • Mastering Core Data Structures: You will learn to implement a stack using standard Bash arrays, gaining deep insights into array manipulation, including adding elements, accessing the last element, and removing elements.
  • Advanced String and Text Processing: The core of any interpreter is parsing text. You will use Bash's powerful string manipulation capabilities to split input into tokens, identify numbers versus commands, and manage word definitions.
  • Understanding State Management: An interpreter is a state machine. You'll manage the interpreter's state (e.g., "interpreting" normal commands vs. "compiling" a new word definition) using simple variables, a fundamental concept in software design.
  • Practical Application of Functions: You'll structure your code with modular Bash functions, making it clean, reusable, and easier to debug—a critical skill for writing any non-trivial script.
  • Demystifying Language Design: By building an interpreter, even a simple one, you pull back the curtain on how programming languages work. This knowledge is transferable and provides a solid foundation for learning other languages or even designing your own.

This project transforms Bash from a simple scripting tool into a platform for exploring complex computer science concepts, making you a more versatile and knowledgeable developer.


How the Forth Evaluator Works: The Core Mechanics

Our Forth evaluator can be broken down into a few key components: the stack, the main loop, the word dispatcher, and the definition handler. The entire system revolves around reading input, tokenizing it, and processing each token sequentially.

The Stack: The Heart of the Operation

The stack is the central data structure. In Bash, a standard indexed array is a perfect fit for this job. We'll initialize an empty array, stack=(), and create helper functions to interact with it.

  • Push: To add an item to the stack, we append it to the array: stack+=("$item").
  • Pop: To remove an item, we retrieve the last element, then remove it. This requires a bit of care to handle an empty stack (a "stack underflow" error).

Stack Manipulation Visualized

The core stack manipulation words are DUP, DROP, SWAP, and OVER. Understanding them is key to understanding Forth. Let's visualize their behavior.

● Initial Stack: [ A, B, C ]  (C is at the top)
│
├─► DUP (Duplicates the top item)
│   │
│   ▼
│  ┌────────────────┐
│  │ Pop C          │
│  │ Push C         │
│  │ Push C         │
│  └────────────────┘
│   │
│   ▼
│  ● Result: [ A, B, C, C ]
│
├─► SWAP (Swaps the top two items)
│   │
│   ▼
│  ┌────────────────┐
│  │ Pop C          │
│  │ Pop B          │
│  │ Push C         │
│  │ Push B         │
│  └────────────────┘
│   │
│   ▼
│  ● Result: [ A, C, B ]
│
└─► OVER (Copies the second item to the top)
    │
    ▼
   ┌────────────────┐
   │ Pop C          │
   │ Pop B          │
   │ Push B         │
   │ Push C         │
   │ Push B         │
   └────────────────┘
    │
    ▼
   ● Result: [ A, B, C, B ]

The Interpreter's State Machine: Interpreting vs. Compiling

Our evaluator needs to operate in two distinct modes. Normally, it's in interpret mode, where it executes words immediately. However, when it encounters a colon (:), it must switch to compile mode. In this mode, it doesn't execute words but instead collects them as part of a new word's definition until it sees a semicolon (;).

This state transition is fundamental to Forth's extensibility. Here is a diagram illustrating the logic flow.

● Start Main Loop (Read a line of input)
│
▼
┌──────────────────┐
│ Tokenize line    │
│ into words       │
└────────┬─────────┘
         │
         ▼
For each `word` in tokens:
         │
    ╭────▼────╮
    │ Is mode │
    │ compile?│
    ╰────┬────╯
         │
    Yes ╱ ╲ No
       ╱   ╲
      ▼     ▼
┌───────────┐  ╭───────────╮
│ Is word ;?│  │ Execute   │
└─────┬─────┘  │ word      │
      │        │ (e.g., +, │
 Yes ╱ ╲ No    │ dup, etc.)│
    ╱   ╲      ╰───────────╯
   ▼     ▼
┌───────────┐ ┌───────────┐
│ Save new  │ │ Append    │
│ word def  │ │ word to   │
│ Set mode= │ │ current   │
│ interpret │ │ definition│
└───────────┘ └───────────┘

The Complete Bash Solution for the Forth Evaluator

Here is a complete, well-commented Bash script that implements the Forth evaluator. It handles arithmetic, stack manipulation, and user-defined words. This solution is designed for clarity and adherence to good Bash practices.


#!/usr/bin/env bash

# A simple Forth evaluator in Bash

# --- Global State ---
# The main data stack
stack=()
# Arrays to store user-defined words and their definitions
# We use two parallel arrays for compatibility with older Bash versions.
# An associative array would be a more modern alternative.
defined_words=()
defined_defs=()
# The interpreter's current mode: 'interpret' or 'compile'
mode="interpret"
# Variables to hold the new word being defined
new_word_name=""
new_word_def=""

# --- Error Handling ---
# Function to print an error message and exit
die() {
    echo "Error: $1" >&2
    exit 1
}

# --- Stack Primitives ---
# Push an element onto the stack
push() {
    stack+=("$1")
}

# Pop an element from the stack and return it
# We echo the value so it can be captured, e.g., val=$(pop)
pop() {
    local len=${#stack[@]}
    if (( len == 0 )); then
        die "stack empty"
    fi
    local top="${stack[-1]}"
    # In Bash 4.3+ you can use `unset 'stack[-1]'`
    # This is a more compatible way
    unset "stack[len-1]"
    echo "$top"
}

# --- Core Evaluation Logic ---
# The main function that processes a single word
evaluate_word() {
    local word="$1"
    local a b

    # Check if the word is a number
    if [[ "$word" =~ ^-?[0-9]+$ ]]; then
        push "$word"
        return
    fi

    # Convert word to lowercase for case-insensitivity
    local lower_word="${word,,}"

    # Check if it's a user-defined word
    for i in "${!defined_words[@]}"; do
        if [[ "${defined_words[$i]}" == "$lower_word" ]]; then
            # If found, evaluate its definition recursively
            local def="${defined_defs[$i]}"
            for sub_word in $def; do
                evaluate_word "$sub_word"
            done
            return
        fi
    done

    # Handle built-in words
    case "$lower_word" in
        +)
            a=$(pop)
            b=$(pop)
            push $((b + a))
            ;;
        -)
            a=$(pop)
            b=$(pop)
            push $((b - a))
            ;;
        \*)
            a=$(pop)
            b=$(pop)
            push $((b * a))
            ;;
        /)
            a=$(pop)
            b=$(pop)
            (( a == 0 )) && die "division by zero"
            push $((b / a))
            ;;
        dup)
            a=$(pop)
            push "$a"
            push "$a"
            ;;
        drop)
            pop > /dev/null # Pop and discard
            ;;
        swap)
            a=$(pop)
            b=$(pop)
            push "$a"
            push "$b"
            ;;
        over)
            a=$(pop)
            b=$(pop)
            push "$b"
            push "$a"
            push "$b"
            ;;
        *)
            die "unknown word"
            ;;
    esac
}

# --- Main Program Loop ---
main() {
    while read -r line; do
        # Tokenize the input line
        read -ra words <<< "$line"

        for word in "${words[@]}"; do
            if [[ "$mode" == "interpret" ]]; then
                if [[ "$word" == ":" ]]; then
                    mode="compile"
                    new_word_name=""
                    new_word_def=""
                else
                    evaluate_word "$word"
                fi
            else # compile mode
                if [[ -z "$new_word_name" ]]; then
                    # The first word after ':' is the name of the new word
                    # Check if it's a number, which is not allowed
                    if [[ "$word" =~ ^-?[0-9]+$ ]]; then
                        die "invalid definition"
                    fi
                    new_word_name="${word,,}" # Store in lowercase
                elif [[ "$word" == ";" ]]; then
                    # End of definition
                    mode="interpret"
                    # Check for redefinition of built-in words
                    case "$new_word_name" in
                        +|*|-|/|dup|drop|swap|over) die "invalid definition";;
                    esac
                    # Save the new word and its definition
                    defined_words+=("$new_word_name")
                    # Trim trailing space from definition
                    defined_defs+=("${new_word_def% }")
                else
                    # Append word to the current definition string
                    new_word_def+="$word "
                fi
            fi
        done

        # After processing a line in interpret mode, print the stack
        if [[ "$mode" == "interpret" ]]; then
            echo "${stack[*]}"
        fi
    done
}

# Execute the main function
main

In-Depth Code Walkthrough

Let's dissect the script to understand how each part contributes to the final functionality. A solid grasp of these components is essential for customizing or extending the evaluator.

Global State and Configuration

At the top of the script, we define our global state variables:

  • stack=(): An empty array that will serve as our main data stack.
  • defined_words=() and defined_defs=(): Two parallel arrays. defined_words stores the names of user-defined words (in lowercase), and defined_defs stores the corresponding sequence of words that make up their definition. Using parallel arrays ensures compatibility with Bash versions before 4.0, which lacked associative arrays.
  • mode="interpret": This variable is our state machine's core. It tracks whether we are executing commands immediately or collecting them for a new definition.
  • new_word_name and new_word_def: Temporary storage used during compile mode to build the new word.

Stack Helper Functions: push and pop

These two functions are the fundamental interface to our stack.

The push() function is straightforward: stack+=("$1") simply appends the first argument passed to it to the end of the stack array. The quotes around "$1" are crucial to handle potential spaces or special characters correctly.

The pop() function is more complex. It first checks if the stack is empty ((( ${#stack[@]} == 0 ))). If it is, it calls our die() function to report a "stack empty" error. Otherwise, it captures the last element ("${stack[-1]}"), removes it from the array using unset "stack[len-1]", and finally, echoes the value. This use of echo allows us to capture the popped value using command substitution, like value=$(pop).

The Evaluator Engine: evaluate_word()

This function is the brain of the interpreter. It takes a single word as an argument and decides what to do with it.

  1. Number Check: It first uses a regular expression ([[ "$word" =~ ^-?[0-9]+$ ]]) to check if the word is an integer. If so, it's pushed directly onto the stack.
  2. Case Normalization: To make our Forth dialect case-insensitive as per convention, we convert the word to lowercase: local lower_word="${word,,}".
  3. User-Defined Word Check: It then iterates through our defined_words array. If a match is found, it retrieves the corresponding definition from defined_defs and recursively calls evaluate_word for each word in that definition. This recursive evaluation is what makes user-defined words so powerful.
  4. Built-in Word Dispatcher: If the word is not a number and not user-defined, it falls through to a case statement that handles all the built-in primitives. For arithmetic operations, it pops two numbers, performs the calculation, and pushes the result. For stack manipulations like dup or swap, it performs the necessary sequence of pops and pushes.
  5. Error on Unknown Word: If the word doesn't match any of the above, it's an error, and die "unknown word" is called.

The Main Loop and State Management

The main() function orchestrates the entire process.

It uses a while read -r line loop to read input line by line. For each line, it tokenizes it into an array of words using read -ra words <<< "$line".

The script then iterates through these words, and its behavior is dictated by the $mode variable:

  • In interpret mode: If the word is :, it switches to compile mode and resets the temporary definition variables. Otherwise, it calls evaluate_word() to execute the word immediately.
  • In compile mode:
    • If new_word_name is empty, this must be the first word after :, so it's stored as the new word's name. It also validates that the name isn't a number.
    • If the word is ;, it signals the end of the definition. The mode is switched back to interpret, and the completed name and definition are appended to our global definition arrays. It also includes a check to prevent redefinition of built-in words.
    • Any other word is simply appended to the new_word_def string.

Finally, after processing all words on a line, if the mode is back to interpret, it prints the current state of the stack using echo "${stack[*]}".


Alternative Approaches and Future Improvements

While our implementation is robust and educational, there are more advanced ways to build it, especially with modern Bash features.

Using Associative Arrays for Definitions

If you are using Bash 4.0 or newer, using an associative array is a much cleaner way to store word definitions than parallel arrays.

You would declare it as declare -A definitions. Adding a new word would be as simple as definitions["$new_word_name"]="$new_word_def". Checking for a word would be if [[ -v definitions["$lower_word"] ]]. This approach is more readable and less error-prone.


# Alternative using associative arrays (Bash 4.0+)
declare -A definitions

# Inside the compile mode logic for ';'
definitions["$new_word_name"]="${new_word_def% }"

# Inside evaluate_word()
if [[ -v definitions["$lower_word"] ]]; then
    local def="${definitions[$lower_word]}"
    for sub_word in $def; do
        evaluate_word "$sub_word"
    done
    return
fi

Pros and Cons of This Implementation

Aspect Pros Cons
Stack Implementation Uses standard Bash arrays, highly compatible and easy to understand. Popping elements requires manual index management, which can be slightly less efficient than a dedicated stack data structure.
Word Definitions The parallel array approach works on older Bash versions (pre-4.0). Less intuitive and more error-prone than associative arrays. Keeping two arrays in sync can be tricky in more complex scenarios.
Error Handling Provides basic checks for key errors like stack underflow and division by zero. Error messages are simple. A more advanced version could provide line numbers or more context. Lacks robust type checking.
Performance Sufficient for educational purposes and small tasks. Pure Bash execution is surprisingly capable. Bash is an interpreted language; for performance-critical tasks, a compiled language like C, Go, or Rust would be orders of magnitude faster.

Frequently Asked Questions (FAQ)

What exactly is a stack-based language?

A stack-based language is one that primarily uses a Last-In, First-Out (LIFO) stack for passing arguments to functions (or "words") and retrieving their results. Instead of assigning values to named variables like x = 10, you push values onto a stack. Operations implicitly work on the top elements of this stack. This design often leads to very concise, parenthesis-free code (using Reverse Polish Notation).

Why is the implementation case-insensitive?

Traditional Forth systems are case-insensitive by convention. We honor this by converting all incoming words to lowercase (${word,,}) before processing them. This simplifies logic, as we don't need to handle variations like DUP, dup, and Dup separately. It makes the language feel more uniform to the user.

How can I add more built-in words, like MOD or . (print)?

Extending the evaluator is simple. You just need to add a new entry to the case statement inside the evaluate_word() function. For example, to add MOD (modulo operator), you would add:


    mod)
        a=$(pop)
        b=$(pop)
        push $((b % a))
        ;;
  

To add Forth's traditional . word to pop and print the top of the stack, you could add:


    .)
        val=$(pop)
        echo -n "$val " # Use echo -n to avoid a newline
        ;;
  
What are the main limitations of this Bash implementation?

This implementation is a simplified subset. Its main limitations include:

  • Data Types: It only handles integers. A full Forth implementation supports floating-point numbers, strings, and other types.
  • Control Structures: It lacks Forth's control structures like IF ... ELSE ... THEN and loop constructs like DO ... LOOP.
  • Performance: Being a Bash script, it is significantly slower than a compiled Forth implementation. It's intended for learning, not high-performance computing.
  • Error Recovery: The script exits on the first error using die(). A more robust interactive environment would report the error and reset the stack, allowing the user to continue.

Is Bash a good language for writing interpreters?

For educational purposes, yes. It's excellent because it forces you to work with fundamental concepts like text processing and process management directly. For production use, however, languages like Python, Go, Rust, or C++ are generally preferred due to better performance, stronger type systems, and more extensive libraries for language development.

How does this project relate to Reverse Polish Notation (RPN)?

This project is a direct implementation of an RPN calculator and language. RPN is the syntax used by Forth, where operators follow their operands (e.g., 3 4 + instead of 3 + 4). The stack is the natural data structure for evaluating RPN expressions, which is why our entire evaluator is built around the stack.


Conclusion

Building a Forth evaluator in Bash is more than just a coding exercise; it's a deep dive into the mechanics of programming languages. By completing this module from the kodikra.com curriculum, you have successfully implemented a stack, managed program state, parsed and tokenized input, and created a system for user-defined functions—all within a standard shell environment. You've seen how a few simple rules and a stack can create a surprisingly powerful and extensible system.

The skills you've honed here—mastery of arrays, advanced string manipulation, and modular function design—are invaluable for any serious Bash scripter. You are now better equipped to tackle complex automation tasks and have a foundational understanding of how interpreters bring code to life.

Disclaimer: This solution is written using modern Bash scripting conventions. It is tested and compatible with Bash version 4.0 and newer. The use of features like ${stack[-1]} and ${word,,} assumes a modern Bash environment.

Ready to tackle the next challenge? Continue your learning journey on our Bash 7 roadmap or explore more advanced Bash concepts to further sharpen your skills.


Published by Kodikra — Your trusted Bash learning resource.