Wordy in Bash: Complete Solution & Deep Dive Guide

Mastering Bash: Build a Word Problem Solver from Scratch

Learn to build a powerful Bash script that parses and evaluates natural language math problems like 'What is 5 plus 10?'. This comprehensive guide covers advanced string manipulation, pattern matching with regular expressions, and arithmetic evaluation to transform wordy questions into precise integer answers, a core skill for any developer.


Have you ever found yourself staring at a log file, a user-submitted form, or a line of text that feels like a riddle? Human language is beautifully complex and frustratingly ambiguous for a machine. It's a common pain point for developers and system administrators who need to extract structured data from unstructured text.

What if you could empower your scripts to bridge that gap? Imagine teaching Bash, the very shell you work in every day, to understand simple English questions and perform calculations based on them. This isn't just a theoretical exercise; it's a practical skill that sharpens your ability to handle real-world text processing challenges.

In this deep-dive tutorial, we will embark on a journey to solve the "Wordy" problem from the exclusive kodikra.com learning path. We will build a complete Bash script from the ground up that can parse mathematical questions posed in plain English and compute the correct answer. Get ready to transform natural language into cold, hard numbers.


What is the "Wordy" Problem?

The "Wordy" problem is a classic programming challenge designed to test your text parsing and logic implementation skills. The core task is to create a program that takes a string, representing a math word problem, and returns the integer result. The complexity starts simple and gradually increases, forcing you to build a robust and scalable solution.

At its heart, the problem requires you to act as a translator, converting human-readable phrases into machine-executable operations. This is a foundational concept in the field of Natural Language Processing (NLP).

The Core Requirements

The problem is broken down into several logical steps:

  • Numbers Only: The simplest case. A question like "What is 5?" should simply return 5.
  • Basic Arithmetic: The script must handle addition, subtraction, multiplication, and division. For example, "What is 5 plus 13?" should evaluate to 18.
  • Handling Negatives & Large Numbers: The solution must not be limited by small integers and should correctly parse negative numbers.
  • Multiple Operations: The ultimate challenge involves handling a sequence of operations, like "What is 3 plus 2 multiplied by 3?". It's crucial to note that the problem specifies a left-to-right evaluation, ignoring the traditional order of operations (PEMDAS/BODMAS). So, this example would calculate as (3 + 2) * 3 = 15.
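
To make the left-to-right rule concrete, the short sketch below contrasts it with conventional precedence for the example above. The function names are illustrative only and are not part of the solution developed later.

```bash
#!/usr/bin/env bash
# Illustrative only: conventional precedence versus the left-to-right
# rule for "3 plus 2 multiplied by 3".

precedence_result() {
    # Bash arithmetic follows the usual precedence: 2 * 3 is computed first.
    echo $(( 3 + 2 * 3 ))
}

left_to_right_result() {
    # The Wordy rule: apply each operation in the order it is read.
    local result=3
    result=$(( result + 2 ))   # 3 plus 2
    result=$(( result * 3 ))   # ... multiplied by 3
    echo "$result"
}

echo "precedence:    $(precedence_result)"     # 9
echo "left-to-right: $(left_to_right_result)"  # 15
```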

Successfully solving this requires a keen understanding of string manipulation, control flow, and error handling—all fundamental skills for a proficient Bash scripter.


Why Use Bash for This Text-Parsing Task?

You might wonder, "Isn't a language like Python or Go better suited for this?" While those languages are certainly capable, tackling this problem in Bash provides a unique and valuable learning experience. Bash, combined with standard GNU core utilities, is an incredibly potent environment for text processing.

The Strengths of Bash for Text Manipulation

  • Ubiquity: Bash is the default shell on most Linux distributions and ships with macOS (where zsh has been the default interactive shell since macOS Catalina). Scripts written in Bash run across Unix-like systems without requiring additional runtime installations.
  • Powerful Core Utilities: Bash excels at orchestrating a pipeline of powerful, specialized command-line tools like sed, awk, grep, and tr. This philosophy of "do one thing and do it well" allows you to build complex logic by composing simple, reusable components.
  • Native Regular Expressions: Bash has built-in support for regular expressions, which are indispensable for pattern matching, validation, and extracting data from strings. This is the cornerstone of our "Wordy" solver.
  • Direct System Interaction: For tasks involving file manipulation, process management, or system administration, Bash is the native tongue. Honing your text-processing skills in Bash directly translates to being a more effective sysadmin or DevOps engineer.
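
As a small illustration of the native regex support mentioned above, the sketch below pulls the parts of a one-operation question out of the BASH_REMATCH array. The function name extract_parts and the pattern are assumptions made for this example; the full solver later in the guide takes a different route.

```bash
#!/usr/bin/env bash
# Illustrative sketch: extract two operands and an operator word
# with [[ =~ ]] and the BASH_REMATCH array.

extract_parts() {
    local text="$1"
    # Keeping the pattern in a variable avoids quoting surprises inside [[ ]].
    local re='^What is (-?[0-9]+) ([a-z]+) (-?[0-9]+)\?$'
    if [[ $text =~ $re ]]; then
        # BASH_REMATCH[0] is the whole match; [1] to [3] are the capture groups.
        echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]} ${BASH_REMATCH[3]}"
    else
        echo "no match"
        return 1
    fi
}

extract_parts "What is 5 plus 13?"          # prints: 5 plus 13
extract_parts "How much is 5?" || true      # prints: no match
```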

This kodikra module is specifically designed to push your understanding of these tools beyond simple one-liners. It forces you to think algorithmically within the constraints and capabilities of the shell, making you a more versatile and resourceful programmer.


How to Architect the Word Problem Solver in Bash

Decomposing a complex problem into smaller, manageable steps is the key to a successful solution. Our approach will follow a clear, logical pipeline from raw input to final integer output. This structured thinking is more important than the code itself.

Our strategy involves four main phases:

  1. Validation: Check if the input string is a valid question we can answer.
  2. Sanitization: Clean the string, removing unnecessary words and punctuation.
  3. Transformation: Convert the remaining English words into mathematical operators and numbers.
  4. Evaluation: Process the transformed tokens and compute the final result.

Here is a high-level visualization of our data flow:

    ● Start: Input String
    │ e.g., "What is 5 plus 10?"
    ▼
  ┌───────────────────┐
  │ 1. Validate Input │
  │ (Use Regex)       │
  └─────────┬─────────┘
            │
            ▼
  ┌───────────────────┐
  │ 2. Sanitize       │
  │ (Remove fluff)    │
  └─────────┬─────────┘
            │ e.g., "5 plus 10"
            ▼
  ┌───────────────────┐
  │ 3. Transform      │
  │ (Words to Ops)    │
  └─────────┬─────────┘
            │ e.g., "5 + 10"
            ▼
  ┌───────────────────┐
  │ 4. Tokenize &     │
  │    Evaluate       │
  └─────────┬─────────┘
            │
            ▼
    ● End: Output Integer
      e.g., 15
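
The middle two phases of this flow can be traced with a few lines of Bash. This is a minimal sketch under the assumption that the input is already known to be well formed; sanitize and transform are hypothetical helper names used only here.

```bash
#!/usr/bin/env bash
# Illustrative trace of the sanitize and transform phases for one input.

sanitize() {
    local text="$1"
    text="${text#What is }"   # drop the leading "What is "
    text="${text%\?}"         # drop one trailing literal "?"
    printf '%s\n' "$text"
}

transform() {
    local text="$1"
    text="${text//multiplied by/*}"
    text="${text//divided by//}"
    text="${text//plus/+}"
    text="${text//minus/-}"
    printf '%s\n' "$text"
}

input="What is 5 plus 10?"
cleaned="$(sanitize "$input")"
expression="$(transform "$cleaned")"

echo "input:       $input"
echo "sanitized:   $cleaned"
echo "transformed: $expression"
```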

Key Tools for the Job

We will rely on a combination of Bash built-in features:

  • Parameter Expansion: For removing prefixes and suffixes (e.g., stripping "What is " from the beginning). It's faster than using an external process like sed for simple cases.
  • Regular Expressions (Regex): For robust input validation. We'll use the [[ ... ]] conditional construct to check if the input matches an expected pattern.
  • Arrays: To hold the sanitized and transformed tokens (numbers and operators) for easy iteration.
  • case Statement: A clean and readable way to handle the different mathematical operations during the evaluation phase.
  • Arithmetic Evaluation ((...)): Bash's built-in mechanism for performing integer arithmetic.

This combination provides a powerful, efficient, and mostly self-contained solution without relying heavily on external commands, which is a hallmark of elegant shell scripting.


Where the Logic Lives: The Complete Bash Solution

Now, let's assemble these concepts into a complete, working script. We will structure the code with a main function for clarity and add comments to explain each critical step. This approach follows best practices for writing maintainable and understandable shell scripts.

The `wordy.sh` Script


#!/usr/bin/env bash

# wordy.sh - A Bash script to parse and evaluate simple math word problems.
# This solution is part of the exclusive kodikra.com curriculum.

# Main function to encapsulate the script's logic.
main() {
    local input="$1"

    # 1. VALIDATION: Check for basic structure and unsupported operations.
    # The question must start with "What is " and end with "?".
    # The pattern lives in a variable so "\?" reaches the regex engine as a literal "?".
    local question_re='^What is .*\?$'
    if [[ ! "$input" =~ $question_re ]]; then
        echo "syntax error"
        exit 1
    fi

    # Check for unknown operations: any word that is not one of our keywords.
    # Bash regexes are POSIX ERE and have no lookahead, so each word is tested directly.
    # Non-letters are replaced by spaces first, so the unquoted expansion cannot glob.
    local word
    for word in ${input//[^[:alpha:]]/ }; do
        case "$word" in
            What|is|plus|minus|multiplied|divided|by) ;;
            *) echo "unknown operation"; exit 1 ;;
        esac
    done

    # 2. SANITIZATION & TRANSFORMATION: Clean the input string.
    # Remove the prefix "What is " and the trailing "?".
    local question="${input#What is }"
    question="${question%?}"

    # If the question is now just a number, we are done.
    if [[ "$question" =~ ^-?[0-9]+$ ]]; then
        echo "$question"
        exit 0
    fi
    
    # If after cleaning, there's nothing left, it's a syntax error.
    if [[ -z "$question" ]]; then
        echo "syntax error"
        exit 1
    fi

    # Replace word operators with symbols. We handle "multiplied by" and "divided by" first.
    question="${question//multiplied by/*}"
    question="${question//divided by//}"
    question="${question//plus/+}"
    question="${question//minus/-}"

    # 3. TOKENIZATION: Split the cleaned string into an array of tokens.
    # The shell will split on spaces by default.
    read -ra tokens <<< "$question"

    # Further validation after tokenization
    if [[ ${#tokens[@]} -eq 0 ]]; then
        echo "syntax error"
        exit 1
    fi

    # The first token MUST be a number.
    if [[ ! "${tokens[0]}" =~ ^-?[0-9]+$ ]]; then
        echo "syntax error"
        exit 1
    fi

    # 4. EVALUATION: Process tokens in a left-to-right manner.
    local result=${tokens[0]}
    local i=1

    while [[ $i -lt ${#tokens[@]} ]]; do
        local operator=${tokens[$i]}
        local operand=${tokens[$i+1]}

        # Check for malformed sequences (e.g., "5 + ?")
        if [[ -z "$operator" || -z "$operand" ]]; then
            echo "syntax error"
            exit 1
        fi
        
        # Operand must be a number.
        if [[ ! "$operand" =~ ^-?[0-9]+$ ]]; then
            echo "syntax error"
            exit 1
        fi

        case "$operator" in
            "+")
                ((result += operand))
                ;;
            "-")
                ((result -= operand))
                ;;
            "*")
                ((result *= operand))
                ;;
            "/")
                # Handle division by zero.
                if (( operand == 0 )); then
                    echo "division by zero"
                    exit 1
                fi
                ((result /= operand))
                ;;
            *)
                # This catches any remaining invalid operators.
                echo "syntax error"
                exit 1
                ;;
        esac
        
        # Move to the next pair of operator and operand.
        ((i += 2))
    done

    echo "$result"
}

# Pass all command-line arguments to the main function.
main "$@"

How to Run the Script

To use this script, save it as wordy.sh, make it executable, and then run it with the word problem as a single string argument.


# Make the script executable
chmod +x wordy.sh

# --- Test Cases ---

# Simple number
./wordy.sh "What is 5?"
# Expected Output: 5

# Addition
./wordy.sh "What is 5 plus 13?"
# Expected Output: 18

# Multiple operations (left-to-right)
./wordy.sh "What is 3 plus 2 multiplied by 3?"
# Expected Output: 15

# Syntax error
./wordy.sh "What is 5 plus?"
# Expected Output: syntax error

# Unknown operation
./wordy.sh "What is 5 Cubed?"
# Expected Output: unknown operation

A Detailed Code Walkthrough

Understanding *why* the code works is more important than just copying it. Let's dissect the script section by section.

1. The `main` Function and Input Handling

main() {
    local input="$1"
    ...
}
main "$@"

We wrap our logic in a main function, a standard practice for creating modular and readable scripts. The local input="$1" line declares a local variable input and assigns it the value of the first command-line argument ($1). The final line, main "$@", calls the function, passing all command-line arguments to it. The quotes around $@ are crucial for correctly handling inputs that contain spaces.
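
The effect of those quotes is easy to observe. The sketch below forwards the same argument with and without quoting and counts what arrives; the helper names are made up for this demonstration. The sample argument omits the question mark because an unquoted ? would also be subject to filename matching.

```bash
#!/usr/bin/env bash
# Illustrative only: how quoting affects the arguments a function receives.

count_args() {
    echo "$#"
}

forward_quoted() {
    count_args "$@"     # each original argument stays intact
}

forward_unquoted() {
    # shellcheck disable=SC2068
    count_args $@       # arguments are re-split on whitespace
}

forward_quoted   "5 plus 13"   # prints 1
forward_unquoted "5 plus 13"   # prints 3
```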

2. Validation with Regular Expressions

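local question_re='^What is .*\?$'
if [[ ! "$input" =~ $question_re ]]; then
    echo "syntax error"
    exit 1
fi

local word
for word in ${input//[^[:alpha:]]/ }; do
    case "$word" in
        What|is|plus|minus|multiplied|divided|by) ;;
        *) echo "unknown operation"; exit 1 ;;
    esac
done

This is our first line of defense. The first if statement uses Bash's extended test command [[ ... ]] with the regex match operator =~. The pattern ^What is .*\?$ checks that the string starts with "What is " and ends with a literal question mark. If not, it's an immediate syntax error. Storing the pattern in a variable keeps the escaped \? intact; an unescaped ? would be read as a quantifier rather than as a character to match.

The second check is more subtle. Bash's =~ operator uses POSIX extended regular expressions, which have no lookahead, so instead of one clever pattern we replace every non-letter with a space and test each remaining word against the list of allowed keywords. Any other word, such as "Cubed", is reported as an "unknown operation" rather than as a less specific "syntax error" later on.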
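
The value of anchoring a validation pattern at both ends can be checked with a small experiment. The two helper functions below are hypothetical and exist only to compare a start-only pattern with one that also requires a closing question mark.

```bash
#!/usr/bin/env bash
# Illustrative only: the effect of anchoring a pattern at both ends.

starts_ok() {
    local re='^What is '          # anchored at the start only
    [[ $1 =~ $re ]]
}

whole_ok() {
    local re='^What is .*\?$'     # must also end with a literal "?"
    [[ $1 =~ $re ]]
}

for q in "What is 5?" "What is 5" "What is 5? ok"; do
    starts_ok "$q" && s=yes || s=no
    whole_ok  "$q" && w=yes || w=no
    printf '%-16s start-only=%s  both-ends=%s\n' "\"$q\"" "$s" "$w"
done
```

Only the first sample passes both checks; the other two pass the start-only check despite lacking a proper ending.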

3. Sanitization and Transformation

local question="${input#What is }"
question="${question%?}"

question="${question//multiplied by/*}"
question="${question//divided by//}"
question="${question//plus/+}"
question="${question//minus/-}"

Here, we use Bash's built-in parameter expansion, which is highly efficient.

  • ${input#What is } removes the prefix "What is " from the start of the string.
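  • ${question%?} removes the last character from the end. The ? here is a glob that matches any single character, so it strips whatever comes last; writing ${question%\?} would remove only a literal question mark.
  • ${question//pattern/replacement} is a global search-and-replace. We replace all occurrences of our keywords with their corresponding mathematical symbols. The two-word phrases "multiplied by" and "divided by" are replaced as whole units, so no stray "by" is left behind.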
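
The difference between a glob ? and a literal question mark in the suffix removal is worth seeing once. The sketch below uses two hypothetical one-line helpers to compare them on inputs with and without a trailing "?".

```bash
#!/usr/bin/env bash
# Illustrative only: %? removes any last character, %\? only a literal "?".

strip_any_last()      { printf '%s\n' "${1%?}"; }
strip_question_mark() { printf '%s\n' "${1%\?}"; }

strip_any_last      "What is 72?"   # What is 72
strip_question_mark "What is 72?"   # What is 72

# Without a trailing "?", the glob version silently eats a digit.
strip_any_last      "What is 72"    # What is 7
strip_question_mark "What is 72"    # What is 72
```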

4. Tokenization and Evaluation Loop

This is the core of the algorithm, where we process the cleaned string. The logic is designed for sequential, left-to-right evaluation.

    ● Start: Cleaned String
    │ e.g., "3 + 2 * 3"
    ▼
  ┌───────────────────┐
  │ Split into Tokens │
  │ ["3", "+", "2",   │
  │ "*", "3"]         │
  └─────────┬─────────┘
            │
            ▼
  ┌───────────────────┐
  │ result = tokens[0]│
  │ (result is now 3) │
  └─────────┬─────────┘
            │
            ▼
    ◆ Loop while tokens remain?
   ╱           ╲
  Yes           No
  │              │
  ▼              ▼
┌─────────────────┐  ● End: Echo result
│ Get op & operand│
│ (op="+", num="2") │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Perform Calc    │
│ result = 3 + 2  │
│ (result is now 5) │
└────────┬────────┘
         │
         └─────────> Back to Loop

The code implements this flow perfectly:

read -ra tokens <<< "$question"
local result=${tokens[0]}
local i=1

while [[ $i -lt ${#tokens[@]} ]]; do
    local operator=${tokens[$i]}
    local operand=${tokens[$i+1]}
    # ... case statement ...
    ((i += 2))
done

echo "$result"

  • read -ra tokens <<< "$question" is a clever way to split a string into an array. The -r prevents backslash interpretation, -a tokens reads into an array named tokens, and <<< provides the string as standard input.
  • We initialize result with the first number.
  • The while loop iterates through the remaining tokens, taking them two at a time: one operator and one operand.
  • The case statement is a clean way to select the correct arithmetic operation.
  • ((i += 2)) increments the counter by two, moving us to the next operator/operand pair.

This looping mechanism elegantly enforces the left-to-right calculation rule required by the problem statement.
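
To watch the running total change step by step, the following sketch folds an already-tokenized expression and prints each intermediate value to standard error. It assumes the tokens have been validated beforehand (alternating numbers and operators, no zero divisors); trace_left_to_right is a name invented for this example.

```bash
#!/usr/bin/env bash
# Illustrative sketch: fold a token list from left to right, printing each step.
# Assumes validated tokens: number, operator, number, ... and no zero divisor.

trace_left_to_right() {
    local -a tokens=("$@")
    local result=${tokens[0]}
    local i op num

    for (( i = 1; i + 1 < ${#tokens[@]}; i += 2 )); do
        op=${tokens[i]}
        num=${tokens[i+1]}
        case "$op" in
            +)   result=$(( result + num )) ;;
            -)   result=$(( result - num )) ;;
            \*)  result=$(( result * num )) ;;
            /)   result=$(( result / num )) ;;
        esac
        echo "after $op $num: $result" >&2   # trace goes to stderr
    done
    echo "$result"                           # final answer goes to stdout
}

trace_left_to_right 3 + 2 '*' 3
```

For the tokens 3 + 2 * 3 the trace shows 5 after the addition and 15 after the multiplication, which is the left-to-right answer.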


When to Consider Alternative Approaches

While our loop-based solution is robust and clear, Bash often provides multiple ways to solve a problem. Exploring alternatives helps deepen your understanding of the available tools.

Alternative 1: Using a `sed` Pipeline

A more "classic" Unix approach might involve a single sed command with several -e expressions, piping the result to a calculator utility like bc. It spawns extra processes, so it is less efficient for this specific problem, but it demonstrates a powerful pattern.

# This approach is NOT fully correct for the problem because bc uses PEMDAS.
# It's shown here for educational purposes.
function solve_with_sed_bc() {
    local input="$1"
    
    local expression
    expression=$(echo "$input" | \
        sed -e 's/What is //g' \
            -e 's/?//g' \
            -e 's/plus/+/g' \
            -e 's/minus/-/g' \
            -e 's/multiplied by/*/g' \
            -e 's/divided by/\//g')

    # bc will respect order of operations, which is incorrect for this problem.
    echo "$expression" | bc
}

This is concise but fails the requirement for left-to-right evaluation. It highlights the importance of choosing the right tool that matches the problem's constraints.
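
One possible way to reconcile a precedence-aware calculator with the left-to-right rule is to add explicit parentheses before handing the expression over. The sketch below is not part of the original solution; parenthesize is a hypothetical helper that assumes its arguments are already validated tokens.

```bash
#!/usr/bin/env bash
# Illustrative sketch: wrap an expression in explicit parentheses so that a
# precedence-aware evaluator (such as bc) still computes it left to right.
# Assumes tokens alternate number, operator, number, ...

parenthesize() {
    local -a tokens=("$@")
    local expr=${tokens[0]}
    local i
    for (( i = 1; i + 1 < ${#tokens[@]}; i += 2 )); do
        # Each step nests everything computed so far inside a new pair.
        expr="($expr ${tokens[i]} ${tokens[i+1]})"
    done
    printf '%s\n' "$expr"
}

parenthesize 3 + 2 '*' 3      # prints ((3 + 2) * 3)
```

Where bc is installed, piping that output into it (parenthesize 3 + 2 '*' 3 | bc) should give 15 rather than 9, because the parentheses override its normal precedence.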

Alternative 2: Using `awk`

awk is a Turing-complete language designed for text processing. It can handle this entire problem in a single, albeit complex, command.

# An awk-based solution. Verbose, but powerful.
function solve_with_awk() {
    echo "$1" | awk '
    {
        # Remove prefix and suffix
        gsub(/^What is /, "");
        gsub(/\?$/, "");
        
        # Replace operators
        gsub(/plus/, "+");
        gsub(/minus/, "-");
        gsub(/multiplied by/, "*");
        gsub(/divided by/, "/");

        # Tokenize and evaluate
        n = split($0, tokens, " ");
        result = tokens[1];
        for (i = 2; i < n; i += 2) {
            op = tokens[i];
            num = tokens[i+1];
            if (op == "+") result += num;
            else if (op == "-") result -= num;
            else if (op == "*") result *= num;
            # awk divides in floating point; int() truncates like Bash does.
            else if (op == "/") result = int(result / num);
        }
        print result;
    }'
}
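
One difference to keep in mind when moving arithmetic into awk is that awk divides in floating point, while Bash divides integers. The sketch below compares the two; the function names are illustrative, and it assumes an awk implementation is available on the PATH.

```bash
#!/usr/bin/env bash
# Illustrative only: awk divides in floating point, Bash in integers.

bash_divide() {
    echo $(( $1 / $2 ))
}

awk_divide() {
    awk -v a="$1" -v b="$2" 'BEGIN { print a / b }'
}

awk_divide_truncated() {
    # int() truncates toward zero, matching Bash integer division.
    awk -v a="$1" -v b="$2" 'BEGIN { print int(a / b) }'
}

bash_divide 7 2            # 3
awk_divide 7 2             # 3.5
awk_divide_truncated 7 2   # 3
```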

Pros and Cons of Different Methods

Pure Bash Loop (Our Solution)

  Pros:
  • No external dependencies (highly portable)
  • Explicit logic, easy to debug
  • Full control over evaluation order

  Cons:
  • More verbose than one-liners
  • Can be slower for huge inputs due to shell loop overhead

`sed` + `bc` Pipeline

  Pros:
  • Concise and idiomatic for simple substitutions
  • Offloads math to a dedicated tool

  Cons:
  • Incorrect logic (bc uses PEMDAS)
  • Creates multiple processes (less efficient)

`awk` Script

  Pros:
  • Extremely powerful for field-based processing
  • Self-contained in a single process

  Cons:
  • Steeper learning curve; syntax can be cryptic
  • Can be overkill for simpler problems

For this specific problem from the kodikra learning path, our pure Bash solution is superior because it correctly implements the required left-to-right logic while remaining readable and portable.


Frequently Asked Questions (FAQ)

How could this script handle the standard order of operations (PEMDAS)?

To handle PEMDAS, you would need a much more complex parsing algorithm, like the Shunting-yard algorithm, to convert the infix notation (e.g., 3 + 5 * 2) to postfix/Reverse Polish Notation (e.g., 3 5 2 * +). Then, you would evaluate the RPN stack. This is significantly more complex and beyond the scope of this problem, often better suited for a language with more advanced data structures.

Why is using `eval` a bad idea for this problem?

Using eval on user-provided input is a massive security risk. A malicious string like "What is 5; rm -rf /" could be processed by eval, leading to the execution of arbitrary and destructive commands. Our token-by-token parsing approach completely avoids this danger by never executing the input string directly.

How can I make the script more robust against invalid input?

Our script already has several validation checks. To make it more robust, you could add more regex checks at the beginning to ensure the pattern of (number operator number)+ is strictly followed. For example, you could check for consecutive numbers or operators (e.g., "What is 5 5 plus 10?") and flag them as syntax errors.
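
A stricter structural check along these lines might look like the sketch below. It assumes the word operators have already been replaced by symbols, and is_well_formed is a hypothetical helper rather than part of the script above.

```bash
#!/usr/bin/env bash
# Illustrative sketch: accept only "number (operator number)*" after the
# word operators have been replaced by symbols.

is_well_formed() {
    # Inside the bracket expression, a leading "-" is a literal minus sign.
    local re='^-?[0-9]+( [-+*/] -?[0-9]+)*$'
    [[ $1 =~ $re ]]
}

for expr in "3 + 2 * 3" "5 5 + 10" "5 +" "5; rm -rf /"; do
    if is_well_formed "$expr"; then
        echo "ok:           $expr"
    else
        echo "syntax error: $expr"
    fi
done
```

Only the first sample is accepted; consecutive numbers, a dangling operator, and anything containing shell syntax are all rejected before evaluation begins.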

What does the `read -ra tokens <<< "$question"` line do exactly?

This is a simple command with a redirection. <<< "$question" is a "here string," which passes the contents of the $question variable as standard input to the read command. The read command, with the -a tokens flag, reads this input and splits it on the characters in IFS (space, tab, and newline by default) into an array named tokens. The -r flag prevents it from interpreting backslashes specially.
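
The splitting behaviour can be observed directly by counting the resulting elements. In the sketch below, split_words and split_csv are names chosen for the demonstration; the second one sets IFS for a single read command to show that the delimiter is configurable.

```bash
#!/usr/bin/env bash
# Illustrative only: read -ra with a here-string.

split_words() {
    local -a parts
    # Default IFS: split on runs of spaces, tabs, and newlines.
    read -ra parts <<< "$1"
    echo "${#parts[@]}"
}

split_csv() {
    local -a parts
    # An IFS assignment placed before read applies to that command only.
    IFS=, read -ra parts <<< "$1"
    echo "${#parts[@]}"
}

split_words "3 +   2 * 3"   # 5 (repeated spaces collapse; * is not glob-expanded)
split_csv "3,+,2"           # 3
```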

Is Bash's `((...))` arithmetic limited to integers?

Yes. Bash's native arithmetic, whether ((...)), $((...)), or the let command, handles only fixed-width integers, and division truncates toward zero. If you needed to process floating-point numbers (decimals), you would have to use an external command-line calculator like bc (Basic Calculator), which supports arbitrary-precision arithmetic.

Why not just use one big `sed` command for all replacements?

You can use multiple -e flags with a single sed command, like sed -e 's/a/b/' -e 's/c/d/'. However, for our specific case, using Bash's native parameter expansion (${var//find/replace}) is generally faster because it doesn't require launching a separate external process. For simple substitutions, internal shell features are often preferable.

How could I add a new operation like "power"?

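To add a "power" operation, you would update two parts of the script. First, add it to the transformation section: question="${question//raised to the power of/**}". Second, add a new case to the case statement: "**") ((result = result ** operand)) ;;. Note that Bash arithmetic provides the ** operator but no **= assignment form, so the result is assigned explicitly. Any validation that lists the allowed operation words would need the new phrase as well. This demonstrates the extensibility of our chosen design.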
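
A standalone sketch of such a branch is shown below. The function name power and the choice to report a negative exponent as a "syntax error" are assumptions made for this example; Bash itself rejects negative exponents in integer arithmetic.

```bash
#!/usr/bin/env bash
# Illustrative sketch of a "power" branch. Bash arithmetic has the **
# operator but no **= assignment form, so the result is assigned explicitly.

power() {
    local result=$1 operand=$2
    if (( operand < 0 )); then
        # Bash rejects negative exponents; the message here is an arbitrary choice.
        echo "syntax error"
        return 1
    fi
    result=$(( result ** operand ))
    echo "$result"
}

power 2 10    # 1024
power 5 0     # 1
```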


Conclusion: From Words to Wisdom

We have successfully navigated the journey from a simple English question to a precise numerical answer, all within a single Bash script. By breaking the problem down, we systematically built a solution that validates, sanitizes, transforms, and evaluates input, touching upon some of the most powerful text-processing features Bash has to offer.

The key takeaways from this exercise are not just the final code, but the thought process behind it: the importance of input validation, the efficiency of built-in parameter expansion over external tools, and the clarity of a structured, loop-based algorithm for custom evaluation logic. These are the skills that separate a novice from an expert shell scripter.

The "Wordy" problem serves as a perfect microcosm of larger data-wrangling tasks you'll face in your career. Whether you're parsing server logs, automating reports, or building command-line tools, the ability to manipulate text with confidence is a superpower in the world of DevOps and system administration.

Technology Disclaimer: The solution and concepts presented in this article are based on Bash version 4.x and later. While most features are backward-compatible, the specific regex matching and array handling are most reliable on modern versions of the shell.

Ready to tackle the next challenge? Continue your journey on the Kodikra Bash Learning Path and master the art of the command line. Or, for a broader view, explore more advanced Bash concepts on our main page.


Published by Kodikra — Your trusted Bash learning resource.