Wordy in CoffeeScript: Complete Solution & Deep Dive Guide

The Complete Guide to Parsing Math Word Problems in CoffeeScript

Learn to build a CoffeeScript parser that translates natural language math questions like 'What is 5 plus 10?' into numerical answers. This guide covers string manipulation, regular expressions, and state management to solve complex wordy problems from scratch, turning ambiguous text into precise logic.

Have you ever looked at a simple sentence and wondered how a machine like Amazon's Alexa or Google Assistant understands it? The gap between human language—fluid, contextual, and sometimes messy—and the rigid, logical world of programming can seem vast. Many developers hit a wall when they need to extract structured data from unstructured text, a task that feels more like art than science.

This is where the power of parsing comes in. It's the bridge between human intent and machine execution. In this comprehensive guide, we'll demystify this process. We will build a "Wordy" calculator in CoffeeScript, a program that takes a simple math problem phrased as a question and computes the answer. You'll move from zero to hero, mastering the techniques to transform words into calculations, a fundamental skill in everything from building chatbots to analyzing data.

What is the "Wordy" Problem? A Bridge Between Language and Logic

At its core, the "Wordy" problem, a classic challenge from the kodikra.com CoffeeScript learning path, is about translation. The goal is to write a program that can take a string of text representing a mathematical question, parse it, and return the integer result. It's a small, hands-on introduction to the parsing side of Natural Language Processing (NLP).

The input is a simple English question, such as:

"What is 5?" which should evaluate to 5.
"What is 5 plus 13?" which should evaluate to 18.
"What is 7 minus 5?" which should evaluate to 2.
"What is 3 multiplied by 4?" which should evaluate to 12.
"What is 25 divided by 5?" which should evaluate to 5.

The challenge escalates by introducing multiple, sequential operations and the need for robust error handling. For instance, the program must correctly calculate "What is 1 plus 1 plus 1?" but throw an error for nonsensical input like "What is 5 plus plus 6?" or "Who is the President of the United States?".

This problem forces us to think like a compiler. We must define a grammar (the rules of our simple language), tokenize the input (break it into meaningful parts), and then evaluate the tokens in a structured way. It’s a microcosm of how more complex programming languages and interpreters are built.

Why Is Parsing Text a Crucial Skill for Modern Developers?

You might think that parsing word problems is a niche, academic exercise. However, the underlying principles are incredibly relevant in today's technology landscape. The world runs on unstructured data—text messages, emails, support tickets, social media posts, and voice commands. The ability to extract meaning and structure from this chaos is a superpower.

Here’s why this skill is indispensable:

Building Intelligent Interfaces: Every chatbot, voice assistant (like Siri or Alexa), and command-line interface (CLI) needs to parse user input. They translate a user's natural language command into a specific action.
Data Extraction and Web Scraping: When you scrape a website or analyze a log file, you're often dealing with semi-structured text. Parsing techniques allow you to pull out specific pieces of information, like prices from an e-commerce site or error codes from server logs.
Configuration File Management: Tools like Docker (Dockerfile), Kubernetes (YAML), and Webpack (webpack.config.js) all rely on parsing configuration files to set up environments and build processes.
Developing Domain-Specific Languages (DSLs): Sometimes, you need to create a small language for a specific task, like a query language for a database or a rule engine for a business application. The skills you learn here are the first step in that direction.

By mastering this "Wordy" problem, you are not just solving a puzzle; you are learning the fundamental mechanics of how software makes sense of the human world. It’s a direct application of computer science theory that has immediate, practical value.

How to Design the Solution in CoffeeScript: A Step-by-Step Strategy

Before writing a single line of code, a good developer strategizes. Our approach will be to break the problem down into manageable, logical steps. We will use a class-based structure in CoffeeScript to encapsulate the logic and manage the state of our calculation cleanly.

The Core Strategy

Our plan can be summarized in three main phases: Cleaning, Tokenizing, and Evaluating.

Clean and Validate: The raw input string (e.g., "What is 5 plus 10?") contains noise. We need to remove the non-essential parts like "What is" and the trailing question mark. We also need to validate that the question is in a format we can understand.
Tokenize: Once cleaned, we'll break the remaining string (e.g., "5 plus 10") into a list of "tokens." These tokens will be either numbers (5, 10) or operators (plus, minus).
Evaluate Sequentially: We will process the tokens in order. Since the problem specifies simple left-to-right evaluation (no operator precedence like PEMDAS/BODMAS), we can maintain a running total and apply each operation as we encounter it.

This sequential approach is perfect for the problem's constraints and avoids the complexity of building a full Abstract Syntax Tree (AST), which would be overkill here.
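The three phases can be sketched as three small functions chained together. This is a minimal illustration rather than the final class: the names clean, tokenize, evaluate, and solve are hypothetical, and the join check in tokenize is an added assumption that rejects any text the tokenizer would otherwise skip.

```coffeescript
# Sketch only: clean, tokenize, evaluate, and solve are hypothetical names.

# Phase 1: check the "What is ...?" frame and keep the text in between.
clean = (question) ->
  match = /^What is (.+)\?$/i.exec question
  throw new Error 'Syntax error' unless match
  match[1]

# Phase 2: split the body into number and operator tokens.
tokenize = (body) ->
  tokens = body.match(/-?\d+|plus|minus|multiplied by|divided by/g) or []
  # Reject the body if the tokens do not account for all of it.
  throw new Error 'Syntax error' unless tokens.join(' ') is body
  tokens

# Phase 3: fold the tokens left to right into a running total.
evaluate = (tokens) ->
  [first, rest...] = tokens
  total = parseInt first, 10
  throw new Error 'Syntax error' if isNaN total
  for i in [0...rest.length] by 2
    num = parseInt rest[i + 1], 10
    throw new Error 'Syntax error' if isNaN num
    switch rest[i]
      when 'plus' then total += num
      when 'minus' then total -= num
      when 'multiplied by' then total *= num
      when 'divided by' then total /= num
      else throw new Error 'Syntax error'
  total

solve = (question) -> evaluate tokenize clean question

console.log solve 'What is 5 multiplied by -2?'   # -10
```

Keeping each phase in its own function makes it possible to test the cleaning, tokenizing, and evaluating rules separately.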

High-Level Logic Flow

Here is a conceptual diagram of our entire process, from receiving the raw string to producing the final integer result.

    ● Start (Input: "What is 5 multiplied by -2?")
    │
    ▼
  ┌──────────────────────────────────┐
  │ Validate & Clean                 │
  │ (Remove "What is", "?")          │
  └────────────────┬─────────────────┘
                   │
                   ▼
  ┌──────────────────────────────────┐
  │ Tokenize with Regex              │
  │ (Extract numbers/ops)            │
  │ Tokens: [5, 'multiplied by', -2] │
  └────────────────┬─────────────────┘
                   │
                   ▼
         ◆ Is Token List Valid?
        ╱                      ╲
      Yes                       No
       │                         │
       ▼                         ▼
  ┌────────────────────────┐  [Throw Error('Syntax error')]
  │ Evaluate Sequentially  │
  │ (Process tokens        │
  │  left to right)        │
  └───────────┬────────────┘
              │
              ▼
    ● End (Output: -10)

The Complete CoffeeScript Solution

We'll encapsulate our logic within a WordProblem class. This is good practice as it keeps our state (the question) and behavior (the answer method) bundled together. It's clean, testable, and reusable.


# kodikra.com CoffeeScript Module 4: Wordy Problem Solution
# This class parses and evaluates simple math word problems.

class WordProblem
  # Constructor: Initializes the instance with the question string.
  constructor: (@question) ->
    # A map to translate word operators to actual CoffeeScript operators.
    # This provides a clean lookup table.
    @OPERATORS =
      'plus': '+'
      'minus': '-'
      'multiplied by': '*'
      'divided by': '/'

  # answer: The main method that performs parsing and evaluation.
  answer: ->
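    # 1. VALIDATION using a Regular Expression
    # This regex checks the overall shape of the question. It requires:
    #   - "What is" followed by an initial number (positive or negative).
    #   - Zero or more groups of an operator followed by another number.
    #   - A closing question mark.
    #   The `^` and `$` anchors reject extra text before or after the question.
    #   The `g` flag is not needed as we expect one full match.
    #   The `i` flag makes this check case-insensitive. The tokenizer further
    #   down is case-sensitive, so operator words must still be lowercase.
    pattern = /^What is (-?\d+)(?: (plus|minus|multiplied by|divided by) (-?\d+))*\?$/i
    matches = @question.match(pattern)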

    # 2. ERROR HANDLING: Initial Validation
    # If the question doesn't match our expected structure, it's a syntax error.
    unless matches
      throw new Error('Syntax error')

    # 3. TOKENIZATION: Split the question into numbers and operator words.
    # With the `g` flag, match() returns every number (including negatives)
    # and every known operator, in order of appearance.
    # Example: "What is 5 plus 10 minus 2?" becomes ['5', 'plus', '10', 'minus', '2'].
    tokens = @question.match(/-?\d+|plus|minus|multiplied by|divided by/g)

    # If no tokens are found after the initial "What is", it's an error.
    unless tokens
        throw new Error('Syntax error')
    
    # The first token MUST be a number.
    result = parseInt(tokens.shift(), 10)
    if isNaN(result)
        throw new Error('Syntax error')

    # 4. EVALUATION LOOP
    # Process the remaining tokens in pairs (operator, number).
    while tokens.length > 0
      operator_word = tokens.shift()
      # Handle cases like "multiplied by" which is two words but one token in our regex.
      # The regex above handles this well.
      
      number_str = tokens.shift()

      # Error Handling within the loop
      unless operator_word and @OPERATORS[operator_word] and number_str
        throw new Error('Syntax error')
      
      num = parseInt(number_str, 10)
      if isNaN(num)
        throw new Error('Syntax error')

      # Perform the calculation using a switch statement for clarity.
      switch @OPERATORS[operator_word]
        when '+' then result += num
        when '-' then result -= num
        when '*' then result *= num
        when '/' then result /= num
        else throw new Error('Unknown operation')

    # 5. RETURN THE FINAL RESULT
    result

# Export the class for use in other modules (e.g., testing frameworks).
# In a browser or simple script, this line might not be necessary.
module.exports = WordProblem

This solution first validates the overall structure of the question with one regular expression, then tokenizes it with a second, and finally runs a token-based loop to perform the calculation, with error checks at each critical step.

Where the Logic Happens: A Deep Dive Code Walkthrough

Let's dissect the CoffeeScript code piece by piece to understand the "how" and "why" behind each decision. Understanding the flow of data and control is key to mastering the solution.

Step 1: The `WordProblem` Class and `constructor`


class WordProblem
  constructor: (@question) ->
    @OPERATORS =
      'plus': '+'
      'minus': '-'
      'multiplied by': '*'
      'divided by': '/'

We start by defining a class, WordProblem. This is an excellent object-oriented practice. Each instance of this class will represent a single word problem to be solved.

The constructor: (@question) -> is CoffeeScript's concise syntax for creating a constructor that accepts an argument (question) and automatically assigns it to an instance variable (@question, which compiles to this.question in JavaScript).

The @OPERATORS map is a crucial piece of our design. It acts as a translation dictionary, decoupling the words used in the problem ("plus", "minus") from the symbols used for computation (+, -). This makes the code cleaner, easier to read, and extensible. To support "added to" as a synonym for "plus", we would add it to this map and to the operator alternatives in the regular expressions inside answer().
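One way to keep the operator words in a single place is to build the tokenizer's regex from the map's keys. The sketch below is an illustration, not part of the solution above: the 'added to' synonym and the names operatorWords and tokenPattern are hypothetical, and it assumes the operator phrases contain no regex metacharacters.

```coffeescript
OPERATORS =
  'plus': '+'
  'added to': '+'        # hypothetical synonym, added for this sketch
  'minus': '-'
  'multiplied by': '*'
  'divided by': '/'

# Longest phrases first, so a shorter phrase can never shadow a longer one.
operatorWords = Object.keys(OPERATORS).sort (a, b) -> b.length - a.length

# Same shape as the hand-written tokenizer regex, generated from the map.
tokenPattern = new RegExp "-?\\d+|#{operatorWords.join '|'}", 'g'

tokens = 'What is 5 added to 3 minus 2?'.match tokenPattern
console.log tokens                 # ['5', 'added to', '3', 'minus', '2']
console.log OPERATORS[tokens[1]]   # +
```

With this arrangement a new synonym needs one new line in the map, as long as the symbol it maps to is already handled by the evaluation step.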

Step 2: Tokenization - The Brain of the Parser


# In the answer() method...
tokens = @question.match(/-?\d+|plus|minus|multiplied by|divided by/g)

unless tokens
    throw new Error('Syntax error')

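This is where the magic begins. Instead of complex string splitting and checking, we use a single regular expression with the match() method and the global flag (g). With g set, match() returns an array of every matching substring in order, or null when nothing matches. Text that matches none of the alternatives is simply skipped, which is why the full solution checks the overall shape of the question before tokenizing.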

-?\d+: This part matches numbers. \d+ matches one or more digits. The -? at the beginning makes an optional hyphen, so it correctly captures both positive (5) and negative (-10) numbers.
|: This is the "OR" operator in regex.
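plus|minus|multiplied by|divided by: This part explicitly lists all the operator words we want to capture. Regex alternation tries the alternatives from left to right, so order only matters when one alternative is a prefix of another (the longer one must then come first). None of our four operators overlap, so any order works here.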

For an input like "What is 5 plus -10?", the tokens array will become ['5', 'plus', '-10']. This clean list of tokens is the perfect input for our evaluation logic.
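To see what the tokenizer returns for different inputs, here is a small standalone sketch. The names TOKEN_PATTERN and tokenize are hypothetical wrappers around the same regex used in answer().

```coffeescript
TOKEN_PATTERN = /-?\d+|plus|minus|multiplied by|divided by/g

# Returns the matched tokens in order, or null when nothing matches.
tokenize = (question) -> question.match TOKEN_PATTERN

console.log tokenize 'What is 5 plus -10?'     # ['5', 'plus', '-10']
console.log tokenize 'What is 5 cubed?'        # ['5'], "cubed" is skipped, not rejected
console.log tokenize 'Who is the President?'   # null
```

The second call shows why tokenizing alone is not enough for validation: an unknown word leaves no trace in the token list.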

Step 3: The Evaluation Loop - Calculating the Result

The core of the calculation happens in a while loop. This loop iteratively consumes the tokens to build the final answer.

    ● Start Loop (result = initial_number)
    │
    ▼
  ┌────────────────────────┐
  │  tokens.shift() x 2    │◀───────────────┐
  │  (Operator, Number)    │                │
  └───────────┬────────────┘                │
              │                             │
              ▼                             │
    ◆ Operator & Number Valid?              │
   ╱                          ╲             │
  Yes                          No           │
  │                            │            │
  ▼                            ▼            │
┌────────────────────────┐  [Throw Error]   │
│  Apply Operation       │                  │
│  (e.g., result += num) │                  │
└───────────┬────────────┘                  │
            │                               │
            ▼                               │
    ◆ Any Tokens Left?                      │
   ╱                  ╲                     │
  No                   Yes ─────────────────┘
  │
  ▼
  ● End Loop & Return Result

The code implements this flow:


result = parseInt(tokens.shift(), 10)
# ... error check ...

while tokens.length > 0
  operator_word = tokens.shift()
  number_str = tokens.shift()

  # ... error checks ...
  
  num = parseInt(number_str, 10)

  switch @OPERATORS[operator_word]
    when '+' then result += num
    when '-' then result -= num
    when '*' then result *= num
    when '/' then result /= num
    else throw new Error('Unknown operation')

We first initialize result with the very first token, which must be a number. Then, the while loop continues as long as there are tokens left to process. Inside the loop, we expect to pull tokens out in pairs: an operator followed by a number. The .shift() method is perfect for this, as it removes and returns the first element of the array.

The switch statement is a clean and efficient way to perform the correct mathematical operation based on the operator word. It looks up the symbol in our @OPERATORS map and executes the corresponding code block.
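A common alternative to the switch is a lookup table of functions. The sketch below swaps in that technique to show the left-to-right rule in isolation; OPERATIONS and evaluate are hypothetical names, and the token list is assumed to come from a tokenizer like the one above.

```coffeescript
# Hypothetical alternative to the switch: each operator word maps to a function.
OPERATIONS =
  'plus': (a, b) -> a + b
  'minus': (a, b) -> a - b
  'multiplied by': (a, b) -> a * b
  'divided by': (a, b) -> a / b

# Fold a token list left to right; tokens alternate number, operator, number, ...
evaluate = (tokens) ->
  total = parseInt tokens[0], 10
  throw new Error 'Syntax error' if isNaN total
  for i in [1...tokens.length] by 2
    operation = OPERATIONS[tokens[i]]
    operand = parseInt tokens[i + 1], 10
    throw new Error 'Syntax error' unless operation? and not isNaN(operand)
    total = operation total, operand
  total

# Left to right: (3 + 2) * 3 = 15, not 3 + (2 * 3) = 9.
console.log evaluate ['3', 'plus', '2', 'multiplied by', '3']   # 15
```

The table keeps each operator's word and behavior on one line, at the cost of one extra function call per operation.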

Step 4: Robust Error Handling

A good program doesn't just work for valid inputs; it fails gracefully and predictably for invalid ones. Our code includes several checks:

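Invalid Question Format: The structural regex at the top of answer() rejects anything that is not shaped like "What is <number> [<operator> <number>]...?", such as "Tell me the answer". The unless tokens check after tokenization is a second line of defense for input that contains no numbers or operators at all.
Missing Numbers or Operators: Inside the loop, unless operator_word and ... and number_str checks if the sequence is broken (e.g., "5 plus 6 plus").
Non-numeric Tokens: if isNaN(result) and if isNaN(num) ensure that what we think are numbers are actually numbers.
Unknown Operations: A word such as "modulo" in "5 modulo 2" never becomes a token, because the tokenizer only matches known operators; the structural regex rejects the question with 'Syntax error' first. The else clause in the switch statement is therefore a safety net that only fires if an entry is added to @OPERATORS without a matching when branch.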

By throwing an Error, we adhere to standard JavaScript/CoffeeScript practice for handling exceptional situations, allowing the calling code to catch and manage the failure.
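On the calling side, the thrown error can be caught with try/catch. The following sketch uses a cut-down stand-in for the answer method so that it runs on its own; safeAnswer and the shape of its return value are hypothetical choices made for illustration.

```coffeescript
# Stand-in for WordProblem::answer, cut down to one number for this sketch.
answer = (question) ->
  match = /^What is (-?\d+)\?$/.exec question
  throw new Error 'Syntax error' unless match
  parseInt match[1], 10

# Hypothetical wrapper: convert a thrown error into a value the caller can inspect.
safeAnswer = (question) ->
  try
    { ok: true, value: answer(question) }
  catch error
    { ok: false, message: error.message }

console.log safeAnswer 'What is 5?'      # { ok: true, value: 5 }
console.log safeAnswer 'Who are you?'    # { ok: false, message: 'Syntax error' }
```

In CoffeeScript, try/catch is an expression, so the value of whichever branch runs becomes the function's return value.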

When to Consider Alternative Approaches

The sequential evaluation approach we implemented is perfect for this problem's constraints. However, in the real world, you'll encounter more complex parsing needs. It's important to know what other tools and algorithms are available.

Alternative 1: Recursive Descent Parsing

For grammars with nested structures (like expressions with parentheses), a recursive descent parser is a common technique. You would define functions for each part of the grammar (e.g., `parseExpression`, `parseTerm`, `parseFactor`), and these functions would call each other to break down the input. This is a step up in complexity but is extremely powerful for handling operator precedence.
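As a rough sketch of the idea, here is a tiny recursive descent parser for symbolic arithmetic with parentheses. It is separate from the Wordy solution: the grammar in the comment, the symbolic tokens, and all function names are assumptions chosen for illustration, and negative number literals are left out to keep it short.

```coffeescript
# Grammar assumed for this sketch:
#   expression := term (('+' | '-') term)*
#   term       := factor (('*' | '/') factor)*
#   factor     := NUMBER | '(' expression ')'

parse = (source) ->
  tokens = source.match(/\d+|[-+*\/()]/g) or []
  position = 0

  peek = -> tokens[position]
  advance = -> tokens[position++]

  parseFactor = ->
    token = advance()
    if token is '('
      value = parseExpression()
      throw new Error 'Syntax error' unless advance() is ')'
      value
    else if /^\d+$/.test token
      parseInt token, 10
    else
      throw new Error 'Syntax error'

  parseTerm = ->
    value = parseFactor()
    while peek() in ['*', '/']
      if advance() is '*' then value *= parseFactor() else value /= parseFactor()
    value

  parseExpression = ->
    value = parseTerm()
    while peek() in ['+', '-']
      if advance() is '+' then value += parseTerm() else value -= parseTerm()
    value

  result = parseExpression()
  # Anything left over means the input did not fit the grammar.
  throw new Error 'Syntax error' if position < tokens.length
  result

console.log parse '2 + 3 * 4'     # 14
console.log parse '(2 + 3) * 4'   # 20
```

Precedence falls out of the call structure: parseExpression only ever adds or subtracts whole terms, so multiplication and division are resolved first.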

Alternative 2: Shunting-Yard Algorithm

Created by Edsger Dijkstra, the Shunting-Yard algorithm is a classic method for converting an infix expression (like 3 + 4 * 2) to a postfix expression (Reverse Polish Notation: 3 4 2 * +). Postfix expressions are trivial to evaluate with a stack. This is the standard way to handle operator precedence (PEMDAS/BODMAS).
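A compact sketch of the algorithm for the four basic operators follows. It assumes the input is already a valid list of symbolic tokens and leaves out parentheses; OPERATORS, toPostfix, and evaluatePostfix are hypothetical names.

```coffeescript
# Operator table for this sketch; all four operators are left-associative.
OPERATORS =
  '+':
    precedence: 1
    compute: (a, b) -> a + b
  '-':
    precedence: 1
    compute: (a, b) -> a - b
  '*':
    precedence: 2
    compute: (a, b) -> a * b
  '/':
    precedence: 2
    compute: (a, b) -> a / b

# Convert infix tokens to postfix order.
toPostfix = (tokens) ->
  output = []
  stack = []
  for token in tokens
    if OPERATORS[token]?
      # Pop operators that bind at least as tightly as the incoming one.
      while stack.length > 0 and OPERATORS[stack[stack.length - 1]].precedence >= OPERATORS[token].precedence
        output.push stack.pop()
      stack.push token
    else
      output.push token
  while stack.length > 0
    output.push stack.pop()
  output

# Evaluate a postfix token list with a stack of numbers.
evaluatePostfix = (postfix) ->
  stack = []
  for token in postfix
    if OPERATORS[token]?
      right = stack.pop()
      left = stack.pop()
      stack.push OPERATORS[token].compute(left, right)
    else
      stack.push Number(token)
  stack[0]

postfix = toPostfix ['3', '+', '4', '*', '2']
console.log postfix.join ' '          # 3 4 2 * +
console.log evaluatePostfix postfix   # 11
```

Supporting parentheses or right-associative operators would mean extra cases in toPostfix, which this sketch deliberately omits.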

Alternative 3: Using a Parser Generator

For truly complex languages, writing a parser by hand is tedious and error-prone. Tools like Jison (a JavaScript parser generator modeled on Bison, and the tool used to generate the CoffeeScript compiler's own parser) allow you to define your language's grammar in a formal way. The tool then generates the entire parser for you. This is how many compilers and interpreters are built.

Pros and Cons of Our Approach

Let's compare our simple iterative method to a more advanced one like Shunting-Yard.

Aspect	Our Iterative Approach	Shunting-Yard Algorithm
Complexity	Low. Easy to understand and implement.	Medium. Requires understanding stacks and postfix notation.
Performance	Excellent for this problem. Single pass over the tokens.	Very good. Still linear time, but slightly more overhead.
Operator Precedence	Not supported. Evaluates strictly left-to-right.	Primary strength. Correctly handles PEMDAS/BODMAS.
Extensibility	Easy to add new operators of the same precedence. Hard to add precedence rules.	Designed for extensibility with different precedence levels and associativity.
Use Case	Perfect for simple, sequential command parsing or DSLs without precedence.	Ideal for building calculators, scientific software, or language interpreters.

For the "Wordy" problem as defined in the kodikra.com module, our solution is the optimal choice: simple, correct, and efficient.

Frequently Asked Questions (FAQ)

1. Why not just use `eval()` to solve this? Using eval() is extremely dangerous from a security perspective. It executes any string as code, which opens your application to code injection attacks. If a malicious user could control the input string, they could execute arbitrary code on your server or in a user's browser. Our manual parsing approach is safe because we only allow a small, well-defined set of operations.
2. How does the solution handle negative numbers? The regular expression /-?\d+/ is specifically designed for this. The -? part of the pattern matches an optional hyphen (-) at the beginning of a sequence of digits (\d+). This allows the tokenizer to correctly identify tokens like -10 as a single number.
3. What happens if the input has two numbers in a row, like "What is 5 5 plus 6?"? Our error handling catches this. The structural regex requires an operator between any two numbers, so the question is rejected with 'Syntax error' before tokenization. The evaluation loop would catch it as well: the tokens would be ['5', '5', 'plus', '6'], the loop would start with result = 5, read '5' where it expects an operator, find nothing at @OPERATORS['5'], and throw 'Syntax error'.
4. Can this parser handle exponents or square roots? Not out of the box, but it's extensible. To add exponentiation, you would add "raised to the" to the operator alternatives in both regular expressions, add a mapping in the @OPERATORS object (e.g., 'raised to the': '**'), and add a new when '**' case to the switch statement. Square roots would be slightly different: a square root is a unary operator, so the operator-number pairing in the loop would need to change.
5. Why is a class-based approach better than a simple function? While a single function could work, a class provides better structure and encapsulation. It bundles the data (the question) and the behavior (the `answer` method) together. This makes the code more organized, reusable, and easier to test. It also allows you to store state, like our @OPERATORS map, cleanly within the instance.
6. Is CoffeeScript still a good choice for this kind of task? Absolutely. CoffeeScript's syntactic sugar shines in tasks involving data manipulation. Its concise syntax for arrays, objects, and classes can make parsing logic cleaner and more readable than its JavaScript equivalent. While modern JavaScript (ES6+) has adopted many of its features, CoffeeScript still offers a very pleasant and efficient development experience for this type of problem.
7. How would I test this `WordProblem` class? You would use a testing framework like Mocha and an assertion library like Chai. You'd write test cases for all valid inputs (addition, subtraction, multiple operations) and also for all the expected error conditions (malformed questions, unsupported operations). For example, in CoffeeScript: expect(new WordProblem("What is 5?").answer()).to.equal(5) and expect(-> new WordProblem("Who are you?").answer()).to.throw('Syntax error').
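For a dependency-free version of the testing idea in question 7, the sketch below uses Node's built-in assert module instead of Mocha and Chai. The WordProblem class here is a compact stand-in that only supports a single optional "plus", so the example runs on its own; in a real project you would require your own module instead.

```coffeescript
assert = require 'assert'

# Compact stand-in for the article's class, assumed only for this sketch.
class WordProblem
  constructor: (@question) ->
  answer: ->
    match = /^What is (-?\d+)(?: plus (-?\d+))?\?$/.exec @question
    throw new Error 'Syntax error' unless match
    parseInt(match[1], 10) + parseInt(match[2] ? '0', 10)

# Table-driven checks for valid input: [question, expected answer].
cases = [
  ['What is 5?', 5]
  ['What is 5 plus 13?', 18]
  ['What is -1 plus -10?', -11]
]
for [question, expected] in cases
  assert.strictEqual new WordProblem(question).answer(), expected

# Invalid input must throw; the function wrapper defers the call until assert runs it.
assert.throws (-> new WordProblem('Who are you?').answer()), /Syntax error/

console.log 'All checks passed'
```

The function wrapper around the failing call matters: without it, the error would be thrown before the assertion had a chance to observe it.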

Conclusion: From Words to Wisdom

We have successfully journeyed from a simple English question to a precise numerical answer. In building this "Wordy" parser, we've done more than just solve a coding challenge; we've explored the fundamental principles of parsing, tokenization, state management, and robust error handling. You've seen how the elegance of CoffeeScript and the power of regular expressions can turn a complex problem into a clean, understandable solution.

The skills learned here are a gateway to more advanced topics in computer science, including compiler design, natural language processing, and building sophisticated developer tools. The ability to bridge the gap between human language and machine logic is what separates a good programmer from a great one. You are now equipped with the foundational knowledge to tackle any problem that involves extracting structure and meaning from the vast world of text.

Disclaimer: The code and concepts in this article are based on modern CoffeeScript and Node.js environments. The core logic is timeless, but always ensure your project's dependencies and toolchain are up to date for the best performance and security.

Ready to continue your journey? Explore the next module in our CoffeeScript learning path or dive deeper into the language with our complete CoffeeScript guide at kodikra.com.

Published by Kodikra, your trusted CoffeeScript learning resource.