Wordy in C#: Complete Solution & Deep Dive Guide


The Complete Guide to Parsing Math Word Problems in C#

Solving the "Wordy" problem in C# is a foundational exercise in string parsing and algorithm design. It requires transforming a natural language question, such as "What is 5 plus 13?", into a calculable result by extracting numbers and operators using tools like Regular Expressions or string splitting, and then sequentially evaluating the mathematical expression.

Have you ever marveled at how a digital assistant like Siri or Google Assistant instantly understands your spoken math questions? It might seem like magic, but behind the curtain is a sophisticated process of parsing natural language. You ask, "What is 7 minus 5?", and it doesn't just hear words; it sees a problem to be solved. This process of turning human language into machine-executable instructions is a cornerstone of modern software.

Many developers feel a gap between writing simple command-line apps and building software that intelligently interacts with user input. This guide is designed to bridge that gap. We will dissect the logic required to build a C# program that can parse and solve simple math word problems. You will learn not just one, but multiple robust techniques, transforming you from a coder who simply follows instructions to an engineer who designs solutions.


What is the Wordy Problem?

The "Wordy" problem, a classic challenge from the kodikra.com exclusive curriculum, tasks a developer with creating a parser that can understand and evaluate simple mathematical word problems. The input is always a string in a specific format, and the expected output is a single integer representing the answer.

The complexity of the problem is introduced incrementally, making it an excellent exercise for building up skills. It starts with the most basic case and gradually adds more layers.

  • Level 0: Just a Number
    The simplest input is a question asking for a number itself.
    "What is 5?" ⟶ 5
  • Level 1: Simple Addition
    The next step introduces a single binary operation, addition.
    "What is 5 plus 13?" ⟶ 18
  • Level 2: All Basic Operations
    The problem then expands to include subtraction, multiplication, and division.
    "What is 7 minus 5?" ⟶ 2
    "What is 6 multiplied by 4?" ⟶ 24
    "What is 25 divided by 5?" ⟶ 5
  • Level 3: Multiple Operations
    The true challenge emerges when multiple operations are chained together. The evaluation must happen sequentially, from left to right, without considering mathematical precedence (like PEMDAS/BODMAS).
    "What is 5 plus 13 plus 8?" ⟶ 26
    "What is 3 plus 2 multiplied by 3?" ⟶ 15 (i.e., (3 + 2) * 3)
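
To make the left-to-right rule concrete, the short sketch below contrasts C#'s own operator precedence with a strictly sequential evaluation of the same numbers. The variable names are illustrative only and are not part of the exercise.

```csharp
using System;

// C# applies normal precedence: multiplication binds tighter than addition.
int withPrecedence = 3 + 2 * 3;      // evaluated as 3 + (2 * 3)

// Wordy evaluates strictly left to right: each step feeds the next.
int leftToRight = 3;
leftToRight += 2;                    // 3 + 2 = 5
leftToRight *= 3;                    // 5 * 3 = 15

Console.WriteLine(withPrecedence);   // 9
Console.WriteLine(leftToRight);      // 15
```

The second result, 15, is the one a Wordy solution is expected to return.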

Furthermore, a robust solution must handle invalid or malformed input gracefully. Questions with incorrect syntax, unknown operations, or missing numbers should trigger specific errors (exceptions) rather than crashing the program. For example, "What is 5 plus plus 6?" or "Who is the President?" are invalid and must be rejected.


Why Is This Challenge Important for C# Developers?

At first glance, the Wordy problem might seem like a simple academic puzzle. However, mastering it equips you with several indispensable skills that are directly applicable to real-world C# development.

Mastering String Manipulation and Parsing

A significant portion of software development involves processing text data. Whether you're parsing log files, handling API responses (like JSON or XML), building a compiler, or creating a command-line interface (CLI), the core task is the same: extracting meaningful information from raw strings. This problem forces you to move beyond simple string.Contains() checks and employ more powerful techniques like Regular Expressions (Regex) or strategic splitting and iteration.

Algorithmic Thinking and State Management

Solving the Wordy problem requires you to think algorithmically. You must devise a step-by-step process (an algorithm) to read the question and calculate the result. As you process the question from left to right, you need to maintain the "state" of the calculation—specifically, the current running total. This concept of state management is crucial in everything from simple loops to complex UI frameworks like Blazor or MAUI.

Robust Error Handling

Production-grade software never assumes valid input. The Wordy problem forces you to think about edge cases and failure modes. What if the question is missing a number? What if it contains an operation you don't recognize? Learning to anticipate these issues and handle them by throwing appropriate exceptions (like ArgumentException or FormatException) is a hallmark of a professional developer. This defensive programming mindset prevents bugs and makes your applications more reliable.

Foundation for More Complex Parsers

This challenge is a gateway to more complex parsing tasks. The skills you build here are the foundation for creating interpreters for simple domain-specific languages (DSLs), configuration file readers, or even the front-end of a programming language compiler (the tokenizer/lexer phase). It's a small-scale simulation of how complex systems interpret structured input.


How to Design a Solution in C#? (The Core Logic)

Before writing a single line of code, it's crucial to formulate a strategy. The core task is to convert the unstructured string into a structured sequence of operations that a computer can execute. Our goal is to transform "What is 5 plus 3 multiplied by 2?" into a process that calculates (5 + 3) * 2.

A highly effective and modern approach involves using Regular Expressions to tokenize the input string. Tokenization is the process of breaking down a stream of text into meaningful elements called tokens. For our problem, the tokens are numbers and operators.

The Overall Parsing and Evaluation Flow

Our strategy can be visualized as a clear, multi-step pipeline. We take the raw input, clean it, break it into tokens, and then process those tokens sequentially to arrive at a final answer.

    ● Start: Input String
    │  "What is 5 plus 13?"
    ▼
  ┌───────────────────┐
  │  Initial Cleanup  │
  │ & Basic Validation│
  └─────────┬─────────┘
            │ (e.g., check for "What is...?")
            ▼
  ┌───────────────────┐
  │   Tokenize Input  │
  │ (Using Regex)     │
  └─────────┬─────────┘
            │
            ▼
  [ "5", "plus", "13" ]
            │
            ▼
  ┌───────────────────┐
  │  Evaluate Tokens  │
  │ (Sequential Loop) │
  └─────────┬─────────┘
            │
            ▼
      ◆ Is valid?
     ╱           ╲
    Yes           No
    │             │
    ▼             ▼
  ┌─────────┐   ┌──────────┐
  │ Return  │   │ Throw    │
  │ Result  │   │ Exception│
  └─────────┘   └──────────┘
    │
    ▼
    ● End: 18

Step 1: The Power of Regular Expressions

Regular Expressions (Regex) are a perfect tool for this job. They are mini-languages for pattern matching in strings. We can define a single pattern that finds all the parts of the string we care about: numbers (including negatives) and the specific words for our operations.

The regex pattern we'll use is: -?\d+|plus|minus|multiplied by|divided by

Let's break this pattern down:

  • -?: Matches an optional hyphen (for negative numbers). The ? makes the preceding character (-) optional.
  • \d+: Matches one or more digits (0-9).
  • -?\d+: Together, this part matches any integer, positive or negative.
  • |: This is the "OR" operator. It separates the different things we want to match.
  • plus|minus|multiplied by|divided by: These match the literal strings for our operations.

When we apply this pattern to "What is 5 plus 13?", the C# Regex.Matches method will return a collection of matches: ["5", "plus", "13"]. This clean list of tokens is exactly what we need for the next step.
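
As a quick check of the pattern, the following sketch runs it against a few questions and prints the tokens. The Tokenize helper and the variable names are assumptions made for illustration; only the pattern itself comes from the text above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same pattern as in the text; the names used here are illustrative.
var tokenizer = new Regex(@"-?\d+|plus|minus|multiplied by|divided by");

// Hypothetical helper: returns every match, in order, as a plain string.
string[] Tokenize(string question) =>
    tokenizer.Matches(question).Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(" | ", Tokenize("What is 5 plus 13?")));
// 5 | plus | 13
Console.WriteLine(string.Join(" | ", Tokenize("What is -3 multiplied by 4?")));
// -3 | multiplied by | 4
Console.WriteLine(string.Join(" | ", Tokenize("What is 52 cubed?")));
// 52   (unrecognized words are skipped, not reported)
```

The last call is worth noting: Regex.Matches returns only what the pattern recognizes, so text such as "cubed" simply disappears from the token list and has to be detected separately.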

Step 2: The Evaluation Loop

Once we have our list of tokens, we can process them. The logic is as follows:

  1. Initialize a result variable with the first token, which must be a number.
  2. Loop through the remaining tokens in pairs: one operator followed by one operand (a number).
  3. For each pair, apply the operation to the current result and the new operand.
  4. Update the result with the outcome of the operation.
  5. Continue until all tokens are processed.

To map the operator strings ("plus", "minus", etc.) to actual C# operations, a switch expression is a clean, modern, and highly readable choice. It allows us to concisely define the action for each operator string.

For example:


result = operation switch
{
    "plus"          => result + operand,
    "minus"         => result - operand,
    "multiplied by" => result * operand,
    "divided by"    => result / operand,
    _               => throw new ArgumentException("Unknown operation.")
};

This structure is not only efficient but also self-documenting. It clearly states what happens for each possible operation, and the wildcard case _ ensures we handle any unexpected operator tokens, making our code more robust.
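
Because result and operand are int values, the "divided by" arm performs integer division. The sketch below wraps the same switch expression in a hypothetical Apply helper (the name is an assumption for illustration) to show two consequences: the quotient is truncated toward zero, and a zero divisor raises DivideByZeroException rather than ArgumentException.

```csharp
using System;

// Hypothetical helper wrapping the switch expression from the text.
int Apply(int result, string operation, int operand) => operation switch
{
    "plus"          => result + operand,
    "minus"         => result - operand,
    "multiplied by" => result * operand,
    "divided by"    => result / operand,
    _               => throw new ArgumentException("Unknown operation.")
};

Console.WriteLine(Apply(7, "divided by", 2));    // 3  (truncated, not 3.5)
Console.WriteLine(Apply(-7, "divided by", 2));   // -3 (truncates toward zero)

try
{
    Apply(1, "divided by", 0);
}
catch (DivideByZeroException)
{
    Console.WriteLine("Division by zero surfaces as DivideByZeroException.");
}
```

Whether a zero divisor should be translated into an ArgumentException is a design decision the exercise leaves open; the sketch only shows the default behavior.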


Where Do We Implement the Code? (The C# Solution)

Now, let's translate our design into a complete C# solution. We will create a static class named Wordy with a single public static method, Answer. This structure is simple and makes the functionality easy to use without needing to instantiate an object.

We'll be using the System.Text.RegularExpressions namespace for our Regex-based tokenization.

The Complete C# Code

This code targets .NET 8 and C# 12, although the only newer language feature it relies on is the switch expression, which has been available since C# 8.


using System;
using System.Text.RegularExpressions;
using System.Linq;
using System.Collections.Generic;

public static class Wordy
{
    // A single, powerful regex to capture all numbers and known operations.
    // This is the heart of our parsing strategy.
    private static readonly Regex TokenizerRegex = new Regex(
        @"-?\d+|plus|minus|multiplied by|divided by",
        RegexOptions.Compiled);

    public static int Answer(string question)
    {
        // 1. Initial Validation: The question must follow the basic format.
        if (!question.StartsWith("What is") || !question.EndsWith("?"))
        {
            throw new ArgumentException("Invalid question format. Must start with 'What is' and end with '?'.");
        }

        // 2. Tokenization: Use the regex to extract all relevant parts.
        var matches = TokenizerRegex.Matches(question);

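        // If no numbers or operations are found, it's an invalid question.
        if (matches.Count == 0)
        {
            throw new ArgumentException("No valid numbers or operations found in the question.");
        }

        // Matches() silently skips text it does not recognize (e.g. "cubed"),
        // so reject the question if anything but whitespace remains between tokens.
        string body = question.Substring("What is".Length, question.Length - "What is".Length - 1);
        if (!string.IsNullOrWhiteSpace(TokenizerRegex.Replace(body, string.Empty)))
        {
            throw new ArgumentException("Unknown operation or unexpected text in the question.");
        }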

        // 3. Initialization: The first token MUST be a number.
        // We use a Queue for easy processing of tokens in order.
        var tokens = new Queue<string>(matches.Cast<Match>().Select(m => m.Value));
        
        if (!int.TryParse(tokens.Dequeue(), out int result))
        {
            throw new ArgumentException("Syntax error: Question must start with a number.");
        }

        // 4. Evaluation Loop: Process the rest of the tokens.
        while (tokens.Count > 0)
        {
            // We expect pairs of (operator, operand). If we only have an operator left, it's an error.
            if (tokens.Count < 2)
            {
                throw new ArgumentException("Syntax error: Incomplete operation at the end of the question.");
            }

            string operation = tokens.Dequeue();
            string operandStr = tokens.Dequeue();

            if (!int.TryParse(operandStr, out int operand))
            {
                // This catches cases like "What is 5 plus plus 6?"
                throw new ArgumentException("Syntax error: Expected a number after an operator.");
            }

            // 5. Perform Calculation: Use a modern switch expression for clean logic.
            result = operation switch
            {
                "plus" => result + operand,
                "minus" => result - operand,
                "multiplied by" => result * operand,
                "divided by" => result / operand,
                // Reached when a number sits where an operator is expected, e.g. "What is 1 2 3?".
                _ => throw new ArgumentException($"Unknown operation '{operation}'.")
            };
        }

        return result;
    }
}

Detailed Code Walkthrough

1. The Regex Field

private static readonly Regex TokenizerRegex = new Regex(...)

We define our regex as a static readonly field so that it is constructed once and reused by every call to Answer, instead of being rebuilt on each call. RegexOptions.Compiled tells the .NET runtime to compile the pattern to intermediate language (IL). This makes construction slower but matching faster, a trade-off that generally pays off when the same instance is used many times.

2. Initial Validation

if (!question.StartsWith("What is") || !question.EndsWith("?"))

This is our first line of defense. We perform a quick, cheap check to ensure the question has the correct prefix and suffix. If not, we fail fast by throwing an ArgumentException with a clear message.

3. Tokenization and Queue Initialization

var matches = TokenizerRegex.Matches(question);

Here, we execute the regex against the input string. The result, matches, is a collection of all substrings that match our pattern.

var tokens = new Queue<string>(matches.Cast<Match>().Select(m => m.Value));

We convert the MatchCollection into a Queue<string>. A queue is a First-In, First-Out (FIFO) data structure, which is perfect for our task because we need to process the tokens in the exact order they appeared in the question. Using Dequeue() lets us consume tokens one by one.
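
The FIFO behavior can be seen in isolation with a small sketch. The token values are hard-coded here purely for illustration.

```csharp
using System;
using System.Collections.Generic;

var tokens = new Queue<string>(new[] { "5", "plus", "13" });

string first = tokens.Dequeue();        // "5": the oldest item leaves first
string operation = tokens.Dequeue();    // "plus"
string operand = tokens.Dequeue();      // "13"

Console.WriteLine($"{first} {operation} {operand}");  // 5 plus 13
Console.WriteLine(tokens.Count);                       // 0

// Dequeue on an empty queue throws InvalidOperationException, which is why
// the token count is checked before a pair is dequeued.
// TryDequeue is a non-throwing alternative.
bool gotMore = tokens.TryDequeue(out var extra);
Console.WriteLine(gotMore);                            // False
```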

4. The Evaluation Loop Logic

The core of the evaluation logic resides inside the while (tokens.Count > 0) loop. This loop continues as long as there are tokens left to process.

  ● Start Loop (tokens available)
  │
  ▼
┌───────────────────┐
│ Check Token Count │
│ (Must be >= 2)    │
└─────────┬─────────┘
          │
          ▼
   ◆ Is Count < 2?
  ╱               ╲
 Yes               No
  │                 │
  ▼                 ▼
┌───────────┐   ┌───────────────────┐
│ Throw     │   │ Dequeue Operator  │
│ Exception │   │ & Operand String  │
└───────────┘   └─────────┬─────────┘
                          │
                          ▼
                    ┌───────────────────┐
                    │ Try Parse Operand │
                    │ to Integer        │
                    └─────────┬─────────┘
                              │
                              ▼
                         ◆ Successful?
                        ╱             ╲
                       Yes             No
                        │               │
                        ▼               ▼
                  ┌───────────┐   ┌───────────┐
                  │ Perform   │   │ Throw     │
                  │ Operation │   │ Exception │
                  │ (Switch)  │   └───────────┘
                  └─────┬─────┘
                        │
                        ▼
                  ┌───────────┐
                  │ Update    │
                  │ Result    │
                  └─────┬─────┘
                        │
                        └───────────> To Start of Loop

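Inside the loop, once the count check has passed, we dequeue the operator (e.g., "plus") and then the operand string (e.g., "13"). A critical validation step is to ensure the operand can be parsed into an integer. If int.TryParse fails, an operator is sitting where a number was expected, as in "What is 5 plus plus 6?" or "What is 3 multiplied by plus 4?", and we throw an exception. Note that a word such as "six" never becomes a token at all, because the regex does not match it, so "What is 5 plus six?" is rejected by a different check.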

5. The Switch Expression

result = operation switch { ... };

This is the calculation engine. The C# 8.0+ switch expression provides a highly readable and concise way to perform the correct mathematical operation based on the operator string. The result of the expression (e.g., result + operand) is directly assigned back to our running total, result. This is cleaner and less error-prone than a traditional if-else if chain or a switch statement.


When to Choose Different Approaches? (Pros & Cons)

While the Regex approach is powerful and elegant, it's not the only way to solve the Wordy problem. A more manual approach using string.Split() is also viable and can be easier to understand for developers less comfortable with regular expressions. Understanding the trade-offs between these methods is key to choosing the right tool for the job.

Approach: Regular Expressions
  Pros:
  • Concise & Powerful: A single pattern can define complex extraction rules.
  • Robust: Handles variations like multi-word operators ("multiplied by") seamlessly.
  • Efficient: Compiled regex is highly optimized for pattern matching.
  Cons:
  • Readability: Complex regex patterns can be difficult to read and debug ("write-only code").
  • Steeper Learning Curve: Requires knowledge of regex syntax.
  Best For: Problems with well-defined but potentially complex patterns, where a single pass for tokenization is desired. It's the professional-grade choice for this kind of parsing.

Approach: String Splitting & Iteration
  Pros:
  • Easy to Understand: The logic is very explicit and follows a simple step-by-step flow.
  • No Special Syntax: Relies only on basic string methods like Split and Replace.
  Cons:
  • More Verbose: Requires more lines of code for cleanup and processing.
  • Brittle: Can easily break if the input format changes slightly (e.g., extra spaces). Handling multi-word operators is more complex.
  Best For: Simpler parsing tasks where the input format is extremely rigid and simple, or for developers who are just beginning to learn string manipulation.

Alternative Approach: String Splitting Code Sketch

Here's what a solution using string splitting might look like to illustrate the difference.


public static int AnswerWithSplit(string question)
{
    // 1. Cleanup and basic validation
    if (!question.StartsWith("What is") || !question.EndsWith("?"))
        throw new ArgumentException("Invalid format.");

    string coreProblem = question.Substring("What is".Length, question.Length - "What is".Length - "?".Length).Trim();
    
    if (string.IsNullOrEmpty(coreProblem))
        throw new ArgumentException("Empty problem.");

    // Replace multi-word operators to simplify splitting
    coreProblem = coreProblem.Replace("multiplied by", "multiplied").Replace("divided by", "divided");

    // 2. Split into parts
    string[] parts = coreProblem.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

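    // A valid problem is "number (operator number)*", so the part count is always odd.
    if (parts.Length % 2 == 0 || !int.TryParse(parts[0], out int result))
        throw new ArgumentException("Syntax error: expected a number followed by operator/number pairs.");

    // 3. Walk the array in (operator, operand) pairs, managing the index manually.
    // Caveat: because of the Replace calls above, a bare "multiplied" or "divided"
    // (without "by") is accepted as well.
    for (int i = 1; i < parts.Length; i += 2)
    {
        if (!int.TryParse(parts[i + 1], out int operand))
            throw new ArgumentException("Syntax error: expected a number after an operator.");

        result = parts[i] switch
        {
            "plus" => result + operand,
            "minus" => result - operand,
            "multiplied" => result * operand,
            "divided" => result / operand,
            _ => throw new ArgumentException($"Unknown operation '{parts[i]}'.")
        };
    }

    return result;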
}

As you can see, this approach requires more manual cleanup and the logic to iterate through the resulting array is more complex than simply dequeueing from a pre-validated list of tokens.


Frequently Asked Questions (FAQ)

1. What's the best way to handle unsupported operations?
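The most robust way is to throw an ArgumentException or a more specific custom exception. Your regex should be designed to only match known operations. If an unknown operation (e.g., "What is 5 to the power of 3?") is passed, the regex won't match "to the power of", so the token list contains two numbers in a row and a validation check can reject it with a clear message such as "Syntax error: two consecutive numbers" or "Unknown operation". Be aware that Regex.Matches skips unrecognized text rather than reporting it: a question like "What is 52 cubed?" produces the single token "52", which looks like a valid number-only question. To reject such input, also verify that nothing but whitespace remains once the matched tokens are removed from the text between "What is" and "?".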

2. How could I extend this parser to support the order of operations (PEMDAS/BODMAS)?
Supporting order of operations is a significant step up in complexity. The current left-to-right evaluation model is insufficient. You would need to implement a more advanced parsing algorithm, such as the Shunting-yard algorithm, to convert the infix notation (like 5 + 3 * 2) into postfix notation (Reverse Polish Notation: 5 3 2 * +). Alternatively, you could build an Abstract Syntax Tree (AST) that represents the expression's structure, then evaluate the tree recursively.
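
As a rough illustration of the idea, here is a minimal sketch of a two-stack evaluator. It is a variant of the shunting-yard approach that applies operators as it goes instead of emitting postfix notation first. All helper names (Precedence, Apply, EvaluateWithPrecedence) are hypothetical, the tokens are assumed to be already validated and to alternate number, operator, number, and parentheses are not supported.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: multiplication and division bind tighter than plus/minus.
int Precedence(string op) => op is "multiplied by" or "divided by" ? 2 : 1;

// Hypothetical helper: applies one binary operation.
int Apply(string op, int left, int right) => op switch
{
    "plus"          => left + right,
    "minus"         => left - right,
    "multiplied by" => left * right,
    "divided by"    => left / right,
    _               => throw new ArgumentException($"Unknown operation '{op}'.")
};

// Assumes tokens alternate number, operator, number, ... and are already validated.
int EvaluateWithPrecedence(IEnumerable<string> tokens)
{
    var values = new Stack<int>();
    var operators = new Stack<string>();

    // Pops one operator and its two operands, then pushes the result.
    void Reduce()
    {
        int right = values.Pop();
        int left = values.Pop();
        values.Push(Apply(operators.Pop(), left, right));
    }

    foreach (string token in tokens)
    {
        if (int.TryParse(token, out int number))
        {
            values.Push(number);
            continue;
        }

        // Apply pending operators that bind at least as tightly (left-associative).
        while (operators.Count > 0 && Precedence(operators.Peek()) >= Precedence(token))
        {
            Reduce();
        }
        operators.Push(token);
    }

    // Apply whatever operators are still pending.
    while (operators.Count > 0)
    {
        Reduce();
    }

    return values.Pop();
}

Console.WriteLine(EvaluateWithPrecedence(new[] { "3", "plus", "2", "multiplied by", "3" }));  // 9
Console.WriteLine(EvaluateWithPrecedence(new[] { "8", "minus", "3", "minus", "2" }));          // 3
```

Note that the first result is 9, not the 15 that the Wordy rules require, so this sketch answers a different specification than the exercise.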

3. Why use a compiled Regex instead of a static Regex.Match call?
When a pattern is reused many times, constructing a single Regex instance with RegexOptions.Compiled and storing it in a static readonly field avoids repeated setup work and speeds up matching. The static methods such as Regex.Match(input, pattern) are convenient for one-off uses. They do not rebuild the pattern on every call, but they rely on an internal cache of limited size (Regex.CacheSize, 15 entries by default), so a pattern can be evicted and rebuilt when an application uses many different patterns. For a method that could be called frequently, a static readonly compiled instance is the more predictable choice.

4. Is this a form of Natural Language Processing (NLP)?
This is a very rudimentary form of NLP. True NLP involves much more complex tasks like understanding grammar, context, intent, and ambiguity. Our Wordy parser works because it operates on a very strict and predictable sentence structure. It's an excellent first step into the world of language parsing, but it doesn't employ the sophisticated models and techniques (like machine learning) used in modern NLP systems.

5. How do I test this code effectively?
Unit testing is essential. Using a testing framework like xUnit, NUnit, or MSTest, you should create a suite of tests that cover all valid cases and expected failure modes.
  • Happy Path: Test simple addition, subtraction, multiple operations, and negative numbers.
  • Edge Cases: Test a single number question ("What is 5?").
  • Error Conditions: Write tests that assert an ArgumentException is thrown for invalid inputs like "What is 5 plus plus 6?", "Who is the President?", "What is 5 plus?", and questions with unsupported operations.

6. What are some common pitfalls when solving this problem?
Common pitfalls include: writing a regex that is too greedy or doesn't account for negative numbers; off-by-one errors when iterating through tokens manually; not handling malformed input gracefully (e.g., crashing instead of throwing an exception); and forgetting to handle multi-word operators like "multiplied by" when using a simple string split approach.
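
The negative-number pitfall is easy to reproduce. The sketch below, with illustrative variable names, compares a digits-only pattern with the signed pattern on the same question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string question = "What is -3 plus 5?";

// Without the optional hyphen, the sign is silently dropped.
string[] withoutSign = Regex.Matches(question, @"\d+")
    .Cast<Match>().Select(m => m.Value).ToArray();

// With -?, the sign stays attached to the number.
string[] withSign = Regex.Matches(question, @"-?\d+")
    .Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(", ", withoutSign));  // 3, 5
Console.WriteLine(string.Join(", ", withSign));     // -3, 5
```

With the digits-only pattern the question would evaluate to 8 instead of 2, and no exception would signal the mistake.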

Conclusion & Next Steps

You have successfully journeyed from a simple English question to a working C# calculator. By building a solution to the Wordy problem, you've practiced and solidified critical software engineering skills: advanced string parsing with regular expressions, algorithmic design, state management within a loop, and robust, defensive error handling. These are not just puzzle-solving skills; they are the bedrock of building reliable, intelligent applications.

The true lesson here is the power of transforming unstructured input into a structured, computable format. Whether your next project involves a chatbot, a data import tool, or a custom scripting language, the principles of tokenization and sequential evaluation you've learned here will be invaluable.

Disclaimer: The code and explanations in this article are based on C# 12 and the .NET 8 runtime. While the core logic is transferable, switch expressions require C# 8 or later, so older versions need a traditional switch statement or if-else chain instead.



Published by Kodikra — Your trusted Csharp learning resource.