Wordy in C: Complete Solution & Deep Dive Guide

a close up of a computer screen with code on it

Mastering String Parsing in C: A Deep Dive into Solving Math Word Problems

Solving math word problems in C requires a robust strategy for converting natural language into computable logic. The optimal approach involves using string tokenization functions like strtok_r to break the input into words, then processing these tokens sequentially with a state machine to identify numbers and operators, ultimately performing the arithmetic while rigorously handling syntax errors.

Ever stared at a line of user input, a seemingly simple sentence, and wondered how to teach your program to understand its meaning? It's a fundamental challenge in software development: bridging the gap between human language and the rigid logic of a machine. A user might type "What is 5 plus 10?", a question a child could answer, yet for a C program, it's just an array of characters without inherent meaning.

This is where the art of parsing comes in. It’s the process of analyzing a string of symbols to understand its grammatical structure. In this deep dive, we'll demystify this process. We will build a C program from scratch that can parse these simple math questions, calculate the answer, and gracefully handle invalid inputs. This isn't just an academic exercise; it's a foundational skill for building interpreters, command-line tools, and any software that needs to interact intelligently with user input.

What is the "Wordy" Problem? The Core Challenge

The "Wordy" problem, a classic challenge from the kodikra.com learning path, is a specialized string parsing task. The goal is to write a function that takes a single string as input—a question posed in English—and returns the integer result of the mathematical operation described within it.

The input follows a specific, albeit simple, grammar. It always starts with "What is", ends with a question mark, and contains numbers and operator words in between.

Simple Number: "What is 5?" should evaluate to 5.
Basic Operations: "What is 5 plus 13?" should evaluate to 18.
Chained Operations: "What is 3 plus 2 multiplied by 3?" should evaluate to 15 (processed left-to-right, ignoring standard operator precedence).
Negative Numbers: "What is -3 minus 10?" should evaluate to -13.

The program must be robust enough to identify and reject malformed questions. Inputs like "What is 5 plus plus 6?" or "Who is the President?" are not valid math problems and should be flagged as errors. This requirement forces us to think not just about the "happy path" but also about comprehensive error handling.

Why This is a Crucial Skill for C Programmers

Working in C means working closer to the metal, with less abstraction than in many modern languages. While this provides immense power and performance, it also means that tasks like string manipulation require careful, manual implementation. Mastering this kindis of parsing is not just about solving one problem; it's about building a versatile skillset.

This skill is directly transferable to numerous real-world applications:

Building Command-Line Interfaces (CLIs): Parsing arguments and commands (e.g., git commit -m "Initial commit").
Creating Interpreters: The first step in building an interpreter for a scripting language is lexical analysis, or tokenization, which is exactly what we'll be doing.
Parsing Configuration Files: Reading settings from files where data is stored in a human-readable format (e.g., key = value).
Network Protocol Implementation: Deconstructing data packets that arrive as a stream of bytes or characters to extract meaningful information.

By tackling this challenge, you are sharpening your understanding of pointers, memory management, and algorithmic thinking—three pillars of proficient C programming. You can explore more advanced C concepts in the complete C guide on kodikra.com.

How to Solve the Wordy Problem: A Step-by-Step Strategy

Our approach can be broken down into a clear, logical sequence of steps. We will transform the raw string into a calculated integer, handling potential pitfalls along the way.

The Overall Logic Flow

Before diving into code, let's visualize the high-level plan. We take the input string, clean and validate it, break it into pieces (tokens), process these pieces in order, and produce a final result.

    ● Start ("What is 5 plus 10?")
    │
    ▼
  ┌───────────────────────────┐
  │ Validate Prefix & Suffix  │
  │ ("What is ... ?")         │
  └────────────┬──────────────┘
               │
               ▼
  ┌───────────────────────────┐
  │ Tokenize The Core String  │
  │ e.g., "5", "plus", "10"   │
  └────────────┬──────────────┘
               │
               ▼
  ┌───────────────────────────┐
  │ Process Tokens Sequentially │
  │ (Using a State Machine)   │
  └────────────┬──────────────┘
               │
               ▼
  ┌───────────────────────────┐
  │ Perform Arithmetic &      │
  │ Update Result             │
  └────────────┬──────────────┘
               │
               ▼
    ● End (Result: 15 or Error)

Step 1: Sanitize and Validate the Input

A robust program never trusts its input. The first step is to ensure the question is in a format we can even begin to understand.

Check the Prefix: The question must start with "What is". We can use strncmp for a safe, bounded comparison.
Check the Suffix: The question must end with a '?'.
Handle Trivial Case: If the question is just "What is?", it's an error.

If any of these checks fail, we immediately know it's an invalid problem and can stop processing.

Step 2: Tokenization - Breaking the String into Words

Once we've stripped the "What is" prefix and the "?" suffix, we're left with the core of the problem, like "5 plus 10". We need to split this string by spaces to get individual "tokens": "5", "plus", and "10".

The standard C library provides strtok for this, but it has a major flaw: it uses a static internal buffer, making it non-reentrant and not thread-safe. A much better and safer alternative is strtok_r (the 'r' stands for reentrant). It requires an extra pointer argument (often called saveptr) to maintain its state between calls, making it completely self-contained.


// Example of using strtok_r
char *saveptr;
char *token = strtok_r(question_body, " ", &saveptr);
while (token != NULL) {
    // Process the token here...
    token = strtok_r(NULL, " ", &saveptr);
}

Step 3: The State Machine - Processing Tokens in Order

We can't process tokens in isolation. The meaning of a token depends on what came before it. We expect a strict alternating pattern: NUMBER, then OPERATOR, then NUMBER, and so on. This is a perfect use case for a simple state machine.

We can define two states:

EXPECTING_NUMBER: We are at a point where the next token must be a number. This is the initial state.
EXPECTING_OPERATOR: We have just processed a number, so the next token must be an operator.

If we encounter a token that doesn't match the expected type for our current state, we know the syntax is invalid.

Detailed State Machine Logic

This diagram illustrates the decision-making process inside our token-processing loop.

    ● Start Loop (with tokens)
    │
    ▼
  ┌────────────────┐
  │ Get Next Token │
  └────────┬───────┘
           │
           ▼
    ◆ Current State is EXPECTING_NUMBER?
   ╱                                   ╲
  Yes                                   No (Must be EXPECTING_OPERATOR)
  │                                     │
  ▼                                     ▼
┌──────────────────┐               ┌──────────────────┐
│ Is token a valid │               │ Is token a valid │
│ number?          │               │ operator?        │
└──────────────────┘               └──────────────────┘
   ╱         ╲                        ╱         ╲
 Yes          No                    Yes          No
  │            │                     │            │
  ▼            ▼                     ▼            ▼
┌───────────┐ [Syntax Error]   ┌─────────────┐ [Syntax Error]
│ Update    │                  │ Store       │
│ Result    │                  │ Operation   │
│ & Switch  │                  │ & Switch    │
│ to        │                  │ to          │
│ EXPECTING_ │                  │ EXPECTING_  │
│ OPERATOR  │                  │ NUMBER      │
└───────────┘                  └─────────────┘
  │                                     │
  └─────────────────┬───────────────────┘
                    │
                    ▼
             ◆ Any More Tokens?
            ╱                  ╲
          Yes                   No
           │                     │
           ▼                     ▼
    [Continue Loop]          ● End Loop (Return Result)

The Complete C Solution

Now, let's assemble these concepts into a complete, working C solution. We'll structure it with a header file for the function prototype and a source file for the implementation.

Header File: `wordy.h`

This file defines the public interface for our wordy parser. It's clean and simple, exposing only what's necessary.


#ifndef WORDY_H
#define WORDY_H

#include <stdbool.h>

// The main function to parse the word problem.
// It returns true on success and false on failure.
// If successful, the result is stored in the 'answer' pointer.
bool answer(const char *question, int *result);

#endif

Implementation File: `wordy.c`

This is where the core logic resides. The code is heavily commented to explain each step of the process, from validation to final calculation.


#include "wordy.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

// Define states for our simple parser state machine
typedef enum {
    EXPECTING_NUMBER,
    EXPECTING_OPERATOR
} ParserState;

// Define known operations
typedef enum {
    OP_NONE,
    OP_PLUS,
    OP_MINUS,
    OP_MULTIPLIED,
    OP_DIVIDED
} Operation;

// Helper function to check if a string is a valid number (handles negatives)
static bool is_number(const char *str) {
    if (!str || *str == '\0') return false;
    int i = 0;
    if (str[0] == '-') {
        if (str[1] == '\0') return false; // Just a "-" is not a number
        i = 1;
    }
    for (; str[i] != '\0'; i++) {
        if (!isdigit(str[i])) return false;
    }
    return true;
}

bool answer(const char *question, int *result) {
    if (!question || !result) {
        return false; // Invalid arguments
    }

    // 1. Basic validation of the question format
    const char *prefix = "What is";
    if (strncmp(question, prefix, strlen(prefix)) != 0) {
        return false;
    }

    size_t q_len = strlen(question);
    if (q_len == 0 || question[q_len - 1] != '?') {
        return false;
    }

    // Create a mutable copy to use with strtok_r
    char *question_copy = strdup(question);
    if (!question_copy) {
        return false; // Memory allocation failure
    }

    // 2. Isolate the core mathematical expression
    // Move pointer past "What is "
    char *expression = question_copy + strlen(prefix) + 1; 
    // Remove the trailing '?'
    expression[strlen(expression) - 1] = '\0'; 

    if (strlen(expression) == 0) {
        free(question_copy);
        return false; // Empty expression like "What is?"
    }

    // 3. Tokenization and State Machine Processing
    char *saveptr;
    char *token = strtok_r(expression, " ", &saveptr);
    
    if (token == NULL) {
        free(question_copy);
        return false; // No tokens found
    }

    // The first token MUST be a number
    if (!is_number(token)) {
        free(question_copy);
        return false;
    }

    *result = atoi(token);
    ParserState state = EXPECTING_OPERATOR;
    Operation current_op = OP_NONE;

    token = strtok_r(NULL, " ", &saveptr);

    while (token != NULL) {
        if (state == EXPECTING_OPERATOR) {
            if (strcmp(token, "plus") == 0) current_op = OP_PLUS;
            else if (strcmp(token, "minus") == 0) current_op = OP_MINUS;
            else if (strcmp(token, "multiplied") == 0) current_op = OP_MULTIPLIED;
            else if (strcmp(token, "divided") == 0) current_op = OP_DIVIDED;
            else if (strcmp(token, "by") == 0) {
                // "multiplied by" and "divided by" are multi-word operators
                // We just consume "by" and wait for the next token (number)
                if (current_op != OP_MULTIPLIED && current_op != OP_DIVIDED) {
                    free(question_copy);
                    return false; // "by" without "multiplied" or "divided" is a syntax error
                }
                // We are still expecting a number after "by"
                state = EXPECTING_NUMBER;
                token = strtok_r(NULL, " ", &saveptr);
                continue;
            } else {
                free(question_copy);
                return false; // Unknown operation or syntax error
            }
            state = EXPECTING_NUMBER;
        } else { // EXPECTING_NUMBER
            if (!is_number(token)) {
                free(question_copy);
                return false; // Expected a number, got something else
            }
            int number = atoi(token);
            switch (current_op) {
                case OP_PLUS: *result += number; break;
                case OP_MINUS: *result -= number; break;
                case OP_MULTIPLIED: *result *= number; break;
                case OP_DIVIDED: 
                    if (number == 0) { // Division by zero is an error
                        free(question_copy);
                        return false;
                    }
                    *result /= number; 
                    break;
                case OP_NONE: // Should not happen after the first number
                default:
                    free(question_copy);
                    return false;
            }
            state = EXPECTING_OPERATOR;
            current_op = OP_NONE;
        }
        token = strtok_r(NULL, " ", &saveptr);
    }

    free(question_copy);

    // If we end in a state where we were still expecting a number, it's an error
    // e.g., "What is 5 plus?"
    if (state == EXPECTING_NUMBER) {
        return false;
    }
    
    return true;
}

Code Walkthrough

Includes and Enums: We include standard libraries and define enums for ParserState and Operation. Using enums makes the code far more readable and less error-prone than using magic numbers or strings.
is_number Helper: A small, static helper function to validate if a string represents a valid integer, correctly handling negative signs.
Initial Validation: The answer function begins by checking for NULL pointers and then validates the "What is" prefix and "?" suffix using strncmp and index checking.
Memory Management: Since strtok_r modifies the string it operates on, we cannot use the const char *question directly. We create a temporary, modifiable copy using strdup. It's crucial to remember to free this memory before every exit point to prevent memory leaks.
Tokenization Setup: We isolate the core expression (e.g., "5 plus 10") and initialize strtok_r.
Processing the First Token: The very first token is a special case; it *must* be a number. We check this, convert it using atoi to initialize our result, and set the state to EXPECTING_OPERATOR.
The Main Loop: The while loop iterates through the remaining tokens.
- If state is EXPECTING_OPERATOR, we compare the token against our known operator words. We handle the special case of "multiplied by" and "divided by" by simply consuming the "by" token. If an unknown word is found, it's a syntax error. We then switch the state to EXPECTING_NUMBER.
- If state is EXPECTING_NUMBER, we validate the token is a number, convert it, and perform the stored current_op. We also include a check for division by zero. After the calculation, we switch back to EXPECTING_OPERATOR.
Final State Check: After the loop finishes, we perform one last check. If the final state is EXPECTING_NUMBER, it means the question ended prematurely (e.g., "What is 5 plus?"). This is a syntax error.
Cleanup and Return: We free the duplicated string and return true for a successful parse or false for any error encountered.

Evaluating Alternative Approaches and Trade-offs

Our manual, state-machine-based parser is highly efficient and has no external dependencies, making it perfect for this specific problem scope. However, it's important to understand its limitations and when a different tool might be more appropriate.

Pros and Cons of Manual Parsing

Here is a comparison of our approach versus using a more powerful parser generator tool like Flex/Bison.

Feature	Manual Parsing (Our Approach)	Using a Parser Generator (e.g., Flex/Bison)
Control & Transparency	Full, granular control over every step. The logic is explicit and easy to trace.	Logic is more abstract, defined by grammar rules. Less direct control over the parsing process.
Dependencies	Zero external dependencies. Relies only on the C standard library.	Requires external tools (Flex, Bison) and their corresponding runtime libraries.
Scalability	Becomes complex and hard to maintain as the grammar rules increase (e.g., adding parentheses, operator precedence).	Excellent for managing complex, formal grammars. Scales very well.
Learning Curve	Low. If you know C string functions, you can build this.	Steep. Requires learning the specific syntax and concepts of the generator tools.
Error Reporting	Error handling is manual and can be tailored, but might be less systematic.	Can provide more structured and detailed error messages based on grammar violations.

For the scope defined in this kodikra module, our approach is ideal. It's a fantastic learning tool that reinforces core C programming concepts. If the requirements were to expand to include standard operator precedence (PEMDAS) or nested expressions with parentheses, switching to a more robust parsing technique like the Shunting-yard algorithm or a parser generator would be the professional choice.

Frequently Asked Questions (FAQ)

1. Why did you use strtok_r instead of the simpler strtok?: strtok uses a hidden static variable to keep track of its position in the string. This makes it non-reentrant and not thread-safe. If you were parsing two different strings in a multi-threaded program, calls to strtok could interfere with each other, leading to corrupted data. strtok_r solves this by requiring the caller to provide a "save pointer," making each call independent and safe for concurrent use.
2. How could this solution be extended to handle floating-point numbers?: You would need to change the result and intermediate number variables from int to double or float. Instead of atoi, you would use strtod for conversion, which also provides better error checking. The is_number helper would also need to be updated to recognize decimal points.
3. What is the best way to report specific errors in a C parser?: Our boolean return is simple but not very descriptive. A more advanced approach would be to have the function return an enum error code (e.g., SYNTAX_ERROR, INVALID_NUMBER, DIVISION_BY_ZERO). You could also pass a pointer to a character buffer (char **error_message) that the function can populate with a human-readable error string.
4. Could this problem be solved with regular expressions?: Yes, to an extent. You could craft a regex to validate the entire string structure. However, for chained operations, extracting and calculating the result would still require iterative processing. A regex might be good for initial validation, but the state machine approach is often more efficient and clearer for performing the actual step-by-step calculations.
5. How would you handle operator precedence (e.g., multiplication before addition)?: Our current left-to-right parser intentionally ignores precedence. To implement it correctly, you would need a more sophisticated algorithm. A common choice is Dijkstra's Shunting-yard algorithm, which converts the infix expression (e.g., "3 + 4 * 2") into a postfix (Reverse Polish Notation) expression ("3 4 2 * +"). This RPN expression can then be easily evaluated using a stack.
6. What are some common pitfalls when using strdup and strtok_r?: The most common pitfall with strdup is forgetting to free the allocated memory, leading to memory leaks. Forgetting to handle the case where strdup returns NULL (on memory allocation failure) can lead to crashes. With strtok_r, the main pitfall is passing the original constant string instead of a mutable copy, which results in undefined behavior and likely a crash.

Conclusion: From Words to Wisdom

We have successfully journeyed from a simple English question to a fully functional C parser capable of providing an integer answer. By combining careful validation, safe tokenization with strtok_r, and a clear state machine logic, we built a solution that is both robust and efficient.

This exercise from the kodikra C learning path is more than just a puzzle; it's a practical lesson in algorithmic thinking and defensive programming in C. The skills you've applied here—managing memory with strdup/free, handling strings safely, and implementing stateful logic—are the bedrock of countless complex applications. You now have a solid foundation for tackling even more ambitious parsing challenges.

Disclaimer: All code snippets and logic are based on the C11 standard and modern best practices. The provided solution is designed for clarity and educational purposes.

Published by Kodikra — Your trusted C learning resource.

kodikra

Search this blog

Wordy in C: Complete Solution & Deep Dive Guide

Mastering String Parsing in C: A Deep Dive into Solving Math Word Problems

What is the "Wordy" Problem? The Core Challenge

Why This is a Crucial Skill for C Programmers

How to Solve the Wordy Problem: A Step-by-Step Strategy

The Overall Logic Flow

Step 1: Sanitize and Validate the Input

Step 2: Tokenization - Breaking the String into Words

Step 3: The State Machine - Processing Tokens in Order

Detailed State Machine Logic

The Complete C Solution

Header File: `wordy.h`

Implementation File: `wordy.c`

Code Walkthrough

Evaluating Alternative Approaches and Trade-offs

Pros and Cons of Manual Parsing

Frequently Asked Questions (FAQ)

Conclusion: From Words to Wisdom

Post a Comment

The Complete Idris Guide: From Zero to Expert

The Complete Odin Guide: From Zero to Expert

The Complete Vlang Guide: From Zero to Expert

The Complete Roc Guide: From Zero to Expert

The Complete D Guide: From Zero to Expert

Kodikra

Wordy in C: Complete Solution & Deep Dive Guide

Mastering String Parsing in C: A Deep Dive into Solving Math Word Problems

What is the "Wordy" Problem? The Core Challenge

Why This is a Crucial Skill for C Programmers

How to Solve the Wordy Problem: A Step-by-Step Strategy

The Overall Logic Flow

Step 1: Sanitize and Validate the Input

Step 2: Tokenization - Breaking the String into Words

Step 3: The State Machine - Processing Tokens in Order

Detailed State Machine Logic

The Complete C Solution

Header File: wordy.h

Implementation File: wordy.c

Code Walkthrough

Evaluating Alternative Approaches and Trade-offs

Pros and Cons of Manual Parsing

Frequently Asked Questions (FAQ)

Conclusion: From Words to Wisdom

Post a Comment

The Complete Idris Guide: From Zero to Expert

The Complete Odin Guide: From Zero to Expert

The Complete Vlang Guide: From Zero to Expert

The Complete Roc Guide: From Zero to Expert

The Complete D Guide: From Zero to Expert

Header File: `wordy.h`

Implementation File: `wordy.c`