Wordy in C: Complete Solution & Deep Dive Guide
Mastering String Parsing in C: A Deep Dive into Solving Math Word Problems
Solving math word problems in C requires a robust strategy for converting natural language into computable logic. The optimal approach involves using string tokenization functions like strtok_r to break the input into words, then processing these tokens sequentially with a state machine to identify numbers and operators, ultimately performing the arithmetic while rigorously handling syntax errors.
Ever stared at a line of user input, a seemingly simple sentence, and wondered how to teach your program to understand its meaning? It's a fundamental challenge in software development: bridging the gap between human language and the rigid logic of a machine. A user might type "What is 5 plus 10?", a question a child could answer, yet for a C program, it's just an array of characters without inherent meaning.
This is where the art of parsing comes in. It’s the process of analyzing a string of symbols to understand its grammatical structure. In this deep dive, we'll demystify this process. We will build a C program from scratch that can parse these simple math questions, calculate the answer, and gracefully handle invalid inputs. This isn't just an academic exercise; it's a foundational skill for building interpreters, command-line tools, and any software that needs to interact intelligently with user input.
What is the "Wordy" Problem? The Core Challenge
The "Wordy" problem, a classic challenge from the kodikra.com learning path, is a specialized string parsing task. The goal is to write a function that takes a single string as input—a question posed in English—and returns the integer result of the mathematical operation described within it.
The input follows a specific, albeit simple, grammar. It always starts with "What is", ends with a question mark, and contains numbers and operator words in between.
- Simple Number: "What is 5?" should evaluate to
5. - Basic Operations: "What is 5 plus 13?" should evaluate to
18. - Chained Operations: "What is 3 plus 2 multiplied by 3?" should evaluate to
15(processed left-to-right, ignoring standard operator precedence). - Negative Numbers: "What is -3 minus 10?" should evaluate to
-13.
The program must be robust enough to identify and reject malformed questions. Inputs like "What is 5 plus plus 6?" or "Who is the President?" are not valid math problems and should be flagged as errors. This requirement forces us to think not just about the "happy path" but also about comprehensive error handling.
Why This is a Crucial Skill for C Programmers
Working in C means working closer to the metal, with less abstraction than in many modern languages. While this provides immense power and performance, it also means that tasks like string manipulation require careful, manual implementation. Mastering this kindis of parsing is not just about solving one problem; it's about building a versatile skillset.
This skill is directly transferable to numerous real-world applications:
- Building Command-Line Interfaces (CLIs): Parsing arguments and commands (e.g.,
git commit -m "Initial commit"). - Creating Interpreters: The first step in building an interpreter for a scripting language is lexical analysis, or tokenization, which is exactly what we'll be doing.
- Parsing Configuration Files: Reading settings from files where data is stored in a human-readable format (e.g.,
key = value). - Network Protocol Implementation: Deconstructing data packets that arrive as a stream of bytes or characters to extract meaningful information.
By tackling this challenge, you are sharpening your understanding of pointers, memory management, and algorithmic thinking—three pillars of proficient C programming. You can explore more advanced C concepts in the complete C guide on kodikra.com.
How to Solve the Wordy Problem: A Step-by-Step Strategy
Our approach can be broken down into a clear, logical sequence of steps. We will transform the raw string into a calculated integer, handling potential pitfalls along the way.
The Overall Logic Flow
Before diving into code, let's visualize the high-level plan. We take the input string, clean and validate it, break it into pieces (tokens), process these pieces in order, and produce a final result.
● Start ("What is 5 plus 10?")
│
▼
┌───────────────────────────┐
│ Validate Prefix & Suffix │
│ ("What is ... ?") │
└────────────┬──────────────┘
│
▼
┌───────────────────────────┐
│ Tokenize The Core String │
│ e.g., "5", "plus", "10" │
└────────────┬──────────────┘
│
▼
┌───────────────────────────┐
│ Process Tokens Sequentially │
│ (Using a State Machine) │
└────────────┬──────────────┘
│
▼
┌───────────────────────────┐
│ Perform Arithmetic & │
│ Update Result │
└────────────┬──────────────┘
│
▼
● End (Result: 15 or Error)
Step 1: Sanitize and Validate the Input
A robust program never trusts its input. The first step is to ensure the question is in a format we can even begin to understand.
- Check the Prefix: The question must start with "What is". We can use
strncmpfor a safe, bounded comparison. - Check the Suffix: The question must end with a '?'.
- Handle Trivial Case: If the question is just "What is?", it's an error.
If any of these checks fail, we immediately know it's an invalid problem and can stop processing.
Step 2: Tokenization - Breaking the String into Words
Once we've stripped the "What is" prefix and the "?" suffix, we're left with the core of the problem, like "5 plus 10". We need to split this string by spaces to get individual "tokens": "5", "plus", and "10".
The standard C library provides strtok for this, but it has a major flaw: it uses a static internal buffer, making it non-reentrant and not thread-safe. A much better and safer alternative is strtok_r (the 'r' stands for reentrant). It requires an extra pointer argument (often called saveptr) to maintain its state between calls, making it completely self-contained.
// Example of using strtok_r
char *saveptr;
char *token = strtok_r(question_body, " ", &saveptr);
while (token != NULL) {
// Process the token here...
token = strtok_r(NULL, " ", &saveptr);
}
Step 3: The State Machine - Processing Tokens in Order
We can't process tokens in isolation. The meaning of a token depends on what came before it. We expect a strict alternating pattern: NUMBER, then OPERATOR, then NUMBER, and so on. This is a perfect use case for a simple state machine.
We can define two states:
EXPECTING_NUMBER: We are at a point where the next token must be a number. This is the initial state.EXPECTING_OPERATOR: We have just processed a number, so the next token must be an operator.
If we encounter a token that doesn't match the expected type for our current state, we know the syntax is invalid.
Detailed State Machine Logic
This diagram illustrates the decision-making process inside our token-processing loop.
● Start Loop (with tokens)
│
▼
┌────────────────┐
│ Get Next Token │
└────────┬───────┘
│
▼
◆ Current State is EXPECTING_NUMBER?
╱ ╲
Yes No (Must be EXPECTING_OPERATOR)
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Is token a valid │ │ Is token a valid │
│ number? │ │ operator? │
└──────────────────┘ └──────────────────┘
╱ ╲ ╱ ╲
Yes No Yes No
│ │ │ │
▼ ▼ ▼ ▼
┌───────────┐ [Syntax Error] ┌─────────────┐ [Syntax Error]
│ Update │ │ Store │
│ Result │ │ Operation │
│ & Switch │ │ & Switch │
│ to │ │ to │
│ EXPECTING_ │ │ EXPECTING_ │
│ OPERATOR │ │ NUMBER │
└───────────┘ └─────────────┘
│ │
└─────────────────┬───────────────────┘
│
▼
◆ Any More Tokens?
╱ ╲
Yes No
│ │
▼ ▼
[Continue Loop] ● End Loop (Return Result)
The Complete C Solution
Now, let's assemble these concepts into a complete, working C solution. We'll structure it with a header file for the function prototype and a source file for the implementation.
Header File: wordy.h
This file defines the public interface for our wordy parser. It's clean and simple, exposing only what's necessary.
#ifndef WORDY_H
#define WORDY_H
#include <stdbool.h>
// The main function to parse the word problem.
// It returns true on success and false on failure.
// If successful, the result is stored in the 'answer' pointer.
bool answer(const char *question, int *result);
#endif
Implementation File: wordy.c
This is where the core logic resides. The code is heavily commented to explain each step of the process, from validation to final calculation.
#include "wordy.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
// Define states for our simple parser state machine
typedef enum {
EXPECTING_NUMBER,
EXPECTING_OPERATOR
} ParserState;
// Define known operations
typedef enum {
OP_NONE,
OP_PLUS,
OP_MINUS,
OP_MULTIPLIED,
OP_DIVIDED
} Operation;
// Helper function to check if a string is a valid number (handles negatives)
static bool is_number(const char *str) {
if (!str || *str == '\0') return false;
int i = 0;
if (str[0] == '-') {
if (str[1] == '\0') return false; // Just a "-" is not a number
i = 1;
}
for (; str[i] != '\0'; i++) {
if (!isdigit(str[i])) return false;
}
return true;
}
bool answer(const char *question, int *result) {
if (!question || !result) {
return false; // Invalid arguments
}
// 1. Basic validation of the question format
const char *prefix = "What is";
if (strncmp(question, prefix, strlen(prefix)) != 0) {
return false;
}
size_t q_len = strlen(question);
if (q_len == 0 || question[q_len - 1] != '?') {
return false;
}
// Create a mutable copy to use with strtok_r
char *question_copy = strdup(question);
if (!question_copy) {
return false; // Memory allocation failure
}
// 2. Isolate the core mathematical expression
// Move pointer past "What is "
char *expression = question_copy + strlen(prefix) + 1;
// Remove the trailing '?'
expression[strlen(expression) - 1] = '\0';
if (strlen(expression) == 0) {
free(question_copy);
return false; // Empty expression like "What is?"
}
// 3. Tokenization and State Machine Processing
char *saveptr;
char *token = strtok_r(expression, " ", &saveptr);
if (token == NULL) {
free(question_copy);
return false; // No tokens found
}
// The first token MUST be a number
if (!is_number(token)) {
free(question_copy);
return false;
}
*result = atoi(token);
ParserState state = EXPECTING_OPERATOR;
Operation current_op = OP_NONE;
token = strtok_r(NULL, " ", &saveptr);
while (token != NULL) {
if (state == EXPECTING_OPERATOR) {
if (strcmp(token, "plus") == 0) current_op = OP_PLUS;
else if (strcmp(token, "minus") == 0) current_op = OP_MINUS;
else if (strcmp(token, "multiplied") == 0) current_op = OP_MULTIPLIED;
else if (strcmp(token, "divided") == 0) current_op = OP_DIVIDED;
else if (strcmp(token, "by") == 0) {
// "multiplied by" and "divided by" are multi-word operators
// We just consume "by" and wait for the next token (number)
if (current_op != OP_MULTIPLIED && current_op != OP_DIVIDED) {
free(question_copy);
return false; // "by" without "multiplied" or "divided" is a syntax error
}
// We are still expecting a number after "by"
state = EXPECTING_NUMBER;
token = strtok_r(NULL, " ", &saveptr);
continue;
} else {
free(question_copy);
return false; // Unknown operation or syntax error
}
state = EXPECTING_NUMBER;
} else { // EXPECTING_NUMBER
if (!is_number(token)) {
free(question_copy);
return false; // Expected a number, got something else
}
int number = atoi(token);
switch (current_op) {
case OP_PLUS: *result += number; break;
case OP_MINUS: *result -= number; break;
case OP_MULTIPLIED: *result *= number; break;
case OP_DIVIDED:
if (number == 0) { // Division by zero is an error
free(question_copy);
return false;
}
*result /= number;
break;
case OP_NONE: // Should not happen after the first number
default:
free(question_copy);
return false;
}
state = EXPECTING_OPERATOR;
current_op = OP_NONE;
}
token = strtok_r(NULL, " ", &saveptr);
}
free(question_copy);
// If we end in a state where we were still expecting a number, it's an error
// e.g., "What is 5 plus?"
if (state == EXPECTING_NUMBER) {
return false;
}
return true;
}
Code Walkthrough
- Includes and Enums: We include standard libraries and define
enumsforParserStateandOperation. Using enums makes the code far more readable and less error-prone than using magic numbers or strings. is_numberHelper: A small, static helper function to validate if a string represents a valid integer, correctly handling negative signs.- Initial Validation: The
answerfunction begins by checking forNULLpointers and then validates the "What is" prefix and "?" suffix usingstrncmpand index checking. - Memory Management: Since
strtok_rmodifies the string it operates on, we cannot use theconst char *questiondirectly. We create a temporary, modifiable copy usingstrdup. It's crucial to remember tofreethis memory before every exit point to prevent memory leaks. - Tokenization Setup: We isolate the core expression (e.g., "5 plus 10") and initialize
strtok_r. - Processing the First Token: The very first token is a special case; it *must* be a number. We check this, convert it using
atoito initialize ourresult, and set the state toEXPECTING_OPERATOR. - The Main Loop: The
whileloop iterates through the remaining tokens.- If
stateisEXPECTING_OPERATOR, we compare the token against our known operator words. We handle the special case of "multiplied by" and "divided by" by simply consuming the "by" token. If an unknown word is found, it's a syntax error. We then switch the state toEXPECTING_NUMBER. - If
stateisEXPECTING_NUMBER, we validate the token is a number, convert it, and perform the storedcurrent_op. We also include a check for division by zero. After the calculation, we switch back toEXPECTING_OPERATOR.
- If
- Final State Check: After the loop finishes, we perform one last check. If the final state is
EXPECTING_NUMBER, it means the question ended prematurely (e.g., "What is 5 plus?"). This is a syntax error. - Cleanup and Return: We
freethe duplicated string and returntruefor a successful parse orfalsefor any error encountered.
Evaluating Alternative Approaches and Trade-offs
Our manual, state-machine-based parser is highly efficient and has no external dependencies, making it perfect for this specific problem scope. However, it's important to understand its limitations and when a different tool might be more appropriate.
Pros and Cons of Manual Parsing
Here is a comparison of our approach versus using a more powerful parser generator tool like Flex/Bison.
| Feature | Manual Parsing (Our Approach) | Using a Parser Generator (e.g., Flex/Bison) |
|---|---|---|
| Control & Transparency | Full, granular control over every step. The logic is explicit and easy to trace. | Logic is more abstract, defined by grammar rules. Less direct control over the parsing process. |
| Dependencies | Zero external dependencies. Relies only on the C standard library. | Requires external tools (Flex, Bison) and their corresponding runtime libraries. |
| Scalability | Becomes complex and hard to maintain as the grammar rules increase (e.g., adding parentheses, operator precedence). | Excellent for managing complex, formal grammars. Scales very well. |
| Learning Curve | Low. If you know C string functions, you can build this. | Steep. Requires learning the specific syntax and concepts of the generator tools. |
| Error Reporting | Error handling is manual and can be tailored, but might be less systematic. | Can provide more structured and detailed error messages based on grammar violations. |
For the scope defined in this kodikra module, our approach is ideal. It's a fantastic learning tool that reinforces core C programming concepts. If the requirements were to expand to include standard operator precedence (PEMDAS) or nested expressions with parentheses, switching to a more robust parsing technique like the Shunting-yard algorithm or a parser generator would be the professional choice.
Frequently Asked Questions (FAQ)
- 1. Why did you use
strtok_rinstead of the simplerstrtok? -
strtokuses a hidden static variable to keep track of its position in the string. This makes it non-reentrant and not thread-safe. If you were parsing two different strings in a multi-threaded program, calls tostrtokcould interfere with each other, leading to corrupted data.strtok_rsolves this by requiring the caller to provide a "save pointer," making each call independent and safe for concurrent use. - 2. How could this solution be extended to handle floating-point numbers?
-
You would need to change the result and intermediate number variables from
inttodoubleorfloat. Instead ofatoi, you would usestrtodfor conversion, which also provides better error checking. Theis_numberhelper would also need to be updated to recognize decimal points. - 3. What is the best way to report specific errors in a C parser?
-
Our boolean return is simple but not very descriptive. A more advanced approach would be to have the function return an enum error code (e.g.,
SYNTAX_ERROR,INVALID_NUMBER,DIVISION_BY_ZERO). You could also pass a pointer to a character buffer (char **error_message) that the function can populate with a human-readable error string. - 4. Could this problem be solved with regular expressions?
-
Yes, to an extent. You could craft a regex to validate the entire string structure. However, for chained operations, extracting and calculating the result would still require iterative processing. A regex might be good for initial validation, but the state machine approach is often more efficient and clearer for performing the actual step-by-step calculations.
- 5. How would you handle operator precedence (e.g., multiplication before addition)?
-
Our current left-to-right parser intentionally ignores precedence. To implement it correctly, you would need a more sophisticated algorithm. A common choice is Dijkstra's Shunting-yard algorithm, which converts the infix expression (e.g., "3 + 4 * 2") into a postfix (Reverse Polish Notation) expression ("3 4 2 * +"). This RPN expression can then be easily evaluated using a stack.
- 6. What are some common pitfalls when using
strdupandstrtok_r? -
The most common pitfall with
strdupis forgetting tofreethe allocated memory, leading to memory leaks. Forgetting to handle the case wherestrdupreturnsNULL(on memory allocation failure) can lead to crashes. Withstrtok_r, the main pitfall is passing the original constant string instead of a mutable copy, which results in undefined behavior and likely a crash.
Conclusion: From Words to Wisdom
We have successfully journeyed from a simple English question to a fully functional C parser capable of providing an integer answer. By combining careful validation, safe tokenization with strtok_r, and a clear state machine logic, we built a solution that is both robust and efficient.
This exercise from the kodikra C learning path is more than just a puzzle; it's a practical lesson in algorithmic thinking and defensive programming in C. The skills you've applied here—managing memory with strdup/free, handling strings safely, and implementing stateful logic—are the bedrock of countless complex applications. You now have a solid foundation for tackling even more ambitious parsing challenges.
Disclaimer: All code snippets and logic are based on the C11 standard and modern best practices. The provided solution is designed for clarity and educational purposes.
Published by Kodikra — Your trusted C learning resource.
Post a Comment