Phone Number in Cairo: Complete Solution & Deep Dive Guide
Mastering String Manipulation in Cairo: A Deep Dive into Phone Number Cleaning
Learn to clean and validate North American Numbering Plan (NANP) phone numbers in Cairo. This guide covers parsing varied string formats, filtering non-numeric characters, and applying specific NANP rules using Cairo's ByteArray type, ensuring your data is standardized and reliable for any application.
You've just been hired at a cutting-edge decentralized communication startup building on Starknet. The dream is to connect people securely, but there's a nightmare lurking in the user data: phone numbers. They arrive in every conceivable format—(123) 456-7890, 123.456.7890, +1 123 456 7890, and some are just plain gibberish. This chaos of messy input is more than just an annoyance; on an immutable ledger, it's a permanent, costly problem. How do you bring order to this chaos and ensure every number is valid before it's processed by your smart contract?
This is a universal challenge for developers, but in Cairo, the stakes are higher due to gas fees and finality. In this comprehensive guide, we will dissect the process of building a robust phone number cleaning and validation function from scratch. We'll transform messy, user-submitted strings into a clean, standardized format, ready for any system, all while mastering key aspects of Cairo's string and data manipulation capabilities.
What is Phone Number Sanitization?
Phone number sanitization, or cleaning, is the process of taking a raw, user-provided phone number string and transforming it into a standardized, usable format. This involves stripping away non-numeric characters, handling optional country codes, and validating the final result against a specific numbering plan's rules. For this guide, we focus on the North American Numbering Plan (NANP).
The NANP governs phone numbers in the United States, Canada, Bermuda, and many Caribbean nations. Its structure is fundamental to our validation logic:
- Country Code: Always '1'. This is often optional in user input.
- Area Code (NPA): A 3-digit code. The first digit cannot be 0 or 1.
- Exchange Code (NXX): Another 3-digit code. The first digit also cannot be 0 or 1.
- Line Number: A 4-digit code.
A valid NANP number is therefore 10 digits long (Area Code + Exchange Code + Line Number). If a country code is included, the total length becomes 11 digits, and it must start with '1'. Our function's primary goal is to enforce these rules rigorously.
This process is a classic example of data sanitization, a critical security and data integrity practice. By cleaning input at the source, we prevent malformed data from propagating through our systems, which is especially vital in the world of smart contracts where data is immutable.
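The NANP structure described above can also be checked with plain integer arithmetic. The following is a minimal sketch, not the approach used in the rest of this guide: it assumes the 10-digit number is already held in a u64, and the names NanpParts, split_nanp, and has_valid_structure are hypothetical, introduced only for illustration.

```cairo
// Hypothetical type for this sketch: the three parts of a 10-digit NANP number.
#[derive(Copy, Drop)]
struct NanpParts {
    area: u64,
    exchange: u64,
    line: u64,
}

// Splits a number such as 2234567890 into 223 / 456 / 7890.
fn split_nanp(number: u64) -> NanpParts {
    NanpParts {
        area: number / 10000000,
        exchange: (number / 10000) % 1000,
        line: number % 10000,
    }
}

// A leading digit of 2 to 9 is equivalent to the 3-digit value being at least 200.
// Requiring area <= 999 rejects anything longer than 10 digits.
fn has_valid_structure(number: u64) -> bool {
    let parts = split_nanp(number);
    parts.area >= 200 && parts.area <= 999 && parts.exchange >= 200
}
```

Under these assumptions, 2234567890 passes, while 1234567890 fails on the area code and 2231567890 fails on the exchange code.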
Why is Data Validation Crucial in the Cairo Ecosystem?
While input validation is important in any programming environment, its significance is amplified within the Cairo and Starknet ecosystem for several key reasons:
- Immutability and Cost: Once data is written to the Starknet blockchain, it is permanent. Storing incorrect or poorly formatted data is not only unprofessional but also wasteful, as you've paid gas fees to store garbage. Correcting it requires another transaction, incurring more costs.
- Smart Contract Logic: Smart contracts often rely on precise data formats to execute correctly. A function expecting a 10-digit numeric string will fail or produce unpredictable results if it receives (123)-456-7890. This can lead to locked funds or broken application logic.
- Security and Predictability: Predictable state is the cornerstone of blockchain security. Sanitizing inputs ensures that your contract functions operate within expected boundaries, reducing the attack surface for exploits that leverage unexpected data formats.
- Gas Efficiency: Processing and storing clean, compact data is more gas-efficient than handling variable-length strings cluttered with unnecessary characters. Performing complex string manipulation on-chain can be expensive, so having a standardized format is key.
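To make the point about compact data concrete: once a number has been reduced to digits only, it can be packed into a single integer instead of being kept as a string. This is a minimal sketch under stated assumptions, not part of the original solution. The helper name digits_to_u64 is hypothetical, and the function assumes its input contains only the characters '0' to '9' and is short enough to fit in a u64 (a 10-digit number always is).

```cairo
use core::byte_array::ByteArrayTrait;
use core::option::OptionTrait;

// Hypothetical helper: packs a string of ASCII digits into a u64.
// Assumes every byte is in the range '0' to '9'; other input would panic or give a wrong value.
fn digits_to_u64(digits: @ByteArray) -> u64 {
    let mut value: u64 = 0;
    let mut i = 0;
    while i < digits.len() {
        // In bounds by the loop condition, so `unwrap` cannot fail here
        let byte: u8 = digits.at(i).unwrap();
        // Convert the ASCII code to its numeric value (0 to 9)
        let digit: u64 = (byte - '0').into();
        value = value * 10 + digit;
        i += 1;
    }
    value
}
```

Whether packing is worthwhile depends on how the contract uses the number afterwards; the sketch only shows that a validated 10-digit string carries no more information than one integer.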
By mastering data sanitization, you're not just writing cleaner code; you're building more robust, secure, and cost-effective decentralized applications. This skill is foundational for anyone serious about becoming a proficient Cairo developer, as demonstrated in our exclusive kodikra.com Cairo learning path.
How to Build a Phone Number Cleaner in Cairo
Let's dive into the practical implementation. Our goal is to create a Cairo function named clean that accepts a ByteArray and returns a cleaned, validated ByteArray. If the input is invalid, the function should panic and revert the transaction, preventing bad data from being processed further.
The Core Logic: A Step-by-Step Breakdown
The solution involves iterating through the input string, filtering out unwanted characters, and then applying a series of validation checks based on NANP rules. The logic can be visualized as a simple data processing pipeline.
● Start (Input: Raw ByteArray)
│
▼
┌───────────────────────────┐
│ Initialize empty `cleaned` │
│ ByteArray │
└────────────┬──────────────┘
│
▼
╭ Loop through each
│ byte of input phrase
╰───────────┬──────────
│
▼
◆ Is byte a digit?
╱ ╲
Yes No
│ │
▼ ▼
┌──────────────┐ ┌───────────┐
│ Append digit │ │ Discard │
│ to `cleaned` │ │ character │
└──────────────┘ └───────────┘
│ │
╰─────────┬─────────╯
│
▼
Has loop finished? ───> No ───> (Back to Loop)
│
Yes
│
▼
┌───────────────────────────┐
│ Apply NANP Rule Checks │
│ (Length, Country Code...) │
└────────────┬──────────────┘
│
▼
◆ Is `cleaned` valid?
╱ ╲
Yes No
│ │
▼ ▼
┌──────────────┐ ┌───────────┐
│ Return │ │ Panic / │
│ `cleaned` │ │ Revert │
└──────────────┘ └───────────┘
│
▼
● End
The Cairo Implementation: Code Walkthrough
Here is a complete solution from the kodikra.com module, followed by a detailed line-by-line explanation. This code effectively implements the logic described in our flowchart.
    use core::byte_array::ByteArrayTrait;
    use core::option::OptionTrait;
    // Helper function to check for numeric characters (ASCII '0' to '9')
    fn is_numeric(byte: u8) -> bool {
        byte >= '0' && byte <= '9'
    }
    // Helper function that panics on any character outside the allowed set
    fn assert_valid_char(byte: u8) {
        // Allows numbers, spaces, parentheses, hyphens, dots, and plus signs
        let is_num = is_numeric(byte);
        let is_special = byte == ' ' || byte == '(' || byte == ')' || byte == '-' || byte == '.' || byte == '+';
        assert!(is_num || is_special, "invalid character found");
    }
    pub fn clean(phrase: ByteArray) -> ByteArray {
        let mut cleaned: ByteArray = "";
        let mut i = 0;
        while i < phrase.len() {
            // `at` returns an Option<u8>; the index is in bounds here, so `unwrap` cannot fail
            let byte = phrase.at(i).unwrap();
            // First, validate the character to ensure it's allowed at all
            assert_valid_char(byte);
            // If it's a number, add it to our cleaned string
            if is_numeric(byte) {
                cleaned.append_byte(byte);
            }
            i += 1;
        }
        // Rule 1: Must be at least 10 digits
        assert!(cleaned.len() >= 10, "must not be fewer than 10 digits");
        // Rule 2: If 11 digits, must start with '1'
        if cleaned.len() == 11 {
            assert!(cleaned.at(0).unwrap() == '1', "11 digits must start with 1");
            // Valid country code: keep only the final 10 digits by copying them over
            let mut national: ByteArray = "";
            let mut j = 1;
            while j < cleaned.len() {
                national.append_byte(cleaned.at(j).unwrap());
                j += 1;
            }
            cleaned = national;
        }
        // Rule 3: Must be exactly 10 digits at this point
        assert!(cleaned.len() == 10, "must be 10 digits");
        // Rule 4 & 5: Area code (N) and exchange code (N) cannot start with 0 or 1
        let area_code_first_digit = cleaned.at(0).unwrap();
        let exchange_code_first_digit = cleaned.at(3).unwrap();
        assert!(area_code_first_digit != '0', "area code cannot start with 0");
        assert!(area_code_first_digit != '1', "area code cannot start with 1");
        assert!(exchange_code_first_digit != '0', "exchange code cannot start with 0");
        assert!(exchange_code_first_digit != '1', "exchange code cannot start with 1");
        cleaned
    }
Detailed Explanation:
1. Helper Functions:
- fn is_numeric(byte: u8) -> bool: A simple utility that returns true if the given byte is within the ASCII range of '0' to '9'. Keeping it separate lets us reuse the check.
- fn assert_valid_char(byte: u8): This function enforces a strict whitelist of allowed characters. It permits digits and the punctuation commonly found in phone numbers: spaces, parentheses, hyphens, dots, and plus signs. If any other character (like a letter or an emoji) is found, the assertion fails and the program panics. This is a useful first line of defense.
2. The clean Function:
- let mut cleaned: ByteArray = "";: We initialize an empty, mutable ByteArray. This is our accumulator, where we build the final string of digits.
- while i < phrase.len() { ... }: A standard while loop iterates over each byte of the input phrase.
- Reading the byte: Inside the loop, we fetch the byte at the current index i with .at(), the bounds-checked accessor.
- assert_valid_char(byte);: We immediately call our helper to ensure the character is allowed. This fails fast if the input contains illegal characters.
- if is_numeric(byte) { cleaned.append_byte(byte); }: If the character is a digit, we append it to cleaned. All other valid characters (like spaces and dashes) are simply skipped.
3. NANP Validation Rules:
After the loop finishes, cleaned contains only the digits from the input string. Now, we apply the NANP rules.
- Minimum length (cleaned.len() >= 10): The first check ensures we have at least 10 digits, the minimum for a valid number.
- if cleaned.len() == 11 { ... }: This block handles the case where a country code might be present.
  - If there are 11 digits, the first one must be the country code '1'; otherwise the assertion fails.
  - If the check passes, we are only interested in the 10-digit number itself, so the leading '1' is dropped and the remaining 10 digits become the new value of cleaned.
- Exact length (cleaned.len() == 10): After potentially dropping the country code from an 11-digit number, we perform a final length check. At this point, any valid number must be exactly 10 digits long. This catches cases where the input was, for example, 12 digits long.
- Area and exchange code checks: The final assertions enforce the rules that the area code (first digit at index 0) and the exchange code (first digit at index 3) cannot start with 0 or 1.
If the input passes all these assertions, the final 10-digit cleaned ByteArray is returned. If any assertion fails, the execution halts.
When to Validate: On-Chain vs. Off-Chain?
A critical architectural decision for any dApp developer is deciding where to perform data validation. Should this clean function run in the user's browser (off-chain) or within your Starknet smart contract (on-chain)? The answer is often "both," but each has distinct trade-offs.
The Data Validation Flow
A robust system uses a multi-layered approach to validation, providing quick feedback to the user while maintaining ultimate security on-chain.
● User Input (e.g., in a Web Form)
│
▼
┌─────────────────────────┐
│ Off-Chain Validation │
│ (JavaScript/TypeScript) │
└───────────┬─────────────┘
│
▼
◆ Is format valid?
╱ ╲
Yes No
│ │
▼ ▼
┌────────────────┐ ┌───────────────────┐
│ Submit tx to │ │ Show error message│
│ Starknet │ │ to user instantly │
└────────────────┘ └───────────────────┘
│
▼
┌─────────────────────────┐
│ On-Chain Validation │
│ (Cairo `clean` fn) │
└───────────┬─────────────┘
│
▼
◆ Does it pass contract rules?
╱ ╲
Yes No
│ │
▼ ▼
┌────────────────┐ ┌────────────────┐
│ Process & Store│ │ Revert │
│ Data │ │ Transaction │
└────────────────┘ └────────────────┘
│
▼
● End
Pros & Cons Analysis
Here’s a breakdown of the advantages and disadvantages of each approach:
| Aspect | Off-Chain Validation (e.g., Frontend) | On-Chain Validation (Cairo Smart Contract) |
|---|---|---|
| User Experience (UX) | Excellent. Provides instant feedback to the user without needing a transaction. Prevents user frustration. | Poor. The user only finds out about an error after submitting a transaction, waiting for it to be processed, and seeing it fail. This costs them time and potentially gas fees. |
| Security | Low. A malicious user can easily bypass frontend validation and send malformed data directly to the contract. It should never be the only line of defense. | High. This is the ultimate source of truth. The contract enforces the rules, making it impossible to store invalid data. It's a trustless guarantee. |
| Cost (Gas Fees) | Zero. Runs in the user's browser, consuming no gas. | Moderate. String manipulation and logic checks consume gas. More complex validation logic will result in higher transaction costs. |
| Complexity | Low. Implemented in common web languages like JavaScript. | Higher. Requires careful implementation in Cairo, with deep consideration for gas optimization and security patterns. |
Conclusion: The best practice is to implement validation in both places. Use off-chain validation for a smooth user experience and on-chain validation as your non-negotiable security backstop. Never trust user input without verifying it in your contract.
An Enterprise-Grade Alternative: Returning `Option`
The use of assert is common in Cairo for enforcing invariants, but it has a blunt side effect: it causes the entire transaction to fail (panic). In many scenarios, particularly when a function is part of a larger sequence of calls, you might prefer a more graceful way to handle invalid input. This can be achieved by returning an Option<ByteArray>.
An Option is an enum that can be one of two variants:
- Some(value): Indicates success and contains the resulting value.
- None: Indicates failure.
This pattern allows the calling function to decide how to handle an invalid phone number, rather than being forced into a transaction revert. For a deeper understanding of Cairo's core types and patterns, consult our complete Cairo language guide.
Refactored Code with `Option`
    use core::option::OptionTrait;
    use core::byte_array::ByteArrayTrait;
    // Same character whitelist as assert_valid_char, but returning a bool instead of asserting.
    fn is_valid_char(byte: u8) -> bool {
        let is_num = byte >= '0' && byte <= '9';
        let is_special = byte == ' ' || byte == '(' || byte == ')' || byte == '-' || byte == '.' || byte == '+';
        is_num || is_special
    }
    pub fn clean_optional(phrase: ByteArray) -> Option<ByteArray> {
        let mut cleaned: ByteArray = "";
        let mut i = 0;
        while i < phrase.len() {
            // In bounds by the loop condition, so `unwrap` cannot panic here
            let byte = phrase.at(i).unwrap();
            if !is_valid_char(byte) {
                return Option::None; // Invalid character, return None immediately
            }
            if byte >= '0' && byte <= '9' {
                cleaned.append_byte(byte);
            }
            i += 1;
        }
        if cleaned.len() < 10 { return Option::None; }
        if cleaned.len() == 11 {
            if cleaned.at(0).unwrap() != '1' {
                return Option::None; // 11 digits but doesn't start with 1
            }
            // Drop the country code by copying the remaining 10 digits
            let mut national: ByteArray = "";
            let mut j = 1;
            while j < cleaned.len() {
                national.append_byte(cleaned.at(j).unwrap());
                j += 1;
            }
            cleaned = national;
        }
        if cleaned.len() != 10 { return Option::None; }
        // Indices 0 and 3 exist because the length is exactly 10
        let area_code_first_digit = cleaned.at(0).unwrap();
        let exchange_code_first_digit = cleaned.at(3).unwrap();
        if area_code_first_digit == '0' || area_code_first_digit == '1' { return Option::None; }
        if exchange_code_first_digit == '0' || exchange_code_first_digit == '1' { return Option::None; }
        Option::Some(cleaned)
    }
This refactored function, clean_optional, never panics. Instead, it returns Option::None at the first sign of invalid data. This allows for more flexible control flow in your smart contracts, enabling you to, for example, log an error event or try an alternative logic path without reverting the entire transaction.
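As an illustration of what the caller gains, here is a minimal sketch of a batch routine that skips invalid entries instead of reverting. To keep the example self-contained, it uses a hypothetical stand-in validator, validate_length, that applies only a single length rule; in practice the call would go to the full validation function. The name count_valid is likewise an assumption made for this sketch.

```cairo
use core::array::ArrayTrait;
use core::byte_array::ByteArrayTrait;

// Hypothetical stand-in for an Option-returning validator, reduced to one
// length rule so that this sketch compiles on its own.
fn validate_length(phrase: ByteArray) -> Option<ByteArray> {
    if phrase.len() == 10 {
        Option::Some(phrase)
    } else {
        Option::None
    }
}

// Counts the entries of a batch that pass validation. Invalid entries are
// skipped rather than aborting the whole call.
fn count_valid(mut batch: Array<ByteArray>) -> u32 {
    let mut valid: u32 = 0;
    loop {
        match batch.pop_front() {
            Option::Some(phrase) => {
                // The caller decides what a failed validation means
                match validate_length(phrase) {
                    Option::Some(_) => { valid += 1; },
                    Option::None => {},
                }
            },
            Option::None => { break; },
        }
    };
    valid
}
```

With an assert-based validator, the first bad entry would abort the entire batch; here the None arm is the place where a contract could record the failure or take a different path.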
Frequently Asked Questions (FAQ)
- 1. What is a ByteArray in Cairo?
- A ByteArray is a dynamic, mutable array of bytes, designed to handle string-like data in Cairo. Unlike a simple felt252, which has a 31-byte limit, ByteArray can represent strings of arbitrary length, making it ideal for handling user input like names, descriptions, or, in this case, phone numbers.
- 2. Why use assert! instead of returning an error like in the second example?
- Using assert! is a pattern for enforcing invariants: conditions that must absolutely be true for the program's state to be considered valid. It's best used when a failure represents a critical, unrecoverable error. The Option pattern is better for recoverable errors or validation logic where the caller needs to react differently to success and failure.
- 3. Can this logic be adapted for international phone numbers?
- Yes, but it would require significant changes. You would need a much more complex validation library that understands various international dialing codes, number lengths, and formatting rules (like the E.164 standard). The current logic is specifically tailored to the NANP.
- 4. How does string manipulation in Cairo compare to languages like Rust or Python?
- Cairo's string manipulation is currently more low-level than in mature languages like Rust or Python, which have rich, high-level APIs for string processing. In Cairo, you often work directly with bytes and loops, as seen in our example. However, the ecosystem is evolving, and more advanced libraries are continually being developed.
- 5. What are the gas implications of running this function on-chain?
- The gas cost will be proportional to the length of the input string, as the primary cost comes from the loop. Longer, messier strings will require more iterations and thus more gas. This is another reason to perform initial cleaning off-chain to minimize the workload on the contract.
- 6. Are there any pre-built libraries for data validation in the Cairo ecosystem?
- The Cairo and Starknet ecosystem is growing rapidly. While it may not have comprehensive validation libraries like those in the web2 world yet, community-driven standards and libraries are emerging. Always check sources like the Starknet Book or community GitHub organizations for the latest tools before building from scratch.
- 7. What does NANP stand for and why is it important?
- NANP stands for the North American Numbering Plan. It's a standardized telephone numbering system used by the United States, Canada, and several other countries. Understanding specific plans like NANP is crucial for applications that target users in those regions to ensure data accuracy and proper functionality.
Conclusion: Building Resilient Systems
We've journeyed from a chaotic mess of user-submitted phone numbers to a clean, validated, and standardized 10-digit format using Cairo. This exercise is more than just a lesson in string manipulation; it's a foundational pillar of building secure, reliable, and efficient applications on Starknet. By rigorously validating input, you protect your smart contracts from invalid state, save on gas costs, and create a more predictable and robust system.
You learned how to iterate over a ByteArray, apply conditional logic to filter and validate characters, and enforce business rules with assert. Furthermore, you explored the architectural trade-offs between on-chain and off-chain validation and saw a more advanced, enterprise-grade pattern using Option for graceful error handling. These skills are essential for any developer looking to build real-world dApps.
As you continue your journey, remember that the principles of data sanitization are universal. Apply them diligently to build the next generation of resilient decentralized applications. To continue honing your skills, explore the other modules in the kodikra.com Cairo Learning Path and dive deeper into the language with our comprehensive Cairo guide.
Disclaimer: The code in this article is based on Cairo syntax and libraries current as of the time of writing. The Cairo language is under active development, and specific syntax or function names may change in future versions. Always consult the official documentation for the latest updates.
Published by Kodikra — Your trusted Cairo learning resource.