Isogram in Cobol: Complete Solution & Deep Dive Guide


Mastering String Manipulation: The Cobol Isogram Challenge from Zero to Hero

To determine if a word is an isogram in Cobol, you must check for repeating letters while ignoring case, spaces, and hyphens. The most efficient method involves normalizing the input string to a single case, then iterating through it and using a frequency table to track each letter's occurrence.

You've just been handed a piece of mainframe code. It's dense, procedural, and written in a language that feels like it belongs to a different era. Your task seems simple on the surface: validate a string. But as you stare at the `PROCEDURE DIVISION`, you realize the tools you're used to—dynamic arrays, built-in hash maps, slick string methods—are nowhere to be found. This is the reality for many developers working with legacy systems, where fundamental problems require a deep understanding of core programming principles.

The "Isogram" challenge is a perfect microcosm of this world. It’s a deceptively simple word puzzle that forces you to confront the very essence of string and data manipulation in Cobol. It’s not just about finding a solution; it’s about learning to think in a structured, memory-conscious way that is the hallmark of enterprise-level Cobol programming.

This comprehensive guide will walk you through the entire process, from understanding the logic to implementing a clean, efficient, and well-documented solution. By the end, you won't just have solved a puzzle; you'll have gained invaluable insight into the powerful, albeit verbose, world of Cobol data handling—a skill that remains critical in powering the global economy.

What Exactly is an Isogram?

Before diving into the code, it's crucial to solidify our understanding of the problem. An isogram, also known as a "non-pattern word," is a word or phrase where no letter repeats. The core rule is simple, but there are a few important nuances to consider.

First, the check is case-insensitive. This means the letter 'A' is considered the same as 'a'. A word like "Path" is an isogram because 'P', 'a', 't', and 'h' are all unique letters. A word like "Antenna" is not, because 'n' and 'a' both appear more than once.

Second, certain characters are exempt from this rule. Specifically, spaces and hyphens can appear multiple times without disqualifying a phrase from being an isogram. This allows for multi-word phrases or hyphenated words to be evaluated correctly.

Examples of Isograms:

lumberjacks - Every letter appears only once.
background - No repeating letters.
six-year-old - The hyphen repeats, but hyphens are exempt. The letters 's', 'i', 'x', 'y', 'e', 'a', 'r', 'o', 'l', 'd' are all unique.

Examples of Non-Isograms:

isograms - The letter 's' repeats.
apple - The letter 'p' repeats.
programming - The letters 'r', 'g', and 'm' all repeat.

The challenge, therefore, is to create a program that can systematically parse a string, apply these rules, and return a definitive true or false answer.

Why Solve This in Cobol? The Relevance in Modern Systems

You might be wondering, "Why use a language from the 1950s for a word puzzle?" The answer lies in the immense, often invisible, presence of Cobol in the modern world. This isn't just an academic exercise; it's a practical simulation of the data validation and record processing tasks that happen billions of times a day on mainframe systems.

Cobol (COmmon Business-Oriented Language) is the backbone of the global financial industry, powering core banking systems, credit card processing networks, insurance claim systems, and government databases. In these environments, data integrity is paramount. A single corrupted record can have cascading financial consequences.

Tasks like the isogram check are analogous to real-world requirements:

Data Validation: Ensuring a field, like a unique identifier or a specific type of name, doesn't contain invalid repeated characters.
Record Parsing: Reading fixed-format data files (a Cobol specialty) and extracting meaningful information, which often requires character-by-character analysis.
Report Generation: Manipulating and formatting strings to create readable, structured reports from raw data feeds.

By solving this problem within the constraints of Cobol, you are training yourself to work efficiently with fixed-length strings, static memory allocation, and procedural logic—the very skills required to maintain and modernize these critical legacy systems. This exercise, part of the exclusive kodikra.com learning path, is designed to build that foundational expertise.

How to Design the Isogram-Checking Logic

A naive approach might involve nested loops: for each character, you loop through the rest of the string to see if it appears again. While this works for very short strings, it's computationally expensive, with a time complexity of O(n²). For the kind of high-volume processing done on mainframes, this is highly inefficient.

A much better, more scalable approach involves using a frequency table. This method has a linear time complexity of O(n), as we only need to pass through the string once. The logic can be broken down into a clear, sequential process.

The High-Level Algorithm

1. Normalization: Take the input string and create a temporary, "clean" version of it. This involves converting all letters to a single case (e.g., lowercase) to ensure 'A' and 'a' are treated as the same character.
2. Initialization: Create a data structure to act as our frequency map. In Cobol, the perfect tool for this is a small table (an array) with an entry for each letter of the alphabet. We'll create a table with 26 slots, and initialize all of them to zero.
3. Iteration & Checking: Loop through the normalized string, one character at a time.
4. Filtering: Inside the loop, for each character, first check if it's a letter. If it's a space or a hyphen, simply ignore it and move to the next character.
5. Mapping & Counting: If the character is a letter, calculate its corresponding index in our frequency table (e.g., 'a' maps to index 1, 'b' to 2, and so on). Check the value at that index in the table.
   - If the value is already 1, it means we have seen this letter before. The string is not an isogram. We can stop immediately and report the result.
   - If the value is 0, it's the first time we're seeing this letter. We update the value at that index to 1 to mark it as "seen".
6. Final Result: If the loop completes without ever finding a previously seen letter, the string is an isogram.

This method is efficient, deterministic, and translates beautifully into Cobol's structured, procedural nature.

High-Level Logic Flow Diagram

Here is a visual representation of our algorithm's flow, showing the decision points and processes from start to finish.

    ● Start
    │
    ▼
  ┌───────────────────┐
  │ Get Input String  │
  └─────────┬─────────┘
            │
            ▼
  ┌───────────────────┐
  │ Normalize String  │
  │ (e.g., to lowercase) │
  └─────────┬─────────┘
            │
            ▼
  ┌───────────────────┐
  │ Initialize Freq   │
  │ Table (26 zeros)  │
  └─────────┬─────────┘
            │
            ▼
  ┌───────────────────┐
  │ Loop Each Char    │
  └─────────┬─────────┘
            │
      ╭─────▼─────╮
      │ Is Letter?│
      ╰─────┬─────╯
        Yes │ No
    ┌───────┘ ───────┐
    │                │
    ▼                ▼
  ┌───────────────┐  (Ignore & Continue)
  │ Check Freq Tbl│
  └───────┬───────┘
          │
    ╭─────▼─────╮
    │ Already > 0?│
    ╰─────┬─────╯
      Yes │ No
  ┌───────┘ ───────┐
  │                │
  ▼                ▼
┌───────────────┐┌───────────────┐
│ Set Result to ││ Mark Freq Tbl │
│ "Not Isogram" ││ as Seen (set 1) │
│ & Exit Loop   │└───────────────┘
└───────────────┘        │
  │                      │
  └─────────┬────────────┘
            │
            ▼
    (End of Loop)
            │
            ▼
    ● Final Result

Where the Logic Lives: The Complete Cobol Implementation

Now, let's translate our algorithm into a fully functional Cobol program. This solution is written using modern conventions, making it clean and readable. We will define our data structures in the `DATA DIVISION` and implement the logic in the `PROCEDURE DIVISION`.

This code is designed to be self-contained and can be compiled using GnuCOBOL or an IBM Enterprise Cobol compiler.


       IDENTIFICATION DIVISION.
       PROGRAM-ID. ISOGRAM-CHECKER.
       AUTHOR. Kodikra.
      *================================================================
      * This program determines if a given word or phrase is an isogram.
      * An isogram is a word with no repeating letters.
      * Spaces and hyphens are allowed to appear multiple times.
      * The check is case-insensitive.
      *================================================================
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
       
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * --- Input and Control Variables ---
       01 WS-INPUT-STRING           PIC X(50) VALUE 'six-year-old'.
       01 WS-NORMALIZED-STRING      PIC X(50).
       01 WS-INPUT-LENGTH           PIC 9(2).
       01 WS-RESULT-FLAG            PIC X(1)  VALUE 'Y'. *> Y=Yes, N=No
          88 IS-ISOGRAM                       VALUE 'Y'.
          88 IS-NOT-ISOGRAM                   VALUE 'N'.
       
      * --- Looping and Indexing Variables ---
       01 WS-LOOP-INDEX             PIC 9(2).
       01 WS-CHAR-INDEX             PIC 9(2).
       01 WS-CURRENT-CHAR           PIC X(1).

      * --- Frequency Table for Alphabet (a-z) ---
      * This table will store the count of each letter.
      * Index 1 corresponds to 'a', 2 to 'b', and so on.
       01 WS-FREQUENCY-TABLE.
          05 WS-LETTER-COUNT        PIC 9(1) OCCURS 26 TIMES
                                    INDEXED BY IDX-LETTER.

      * --- Constants ---
       01 WS-LOWERCASE-ALPHABET     PIC X(26) 
                                    VALUE 'abcdefghijklmnopqrstuvwxyz'.

      * --- Output Messages ---
       01 WS-OUTPUT-MESSAGE         PIC X(80).
       01 WS-ISOGRAM-MSG            PIC X(30) 
                                    VALUE ' is an isogram.'.
       01 WS-NOT-ISOGRAM-MSG        PIC X(30) 
                                    VALUE ' is NOT an isogram.'.
       
       PROCEDURE DIVISION.
       
       000-MAIN-PROCEDURE.
           DISPLAY "Checking string: " WS-INPUT-STRING.

           PERFORM 100-PREPARE-STRING.
           PERFORM 200-INITIALIZE-FREQ-TABLE.
           PERFORM 300-CHECK-FOR-ISOGRAM
               VARYING WS-LOOP-INDEX FROM 1 BY 1
               UNTIL WS-LOOP-INDEX > WS-INPUT-LENGTH OR IS-NOT-ISOGRAM.
           
           PERFORM 400-DISPLAY-RESULT.
           
           STOP RUN.

      *================================================================
      * 100-PREPARE-STRING
      * Normalizes the input string to lowercase for case-insensitive
      * comparison and gets its effective length.
      *================================================================
       100-PREPARE-STRING.
           MOVE FUNCTION UPPER-CASE(WS-INPUT-STRING)
               TO WS-NORMALIZED-STRING.
           INSPECT WS-NORMALIZED-STRING CONVERTING
               'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
               TO 'abcdefghijklmnopqrstuvwxyz'.

      * INSPECT ... TALLYING adds to its counter, so zero it first.
           MOVE 0 TO WS-INPUT-LENGTH.
           INSPECT FUNCTION REVERSE(WS-NORMALIZED-STRING)
               TALLYING WS-INPUT-LENGTH FOR LEADING SPACES.
           COMPUTE WS-INPUT-LENGTH =
               LENGTH OF WS-NORMALIZED-STRING - WS-INPUT-LENGTH.
           
      *================================================================
      * 200-INITIALIZE-FREQ-TABLE
      * Sets all letter counts in our frequency table to zero.
      *================================================================
       200-INITIALIZE-FREQ-TABLE.
           PERFORM VARYING IDX-LETTER FROM 1 BY 1
               UNTIL IDX-LETTER > 26
                   MOVE 0 TO WS-LETTER-COUNT(IDX-LETTER)
           END-PERFORM.
           
      *================================================================
      * 300-CHECK-FOR-ISOGRAM
      * Iterates through the normalized string character by character.
      * Ignores non-letters. For letters, it updates the frequency table.
      * If a letter is found twice, it sets the result flag to 'N'.
      *================================================================
       300-CHECK-FOR-ISOGRAM.
           MOVE WS-NORMALIZED-STRING(WS-LOOP-INDEX:1)
               TO WS-CURRENT-CHAR.

           *> The class test is also true for a space, so exclude it.
           IF WS-CURRENT-CHAR IS ALPHABETIC-LOWER
              AND WS-CURRENT-CHAR NOT = SPACE
               *> Calculate the index (1-26) for the current character.
               *> Assumes contiguous letter codes, as in ASCII.
               COMPUTE WS-CHAR-INDEX = FUNCTION ORD(WS-CURRENT-CHAR)
                                     - FUNCTION ORD('a') + 1

               *> Check if we have seen this letter before
               IF WS-LETTER-COUNT(WS-CHAR-INDEX) > 0
                   SET IS-NOT-ISOGRAM TO TRUE
               ELSE
                   *> Mark this letter as seen
                   MOVE 1 TO WS-LETTER-COUNT(WS-CHAR-INDEX)
               END-IF
           END-IF.
           
      *================================================================
      * 400-DISPLAY-RESULT
      * Constructs and displays the final output message based on the
      * result flag.
      *================================================================
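       400-DISPLAY-RESULT.
      * TRIM drops the input's padding so the message text fits.
           MOVE SPACES TO WS-OUTPUT-MESSAGE.
           IF IS-ISOGRAM
               STRING FUNCTION TRIM(WS-INPUT-STRING) DELIMITED BY SIZE
                      WS-ISOGRAM-MSG   DELIMITED BY SIZE
                      INTO WS-OUTPUT-MESSAGE
           ELSE
               STRING FUNCTION TRIM(WS-INPUT-STRING) DELIMITED BY SIZE
                      WS-NOT-ISOGRAM-MSG DELIMITED BY SIZE
                      INTO WS-OUTPUT-MESSAGE
           END-IF.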
           
           DISPLAY FUNCTION TRIM(WS-OUTPUT-MESSAGE).
           
       END PROGRAM ISOGRAM-CHECKER.
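
To see the same frequency-table idea applied to every example word from earlier in this guide, here is a minimal sketch of a test driver. It is not part of the solution above: the program name, the sample table, and the paragraph names are all illustrative assumptions, and it swaps the ORD arithmetic for a lookup in an alphabet constant so that non-letters simply fall outside the 1-26 range. The sample words are stored in lowercase already, so no normalization step is shown.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ISOGRAM-EXAMPLES.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Six sample words in fixed 12-character slots (lowercase).
       01 WS-SAMPLE-DATA.
          05 FILLER              PIC X(12) VALUE 'lumberjacks'.
          05 FILLER              PIC X(12) VALUE 'background'.
          05 FILLER              PIC X(12) VALUE 'six-year-old'.
          05 FILLER              PIC X(12) VALUE 'isograms'.
          05 FILLER              PIC X(12) VALUE 'apple'.
          05 FILLER              PIC X(12) VALUE 'programming'.
       01 WS-SAMPLE-TABLE REDEFINES WS-SAMPLE-DATA.
          05 WS-SAMPLE           PIC X(12) OCCURS 6 TIMES.
       01 WS-ALPHABET            PIC X(26)
                                 VALUE 'abcdefghijklmnopqrstuvwxyz'.
       01 WS-FREQUENCY-TABLE.
          05 WS-LETTER-COUNT     PIC 9(1) OCCURS 26 TIMES.
       01 WS-WORD-NO             PIC 9(1).
       01 WS-POS                 PIC 9(2).
       01 WS-CURRENT-CHAR        PIC X(1).
       01 WS-CHAR-INDEX          PIC 9(2).
       01 WS-RESULT-FLAG         PIC X(1).
          88 IS-ISOGRAM                    VALUE 'Y'.
          88 IS-NOT-ISOGRAM                VALUE 'N'.

       PROCEDURE DIVISION.
       000-MAIN.
           PERFORM 100-CHECK-WORD
               VARYING WS-WORD-NO FROM 1 BY 1 UNTIL WS-WORD-NO > 6.
           STOP RUN.

       100-CHECK-WORD.
      * Start each word with an empty table and an optimistic flag.
           INITIALIZE WS-FREQUENCY-TABLE.
           SET IS-ISOGRAM TO TRUE.
           PERFORM 200-CHECK-CHAR
               VARYING WS-POS FROM 1 BY 1
               UNTIL WS-POS > 12 OR IS-NOT-ISOGRAM.
           DISPLAY WS-SAMPLE(WS-WORD-NO) " -> " WS-RESULT-FLAG.

       200-CHECK-CHAR.
           MOVE WS-SAMPLE(WS-WORD-NO)(WS-POS:1) TO WS-CURRENT-CHAR.
      * Index 1-26 for a lowercase letter, 27 for anything else.
           MOVE 0 TO WS-CHAR-INDEX.
           INSPECT WS-ALPHABET TALLYING WS-CHAR-INDEX
               FOR CHARACTERS BEFORE INITIAL WS-CURRENT-CHAR.
           ADD 1 TO WS-CHAR-INDEX.
           IF WS-CHAR-INDEX <= 26
               IF WS-LETTER-COUNT(WS-CHAR-INDEX) > 0
                   SET IS-NOT-ISOGRAM TO TRUE
               ELSE
                   MOVE 1 TO WS-LETTER-COUNT(WS-CHAR-INDEX)
               END-IF
           END-IF.
```

Run as written, the driver should report Y for lumberjacks, background, and six-year-old, and N for isograms, apple, and programming, matching the examples listed earlier.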

Detailed Code Walkthrough: How It Works Step-by-Step

Understanding the code requires breaking it down into its constituent parts. Let's analyze each `DIVISION` and `PROCEDURE` to see how they contribute to the final result.

The `DATA DIVISION` Breakdown

This is where we declare all our variables, constants, and data structures. In Cobol, memory is allocated at compile time, so we must define everything upfront.

WS-INPUT-STRING: A fixed-size field to hold the string we want to test. We initialize it with a value for this demonstration.
WS-NORMALIZED-STRING: A working copy of the input string that we will convert to lowercase.
WS-RESULT-FLAG: A single-character flag to store the final result ('Y' or 'N'). We use level 88 condition names (IS-ISOGRAM, IS-NOT-ISOGRAM) for more readable `IF` statements later.
WS-LOOP-INDEX, WS-CHAR-INDEX, WS-CURRENT-CHAR: Standard variables for controlling our loop, storing the calculated alphabet index, and holding the character being processed.
WS-FREQUENCY-TABLE: This is the heart of our algorithm. The OCCURS 26 TIMES clause creates an array of 26 single-digit numbers (PIC 9(1)). The INDEXED BY IDX-LETTER clause creates a special, efficient index variable for this table.
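
As a small illustration of how the level 88 condition names behave, the following sketch isolates the result flag from the rest of the program. The program name is a hypothetical one chosen for this example; the flag and condition names mirror the ones described above.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CONDITION-NAME-DEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 WS-RESULT-FLAG         PIC X(1) VALUE 'Y'.
          88 IS-ISOGRAM                   VALUE 'Y'.
          88 IS-NOT-ISOGRAM               VALUE 'N'.

       PROCEDURE DIVISION.
       000-MAIN.
      * SET moves the condition's VALUE ('N') into the flag.
           SET IS-NOT-ISOGRAM TO TRUE.
           DISPLAY "Flag now holds: " WS-RESULT-FLAG.

      * The condition name reads like a boolean in an IF.
           IF IS-NOT-ISOGRAM
               DISPLAY "Condition IS-NOT-ISOGRAM is true"
           END-IF.
           STOP RUN.
```

The condition name does not occupy storage of its own; it is simply a readable alias for "WS-RESULT-FLAG equals 'N'".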

The `PROCEDURE DIVISION` Logic Flow

This section contains the executable code, organized into paragraphs (similar to functions or methods).

`000-MAIN-PROCEDURE`

This is the main driver of the program. It orchestrates the entire process by calling other paragraphs in a specific order.

It calls 100-PREPARE-STRING to normalize the input.
It calls 200-INITIALIZE-FREQ-TABLE to reset our counter array.
It executes the main loop, 300-CHECK-FOR-ISOGRAM. The VARYING...UNTIL clause handles the iteration from the first character to the last. Crucially, the loop also stops if IS-NOT-ISOGRAM becomes true, making the program more efficient by exiting early.
Finally, it calls 400-DISPLAY-RESULT to show the outcome.

`100-PREPARE-STRING`

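This paragraph cleans the input data. It first calls the intrinsic function FUNCTION UPPER-CASE and then uses the INSPECT ... CONVERTING verb to map every uppercase letter to lowercase, which makes the check case-insensitive. The two-step conversion is more than strictly necessary: a single FUNCTION LOWER-CASE call would produce the same result. The paragraph also calculates the effective length of the string by counting the leading spaces of the reversed string (that is, the trailing spaces of the original) and subtracting that count from the field size. Because INSPECT ... TALLYING adds to its counter rather than overwriting it, the counter must be zero before the INSPECT runs.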
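
The normalization and length calculation can be tried in isolation. The sketch below is an assumption-laden miniature rather than the solution's own paragraph: the program and field names are made up for the demo, and it folds case with FUNCTION LOWER-CASE in a single step.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NORMALIZE-DEMO.
      * Hypothetical demo: names below are illustrative only.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 WS-RAW-TEXT            PIC X(20) VALUE 'Six-Year-Old'.
       01 WS-CLEAN-TEXT          PIC X(20).
       01 WS-TRAILING-SPACES     PIC 9(2)  VALUE 0.
       01 WS-TEXT-LENGTH         PIC 9(2)  VALUE 0.

       PROCEDURE DIVISION.
       000-MAIN.
      * Fold to lowercase in one step.
           MOVE FUNCTION LOWER-CASE(WS-RAW-TEXT) TO WS-CLEAN-TEXT.

      * INSPECT ... TALLYING adds to its counter, so zero it first.
           MOVE 0 TO WS-TRAILING-SPACES.
           INSPECT FUNCTION REVERSE(WS-CLEAN-TEXT)
               TALLYING WS-TRAILING-SPACES FOR LEADING SPACES.
           COMPUTE WS-TEXT-LENGTH =
               LENGTH OF WS-CLEAN-TEXT - WS-TRAILING-SPACES.

           DISPLAY "Normalized: " WS-CLEAN-TEXT(1:WS-TEXT-LENGTH).
           DISPLAY "Length:     " WS-TEXT-LENGTH.
           STOP RUN.
```

For the sample value this should display the text six-year-old and a length of 12. Note that the reference modification in the first DISPLAY would be invalid for an all-space field (length 0), so a real program would guard that case.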

`200-INITIALIZE-FREQ-TABLE`

A simple but critical step. This paragraph uses a PERFORM VARYING loop to iterate through all 26 slots of our WS-LETTER-COUNT table and sets each one to 0. This guarantees a clean slate before we start checking a new string.
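
If the explicit loop feels heavy, the INITIALIZE statement is a common alternative for resetting a numeric table in one step. The following is a minimal sketch under that assumption; the program name and the checking loop are illustrative and not part of the solution above.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INIT-TABLE-DEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 WS-FREQUENCY-TABLE.
          05 WS-LETTER-COUNT     PIC 9(1) OCCURS 26 TIMES.
       01 WS-SLOT                PIC 9(2).
       01 WS-NONZERO-SLOTS       PIC 9(2) VALUE 0.

       PROCEDURE DIVISION.
       000-MAIN.
      * Dirty a few slots to simulate a previous check.
           MOVE 1 TO WS-LETTER-COUNT(1).
           MOVE 1 TO WS-LETTER-COUNT(26).

      * One statement resets every numeric element to zero.
           INITIALIZE WS-FREQUENCY-TABLE.

      * Verify the reset by counting any slot that is still set.
           PERFORM VARYING WS-SLOT FROM 1 BY 1 UNTIL WS-SLOT > 26
               IF WS-LETTER-COUNT(WS-SLOT) NOT = 0
                   ADD 1 TO WS-NONZERO-SLOTS
               END-IF
           END-PERFORM.
           DISPLAY "Non-zero slots after INITIALIZE: "
               WS-NONZERO-SLOTS.
           STOP RUN.
```

Both forms leave the table in the same state; the loop makes the reset explicit, while INITIALIZE keeps the paragraph to a single statement.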

`300-CHECK-FOR-ISOGRAM`

This is where the core logic is executed on each character of the string.

It extracts the current character using reference modification: WS-NORMALIZED-STRING(WS-LOOP-INDEX:1).
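It uses a class test on WS-CURRENT-CHAR to filter out hyphens and any other non-letter characters. Note that Cobol's alphabetic class conditions (ALPHABETIC, ALPHABETIC-LOWER, ALPHABETIC-UPPER) are also true for a space, so spaces must be excluded with an explicit comparison; otherwise a space inside a phrase would produce an out-of-range table index.
If it's a letter, it calculates the table index. The line COMPUTE WS-CHAR-INDEX = FUNCTION ORD(WS-CURRENT-CHAR) - FUNCTION ORD('a') + 1 does this concisely. FUNCTION ORD returns the ordinal position of a character in the program's collating sequence. By subtracting the ordinal value of 'a' and adding 1, we map 'a' -> 1, 'b' -> 2, etc. This arithmetic relies on the lowercase letters occupying consecutive positions, which holds for ASCII (the usual GnuCOBOL case) but not for EBCDIC on z/OS, where 'z' would map to 41 rather than 26.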
It then checks WS-LETTER-COUNT(WS-CHAR-INDEX). If it's greater than 0, we've seen this letter before, so we SET IS-NOT-ISOGRAM TO TRUE.
If not, we set WS-LETTER-COUNT(WS-CHAR-INDEX) to 1, marking the letter as seen.
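
One way to make the letter-to-index mapping independent of the character set is to look the character up in an alphabet constant instead of doing arithmetic on ordinal values. The sketch below is an alternative technique, not the method used in the solution above; the program name and the WS-SAMPLE-CHARS field are assumptions made for the demo.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LETTER-INDEX-DEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 WS-LOWERCASE-ALPHABET  PIC X(26)
                                 VALUE 'abcdefghijklmnopqrstuvwxyz'.
       01 WS-SAMPLE-CHARS        PIC X(4) VALUE 'az- '.
       01 WS-POS                 PIC 9(2).
       01 WS-CURRENT-CHAR        PIC X(1).
       01 WS-CHAR-INDEX          PIC 9(2).

       PROCEDURE DIVISION.
       000-MAIN.
           PERFORM VARYING WS-POS FROM 1 BY 1 UNTIL WS-POS > 4
               MOVE WS-SAMPLE-CHARS(WS-POS:1) TO WS-CURRENT-CHAR
      * Count the characters that precede the first match.
      * If there is no match, all 26 are counted.
               MOVE 0 TO WS-CHAR-INDEX
               INSPECT WS-LOWERCASE-ALPHABET
                   TALLYING WS-CHAR-INDEX
                   FOR CHARACTERS BEFORE INITIAL WS-CURRENT-CHAR
               ADD 1 TO WS-CHAR-INDEX
               IF WS-CHAR-INDEX > 26
                   DISPLAY "'" WS-CURRENT-CHAR "' is not a letter"
               ELSE
                   DISPLAY "'" WS-CURRENT-CHAR "' -> index "
                       WS-CHAR-INDEX
               END-IF
           END-PERFORM.
           STOP RUN.
```

With this lookup, 'a' should map to 01 and 'z' to 26, while the hyphen and the space both land on 27 and are reported as non-letters. The trade-off is a short scan of the 26-byte constant per character in exchange for not depending on how the platform encodes letters.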

Character Mapping and Frequency Table Update

The interaction between the character and the frequency table is the most critical part of the logic. This diagram illustrates the process for a single alphabetic character.

    ● Character is Alphabetic
    │  (e.g., 'c')
    ▼
  ┌───────────────────┐
  │ COMPUTE Index     │
  │ ORD('c')-ORD('a')+1 │
  └─────────┬─────────┘
            │
            ▼
        Result: 3
            │
            ▼
  ┌───────────────────┐
  │ Access Freq Table │
  │ at Index 3        │
  └─────────┬─────────┘
            │
      ╭─────▼─────╮
      │ Value is 0? │
      ╰─────┬─────╯
        Yes │ No
    ┌───────┘ ───────┐
    │                │
    ▼                ▼
┌────────────────┐ ┌────────────────┐
│ Update Table:  │ │ Found Repeat!  │
│ Value at Index │ │ Set Flag to    │
│ 3 becomes 1    │ │ "Not Isogram"  │
└────────────────┘ └────────────────┘
    │                │
    └───────┬────────┘
            │
            ▼
    ● Continue to Next Char

Alternative Approaches and Performance

While the frequency table is the most efficient method, it's useful to understand other ways this problem could be solved in Cobol, along with their trade-offs.

Nested Loop (Brute-Force) Approach

The most straightforward logic involves two nested loops.

The outer loop picks a character (let's call it `char1`).
The inner loop iterates through the rest of the string, comparing every other character (`char2`) to `char1`.
If a match is found, the string is not an isogram.

This approach avoids the need for a separate frequency table, potentially saving a small amount of memory. However, its performance degrades rapidly as the string gets longer.
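
For comparison, the brute-force idea might look like the following sketch. It is deliberately minimal: the program name and field names are hypothetical, the word is assumed to be lowercase already, and only spaces and hyphens are exempted from the comparison.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NESTED-LOOP-ISOGRAM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 WS-WORD                PIC X(20) VALUE 'isograms'.
       01 WS-LENGTH              PIC 9(2).
       01 WS-OUTER               PIC 9(2).
       01 WS-INNER               PIC 9(2).
       01 WS-START               PIC 9(2).
       01 WS-RESULT-FLAG         PIC X(1)  VALUE 'Y'.
          88 IS-ISOGRAM                    VALUE 'Y'.
          88 IS-NOT-ISOGRAM                VALUE 'N'.

       PROCEDURE DIVISION.
       000-MAIN.
           COMPUTE WS-LENGTH =
               FUNCTION LENGTH(FUNCTION TRIM(WS-WORD)).

      * Compare each character with every character after it.
           PERFORM VARYING WS-OUTER FROM 1 BY 1
               UNTIL WS-OUTER >= WS-LENGTH OR IS-NOT-ISOGRAM
               COMPUTE WS-START = WS-OUTER + 1
               PERFORM VARYING WS-INNER FROM WS-START BY 1
                   UNTIL WS-INNER > WS-LENGTH OR IS-NOT-ISOGRAM
                   IF WS-WORD(WS-OUTER:1) NOT = SPACE
                      AND WS-WORD(WS-OUTER:1) NOT = '-'
                      AND WS-WORD(WS-OUTER:1) = WS-WORD(WS-INNER:1)
                       SET IS-NOT-ISOGRAM TO TRUE
                   END-IF
               END-PERFORM
           END-PERFORM.

           IF IS-ISOGRAM
               DISPLAY FUNCTION TRIM(WS-WORD) " is an isogram."
           ELSE
               DISPLAY FUNCTION TRIM(WS-WORD) " is NOT an isogram."
           END-IF.
           STOP RUN.
```

For the sample word the second 's' is found while the outer loop sits on the first one, so the program should report that isograms is NOT an isogram. No table is needed, but every character is compared against all the characters that follow it.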

Pros and Cons Comparison

Aspect	Frequency Table (O(n))	Nested Loop (O(n²))
Performance	Excellent. Linear time complexity. The processing time grows in direct proportion to the string length. Very efficient for large strings.	Poor. Quadratic time complexity. Processing time grows with the square of the string length, so doubling the input roughly quadruples the work.
Memory Usage	Requires a small, fixed-size table (26 bytes in our case). This is negligible in modern systems.	Minimal. Does not require an auxiliary data structure, only a few extra index variables.
Code Complexity	Slightly more complex due to the setup of the frequency table and the index calculation logic.	The logic is arguably simpler to understand at a glance, as it directly compares characters.
Best Use Case	The standard, recommended approach for almost all scenarios, especially in performance-critical enterprise applications.	Only suitable for very short, fixed-length strings where performance is not a concern at all. Generally avoided.

For any professional application, especially within the context of the kodikra Cobol curriculum which emphasizes enterprise-grade practices, the frequency table method is the superior and correct choice.

Frequently Asked Questions (FAQ)

1. What makes Cobol suitable for this kind of string task?: Cobol excels at handling fixed-format, structured data. Its `INSPECT` verb is incredibly powerful for character counting, replacement, and conversion. While it may seem verbose, its explicit nature makes the code's behavior predictable and reliable, which is critical for business logic in financial and administrative systems.
2. How does Cobol handle case sensitivity in strings?: By default, string comparisons in Cobol are case-sensitive ('A' is not equal to 'a'). To handle this, you must explicitly normalize the data. Our solution uses `INSPECT WS-NORMALIZED-STRING CONVERTING ...` to convert all uppercase letters to their lowercase equivalents, ensuring the isogram check is case-insensitive.
3. Can this logic be adapted for non-English alphabets?: Yes, but it would require modification. The current solution assumes a 26-character Latin alphabet. To support languages with more characters (like Cyrillic or those with diacritics), you would need to expand the `WS-FREQUENCY-TABLE` to a larger size and adjust the character-to-index mapping logic accordingly. You might also need to consider the `NATIONAL` data type for Unicode support.
4. What do `PIC` and `OCCURS` mean in the `DATA DIVISION`?: PIC is short for PICTURE; the PICTURE clause defines the type and size of a data item. For example, PIC X(50) defines an alphanumeric string of 50 characters, and PIC 9(2) defines a two-digit number. OCCURS is used to define a table or array. OCCURS 26 TIMES tells Cobol to allocate memory for 26 instances of the data item it describes, creating our frequency table.
5. Is there a built-in function in Cobol to find unique characters in a string?: No, there is no single intrinsic function in standard Cobol that returns the unique characters of a string. This is why understanding how to implement the logic manually, as demonstrated in this guide, is a fundamental skill. It showcases your ability to solve problems using the core features of the language.
6. Why is this algorithm's performance so important for mainframe applications?: Mainframe systems often run batch jobs that process millions or even billions of records in a single run. An inefficient algorithm (like an O(n²) nested loop) would multiply the processing time immensely, leading to missed processing windows, increased operational costs (MIPS usage), and potential system slowdowns. Algorithmic efficiency is a primary concern in mainframe development.

Conclusion: From Puzzle to Practical Skill

We've journeyed from a simple word puzzle to a complete, efficient, and professional-grade Cobol program. The isogram challenge, while seemingly trivial, serves as a powerful lesson in the principles of algorithmic efficiency, data normalization, and structured programming that are the bedrock of mainframe development.

You've learned how to manipulate strings using the `INSPECT` verb, define and use arrays with `OCCURS`, manage program flow with `PERFORM`, and leverage intrinsic functions like `ORD` for clean, modern code. These are not just theoretical concepts; they are the daily tools used to maintain the critical systems that support our world's infrastructure.

Disclaimer: This solution is written in a standard Cobol syntax compatible with modern compilers like GnuCOBOL and IBM Enterprise COBOL for z/OS. The core concepts are universally applicable across Cobol versions, though specific intrinsic function names or syntax details may vary slightly in older environments.

Ready to apply these skills to more complex challenges? Continue your journey by exploring the next module in our Cobol Learning Path or deepen your understanding of the language with our complete Cobol guide on kodikra.com.

Published by Kodikra — Your trusted Cobol learning resource.
