JSON Transformation Basics in Ballerina: Complete Solution & Deep Dive Guide


Mastering Ballerina JSON: A Zero-to-Hero Guide to Data Transformation

Master Ballerina JSON transformation by learning to read, process, and write JSON files. This guide covers essential techniques like data mapping, aggregation using ballerina/io and ballerina/lang.value, and structuring complex output, turning raw fuel records into summarized employee reports.

You're staring at a raw JSON file, a relentless stream of data entries. It might be logs, transaction records, or user activity. The data is all there, but it's chaotic, repetitive, and far from insightful. Your mission, should you choose to accept it, is to tame this data beast—to transform it from a jumble of records into a clean, structured, and meaningful summary. This is a universal challenge in software development, and the tool you choose can make the difference between a frustrating ordeal and an elegant solution.

This is where Ballerina shines. Designed from the ground up for network-centric applications and data integration, Ballerina treats JSON not as a foreign string to be parsed, but as a first-class citizen. In this comprehensive guide, we will walk you through a practical, real-world scenario from the exclusive kodikra.com curriculum: processing employee fuel expense records. You will learn, step-by-step, how to build a robust Ballerina application that reads raw JSON, performs complex aggregations, and writes a perfectly formatted summary file. Prepare to unlock the full potential of Ballerina for any data transformation task.

What is JSON Transformation in Ballerina?

JSON (JavaScript Object Notation) transformation is the process of converting JSON data from one structure or format to another. This isn't just about changing field names; it often involves filtering records, restructuring nested objects, aggregating data (like calculating sums or averages), and enriching the data with new information. In the context of Ballerina, it means leveraging the language's powerful, built-in features to perform these tasks efficiently and safely.

Unlike general-purpose languages where JSON manipulation often requires external libraries and cumbersome parsing/serialization steps, Ballerina integrates JSON deeply into its type system. The language provides a flexible json type, which can represent any valid JSON structure. More importantly, it allows you to map JSON data directly to strongly-typed record structures, giving you the best of both worlds: the flexibility of JSON and the compile-time safety of static types.
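As a minimal sketch of that mapping, the snippet below converts an untyped json value into a typed record with cloneWithType. The Reading record, its fields, and the toReading helper are hypothetical names chosen only for this illustration; they are not part of the fuel exercise.

```ballerina
import ballerina/io;

// Hypothetical record used only for this illustration.
type Reading record {|
    string sensor;
    decimal celsius;
|};

// Converts an untyped json value into a typed Reading.
// Returns an error if the shape of the payload does not match the record.
function toReading(json payload) returns Reading|error {
    return payload.cloneWithType(Reading);
}

public function main() returns error? {
    json payload = {sensor: "T-1", celsius: 21.5};
    Reading reading = check toReading(payload);
    // From here on, field names and types are checked by the compiler.
    io:println(reading.sensor, " ", reading.celsius);
}
```

A payload with the wrong shape, such as {sensor: 5}, makes toReading return an error instead of a partially filled record.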

This native support makes Ballerina an exceptional choice for tasks common in modern software architecture, such as:

API Response Shaping: Transforming a database result into a specific JSON structure required by a client application.
ETL Pipelines: Extracting data from a source (like a file or API), Transforming it (aggregating, cleaning), and Loading it into a destination system.
Microservice Communication: Adapting data from one microservice's format to what another microservice expects.
Configuration Management: Reading complex configuration files and mapping them to application settings.

Why Use Ballerina for This Task? The Core Advantages

When faced with a data transformation problem, developers have many language choices. However, Ballerina presents a compelling case, especially for tasks involving JSON and network interactions. Its design philosophy directly addresses the pain points commonly found in other ecosystems.

1. Type Safety with Flexibility

Ballerina's type system is a standout feature. You can start with the generic json type for quick prototyping and then progressively introduce typed record definitions. This allows you to enforce a specific structure, catch potential errors at compile time (e.g., typos in field names, incorrect data types), and benefit from superior IDE support with autocompletion.

2. Powerful Standard Libraries

The ballerina/io module provides simple, high-level functions like io:fileReadJson() and io:fileWriteJson(). These abstract away the complexities of file handling, byte streams, and character encoding, letting you focus on the business logic. You don't need to hunt for third-party packages for fundamental tasks.
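A short round trip shows how little code these two functions need. This is only a sketch: the roundTrip helper and the demo.json file name are assumptions made for the example.

```ballerina
import ballerina/io;

// Writes a json value to the given path and reads it straight back.
// Both io calls can fail, so the result is json|error.
function roundTrip(string path, json content) returns json|error {
    check io:fileWriteJson(path, content);
    return io:fileReadJson(path);
}

public function main() returns error? {
    // "demo.json" is a hypothetical file name, created in the working directory.
    json saved = check roundTrip("demo.json", {status: "ok", count: 3});
    io:println(saved);
}
```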

3. Data-Oriented Syntax

Ballerina includes features like query expressions, which offer a SQL-like syntax for processing data collections. While we will use a more imperative map-based approach in our main solution for clarity, these expressions can often reduce complex loops and conditional logic into a few declarative lines of code, making the intent clearer.
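To give a feel for the syntax, here is a small sketch of a query expression that filters and projects a list. The Fill record and the largeFills function are hypothetical and exist only for this example.

```ballerina
import ballerina/io;

// Hypothetical record used only for this illustration.
type Fill record {|
    string vehicleNo;
    decimal liters;
|};

// Returns the vehicle numbers of all fills at or above the given volume.
function largeFills(Fill[] fills, decimal minLiters) returns string[] {
    return from var fill in fills
        where fill.liters >= minLiters
        select fill.vehicleNo;
}

public function main() {
    Fill[] fills = [
        {vehicleNo: "CAR-123", liters: 40.5},
        {vehicleNo: "TRK-456", liters: 80}
    ];
    io:println(largeFills(fills, 50));
}
```

The where and select clauses replace the explicit loop, the if statement, and the push call that an imperative version would need.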

4. Built-in Concurrency

While our current problem is sequential, many real-world data processing tasks can be parallelized. Ballerina's concurrency model with workers and services is designed to be simple and safe, making it easy to scale up processing performance when needed.

How to Implement the JSON Fuel Transformation: A Step-by-Step Guide

We will now build the solution from scratch based on the kodikra module problem statement. The goal is to read a JSON file containing a list of fuel records, process them to calculate totals for each employee, and write the summary to a new JSON file.

Step 1: Understand the Input and Desired Output

First, let's define our data structures. We have an input file, let's call it fuel_records.json, which is an array of fuel transaction objects.

Input: fuel_records.json

[
  {
    "employeeId": "E-001",
    "employeeName": "Alice",
    "vehicleNo": "CAR-123",
    "liters": 40.5,
    "cost": 60.75
  },
  {
    "employeeId": "E-002",
    "employeeName": "Bob",
    "vehicleNo": "TRK-456",
    "liters": 80.0,
    "cost": 120.00
  },
  {
    "employeeId": "E-001",
    "employeeName": "Alice",
    "vehicleNo": "CAR-123",
    "liters": 35.0,
    "cost": 52.50
  },
  {
    "employeeId": "E-003",
    "employeeName": "Charlie",
    "vehicleNo": "VAN-789",
    "liters": 55.2,
    "cost": 82.80
  },
  {
    "employeeId": "E-002",
    "employeeName": "Bob",
    "vehicleNo": "TRK-456",
    "liters": 75.5,
    "cost": 113.25
  }
]

Our goal is to produce an output file, summary.json, that groups these records by employee and calculates the total liters and total cost.

Desired Output: summary.json

{
  "E-001": {
    "employeeName": "Alice",
    "vehicles": ["CAR-123"],
    "totalLiters": 75.5,
    "totalCost": 113.25
  },
  "E-002": {
    "employeeName": "Bob",
    "vehicles": ["TRK-456"],
    "totalLiters": 155.5,
    "totalCost": 233.25
  },
  "E-003": {
    "employeeName": "Charlie",
    "vehicles": ["VAN-789"],
    "totalLiters": 55.2,
    "totalCost": 82.8
  }
}

Step 2: Set Up Your Ballerina Project

Open your terminal and create a new Ballerina project.

bal new fuel_transformer
cd fuel_transformer

This command creates a new directory named fuel_transformer with a main.bal file and a Ballerina.toml configuration file. Create the fuel_records.json file inside this directory and paste the input data into it. The program refers to the file by a relative path, so run it with bal run from this same directory.

Step 3: Define the Data Models with Ballerina Records

Using typed records is a best practice. It makes your code self-documenting and prevents a whole class of runtime errors. Open main.bal and define the records that match our JSON structures.

// Represents a single entry in the input JSON file
type FuelRecord record {|
    string employeeId;
    string employeeName;
    string vehicleNo;
    decimal liters;
    decimal cost;
|};

// Represents the calculated summary for a single employee in the output
type EmployeeSummary record {|
    string employeeName;
    string[] vehicles = []; // Initialize with an empty array
    decimal totalLiters = 0;
    decimal totalCost = 0;
|};

// A map to hold the summaries, keyed by employeeId
type FuelSummary map<EmployeeSummary>;

Here, we define FuelRecord for the input and EmployeeSummary for the output. We also define FuelSummary as a type alias for a map, which will be our primary data structure for aggregation.

Step 4: The Core Transformation Logic

Now, let's write the main function that performs the transformation. We will follow a clear, logical flow: Read, Process, Write.

Here is the complete code for main.bal with detailed comments explaining each part.

import ballerina/io;
import ballerina/lang.'value;

// Represents a single entry in the input JSON file
type FuelRecord record {|
    string employeeId;
    string employeeName;
    string vehicleNo;
    decimal liters;
    decimal cost;
|};

// Represents the calculated summary for a single employee in the output
type EmployeeSummary record {|
    string employeeName;
    string[] vehicles = [];
    decimal totalLiters = 0;
    decimal totalCost = 0;
|};

// A map to hold the summaries, keyed by employeeId
type FuelSummary map<EmployeeSummary>;

public function main() returns error? {
    // --- 1. READ ---
    // Read and parse the entire JSON file.
    // The 'check' keyword returns early if reading or parsing fails.
    json content = check io:fileReadJson("fuel_records.json");

    // A json value is not implicitly assignable to FuelRecord[], so convert it.
    // 'cloneWithType' infers the target type from the declaration and returns
    // an error if any entry does not match the FuelRecord structure.
    FuelRecord[] fuelRecords = check content.cloneWithType();

    // --- 2. PROCESS ---
    // Initialize an empty map to store the aggregated results.
    // The key will be the employeeId (string), and the value will be the EmployeeSummary record.
    FuelSummary summaryMap = {};

    // Iterate over each entry from the input file.
    // ('record' is a reserved word in Ballerina, so the loop variable is named 'entry'.)
    foreach FuelRecord entry in fuelRecords {
        string employeeId = entry.employeeId;

        // Check if we have already seen this employee.
        if summaryMap.hasKey(employeeId) {
            // If employee exists, update their summary.
            // 'summaryMap[employeeId]' would have type EmployeeSummary?, so we use
            // 'get', which returns the value directly because we know the key exists.
            EmployeeSummary existingSummary = summaryMap.get(employeeId);

            // Add the current entry's liters and cost to the totals.
            existingSummary.totalLiters += entry.liters;
            existingSummary.totalCost += entry.cost;

            // Add the vehicle number to the list if it's not already there.
            // 'indexOf' returns () when the value is absent from the array.
            if existingSummary.vehicles.indexOf(entry.vehicleNo) is () {
                existingSummary.vehicles.push(entry.vehicleNo);
            }
        } else {
            // If this is a new employee, create a new summary record for them.
            EmployeeSummary newSummary = {
                employeeName: entry.employeeName,
                vehicles: [entry.vehicleNo], // Start the list with the current vehicle
                totalLiters: entry.liters,
                totalCost: entry.cost
            };
            // Add the new summary to the map.
            summaryMap[employeeId] = newSummary;
        }
    }

    // --- 3. WRITE ---
    // Convert the final map to a clean JSON object for writing.
    // The 'cloneWithType' function ensures the output conforms to the json type.
    json outputJson = check 'value:cloneWithType(summaryMap, json);

    // Write the resulting JSON to the output file.
    // This will overwrite the file if it already exists.
    check io:fileWriteJson("summary.json", outputJson);

    io:println("Successfully transformed fuel records and wrote to summary.json");
}

Code Walkthrough & Explanation

Imports: We import ballerina/io for file operations and ballerina/lang.'value for the cloneWithType function, which is essential for converting our typed map back into a generic json type suitable for writing.
Reading the File: io:fileReadJson("fuel_records.json") reads the file and parses it as JSON, returning a json value or an error. That value then has to be converted to FuelRecord[]. A plain json value is not implicitly assignable to a record array, so the conversion goes through cloneWithType, which checks every entry against the FuelRecord structure. If the file doesn't exist, is not valid JSON, or doesn't match the structure of FuelRecord, an error is returned, and the check keyword propagates it out of the main function.
Initializing the Map: We create an empty FuelSummary map called summaryMap. This map is the heart of our aggregation logic. It will store the running totals for each employee.
The Processing Loop: We use a foreach loop to iterate through every FuelRecord.
- Check for Existing Employee: summaryMap.hasKey(employeeId) is the crucial check. It determines if we've already started a summary for this employee.
- Updating an Existing Summary: If the key exists, we retrieve the existing EmployeeSummary, add the current record's liters and cost to the totals, and add the vehicleNo to the list of vehicles (only if it's not already present, ensuring a unique list).
- Creating a New Summary: If the key does not exist, it's the first time we're seeing this employee. We create a brand new EmployeeSummary record, populating it with the data from the current FuelRecord, and add it to our summaryMap.
Writing the Output: After the loop finishes, summaryMap contains the complete, aggregated data. We use 'value:cloneWithType(summaryMap, json) to create a deep copy of our map as a pure json value. This is a safe way to prepare typed data for untyped output. Finally, io:fileWriteJson("summary.json", outputJson) serializes this json value into a string and writes it to the specified file.
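The aggregation step described in this walkthrough can also be isolated in a function of its own, which makes it possible to exercise the logic without touching the filesystem. The sketch below is one way to do that; the summarize helper is a hypothetical name, and it uses a nil check on the map lookup instead of hasKey.

```ballerina
import ballerina/io;

type FuelRecord record {|
    string employeeId;
    string employeeName;
    string vehicleNo;
    decimal liters;
    decimal cost;
|};

type EmployeeSummary record {|
    string employeeName;
    string[] vehicles = [];
    decimal totalLiters = 0;
    decimal totalCost = 0;
|};

// Hypothetical helper: groups entries by employeeId and sums liters and cost.
function summarize(FuelRecord[] entries) returns map<EmployeeSummary> {
    map<EmployeeSummary> summaries = {};
    foreach FuelRecord entry in entries {
        // A map lookup yields () when the key is absent.
        EmployeeSummary? existing = summaries[entry.employeeId];
        if existing is () {
            summaries[entry.employeeId] = {
                employeeName: entry.employeeName,
                vehicles: [entry.vehicleNo],
                totalLiters: entry.liters,
                totalCost: entry.cost
            };
        } else {
            // Records are reference values, so this updates the map entry in place.
            existing.totalLiters += entry.liters;
            existing.totalCost += entry.cost;
            if existing.vehicles.indexOf(entry.vehicleNo) is () {
                existing.vehicles.push(entry.vehicleNo);
            }
        }
    }
    return summaries;
}

public function main() {
    FuelRecord[] entries = [
        {employeeId: "E-001", employeeName: "Alice", vehicleNo: "CAR-123", liters: 40.5, cost: 60.75},
        {employeeId: "E-001", employeeName: "Alice", vehicleNo: "CAR-123", liters: 35.0, cost: 52.5}
    ];
    io:println(summarize(entries));
}
```

With the two sample entries for E-001, the totals come to 75.5 liters and 113.25 in cost, matching the desired output shown in Step 1.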

Visualizing the Transformation Logic

To better understand the flow of data and logic, let's visualize the process with two diagrams.

High-Level Process Flow

This diagram shows the overall pipeline from input file to output file.

    ● Start
    │
    ▼
  ┌───────────────────┐
  │ Read              │
  │ fuel_records.json │
  └─────────┬─────────┘
            │
            ▼
  ┌───────────────────┐
  │ Parse JSON to     │
  │ FuelRecord[]      │
  └─────────┬─────────┘
            │
            ▼
  ┌───────────────────┐
  │ Aggregate Data    │
  │ into summaryMap   │
  └─────────┬─────────┘
            │
            ▼
  ┌───────────────────┐
  │ Convert Map to    │
  │ final JSON object │
  └─────────┬─────────┘
            │
            ▼
  ┌───────────────────┐
  │ Write             │
  │ summary.json      │
  └─────────┬─────────┘
            │
            ▼
    ● End

Detailed Aggregation Loop Logic

This diagram zooms in on the decision-making process inside the foreach loop for each record.

    ● For each `record` in `fuelRecords`
    │
    ▼
┌──────────────┐
│ Get          │
│ employeeId   │
└──────┬───────┘
       │
       ▼
  ◆ summaryMap has employeeId?
   ╱           ╲
  Yes           No
  │              │
  ▼              ▼
┌─────────────────┐  ┌──────────────────┐
│ Retrieve        │  │ Create           │
│ existingSummary │  │ newSummary       │
└──────┬──────────┘  └─────────┬────────┘
       │                      │
       ▼                      │
┌─────────────────┐  ┌────────┴────────┐
│ Update totals:  │  │ Populate from   │
│ - totalLiters   │  │ current `record`│
│ - totalCost     │  └─────────┬────────┘
└──────┬──────────┘            │
       │                      │
       ▼                      │
┌─────────────────┐  ┌────────┴────────┐
│ Add vehicleNo   │  │ Add newSummary  │
│ (if unique)     │  │ to summaryMap   │
└─────────────────┘  └─────────────────┘
       │                      │
       └─────────┬────────────┘
                 │
                 ▼
    ● Next record

Where This Pattern Is Used in the Real World

The "Read-Process-Write" pattern combined with in-memory aggregation is incredibly common and forms the basis of many data-driven applications:

Financial Reporting: Aggregating daily transaction logs into monthly or quarterly financial summaries for different departments or clients.
Log Analysis: Processing web server access logs to count unique visitors, calculate error rates per endpoint, or determine peak traffic hours.
E-commerce Analytics: Reading a stream of sales orders to calculate total revenue per product, identify top customers, or analyze regional sales performance.
IoT Data Processing: Collecting sensor readings from thousands of devices and aggregating them to find averages, detect anomalies, or generate alerts.

Mastering this fundamental pattern in Ballerina equips you to tackle a wide range of data integration challenges you'll encounter in your career. You can further explore our complete guide to the Ballerina language for more advanced concepts.

Pros, Cons, and Alternative Approaches

The map-based aggregation method we used is clear, efficient, and idiomatic in Ballerina. However, it's important to understand its trade-offs and know when another approach might be more suitable.

In-Memory Map Aggregation

Pros	Cons
Fast Performance: Accessing and updating map entries is very fast (typically O(1) average time complexity).	High Memory Usage: The entire dataset and the summary map must fit into memory. This is not suitable for gigabyte-scale files.
Easy to Understand: The logic is imperative and follows a step-by-step flow, which is easy for most developers to read and debug.	Can be Verbose: The conditional logic (`if/else`) can become complex if more aggregation rules are added.
Flexible: It's easy to add complex logic, like the check for unique vehicle numbers, inside the conditional blocks.	Not Easily Parallelizable: A simple `foreach` loop runs sequentially. Parallelizing updates to a shared map requires careful concurrency control.

Alternative: Ballerina Query Expressions

Ballerina's query expressions provide a more declarative, SQL-like way to achieve the same result. For our problem, a query expression could look something like this:

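// This is a conceptual example of an alternative approach.
// After 'group by', the non-grouping variables (vehicleNo, liters, cost)
// stand for all values in the group, so they can be aggregated or listed.
FuelSummary summary = map from var {employeeId, employeeName, vehicleNo, liters, cost} in fuelRecords
    group by employeeId, employeeName
    select [employeeId, {
        employeeName,
        vehicles: [vehicleNo], // Every vehicle number in the group, duplicates included
        totalLiters: sum(liters),
        totalCost: sum(cost)
    }];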

While powerful for simple grouping and aggregation (like sum), this approach can become more complex when you need to perform custom logic, such as creating a unique list of vehicles within each group. For our specific problem, the imperative map-based solution offers a better balance of clarity and control.

Alternative: Streaming Processing

For extremely large JSON files that cannot fit in memory, a streaming approach would be necessary. This involves reading the JSON file piece by piece (e.g., one object at a time) instead of loading the whole file at once. While Ballerina has capabilities for this, it involves more complex logic for parsing and state management. The in-memory approach remains the right choice as long as the parsed input and the summary map fit comfortably in the memory available to the program. As a rough guide, that covers files up to hundreds of megabytes, though the practical limit depends on how much memory the runtime is given.

Frequently Asked Questions (FAQ)

1. What is the difference between the json type and a typed record in Ballerina?: The json type is a union of all possible JSON values (string, int, float, decimal, boolean, json[], map<json>, ()). It is highly flexible but offers no compile-time guarantees about its structure. A typed record, like our FuelRecord, defines a specific, fixed structure with named fields and their expected types. Using records gives you static type safety, preventing typos and type mismatches before you even run the code.
2. How do I handle potential errors when reading a JSON file?: Ballerina has excellent built-in error handling. Functions that can fail, like io:fileReadJson, return a union type that includes error (e.g., json|error). The check keyword is a clean way to handle this. If the function returns an error, check immediately stops execution in the current function and returns the error to the caller. This avoids messy if err != nil blocks common in other languages.
3. Can Ballerina handle deeply nested JSON structures?: Absolutely. You can define nested records to mirror the structure of your JSON. For example, if an employee had an address object, you could define an Address record and include it as a field within your Employee record: Address address;. The mapping and type-checking work seamlessly through multiple levels of nesting.
4. How can I make the output JSON "pretty-printed" with indentation?: The standard io:fileWriteJson function produces a compact, single-line JSON string, which is optimal for machine-to-machine communication. The toJsonString() function in ballerina/lang.value also returns a compact string and takes no indentation options, so it does not help here. For human-readable output, produce an indented string first and then write that string with io:fileWriteString. Recent distributions include a prettify function in the ballerina/data.jsondata module for this purpose; check that it is available in the version you are running.
5. What if my input JSON has missing or optional fields?: Ballerina records handle this gracefully. You can mark a field as optional by adding a question mark ? after the field name, like string middleName?;. When Ballerina maps JSON to this record, it will not produce an error if the middleName key is missing in the JSON object, and accessing the field yields () (nil). Note that string? middleName; is different: it declares a required field whose value may be nil, so the key is still expected in the input.
6. Why did we use 'value:cloneWithType(summaryMap, json) before writing the file?: This is a crucial step for type safety and correctness. Our summaryMap is a strongly-typed map<EmployeeSummary>. The io:fileWriteJson function expects a value of type json. The cloneWithType function performs a deep copy and conversion, ensuring the data structure is transformed into the generic json type that the I/O function understands, stripping away any Ballerina-specific type information that isn't part of the JSON standard.
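The answers on error handling and optional fields can be illustrated together. The following is a sketch under assumed names: the Person record and the parsePerson helper are hypothetical. It shows an optional field, a required field that may hold nil, and explicit handling of an error value instead of check.

```ballerina
import ballerina/io;

// Hypothetical record: 'middleName' is optional (the key may be absent),
// while 'nickname' is required but may hold nil.
type Person record {|
    string firstName;
    string middleName?;
    string? nickname;
|};

// Converts a json value to a Person, or returns an error on a mismatch.
function parsePerson(json payload) returns Person|error {
    return payload.cloneWithType(Person);
}

public function main() {
    Person|error result = parsePerson({firstName: "Ada", nickname: ()});
    // Handling the error explicitly instead of propagating it with 'check'.
    if result is error {
        io:println("conversion failed: ", result.message());
        return;
    }
    // Optional field access yields () when the key is absent.
    string? middle = result?.middleName;
    io:println(middle is () ? "no middle name" : middle);
}
```

Using an is check like this is the alternative to check when the caller wants to recover from the failure rather than return it.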

Conclusion: Your Next Steps in Ballerina Data Mastery

You have successfully navigated a complete data transformation pipeline in Ballerina. Starting with raw, repetitive JSON data, you've used Ballerina's type-safe records, intuitive map data structures, and powerful I/O libraries to produce a clean, aggregated, and insightful summary. You've learned not just the "how" but also the "why" behind each decision, from choosing data structures to handling potential errors.

This foundational pattern of reading, processing, and writing data is a cornerstone of backend development. By mastering it in Ballerina, you are well-equipped to build robust APIs, efficient data pipelines, and reliable microservices. The language's focus on clarity, safety, and developer productivity makes it an invaluable tool in any modern software engineer's toolkit.

This exercise is just one part of a comprehensive learning journey. To continue building your skills, we highly recommend you explore the full Ballerina 4 learning path on kodikra.com, where you'll find more challenges that build on these concepts. For a deeper dive into the language features, be sure to consult our complete guide to the Ballerina language.

Disclaimer: The code and concepts in this article are based on Ballerina Swan Lake Update 8 (2023r3) and later versions. Syntax and library functions may differ in older versions of the language.

Published by Kodikra — Your trusted Ballerina learning resource.
