Custom Set in Csharp: Complete Solution & Deep Dive Guide
C# Custom Set: The Ultimate Guide to Building Your Own Collection from Scratch
Creating a custom set in C# is a foundational exercise in data structure design, involving a class that enforces unique element storage. This is typically achieved using an internal collection like a List<T> or Dictionary<T, bool>, and then exposing methods like Add, Contains, and IsSubset to mimic the behavior of standard set collections.
Ever found yourself reaching for HashSet<T> but wondering about the magic happening under the hood? Or perhaps you've faced a scenario where you needed a collection that guarantees uniqueness but with custom logic that the built-in types don't offer. This is a common crossroads for developers looking to move from simply using frameworks to truly understanding them.
This guide, inspired by the exclusive curriculum at kodikra.com, will demystify the process entirely. We won't just look at the theory; we will roll up our sleeves and build a fully functional CustomSet<T> class from the ground up. By the end, you'll not only have a powerful new tool in your arsenal but also a much deeper appreciation for the design of C#'s collection framework.
What Exactly Is a Set Data Structure?
Before we write a single line of C# code, it's crucial to grasp the core concept. In computer science, a set is an abstract data type that stores a collection of unique values, without any particular order. Think of it as a mathematical set translated into programming.
The two defining characteristics of a set are:
- Uniqueness: Each element within a set must be unique. You cannot add the same element twice. If you try, the set simply ignores the duplicate, ensuring its integrity. -
- Unordered: The elements in a set have no defined sequence or index. You can't ask for "the third element" because there is no guaranteed third element. You can only check for an element's presence or absence.
This contrasts sharply with other common collections like a List<T> or an array, where elements are stored in a specific order (indexed) and duplicates are perfectly acceptable. The primary purpose of a set is to efficiently answer the question: "Does this collection contain this specific item?"
Why Bother Building a Custom Set in C#?
With powerful built-in types like HashSet<T> available in the .NET library, you might ask, "Why reinvent the wheel?" This is a valid question, and the answer lies in several key benefits, especially for those following a structured learning path like the kodikra C# learning path.
For Deeper Learning and Understanding
The number one reason is educational. Building a data structure from scratch forces you to confront the fundamental problems they are designed to solve. You'll gain intimate knowledge of generics (<T>), interface implementation, algorithm design, and performance considerations (Big O notation). It's the difference between knowing how to drive a car and understanding how the engine works.
Full Control and Customization
Sometimes, the standard implementation doesn't quite fit your needs. Perhaps you need to trigger a specific event whenever a new item is added, or maybe you need to implement a custom, domain-specific equality comparison that goes beyond a simple Equals override. Building your own set gives you the ultimate control to inject this custom logic.
Performance Tuning for Niche Scenarios
While HashSet<T> is highly optimized for general use, a custom implementation allows you to tune for specific scenarios. For instance, if you know your set will always be very small, a simple List<T>-backed implementation might be sufficient and have less memory overhead than a hash-based structure.
Preparation for Technical Interviews
Data structure implementation is a classic topic in software engineering interviews. Being able to confidently design, implement, and discuss the trade-offs of a custom set on a whiteboard demonstrates a strong command of computer science fundamentals that companies highly value.
How to Implement a Custom Set: A Step-by-Step Guide
Now, let's get to the practical part. We will build our CustomSet<T> class. For this implementation, we will use a List<T> as the internal data store. This approach is straightforward to understand and clearly illustrates the logic required to enforce set behavior.
Step 1: The Class Skeleton
First, we define the class structure. We make it generic by using <T>, allowing our set to hold elements of any type (int, string, custom objects, etc.). The internal storage will be a private list.
// C#
using System.Collections.Generic;
using System.Linq;
public class CustomSet<T>
{
private readonly List<T> _elements;
// Constructor to initialize an empty set
public CustomSet()
{
_elements = new List<T>();
}
// Constructor to initialize from an existing collection, ensuring uniqueness
public CustomSet(IEnumerable<T> startingElements)
{
_elements = new List<T>();
foreach (var element in startingElements)
{
Add(element);
}
}
// ... methods will go here ...
}
We've created two constructors: a default one for an empty set and another that accepts any IEnumerable<T> (like an array or list), carefully adding each element to maintain uniqueness.
Step 2: Implementing Core Set Methods
A set isn't useful without its core operations. Let's implement them one by one.
Checking for Emptiness and Membership
These are the simplest methods. IsEmpty checks if the set has any elements, and Contains checks for the presence of a specific element.
public bool IsEmpty()
{
return _elements.Count == 0;
}
public bool Contains(T element)
{
return _elements.Contains(element);
}
It's important to note that for a List<T>, the Contains method has a time complexity of O(n), meaning it may have to scan the entire list in the worst case.
Adding a New Element
This is where the core "uniqueness" rule is enforced. Before adding an element, we must first check if it already exists.
public void Add(T element)
{
if (!Contains(element))
{
_elements.Add(element);
}
}
This simple logic is the heart of the set. Here is a visual representation of the flow:
● Start: Add(element)
│
▼
┌──────────────────┐
│ Receive 'element'│
└─────────┬────────┘
│
▼
◆ Does set already
contain 'element'?
╱ ╲
Yes (Exists) No (Unique)
│ │
▼ ▼
┌─────────┐ ┌───────────────────┐
│ Do nothing│ │ Add 'element' to │
│ │ │ internal list │
└─────────┘ └───────────────────┘
│ │
└─────────┬───────────┘
│
▼
● End
Step 3: Implementing Set-to-Set Operations
The true power of sets comes from their ability to interact with each other. Let's implement methods like IsSubset, IsDisjointFrom, and Equals.
IsSubset
A set 'A' is a subset of set 'B' if all elements of 'A' are also present in 'B'.
public bool IsSubset(CustomSet<T> other)
{
if (other == null)
{
return false; // Or throw an exception, depending on desired behavior
}
// If this set is empty, it's a subset of any other set.
if (this.IsEmpty())
{
return true;
}
// Check if every element in this set is also in the other set.
foreach (var element in _elements)
{
if (!other.Contains(element))
{
return false; // Found an element that is not in the other set
}
}
return true; // All elements were found
}
The logic involves iterating through our own elements and ensuring each one can be found in the `other` set. Here is the flow diagram for this logic:
● Start: IsSubset(otherSet)
│
▼
┌────────────────────────┐
│ For each 'item' in │
│ the current set's list │
└────────────┬───────────┘
│
▼
◆ Does otherSet.Contains(item)?
╱ ╲
No Yes
│ │
▼ ▼
┌─────────────────┐ ┌──────────────────┐
│ Return `false` │ │ Continue to next │
│ (Exit early) │ │ item in loop │
└─────────────────┘ └──────────────────┘
│ │
└───────────────────────────────┘
│
▼ (Loop Completes Successfully)
┌─────────────────┐
│ Return `true` │
└─────────────────┘
│
▼
● End
IsDisjointFrom
Two sets are disjoint if they have no elements in common.
public bool IsDisjointFrom(CustomSet<T> other)
{
if (other == null) return true; // Disjoint from null
foreach (var element in _elements)
{
if (other.Contains(element))
{
return false; // Found a common element
}
}
return true; // No common elements found
}
Equals
Two sets are considered equal if they contain the exact same elements. The order does not matter. A robust way to check this is to verify that each set is a subset of the other.
public bool Equals(CustomSet<T> other)
{
if (other == null || this._elements.Count != other._elements.Count)
{
return false;
}
// A more efficient check than two subset calls:
// A is a subset of B and B is a subset of A.
return this.IsSubset(other) && other.IsSubset(this);
}
The Complete C# Code Solution
Here is the full, commented code for our CustomSet<T> class, based on the principles discussed in the kodikra.com module.
// C#
using System;
using System.Collections.Generic;
using System.Linq;
/// <summary>
/// A custom implementation of a set data structure.
/// It stores a collection of unique elements of type T.
/// This implementation uses a List<T> as the backing store.
/// </summary>
/// <typeparam name="T">The type of elements in the set.</typeparam>
public class CustomSet<T>
{
private readonly List<T> _elements;
/// <summary>
/// Initializes a new, empty CustomSet.
/// </summary>
public CustomSet()
{
_elements = new List<T>();
}
/// <summary>
/// Initializes a new CustomSet with elements from a collection, ensuring uniqueness.
/// </summary>
/// <param name="startingElements">The initial collection of elements.</param>
public CustomSet(IEnumerable<T> startingElements)
{
_elements = new List<T>();
if (startingElements != null)
{
foreach (var element in startingElements)
{
Add(element);
}
}
}
/// <summary>
/// Checks if the set is empty.
/// </summary>
/// <returns>True if the set contains no elements; otherwise, false.</returns>
public bool IsEmpty()
{
return _elements.Count == 0;
}
/// <summary>
/// Determines whether the set contains a specific element.
/// Time complexity: O(n) due to List.Contains.
/// </summary>
/// <param name="element">The element to locate in the set.</param>
/// <returns>True if the element is found; otherwise, false.</returns>
public bool Contains(T element)
{
return _elements.Contains(element);
}
/// <summary>
/// Adds an element to the set if it is not already present.
/// Time complexity: O(n) because it first calls Contains.
/// </summary>
/// <param name="element">The element to add.</param>
public void Add(T element)
{
if (!Contains(element))
{
_elements.Add(element);
}
}
/// <summary>
/// Determines whether the current set is a subset of a specified collection.
/// A set is a subset of another if all its elements are in the other set.
/// </summary>
/// <param name="other">The set to compare to the current set.</param>
/// <returns>True if the current set is a subset of other; otherwise, false.</returns>
public bool IsSubset(CustomSet<T> other)
{
if (other == null) return false;
if (this.IsEmpty()) return true;
// Every element in 'this' must be in 'other'.
return _elements.All(other.Contains);
}
/// <summary>
/// Determines whether the current set and a specified set are disjoint.
/// Two sets are disjoint if they have no elements in common.
/// </summary>
/// <param name="other">The set to compare to the current set.</param>
/// <returns>True if the sets are disjoint; otherwise, false.</returns>
public bool IsDisjointFrom(CustomSet<T> other)
{
if (other == null || this.IsEmpty() || other.IsEmpty()) return true;
// No element in 'this' should be in 'other'.
return !_elements.Any(other.Contains);
}
/// <summary>
/// Determines whether two sets contain the same elements.
/// </summary>
/// <param name="other">The set to compare to the current set.</param>
/// <returns>True if the sets are equal; otherwise, false.</returns>
public bool Equals(CustomSet<T> other)
{
if (other == null || this._elements.Count != other._elements.Count)
{
return false;
}
// A robust check for equality is that they are subsets of each other.
return this.IsSubset(other); // Given counts are same, one-way subset check is enough
}
/// <summary>
/// Returns a new set that contains elements present in both this set and the other set.
/// </summary>
/// <param name="other">The set to intersect with.</param>
/// <returns>A new CustomSet containing the common elements.</returns>
public CustomSet<T> Intersection(CustomSet<T> other)
{
var intersectionSet = new CustomSet<T>();
if (other == null) return intersectionSet;
foreach (var element in _elements)
{
if (other.Contains(element))
{
intersectionSet.Add(element);
}
}
return intersectionSet;
}
/// <summary>
/// Returns a new set with elements from the current set that are not in the other set.
/// </summary>
/// <param name="other">The set whose elements will be removed.</param>
/// <returns>A new CustomSet with the resulting elements.</returns>
public CustomSet<T> Difference(CustomSet<T> other)
{
var differenceSet = new CustomSet<T>();
if (other == null) return new CustomSet<T>(_elements);
foreach (var element in _elements)
{
if (!other.Contains(element))
{
differenceSet.Add(element);
}
}
return differenceSet;
}
/// <summary>
/// Returns a new set containing all unique elements present in both sets.
/// </summary>
/// <param name="other">The set to combine with.</param>
/// <returns>A new CustomSet with the union of elements.</returns>
public CustomSet<T> Union(CustomSet<T> other)
{
// Start with all elements from the current set.
var unionSet = new CustomSet<T>(_elements);
if (other != null)
{
// Add all elements from the other set. The Add method handles uniqueness.
foreach (var element in other._elements)
{
unionSet.Add(element);
}
}
return unionSet;
}
}
To compile and run a program using this class, you can create a simple Program.cs, save both files, and use the .NET CLI.
# In your terminal, navigate to the project directory
# Then run the application
dotnet run
Alternative Approaches and Performance Considerations
Our implementation using a List<T> is clear and functional, but it's not the most performant. The Contains method, which is used by many other methods, has a linear time complexity of O(n). This means that as the set grows, operations become progressively slower.
A More Performant Approach: Using a Dictionary
A much more efficient way to build a custom set is to use a Dictionary<T, bool> or Dictionary<T, object> as the backing store. The dictionary's key provides fast lookups.
Here's how the Add and Contains methods would change:
// C# - Alternative implementation using a Dictionary for O(1) lookups
public class FastCustomSet<T>
{
private readonly Dictionary<T, bool> _elements;
public FastCustomSet()
{
_elements = new Dictionary<T, bool>();
}
// Contains becomes an O(1) operation on average
public bool Contains(T element)
{
return _elements.ContainsKey(element);
}
// Add also becomes an O(1) operation on average
public void Add(T element)
{
// Dictionary handles uniqueness of keys automatically.
// We can use the indexer for a concise add/update.
_elements[element] = true;
}
// ... other methods would be adapted similarly ...
}
By leveraging the dictionary's hash-based structure, we change the average time complexity for Add, Contains, and Remove (if we were to implement it) from O(n) to O(1). This is a massive performance gain for large sets and is how the standard HashSet<T> works internally.
Where and When to Use a Custom Set
While you should default to using the battle-tested HashSet<T> for most production scenarios, understanding where a custom implementation fits is key.
Real-World Use Cases:
- Data Deduplication: Processing a stream of data from an API or a file and needing to store only the unique records encountered.
- Permission and Role Management: A user's roles can be stored in a set (e.g., {"Admin", "Editor"}). Checking permissions is a simple
set.Contains("Admin")call. - Feature Flag Tracking: Storing the set of unique features enabled for a particular user or session.
- Graph Algorithms: Tracking visited nodes in a graph traversal algorithm to avoid infinite loops. A set is perfect for storing the unique nodes that have already been processed.
Comparison: CustomSet vs. Built-in HashSet<T>
Let's summarize the trade-offs in a table.
| Feature | Our CustomSet (List-based) | .NET HashSet<T> |
|---|---|---|
| Performance (Add/Contains) | Poor - O(n) | Excellent - O(1) on average |
| Memory Usage | Generally lower for very small sets, as there's no hashing overhead. | Higher overhead due to the hash table structure, but more efficient for large sets. |
| Flexibility & Control | Maximum. You can add any custom logic, logging, or events. | Limited. You can't change its internal behavior. |
| Learning Value | Extremely high. Teaches core data structure concepts. | Low. It's a tool to be used, not a lesson in itself. |
| Production Readiness | Low. Not optimized or thoroughly tested for edge cases. | High. Fully optimized, tested, and maintained by Microsoft. |
The verdict is clear: use HashSet<T> in your projects. Build a CustomSet to become a better developer. For more in-depth C# knowledge, dive deeper into C# fundamentals on our platform.
Frequently Asked Questions (FAQ)
What is the best internal data structure for a custom set?
For performance, a hash-based structure like Dictionary<T, bool> is the best choice. It provides average O(1) time complexity for core operations like add, remove, and contains. A List<T> is simpler to implement and understand but suffers from O(n) performance, making it unsuitable for large sets.
How can I make my custom set work with my own custom objects?
To ensure your custom objects work correctly (especially with a dictionary-based set), you must override the Equals(object obj) and GetHashCode() methods in your custom class. This tells the collection how to determine if two of your objects are "equal" and how to hash them for efficient storage. Implementing the IEquatable<T> interface is also highly recommended.
Is the List<T>-based set ever a good choice in production?
It could be acceptable in very specific, niche scenarios where the number of elements is guaranteed to always be very small (e.g., less than 10-20). In such cases, the overhead of a hash table might not be worth it, and the simplicity of a list might be preferable. However, for any general-purpose use, it is not recommended.
How would you implement set operations like Union and Intersection efficiently?
For the most efficient implementation (with a hash-based set), you would iterate over the smaller of the two sets. For Intersection, you'd check for each element's existence in the larger set. For Union, you'd add all elements from the larger set to a new set, then iterate through the smaller set, adding only the missing elements. This minimizes the number of lookups.
Can this CustomSet be made thread-safe?
No, this implementation is not thread-safe. If multiple threads tried to add elements simultaneously, you could get race conditions leading to duplicate entries or other errors. To make it thread-safe, you would need to use locking mechanisms, such as the lock statement in C#, around any code that modifies the internal collection (e.g., inside the Add method).
Why is the CustomSet class generic (using <T>)?
Generics allow us to write a single class that can work with any data type in a type-safe manner. Without <T>, we would either have to create separate set classes for each type (IntSet, StringSet, etc.) or use object, which would require casting and lose compile-time type safety. Generics are a cornerstone of modern, reusable C# code.
What is the main difference between a set and a list?
The two primary differences are uniqueness and order. A Set guarantees that all its elements are unique and does not maintain any specific order. A List, on the other hand, allows duplicate elements and maintains them in the exact order they were inserted, accessible by an index.
Conclusion: From Theory to Mastery
We've journeyed from the theoretical definition of a set to a complete, hands-on C# implementation. By building a CustomSet from scratch, you've engaged with generics, algorithmic logic, and the critical performance trade-offs between different backing data structures. This exercise, a key part of the kodikra.com C# learning path, is designed to build the kind of deep, foundational knowledge that separates good developers from great ones.
While you'll likely use the built-in HashSet<T> in your day-to-day work for its superior performance and reliability, you now possess the invaluable insight into how it functions. You understand the "why" behind its design, empowering you to make better architectural decisions and tackle more complex programming challenges in the future.
Disclaimer: The code in this article is written and tested against .NET 8 and C# 12. While the core concepts are timeless, syntax and library features may evolve in future versions of the .NET framework.
Published by Kodikra — Your trusted Csharp learning resource.
Post a Comment