Union Find Basic
## Introduction to Union-Find (Disjoint-Set)
The **Union-Find** data structure (also known as a **Disjoint-Set** data structure) is a tree-based data structure designed to efficiently handle partitioning of a set into mutually disjoint (non-overlapping) subsets.
It primarily supports two operations:
1. **Find**: Determine which subset a particular element belongs to. This is typically used to check if two elements are in the same subset.
2. **Union**: Merge two distinct subsets into a single subset.
### Core Concept
The fundamental idea behind Union-Find is to represent a forest of trees using a single array (often named `parent` or `id`). Each tree represents a disjoint set, and the root node of the tree serves as the unique representative (or identifier) of that set. By finding the root node of any given element, we can instantly determine which set it belongs to.
---
## Use Cases and Applications
Union-Find is highly efficient and widely used in scenarios involving $N$ elements where we start with each element in its own singleton set. As the algorithm progresses, we merge sets based on specific relationships and repeatedly query whether elements belong to the same set.
### Why Use Union-Find?
While these operations sound simple, managing them with naive data structures (like lists or graphs) can lead to excessive memory consumption and poor time complexity when dealing with massive datasets. Union-Find solves these performance bottlenecks, making it ideal for:
* **Kruskal's Algorithm**: Finding the Minimum Spanning Tree (MST) of a graph.
* **Cycle Detection**: Checking for cycles in an undirected graph.
* **Network Connectivity**: Determining if paths exist between nodes in social networks, computer networks, or physical grids.
* **Image Processing**: Connected-component labeling.
---
## Basic Data Representation
To understand how Union-Find tracks relationships, let's look at how elements are mapped to their set identifiers using an array.
### Example 1: Block Partitioning
In this configuration, elements $0$ through $4$ belong to set `0`, and elements $5$ through $9$ belong to set `1`. This indicates that the first five elements are connected to each other, and the last five elements are connected to each other.
| Index ($i$) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **Value (`id`)** | **0** | **0** | **0** | **0** | **0** | **1** | **1** | **1** | **1** | **1** |
### Example 2: Interleaved Partitioning
In this configuration, even-indexed elements ($0, 2, 4, 6, 8$) belong to set `0`, while odd-indexed elements ($1, 3, 5, 7, 9$) belong to set `1`.
| Index ($i$) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **Value (`id`)** | **0** | **1** | **0** | **1** | **0** | **1** | **0** | **1** | **0** | **1** |
---
## Implementation and Code Examples
When initializing a Union-Find instance, we allocate an array of size $N$. Initially, no elements are merged, so each element points to itself as its own parent (representing $N$ independent singleton sets).
### Java Implementation
Below is the basic structure of the `UnionFind` class in Java:
```java
package runoob.union;
public class UnionFind {
// Array to store the parent/set ID of each element
private int[] id;
// Total number of elements
private int count;
/**
* Constructor to initialize the Union-Find data structure.
*
* @param n The number of elements.
*/
public UnionFind(int n) {
count = n;
id = new int;
// Initialization: Each element points to itself, meaning no elements are merged yet.
for (int i = 0; i < n; i++) {
id = i;
}
}
}
```
---
## Key Considerations and Optimizations
While the basic implementation above sets up the array structure, a production-ready Union-Find implementation requires efficient `find` and `union` methods. To prevent the trees from becoming too deep (which degrades performance), two key optimizations are typically applied:
1. **Path Compression (for `find`)**: During a `find` operation, we make traversed nodes point directly to the root node. This flattens the tree structure, ensuring subsequent queries run in near-constant time.
2. **Union by Rank/Size (for `union`)**: When merging two trees, we always attach the shorter tree under the root of the taller tree. This keeps the overall tree height as small as possible.
With these optimizations, the amortized time complexity per operation becomes $O(\alpha(N))$, where $\alpha$ is the Inverse Ackermann functionβwhich grows so slowly that it is practically $O(1)$ for all realistic inputs.
YouTip