## PyTorch `torch.unique_consecutive` API Reference
In PyTorch, `torch.unique_consecutive` is a utility function designed to eliminate consecutive duplicate elements along a given dimension of a tensor. Unlike `torch.unique`, which removes all duplicates and returns sorted unique values across the entire tensor, `torch.unique_consecutive` only deduplicates elements that are adjacent (consecutive) to each other.
This behavior is highly useful for tasks like run-length encoding (RLE), processing sequential data, or handling state transitions in sequence-to-sequence models (such as CTC decoding in speech recognition).
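As a rough illustration of the two use cases mentioned above, the sketch below run-length encodes a short label sequence and then applies a greedy CTC-style collapse. The `frames` tensor and the choice of `0` as the blank label are assumptions made for this example only; real CTC decoders usually start from per-frame `argmax` predictions.

```python
import torch

# Hypothetical frame-level labels; 0 is assumed to be the CTC blank symbol.
BLANK = 0
frames = torch.tensor([0, 3, 3, 0, 0, 5, 5, 5, 0, 3])

# Run-length encoding: one value and one length per run of equal elements.
values, lengths = torch.unique_consecutive(frames, return_counts=True)

# Decoding the run-length encoding restores the original sequence.
decoded = torch.repeat_interleave(values, lengths)

# Greedy CTC-style collapse: merge repeats first, then drop the blanks.
collapsed = values[values != BLANK]

print(values)     # tensor([0, 3, 0, 5, 0, 3])
print(lengths)    # tensor([1, 2, 2, 3, 1, 1])
print(collapsed)  # tensor([3, 5, 3])
```

Merging repeats before removing blanks matters here: the two `3` labels separated by a blank stay distinct in the collapsed result.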
---
## Function Definition
```python
torch.unique_consecutive(input, return_inverse=False, return_counts=False, dim=None) -> Tensor or Tuple[Tensor, Tensor, Tensor]
```
### Parameters
| Parameter | Type | Description | Default |
| :--- | :--- | :--- | :--- |
| `input` | *Tensor* | The input tensor to be processed. | *Required* |
| `return_inverse` | *bool* | If `True`, returns an inverse tensor containing indices to reconstruct the original input from the unique output tensor. | `False` |
| `return_counts` | *bool* | If `True`, returns the count of occurrences for each consecutive unique element. | `False` |
| `dim` | *int* | The dimension along which to apply the unique operation. If `None`, the input tensor is flattened first. | `None` |
### Return Value
Returns a single tensor or a tuple of tensors depending on the input arguments:
1. **`output`**: The tensor containing consecutive unique elements.
2. **`inverse_indices`** *(Optional, if `return_inverse=True`)*: A tensor of indices giving, for each element of the input (or each slice along `dim`), its position in `output`. Indexing `output` with these indices reproduces the input.
3. **`counts`** *(Optional, if `return_counts=True`)*: A tensor containing the run-length counts of each consecutive unique element.
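
When both flags are enabled, the results come back as a tuple in the order listed above: `output`, `inverse_indices`, `counts`. A minimal sketch, using a made-up input tensor `x`, that unpacks all three and checks how they relate to each other:

```python
import torch

# Illustrative input; the values are arbitrary.
x = torch.tensor([7, 7, 4, 4, 4, 7])

output, inverse_indices, counts = torch.unique_consecutive(
    x, return_inverse=True, return_counts=True
)

print(output)           # tensor([7, 4, 7])
print(inverse_indices)  # tensor([0, 0, 1, 1, 1, 2])
print(counts)           # tensor([2, 3, 1])

# Indexing the output with the inverse indices rebuilds the input,
# and the run lengths add up to the number of input elements.
assert torch.equal(output[inverse_indices], x)
assert counts.sum().item() == x.numel()
```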
---
## Code Examples
### Example 1: Basic Usage and Counts Retrieval
This example demonstrates how to extract consecutive unique values from a 1D tensor and retrieve their respective run-length counts.
```python
import torch
# Create a 1D tensor with consecutive duplicate values
x = torch.tensor([1, 1, 1, 2, 2, 3, 3, 3, 1, 1])
# 1. Get consecutive unique values
unique_vals = torch.unique_consecutive(x)
print(f"Consecutive unique values: {unique_vals}")
# Output: tensor([1, 2, 3, 1])
# 2. Get consecutive unique values along with their counts
unique_vals, counts = torch.unique_consecutive(x, return_counts=True)
print(f"Unique values: {unique_vals}")
print(f"Counts: {counts}")
# Output:
# Unique values: tensor([1, 2, 3, 1])
# Counts: tensor([3, 2, 3, 2])
```
### Example 2: Reconstructing the Original Tensor using `return_inverse`
By setting `return_inverse=True`, you can obtain the mapping indices required to reconstruct the original input tensor from the deduplicated output.
```python
import torch
x = torch.tensor([5, 5, 8, 8, 8, 2, 2, 5])
# Get unique values and inverse indices
unique_vals, inverse_indices = torch.unique_consecutive(x, return_inverse=True)
print(f"Original: {x}")
print(f"Unique: {unique_vals}")
print(f"Inverse Indices: {inverse_indices}")
# Output:
# Original: tensor([5, 5, 8, 8, 8, 2, 2, 5])
# Unique: tensor([5, 8, 2, 5])
# Inverse Indices: tensor([0, 0, 1, 1, 1, 2, 2, 3])
# Reconstruct the original tensor
reconstructed = unique_vals[inverse_indices]
print(f"Reconstructed: {reconstructed}")
# Output: tensor([5, 5, 8, 8, 8, 2, 2, 5])
```
### Example 3: Multi-dimensional Tensors along a Specific Dimension
When working with multi-dimensional tensors, you can specify the `dim` parameter to deduplicate entire slices (rows or columns) that are consecutively identical.
```python
import torch
# Create a 2D tensor (4x2) where some rows are consecutive duplicates
y = torch.tensor([
    [1, 2],
    [1, 2],
    [3, 4],
    [1, 2]
])
# Deduplicate consecutive rows (dim=0)
unique_rows = torch.unique_consecutive(y, dim=0)
print("Original 2D Tensor:")
print(y)
print("\nUnique Consecutive Rows:")
print(unique_rows)
# Output (unique consecutive rows):
# tensor([[1, 2],
#         [3, 4],
#         [1, 2]])
```
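The same idea applies to columns. The sketch below, using a made-up 2x5 tensor, passes `dim=1` together with `return_counts=True`, so each count refers to a run of identical columns rather than to individual elements.

```python
import torch

# Illustrative 2x5 tensor whose columns repeat in runs.
z = torch.tensor([
    [1, 1, 2, 2, 1],
    [3, 3, 4, 4, 3],
])

# Deduplicate consecutive columns (dim=1) and count the length of each run.
unique_cols, col_counts = torch.unique_consecutive(z, return_counts=True, dim=1)

print(unique_cols)
# tensor([[1, 2, 1],
#         [3, 4, 3]])
print(col_counts)
# tensor([2, 2, 1])
```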
---
## Key Considerations
* **Difference from `torch.unique`**:
  * `torch.unique` sorts the output and removes *all* duplicates globally (e.g., `[1, 1, 2, 1]` becomes `[1, 2]`).
  * `torch.unique_consecutive` preserves the original order of appearance and only merges adjacent duplicates (e.g., `[1, 1, 2, 1]` becomes `[1, 2, 1]`).
* **Performance**: Because `torch.unique_consecutive` only compares adjacent elements and does not need to sort the tensor, it is typically faster and lighter on memory than `torch.unique` for large inputs.
* **Dimensionality**: If `dim=None`, the input tensor is flattened to a 1D tensor before processing. If you want to preserve the structure of a multi-dimensional tensor, always specify the target `dim`.
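
The points above can be checked on small inputs. In this sketch the tensors `m` and `x` are made up for illustration: the first part contrasts the default flattening with `dim=0`, and the second shows that sorting a 1D tensor before calling `torch.unique_consecutive` yields the same values as `torch.unique`.

```python
import torch

m = torch.tensor([
    [1, 1],
    [1, 2],
    [2, 2],
])

# dim=None: m is flattened to [1, 1, 1, 2, 2, 2] before deduplication.
flat = torch.unique_consecutive(m)
print(flat)  # tensor([1, 2])

# dim=0: whole rows are compared; no two adjacent rows match, so nothing is removed.
rows = torch.unique_consecutive(m, dim=0)
print(rows.shape)  # torch.Size([3, 2])

# Sorting first makes all equal values adjacent, so the result matches torch.unique.
x = torch.tensor([1, 1, 2, 1])
sorted_x, _ = torch.sort(x)
print(torch.unique_consecutive(x))         # tensor([1, 2, 1])
print(torch.unique_consecutive(sorted_x))  # tensor([1, 2])
print(torch.unique(x))                     # tensor([1, 2])
```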