
Pytorch Torch Unique_Consecutive

## PyTorch `torch.unique_consecutive` API Reference

In PyTorch, `torch.unique_consecutive` removes consecutive duplicate elements along a given dimension of a tensor. Unlike `torch.unique`, which removes all duplicates across the entire tensor and (by default) returns the unique values in sorted order, `torch.unique_consecutive` only merges elements that are adjacent to each other. This makes it useful for run-length encoding (RLE), processing sequential data, and handling state transitions in sequence models, for example collapsing repeated labels during greedy CTC decoding in speech recognition.

---

## Function Definition

```python
torch.unique_consecutive(input, return_inverse=False, return_counts=False, dim=None) -> Tensor or tuple of Tensors
```

### Parameters

| Parameter | Type | Description | Default |
| :--- | :--- | :--- | :--- |
| `input` | *Tensor* | The input tensor to be processed. | *Required* |
| `return_inverse` | *bool* | If `True`, also returns a tensor of indices that maps each element of the input to its position in the output, which can be used to reconstruct the input. | `False` |
| `return_counts` | *bool* | If `True`, also returns the length of each run of consecutive equal elements. | `False` |
| `dim` | *int* | The dimension along which to apply the operation. If `None`, the input tensor is flattened first. | `None` |

### Return Value

Returns a single tensor if both `return_inverse` and `return_counts` are `False`; otherwise returns a tuple in the following order, containing only the requested optional outputs:

1. **`output`**: The tensor containing the consecutive unique elements.
2. **`inverse_indices`** *(optional, if `return_inverse=True`)*: A tensor of indices mapping each element of the original input to its position in `output`. When `dim=None`, it has the same shape as the input, and `output[inverse_indices]` reproduces the input.
3. **`counts`** *(optional, if `return_counts=True`)*: A tensor containing the run length of each consecutive unique element.
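The run-length encoding and CTC use cases mentioned above can be sketched in a few lines. This is a minimal sketch rather than a library API: `rle_encode`, `rle_decode`, and `ctc_greedy_collapse` are hypothetical helper names, and the CTC helper assumes the blank label has ID `0`. Decoding uses the existing `torch.repeat_interleave` function, and the sketch also shows the tuple order when `return_inverse` and `return_counts` are both set.

```python
import torch
from typing import Tuple


def rle_encode(x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
    """Hypothetical helper: run-length encode a 1D tensor into (values, run_lengths)."""
    values, counts = torch.unique_consecutive(x, return_counts=True)
    return values, counts


def rle_decode(values: torch.Tensor, counts: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: invert rle_encode by repeating each value by its run length."""
    return torch.repeat_interleave(values, counts)


def ctc_greedy_collapse(frame_ids: torch.Tensor, blank: int = 0) -> torch.Tensor:
    """Hypothetical helper: collapse repeated frame labels, then drop the blank label
    (assumed here to be ID 0)."""
    collapsed = torch.unique_consecutive(frame_ids)
    return collapsed[collapsed != blank]


x = torch.tensor([4, 4, 4, 7, 7, 4])
values, counts = rle_encode(x)
print(values, counts)  # tensor([4, 7, 4]) tensor([3, 2, 1])
assert torch.equal(rle_decode(values, counts), x)

# Requesting everything returns (output, inverse_indices, counts) in that order
out, inverse, cnt = torch.unique_consecutive(x, return_inverse=True, return_counts=True)
print(inverse)  # tensor([0, 0, 0, 1, 1, 2])
assert torch.equal(out[inverse], x)

# Per-frame label IDs, e.g. from an argmax over model outputs
frames = torch.tensor([0, 3, 3, 0, 0, 3, 5, 5, 0])
print(ctc_greedy_collapse(frames))  # tensor([3, 3, 5])
```

Because blanks are removed only after collapsing, a blank between two identical labels keeps them as separate tokens, which is why the result contains two `3`s.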
---

## Code Examples

### Example 1: Basic Usage and Counts Retrieval

This example demonstrates how to extract consecutive unique values from a 1D tensor and retrieve their respective run-length counts.

```python
import torch

# Create a 1D tensor with consecutive duplicate values
x = torch.tensor([1, 1, 1, 2, 2, 3, 3, 3, 1, 1])

# 1. Get consecutive unique values
unique_vals = torch.unique_consecutive(x)
print(f"Consecutive unique values: {unique_vals}")
# Output: Consecutive unique values: tensor([1, 2, 3, 1])

# 2. Get consecutive unique values along with their counts
unique_vals, counts = torch.unique_consecutive(x, return_counts=True)
print(f"Unique values: {unique_vals}")
print(f"Counts: {counts}")
# Output:
# Unique values: tensor([1, 2, 3, 1])
# Counts: tensor([3, 2, 3, 2])
```

### Example 2: Reconstructing the Original Tensor with `return_inverse`

By setting `return_inverse=True`, you obtain the indices needed to reconstruct the original input tensor from the deduplicated output: indexing the output with these indices restores the original sequence.

```python
import torch

x = torch.tensor([5, 5, 8, 8, 8, 2, 2, 5])

# Get unique values and inverse indices
unique_vals, inverse_indices = torch.unique_consecutive(x, return_inverse=True)
print(f"Original: {x}")
print(f"Unique: {unique_vals}")
print(f"Inverse Indices: {inverse_indices}")
# Output:
# Original: tensor([5, 5, 8, 8, 8, 2, 2, 5])
# Unique: tensor([5, 8, 2, 5])
# Inverse Indices: tensor([0, 0, 1, 1, 1, 2, 2, 3])

# Reconstruct the original tensor by indexing the unique values with the inverse indices
reconstructed = unique_vals[inverse_indices]
print(f"Reconstructed: {reconstructed}")
# Output: Reconstructed: tensor([5, 5, 8, 8, 8, 2, 2, 5])
assert torch.equal(reconstructed, x)
```

### Example 3: Multi-dimensional Tensors along a Specific Dimension

When working with multi-dimensional tensors, you can specify the `dim` parameter to deduplicate entire slices (rows or columns) that are consecutively identical.
```python
import torch

# Create a 2D tensor (4x2) in which the first two rows are consecutive duplicates
y = torch.tensor([
    [1, 2],
    [1, 2],
    [3, 4],
    [1, 2],
])

# Deduplicate consecutive rows (dim=0)
unique_rows = torch.unique_consecutive(y, dim=0)

print("Original 2D Tensor:")
print(y)
print("\nUnique Consecutive Rows:")
print(unique_rows)
# Output:
# tensor([[1, 2],
#         [3, 4],
#         [1, 2]])
```

The final `[1, 2]` row is kept because it is not adjacent to the first two.

---

## Key Considerations

* **Difference from `torch.unique`**:
  * `torch.unique` removes *all* duplicates globally and, by default, sorts the output (e.g., `[1, 1, 2, 1]` becomes `[1, 2]`).
  * `torch.unique_consecutive` preserves the original order and only merges adjacent duplicates (e.g., `[1, 1, 2, 1]` becomes `[1, 2, 1]`).
* **Performance**: Because `torch.unique_consecutive` only compares neighboring elements and does not need to sort the tensor, it is typically faster and more memory-efficient than `torch.unique` on large inputs.
* **Dimensionality**: If `dim=None`, the input tensor is flattened to 1D before processing. To preserve the structure of a multi-dimensional tensor, specify the target `dim`.
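The two considerations above can be checked directly. This sketch uses only standard PyTorch calls (`torch.unique`, `torch.sort`, `torch.unique_consecutive`) on made-up input values. It also illustrates a common idiom: sorting first makes every group of equal values adjacent, so `torch.unique_consecutive` then produces the same result as `torch.unique`.

```python
import torch

x = torch.tensor([3, 1, 1, 2, 3, 3, 1])

# Global deduplication: torch.unique sorts by default
print(torch.unique(x))              # tensor([1, 2, 3])

# Adjacent-only deduplication that preserves the original order
print(torch.unique_consecutive(x))  # tensor([3, 1, 2, 3, 1])

# After sorting, equal values are adjacent, so both functions agree
sorted_x, _ = torch.sort(x)
assert torch.equal(torch.unique_consecutive(sorted_x), torch.unique(x))

# A 2D tensor whose first two columns are identical
m = torch.tensor([[1, 1, 2],
                  [4, 4, 5]])

# dim=None: flattened row by row to [1, 1, 2, 4, 4, 5], then deduplicated
print(torch.unique_consecutive(m).tolist())         # [1, 2, 4, 5]

# dim=1: whole columns are compared, so the duplicate column is merged
print(torch.unique_consecutive(m, dim=1).tolist())  # [[1, 2], [4, 5]]
```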