## PyTorch torch.corrcoef
The `torch.corrcoef` function in PyTorch computes the Pearson correlation coefficient matrix of an input tensor. Each entry of this matrix measures the linear correlation between one pair of variables and lies between -1 and 1.
---
## Introduction to Pearson Correlation Coefficient
The Pearson correlation coefficient ($r$) between two variables $X$ and $Y$ is calculated as:
$$r = \frac{\text{cov}(X, Y)}{\sigma_X \sigma_Y}$$
Where:
* $\text{cov}(X, Y)$ is the covariance of $X$ and $Y$.
* $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$, respectively.
Each entry of the resulting matrix is one such coefficient, where:
* **1** indicates a perfect positive linear relationship.
* **-1** indicates a perfect negative linear relationship.
* **0** indicates no linear relationship.
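
As a minimal sketch of the formula above, the coefficient can be computed by hand and compared with `torch.corrcoef`. The helper `pearson_r` and the sample values are hypothetical, introduced only for illustration. The sketch assumes the sample (N - 1) form of both covariance and standard deviation, which is the default for `Tensor.std`.

```python
import torch

def pearson_r(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Pearson r of two 1D tensors, computed directly from the definition."""
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    n = x.numel()
    # Sample covariance: sum of products of deviations, divided by N - 1
    cov_xy = (x_centered * y_centered).sum() / (n - 1)
    # Tensor.std() uses the N - 1 denominator by default, matching cov_xy
    return cov_xy / (x.std() * y.std())

# Illustrative data (not from any real dataset)
x = torch.tensor([2.0, 4.0, 5.0, 7.0])
y = torch.tensor([1.0, 3.0, 2.0, 6.0])

r_manual = pearson_r(x, y)
r_builtin = torch.corrcoef(torch.stack([x, y]))[0, 1]

print(f"manual:  {r_manual.item():.4f}")
print(f"builtin: {r_builtin.item():.4f}")
```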
---
## Syntax
```python
torch.corrcoef(input)
```
### Parameters
| Parameter | Type | Description |
| :--- | :--- | :--- |
| `input` | *Tensor* | A 2D matrix containing multiple variables and observations, or a 1D vector representing a single variable. |
### Return Value
* **Tensor**: A correlation coefficient matrix. If the input is a 2D tensor of shape `(M, N)` (where $M$ represents the number of variables and $N$ represents the number of observations), the output will be a symmetric matrix of shape `(M, M)`.
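
Datasets are often stored the other way around, with one observation per row. The sketch below assumes such a hypothetical `(N, M)` layout filled with random values and transposes it so that variables sit in rows. It then checks the shape, symmetry, and unit diagonal of the result.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable

# Hypothetical dataset: 100 observations (rows) of 3 variables (columns)
samples = torch.randn(100, 3)

# torch.corrcoef expects variables in rows, so transpose to shape (3, 100)
corr = torch.corrcoef(samples.T)

print(corr.shape)  # torch.Size([3, 3])

# The matrix is symmetric, and every variable correlates perfectly with itself
is_symmetric = torch.allclose(corr, corr.T, atol=1e-6)
has_unit_diagonal = torch.allclose(corr.diagonal(), torch.ones(3), atol=1e-6)
print("symmetric:", is_symmetric)
print("unit diagonal:", has_unit_diagonal)
```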
---
## Code Examples
### Example 1: Computing the Correlation Matrix of a 2D Tensor
This example demonstrates how to compute the correlation matrix for a 2D tensor where each row represents a variable and each column represents an observation.
```python
import torch
# Create a 2D tensor (3 variables, 4 observations each)
x = torch.tensor([[1.0, 2.0, 3.0, 4.0],
                  [2.0, 3.0, 4.0, 5.0],
                  [3.0, 4.0, 5.0, 6.0]])
print("Input shape:", x.shape)
print("Input Tensor:")
print(x)
# Compute the correlation coefficient matrix
corr = torch.corrcoef(x)
print("\nCorrelation Coefficient Matrix:")
print(corr)
```
**Output:**
```text
Input shape: torch.Size([3, 4])
Input Tensor:
tensor([[1., 2., 3., 4.],
        [2., 3., 4., 5.],
        [3., 4., 5., 6.]])

Correlation Coefficient Matrix:
tensor([[1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000]])
```
*Note: Each row is an exact increasing linear function of the others (here they differ only by a constant offset), so every pair is perfectly positively correlated and the matrix is all $1.0$.*
---
### Example 2: Correlation Between Two Specific Variables
To find the correlation between two 1D tensors (variables), stack them into a 2D tensor of shape `(2, N)` before passing them to `torch.corrcoef`.
```python
import torch
# Define two variables
a = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])
b = torch.tensor([2.0, 4.0, 6.0, 8.0, 10.0])
# Stack them to create a (2, N) shape tensor
data = torch.stack([a, b])
# Compute the correlation matrix
corr2 = torch.corrcoef(data)
print("Correlation Matrix:")
print(corr2)
# Extract the correlation coefficient between variable 'a' and 'b'
r_ab = corr2[0, 1]
print(f"\nCorrelation coefficient between a and b: {r_ab.item():.4f}")
# Output is 1.0000, indicating a perfect positive linear correlation
```
**Output:**
```text
Correlation Matrix:
tensor([[1.0000, 1.0000],
        [1.0000, 1.0000]])

Correlation coefficient between a and b: 1.0000
```
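Both examples above produce only perfect positive correlations. As a complementary sketch with made-up values, the following stacks three variables so that the matrix also contains a perfect negative correlation and a partial one.

```python
import torch

# Three illustrative variables, five observations each
a = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])
b = torch.tensor([5.0, 4.0, 3.0, 2.0, 1.0])  # a reversed: perfect negative relationship
c = torch.tensor([1.0, 3.0, 2.0, 5.0, 4.0])  # rises with a, but not exactly linearly

corr3 = torch.corrcoef(torch.stack([a, b, c]))
print(corr3)

r_ab = corr3[0, 1].item()  # approximately -1.0
r_ac = corr3[0, 2].item()  # approximately 0.8
r_bc = corr3[1, 2].item()  # approximately -0.8
print(f"r(a, b) = {r_ab:.4f}")
print(f"r(a, c) = {r_ac:.4f}")
print(f"r(b, c) = {r_bc:.4f}")
```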
---
## Important Considerations
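1. **Data Type**: `torch.corrcoef` is intended for floating-point data (e.g., `torch.float32` or `torch.float64`). How integer tensors are handled can depend on the PyTorch version (they may be promoted to floating point rather than rejected), so do not rely on an error being raised. Cast your input explicitly using `.float()` or `.double()` to keep the dtype and precision predictable.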
2. **Input Dimensions**: The input tensor must be 1D or 2D.
    * A 1D tensor of shape `(N,)` is treated as a single variable with $N$ observations. The result is a scalar (0-dimensional) tensor, `tensor(1.)` for non-constant data, rather than a $1 \times 1$ matrix.
    * A 2D tensor of shape `(M, N)` is treated as $M$ variables with $N$ observations.
3. **NaN Values**: If any variable has zero variance (i.e., all elements in a row are identical), its standard deviation is zero. The normalization then divides by zero, producing `nan` (Not a Number) in that variable's row and column of the output matrix, including its diagonal entry.
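
The considerations above can be sketched as follows. The tensors are illustrative. Filtering out constant rows with a variance mask is one possible guard, assumed here for demonstration and not something `torch.corrcoef` does for you.

```python
import torch

# Integer-valued data: cast explicitly to a floating-point dtype first
counts = torch.tensor([[1, 2, 3, 4],
                       [4, 3, 2, 1]])
corr = torch.corrcoef(counts.float())
print(corr)  # off-diagonal entries are approximately -1

# A constant row has zero variance, so its correlations are undefined (nan)
with_constant = torch.tensor([[1.0, 2.0, 3.0, 4.0],
                              [5.0, 5.0, 5.0, 5.0]])
corr_nan = torch.corrcoef(with_constant)
print(corr_nan)

# Hypothetical guard: flag rows with non-zero variance before correlating
has_variance = with_constant.var(dim=1) > 0
print(has_variance)  # tensor([ True, False])
```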