In numerical analysis, a sparse matrix is a matrix in which the vast majority of elements are zero. Conversely, if most elements are non-zero, the matrix is considered dense.
Large sparse matrices frequently appear when solving linear models in scientific and engineering fields.
The image above shows a sparse matrix on the left, containing many 0 elements, and a dense matrix on the right, where most elements are not 0.
Let's look at a simple example:
The above sparse matrix contains only 9 non-zero elements and 26 zero elements. Its sparsity is 74%, and its density is 26%.
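The percentages above follow directly from the counts: sparsity is the share of zero elements and density is the share of non-zero elements. A minimal sketch of that arithmetic follows; the helper name `sparsity` and the small `demo` array are made up for illustration and are not part of SciPy.

```python
import numpy as np

def sparsity(matrix):
    """Fraction of entries equal to zero (hypothetical helper, not a SciPy function)."""
    matrix = np.asarray(matrix)
    return 1.0 - np.count_nonzero(matrix) / matrix.size

# Counts quoted in the text: 9 non-zero elements and 26 zero elements.
nonzero, zero = 9, 26
total = nonzero + zero
summary = f"sparsity = {zero / total:.0%}, density = {nonzero / total:.0%}"
print(summary)

# The same idea applied to a small made-up array (2 non-zero values out of 9).
demo = np.array([[0, 0, 3], [0, 0, 0], [1, 0, 0]])
print(f"demo sparsity = {sparsity(demo):.2f}")
```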
SciPy's scipy.sparse module provides functions for handling sparse matrices.
We primarily use the following two types of sparse matrices:
- CSC - Compressed Sparse Column, compressed by column.
- CSR - Compressed Sparse Row, compressed by row.
In this chapter, we will mainly use the CSR matrix.
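To make the difference between the two formats more concrete, the sketch below stores the same small array in both and prints the three internal arrays (`data`, `indices`, `indptr`). The array is chosen only for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

dense = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

csr = csr_matrix(dense)
csc = csc_matrix(dense)

# CSR: indices holds column numbers; indptr marks where each row begins in data.
print("CSR data   :", csr.data)
print("CSR indices:", csr.indices)
print("CSR indptr :", csr.indptr)

# CSC: indices holds row numbers; indptr marks where each column begins in data.
print("CSC data   :", csc.data)
print("CSC indices:", csc.indices)
print("CSC indptr :", csc.indptr)
```

In general, CSR tends to suit row slicing and matrix-vector products, while CSC tends to suit column slicing.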
CSR Matrix
We can create a CSR matrix by passing an array to the scipy.sparse.csr_matrix() function.
Example
Create a CSR matrix.
import numpy as np
from scipy.sparse import csr_matrix
arr = np.array([0,0,0,0,0,1,1,0,2])
print(csr_matrix(arr))
The output of the above code is:
(0, 5) 1
(0, 6) 1
(0, 8) 2
Result Explanation:
- First line: In the first row of the matrix (index 0), at the sixth position (index 5), there is a value of 1.
- Second line: In the first row of the matrix (index 0), at the seventh position (index 6), there is a value of 1.
- Third line: In the first row of the matrix (index 0), at the ninth position (index 8), there is a value of 2.
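Every coordinate above has row index 0 because a one-dimensional array is stored as a matrix with a single row. A short sketch that checks the shape and converts the result back to a dense array:

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([0, 0, 0, 0, 0, 1, 1, 0, 2])
mat = csr_matrix(arr)

print(mat.shape)      # the 1-D input becomes one row with 9 columns
print(mat.nnz)        # number of stored entries
print(mat.toarray())  # dense 2-D NumPy array with the original values
```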
CSR Matrix Methods
We can use the data attribute to view the stored data (excluding 0 elements):
Example
import numpy as np
from scipy.sparse import csr_matrix
arr = np.array([[0,0,0],[0,0,1],[1,0,2]])
print(csr_matrix(arr).data)
The output of the above code is:
[1 1 2]
Use the count_nonzero() method to count the total number of non-zero elements:
Example
import numpy as np
from scipy.sparse import csr_matrix
arr = np.array([[0,0,0],[0,0,1],[1,0,2]])
print(csr_matrix(arr).count_nonzero())
The output of the above code is:
3
Use the eliminate_zeros() method to remove explicitly stored zero entries from the matrix. The matrix in this example stores no zeros, so it is unchanged:
Example
import numpy as np
from scipy.sparse import csr_matrix
arr = np.array([[0,0,0],[0,0,1],[1,0,2]])
mat = csr_matrix(arr)
mat.eliminate_zeros()
print(mat)
The output of the above code is:
(1, 2) 1
(2, 0) 1
(2, 2) 2
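The effect of eliminate_zeros() only becomes visible when the matrix actually stores a zero. One way to get into that state, used here purely for illustration, is to overwrite a stored value through the data attribute:

```python
import numpy as np
from scipy.sparse import csr_matrix

mat = csr_matrix(np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]]))

# Overwrite one stored value with 0; its slot remains in the sparse structure.
mat.data[0] = 0
stored_before = mat.nnz               # counts stored entries, including the explicit 0
nonzero_before = mat.count_nonzero()  # counts only values that are really non-zero

mat.eliminate_zeros()                 # drops the explicit zero in place
stored_after = mat.nnz

print(stored_before, nonzero_before, stored_after)
print(mat)
```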
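Use the sum_duplicates() method to merge duplicate entries (entries stored at the same position) by adding their values together. The matrix in this example has no duplicates, so it is unchanged: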
Example
import numpy as np
from scipy.sparse import csr_matrix
arr = np.array([[0,0,0],[0,0,1],[1,0,2]])
mat = csr_matrix(arr)
mat.sum_duplicates()
print(mat)
The output of the above code is:
(1, 2) 1
(2, 0) 1
(2, 2) 2
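A matrix built from a dense array never contains duplicate entries, so the call above has nothing to do. The sketch below builds a CSR matrix directly from raw `data`, `indices` and `indptr` arrays (values chosen only for illustration) in which one position is listed twice, so the merge can be observed:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Raw CSR arrays for a 1x3 matrix in which column 0 appears twice in the only row.
data = np.array([1, 2, 3])
indices = np.array([0, 0, 2])
indptr = np.array([0, 3])
mat = csr_matrix((data, indices, indptr), shape=(1, 3))

stored_before = mat.nnz   # three stored entries, two of them at position (0, 0)
mat.sum_duplicates()      # merges them in place by adding their values
stored_after = mat.nnz

print(stored_before, stored_after)
print(mat.toarray())
```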
To convert a CSR matrix to a CSC matrix, use the tocsc() method:
Example
import numpy as np
from scipy.sparse import csr_matrix
arr = np.array([[0,0,0],[0,0,1],[1,0,2]])
newarr = csr_matrix(arr).tocsc()
print(newarr)
The output of the above code is:
(2, 0) 1
(1, 2) 1
(2, 2) 2
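The entries are now listed column by column, but the matrix itself is the same. A small sketch that checks this by comparing the dense forms and converting back with tocsr():

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
csr = csr_matrix(arr)
csc = csr.tocsc()

# The format attribute names the storage scheme of each matrix.
print(csr.format, "->", csc.format)

# Both formats represent exactly the same values.
same_values = np.array_equal(csr.toarray(), csc.toarray())
print(same_values)

# Converting back yields a CSR matrix again.
round_trip = csc.tocsr()
print(round_trip.format)
```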