Pandas pd.unique()
* * *
`pd.unique()` is a function in the Pandas library used to **obtain the unique values in an array**. It returns each distinct value in the input exactly once, dropping the repeats.
This is a common operation in data analysis, such as counting how many different values are in a column, or getting all categories of a categorical variable.
**Word Definition**: `unique` means "only, distinctive", and here refers to returning non-repeating values in an array.
* * *
## Basic Syntax and Parameters
`pd.unique()` is a top-level function in the Pandas library used to extract unique values from an array.
### Syntax Format
pd.unique(values)
### Parameter Description
* **Parameter**: `values`
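* Type: One-dimensional array-like object, such as a Series, an Index, or a NumPy array. Plain Python lists are accepted by older pandas releases, but recent versions warn that passing them directly is deprecated.
* Description: Input data to extract unique values from. It must be one-dimensional.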
### Function Description
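* **Return Value**: Returns the unique values as an ndarray (NumPy array) for most inputs. Extension dtypes such as categorical come back as the matching pandas array type, and an Index input returns an Index.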
* **Effect**: Removes duplicate values from the input data, keeping each value only once.
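As a quick check of the return type, the following sketch (all variable names are illustrative) passes a NumPy array and a categorical Series to `pd.unique()`. Per the pandas documentation, ordinary inputs come back as an ndarray, while a categorical input keeps its own array type.

```python
import numpy as np
import pandas as pd

# Plain numeric input: the result is a NumPy ndarray
numeric_result = pd.unique(np.array([3, 1, 3, 2, 1]))
print(type(numeric_result).__name__, numeric_result.tolist())

# Categorical input: the result stays a pandas Categorical
categorical_series = pd.Series(pd.Categorical(["a", "b", "a"]))
categorical_result = pd.unique(categorical_series)
print(type(categorical_result).__name__, list(categorical_result))
```

If downstream code needs a plain NumPy array regardless of dtype, wrapping the result in `np.asarray()` is one option.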
* * *
## Examples
The examples below move from simple to more involved uses of `pd.unique()`.
### Example 1: Basic Usage - Extracting Unique Values from Series
import pandas as pd
import numpy as np
# 1. Create a Series with duplicate values
colors = pd.Series(['red','blue','green','red','blue','yellow','red'])
print("=== Original Series ===")
print(colors)
# 2. Use pd.unique() to get unique values
unique_values = pd.unique(colors)
print("\n=== pd.unique() Unique Values ===")
print(unique_values)
print(f"\nNumber of unique values: {len(unique_values)}")
**Expected Output:**
=== Original Series ===
0       red
1      blue
2     green
3       red
4      blue
5    yellow
6       red
dtype: object

=== pd.unique() Unique Values ===
['red' 'blue' 'green' 'yellow']

Number of unique values: 4
**Code Analysis:**
* The original Series has 7 elements, but only 4 unique values.
* `pd.unique()` returns a NumPy array containing all non-repeating values.
* The result is returned in order of first appearance. `pd.unique()` does not sort the values.
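The pandas documentation states that uniques are returned in order of appearance and are not sorted. The short sketch below contrasts this with `np.unique()`, which always returns sorted values; the sample data and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

values = np.array([3, 1, 3, 2, 1])

# pd.unique() keeps the order in which values are first seen
appearance_order = pd.unique(values)

# np.unique() returns the distinct values sorted ascending
sorted_order = np.unique(values)

print(appearance_order.tolist())  # [3, 1, 2]
print(sorted_order.tolist())      # [1, 2, 3]
```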
### Example 2: Extracting Unique Values from Lists and Arrays
`pd.unique()` handles not only Series but also other one-dimensional array structures.
import pandas as pd
import numpy as np
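# 1. Extract unique values from a list
# Recent pandas versions warn when a plain list is passed directly,
# so convert it to a NumPy array first.
numbers = [1, 2, 3, 2, 1, 4, 5, 3, 2]
print("=== Original List ===")
print(numbers)
unique_numbers = pd.unique(np.array(numbers))
print("\n=== Unique Values from List ===")
print(unique_numbers)
print(f"Type: {type(unique_numbers)}")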
# 2. Extract unique values from NumPy array
arr = np.array(['a','b','a','c','b','d'])
print("\n=== NumPy Array ===")
print(arr)
print("Unique values:", pd.unique(arr))
# 3. Extract unique values from DataFrame column
df = pd.DataFrame({
'name': ['Alice','Bob','Charlie','Alice','Diana'],
'city': ['Beijing','Shanghai','Beijing','Beijing','Guangzhou']
})
print("\n=== DataFrame ===")
print(df)
print("\nUnique values in name column:", pd.unique(df['name']))
print("Unique values in city column:", pd.unique(df['city']))
**Expected Output:**
=== Original List ===
[1, 2, 3, 2, 1, 4, 5, 3, 2]

=== Unique Values from List ===
[1 2 3 4 5]
Type: <class 'numpy.ndarray'>

=== NumPy Array ===
['a' 'b' 'a' 'c' 'b' 'd']
Unique values: ['a' 'b' 'c' 'd']

=== DataFrame ===
      name       city
0    Alice    Beijing
1      Bob   Shanghai
2  Charlie    Beijing
3    Alice    Beijing
4    Diana  Guangzhou

Unique values in name column: ['Alice' 'Bob' 'Charlie' 'Diana']
Unique values in city column: ['Beijing' 'Shanghai' 'Guangzhou']
**Code Analysis:**
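* The input can be a NumPy array or a DataFrame column (a Series). Plain Python lists work in older pandas releases, but recent versions warn that passing them directly is deprecated, so converting with `np.array()` or `pd.Series()` is the safer choice.
* For the inputs shown here, the result is a one-dimensional NumPy array.
* A DataFrame must be reduced to a single column first, for example with bracket syntax such as `df['name']`.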
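Because `pd.unique()` works on one-dimensional input, collecting unique values across several columns needs a flattening step. The sketch below is one way to do that, using a hypothetical two-column DataFrame (`home` and `work` are invented names); it also shows the method form `Series.unique()`, which gives the same values as `pd.unique()` for a single column.

```python
import pandas as pd

df = pd.DataFrame({
    "home": ["Beijing", "Shanghai", "Beijing"],
    "work": ["Shanghai", "Guangzhou", "Beijing"],
})

# Method form: same values as pd.unique(df["home"])
home_cities = df["home"].unique()

# Flatten both columns (row by row) into a 1-D array,
# then extract the unique values across them
all_cities = pd.unique(df[["home", "work"]].to_numpy().ravel())

print(list(home_cities))
print(list(all_cities))
```

Flattening with `ravel()` walks the data row by row, so the order of first appearance reflects that traversal rather than a column-by-column scan.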
### Example 3: Handling Numeric Data and Sorting
When handling numeric unique values, sorting and statistical analysis can be done conveniently.
import pandas as pd
import numpy as np
# 1. Series with duplicate numeric values
scores = pd.Series([85,90,78,85,92,90,78,88,95])
print("=== Original Scores ===")
print(scores)
# 2. Get unique values (in order of first appearance)
unique_scores = pd.unique(scores)
print("\n=== Unique Values (Unsorted) ===")
print(unique_scores)
# 3. Sorted unique values (np.unique returns its result sorted)
unique_sorted = np.unique(unique_scores)
print("\n=== Unique Values (Sorted) ===")
print(unique_sorted)
# 4. Count occurrences of each unique value
print("\n=== Unique Values and Frequencies ===")
value_counts = scores.value_counts()
print(value_counts)
# 5. Get statistical information for unique values
print("\n=== Basic Statistics ===")
print(f"Number of unique values: {len(unique_scores)}")
print(f"Minimum: {unique_scores.min()}")
print(f"Maximum: {unique_scores.max()}")
print(f"Average: {unique_scores.mean():.2f}")
**Expected Output:**
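=== Original Scores ===
0    85
1    90
2    78
3    85
4    92
5    90
6    78
7    88
8    95
dtype: int64

=== Unique Values (Unsorted) ===
[85 90 78 92 88 95]

=== Unique Values (Sorted) ===
[78 85 88 90 92 95]

=== Unique Values and Frequencies ===
85    2
90    2
78    2
92    1
88    1
95    1
dtype: int64

=== Basic Statistics ===
Number of unique values: 6
Minimum: 78
Maximum: 95
Average: 88.00

85, 90, and 78 each appear 2 times; 92, 88, and 95 each appear 1 time. Rows with equal counts may be listed in a different order, and pandas 2.0 and later also print `Name: count` on the last line of the frequency table.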
**Code Analysis:**
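* `np.unique()` returns its result in ascending order, so it can be used to sort the unique values; `np.sort()` is a more direct alternative.
* Combining with `value_counts()` allows you to see the occurrence count of each unique value.
* The unique values array is a NumPy array, so it supports statistical methods such as `min()`, `max()`, and `mean()`.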
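As a cross-check on the points above, the following sketch recomputes the distinct scores three ways and confirms they agree. It assumes the same `scores` data as Example 3; `np.sort()` and `Series.nunique()` are swapped in here as alternatives and are not part of the original example.

```python
import numpy as np
import pandas as pd

scores = pd.Series([85, 90, 78, 85, 92, 90, 78, 88, 95])

unique_scores = pd.unique(scores)

# np.sort() returns an ascending copy of the unique values
sorted_scores = np.sort(unique_scores)

# nunique() gives the number of distinct values (NaN excluded by default)
distinct_count = scores.nunique()

# value_counts() has one row per distinct value
counts = scores.value_counts()

print(sorted_scores.tolist())
print(distinct_count, len(unique_scores), len(counts))
```

All three counts should match, which is a cheap sanity check when the expected number of categories is known in advance.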