
Pandas Pd Unique

`pd.unique()` is a function in the Pandas library used to **obtain the unique values in an array**. It returns every distinct value in the input, with duplicates removed. This is a common operation in data analysis, such as counting how many different values a column contains or listing all categories of a categorical variable.

**Word Definition**: `unique` means "only, distinctive", and here refers to returning the non-repeating values in an array.

* * *

## Basic Syntax and Parameters

`pd.unique()` is a top-level function in the Pandas library used to extract unique values from an array.

### Syntax Format

```python
pd.unique(values)
```

### Parameter Description

* **Parameter**: `values`
  * Type: One-dimensional array-like object, such as a Series, an Index, or a NumPy array. Plain Python lists have also been accepted, but recent pandas releases deprecate passing them directly, so converting a list with `np.array()` or `pd.Series()` first is the safer choice.
  * Description: Input data to extract unique values from. It must be one-dimensional.

### Function Description

* **Return Value**: Returns a NumPy ndarray for Series and ndarray inputs with ordinary dtypes. An Index input returns an Index, and categorical or other extension dtypes return the corresponding array type.
* **Effect**: Removes duplicate values from the input data, keeping each value only once, in order of first appearance. The result is not sorted.

* * *

## Examples

Let's work through the usage of `pd.unique()` with a series of examples, from simple to complex.

### Example 1: Basic Usage - Extracting Unique Values from Series

```python
import pandas as pd
import numpy as np

# 1. Create a Series with duplicate values
colors = pd.Series(['red', 'blue', 'green', 'red', 'blue', 'yellow', 'red'])
print("=== Original Series ===")
print(colors)

# 2. Use pd.unique() to get unique values
unique_values = pd.unique(colors)
print("\n=== pd.unique() Unique Values ===")
print(unique_values)
print(f"\nNumber of unique values: {len(unique_values)}")
```

**Expected Output:**

```
=== Original Series ===
0       red
1      blue
2     green
3       red
4      blue
5    yellow
6       red
dtype: object

=== pd.unique() Unique Values ===
['red' 'blue' 'green' 'yellow']

Number of unique values: 4
```

**Code Analysis:**

* The original Series has 7 elements, but only 4 unique values.
* `pd.unique()` returns a NumPy array containing all non-repeating values.
* The result follows the order of first appearance. It is not sorted.

### Example 2: Extracting Unique Values from Lists and Arrays

`pd.unique()` handles not only Series but also other one-dimensional array structures.

```python
import pandas as pd
import numpy as np

# 1. Extract unique values from a list
#    (converted to a NumPy array first, because recent pandas
#    versions deprecate passing a plain list directly)
numbers = [1, 2, 3, 2, 1, 4, 5, 3, 2]
print("=== Original List ===")
print(numbers)
unique_numbers = pd.unique(np.array(numbers))
print("\n=== Unique Values from List ===")
print(unique_numbers)
print(f"Type: {type(unique_numbers)}")

# 2. Extract unique values from a NumPy array
arr = np.array(['a', 'b', 'a', 'c', 'b', 'd'])
print("\n=== NumPy Array ===")
print(arr)
print("Unique values:", pd.unique(arr))

# 3. Extract unique values from a DataFrame column
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Diana'],
    'city': ['Beijing', 'Shanghai', 'Beijing', 'Beijing', 'Guangzhou']
})
print("\n=== DataFrame ===")
print(df)
print("\nUnique values in name column:", pd.unique(df['name']))
print("Unique values in city column:", pd.unique(df['city']))
```

**Expected Output:**

```
=== Original List ===
[1, 2, 3, 2, 1, 4, 5, 3, 2]

=== Unique Values from List ===
[1 2 3 4 5]
Type: <class 'numpy.ndarray'>

=== NumPy Array ===
['a' 'b' 'a' 'c' 'b' 'd']
Unique values: ['a' 'b' 'c' 'd']

=== DataFrame ===
      name       city
0    Alice    Beijing
1      Bob   Shanghai
2  Charlie    Beijing
3    Alice    Beijing
4    Diana  Guangzhou

Unique values in name column: ['Alice' 'Bob' 'Charlie' 'Diana']
Unique values in city column: ['Beijing' 'Shanghai' 'Guangzhou']
```

**Code Analysis:**

* `pd.unique()` can handle list data (after conversion to an array), NumPy arrays, and DataFrame columns.
* In these examples the result is a one-dimensional NumPy array.
* `pd.unique()` expects one-dimensional input, so with a DataFrame you select a single column first (using bracket syntax).

### Example 3: Handling Numeric Data and Sorting

Numeric unique values are easy to sort and summarize.

```python
import pandas as pd
import numpy as np

# 1. Series with duplicate numeric values
scores = pd.Series([85, 90, 78, 85, 92, 90, 78, 88, 95])
print("=== Original Scores ===")
print(scores)

# 2. Get unique values (in order of first appearance)
unique_scores = pd.unique(scores)
print("\n=== Unique Values (Unsorted) ===")
print(unique_scores)

# 3. Sorted unique values (np.unique returns them in ascending order)
unique_sorted = np.unique(unique_scores)
print("\n=== Unique Values (Sorted) ===")
print(unique_sorted)

# 4. Count occurrences of each unique value
print("\n=== Unique Values and Frequencies ===")
value_counts = scores.value_counts()
print(value_counts)

# 5. Get statistical information for unique values
print("\n=== Basic Statistics ===")
print(f"Number of unique values: {len(unique_scores)}")
print(f"Minimum: {unique_scores.min()}")
print(f"Maximum: {unique_scores.max()}")
print(f"Average: {unique_scores.mean():.2f}")
```

**Expected Output:**

```
=== Original Scores ===
0    85
1    90
2    78
3    85
4    92
5    90
6    78
7    88
8    95
dtype: int64

=== Unique Values (Unsorted) ===
[85 90 78 92 88 95]

=== Unique Values (Sorted) ===
[78 85 88 90 92 95]

=== Unique Values and Frequencies ===
85    2
90    2
78    2
92    1
88    1
95    1
dtype: int64

=== Basic Statistics ===
Number of unique values: 6
Minimum: 78
Maximum: 95
Average: 88.00
```

85, 90, and 78 each appear 2 times; 92, 88, and 95 each appear 1 time. The order of tied rows in the `value_counts()` output and its footer line (pandas 2.x prints `Name: count, dtype: int64`) can differ between pandas versions.

**Code Analysis:**

* `np.unique()` returns unique values in ascending order, so it can be used to sort the result.
* Combining with `value_counts()` shows how many times each unique value occurs.
* The unique values array supports statistical operations such as `min()`, `max()`, and `mean()`.
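
As a supplementary sketch (the sample data below is hypothetical and not taken from the examples above), the following compares the ordering of `pd.unique()` with `np.unique()` and shows how missing values are treated. It assumes only the standard `Series.unique()` and `Series.nunique()` methods.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen so that the order of first
# appearance differs from the sorted order.
values = pd.Series([3, 1, 3, 2, 1])

first_seen = pd.unique(values)                # order of first appearance
sorted_unique = np.unique(values.to_numpy())  # ascending order
via_method = values.unique()                  # Series method, same result as pd.unique(values)

print(first_seen)     # [3 1 2]
print(sorted_unique)  # [1 2 3]

# A missing value is reported once by pd.unique(),
# while nunique() skips it unless dropna=False is passed.
with_gaps = pd.Series([1.0, np.nan, 1.0, np.nan])
uniques_with_nan = pd.unique(with_gaps)

print(len(uniques_with_nan))            # 2 (1.0 and NaN)
print(with_gaps.nunique())              # 1
print(with_gaps.nunique(dropna=False))  # 2
```

When only the number of distinct values is needed, `nunique()` avoids building the array first, but keep its default `dropna=True` in mind when comparing it with `len(pd.unique(...))`.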