YouTip LogoYouTip

Python Median Array

## Finding the Median of an Array in Python The median is a fundamental statistical measure that represents the middle value of a dataset. When the dataset is sorted in ascending or descending order: * If the number of elements is **odd**, the median is the single middle element. * If the number of elements is **even**, the median is the average (mean) of the two middle elements. This tutorial demonstrates how to calculate the median of an array (or list) in Python using custom logic, Python's built-in libraries, and external scientific computing packages. --- ## 1. Custom Implementation (Pure Python) To find the median without importing external libraries, you must first sort the array, determine its length, and apply the median formula based on whether the length is odd or even. ### Code Example ```python def find_median(nums): # 1. Sort the array in ascending order nums_sorted = sorted(nums) n = len(nums_sorted) # 2. Find the floor division index of the middle element mid = n // 2 # 3. Check if the array length is odd or even if n % 2 == 1: # Odd length: return the middle element return nums_sorted else: # Even length: return the average of the two middle elements return (nums_sorted + nums_sorted) / 2 # Example 1: Odd number of elements nums_odd = [3, 5, 1, 4, 2] median_odd = find_median(nums_odd) print(f"The median of {nums_odd} is: {median_odd}") # Example 2: Even number of elements nums_even = [3, 5, 1, 4, 2, 6] median_even = find_median(nums_even) print(f"The median of {nums_even} is: {median_even}") ``` ### Output ```text The median of [3, 5, 1, 4, 2] is: 3 The median of [3, 5, 1, 4, 2, 6] is: 3.5 ``` ### Code Explanation 1. **`sorted(nums)`**: Returns a new sorted list from the elements of the input array. Sorting is a prerequisite because the median is defined only on ordered data. 2. **`n = len(nums_sorted)`**: Retrieves the total number of elements. 3. **`mid = n // 2`**: Uses floor division to find the middle index. For an odd length of `5`, `5 // 2` is `2` (the 3rd element). For an even length of `6`, `6 // 2` is `3` (the 4th element). 4. **Odd Length Check (`n % 2 == 1`)**: Returns the element at index `mid`. 5. **Even Length Check**: Calculates the average of the elements at index `mid - 1` and `mid`. --- ## 2. Using Python's Built-in `statistics` Module Python 3.4 and later includes a built-in `statistics` module that provides a highly optimized and readable way to calculate the median. ### Code Example ```python import statistics nums = [3, 5, 1, 4, 2, 6] # Calculate the standard median median_val = statistics.median(nums) print(f"Standard Median: {median_val}") # Additional variations provided by the statistics module: # returns the low median if even median_low = statistics.median_low(nums) # returns the high median if even median_high = statistics.median_high(nums) print(f"Low Median: {median_low}") print(f"High Median: {median_high}") ``` ### Output ```text Standard Median: 3.5 Low Median: 3 High Median: 4 ``` --- ## 3. Using NumPy (For Large Datasets) For data science, machine learning, or large-scale numerical computations, the **NumPy** library is the industry standard. It is implemented in C, making it significantly faster than pure Python loops for large arrays. ### Code Example ```python import numpy as np nums = [3, 5, 1, 4, 2, 6] # Convert list to a NumPy array and calculate the median median_val = np.median(nums) print(f"NumPy Median: {median_val}") ``` ### Output ```text NumPy Median: 3.5 ``` --- ## Considerations & Best Practices | Method | Complexity | Best Used For | Dependencies | | :--- | :--- | :--- | :--- | | **Custom Function** | $O(N \log N)$ | Learning algorithms / interview preparation | None (Standard Library) | | **`statistics.median`** | $O(N \log N)$ | General-purpose Python applications | None (Standard Library) | | **`numpy.median`** | $O(N \log N)$ | Data analysis, scientific computing, large datasets | `numpy` package | * **Performance**: All three methods require sorting the array, which results in a time complexity of $O(N \log N)$. However, NumPy is highly optimized for memory and execution speed when handling millions of data points. * **In-place Sorting**: The custom implementation uses `sorted(nums)`, which creates a copy of the list. If you want to save memory and do not mind modifying the original list, you can use `nums.sort()` instead.
← Python Second LargestPython Square Elements β†’