Python Median Array
## Finding the Median of an Array in Python
The median is a fundamental statistical measure that represents the middle value of a dataset. When the dataset is sorted in ascending or descending order:
* If the number of elements is **odd**, the median is the single middle element.
* If the number of elements is **even**, the median is the average (mean) of the two middle elements.
This tutorial demonstrates how to calculate the median of an array (or list) in Python using custom logic, Python's built-in libraries, and external scientific computing packages.
---
## 1. Custom Implementation (Pure Python)
To find the median without importing external libraries, you must first sort the array, determine its length, and apply the median formula based on whether the length is odd or even.
### Code Example
```python
def find_median(nums):
# 1. Sort the array in ascending order
nums_sorted = sorted(nums)
n = len(nums_sorted)
# 2. Find the floor division index of the middle element
mid = n // 2
# 3. Check if the array length is odd or even
if n % 2 == 1:
# Odd length: return the middle element
return nums_sorted
else:
# Even length: return the average of the two middle elements
return (nums_sorted + nums_sorted) / 2
# Example 1: Odd number of elements
nums_odd = [3, 5, 1, 4, 2]
median_odd = find_median(nums_odd)
print(f"The median of {nums_odd} is: {median_odd}")
# Example 2: Even number of elements
nums_even = [3, 5, 1, 4, 2, 6]
median_even = find_median(nums_even)
print(f"The median of {nums_even} is: {median_even}")
```
### Output
```text
The median of [3, 5, 1, 4, 2] is: 3
The median of [3, 5, 1, 4, 2, 6] is: 3.5
```
### Code Explanation
1. **`sorted(nums)`**: Returns a new sorted list from the elements of the input array. Sorting is a prerequisite because the median is defined only on ordered data.
2. **`n = len(nums_sorted)`**: Retrieves the total number of elements.
3. **`mid = n // 2`**: Uses floor division to find the middle index. For an odd length of `5`, `5 // 2` is `2` (the 3rd element). For an even length of `6`, `6 // 2` is `3` (the 4th element).
4. **Odd Length Check (`n % 2 == 1`)**: Returns the element at index `mid`.
5. **Even Length Check**: Calculates the average of the elements at index `mid - 1` and `mid`.
---
## 2. Using Python's Built-in `statistics` Module
Python 3.4 and later includes a built-in `statistics` module that provides a highly optimized and readable way to calculate the median.
### Code Example
```python
import statistics
nums = [3, 5, 1, 4, 2, 6]
# Calculate the standard median
median_val = statistics.median(nums)
print(f"Standard Median: {median_val}")
# Additional variations provided by the statistics module:
# returns the low median if even
median_low = statistics.median_low(nums)
# returns the high median if even
median_high = statistics.median_high(nums)
print(f"Low Median: {median_low}")
print(f"High Median: {median_high}")
```
### Output
```text
Standard Median: 3.5
Low Median: 3
High Median: 4
```
---
## 3. Using NumPy (For Large Datasets)
For data science, machine learning, or large-scale numerical computations, the **NumPy** library is the industry standard. It is implemented in C, making it significantly faster than pure Python loops for large arrays.
### Code Example
```python
import numpy as np
nums = [3, 5, 1, 4, 2, 6]
# Convert list to a NumPy array and calculate the median
median_val = np.median(nums)
print(f"NumPy Median: {median_val}")
```
### Output
```text
NumPy Median: 3.5
```
---
## Considerations & Best Practices
| Method | Complexity | Best Used For | Dependencies |
| :--- | :--- | :--- | :--- |
| **Custom Function** | $O(N \log N)$ | Learning algorithms / interview preparation | None (Standard Library) |
| **`statistics.median`** | $O(N \log N)$ | General-purpose Python applications | None (Standard Library) |
| **`numpy.median`** | $O(N \log N)$ | Data analysis, scientific computing, large datasets | `numpy` package |
* **Performance**: All three methods require sorting the array, which results in a time complexity of $O(N \log N)$. However, NumPy is highly optimized for memory and execution speed when handling millions of data points.
* **In-place Sorting**: The custom implementation uses `sorted(nums)`, which creates a copy of the list. If you want to save memory and do not mind modifying the original list, you can use `nums.sort()` instead.
YouTip