## Python statistics.median() Method
`statistics.median()` is a function in the `statistics` module of Python's standard library. It calculates the median (middle value) of a numeric dataset.
Unlike the mean (average), the median is robust to outliers and skewed data, which makes it a useful measure of central tendency for many real-world datasets.
---
### Syntax
```python
statistics.median(data)
```
#### Parameters
* **`data`**: A sequence or iterable of numeric data (such as a `list`, `tuple`, or `range`). The data does not need to be sorted beforehand; the function sorts a copy internally and leaves the input unchanged.
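As a quick illustration of the parameter description above, the following sketch passes several kinds of iterable, including a generator expression, and confirms that the input list keeps its original order. The variable names are illustrative only.

```python
import statistics

values = [7, 1, 4]

# Lists, tuples, ranges, and generators are all accepted.
from_list = statistics.median(values)
from_tuple = statistics.median((7, 1, 4))
from_range = statistics.median(range(1, 10))                   # 1 through 9
from_generator = statistics.median(x * x for x in range(5))    # 0, 1, 4, 9, 16

print(from_list, from_tuple, from_range, from_generator)
print(values)  # still [7, 1, 4]; the input is not reordered
```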
#### Return Value
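* Returns the median of the dataset. With an odd number of data points, the middle value is returned unchanged, so its type matches the input (for example, an `int` for `int` data). With an even number, the result is the mean of the two middle values, which is a `float` for `int` or `float` data.
* If the dataset is empty, it raises a `statistics.StatisticsError`.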
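The return-value behavior can be checked with a short sketch like the one below. It prints the type of each result and catches the error raised for empty input; the `Fraction` case is included only as an assumption-free illustration that non-float numeric types are preserved.

```python
import statistics
from fractions import Fraction

odd_result = statistics.median([3, 1, 2])        # middle element, returned as-is
even_result = statistics.median([1, 2, 3, 4])    # mean of 2 and 3
fraction_result = statistics.median([Fraction(1, 2), Fraction(3, 2)])

for result in (odd_result, even_result, fraction_result):
    print(type(result).__name__, result)

# Empty input raises statistics.StatisticsError.
try:
    statistics.median([])
    raised = False
except statistics.StatisticsError as exc:
    raised = True
    print("StatisticsError:", exc)
```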
---
### How the Median is Calculated
* **Odd number of elements**: The dataset is sorted, and the exact middle value is returned.
* **Even number of elements**: The dataset is sorted, and the average (mean) of the two middle values is calculated and returned.
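The two rules above can be sketched as a small hand-written function. `manual_median` is a hypothetical helper written for illustration, not the library's actual source; it is compared against `statistics.median()` on two sample datasets.

```python
import statistics


def manual_median(data):
    """Hypothetical helper that follows the odd/even rule described above."""
    ordered = sorted(data)          # work on a sorted copy
    n = len(ordered)
    if n == 0:
        raise statistics.StatisticsError("no median for empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]         # odd: the exact middle value
    # even: the mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


for sample in ([1, 5, 3, 2, 4], [10, 20, 30, 40, 50, 100]):
    print(sample, "->", manual_median(sample), statistics.median(sample))
```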
---
### Code Examples
#### Example 1: Calculating the Median of an Odd-Sized Dataset
When the dataset has an odd number of elements, the function returns the exact middle value.
```python
import statistics
# Dataset with 5 elements (odd)
data = [1, 5, 3, 2, 4]
# Calculate the median
median_val = statistics.median(data)
print("Dataset:", data)
print("Median:", median_val)
```
**Output:**
```text
Dataset: [1, 5, 3, 2, 4]
Median: 3
```
*Explanation: The sorted dataset is `[1, 2, 3, 4, 5]`. The middle element is `3`.*
---
#### Example 2: Calculating the Median of an Even-Sized Dataset
When the dataset has an even number of elements, the function returns the average of the two middle values.
```python
import statistics
# Dataset with 6 elements (even)
data = [10, 20, 30, 40, 50, 100]
# Calculate the median
median_val = statistics.median(data)
print("Dataset:", data)
print("Median:", median_val)
```
**Output:**
```text
Dataset: [10, 20, 30, 40, 50, 100]
Median: 35.0
```
*Explanation: The two middle values are `30` and `40`. Their average is `(30 + 40) / 2 = 35.0`.*
---
#### Example 3: Handling Outliers (Median vs. Mean)
This example demonstrates why the median is often preferred over the mean when a dataset contains extreme outliers.
```python
import statistics
# A dataset representing salaries, with one extreme outlier (1,000,000)
salaries = [3000, 3200, 3500, 4000, 4200, 1000000]
mean_val = statistics.mean(salaries)
median_val = statistics.median(salaries)
print(f"Mean Salary: {mean_val:.2f}")
print(f"Median Salary: {median_val:.2f}")
```
**Output:**
```text
Mean Salary: 169650.00
Median Salary: 3750.00
```
*Explanation: The mean is pulled far upward by the single outlier (`1,000,000`), so it no longer represents a typical salary. The median (`3,750.00`) stays close to the values that most of the salaries actually take.*
---
### Considerations & Best Practices
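1. **Empty Inputs**: If you pass an empty iterable to `statistics.median()`, it raises a `statistics.StatisticsError`. Make sure your dataset is not empty before calling the function. Note that a truthiness check such as `if data:` works for containers like lists and tuples, but not for generators, which are always truthy:
   ```python
   import statistics

   data = []  # Example input; replace with your own list or tuple

   if data:
       result = statistics.median(data)
   else:
       result = None  # Or handle the empty case appropriately
   ```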
2. **Alternative Median Functions**:
   * **`statistics.median_low()`**: If the dataset has an even number of elements, it returns the smaller of the two middle values instead of their average.
   * **`statistics.median_high()`**: If the dataset has an even number of elements, it returns the larger of the two middle values instead of their average.
   * **`statistics.median_grouped()`**: Treats the values as grouped continuous data (class intervals) and estimates the 50th percentile by interpolation.
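To make the difference between these functions concrete, here is a minimal sketch that runs all of them on the same even-sized dataset. The dataset and the `interval=10` argument to `median_grouped()` are assumptions chosen for illustration (they treat each value as the midpoint of a class of width 10).

```python
import statistics

data = [10, 20, 30, 40]  # even-sized, so the two middle values are 20 and 30

mid = statistics.median(data)         # mean of the two middle values
low = statistics.median_low(data)     # smaller middle value, always a data point
high = statistics.median_high(data)   # larger middle value, always a data point
grouped = statistics.median_grouped(data, interval=10)  # interpolated estimate

print("median:        ", mid)
print("median_low:    ", low)
print("median_high:   ", high)
print("median_grouped:", grouped)
```

`median_low()` and `median_high()` are useful when the result must be an actual member of the dataset, for example with discrete data where an averaged value would not be meaningful.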