Pandas Series Std
* * *
`Series.std()` is a pandas method that calculates the standard deviation of a Series. Standard deviation measures how dispersed the data is: roughly, how far the values typically lie from their mean.
A larger standard deviation indicates more dispersed data; a smaller standard deviation indicates more concentrated data. It is frequently used in quality control, risk assessment, exam score analysis, and other scenarios.
* * *
## Basic Syntax and Parameters
`std()` is a method of the Series object and is called with dot notation, for example `s.std()`.
### Syntax Format
```python
Series.std(axis=None, skipna=True, ddof=1, numeric_only=False, **kwargs)
```

This is the signature in pandas 2.x. Versions before 2.0 also accepted a `level` argument, which was deprecated in 1.3 and removed in 2.0.
### Parameter Description
| Parameter | Type | Description | Default Value |
| --- | --- | --- | --- |
| axis | {0 or 'index'} | Specifies the axis. A Series has only one axis, so this parameter has no practical effect; it exists for compatibility with DataFrame. | None |
| skipna | bool | If True, NaN values are skipped during the calculation; if False, the result is NaN whenever the Series contains a NaN. | True |
| level | int or str | pandas < 2.0 only: if the Series has a MultiIndex, computes the standard deviation per value of the given level. Removed in pandas 2.0; use `groupby(level=...)` instead. | None |
| numeric_only | bool | Include only float, int, and boolean data. Not implemented for Series; it exists for compatibility with DataFrame. | False |
| ddof | int | Delta degrees of freedom. The divisor is n - ddof: ddof=1 gives the sample standard deviation (divisor n - 1), ddof=0 gives the population standard deviation (divisor n). | 1 |
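In pandas 2.0 and later, `Series.std()` no longer accepts `level`, and per-level standard deviations are usually computed with `groupby(level=...)`. The following is a minimal sketch of that pattern; the index names (`class`, `student`) and the scores are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level index: (class, student). Names and values are illustrative.
index = pd.MultiIndex.from_tuples(
    [("A", "s1"), ("A", "s2"), ("A", "s3"),
     ("B", "s1"), ("B", "s2"), ("B", "s3")],
    names=["class", "student"],
)
scores = pd.Series([85, 87, 89, 70, 85, 100], index=index)

# Sample standard deviation (ddof=1) computed separately for each class
per_class_std = scores.groupby(level="class").std()
print(per_class_std)
# class
# A     2.0
# B    15.0
# dtype: float64
```

`groupby(...).std()` also takes a `ddof` argument, so the population version can be requested in the same way.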
### Return Value
* **Return Type**: `float`
* **Description**: Returns the standard deviation of the elements in the Series. By default this is the sample standard deviation (divisor n - 1).
* * *
## Examples
Let's thoroughly master the usage of `Series.std()` through a series of examples from simple to complex.
### Example 1: Basic Usage - Understanding the Concept of Standard Deviation
Standard deviation measures the dispersion of data; larger values indicate more dispersed data.
```python
import pandas as pd

# Two groups of score data
# Group A: scores are relatively concentrated
group_a = pd.Series([85, 86, 87, 88, 89])
# Group B: scores are relatively dispersed
group_b = pd.Series([70, 75, 85, 95, 100])

print("Group A scores (more concentrated):")
print(group_a)
print(f"Mean: {group_a.mean():.2f}")
print(f"Standard deviation: {group_a.std():.2f}")
print()
print("Group B scores (more dispersed):")
print(group_b)
print(f"Mean: {group_b.mean():.2f}")
print(f"Standard deviation: {group_b.std():.2f}")
print()
print("Analysis: Although both groups have the same mean (85), Group B has a larger standard deviation, indicating greater score differences.")
```
**Output:**
```
Group A scores (more concentrated):
0    85
1    86
2    87
3    88
4    89
dtype: int64
Mean: 85.00
Standard deviation: 1.58

Group B scores (more dispersed):
0     70
1     75
2     85
3     95
4    100
dtype: int64
Mean: 85.00
Standard deviation: 12.75

Analysis: Although both groups have the same mean (85), Group B has a larger standard deviation, indicating greater score differences.
```
**Code Analysis:**
* Group A's standard deviation is approximately 1.58, very concentrated.
* Group B's standard deviation is approximately 12.75, much more dispersed.
* This shows that even with the same mean, the data distribution can be completely different.
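To make the number for Group A less of a black box, the calculation can be reproduced step by step. This is only an illustrative sketch of the sample standard deviation formula (divisor n - 1), not how pandas implements it internally; the variable names are chosen for this example.

```python
import math
import pandas as pd

group_a = pd.Series([85, 86, 87, 88, 89])

n = len(group_a)
mean = group_a.sum() / n                      # 85.0
squared_dev = (group_a - mean) ** 2           # 4, 1, 0, 1, 4
variance = squared_dev.sum() / (n - 1)        # 10 / 4 = 2.5
manual_std = math.sqrt(variance)

print(f"Manual calculation: {manual_std:.4f}")      # 1.5811
print(f"Series.std():       {group_a.std():.4f}")   # 1.5811
```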
### Example 2: The Role of ddof Parameter
The `ddof` parameter controls whether to use sample standard deviation or population standard deviation.
```python
import pandas as pd

# Create a dataset
data = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])
print("Data:")
print(data)
print()

# Default ddof=1, uses sample standard deviation (divided by n-1)
sample_std = data.std(ddof=1)
print(f"Sample standard deviation (ddof=1): {sample_std:.4f}")

# ddof=0, uses population standard deviation (divided by n)
population_std = data.std(ddof=0)
print(f"Population standard deviation (ddof=0): {population_std:.4f}")
print()
print("Explanation:")
print("Sample standard deviation = sqrt(sum((x-mean)^2) / (n-1))")
print("Population standard deviation = sqrt(sum((x-mean)^2) / n)")
print("When the data size is large, the difference between the two is very small.")
```
**Output:**
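```
Data:
0    2
1    4
2    4
3    4
4    5
5    5
6    7
7    9
dtype: int64

Sample standard deviation (ddof=1): 2.1381
Population standard deviation (ddof=0): 2.0000

Explanation:
Sample standard deviation = sqrt(sum((x-mean)^2) / (n-1))
Population standard deviation = sqrt(sum((x-mean)^2) / n)
When the data size is large, the difference between the two is very small.
```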
**Code Analysis:**
* Sample standard deviation (ddof=1) uses n-1 as the divisor, suitable for samples drawn from a population.
* Population standard deviation (ddof=0) uses n as the divisor, suitable for the entire dataset.
* Pandas defaults to ddof=1, i.e., sample standard deviation.
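One practical consequence of the `ddof=1` default is that pandas and NumPy disagree out of the box, because `numpy.std` defaults to `ddof=0`. The sketch below reuses the dataset from this example and converts it with `to_numpy()` so that NumPy works on a plain array.

```python
import numpy as np
import pandas as pd

data = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])
values = data.to_numpy()

pandas_default = data.std()             # ddof=1 (sample)
numpy_default = np.std(values)          # ddof=0 (population)
numpy_sample = np.std(values, ddof=1)   # matches the pandas default

print(f"pandas default:  {pandas_default:.4f}")   # 2.1381
print(f"NumPy default:   {numpy_default:.4f}")    # 2.0000
print(f"NumPy, ddof=1:   {numpy_sample:.4f}")     # 2.1381
```

When results from the two libraries are compared, passing `ddof` explicitly on both sides avoids this mismatch.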
### Example 3: Handling Data with Missing Values
```python
import pandas as pd
import numpy as np

# Create a Series with missing values
data_with_nan = pd.Series([10, 20, np.nan, 30, 40, np.nan, 50])
print("Data with missing values:")
print(data_with_nan)
print()

# Default skipna=True, skip NaN when calculating standard deviation
std_skipna = data_with_nan.std()
print(f"Standard deviation with skipna=True (default): {std_skipna:.4f}")

# Set skipna=False
std_no_skipna = data_with_nan.std(skipna=False)
print(f"Standard deviation with skipna=False: {std_no_skipna}")
```
**Output:**
```
Data with missing values:
0    10.0
1    20.0
2     NaN
3    30.0
4    40.0
5     NaN
6    50.0
dtype: float64

Standard deviation with skipna=True (default): 15.8114
Standard deviation with skipna=False: nan
```
### Example 4: Practical Application - Stock Return Volatility Analysis
Standard deviation is commonly used in finance to measure risk.
```python
import pandas as pd

# Simulate 10 days of daily returns (%) for two stocks
stock_a = pd.Series([1.2, 0.8, -0.5, 1.5, 0.3, -0.2, 1.0, 0.7, -0.3, 0.5])
stock_b = pd.Series([3.5, -2.0, 4.2, -1.5, 2.8, -3.0, 1.2, -0.8, 3.0, -2.4])

print("Stock A daily returns (%):")
print(stock_a)
print(f"Average return: {stock_a.mean():.2f}%")
print(f"Volatility (standard deviation): {stock_a.std():.2f}%")
print()
print("Stock B daily returns (%):")
print(stock_b)
print(f"Average return: {stock_b.mean():.2f}%")
print(f"Volatility (standard deviation): {stock_b.std():.2f}%")
print()
print("Analysis:")
print("Stock B has a larger standard deviation, indicating more volatile returns and higher risk.")
print("Although both may have similar average returns, the risk levels are different.")
```
**Output:**
```
Stock A daily returns (%):
0    1.2
1    0.8
2   -0.5
3    1.5
4    0.3
5   -0.2
6    1.0
7    0.7
8   -0.3
9    0.5
dtype: float64
Average return: 0.50%
Volatility (standard deviation): 0.67%

Stock B daily returns (%):
0    3.5
1   -2.0
2    4.2
3   -1.5
4    2.8
5   -3.0
6    1.2
7   -0.8
8    3.0
9   -2.4
dtype: float64
Average return: 0.50%
Volatility (standard deviation): 2.73%

Analysis:
Stock B has a larger standard deviation, indicating more volatile returns and higher risk.
Although both may have similar average returns, the risk levels are different.
```
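When several return series are compared, it can be convenient to place them in one DataFrame and let `DataFrame.std()` compute the column-wise standard deviation in a single call. The sketch below reuses the simulated returns from this example; the column names `stock_a` and `stock_b` and the `ratio` variable are chosen for illustration.

```python
import pandas as pd

# Same simulated daily returns (%) as above, one column per stock
returns = pd.DataFrame({
    "stock_a": [1.2, 0.8, -0.5, 1.5, 0.3, -0.2, 1.0, 0.7, -0.3, 0.5],
    "stock_b": [3.5, -2.0, 4.2, -1.5, 2.8, -3.0, 1.2, -0.8, 3.0, -2.4],
})

# Column-wise sample standard deviation (ddof=1 by default)
volatility = returns.std()
ratio = volatility["stock_b"] / volatility["stock_a"]

print(volatility.round(2))
print(f"Stock B / Stock A volatility ratio: {ratio:.2f}")   # 4.08
```

With these numbers, Stock B's volatility is roughly four times that of Stock A even though both series have the same average return.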
* * *
## Notes
* By default, `std()` returns the sample standard deviation (ddof=1), which is the usual choice when the data is a sample drawn from a larger population.
* To calculate the population standard deviation, set ddof=0.
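One edge case follows directly from the ddof notes above: with the default `ddof=1` the divisor is n - 1, so a Series holding a single value has no defined sample standard deviation and pandas returns NaN. A small sketch with a made-up value:

```python
import math
import pandas as pd

single = pd.Series([42.0])

sample_result = single.std()            # divisor n - 1 = 0, so the result is NaN
population_result = single.std(ddof=0)  # divisor n = 1, so the result is 0.0

print(sample_result)       # nan
print(population_result)   # 0.0
```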