2026-06-22 | Pandas
Pandas Stock Data Analysis | Rookie Tutorial
In stock data analysis, pandas is a powerful tool for processing and analyzing market data.
In this chapter, we use yfinance to download historical stock data and then clean, visualize, and analyze it, including calculating technical indicators.
yfinance is an open-source Python library, not affiliated with Yahoo, that makes it easy to obtain historical and current data for assets such as stocks, funds, and cryptocurrencies from Yahoo Finance.
It returns this data as a pandas DataFrame, on which we perform the following analyses:
* **Data Cleaning**: Handling missing values, removing unnecessary columns, etc.
* **Data Visualization**: Plotting time series charts of stocks, moving averages, RSI, etc.
* **Technical Indicator Calculation**: Such as Simple Moving Average (SMA), Relative Strength Index (RSI), etc.
* **Daily Return and Cumulative Return Analysis**: Helps evaluate the short-term and long-term performance of stocks.
* **Volatility Analysis**: Measures the price volatility of stocks.
### Install yfinance
First, install the yfinance library:
pip install yfinance --upgrade --no-cache-dir
Installing yfinance also installs pandas, which is one of its dependencies, so the data yfinance returns can be processed directly with pandas data structures and functions.
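Import the required libraries (matplotlib and seaborn are not installed with yfinance; install them separately with `pip install matplotlib seaborn` if needed):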
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
### Get Stock Data
Using the yfinance library, we can easily download stock data.
We typically use the `yf.download()` function to obtain historical data for a stock from Yahoo Finance.
The Yahoo Finance ticker for Kweichow Moutai is 600519.SS, where the .SS suffix denotes the Shanghai Stock Exchange.
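The suffix convention can be captured in a small helper. This is a hypothetical sketch, not part of yfinance: `to_yahoo_ticker` and `EXCHANGE_SUFFIX` are illustrative names, and it assumes Yahoo Finance's `.SS` suffix for Shanghai listings and `.SZ` suffix for Shenzhen listings.

```python
# Hypothetical helper, not part of yfinance.
# Assumption: Yahoo Finance uses ".SS" for Shanghai and ".SZ" for Shenzhen listings.
EXCHANGE_SUFFIX = {"SH": ".SS", "SZ": ".SZ"}


def to_yahoo_ticker(code: str, exchange: str) -> str:
    """Append the Yahoo Finance exchange suffix to a six-digit A-share code."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a six-digit stock code, got {code!r}")
    suffix = EXCHANGE_SUFFIX.get(exchange.upper())
    if suffix is None:
        raise ValueError(f"unknown exchange {exchange!r}")
    return code + suffix


print(to_yahoo_ticker("600519", "SH"))  # 600519.SS
```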
Using yfinance to get stock data:
## Example
import yfinance as yf
# Get Moutai (600519.SS) stock data, date range from 2020-01-01 to 2021-01-01
stock_data = yf.download('600519.SS', start='2020-01-01', end='2021-01-01', auto_adjust=False, progress=False)
# View the first few rows of data
print(stock_data.head())
The returned data contains the following columns:
* `Open`: Opening price
* `High`: Highest price
* `Low`: Lowest price
* `Close`: Closing price
* `Adj Close`: Adjusted closing price, which accounts for dividends and stock splits (returned here because `auto_adjust=False` is passed)
* `Volume`: Trading volume
The `yf.download()` function returns a `pandas.DataFrame`, indexed by date, that contains the historical data of the specified stock:
## Example
import yfinance as yf
# Get Moutai stock (600519.SS) data
stock_data = yf.download('600519.SS', start='2020-01-01', end='2021-01-01', auto_adjust=False, progress=False)
# Check data type
print(type(stock_data))  # <class 'pandas.core.frame.DataFrame'>
Output:
<class 'pandas.core.frame.DataFrame'>
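Depending on the installed yfinance version, `yf.download()` may return its columns as a two-level MultiIndex (price field, ticker) even for a single ticker; in that case `stock_data['Close']` is a one-column DataFrame rather than a Series. The sketch below flattens such columns to plain field names. It uses a small synthetic frame with made-up values instead of a live download; `flatten_columns` is a hypothetical helper, and the level names `Price` and `Ticker` are an assumption about yfinance's layout. Recent yfinance releases also document a `multi_level_index` parameter for this; verify it against your installed version before relying on it.

```python
import pandas as pd

# Synthetic stand-in for a yf.download() result with two-level columns.
# Assumption: level names "Price" and "Ticker"; all values are made up.
fields = ["Open", "High", "Low", "Close", "Adj Close", "Volume"]
columns = pd.MultiIndex.from_product([fields, ["600519.SS"]], names=["Price", "Ticker"])
index = pd.date_range("2020-01-02", periods=3, freq="B", name="Date")
raw = pd.DataFrame(
    [
        [1.0, 2.0, 0.5, 1.5, 1.4, 100],
        [1.5, 2.5, 1.0, 2.0, 1.9, 200],
        [2.0, 3.0, 1.5, 2.5, 2.4, 300],
    ],
    index=index,
    columns=columns,
)


def flatten_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only the price-field level of MultiIndex columns."""
    if isinstance(df.columns, pd.MultiIndex):
        df = df.copy()
        df.columns = df.columns.get_level_values(0)
    return df


print(type(raw["Close"]).__name__)   # DataFrame (one column per ticker)
flat = flatten_columns(raw)
print(flat.columns.tolist())         # ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
print(type(flat["Close"]).__name__)  # Series
```

The helper leaves single-level columns untouched, so it is safe to call regardless of which layout your yfinance version produces.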
* * *
## Data Cleaning and Processing
When analyzing stock data, we usually need to perform some data cleaning and processing.
Common steps include filling missing values, deleting irrelevant columns, data type conversion, etc.
Because yfinance returns pandas DataFrames, the examples below call DataFrame methods such as `isnull()`, `ffill()`, and `drop()` directly on the downloaded data without importing pandas; an explicit `import pandas as pd` is needed only when you call top-level pandas functions.
Check for missing values and fill them:
## Example
import yfinance as yf
# Get Moutai (600519.SS) stock data, date range from 2020-01-01 to 2021-01-01
stock_data = yf.download('600519.SS', start='2020-01-01', end='2021-01-01', auto_adjust=False, progress=False)
# Check for missing values
print(stock_data.isnull().sum())
# Replace missing values using forward fill
stock_data.ffill(inplace=True)
# Or use backward fill
# stock_data.bfill(inplace=True)
# Check if missing values have been handled
print(stock_data.isnull().sum())
The output result is as follows:
Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64
Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64
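The difference between the two fill directions is easiest to see on a short series with gaps. This sketch uses made-up prices rather than downloaded data:

```python
import numpy as np
import pandas as pd

# Made-up closing prices with gaps (NaN), e.g. missing trading records.
close = pd.Series(
    [10.0, np.nan, np.nan, 13.0, np.nan],
    index=pd.date_range("2020-01-06", periods=5, freq="B"),
)

forward = close.ffill()   # carry the last known price forward
backward = close.bfill()  # pull the next known price backward

print(forward.tolist())   # [10.0, 10.0, 10.0, 13.0, 13.0]
print(backward.tolist())  # [10.0, 13.0, 13.0, 13.0, nan]
```

Forward fill only uses values already known at each date, which is why it is usually preferred for price series; backward fill borrows a future price and can leak information into a backtest. A trailing gap survives `bfill()` (and a leading gap would survive `ffill()`), so it is worth checking `isnull().sum()` again afterwards.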
Delete irrelevant columns:
## Example
import yfinance as yf
# Get Moutai (600519.SS) stock data, date range from 2020-01-01 to 2021-01-01
stock_data = yf.download('600519.SS', start='2020-01-01', end='2021-01-01', auto_adjust=False, progress=False)
# Delete the "Volume" and "Adj Close" columns
stock_data_cleaned = stock_data.drop(columns=['Adj Close','Volume'])
print(stock_data_cleaned.head())
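In recent yfinance releases `auto_adjust` defaults to `True`, and the result then has no `Adj Close` column, so `drop(columns=['Adj Close', 'Volume'])` would raise a `KeyError` if `auto_adjust=False` were omitted. Passing `errors='ignore'` to `DataFrame.drop` skips labels that are not present; selecting the columns to keep is an equivalent alternative. The frame below is synthetic, with made-up values, shaped like an auto-adjusted download:

```python
import pandas as pd

# Synthetic frame shaped like an auto-adjusted download (no "Adj Close" column).
# All values are made up for illustration.
prices = pd.DataFrame(
    {
        "Open": [1.0, 1.5],
        "High": [2.0, 2.5],
        "Low": [0.5, 1.0],
        "Close": [1.5, 2.0],
        "Volume": [100, 200],
    },
    index=pd.date_range("2020-01-02", periods=2, freq="B", name="Date"),
)

# errors="ignore" skips labels that are not present instead of raising KeyError.
cleaned = prices.drop(columns=["Adj Close", "Volume"], errors="ignore")
print(cleaned.columns.tolist())  # ['Open', 'High', 'Low', 'Close']

# Equivalent alternative: list the columns to keep.
kept = prices[["Open", "High", "Low", "Close"]]
print((kept == cleaned).all().all())  # True
```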
* * *
## Data Visualization: Plotting Stock Price Curve
Using matplotlib or seaborn, we can visualize stock data to help us identify trends and fluctuations.
Plot a time series chart of the closing price:
## Example
import yfinance as yf
import matplotlib.pyplot as plt
# Get Moutai (600519.SS) stock data, date range from 2020-01-01 to 2021-01-01
stock_data = yf.download('600519.SS', start='2020-01-01', end='2021-01-01', auto_adjust=False, progress=False)
# Delete the "Volume" and "Adj Close" columns
stock_data_cleaned = stock_data.drop(columns=['Adj Close','Volume'])
# Plot the Moutai closing price curve
plt.figure(figsize=(10,6))
plt.plot(stock_data_cleaned['Close'], label='Close Price')
plt.title('Moutai Stock Price (2020)', fontsize=14)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Close Price (CNY)', fontsize=12)
plt.legend()
plt.grid(True)
plt.show()
* * *
## Calculating Stock Technical Indicators
In stock analysis, technical indicators such as moving averages and the Relative Strength Index (RSI) are often used to support trading decisions, and pandas makes them straightforward to calculate.
### 1. Simple Moving Average (SMA)
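The Simple Moving Average (SMA) is one of the most commonly used technical indicators: the N-day SMA is the average closing price over the most recent N trading days. With `rolling(window=N).mean()`, the first N-1 values are NaN because the window is not yet full, so a 200-day SMA computed from a single year of data covers only the last part of that year.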
## Example
import yfinance as yf
import matplotlib.pyplot as plt
# Get Moutai (600519.SS) stock data, date range from 2020-01-01 to 2021-01-01
stock_data = yf.download('600519.SS', start='2020-01-01', end='2021-01-01', auto_adjust=False, progress=False)
# Delete the "Volume" and "Adj Close" columns
stock_data_cleaned = stock_data.drop(columns=['Adj Close','Volume'])
# Calculate 50-day and 200-day moving averages
stock_data_cleaned['SMA_50']= stock_data_cleaned['Close'].rolling(window=50).mean()
stock_data_cleaned['SMA_200']= stock_data_cleaned['Close'].rolling(window=200).mean()
# Plot closing price and moving averages
plt.figure(figsize=(12,6))
plt.plot(stock_data_cleaned['Close'], label='Close Price')
plt.plot(stock_data_cleaned['SMA_50'], label='50-Day SMA')
plt.plot(stock_data_cleaned['SMA_200'], label='200-Day SMA')
plt.title('Moutai Stock Price with Moving Averages', fontsize=14)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Price (CNY)', fontsize=12)
plt.legend()
plt.grid(True)
plt.show()
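To see exactly what `rolling(window=N).mean()` returns, it helps to run it on a few made-up prices where the averages are easy to check by hand. The sketch also shows the `min_periods` parameter, which lets the average start before the window is full; whether partial averages are meaningful for a trading signal is a judgment call.

```python
import pandas as pd

# Made-up closing prices for six trading days.
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
window = 3

# Default: a value is produced only once `window` observations are available.
sma = close.rolling(window=window).mean()
print(sma.tolist())  # [nan, nan, 11.0, 12.0, 13.0, 14.0]

# Hand check for the last day: mean of the three most recent closes.
print((13.0 + 14.0 + 15.0) / 3)  # 14.0

# min_periods=1 averages whatever is available while the window fills up.
sma_partial = close.rolling(window=window, min_periods=1).mean()
print(sma_partial.tolist())  # [10.0, 10.5, 11.0, 12.0, 13.0, 14.0]
```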
### 2. Relative Strength Index (RSI)
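RSI is a technical indicator used to evaluate whether a stock is overbought or oversold. It is computed as RSI = 100 - 100 / (1 + RS), where RS is the average gain divided by the average loss over a look-back window, commonly 14 days. An RSI above 70 is generally read as overbought, and one below 30 as oversold.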
## Example
import yfinance as yf
import matplotlib.pyplot as plt
# Get Moutai (600519.SS) stock data, date range from 2020-01-01 to 2021-01-01
stock_data = yf.download('600519.SS', start='2020-01-01', end='2021-01-01', auto_adjust=False, progress=False)
# Delete the "Volume" and "Adj Close" columns
stock_data_cleaned = stock_data.drop(columns=['Adj Close','Volume'])
# Calculate RSI indicator
delta = stock_data_cleaned['Close'].diff(1)
gain = delta.where(delta >0,0)
loss = -delta.where(delta <0,0)
# Calculate average gain and loss
avg_gain = gain.rolling(window=14).mean()
avg_loss = loss.rolling(window=14).mean()
# Calculate Relative Strength Index RSI
rs = avg_gain / avg_loss
rsi =100 - (100 / (1 + rs))
# Add RSI to the data
stock_data_cleaned['RSI']= rsi
# Plot the RSI curve
plt.figure(figsize=(12,6))
plt.plot(stock_data_cleaned['RSI'], label='RSI')
plt.axhline(y=70, color='r', linestyle='--', label='Overbought (70)')
plt.axhline(y=30, color='g', linestyle='--', label='Oversold (30)')
plt.title('RSI Indicator for Moutai Stock', fontsize=14)
plt.xlabel('Date', fontsize=12)
plt.ylabel('RSI', fontsize=12)
plt.legend()
plt.grid(True)
plt.show()
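The RSI calculation above can be wrapped in a reusable function and checked on extreme cases: a price that rises every day should give an RSI of 100, and one that falls every day should give 0. This is a sketch on synthetic data, and `rsi_simple` is a hypothetical name. Like the Moutai example, it averages gains and losses with a simple rolling mean, whereas J. Welles Wilder's original RSI uses a recursive smoothed average, so values from charting software may differ slightly.

```python
import pandas as pd


def rsi_simple(close: pd.Series, window: int = 14) -> pd.Series:
    """Hypothetical helper: RSI with simple rolling averages of gains and losses,
    mirroring the Moutai example (not Wilder's recursive smoothing)."""
    delta = close.diff()
    gain = delta.where(delta > 0, 0.0)    # upward moves; everything else becomes 0
    loss = -delta.where(delta < 0, 0.0)   # size of downward moves; everything else 0
    avg_gain = gain.rolling(window=window).mean()
    avg_loss = loss.rolling(window=window).mean()
    rs = avg_gain / avg_loss              # infinite when there are no losses
    return 100 - 100 / (1 + rs)


# Synthetic prices: one series rises every day, the other falls every day.
rising = pd.Series(range(1, 21), dtype=float)
falling = pd.Series(range(20, 0, -1), dtype=float)

print(rsi_simple(rising, window=5).iloc[-1])   # 100.0
print(rsi_simple(falling, window=5).iloc[-1])  # 0.0
```

When the average loss is zero the ratio is infinite and the RSI evaluates to 100; when both averages are zero (a completely flat price) the result is NaN. If you want something closer to Wilder's smoothing, a common approximation is to replace the two `rolling(...).mean()` calls with `ewm(alpha=1 / window, adjust=False).mean()`.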
* * *
## Applications of Stock Data Analysis
In practice, pandas is also commonly used for the following analyses:
### 1. Daily Return and Cumulative Return
Daily returns capture short-term price movements, while the cumulative return summarizes a stock's performance over the whole period.
## Example
import yfinance as yf
import matplotlib.pyplot as plt
# Get Moutai (600519.SS) stock data, date range from 2020-01-01 to 2021-01-01
stock_data = yf.download('600519.SS', start='2020-01-01', end='2021-01-01', auto_adjust=False, progress=False)
# Delete the "Volume" and "Adj Close" columns
stock_data_cleaned = stock_data.drop(columns=['Adj Close','Volume'])
# Calculate daily return
stock_data_cleaned['Daily_Return']= stock_data_cleaned['Close'].pct_change()
# Calculate cumulative return: growth of 1 unit invested, minus 1
stock_data_cleaned['Cumulative_Return'] = (1 + stock_data_cleaned['Daily_Return']).cumprod() - 1
# Plot cumulative return
plt.figure(figsize=(10,6))
plt.plot(stock_data_cleaned['Cumulative_Return'], label='Cumulative Return')
plt.title('Cumulative Return of Moutai Stock (2020)', fontsize=14)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Cumulative Return', fontsize=12)
plt.legend()
plt.grid(True)
plt.show()
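A worked example with three made-up closing prices shows how the two quantities relate: `pct_change()` computes C_t / C_{t-1} - 1, `(1 + r).cumprod()` gives the growth of one unit invested, and subtracting 1 turns that growth into a cumulative return, which must equal the last close divided by the first close, minus 1.

```python
import pandas as pd

# Made-up closing prices for three trading days.
close = pd.Series([100.0, 110.0, 99.0])

daily_return = close.pct_change()          # C_t / C_{t-1} - 1; first value is NaN
growth = (1 + daily_return).cumprod()      # value of 1 unit invested (NaN is skipped)
cumulative_return = growth - 1             # total return since the first close

print(daily_return.round(4).tolist())       # [nan, 0.1, -0.1]
print(cumulative_return.round(4).tolist())  # [nan, 0.1, -0.01]
print(round(close.iloc[-1] / close.iloc[0] - 1, 4))  # -0.01
```

Note that a +10% day followed by a -10% day leaves the position down 1%, which is why returns are compounded with `cumprod()` rather than summed.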
### 2. Stock Volatility
Volatility measures how much a stock's price fluctuates.
It is typically measured as the standard deviation of daily returns.
## Example
import yfinance as yf
import matplotlib.pyplot as plt
# Get Moutai (600519.SS) stock data, date range from 2020-01-01 to 2021-01-01
stock_data = yf.download('600519.SS', start='2020-01-01', end='2021-01-01', auto_adjust=False, progress=False)
# Delete the "Volume" and "Adj Close" columns
stock_data_cleaned = stock_data.drop(columns=['Adj Close','Volume'])
# Calculate daily return
stock_data_cleaned['Daily_Return']= stock_data_cleaned['Close'].pct_change()
# Calculate cumulative return: growth of 1 unit invested, minus 1
stock_data_cleaned['Cumulative_Return'] = (1 + stock_data_cleaned['Daily_Return']).cumprod() - 1
# Calculate the standard deviation of daily returns (volatility)
volatility = stock_data_cleaned['Daily_Return'].std()
# Display volatility
print(f"Daily Volatility: {volatility:.4f}")
Output as follows:
Daily Volatility: 0.0181
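A daily volatility figure is often annualized by multiplying it by the square root of the number of trading days per year, under the usual simplifying assumption that daily returns are independent. The sketch below uses made-up returns and assumes 252 trading days per year (a common convention; mainland China exchanges typically trade somewhat fewer days, so adjust if needed). It also computes a rolling standard deviation, which shows how volatility changes over time; the 3-day window is purely illustrative.

```python
import numpy as np
import pandas as pd

# Made-up daily returns; in the tutorial these come from Close.pct_change().
daily_return = pd.Series([0.01, -0.02, 0.015, 0.0, -0.005, 0.02])

daily_vol = daily_return.std()  # sample standard deviation (ddof=1), as in the example

# Assumption: 252 trading days per year (a common convention; adjust for your market).
TRADING_DAYS = 252
annual_vol = daily_vol * np.sqrt(TRADING_DAYS)

# Rolling volatility over an illustrative 3-day window; the first two values are NaN.
rolling_vol = daily_return.rolling(window=3).std()

print(f"Daily volatility:      {daily_vol:.4f}")
print(f"Annualized volatility: {annual_vol:.4f}")
print(rolling_vol.round(4).tolist())
```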