Pandas Stock Data Analysis | Rookie Tutorial

Pandas Stock Data Analysis

In stock data analysis, pandas is a powerful tool for processing and analyzing market data. In this chapter, we use yfinance (a Python library for Yahoo Finance) to download historical stock data and then clean it, visualize it, and calculate technical indicators. yfinance makes it easy to obtain historical and real-time data for assets such as stocks, funds, and cryptocurrencies from Yahoo Finance. The data is returned as a pandas DataFrame, so all subsequent analysis can be done with pandas.

This chapter covers:

* **Data Cleaning**: Handling missing values, removing unnecessary columns, etc.
* **Data Visualization**: Plotting time series charts of stock prices, moving averages, RSI, etc.
* **Technical Indicator Calculation**: Such as the Simple Moving Average (SMA) and the Relative Strength Index (RSI).
* **Daily Return and Cumulative Return Analysis**: Helps evaluate the short-term and long-term performance of a stock.
* **Volatility Analysis**: Measures how strongly a stock's price fluctuates.

### Install yfinance

First, install the yfinance library:

    pip install yfinance --upgrade --no-cache-dir

pandas is a dependency of yfinance, so it is normally installed automatically along with it. The data that yfinance returns can therefore be processed directly with pandas data structures and functions.

Import the required libraries:

    import yfinance as yf
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

### Get Stock Data

With yfinance, downloading stock data is straightforward. We typically use the `yf.download()` function to obtain a stock's historical data from Yahoo Finance. The examples in this chapter use Moutai, whose stock code is 600519.SS; the .SS suffix denotes the Shanghai Stock Exchange.
Using yfinance to get stock data:

**Example**

    import yfinance as yf

    # Get Moutai (600519.SS) stock data, date range from 2020-01-01 to 2021-01-01
    # (the end date itself is not included)
    stock_data = yf.download('600519.SS', start='2020-01-01', end='2021-01-01',
                             auto_adjust=False, progress=False)

    # View the first few rows of data
    print(stock_data.head())

[Image: first five rows of the downloaded data]

The returned data contains the following columns:

* `Open`: Opening price
* `High`: Highest price
* `Low`: Lowest price
* `Close`: Closing price
* `Adj Close`: Adjusted closing price (accounts for dividends, stock splits, etc.)
* `Volume`: Trading volume

The `yf.download()` function returns a pandas.DataFrame containing the historical data of the specified stock:

**Example**

    import yfinance as yf

    # Get Moutai stock (600519.SS) data
    stock_data = yf.download('600519.SS', start='2020-01-01', end='2021-01-01',
                             auto_adjust=False, progress=False)

    # Check data type
    print(type(stock_data))  # Should output <class 'pandas.core.frame.DataFrame'>

Output:

    <class 'pandas.core.frame.DataFrame'>

* * *

## Data Cleaning and Processing

When analyzing stock data, we usually need to clean and process it first. Common steps include filling missing values, deleting irrelevant columns, and converting data types. Because pandas is installed as a dependency of yfinance, the DataFrame returned by `yf.download()` supports all pandas methods directly. An explicit `import pandas as pd` is needed only when the code calls pandas functions through the `pd` name.
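Normalizing the column layout and converting data types is listed among the cleaning steps above, so here is a minimal sketch of that step. It assumes (this is not stated in the text above) that the installed yfinance version may return two-level columns such as `('Close', '600519.SS')`. The helper name `flatten_price_columns` is hypothetical, and the sample prices are made up so that the code runs without a download.

```python
import pandas as pd


def flatten_price_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with single-level column names and numeric dtypes.

    Hypothetical helper for illustration; not part of yfinance or pandas.
    """
    out = df.copy()
    # If the columns are a MultiIndex like ('Close', '600519.SS'),
    # keep only the first level ('Close').
    if isinstance(out.columns, pd.MultiIndex):
        out.columns = out.columns.get_level_values(0)
    # Data type conversion: coerce every column to a numeric dtype;
    # values that cannot be parsed become NaN.
    out = out.apply(pd.to_numeric, errors="coerce")
    # Make sure the index is a DatetimeIndex so time-based operations work.
    out.index = pd.to_datetime(out.index)
    return out


# Made-up sample shaped like a single-ticker download with two-level columns.
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["600519.SS"]])
raw = pd.DataFrame(
    [["1130.0", "100"], ["1078.6", "bad"]],
    index=["2020-01-02", "2020-01-03"],
    columns=columns,
)

clean = flatten_price_columns(raw)
print(clean.dtypes)
print(clean)
```

After this step, `clean['Close']` is a plain Series, and the unparseable volume entry shows up as a missing value that the fill methods in the next example can handle.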
Check for missing values and fill them:

**Example**

    import yfinance as yf

    # Get Moutai (600519.SS) stock data, date range from 2020-01-01 to 2021-01-01
    stock_data = yf.download('600519.SS', start='2020-01-01', end='2021-01-01',
                             auto_adjust=False, progress=False)

    # Check for missing values
    print(stock_data.isnull().sum())

    # Replace missing values using forward fill
    stock_data.ffill(inplace=True)

    # Or use backward fill
    # stock_data.bfill(inplace=True)

    # Check if missing values have been handled
    print(stock_data.isnull().sum())

The output is as follows:

    Open         0
    High         0
    Low          0
    Close        0
    Adj Close    0
    Volume       0
    dtype: int64
    Open         0
    High         0
    Low          0
    Close        0
    Adj Close    0
    Volume       0
    dtype: int64

Delete irrelevant columns:

**Example**

    import yfinance as yf

    # Get Moutai (600519.SS) stock data, date range from 2020-01-01 to 2021-01-01
    stock_data = yf.download('600519.SS', start='2020-01-01', end='2021-01-01',
                             auto_adjust=False, progress=False)

    # Delete the "Volume" and "Adj Close" columns
    stock_data_cleaned = stock_data.drop(columns=['Adj Close', 'Volume'])

    print(stock_data_cleaned.head())

[Image: first five rows of the data with only the Open, High, Low, and Close columns]

* * *

## Data Visualization: Plotting Stock Price Curve

Using matplotlib or seaborn, we can visualize stock data to identify trends and fluctuations.
Plot a time series chart of the closing price:

**Example**

    import yfinance as yf
    import matplotlib.pyplot as plt

    # Get Moutai (600519.SS) stock data, date range from 2020-01-01 to 2021-01-01
    stock_data = yf.download('600519.SS', start='2020-01-01', end='2021-01-01',
                             auto_adjust=False, progress=False)

    # Delete the "Volume" and "Adj Close" columns
    stock_data_cleaned = stock_data.drop(columns=['Adj Close', 'Volume'])

    # Plot the Moutai closing price curve
    plt.figure(figsize=(10, 6))
    plt.plot(stock_data_cleaned['Close'], label='Close Price')
    plt.title('Moutai Stock Price (2020)', fontsize=14)
    plt.xlabel('Date', fontsize=12)
    plt.ylabel('Close Price (CNY)', fontsize=12)
    plt.legend()
    plt.grid(True)
    plt.show()

[Figure: Moutai Stock Price (2020); x-axis: Date, y-axis: Close Price (CNY)]

* * *

## Calculating Stock Technical Indicators

In stock analysis, technical indicators such as moving averages and the Relative Strength Index (RSI) are often used to support decisions, and pandas can calculate them.

### 1. Simple Moving Average (SMA)

The Simple Moving Average (SMA) is one of the most commonly used technical indicators. For each day, it is the average closing price over the most recent N trading days.
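To make the definition concrete, the following minimal sketch computes a 3-day SMA on a made-up series of five closing prices and checks the last value by hand. The prices and the window length are illustrative choices, not values from the Moutai data.

```python
import pandas as pd

# Made-up closing prices for five consecutive trading days.
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# 3-day SMA: each value is the mean of the current close and the two before it.
sma_3 = close.rolling(window=3).mean()

# The same value for the last day, computed by hand.
manual_last = (12.0 + 13.0 + 14.0) / 3

print(sma_3)
print(manual_last)
```

The first two entries of `sma_3` are NaN because a full 3-day window is not yet available. The same warm-up gap appears at the start of any N-day moving average.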
**Example**

    import yfinance as yf
    import matplotlib.pyplot as plt

    # Get Moutai (600519.SS) stock data, date range from 2020-01-01 to 2021-01-01
    stock_data = yf.download('600519.SS', start='2020-01-01', end='2021-01-01',
                             auto_adjust=False, progress=False)

    # Delete the "Volume" and "Adj Close" columns
    stock_data_cleaned = stock_data.drop(columns=['Adj Close', 'Volume'])

    # Calculate 50-day and 200-day moving averages
    stock_data_cleaned['SMA_50'] = stock_data_cleaned['Close'].rolling(window=50).mean()
    stock_data_cleaned['SMA_200'] = stock_data_cleaned['Close'].rolling(window=200).mean()

    # Plot closing price and moving averages
    plt.figure(figsize=(12, 6))
    plt.plot(stock_data_cleaned['Close'], label='Close Price')
    plt.plot(stock_data_cleaned['SMA_50'], label='50-Day SMA')
    plt.plot(stock_data_cleaned['SMA_200'], label='200-Day SMA')
    plt.title('Moutai Stock Price with Moving Averages', fontsize=14)
    plt.xlabel('Date', fontsize=12)
    plt.ylabel('Price (CNY)', fontsize=12)
    plt.legend()
    plt.grid(True)
    plt.show()

[Figure: Moutai Stock Price with Moving Averages; x-axis: Date, y-axis: Price (CNY)]

Because a single year contains only about 240 to 250 trading days, the 200-day SMA is defined only for the last part of the chart. Download a longer date range if you need it over the whole year.

### 2. Relative Strength Index (RSI)

RSI is a technical indicator, ranging from 0 to 100, used to evaluate whether a stock is overbought or oversold. Generally, an RSI above 70 indicates overbought conditions, and an RSI below 30 indicates oversold conditions.
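The calculation used in this chapter is RSI = 100 - 100 / (1 + RS), where RS is the average gain divided by the average loss over the window. As a minimal sketch, the hypothetical helper `simple_rsi` below wraps that calculation with plain rolling means (other RSI variants use different smoothing) and runs it on made-up price series.

```python
import pandas as pd


def simple_rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """RSI based on simple rolling means of gains and losses.

    Hypothetical helper for illustration; not a pandas or yfinance function.
    """
    delta = close.diff()
    # Keep positive changes as gains and negative changes as (positive) losses.
    gain = delta.where(delta > 0, 0.0)
    loss = -delta.where(delta < 0, 0.0)
    avg_gain = gain.rolling(window=window).mean()
    avg_loss = loss.rolling(window=window).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


# Made-up series: strictly rising, strictly falling, and +2 / -1 zigzag.
rising = pd.Series(range(1, 31), dtype=float)
falling = rising[::-1].reset_index(drop=True)
zigzag = pd.Series([2.0, -1.0] * 15).cumsum() + 100

rsi_rising = simple_rsi(rising).iloc[-1]
rsi_falling = simple_rsi(falling).iloc[-1]
rsi_zigzag = simple_rsi(zigzag).iloc[-1]

print(rsi_rising, rsi_falling, rsi_zigzag)
```

A series that only rises has no losses, so RS is infinite and the RSI reaches 100. A series that only falls gives 0. In the zigzag series the average gain is twice the average loss, so RS is 2 and the RSI is about 66.7.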
**Example**

    import yfinance as yf
    import matplotlib.pyplot as plt

    # Get Moutai (600519.SS) stock data, date range from 2020-01-01 to 2021-01-01
    stock_data = yf.download('600519.SS', start='2020-01-01', end='2021-01-01',
                             auto_adjust=False, progress=False)

    # Delete the "Volume" and "Adj Close" columns
    stock_data_cleaned = stock_data.drop(columns=['Adj Close', 'Volume'])

    # Calculate daily price changes and split them into gains and losses
    delta = stock_data_cleaned['Close'].diff(1)
    gain = delta.where(delta > 0, 0)
    loss = -delta.where(delta < 0, 0)

    # Calculate average gain and loss over 14 days
    avg_gain = gain.rolling(window=14).mean()
    avg_loss = loss.rolling(window=14).mean()

    # Calculate the Relative Strength Index (RSI)
    rs = avg_gain / avg_loss
    rsi = 100 - (100 / (1 + rs))

    # Add RSI to the data
    stock_data_cleaned['RSI'] = rsi

    # Plot the RSI curve
    plt.figure(figsize=(12, 6))
    plt.plot(stock_data_cleaned['RSI'], label='RSI')
    plt.axhline(y=70, color='r', linestyle='--', label='Overbought (70)')
    plt.axhline(y=30, color='g', linestyle='--', label='Oversold (30)')
    plt.title('RSI Indicator for Moutai Stock', fontsize=14)
    plt.xlabel('Date', fontsize=12)
    plt.ylabel('RSI', fontsize=12)
    plt.legend()
    plt.grid(True)
    plt.show()

[Figure: RSI Indicator for Moutai Stock, with dashed lines at 70 (overbought) and 30 (oversold); x-axis: Date, y-axis: RSI]

* * *

## Applications of Stock Data Analysis

In practical stock data analysis, pandas can be used for the following common operations:

### 1. Daily Return and Cumulative Return

The daily return is the percentage change in the closing price from one trading day to the next. The cumulative return compounds these daily changes over the whole period, which helps evaluate a stock's longer-term performance.
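A minimal sketch with three made-up closing prices shows how the two quantities relate. Note that compounding `1 + daily return` yields a growth factor (1.0 means break-even); subtracting 1 converts it to a return. The variable names here are illustrative.

```python
import pandas as pd

# Made-up closing prices: up 10% on day 2, then down 10% on day 3.
close = pd.Series([100.0, 110.0, 99.0])

# Daily return: percentage change from the previous close (first value is NaN).
daily_return = close.pct_change()

# Growth factor: compound the daily returns.
growth = (1 + daily_return).cumprod()

# Equivalent shortcut: each price divided by the first price.
same_growth = close / close.iloc[0]

# Total return over the period, as a fraction.
total_return = growth.iloc[-1] - 1

print(daily_return)
print(growth)
print(total_return)
```

A 10% gain followed by a 10% loss does not return to the starting price: the growth factor ends near 0.99, a loss of about 1%.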
**Example**

    import yfinance as yf
    import matplotlib.pyplot as plt

    # Get Moutai (600519.SS) stock data, date range from 2020-01-01 to 2021-01-01
    stock_data = yf.download('600519.SS', start='2020-01-01', end='2021-01-01',
                             auto_adjust=False, progress=False)

    # Delete the "Volume" and "Adj Close" columns
    stock_data_cleaned = stock_data.drop(columns=['Adj Close', 'Volume'])

    # Calculate daily return
    stock_data_cleaned['Daily_Return'] = stock_data_cleaned['Close'].pct_change()

    # Calculate cumulative return
    stock_data_cleaned['Cumulative_Return'] = (1 + stock_data_cleaned['Daily_Return']).cumprod()

    # Plot cumulative return
    plt.figure(figsize=(10, 6))
    plt.plot(stock_data_cleaned['Cumulative_Return'], label='Cumulative Return')
    plt.title('Cumulative Return of Moutai Stock (2020)', fontsize=14)
    plt.xlabel('Date', fontsize=12)
    plt.ylabel('Cumulative Return', fontsize=12)
    plt.legend()
    plt.grid(True)
    plt.show()

[Figure: Cumulative Return of Moutai Stock (2020); x-axis: Date, y-axis: Cumulative Return]

### 2. Stock Volatility

Volatility measures how strongly a stock's price fluctuates. A common measure is the standard deviation of its daily returns.

**Example**

    import yfinance as yf

    # Get Moutai (600519.SS) stock data, date range from 2020-01-01 to 2021-01-01
    stock_data = yf.download('600519.SS', start='2020-01-01', end='2021-01-01',
                             auto_adjust=False, progress=False)

    # Delete the "Volume" and "Adj Close" columns
    stock_data_cleaned = stock_data.drop(columns=['Adj Close', 'Volume'])

    # Calculate daily return
    stock_data_cleaned['Daily_Return'] = stock_data_cleaned['Close'].pct_change()

    # Calculate the standard deviation of daily returns (volatility)
    volatility = stock_data_cleaned['Daily_Return'].std()

    # Display volatility
    print(f"Daily Volatility: {volatility:.4f}")

Output as follows:

    Daily Volatility: 0.0181
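A daily figure like this is often scaled to a yearly one, or tracked over time with a rolling window. The sketch below illustrates both on randomly generated returns. The 252-trading-day year and the 20-day window are assumed conventions that do not come from the text above, and the data is synthetic.

```python
import numpy as np
import pandas as pd

# Assumed convention: about 252 trading days per year.
TRADING_DAYS = 252

# Synthetic daily returns (mean 0, standard deviation 2%), not real market data.
rng = np.random.default_rng(0)
daily_return = pd.Series(rng.normal(0.0, 0.02, size=300))

# Daily volatility: standard deviation of daily returns.
daily_vol = daily_return.std()

# Annualized volatility: scale by the square root of the number of trading days.
annual_vol = daily_vol * np.sqrt(TRADING_DAYS)

# Rolling 20-day volatility shows how the fluctuation changes over time.
rolling_vol = daily_return.rolling(window=20).std()

print(f"Daily Volatility: {daily_vol:.4f}")
print(f"Annualized Volatility: {annual_vol:.4f}")
print(rolling_vol.tail())
```

The square-root scaling assumes that daily returns are independent and equally distributed, so the annualized number is a rough guide, not an exact figure.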