# Pandas pd.read_excel()

`read_excel()` is the pandas function for reading Excel files. It supports both the `.xlsx` and `.xls` formats.

Excel is one of the most widely used file formats in enterprise data analysis. A workbook can hold multiple worksheets, rich cell formatting, formulas, and more. `read_excel()` reads data from an Excel file into a pandas DataFrame, ready for further processing and analysis.

* * *

## Basic Syntax and Parameters

### Syntax Format

```python
pandas.read_excel(io, sheet_name=0, header=0, names=None, index_col=None,
                  usecols=None, dtype=None, skiprows=None, nrows=None,
                  na_values=None, ...)
```

### Parameter Description

| Parameter | Type | Description | Default Value |
| --- | --- | --- | --- |
| io | str, ExcelFile, path object, file-like object | Excel file path, ExcelFile object, or file-like object | Required |
| sheet_name | str, int, list, None | Worksheet name or 0-based index to read; a list reads several worksheets; None reads all worksheets | 0 |
| header | int, list of int, None | 0-based row number to use as column names; 0 means the first row | 0 |
| names | list-like | Custom column names | None |
| index_col | int, str, list of int | Column(s) to use as the row index | None |
| usecols | str, list-like, callable | Read only the specified columns | None |
| dtype | type name or dict | Data type for all columns, or a mapping of column to data type | None |
| skiprows | list-like, int, callable | Row numbers to skip (0-indexed), or the number of rows to skip at the start | None |
| nrows | int | Number of data rows to read | None |
| na_values | scalar, str, list-like, dict | Additional values to treat as missing (NaN) | None |

### Return Value

* **Return Type**: `pd.DataFrame` or `dict` of DataFrames
* When `sheet_name` refers to a single worksheet, a DataFrame is returned.
* When `sheet_name` is None or a list, a dictionary of DataFrames is returned. With None the keys are the worksheet names; with a list the keys are the list entries as given (names or indexes).

* * *

## Examples

The following examples walk through the main ways to use `read_excel()`.

### Example 1: Reading a Local Excel File

First, create a simple Excel file and read it back.
```python
import pandas as pd
import openpyxl  # Need to install: pip install openpyxl

# Create a DataFrame
data = {
    'name': ['Tom', 'Jerry', 'Mike', 'Lucy'],
    'age': [28, 35, 42, 26],
    'city': ['Beijing', 'Shanghai', 'Guangzhou', 'Shenzhen'],
    'salary': [8000, 12000, 15000, 7000]
}
df_original = pd.DataFrame(data)

# Write DataFrame to Excel file
# Excel file path: employees.xlsx
df_original.to_excel('employees.xlsx', index=False, engine='openpyxl')

# Use read_excel to read Excel file
# io: file path (required)
df = pd.read_excel('employees.xlsx')

# View reading result
print("Read DataFrame:")
print(df)
print("\nData types:")
print(df.dtypes)
```

**Expected Output:**

```
Read DataFrame:
    name  age       city  salary
0    Tom   28    Beijing    8000
1  Jerry   35   Shanghai   12000
2   Mike   42  Guangzhou   15000
3   Lucy   26   Shenzhen    7000

Data types:
name      object
age        int64
city      object
salary     int64
```

**Code Analysis:**

* `pd.read_excel('employees.xlsx')` is the most basic usage: simply pass in the file path.
* By default, it reads the first worksheet (`sheet_name=0`) and uses the first row as column names.
* The `openpyxl` library must be installed to read and write the `.xlsx` format.

### Example 2: Reading Multiple Worksheets

An Excel file can contain multiple worksheets. `read_excel()` can read a specific worksheet or all worksheets.
```python
import pandas as pd

# Create Excel file containing multiple worksheets
# First create two DataFrames
df_sales = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D'],
    'quantity': [100, 200, 150, 80],
    'price': [50, 30, 40, 60]
})
df_inventory = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D'],
    'stock': [500, 300, 400, 200],
    'warehouse': ['WH1', 'WH2', 'WH1', 'WH3']
})

# Write multiple worksheets to one Excel file
with pd.ExcelWriter('multi_sheet.xlsx', engine='openpyxl') as writer:
    df_sales.to_excel(writer, sheet_name='Sales', index=False)
    df_inventory.to_excel(writer, sheet_name='Inventory', index=False)

# Example 2a: Read specified worksheet (by name)
df_sales_read = pd.read_excel('multi_sheet.xlsx', sheet_name='Sales')
print("Read Sales worksheet:")
print(df_sales_read)
print()

# Example 2b: Read specified worksheet (by index)
df_inv_read = pd.read_excel('multi_sheet.xlsx', sheet_name=1)
print("Read 2nd worksheet (index 1):")
print(df_inv_read)
print()

# Example 2c: Read all worksheets (returns dictionary)
all_sheets = pd.read_excel('multi_sheet.xlsx', sheet_name=None)
print("All worksheet names:", list(all_sheets.keys()))
print("\nIterate all worksheets:")
for sheet_name, df_sheet in all_sheets.items():
    print(f"\n--- {sheet_name} ---")
    print(df_sheet)
```

**Expected Output:**

```
Read Sales worksheet:
  product  quantity  price
0       A       100     50
1       B       200     30
2       C       150     40
3       D        80     60

Read 2nd worksheet (index 1):
  product  stock warehouse
0       A    500       WH1
1       B    300       WH2
2       C    400       WH1
3       D    200       WH3

All worksheet names: ['Sales', 'Inventory']

Iterate all worksheets:

--- Sales ---
  product  quantity  price
...

--- Inventory ---
  product  stock warehouse
...
```

**Code Analysis:**

* The `sheet_name` parameter accepts a worksheet name (string) or a 0-based index (integer).
* Setting `sheet_name=None` reads all worksheets and returns a dictionary with worksheet names as keys and DataFrames as values.
* `pd.ExcelWriter` makes it convenient to write multiple worksheets to one file.
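The worksheet selection described above can also be done with a list, or by opening the workbook once with `pd.ExcelFile`. The following is a minimal sketch, not part of the original examples: the file name `multi_sheet_demo.xlsx`, the temporary directory, and the small demo DataFrames are assumptions chosen for illustration, and `openpyxl` is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical demo data; the sheet names mirror Example 2
sales = pd.DataFrame({'product': ['A', 'B'], 'quantity': [100, 200]})
inventory = pd.DataFrame({'product': ['A', 'B'], 'stock': [500, 300]})

# Write to a temporary location so no existing file is overwritten
path = os.path.join(tempfile.mkdtemp(), 'multi_sheet_demo.xlsx')
with pd.ExcelWriter(path, engine='openpyxl') as writer:
    sales.to_excel(writer, sheet_name='Sales', index=False)
    inventory.to_excel(writer, sheet_name='Inventory', index=False)

# A list may mix positions and names; the dictionary keys are the
# list entries exactly as given
subset = pd.read_excel(path, sheet_name=[0, 'Inventory'])
print(list(subset.keys()))  # [0, 'Inventory']

# pd.ExcelFile opens the workbook once and exposes the sheet names,
# which is handy when the names are not known in advance
with pd.ExcelFile(path) as xls:
    sheet_names = xls.sheet_names
    frames = {name: xls.parse(name) for name in sheet_names}

print(sheet_names)
print(frames['Inventory'])
```

Opening the file once through `pd.ExcelFile` avoids re-opening the workbook for every worksheet, which may matter for large files.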
### Example 3: Advanced Usage - Custom Columns and Skipping Rows

In real work, Excel files often have more complex layouts and need flexible handling.

```python
import pandas as pd

# Create an Excel file whose first three rows are metadata;
# the real column names are on row 4 and the data follows
rows = [
    ['Company Employee Data'],
    ['Creation Date: 2024-01-01'],
    ['Department: Technology'],
    ['name', 'age', 'city', 'salary'],
    ['Tom', 28, 'Beijing', 8000],
    ['Jerry', 35, 'Shanghai', 12000],
]
# header=False and index=False write the rows exactly as listed
pd.DataFrame(rows).to_excel('complex_format.xlsx', index=False,
                            header=False, engine='openpyxl')

# Example 3a: Skip first few rows, use row N as column names
df_skip = pd.read_excel('complex_format.xlsx', header=3)
print("Skip first 3 rows, use row 4 as column names:")
print(df_skip)
print()

# Example 3b: Read only specified columns
df_cols = pd.read_excel('complex_format.xlsx', header=3,
                        usecols=['name', 'salary'])
print("Read only name and salary columns:")
print(df_cols)
print()

# Example 3c: Read only the first data row
df_head = pd.read_excel('complex_format.xlsx', header=3, nrows=1)
print("Read only the first data row:")
print(df_head)
print()

# Example 3d: Custom column names
df_custom_names = pd.read_excel('complex_format.xlsx', header=3,
                                names=['Name', 'Age', 'City', 'Salary'])
print("Custom column names:")
print(df_custom_names)
```

**Expected Output:**

```
Skip first 3 rows, use row 4 as column names:
    name  age      city  salary
0    Tom   28   Beijing    8000
1  Jerry   35  Shanghai   12000

Read only name and salary columns:
    name  salary
0    Tom    8000
1  Jerry   12000

Read only the first data row:
  name  age     city  salary
0  Tom   28  Beijing    8000

Custom column names:
    Name  Age      City  Salary
0    Tom   28   Beijing    8000
1  Jerry   35  Shanghai   12000
```

**Code Analysis:**

* The `header` parameter specifies which row to use as column names (0-indexed), so `header=3` means the fourth row; the rows above it are discarded.
* `usecols` specifies which columns to read. It accepts a list of column names, a list of column indexes, or a string of Excel column letters such as `"A,D"`.
* `nrows` limits the number of data rows to read, which is useful for sampling a large file.
* The `names` parameter sets custom column names, overriding the column names in the file.

* * *

## Notes

* Reading the `.xlsx` format requires `openpyxl`: `pip install openpyxl`.
* Reading the `.xls` format requires `xlrd`: `pip install xlrd`. Note that xlrd 2.0+ reads only `.xls` files and no longer supports `.xlsx`.
* Excel worksheets have size limits (an `.xlsx` worksheet holds at most 1,048,576 rows and 16,384 columns); data beyond these limits cannot be stored in a single worksheet.
* When reading large files, consider using the `usecols` parameter to read only the columns you need, which improves performance.
* `sheet_name` also accepts a list that mixes names and indexes, for example `sheet_name=[0, 'Inventory']`.

* * *

## Summary

`read_excel()` is the core pandas function for reading Excel files. It can read a single worksheet or several at once, and its parameters handle a wide range of Excel layouts.

Excel files are among the most common data sources in everyday data analysis. Knowing the parameters of `read_excel()` well lets you load Excel data efficiently and prepares it for subsequent cleaning and analysis. Readers are encouraged to practice, especially reading multiple worksheets and combining parameters.
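The `usecols`, `dtype`, and `na_values` parameters from the parameter table are not exercised in the examples above. The sketch below shows one way they might be combined. The file name `typed_demo.xlsx`, the column names, and the `'missing'` placeholder are hypothetical values chosen for illustration, and `openpyxl` is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical source data: IDs with leading zeros, a custom marker
# for missing salaries, and a column we do not need
raw = pd.DataFrame({
    'emp_id': ['001', '002', '003'],
    'name': ['Tom', 'Jerry', 'Mike'],
    'salary': [8000, 'missing', 15000],
    'notes': ['x', 'y', 'z'],
})

path = os.path.join(tempfile.mkdtemp(), 'typed_demo.xlsx')
raw.to_excel(path, index=False, engine='openpyxl')

df = pd.read_excel(
    path,
    usecols=['emp_id', 'name', 'salary'],  # skip the 'notes' column
    dtype={'emp_id': str},                 # keep IDs as text
    na_values=['missing'],                 # treat the marker as NaN
)

print(df)
print(df.dtypes)
```

Passing `dtype={'emp_id': str}` is a common way to keep identifier-like columns as text so that leading zeros are preserved.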
