# Pandas Concat
Data concatenation combines multiple DataFrames or Series either row-wise or column-wise. `pd.concat` is the main concatenation function. The older `DataFrame.append` method was a simplified shortcut for row-wise concatenation; it was deprecated in pandas 1.4 and removed in pandas 2.0, so use `pd.concat` instead.
* * *
## Basic Usage of concat
`pd.concat()` takes a list of DataFrames or Series and joins them along one axis: rows (`axis=0`, the default) or columns (`axis=1`).
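The examples in this tutorial use DataFrames, but the same call works on Series. The following is a minimal sketch; the Series names `x`, `left`, and `right` are illustrative and not part of the original text.

```python
import pandas as pd

s1 = pd.Series([1, 2], name="x")
s2 = pd.Series([3, 4], name="x")

# Row-wise (default axis=0): the result is a longer Series
stacked = pd.concat([s1, s2], ignore_index=True)
print(stacked)

# Column-wise (axis=1): each Series becomes a column of a DataFrame
side_by_side = pd.concat([s1.rename("left"), s2.rename("right")], axis=1)
print(side_by_side)
```

With `axis=1`, the Series names become the column names, which is why the sketch renames them first.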
### Row-wise Concatenation (Vertical Concatenation)
## Example
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
"Name": ["Zhang San","Li Si"],
"Age": [25,30]
})
df2 = pd.DataFrame({
"Name": ["Wang Wu","Zhao Liu"],
"Age": [28,35]
})
print("DataFrame 1:")
print(df1)
print()
print("DataFrame 2:")
print(df2)
print()
# Vertical concatenation
result = pd.concat([df1, df2], ignore_index=True)
print("Concatenation Result:")
print(result)
### Column-wise Concatenation (Horizontal Concatenation)
## Example
import pandas as pd
df1 = pd.DataFrame({
"Name": ["Zhang San","Li Si","Wang Wu"]
})
df2 = pd.DataFrame({
"Age": [25,30,28],
"City": ["Beijing","Shanghai","Guangzhou"]
})
# Horizontal concatenation
result = pd.concat([df1, df2], axis=1)
print("Horizontal Concatenation:")
print(result)
> `axis=0` indicates row-wise concatenation (adding rows), and `axis=1` indicates column-wise concatenation (adding columns).
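One detail worth illustrating: with `axis=1`, pandas pairs rows by index label rather than by position. The sketch below uses two small frames with deliberately different indices (made up for illustration) to show the effect, and one possible way to force positional pairing.

```python
import pandas as pd

left = pd.DataFrame({"Name": ["Zhang San", "Li Si"]}, index=[0, 1])
right = pd.DataFrame({"Age": [30, 28]}, index=[1, 2])

# axis=1 matches rows by index label, not by position,
# so labels missing on one side are filled with NaN
aligned = pd.concat([left, right], axis=1)
print(aligned)

# Resetting both indices first gives a purely positional pairing
positional = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)], axis=1
)
print(positional)
```

If the frames come from different sources and the row order is what matters, resetting the index before concatenating is usually the safer choice.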
* * *
## Handling Duplicate Indices
### ignore_index
## Example
import pandas as pd
df1 = pd.DataFrame({
"Name": ["Zhang San","Li Si"],
"Age": [25,30]
}, index=[0,1])
df2 = pd.DataFrame({
"Name": ["Wang Wu","Zhao Liu"],
"Age": [28,35]
}, index=[0,1])
# By default, original indices are preserved
print("Preserve Original Indices:")
print(pd.concat([df1, df2]))
print()
# Ignore old indices and regenerate new ones
print("Ignore Original Indices:")
print(pd.concat([df1, df2], ignore_index=True))
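To see why duplicate index labels matter in practice, the following sketch compares label-based lookups on both results. The variable names `kept` and `fresh` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Zhang San", "Li Si"]})
df2 = pd.DataFrame({"Name": ["Wang Wu", "Zhao Liu"]})

kept = pd.concat([df1, df2])                      # index: 0, 1, 0, 1
fresh = pd.concat([df1, df2], ignore_index=True)  # index: 0, 1, 2, 3

# With duplicate labels, .loc[0] returns every row labelled 0 (a DataFrame)
print(kept.loc[0])

# With a fresh index, .loc[0] returns a single row (a Series)
print(fresh.loc[0])
```

Code that expects one row per label can silently get several rows when the original indices are kept, so `ignore_index=True` is a reasonable default when the old labels carry no meaning.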
### Verify Duplicate Keys
## Example
import pandas as pd
df1 = pd.DataFrame({
"A": [1,2]
})
df2 = pd.DataFrame({
"A": [3,4]
})
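# verify_integrity=True raises ValueError if the result would contain duplicate index labels.
# Both frames use the default index 0, 1, so the check fails here.
print("Verify Integrity:")
try:
    print(pd.concat([df1, df2], verify_integrity=True))
except ValueError as err:
    print("Duplicate index detected:", err)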
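When raising an error is not desirable, the index can instead be inspected after concatenation. This is a small sketch of that alternative using `Index.is_unique` and `Index.duplicated()`; the names `combined` and `dup_mask` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2]})
df2 = pd.DataFrame({"A": [3, 4]})

combined = pd.concat([df1, df2])

# False here, because labels 0 and 1 each appear twice
print(combined.index.is_unique)

# Boolean mask marking the second and later occurrences of each label
dup_mask = combined.index.duplicated()
print(combined[dup_mask])
```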
* * *
## Handling Mismatched Columns
### join Parameter
## Example
import pandas as pd
df1 = pd.DataFrame({
"A": [1,2,3],
"B": ["a","b","c"]
})
df2 = pd.DataFrame({
"B": ["x","y","z"],
"C": [10,20,30]
})
print("df1:")
print(df1)
print()
print("df2:")
print(df2)
print()
# outer join (default): keep all columns
print("outer Concatenation (keep all columns):")
print(pd.concat([df1, df2], join="outer"))
print()
# inner join: keep only common columns
print("inner Concatenation (keep common columns):")
print(pd.concat([df1, df2], join="inner"))
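The `join` parameter always applies to the axis that is not being concatenated. For row-wise concatenation that means columns, as above; for `axis=1` it means row labels. The sketch below illustrates the second case with made-up `scores` and `grades` frames.

```python
import pandas as pd

scores = pd.DataFrame({"Score": [90, 85, 70]}, index=["a", "b", "c"])
grades = pd.DataFrame({"Grade": ["A", "C"]}, index=["a", "c"])

# outer (default): union of row labels, missing cells become NaN
outer = pd.concat([scores, grades], axis=1, join="outer")
print(outer)

# inner: only row labels present in every input
inner = pd.concat([scores, grades], axis=1, join="inner")
print(inner)
```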
### Adding Only New Columns
## Example
import pandas as pd
df1 = pd.DataFrame({
"Name": ["Zhang San","Li Si"],
"Age": [25,30]
})
df2 = pd.DataFrame({
"City": ["Beijing","Shanghai"]
})
# Add columns from df2 to df1
result = pd.concat([df1, df2], axis=1)
print("Add Only New Columns:")
print(result)
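The example above works cleanly because `df2` shares no column names with `df1`. If the frames overlap, a plain horizontal concat keeps both copies of the shared column. One possible way to add only the genuinely new columns, sketched here with an overlapping `Age` column invented for illustration, is to select them with `Index.difference` first.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Zhang San", "Li Si"], "Age": [25, 30]})
df2 = pd.DataFrame({"Age": [26, 31], "City": ["Beijing", "Shanghai"]})

# Plain horizontal concat keeps both "Age" columns
dup = pd.concat([df1, df2], axis=1)
print(dup.columns.tolist())

# Select only the columns of df2 that df1 lacks, then concatenate
new_cols = df2.columns.difference(df1.columns)
result = pd.concat([df1, df2[new_cols]], axis=1)
print(result)
```

Note that `Index.difference` returns the column names in sorted order, which may differ from their original order in `df2`.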
* * *
## Using keys Parameter to Create Hierarchical Index
## Example
import pandas as pd
df1 = pd.DataFrame({"A": [1,2],"B": [3,4]})
df2 = pd.DataFrame({"A": [5,6],"B": [7,8]})
df3 = pd.DataFrame({"A": [9,10],"B": [11,12]})
# Use keys parameter to create hierarchical index
result = pd.concat([df1, df2, df3], keys=["Year One","Year Two","Year Three"])
print("Concatenation with Hierarchical Index:")
print(result)
print()
# Access data using hierarchical index
print("Access Year Two Data:")
print(result.loc["Year Two"])
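The index levels created by `keys` are unnamed by default. As a sketch of how they might be made easier to work with, the `names` parameter of `pd.concat` labels the levels, and `reset_index` can move a level into an ordinary column. The level names `Year` and `Row` are chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

# names labels the index levels created by keys
result = pd.concat([df1, df2], keys=["Year One", "Year Two"], names=["Year", "Row"])
print(result)

# Select a single row with a (key, original label) tuple
print(result.loc[("Year Two", 0)])

# Move the "Year" level into a regular column
flat = result.reset_index(level="Year")
print(flat)
```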
* * *
## Practical Example: Merging Data from Multiple Months
## Example
import pandas as pd
# Simulate sales data from multiple months
# Month labels ("Jan", "Feb", "Mar") are assumed values
jan_sales = pd.DataFrame({
    "Month": ["Jan"] * 3,
    "Product": ["A", "B", "C"],
    "Sales": [100, 150, 80]
})
feb_sales = pd.DataFrame({
    "Month": ["Feb"] * 3,
    "Product": ["A", "B", "C"],
    "Sales": [120, 140, 90]
})
mar_sales = pd.DataFrame({
    "Month": ["Mar"] * 3,
    "Product": ["A", "B", "C"],
    "Sales": [110, 160, 85]
})
# Combine first quarter data
quarterly = pd.concat([jan_sales, feb_sales, mar_sales], ignore_index=True)
print("First Quarter Summary:")
print(quarterly)
print()
# Summarize sales by month (sort=False keeps months in order of first appearance)
monthly_summary = quarterly.groupby("Month", sort=False)["Sales"].sum()
print("Monthly Sales Summary:")
print(monthly_summary)
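If the monthly tables do not already carry a `Month` column, one possible approach is to pass a dict to `pd.concat`, which uses the dict keys as the outer index level. The sketch below assumes hypothetical monthly tables with made-up figures; the level names `Month` and `Row` are also illustrative.

```python
import pandas as pd

# Hypothetical monthly tables without a "Month" column
monthly = {
    "Jan": pd.DataFrame({"Product": ["A", "B"], "Sales": [100, 150]}),
    "Feb": pd.DataFrame({"Product": ["A", "B"], "Sales": [120, 140]}),
}

# Passing a dict uses its keys as the outer index level
quarterly = pd.concat(monthly, names=["Month", "Row"])

# Turn the "Month" level into a column and drop the old row numbers
quarterly = quarterly.reset_index(level="Month").reset_index(drop=True)
print(quarterly)

totals = quarterly.groupby("Month")["Sales"].sum()
print(totals)
```

This keeps the month label out of the source tables and attaches it only at the point where the data is combined.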
* * *
## append Method (Deprecated, Removed in pandas 2.0)
`DataFrame.append()` was deprecated in pandas 1.4 and removed in pandas 2.0. Use `pd.concat()` instead.
# Removed in pandas 2.0 (raises AttributeError):
# result = df1.append(df2)
# Recommended:
result = pd.concat([df1, df2])
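> `concat` is the standard way to concatenate data in pandas. Because it combines a whole list of objects in one call, it avoids the repeated copying that calling `append` in a loop incurred, and it supports both axes, join modes, and hierarchical keys.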
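A common pattern when data arrives piece by piece is to collect the pieces in a plain Python list and call `pd.concat` once at the end, since each `concat` call copies its inputs. The sketch below assumes a hypothetical list of `chunks` standing in for data read in pieces.

```python
import pandas as pd

# Hypothetical chunks, standing in for data read piece by piece
chunks = [pd.DataFrame({"id": [i], "value": [i * 10]}) for i in range(5)]

# Collect the pieces in a list and concatenate once at the end
frames = []
for chunk in chunks:
    frames.append(chunk)  # list.append, not DataFrame.append
result = pd.concat(frames, ignore_index=True)
print(result)
```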