
# Pandas Concat

Data concatenation is the process of combining multiple DataFrames or Series either row-wise or column-wise. `pd.concat()` is the main concatenation function. The older `DataFrame.append()` method was a simplified shortcut for row-wise concatenation. It is deprecated and has been removed in recent versions, so `pd.concat()` is recommended.

* * *

## Basic Usage of concat

`pd.concat()` concatenates multiple DataFrames or Series along an axis. The objects to combine are passed as a list.

### Row-wise Concatenation (Vertical Concatenation)

```python
import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    "Name": ["Zhang San", "Li Si"],
    "Age": [25, 30]
})
df2 = pd.DataFrame({
    "Name": ["Wang Wu", "Zhao Liu"],
    "Age": [28, 35]
})

print("DataFrame 1:")
print(df1)
print()
print("DataFrame 2:")
print(df2)
print()

# Vertical concatenation: stack the rows of df2 below the rows of df1
result = pd.concat([df1, df2], ignore_index=True)
print("Concatenation Result:")
print(result)
```

### Column-wise Concatenation (Horizontal Concatenation)

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Zhang San", "Li Si", "Wang Wu"]
})
df2 = pd.DataFrame({
    "Age": [25, 30, 28],
    "City": ["Beijing", "Shanghai", "Guangzhou"]
})

# Horizontal concatenation: place the columns of df2 next to those of df1,
# aligning rows by index label
result = pd.concat([df1, df2], axis=1)
print("Horizontal Concatenation:")
print(result)
```

> `axis=0` (the default) concatenates row-wise (adding rows), and `axis=1` concatenates column-wise (adding columns). With `axis=1`, rows are matched by index label, so a label that appears in only one input produces `NaN` in the columns coming from the other inputs.
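`pd.concat()` accepts Series as well as DataFrames, and the same `axis` rule applies. Here is a small sketch with two Series. The names `x` and `y` and their values are made up for illustration.

```python
import pandas as pd

# Two small Series; their names become column labels when combined with axis=1
s1 = pd.Series([1, 2, 3], name="x")
s2 = pd.Series([4, 5, 6], name="y")

# axis=0 (default): the values are stacked into one longer Series
stacked = pd.concat([s1, s2], ignore_index=True)
print(stacked)

# axis=1: the Series are placed side by side, producing a DataFrame
side_by_side = pd.concat([s1, s2], axis=1)
print(side_by_side)
```

Concatenating Series row-wise keeps a Series. Concatenating them column-wise promotes the result to a DataFrame with one column per input.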
* * *

## Handling Duplicate Indices

### ignore_index

By default, `pd.concat()` keeps each input's original index, so the result can contain duplicate index labels. Setting `ignore_index=True` discards the original labels and assigns a new index `0, 1, ..., n-1`.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Zhang San", "Li Si"],
    "Age": [25, 30]
}, index=[0, 1])
df2 = pd.DataFrame({
    "Name": ["Wang Wu", "Zhao Liu"],
    "Age": [28, 35]
}, index=[0, 1])

# By default, original indices are preserved (labels 0 and 1 each appear twice)
print("Preserve Original Indices:")
print(pd.concat([df1, df2]))
print()

# Ignore the original indices and generate a new one
print("Ignore Original Indices:")
print(pd.concat([df1, df2], ignore_index=True))
```

### Verify Duplicate Keys

`verify_integrity=True` checks whether the concatenated axis contains duplicate labels and raises a `ValueError` if it does. Both DataFrames below use the default index `0, 1`, so the check fails:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2]})
df2 = pd.DataFrame({"A": [3, 4]})

# Check for duplicate index labels in the result
print("Verify Objects:")
try:
    print(pd.concat([df1, df2], verify_integrity=True))
except ValueError as err:
    # Both inputs use labels 0 and 1, so the result would contain duplicates
    print("Duplicate index detected:", err)
```

* * *

## Handling Mismatched Columns

### join Parameter

```python
import pandas as pd

df1 = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["a", "b", "c"]
})
df2 = pd.DataFrame({
    "B": ["x", "y", "z"],
    "C": [10, 20, 30]
})

print("df1:")
print(df1)
print()
print("df2:")
print(df2)
print()

# outer join (default): keep all columns, filling missing values with NaN
print("outer Concatenation (keep all columns):")
print(pd.concat([df1, df2], join="outer"))
print()

# inner join: keep only the columns shared by all inputs
print("inner Concatenation (keep common columns):")
print(pd.concat([df1, df2], join="inner"))
```

### Adding New Columns

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Zhang San", "Li Si"],
    "Age": [25, 30]
})
df2 = pd.DataFrame({
    "City": ["Beijing", "Shanghai"]
})

# Add the columns of df2 to df1 (rows are aligned by index)
result = pd.concat([df1, df2], axis=1)
print("Add New Columns:")
print(result)
```

* * *

## Using the keys Parameter to Create a Hierarchical Index

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})
df3 = pd.DataFrame({"A": [9, 10], "B": [11, 12]})

# keys label each input; they become the outer level of a MultiIndex
result = pd.concat([df1, df2, df3], keys=["Year One", "Year Two", "Year Three"])
print("Concatenation with Hierarchical Index:")
print(result)
print()

# Access data using the hierarchical index
print("Access Year Two Data:")
print(result.loc["Year Two"])
```

* * *

## Practical Example: Merging Data from Multiple Months

```python
import pandas as pd

# Simulate sales data from multiple months
jan_sales = pd.DataFrame({
    "Month": ["January"] * 3,
    "Product": ["A", "B", "C"],
    "Sales": [100, 150, 80]
})
feb_sales = pd.DataFrame({
    "Month": ["February"] * 3,
    "Product": ["A", "B", "C"],
    "Sales": [120, 140, 90]
})
mar_sales = pd.DataFrame({
    "Month": ["March"] * 3,
    "Product": ["A", "B", "C"],
    "Sales": [110, 160, 85]
})

# Combine first-quarter data
quarterly = pd.concat([jan_sales, feb_sales, mar_sales], ignore_index=True)
print("First Quarter Summary:")
print(quarterly)
print()

# Total sales per month; sort=False keeps calendar order instead of alphabetical
monthly_summary = quarterly.groupby("Month", sort=False)["Sales"].sum()
print("Monthly Sales Summary:")
print(monthly_summary)
```

* * *

## append Method (Deprecated)

`DataFrame.append()` was deprecated in pandas 1.4.0 and removed in pandas 2.0, so calling it on a current version raises an `AttributeError`. Use `pd.concat()` instead:

```python
import pandas as pd
df1, df2 = pd.DataFrame({"A": [1, 2]}), pd.DataFrame({"A": [3, 4]})

# Not recommended: removed in pandas 2.0
# result = df1.append(df2)

# Recommended
result = pd.concat([df1, df2])
```

> `concat` is the standard method for data concatenation in pandas. It combines any number of objects in one call and offers more options than `append` did (`axis`, `join`, `keys`, `ignore_index`, `verify_integrity`). Combining many objects with a single `concat` call is also much faster than appending them one at a time, because each `append` call copied all of the existing data.
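A common reason for reaching for `DataFrame.append()` was growing a DataFrame inside a loop. A minimal sketch of the `concat`-based replacement collects the pieces in a plain Python list and concatenates them once at the end. This assumes each piece can be built independently. The month names and sales figures are invented for illustration.

```python
import pandas as pd

frames = []  # plain Python list; list.append is fine, only DataFrame.append was removed
for month, sales in [("January", [100, 150]), ("February", [120, 140])]:
    # The scalar month value is broadcast to the length of the other columns
    frames.append(pd.DataFrame({"Month": month, "Product": ["A", "B"], "Sales": sales}))

# One concat call at the end instead of growing a DataFrame row by row
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Building the list first and calling `pd.concat()` once copies the data a single time. Repeated appends copy the accumulated DataFrame on every iteration.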