
# Pandas pd.pivot_table()

`pd.pivot_table()` is a function in the Pandas library used to **create pivot tables**. Pivot tables, a familiar spreadsheet feature, group, aggregate, and reshape data so that it can be analyzed from different perspectives.

Compared to `pd.crosstab()`, `pivot_table()` is more flexible: it supports multiple aggregation functions, handles missing values, and allows more complex data reshaping.

**Terminology**: A pivot table is a table whose layout can be rearranged dynamically to summarize and analyze data along multiple dimensions.

* * *

## Basic Syntax and Parameters

`pd.pivot_table()` is a top-level function in the Pandas library.

### Syntax Format

```python
pd.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All')
```

### Parameter Description

* **Parameter**: `data`
  * Type: DataFrame.
  * Description: The source DataFrame.

* **Parameter**: `values`
  * Type: Column name or list of column names.
  * Description: The column(s) to aggregate. If not specified, all numeric columns will be aggregated.

* **Parameter**: `index`
  * Type: Column name, Grouper, array, or a list of these.
  * Description: The key(s) to group by on the rows. Can be a single column or a list of columns; a list creates a hierarchical (multi-level) row index.

* **Parameter**: `columns`
  * Type: Column name, Grouper, array, or a list of these.
  * Description: The key(s) to group by on the columns.

* **Parameter**: `aggfunc`
  * Type: Function, string, list of functions, or dict.
  * Description: The aggregation function. Common values include 'sum', 'mean', 'count', 'min', 'max', 'median', 'std', etc.
    The default is 'mean'.

* **Parameter**: `fill_value`
  * Type: Scalar or None.
  * Description: The value used to replace missing values in the resulting pivot table.

* **Parameter**: `margins`
  * Type: Boolean.
  * Description: Whether to add an extra row and column holding the totals, computed with `aggfunc`. Default is `False`.

* **Parameter**: `dropna`
  * Type: Boolean.
  * Description: Whether to exclude columns whose entries are all NaN. Default is `True`.

* **Parameter**: `margins_name`
  * Type: String.
  * Description: The label of the totals row and column when `margins=True`. Default is `'All'`.

### Function Description

* **Return Value**: Returns a DataFrame, which is the pivot table.
* **Effect**: Reshapes and summarizes the data based on the specified rows, columns, and aggregation function.

* * *

## Examples

The following examples walk through `pd.pivot_table()` from simple to more complex usage.

### Example 1: Basic Usage - Creating a Simple Pivot Table

```python
import pandas as pd

# 1. Create sales data
sales = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=12, freq='D'),
    'product': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'region': ['North', 'North', 'North', 'South', 'South', 'South',
               'East', 'East', 'East', 'West', 'West', 'West'],
    'sales': [100, 150, 200, 180, 170, 160, 190, 200, 210, 220, 230, 240]
})

print("=== Sales data ===")
print(sales)

# 2. Create pivot table - aggregate sales by product and region
pivot = pd.pivot_table(sales, values='sales', index='product', columns='region', aggfunc='sum')
print("\n=== pd.pivot_table() Sales pivot table ===")
print(pivot)

# 3. Fill missing values using fill_value
print("\n=== Fill missing values (fill_value=0) ===")
pivot_fill = pd.pivot_table(sales, values='sales', index='product', columns='region', aggfunc='sum', fill_value=0)
print(pivot_fill)
```

**Expected Output:**

```
=== Sales data ===
         date product region  sales
0  2023-01-01       A  North    100
1  2023-01-02       B  North    150
2  2023-01-03       C  North    200
3  2023-01-04       A  South    180
4  2023-01-05       B  South    170
5  2023-01-06       C  South    160
6  2023-01-07       A   East    190
7  2023-01-08       B   East    200
8  2023-01-09       C   East    210
9  2023-01-10       A   West    220
10 2023-01-11       B   West    230
11 2023-01-12       C   West    240

=== pd.pivot_table() Sales pivot table ===
region   East  North  South  West
product
A         190    100    180   220
B         200    150    170   230
C         210    200    160   240

=== Fill missing values (fill_value=0) ===
region   East  North  South  West
product
A         190    100    180   220
B         200    150    170   230
C         210    200    160   240
```

**Code Explanation:**

* The pivot table summarizes sales by product (rows) and region (columns).
* `fill_value=0` replaces missing product/region combinations with 0, making the table easier to read. In this dataset every combination occurs exactly once, so the two tables are identical.

### Example 2: Using Different Aggregation Functions

The `aggfunc` parameter selects the aggregation method, and a list of functions computes several statistics at once.

```python
import pandas as pd

# 1. Create richer sales data
sales = pd.DataFrame({
    'product': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'region': ['North', 'North', 'South', 'South', 'East', 'East', 'West', 'West', 'North', 'South'],
    'sales': [100, 150, 200, 180, 190, 200, 220, 230, 110, 190],
    'quantity': [10, 15, 20, 18, 19, 20, 22, 23, 11, 19]
})

print("=== Sales data ===")
print(sales)

# 2. Aggregate (sum)
print("\n=== Sum of sales ===")
sum_result = pd.pivot_table(sales, values='sales', index='product', columns='region', aggfunc='sum')
print(sum_result)

# 3. Calculate average sales
print("\n=== Average sales ===")
mean_result = pd.pivot_table(sales, values='sales', index='product', columns='region', aggfunc='mean')
print(mean_result.round(1))

# 4. Calculate multiple statistics simultaneously (passing a list)
print("\n=== Sum and mean of sales ===")
multi_agg = pd.pivot_table(sales, values='sales', index='product', columns='region', aggfunc=['sum', 'mean'])
print(multi_agg)
```

**Expected Output:**

```
=== Sales data ===
  product region  sales  quantity
0       A  North    100        10
1       B  North    150        15
2       A  South    200        20
3       B  South    180        18
4       A   East    190        19
5       B   East    200        20
6       A   West    220        22
7       B   West    230        23
8       A  North    110        11
9       B  South    190        19

=== Sum of sales ===
region   East  North  South  West
product
A         190    210    200   220
B         200    150    370   230

=== Average sales ===
region    East  North  South   West
product
A        190.0  105.0  200.0  220.0
B        200.0  150.0  185.0  230.0

=== Sum and mean of sales ===
          sum                       mean
region   East  North  South  West   East  North  South   West
product
A         190    210    200   220  190.0  105.0  200.0  220.0
B         200    150    370   230  200.0  150.0  185.0  230.0
```

**Code Explanation:**

* Different aggregation functions compute different statistics over the same grouping. For example, product A has two North rows (100 and 110), giving a sum of 210 and a mean of 105.
* When a list of aggregation functions is passed, the pivot table gets a multi-level column index with the function name as the top level.
* Every mean in this dataset happens to be a whole number, so depending on the pandas version the mean columns may be displayed as integers (for example `105`) rather than floats (`105.0`).

### Example 3: Adding Subtotals Using margins

With the `margins` parameter you can add a totals row and column, which makes summary figures easy to read.

```python
import pandas as pd

# A smaller sales dataset for this example
sales = pd.DataFrame({
    'product': ['A', 'B', 'A', 'B', 'A', 'B'],
    'region': ['North', 'North', 'South', 'South', 'East', 'East'],
    'sales': [100, 150, 200, 180, 190, 200]
})

# 1. No margins
print("=== No margins ===")
print(pd.pivot_table(sales, values='sales', index='product', columns='region', aggfunc='sum'))

# 2. Add row and column margins
print("\n=== Add margins (margins=True) ===")
result = pd.pivot_table(sales, values='sales', index='product', columns='region', aggfunc='sum', margins=True)
print(result)

# 3. Customize margin names
print("\n=== Customize margin names ===")
result_name = pd.pivot_table(sales, values='sales', index='product', columns='region', aggfunc='sum', margins=True, margins_name='Total')
print(result_name)

# 4. Margins for multi-level row index
```
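
The parameter description notes that passing a list to `index` produces a hierarchical row index. As a minimal sketch of how that can combine with `margins=True`, the snippet below uses a small hypothetical dataset: the `channel` column and all figures are assumptions made for illustration and are not taken from the examples above.

```python
import pandas as pd

# Hypothetical data: 'channel' is an illustrative extra dimension
sales = pd.DataFrame({
    'product': ['A', 'A', 'B', 'B', 'A', 'B'],
    'channel': ['Online', 'Store', 'Online', 'Store', 'Online', 'Store'],
    'region': ['North', 'North', 'North', 'South', 'South', 'South'],
    'sales': [100, 150, 200, 180, 190, 200],
})

# Two row keys give a two-level row index; margins=True appends totals
result = pd.pivot_table(
    sales,
    values='sales',
    index=['product', 'channel'],
    columns='region',
    aggfunc='sum',
    fill_value=0,          # combinations with no rows show 0 instead of NaN
    margins=True,
    margins_name='Total',
)
print(result)
```

In this sketch the row for product A sold online sums to 290 (100 in North plus 190 in South), and the bottom-right cell holds the grand total of 1020. Because `aggfunc='sum'`, the margins are sums; with a different `aggfunc` they would be computed with that function instead.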