Pandas Df Stack
[ Pandas Common Functions](#)
* * *
`df.stack()` is a member method of DataFrame used to **convert column indices into row indices**. It "stacks" columns from wide-format data into rows, producing a Series or DataFrame with hierarchical indexing.
This is an important operation in data reshaping and is often used together with `unstack()` to convert data back and forth.
**Word Definition**: `stack` means "to pile up", here it refers to piling column data into rows, transforming a wide table into a long table.
* * *
## Basic Syntax and Parameters
`df.stack()` is an instance method of DataFrame, called using dot notation.
### Syntax Format
DataFrame.stack(level=-1, dropna=True)
### Parameter Description
* **Parameter**: `level`
* Type: int, str, int list, or str list.
* Description: Specifies which level to stack. For multi-level indices, you can specify a specific level number (starting from 0) or level name. Default is -1, indicating the innermost level.
* **Parameter**: `dropna`
* Type: Boolean.
* Description: Whether to drop rows where all values are NaN in the result. Default is True, meaning rows with all NaN values will be removed.
### Function Description
* **Return Value**: Returns a Series or DataFrame. If only one column is stacked from the original DataFrame, a Series is returned; otherwise, a DataFrame is returned.
* **Effect**: Converts column indices into row indices, producing a longer data format. The stacked data is in "long format", suitable for statistical analysis or visualization.
* * *
## Examples
Let's go through a series of examples from simple to complex to fully master the usage of `df.stack()`.
### Example 1: Basic Usage - Convert Columns to Rows
## Example
import pandas as pd
import numpy as np
# 1. Create wide-format data (each quarter is a column)
df = pd.DataFrame({
'name': ['Alice','Bob','Charlie'],
'Q1': [100,150,200],
'Q2': [120,160,210],
'Q3': [130,170,220]
})
print("=== Original Wide-Format Data ===")
print(df)
print(f"Column Names: {df.columns.tolist()}")
# 2. Use df.stack() to convert columns to rows
stacked = df.stack()
print("n=== df.stack() Result ===")
print(stacked)
print(f"nType: {type(stacked)}")
# 3. Convert result to DataFrame for viewing
print("n=== Converted to DataFrame ===")
stacked_df = stacked.reset_index()
stacked_df.columns=['name','quarter','sales']
print(stacked_df)
**Expected Output:**
=== Original Wide-Format Data === name Q1 Q2 Q3 0 Alice 100 120 1301 Bob 150 160 1702 Charlie 'Q2', 'Q3'], columns=['name', 'quarter', 'sales']...
**Code Explanation:**
1. The original data is in wide format, with Q1, Q2, Q3 being three quarterly columns.
2. `stack()` "stacks" the Q1, Q2, Q3 columns into rows.
3. The result is a Series type with two levels of index: outer level is the original row index, inner level is the original column names (quarters).
4. The stacked data becomes "long format", with each row containing only one quarterly value.
### Example 2: Stacking Multi-Level Column Indices
When a DataFrame has multi-level column indices, you can specify which level to stack.
## Example
import pandas as pd
import numpy as np
# 1. Create data with multi-level column indices
# Outer level: Product Type (Electronics, Furniture)
# Inner level: Quarter (Q1, Q2)
df = pd.DataFrame(
[[100,110,200,210],[150,160,180,190]],
index=['Store_A','Store_B'],
columns=pd.MultiIndex.from_tuples([
('Electronics','Q1'),('Electronics','Q2'),
('Furniture','Q1'),('Furniture','Q2')
], names=['Product','Quarter'])
)
print("=== Original Data (Multi-Level Column Index) ===")
print(df)
print(f"Column Index Levels: {df.columns.names}")
# 2. Stack inner level columns (default, level=-1)
print("n=== Stack Inner Level Columns (level=-1) ===")
stacked_inner = df.stack()
print(stacked_inner)
# 3. Stack outer level columns
print("n=== Stack Outer Level Columns (level=0) ===")
stacked_outer = df.stack(level=0)
print(stacked_outer)
# 4. Stack all columns (produce even longer format)
print("n=== Stack All Columns ===")
stacked_all = df.stack(level=[0,1])
print(stacked_all)
**Expected Output:**
=== Original Data (Multi-Level Column Index) ===Product Electronics FurnitureQuarter Q1 Q2 Q1 Q ...
**Code Explanation:**
* The DataFrame with multi-level column indices has column names in tuple form: (Product, Quarter).
* `level=-1` (default) stacks the innermost level, i.e., the Quarter level, leaving Product as a column.
* `level=0` stacks the outer level, i.e., the Product level, leaving Quarter as a column.
* This flexibility allows you to choose which level to stack based on your analytical needs.
### Example 3: Handling Missing Values
Use the `dropna` parameter to control how missing values are handled.
## Example
import pandas as pd
import numpy as np
# 1. Create data with missing values
df = pd.DataFrame({
'name': ['Alice','Bob'],
'Math': [100, np.nan],
'English': [90,85],
'Science': [np.nan,95]
})
print("=== Original Data (With Missing Values) ===")
print(df)
# 2. dropna=True (default), remove rows with all NaN
print("n=== dropna=True (Default) ===")
stacked_drop = df.stack(dropna=True)
print(stacked_drop)
# 3. dropna=False, keep NaN values
print("n=== dropna=False ===")
stacked_keep = df.stack(dropna=False)
print(stacked_keep)
# 4. Practical application: organize stacked results into analyzable long format
print("n=== Organize into Long-Format DataFrame ===")
result = df.set_index('name').stack().reset_index()
result.columns=['name','subject','score']
print(result)
# 5. Remove missing values
result_clean = result.dropna()
print("n=== After Removing Missing Values ===")
print(result_clean)
**Expected Output:**
=== Original Data (With Missing Values) === name Math English Science0 Alice 100.0 90.0 NaN1 Bob NaN 85.0 95.02=== Stack Inner Level Columns (level=-1) ===Store_A Electronics Q1 100 Q2 110 Furniture Q1 200 Q2 210Store_B Electronics Q1 150 ...
**Code Explanation:**
* `dropna=True` (default) removes rows where all values are NaN.
* In data analysis, it's common to preserve the original data and then remove missing values as needed.
* After stacking, use `reset_index()` to convert hierarchical indices into regular columns for easier processing.
### Example 4: Using stack with unstack
`stack()` and `unstack()` are inverse operations. Used together, they allow flexible data reshaping.
## Example
import pandas as pd
# 1. Create initial data
df = pd.DataFrame({
'product': ['A','B','C'],
'North': [100,150,200],
'South': [180,170,160],
'East': [190,200,210]
})
print("=== Original Data ===")
print(df)
# 2. Use unstack to convert regions into columns (first make it multi-level indexed)
df_indexed = df.set_index('product')
print("n=== After Setting Index ===")
print(df_indexed)
# 3. Use stack to convert columns to rows (stack is the inverse of unstack)
print("n=== unstack + stack Round-trip ===")
# First unstack
unstacked = df_indexed.unstack()
print("unstack result:")
print(unstacked)
# Then stack back
restacked = unstacked.stack()
print("nstack back:")
print(restacked)
# 4. Complete example: create multi-level index then transform
print("n=== Multi-Level Index Transformation ===")
# Create multi-level indexed DataFrame
multi_df = pd.DataFrame(
[[100,110],[150,160]],
index=pd.MultiIndex.from_tuples([('A','2023'),('B','2023')], names=['product','year']),
columns=['North','South']
)
print("Original Multi-Level Indexed Data:")
print(multi_df)
# unstack year
print("nunstack year:")
print(multi_df.unstack(level='year'))
# unstack columns
print("nunstack columns (to columns):")
print(multi_df.unstack(level='columns'))
**Expected Output:**
=== unstack + stack Round-trip === unstack result: product North A 100 B 150 C 200South A 180 B 170...
**Code Explanation:**
* `stack()` converts column indices into row indices, transforming a wide table into a long table.
* `unstack()` converts row indices into column indices, transforming a long table into a wide table.
* Together, these operations enable flexible data shape transformations.
* Understanding the inverse relationship between these operations is crucial for data analysis.
> **Tip:** `stack()` and `unstack()` are core tools for data reshaping. `stack()` piles columns into rows (wide to long), while `unstack()` spreads rows into columns (long to wide).
[ Pandas Common Functions](#)
YouTip