YouTip LogoYouTip

Pandas Sorting

Data sorting and aggregation are very common and important operations in data analysis, especially when analyzing large datasets. Sorting helps us arrange data according to specific criteria, while aggregation allows us to summarize data and calculate various statistics. Pandas provides powerful sorting and aggregation functions that can help analysts process data efficiently. | **Operation** | **Method** | **Description** | **Common Functions/Methods** | | --- | --- | --- | --- | | **Sorting** | `sort_values(by, ascending)` | Sort by values in a column, `ascending` controls ascending/descending order | `df.sort_values(by='column')` | | **Sorting** | `sort_index(axis)` | Sort by row or column index | `df.sort_index(axis=0)` | | **Group Aggregation** | `groupby(by)` | Group by a column, then apply aggregation functions | `df.groupby('column')` | | **Aggregation Function** | `agg()` | Aggregation functions such as `sum()`, `mean()`, `count()`, etc. | `df.groupby('column').agg({'value': 'sum'})` | | **Multiple Aggregation** | `agg([func1, func2])` | Apply multiple aggregation functions to the same column | `df.groupby('column').agg({'value': ['mean', 'sum']})` | | **Sort After Grouping** | `apply(lambda x: x.sort_values(...))` | Sort after grouping | `df.groupby('column').apply(lambda x: x.sort_values(...))` | | **Pivot Table** | `pivot_table()` | Create pivot table, summarize data by rows and columns | | * * * ## 1. Data Sorting **Sorting** refers to arranging data in ascending or descending order based on values in a certain column. Pandas provides two main methods for sorting: `sort_values()` and `sort_index()`. ### Sorting Methods * `sort_values()`: Sort by column values. * `sort_index()`: Sort by row or column index. ### Examples | **Operation** | **Method** | **Description** | **Example** | | --- | --- | --- | --- | | Sort by value | `df.sort_values(by, ascending)` | Sort by specified column (`by`), `ascending` controls ascending or descending order, default is ascending | `df.sort_values(by='Age', ascending=False)` | | Sort by index | `df.sort_index(axis)` | Sort by row or column index, `axis` controls sorting by row or column | `df.sort_index(axis=0)` | **`sort_values()` Example:** ## Example import pandas as pd # Example data data ={'Name': ['Alice','Bob','Charlie','David'], 'Age': [25,30,35,40], 'Salary': [50000,60000,70000,80000]} df = pd.DataFrame(data) # Sort by "Age" column in descending order df_sorted = df.sort_values(by='Age', ascending=False) print(df_sorted) **Output:** Name Age Salary3 David 40 800002 Charlie 35 700001 Bob 30 600000 Alice 25 50000 **`sort_index()` Example**: ## Example # Sort by row index df_sorted_by_index = df.sort_index(axis=0) print(df_sorted_by_index) **Output:** Name Age Salary0 Alice 25 500001 Bob 30 600002 Charlie 35 700003 David 40 80000 * * * ## 2. Data Aggregation **Aggregation** is the process of summarizing data according to certain rules, typically performing operations such as sum, average, maximum, minimum, etc. on data in certain columns. Pandas provides the `groupby()` method to group data, then apply different aggregation functions. ### Aggregation Methods * `groupby()`: Group by certain columns. * Aggregation functions: such as `sum()`, `mean()`, `count()`, `min()`, `max()`, `std()`, etc. ### Examples | **Operation** | **Method** | **Description** | **Example** | | --- | --- | --- | --- | | Group by column and aggregate | `df.groupby(by).agg()` | Group by specified column (`by`), `agg()` can accept different aggregation functions for various operations | `df.groupby('Department').agg({'Salary': 'mean'})` | | Multiple aggregation functions | `df.groupby(by).agg([func1, func2])` | Can apply multiple aggregation functions to the same column, returning multiple aggregation results | `df.groupby('Department').agg({'Salary': ['mean', 'sum']})` | **`groupby()` Example**: ## Example import pandas as pd # Example data data ={'Department': ['HR','Finance','HR','IT','IT'], 'Employee': ['Alice','Bob','Charlie','David','Eve'], 'Salary': [50000,60000,55000,70000,75000]} df = pd.DataFrame(data) # Group by department and calculate average salary for each department grouped = df.groupby('Department')['Salary'].mean() print(grouped) **Output:** DepartmentFinance 60000.0 HR 52500.0 IT 72500.0Name: Salary, dtype: float64 **Multiple Aggregation Functions**: ## Example # Group by department and calculate both mean and sum of salary for each department grouped_multiple = df.groupby('Department').agg({'Salary': ['mean','sum']}) print(grouped_multiple) **Output:** Salary mean sum Department Finance 60000.0 60000 HR 52500.0 105000 IT 72500.0 145000 * * * ## 3. Sorting After Grouping Aggregated data can be further sorted by values in a certain column, allowing us to find the most important values in specific groups. ### Sorting After Grouping | **Operation** | **Method** | **Description** | **Example** | | --- | --- | --- | --- | | Sort after grouping | `df.groupby(by).apply(lambda x: x.sort_values(by='col'))` | Sort by values in a certain column within each group. | `df.groupby('Department').apply(lambda x: x.sort_values(by='Salary', ascending=False))` | **Sorting After Grouping Example**: ## Example # Group by department, then sort by salary in descending order grouped_sorted = df.groupby('Department').apply(lambda x: x.sort_values(by='Salary', ascending=False)) print(grouped_sorted) **Output:** Department Employee SalaryDepartment Finance Bob 60000 HR Charlie 55000 HR Alice 50000 IT Eve 75000 IT David 70000 * * * ## 4. Pivot Table A Pivot Table is a special aggregation method that allows us to quickly summarize data through rows, columns, and aggregation functions, similar to pivot tables in Excel. ### Examples | **Operation** | **Method** | **Description** | **Example** | | --- | --- | --- | --- | | Create pivot table | `df.pivot_table(values, index, columns, aggfunc)` | Classify and summarize data with specified columns for rows and columns, `values` is the value to aggregate, `aggfunc` is the aggregation function | `df.pivot_table(values='Salary', index='Department', aggfunc='mean')` | **Pivot Table Example**: ## Example # Use pivot_table to calculate average salary for each department pivot_table = df.pivot_table(values='Salary', index='Department', aggfunc='mean') print(pivot_table) **Output:** DepartmentFinance 60000.0 HR 52500.0 IT 72500.0Name: Salary, dtype: float64
← Pandas AdvancedVue3 Declarative Rendering β†’