# Pandas General Functions
Pandas provides many functions for data processing and analysis. The tables below list commonly used ones, grouped by task.
### **General Functions**
| **Function** | **Description** |
| --- | --- |
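| `pd.isna(obj)` | Detect missing values (`NaN`, `None`, `NaT`), element-wise for array-like input. |
| `pd.notna(obj)` | Detect non-missing values (the inverse of `pd.isna`). |
| `pd.concat(objs, axis)` | Concatenate Series or DataFrames along rows (`axis=0`) or columns (`axis=1`). |
| `pd.merge(left, right, on)` | Join two DataFrames on key columns, similar to a SQL join. |
| `pd.get_dummies(data)` | One-hot encode categorical variables. |
| `pd.cut(x, bins)` | Bin continuous values into intervals defined by bin edges or a bin count. |
| `pd.qcut(x, q)` | Bin values into quantile-based intervals of roughly equal size. |
| `pd.to_numeric(arg)` | Convert the argument to a numeric type. |
| `pd.to_datetime(arg)` | Convert the argument to datetime. |
| `pd.unique(values)` | Return unique values in order of appearance. |
| `pd.value_counts(values)` | Count how often each value occurs. Deprecated since pandas 2.1; prefer `Series.value_counts()`. |
| `pd.factorize(values)` | Encode values as integer codes; returns the codes and the unique values. |
| `pd.crosstab(index, columns)` | Compute a cross tabulation (frequency table) of two or more factors. |
| `pd.pivot_table(data)` | Build a spreadsheet-style pivot table with aggregation. |
| `pd.melt(frame)` | Unpivot a DataFrame from wide to long format. |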
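
A minimal sketch of a few of the general functions listed above. The sample data and variable names (`left`, `right`, `ages`, and so on) are invented for illustration and are not part of the original reference.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
left = pd.DataFrame({"id": [1, 2, 3], "city": ["Oslo", "Lima", "Oslo"]})
right = pd.DataFrame({"id": [1, 2, 4], "sales": [10.0, None, 7.5]})

# Stack the rows of two frames; ignore_index renumbers the result 0..n-1.
stacked = pd.concat([left, left], axis=0, ignore_index=True)

# The default inner join keeps only ids present in both frames (1 and 2).
merged = pd.merge(left, right, on="id")

# Element-wise missing-value check on the merged column.
missing = pd.isna(merged["sales"])

# cut assigns each value to an interval; edges here are (0, 18] and (18, 65].
ages = pd.Series([5, 17, 30, 64])
age_group = pd.cut(ages, bins=[0, 18, 65], labels=["minor", "adult"])

# errors="coerce" turns unparseable entries into NaN instead of raising.
numbers = pd.to_numeric(pd.Series(["1", "2", "x"]), errors="coerce")

# factorize returns integer codes plus the unique values they refer to.
codes, uniques = pd.factorize(left["city"])

print(merged)
print(age_group.tolist())
print(codes, list(uniques))
```

Note that `pd.merge` performs an inner join unless `how` is set, so ids 3 and 4 are absent from `merged`.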
* * *
### **Data Reading and Writing (IO)**
| Function | Description |
| --- | --- |
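| `pd.read_csv()` | Read a CSV file into a DataFrame. |
| `pd.read_excel()` | Read an Excel file into a DataFrame. |
| `pd.read_json()` | Read JSON into a DataFrame or Series. |
| `pd.read_html()` | Parse HTML tables into a list of DataFrames. |
| `pd.read_sql()` | Read a SQL query or database table into a DataFrame. |
| `df.to_csv()` | Write a DataFrame to CSV. |
| `df.to_excel()` | Write a DataFrame to an Excel file. |
| `df.to_json()` | Write a DataFrame to JSON. |
| `df.to_parquet()` | Write a DataFrame to Parquet. |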
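
The following sketch round-trips a small table through CSV and JSON. To stay self-contained it reads from in-memory text with `io.StringIO` rather than from files; the CSV content and variable names are assumptions made up for the example.

```python
import io

import pandas as pd

# Hypothetical CSV text; a real call would usually pass a file path instead.
csv_text = "name,score\nAda,90\nLin,85\n"

# read_csv accepts any file-like object, so StringIO stands in for a file.
df = pd.read_csv(io.StringIO(csv_text))

# With no path argument, to_csv returns the CSV as a string.
csv_out = df.to_csv(index=False)

# to_json works the same way; orient="records" emits one object per row.
json_out = df.to_json(orient="records")

# Read the JSON back through a file-like wrapper.
df_back = pd.read_json(io.StringIO(json_out))

print(csv_out)
print(json_out)
```

Excel and Parquet are left out of the sketch because `read_excel`, `to_excel`, and `to_parquet` rely on optional packages (for example `openpyxl` or `pyarrow`) that may not be installed.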
* * *
### **Data Cleaning**
| Function | Description |
| --- | --- |
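| `df.dropna()` | Drop rows (or columns) that contain missing values. |
| `df.fillna()` | Fill missing values with a given value. |
| `df.replace()` | Replace specified values with other values. |
| `df.drop_duplicates()` | Remove duplicate rows. |
| `df.astype()` | Convert columns to a specified data type. |
| `df.rename()` | Rename columns or index labels. |
| `df.sort_values()` | Sort rows by the values of one or more columns. |
| `df.reset_index()` | Reset the index to the default integer index. |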
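
The cleaning methods above return new objects, so they can be chained. This is a sketch on made-up data (the `raw` frame and its column names are hypothetical), not a recommended pipeline for any particular dataset.

```python
import pandas as pd

# Hypothetical messy data, invented for illustration.
raw = pd.DataFrame({
    "Name": ["Ada", "Ada", "Lin", None],
    "Score": ["90", "90", "n/a", "70"],
})

clean = (
    raw.dropna(subset=["Name"])            # drop the row with no name
       .drop_duplicates()                  # drop the repeated Ada row
       .replace({"Score": {"n/a": "0"}})   # swap the placeholder for a parseable value
       .astype({"Score": "int64"})         # strings to integers
       .rename(columns={"Name": "name", "Score": "score"})
       .sort_values("score", ascending=False)
       .reset_index(drop=True)             # renumber rows 0..n-1 after sorting
)

print(clean)
```

`fillna` is not needed here because the only missing value is removed by `dropna`; where rows must be kept, `raw.fillna({"Name": "unknown"})` would be the alternative.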
* * *
### **Data Selection and Filtering**
| Function | Description |
| --- | --- |
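| `df.head()` | Return the first `n` rows (5 by default). |
| `df.tail()` | Return the last `n` rows (5 by default). |
| `df.loc[]` | Label-based indexing. |
| `df.iloc[]` | Integer position-based indexing. |
| `df.query()` | Filter rows with a boolean expression string. |
| `df.filter()` | Select columns (or rows) by label name, not by value. |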
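
A short sketch contrasting the selection tools above on one small frame. The data and column names are invented for illustration.

```python
import pandas as pd

# Hypothetical data with string row labels so loc and iloc differ visibly.
df = pd.DataFrame(
    {"price": [3.0, 12.5, 8.0], "qty": [10, 2, 5], "price_eur": [2.8, 11.6, 7.4]},
    index=["a", "b", "c"],
)

first_two = df.head(2)                 # rows a and b
by_label = df.loc["b", "price"]        # row label "b", column "price"
by_position = df.iloc[0, 1]            # first row, second column
cheap = df.query("price < 10")         # rows where the expression is true
price_cols = df.filter(like="price")   # columns whose name contains "price"

print(by_label, by_position)
print(cheap)
print(price_cols.columns.tolist())
```

`df.filter` matches on label names only; filtering on cell values is the job of `df.query` or boolean indexing.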
* * *
### **Grouping and Aggregation**
| Function | Description |
| --- | --- |
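| `df.groupby()` | Split the data into groups by one or more keys. |
| `groupby.sum()` | Sum of each group. |
| `groupby.mean()` | Mean of each group. |
| `groupby.agg()` | Apply one or more aggregation functions to each group. |
| `groupby.transform()` | Apply a function per group and return a result aligned with the original rows. |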
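
The difference between aggregating and transforming is easiest to see side by side. A minimal sketch, assuming a made-up `team`/`points` table:

```python
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame({"team": ["x", "x", "y"], "points": [1, 3, 10]})
grouped = df.groupby("team")["points"]

# Aggregations return one row per group.
totals = grouped.sum()
summary = grouped.agg(["mean", "max"])

# transform returns one value per original row, so it can be assigned back.
team_mean = grouped.transform("mean")
df["diff_from_team_mean"] = df["points"] - team_mean

print(totals)
print(summary)
print(df)
```

Here `totals` and `summary` have two rows (one per team), while `team_mean` has three, matching `df`.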
* * *
### **Math and Statistical Functions**
| Function | Description |
| --- | --- |
| `Series.sum()` | Sum of the values. |
| `Series.mean()` | Arithmetic mean. |
| `Series.median()` | Median. |
| `Series.std()` | Sample standard deviation (`ddof=1` by default). |
| `Series.var()` | Sample variance (`ddof=1` by default). |
| `Series.corr()` | Correlation with another Series (Pearson by default). |
| `Series.quantile()` | Value at the given quantile. |
| `Series.cumsum()` | Cumulative sum. |
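
A quick sketch of the statistics above on four numbers, small enough to check by hand. The series `s`, `other`, and `with_gap` are invented for the example.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])
other = pd.Series([2.0, 4.0, 6.0, 8.0])  # exactly 2 * s

total = s.sum()
mean = s.mean()
median = s.median()

# var and std default to ddof=1, i.e. they divide by n - 1.
var = s.var()
std = s.std()

# Pearson correlation with a perfectly linear series.
corr = s.corr(other)

# Quantiles use linear interpolation by default.
q25 = s.quantile(0.25)

running = s.cumsum()

# Missing values are skipped by default (skipna=True).
with_gap = pd.Series([1.0, None, 3.0])
gap_mean = with_gap.mean()

print(total, mean, median, var, std, corr, q25, gap_mean)
print(running.tolist())
```

Passing `ddof=0` to `var` or `std` gives the population version instead of the sample version.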
* * *
### **String Processing**
| Function | Description |
| --- | --- |
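| `Series.str.lower()` | Convert strings to lowercase. |
| `Series.str.upper()` | Convert strings to uppercase. |
| `Series.str.strip()` | Remove leading and trailing whitespace. |
| `Series.str.replace()` | Replace occurrences of a pattern or substring. |
| `Series.str.contains()` | Test whether each string contains a pattern (regex by default). |
| `Series.str.split()` | Split each string around a delimiter. |
| `Series.str.len()` | Length of each string. |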
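
The `.str` methods are vectorized and pass missing values through, which the sketch below illustrates. The `names` data is made up for the example.

```python
import pandas as pd

# Hypothetical names with stray whitespace, mixed case, and a missing entry.
names = pd.Series(["  Ada Lovelace ", "lin TORVALDS", None])

# Methods chain through the .str accessor; the missing entry stays missing.
tidy = names.str.strip().str.lower()

# na=False makes missing entries count as "no match" instead of NaN.
has_ada = tidy.str.contains("ada", na=False)

# split returns lists; .str[0] picks the first piece of each.
first = tidy.str.split(" ").str[0]

lengths = tidy.str.len()

# regex=False treats the pattern as a literal substring.
dashed = tidy.str.replace(" ", "-", regex=False)

print(tidy.tolist())
print(has_ada.tolist())
print(dashed.tolist())
```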
* * *
### **Time Series**
| Function | Description |
| --- | --- |
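| `pd.date_range()` | Generate a fixed-frequency `DatetimeIndex`. |
| `pd.Timestamp()` | A single point in time. |
| `pd.Timedelta()` | A duration (difference between two times). |
| `Series.dt.year` | Year of each datetime. |
| `Series.dt.month` | Month of each datetime (1 to 12). |
| `Series.dt.day` | Day of the month. |
| `Series.dt.weekday` | Day of the week (Monday=0, Sunday=6). |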
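
A small sketch of the time series helpers above. One detail worth showing: the `.dt` accessor exists on a Series, while a `DatetimeIndex` exposes the same components as plain attributes. The dates and variable names are chosen arbitrarily for illustration.

```python
import pandas as pd

# Three consecutive days that cross a month boundary.
days = pd.date_range("2023-01-30", periods=3, freq="D")

# date_range returns a DatetimeIndex; its components are plain attributes.
index_months = days.month

# Inside a Series, the same components are reached through .dt.
s = pd.Series(days)
parts = pd.DataFrame({
    "year": s.dt.year,
    "month": s.dt.month,
    "day": s.dt.day,
    "weekday": s.dt.weekday,  # Monday=0, Sunday=6
})

# Adding a Timedelta to a Timestamp gives another Timestamp.
later = pd.Timestamp("2023-01-30") + pd.Timedelta(days=2)

print(parts)
print(later)
```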
* * *
### **Data Reshaping**
| Function | Description |
| --- | --- |
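| `df.pivot()` | Reshape from long to wide using column values as new columns (no aggregation). |
| `df.pivot_table()` | Reshape like `pivot`, aggregating values that share the same keys. |
| `df.stack()` | Move column labels into the row index (wide to long). |
| `df.unstack()` | Move a row index level into the columns (long to wide). |
| `pd.melt()` | Unpivot a DataFrame from wide to long format. |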
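
The reshaping functions above are roughly inverses of one another, which this sketch tries to show on a tiny long-format table. The `city`/`year`/`temp` data is invented for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per (city, year) observation.
long = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", "Lima"],
    "year": [2022, 2023, 2022, 2023],
    "temp": [5.0, 6.0, 19.0, 20.0],
})

# pivot: one row per city, one column per year; fails on duplicate keys.
wide = long.pivot(index="city", columns="year", values="temp")

# pivot_table: like pivot, but aggregates when several rows share a key.
mean_by_city = long.pivot_table(index="city", values="temp", aggfunc="mean")

# melt: back from wide to long, naming the new columns explicitly.
back = wide.reset_index().melt(id_vars="city", var_name="year", value_name="temp")

# stack moves the year columns into the index; unstack moves them back.
stacked = wide.stack()
unstacked = stacked.unstack()

print(wide)
print(mean_by_city)
print(back)
```

`pivot` is the stricter choice when each key combination is known to be unique; `pivot_table` is the one to reach for when duplicates need to be summarized.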
* * *
## Example
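```python
import pandas as pd

# General functions: element-wise missing-value check
s = pd.Series([1, 2, 3, None])
print(pd.isna(s))

# Math: NaN is skipped by default, so the sum is 6.0
print(s.sum())

# String: vectorized methods live under the .str accessor
s_str = pd.Series(['a', 'b'])
print(s_str.str.upper())

# Time: to_datetime on a list returns a DatetimeIndex, which exposes
# .month directly (the .dt accessor exists only on Series)
dates = pd.to_datetime(['2023-01-01'])
print(dates.month)
```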
* * *
For more details, see the [pandas general functions reference](https://pandas.pydata.org/docs/reference/general_functions.html).