# Pandas `df.replace()`
`df.replace()` is a Pandas method for replacing values in a DataFrame.

Replacing data is a routine step in data cleaning. `replace()` swaps specified values for new ones and supports single-value replacement, multi-value replacement, regular-expression replacement, and per-column rules. This makes it useful for handling outliers, unifying data formats, and recoding values.

* * *

## Basic Syntax and Parameters

`replace()` is a DataFrame method, called with the dot operator `.`.

### Syntax Format

```python
DataFrame.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')
```

### Parameter Description

| Parameter | Type | Required | Description | Default Value |
| --- | --- | --- | --- | --- |
| to_replace | str, regex, list, dict, int, float, None | Required | The value(s) to find. Can be a single value, a list of values, a dictionary, a regular expression, etc. | None |
| value | scalar, dict, list, str, regex, None | Optional | The replacement value(s). Can be omitted when `to_replace` is a dictionary that already contains the new values; otherwise it is normally required. | None |
| inplace | bool | Optional | If `True`, modifies the original DataFrame directly and returns `None`; if `False`, returns a new DataFrame and leaves the original unchanged. | False |
| limit | int | Optional | Maximum size of the gap to forward- or backward-fill; only used together with `method`. | None |
| regex | bool or str | Optional | If `True`, interprets `to_replace` (and `value`) as regular expressions. It can also be a pattern itself, in which case `to_replace` must be `None`. | False |
| method | str | Optional | Fill method used when `to_replace` is a scalar, list, or tuple and `value` is `None`: `'pad'` or `'ffill'` fills forward; `'backfill'` or `'bfill'` fills backward. `method` and `limit` are deprecated since pandas 2.1. | 'pad' |

### Return Value Description

* Returns a new DataFrame (if `inplace=False`), or `None` (if `inplace=True`).
* In the returned DataFrame, the specified values have been replaced.

* * *

## Examples

The following examples walk through the common uses of `replace()`.

### Example 1: Single Value Replacement

Replace a specific value in a DataFrame with a new value.

```python
import pandas as pd

# Create a DataFrame in which "Technology" appears more than once
data = {
    'Name': ['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu'],
    'Department': ['Technology', 'Marketing', 'Technology', 'Marketing'],
    'Salary': [5000, 6000, 5500, 7000]
}
df = pd.DataFrame(data)

print("Original data:")
print(df)
print("=" * 50)

# Replace "Technology" with "R&D"
df_replaced = df.replace('Technology', 'R&D')
print("After replacement:")
print(df_replaced)
```

**Expected Output:**

```
Original data:
        Name  Department  Salary
0  Zhang San  Technology    5000
1      Li Si   Marketing    6000
2    Wang Wu  Technology    5500
3   Zhao Liu   Marketing    7000
==================================================
After replacement:
        Name Department  Salary
0  Zhang San        R&D    5000
1      Li Si  Marketing    6000
2    Wang Wu        R&D    5500
3   Zhao Liu  Marketing    7000
```

**Code Analysis:**

1. The "Department" column contains "Technology" in two rows.
2. `df.replace('Technology', 'R&D')` replaces every "Technology" with "R&D".
3.
   The replacement applies to every matching cell in the DataFrame, in all columns. A cell must equal `'Technology'` exactly; substrings are not matched unless `regex=True` is set.

### Example 2: One-to-One Replacement of Multiple Values

Use a dictionary to replace several different values at once.

```python
import pandas as pd

# Create a DataFrame with values that need to be replaced
data = {
    'Name': ['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu'],
    'City': ['Beijing', 'Shanghai', 'Guangzhou', 'Shenzhen'],
    'Level': ['A', 'B', 'A', 'C']
}
df = pd.DataFrame(data)

print("Original data:")
print(df)
print("=" * 50)

# Map several old values to new values with one dictionary
replacements = {
    'Beijing': 'Beijing City',
    'Shanghai': 'Shanghai City',
    'Guangzhou': 'Guangzhou City',
    'Shenzhen': 'Shenzhen City',
    'A': 'Excellent',
    'B': 'Good',
    'C': 'Pass'
}
df_replaced = df.replace(replacements)
print("After replacement:")
print(df_replaced)
```

**Expected Output:**

```
Original data:
        Name       City Level
0  Zhang San    Beijing     A
1      Li Si   Shanghai     B
2    Wang Wu  Guangzhou     A
3   Zhao Liu   Shenzhen     C
==================================================
After replacement:
        Name            City      Level
0  Zhang San    Beijing City  Excellent
1      Li Si   Shanghai City       Good
2    Wang Wu  Guangzhou City  Excellent
3   Zhao Liu   Shenzhen City       Pass
```

**Code Analysis:**

* The dictionary `replacements` specifies several replacement rules in one call.
* Because no column is named, each rule applies to whole cell values in every column.
* A single call with a dictionary is more concise than chaining several `replace()` calls.

### Example 3: Replacing Values in Specific Columns

You can restrict replacement to specific columns without affecting the others.

```python
import pandas as pd

# Create a DataFrame in which two columns share the same values
data = {
    'Name': ['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu'],
    'Department': ['Technology', 'Marketing', 'Technology', 'Marketing'],
    'Position Type': ['Technology', 'Marketing', 'Technology', 'Marketing']
}
df = pd.DataFrame(data)

print("Original data:")
print(df)
print("=" * 50)

# Replace "Technology" with "R&D" only in the "Department" column
df_replaced = df.replace({'Department': 'Technology'}, 'R&D')
print("After replacing only in the Department column:")
print(df_replaced)
print("=" * 50)

# A nested dictionary applies different rules to different columns;
# every top-level value must itself be a dictionary
df_replaced2 = df.replace({
    'Department': {'Technology': 'R&D', 'Marketing': 'Sales'},
    'Position Type': {'Technology': 'R&D'}
})
print("After applying different rules to different columns:")
print(df_replaced2)
```

**Expected Output:**

```
Original data:
        Name  Department Position Type
0  Zhang San  Technology    Technology
1      Li Si   Marketing     Marketing
2    Wang Wu  Technology    Technology
3   Zhao Liu   Marketing     Marketing
==================================================
After replacing only in the Department column:
        Name Department Position Type
0  Zhang San        R&D    Technology
1      Li Si  Marketing     Marketing
2    Wang Wu        R&D    Technology
3   Zhao Liu  Marketing     Marketing
==================================================
After applying different rules to different columns:
        Name Department Position Type
0  Zhang San        R&D           R&D
1      Li Si      Sales     Marketing
2    Wang Wu        R&D           R&D
3   Zhao Liu      Sales     Marketing
```

**Code Analysis:**

* `{'Department': 'Technology'}` together with the value `'R&D'` looks for "Technology" only in the "Department" column.
* A nested dictionary such as `{'Department': {'Technology': 'R&D'}}` sets separate rules for each column. When nesting is used, every top-level value must be a dictionary; mixing in a plain value (for example `'Position Type': 'Technology'`) raises a `TypeError`.

### Example 4: Replacement Using Regular Expressions

With `regex=True`, `to_replace` is treated as a regular expression, which is very useful
for text that follows a pattern rather than a fixed value. Matching substrings inside each cell are substituted, and capture groups can be referenced in `value` as `\1`, `\2`, and so on.

```python
import pandas as pd

# Create a DataFrame whose text columns need pattern-based cleanup
data = {
    'Name': ['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu'],
    'Phone': ['138-0000-0000', '139-1111-1111', '137-2222-2222', '136-3333-3333'],
    'Notes': ['Normal', 'VIPCustomer', 'Normal', 'VIP Customer']
}
df = pd.DataFrame(data)

print("Original data:")
print(df)
print("=" * 50)

# Mask the middle 4 digits of each phone number, keeping groups 1 and 3
df_replaced = df.replace(to_replace=r'(\d{3})-(\d{4})-(\d{4})', value=r'\1****\3', regex=True)
print("After masking the middle 4 digits of phone numbers:")
print(df_replaced)
print("=" * 50)

# Unify "VIPCustomer" and "VIP Customer" (with or without a space) to "VIP"
df_replaced2 = df.replace(to_replace=r'VIP\s*Customer', value='VIP', regex=True)
print("After unifying the VIP notes:")
print(df_replaced2)
```

**Expected Output:**

```
Original data:
        Name          Phone         Notes
0  Zhang San  138-0000-0000        Normal
1      Li Si  139-1111-1111   VIPCustomer
2    Wang Wu  137-2222-2222        Normal
3   Zhao Liu  136-3333-3333  VIP Customer
==================================================
After masking the middle 4 digits of phone numbers:
        Name        Phone         Notes
0  Zhang San  138****0000        Normal
1      Li Si  139****1111   VIPCustomer
2    Wang Wu  137****2222        Normal
3   Zhao Liu  136****3333  VIP Customer
==================================================
After unifying the VIP notes:
        Name          Phone   Notes
0  Zhang San  138-0000-0000  Normal
1      Li Si  139-1111-1111     VIP
2    Wang Wu  137-2222-2222  Normal
3   Zhao Liu  136-3333-3333     VIP
```

**Code Analysis:**

* The first call matches the phone-number pattern `(\d{3})-(\d{4})-(\d{4})` and rewrites it as `\1****\3`, keeping the first and last groups and hiding the middle 4 digits.
* The second call uses `VIP\s*Customer`, where `\s*` matches zero or more whitespace characters, so it matches both "VIPCustomer" and "VIP Customer" and replaces each with "VIP".

### Example 5: Replacing Missing Values (NaN)

`replace()` can also replace missing values (`NaN`), a common data-cleaning operation.

```python
import pandas as pd
import numpy as np

# Create a DataFrame with missing values
data = {
    'Name': ['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu'],
    'Age': [25, np.nan, 35, np.nan],
    'Salary': [5000, 6000, np.nan, 8000]
}
df = pd.DataFrame(data)

print("Original data:")
print(df)
print("=" * 50)

# Replace NaN with 0
df_replaced = df.replace(np.nan, 0)
print("After replacing NaN with 0:")
print(df_replaced)
print("=" * 50)

# Replace NaN with another value, such as "Unknown"
df_replaced2 = df.replace(np.nan, 'Unknown')
print("After replacing NaN with 'Unknown':")
print(df_replaced2)
```

**Expected Output:**

```
Original data:
        Name   Age  Salary
0  Zhang San  25.0  5000.0
1      Li Si   NaN  6000.0
2    Wang Wu  35.0     NaN
3   Zhao Liu   NaN  8000.0
==================================================
After replacing NaN with 0:
        Name   Age  Salary
0  Zhang San  25.0  5000.0
1      Li Si   0.0  6000.0
2    Wang Wu  35.0     0.0
3   Zhao Liu   0.0  8000.0
==================================================
After replacing NaN with 'Unknown':
        Name      Age   Salary
0  Zhang San     25.0   5000.0
1      Li Si  Unknown   6000.0
2    Wang Wu     35.0  Unknown
3   Zhao Liu  Unknown   8000.0
```

**Code Analysis:**

* `np.nan` marks a missing value, and `replace(np.nan, value)` replaces every missing value in the DataFrame.
* The fill value affects the column type: replacing with `0` keeps the columns as floats, while replacing with `'Unknown'` turns them into `object` columns. Choose the value according to the data type and business requirements.

### Example 6: Numeric Replacement

Numeric values can be replaced to standardize data or handle outliers.

```python
import pandas as pd

# Create a DataFrame with outliers
data = {
    'Student': ['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu', 'Qian Qi'],
    'Score': [85, 150, 78, 92, -10]  # 150 and -10 are outliers
}
df = pd.DataFrame(data)

print("Original data:")
print(df)
print("=" * 50)

# Replace the outliers with values in a reasonable range
df_replaced = df.replace({150: 100, -10: 0})
print("After replacing outliers:")
print(df_replaced)
print("=" * 50)

# Use two parallel lists to replace several values
df_replaced2 = df.replace(to_replace=[150, -10], value=[100, 0])
print("Using lists to replace multiple values:")
print(df_replaced2)
```

**Expected Output:**

```
Original data:
     Student  Score
0  Zhang San     85
1      Li Si    150
2    Wang Wu     78
3   Zhao Liu     92
4    Qian Qi    -10
==================================================
After replacing outliers:
     Student  Score
0  Zhang San     85
1      Li Si    100
2    Wang Wu     78
3   Zhao Liu     92
4    Qian Qi      0
==================================================
Using lists to replace multiple values:
     Student  Score
0  Zhang San     85
1      Li Si    100
2    Wang Wu     78
3   Zhao Liu     92
4    Qian Qi      0
```

**Code Analysis:**

* The dictionary `{150: 100, -10: 0}` replaces the outlier 150 with 100 and -10 with 0.
* The lists `[150, -10]` and `[100, 0]` pair each value to replace with its replacement by position, giving the same result.

* * *

## Notes

* `replace()` does not modify the original DataFrame by default. To modify it in place, pass `inplace=True`.
* When using regular expressions, check the pattern carefully and write it as a raw string (`r'...'`) so that `\d` and `\s` keep their backslashes. An overly broad pattern can match more text than intended.
* Replacement can change a column's data type: replacing `NaN` with a string such as `'Unknown'` turns a numeric column into an `object` column, while numeric-to-numeric replacements keep it numeric.
* Matching is case-sensitive: "Technology" and "technology" are treated as different values.
* Before replacing data, especially with `inplace=True`, consider keeping a copy of the original (for example via `df.copy()`) for comparison and traceability.
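The case-sensitivity, data-type, and `inplace` notes above can be checked with a small sketch. It assumes pandas 2.x or later; the three-row DataFrame is made up for illustration, and the case-insensitive match relies on Python's inline `(?i)` regex flag rather than on any dedicated `replace()` parameter.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: an exact "Technology", a lowercase variant, and one NaN salary
df = pd.DataFrame({
    "Department": ["Technology", "technology", "Marketing"],
    "Salary": [5000.0, np.nan, 7000.0],
})

# Plain replacement is case-sensitive: only the exact string "Technology" matches
exact = df.replace("Technology", "R&D")
print(exact["Department"].tolist())  # ['R&D', 'technology', 'Marketing']

# Case-insensitive match via the inline (?i) flag; ^ and $ anchor the pattern
# so that only whole cell values are replaced, not substrings
ignore_case = df.replace(r"(?i)^technology$", "R&D", regex=True)
print(ignore_case["Department"].tolist())  # ['R&D', 'R&D', 'Marketing']

# Replacing NaN with a string turns the float column into an object column
mixed = df.replace(np.nan, "Unknown")
print(df["Salary"].dtype, mixed["Salary"].dtype)  # float64 object

# inplace=True changes df itself instead of returning a new DataFrame
df.replace("Marketing", "Sales", inplace=True)
print(df["Department"].tolist())  # ['Technology', 'technology', 'Sales']
```

The `^...$` anchors keep the regex behaving like the plain whole-value match; without them, `(?i)technology` would also rewrite substrings, turning a cell such as "Biotechnology" into "BioR&D".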
YouTip