
Pandas Dtype

Pandas has a rich data type system, and choosing the right types is the foundation of efficient data analysis. This section covers the Pandas data types, type inference, and type conversion methods.

* * *

## Pandas Data Types Overview

| dtype | Description | Python Type | Example |
| --- | --- | --- | --- |
| `int64` | 64-bit integer | int | 1, 2, 100 |
| `float64` | 64-bit floating point | float | 1.5, 3.14 |
| `object` | String or mixed types | str | "hello" |
| `bool` | Boolean | bool | True, False |
| `datetime64` | Datetime | datetime | 2024-01-01 |
| `timedelta64` | Timedelta | timedelta | 1 days |
| `category` | Categorical | - | Finite set of values |

## Example

```python
import pandas as pd

# Create a DataFrame with various data types
df = pd.DataFrame({
    "Integer": [1, 2, 3],
    "Float": [1.5, 2.5, 3.5],
    "String": ["a", "b", "c"],
    "Boolean": [True, False, True],
    "Date": pd.date_range("2024-01-01", periods=3)
})

print("Column data types:")
print(df.dtypes)
```

* * *

## Type Inference and Specification

### Automatic Type Inference

## Example

```python
import pandas as pd

# Type inference when reading CSV:
# Pandas attempts to infer the most appropriate type for each column
df = pd.read_csv("data.csv")

# Or explicitly specify types using the dtype parameter
df = pd.read_csv("data.csv", dtype={
    "Age": "int32",       # 32-bit integer to save memory
    "Salary": "float32",  # 32-bit floating point
    "Name": "string"      # Pandas string dtype ("string[pyarrow]" selects the PyArrow backend)
})
```

### Specifying Types at Creation

## Example

```python
import pandas as pd
import numpy as np

# Create a Series with a specified data type
s = pd.Series([1, 2, 3], dtype="int8")  # Use a smaller integer type
print(f"int8 type: {s.dtype}")

s = pd.Series([1.5, 2.5, 3.5], dtype="float32")  # Use float32
print(f"float32 type: {s.dtype}")

# Use NumPy types
s = pd.Series([1, 2, 3], dtype=np.int8)
print(f"np.int8 type: {s.dtype}")
```

* * *

## Type Conversion

### Using astype for Conversion

## Example

```python
import pandas as pd

# Create sample data
df = pd.DataFrame({
    "Integer": [1, 2, 3],
    "Float": [1.5, 2.5, 3.5],
    "String": ["1", "2", "3"],
    "Boolean": [1, 0, 1]
})

print("Original types:")
print(df.dtypes)
print()

# Convert to string
df["Integer"] = df["Integer"].astype(str)
print("After converting to string:")
print(df.dtypes)

# String to numeric
df["String"] = df["String"].astype(int)
print("\nString to integer:")
print(df.dtypes)

# Numeric to boolean (non-zero is True)
df["Boolean"] = df["Boolean"].astype(bool)
print("\nInteger to boolean:")
print(df.dtypes)

# Float to integer (the fractional part is truncated)
df["Float"] = df["Float"].astype(int)
print("\nFloat to integer:")
print(df)
```

### Using pd.to_numeric for Conversion

## Example

```python
import pandas as pd

# Handle numeric strings with special characters
s = pd.Series(["$1,000", "$2,500", "$3,200"])

# Clean and convert
s_cleaned = s.str.replace("$", "", regex=False).str.replace(",", "", regex=False)
s_numeric = pd.to_numeric(s_cleaned)
print("String to numeric:")
print(s_numeric)
print(f"Type: {s_numeric.dtype}")
print()

# Handle missing values
s_with_na = pd.Series(["1", "2", "NA", "4"])
s_numeric = pd.to_numeric(s_with_na, errors="coerce")  # Invalid values are converted to NaN
print("Handling missing values:")
print(s_numeric)
```

### Using pd.to_datetime for Date Conversion

## Example

```python
import pandas as pd

# Convert various date formats
dates = ["2024-01-01", "2024/01/02", "01/03/2024", "20240104"]

# Convert dates
dt = pd.to_datetime(date
```
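
As a complement to the date conversion topic, here is a minimal sketch of `pd.to_datetime` with an explicit `format` and `errors="coerce"`. The sample values and the variable names `raw` and `parsed` are illustrative assumptions, not data from this tutorial. An explicit format removes ambiguity between day-first and month-first strings, and `errors="coerce"` turns unparseable entries into `NaT`, much as `pd.to_numeric` turns them into `NaN`.

```python
import pandas as pd

# Hypothetical sample data: three ISO-formatted dates and one invalid entry
raw = pd.Series(["2024-01-01", "2024-01-02", "not a date", "2024-01-04"])

# An explicit format avoids guessing; errors="coerce" maps failures to NaT
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

print(parsed)
print(f"Type: {parsed.dtype}")
print(f"Unparseable entries: {parsed.isna().sum()}")
```

When the input genuinely mixes several layouts, one option is to normalize the strings to a single layout first and then parse them with one known format.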