
Sklearn Iris Dataset

The Iris dataset is one of the classic entry-level datasets in machine learning. It contains three species of iris (Setosa, Versicolor, Virginica), with four features measured for each flower: sepal length, sepal width, petal length, and petal width. Our task is to predict the species of a flower from these features. This chapter covers data loading, visualization, feature selection, data preprocessing, building classification models, and model evaluation and optimization.

* * *

## 1. Data Loading and Visualization

### Data Loading

First, load the Iris dataset. scikit-learn provides a direct interface for this, `load_iris`.

## Example

```python
from sklearn.datasets import load_iris
import pandas as pd

# Load the Iris dataset
data = load_iris()

# Convert to a DataFrame for easy viewing
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target
# Map each numeric label (0, 1, 2) to its species name
df['species'] = df['target'].apply(lambda x: data.target_names[x])

# View the first few rows of data
print(df.head())
```

Output:

```
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target species
0                5.1               3.5                1.4               0.2       0  setosa
1                4.9               3.0                1.4               0.2       0  setosa
2                4.7               3.2                1.3               0.2       0  setosa
3                4.6               3.1                1.5               0.2       0  setosa
4                5.0               3.6                1.4               0.2       0  setosa
```

The data is now loaded, and each row shows the four features of one flower together with its numeric label and species name.

### Data Visualization

To better understand the data, we can plot the relationships between the features, using the matplotlib and seaborn libraries.
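Before plotting, a quick numeric check can confirm the shape and class balance of the data. The following is a minimal sketch, assuming the same `load_iris` loading as above; the variable names (`class_counts`, `species_means`) are illustrative and not part of the original tutorial.

```python
from sklearn.datasets import load_iris
import pandas as pd

# Load the Iris dataset and attach species names (same data as above)
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
# Index the array of names with the integer labels to get one name per row
df["species"] = data.target_names[data.target]

# Basic shape: rows are flowers, columns are 4 features plus the species name
n_rows, n_cols = df.shape

# Number of samples per species (illustrative variable name)
class_counts = df["species"].value_counts().to_dict()

# Total count of missing values across the whole table
missing = int(df.isna().sum().sum())

# Mean of each feature within each species
species_means = df.groupby("species").mean()

print(f"rows={n_rows}, columns={n_cols}, missing={missing}")
print(class_counts)
print(species_means)
```

For the dataset bundled with scikit-learn, this reports 150 rows with 50 samples per species and no missing values, so the classes are balanced and no imputation is needed at this stage.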
## Example

```python
from sklearn.datasets import load_iris
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the Iris dataset
data = load_iris()

# Convert to a DataFrame for easy viewing
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target
# Map each numeric label (0, 1, 2) to its species name
df['species'] = df['target'].apply(lambda x: data.target_names[x])

# Plot the pairwise relationships between features, colored by species
sns.pairplot(df, hue="species")
plt.show()
```

`pairplot` draws a scatter plot matrix of the features, using a different color for each iris species. This shows the distribution of each feature and how pairs of features relate to one another.

### Heatmap Visualization of Feature Correlations

A heatmap shows the pairwise correlations between features. Knowing which features are strongly correlated helps us make better choices when selecting features and building models.

## Example

```python
from sklearn.datasets import load_iris
import pandas as pd
import seaborn as sns
import matplotlib
```
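A correlation heatmap along these lines can be sketched as follows. This is a minimal sketch, assuming pandas' `DataFrame.corr()` for Pearson correlations and seaborn's `heatmap` for display; the helper name `plot_corr_heatmap` and the styling choices (the `coolwarm` colormap, cell annotations) are illustrative assumptions, not taken from the original example.

```python
from sklearn.datasets import load_iris
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the Iris dataset into a DataFrame of the four numeric features
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Pairwise Pearson correlations between the features (a 4 x 4 matrix)
corr = df.corr()


def plot_corr_heatmap(corr_matrix):
    """Draw an annotated heatmap of a correlation matrix (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(6, 5))
    # Fix the color scale to [-1, 1] so colors are comparable across plots
    sns.heatmap(corr_matrix, annot=True, fmt=".2f", cmap="coolwarm",
                vmin=-1, vmax=1, square=True, ax=ax)
    ax.set_title("Iris feature correlations")
    fig.tight_layout()
    return ax


if __name__ == "__main__":
    print(corr.round(2))
    plot_corr_heatmap(corr)
    plt.show()
```

On this dataset, petal length and petal width are strongly positively correlated (above 0.9), while sepal length and sepal width show a weak negative correlation. Highly correlated pairs carry overlapping information, which is worth keeping in mind during feature selection.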