Machine Learning Algorithms

Machine learning algorithms can be categorized into supervised learning, unsupervised learning, reinforcement learning, and other categories.

Supervised Learning Algorithms:

  • Linear Regression: Used for regression tasks to predict continuous numerical values.
  • Logistic Regression: Used for binary classification tasks to predict categories.
  • Support Vector Machine (SVM): Used for classification tasks, constructing a hyperplane for classification.
  • Decision Tree: A classification or regression method based on a tree-like structure for decision-making.

Unsupervised Learning Algorithms:

  • K-means Clustering: Groups data points around cluster centers (centroids).
  • Principal Component Analysis (PCA): Used for dimensionality reduction, extracting the principal components of data.

Each algorithm has its applicable scenarios. In practical applications, the most suitable machine learning algorithm can be chosen based on the characteristics of the data (such as whether it has labels, the dimensionality of the data, etc.).

| Categorization Dimension | Category | Core Definition | Typical Algorithms | Core Pros and Cons | Applicable Scenarios |
| --- | --- | --- | --- | --- | --- |
| Learning Method | Supervised Learning | Learns the mapping from input to output using labeled data | Logistic Regression, SVM, Decision Tree, CNN, LSTM | Pros: High prediction accuracy; Cons: Relies on high-quality labeled data | Classification, Regression, Image Recognition, Text Translation |
| Learning Method | Unsupervised Learning | Mines intrinsic patterns in data using unlabeled data | K-Means, PCA, DBSCAN, Autoencoders | Pros: No labeling required; Cons: Weak interpretability of results | Data Clustering, Dimensionality Reduction, Anomaly Detection, User Segmentation |
| Learning Method | Semi-supervised Learning | Trains using a small amount of labeled data and a large amount of unlabeled data | Semi-supervised SVM, Label Propagation Algorithm | Pros: Reduces labeling cost; Cons: Complex model design | Medical Image Analysis, NLP for Low-Resource Languages |
| Learning Method | Reinforcement Learning | Model optimizes its strategy via reward signals through interaction with the environment | Q-Learning, DQN, PPO | Pros: Adapts to dynamic decision-making; Cons: Long training cycles | Game AI, Robotic Control, Recommendation Strategy Optimization |
| Task Objective | Classification Algorithms | Predicts discrete class labels | Logistic Regression, Random Forest, CNN | Pros: Suitable for classification scenarios; Cons: Sensitive to class imbalance | Spam Detection, Image Classification, Disease Diagnosis |
| Task Objective | Regression Algorithms | Predicts continuous numerical outputs | Linear Regression, Ridge Regression, XGBoost | Pros: Outputs continuous values; Cons: Sensitive to outliers | House Price Prediction, Sales Forecast, Temperature Prediction |
| Task Objective | Clustering Algorithms | Groups similar data together without labels | K-Means, Hierarchical Clustering, DBSCAN | Pros: Automatic grouping; Cons: Clustering effect depends on distance metric | Market Segmentation, User Profiling, Anomaly Detection |
| Task Objective | Dimensionality Reduction Algorithms | Reduces feature dimensionality while retaining core information | PCA, t-SNE, LDA | Pros: Reduces computational cost; Cons: May lose some information | High-Dimensional Data Visualization, Feature Preprocessing |
| Model Structure | Linear Models | Assumes a linear relationship between input and output | Linear Regression, Logistic Regression, Ridge Regression | Pros: Strong interpretability, fast training; Cons: Difficult to fit nonlinear relationships | Simple Classification/Regression, Baseline Model Building |
| Model Structure | Tree Models | Built on decision trees, handles nonlinear relationships | Decision Tree, Random Forest, XGBoost, LightGBM | Pros: No need for feature normalization; Cons: Deep trees prone to overfitting | Industrial-grade Classification/Regression, Competition-grade Tasks |
| Model Structure | Neural Network Models | Multi-layer neuron structure, automatically extracts complex features | ANN, CNN, RNN, Transformer | Pros: Fits complex relationships; Cons: Requires large amounts of data and computing power | Image Recognition, NLP, Speech Synthesis |
| Model Structure | Probabilistic Models | Based on probability and statistical theory, calculates probability distributions | Naive Bayes, Hidden Markov Model | Pros: Solid theoretical foundation; Cons: Relies on strong assumptions | Text Classification, Speech Recognition, Sequence Labeling |

Supervised Learning Algorithms

Linear Regression

Linear Regression is an algorithm for regression problems. It predicts a continuous output by learning the linear relationship between input features and the target value.

Application Scenarios: Predicting house prices, stock prices, etc.

The goal of linear regression is to find the best linear equation:

$$y = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b$$

  • $y$ is the predicted value (target value).
  • $x_1, x_2, \dots, x_n$ are the input features.
  • $w_1, w_2, \dots, w_n$ are the weights to be learned (model parameters).
  • $b$ is the bias term.

Image 2

Next, we use sklearn for simple house price prediction:

Example

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import pandas as pd

# Assume we have a simple house price dataset
data = {
    'Area': [50, 60, 80, 100, 120],
    'House Price': [150, 180, 240, 300, 350]
}
df = pd.DataFrame(data)

# Features and labels
X = df[['Area']]
y = df['House Price']

# Data splitting
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

print(f"Predicted House Price: {y_pred}")

Output result:

Predicted House Price: [180.8411215]
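
The weight and bias of the linear equation can be read back from a fitted model. The following is a minimal sketch, assuming the same five-row toy dataset but fitted on all rows for simplicity; `coef_` and `intercept_` are scikit-learn's attributes for the learned weights and bias, while the variable names `area`, `manual`, and `predicted` are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same toy data as the house price example (area -> house price)
X = np.array([[50], [60], [80], [100], [120]])
y = np.array([150, 180, 240, 300, 350])

model = LinearRegression()
model.fit(X, y)

w = model.coef_[0]      # learned weight for the single feature
b = model.intercept_    # learned bias term

# Reproduce one prediction by hand: y = w * x + b
area = 90
manual = w * area + b
predicted = model.predict(np.array([[area]]))[0]

print(f"w = {w:.4f}, b = {b:.4f}")
print(f"manual: {manual:.2f}, model.predict: {predicted:.2f}")
```

The two printed predictions should agree, which shows that `predict` is just the linear equation evaluated with the learned parameters.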

Logistic Regression

Logistic Regression is an algorithm for classification problems. Despite its name containing "regression", it is used to handle binary classification problems.

Logistic Regression predicts a class label by learning the relationship between input features and the class.

Application Scenarios: Spam classification, disease diagnosis (whether sick or not).

The output of logistic regression is a probability value, indicating the probability that a sample belongs to a certain class.

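It typically uses the Sigmoid function to turn a linear score $z = w_1 x_1 + \cdots + w_n x_n + b$ into a probability:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$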
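
As a rough illustration, the Sigmoid function can be written in a few lines of NumPy and compared with the probabilities scikit-learn reports. This sketch assumes a default `LogisticRegression` on the two-class Iris subset; the helper name `sigmoid` and the variables `z`, `p_manual`, and `p_model` are illustrative, not part of the library.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Map any real-valued score z to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Binary subset of Iris (classes 0 and 1 only)
X, y = load_iris(return_X_y=True)
X, y = X[y != 2], y[y != 2]

model = LogisticRegression()
model.fit(X, y)

# Linear score z = w . x + b for each sample
z = X @ model.coef_[0] + model.intercept_[0]

# Probability of class 1, computed by hand and by the library
p_manual = sigmoid(z)
p_model = model.predict_proba(X)[:, 1]

print(sigmoid(0.0))  # 0.5: a score of 0 lies exactly on the decision boundary
print(np.allclose(p_manual, p_model))
```

A probability above 0.5 corresponds to a positive score, which is why the default decision threshold coincides with the hyperplane where the score is zero.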

Using logistic regression for a binary classification task:

Example

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Only take the first two classes for binary classification
X = X[y != 2]
y = y[y != 2]

# Data splitting
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate the model
print(f"Classification Accuracy: {accuracy_score(y_test, y_pred):.2f}")

Output result:

Classification Accuracy: 1.00

Support Vector Machine (SVM)

Support Vector Machine (SVM) is a commonly used classification algorithm. It separates classes with a hyperplane chosen to maximize the margin between them, which helps reduce classification error on unseen data.

Application Scenarios: Text classification, face recognition, etc.

Using SVM for the Iris classification task:

Example

from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Data splitting
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the SVM model
model = SVC(kernel='linear')
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate the model
print(f"SVM Classification Accuracy: {accuracy_score(y_test, y_pred):.2f}")

Output result:

SVM Classification Accuracy: 1.00
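
To make the hyperplane and margin more concrete, the sketch below fits a linear SVM on a two-class subset of Iris and inspects the result. It assumes a linear kernel (the only case where `coef_` is available) and uses the standard formula 2 / ||w|| for the distance between the two margin boundaries; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Two-class subset of Iris so that there is a single separating hyperplane
X, y = load_iris(return_X_y=True)
X, y = X[y != 2], y[y != 2]

model = SVC(kernel='linear')
model.fit(X, y)

w = model.coef_[0]        # normal vector of the hyperplane w . x + b = 0
b = model.intercept_[0]   # offset of the hyperplane

# Distance between the margin boundaries w . x + b = +1 and w . x + b = -1
margin_width = 2.0 / np.linalg.norm(w)

print(f"Number of support vectors: {len(model.support_vectors_)}")
print(f"Margin width: {margin_width:.3f}")
```

Only the support vectors, the training points closest to the boundary, determine where the hyperplane ends up; the remaining points could move slightly without changing the model.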

Decision Tree

Decision Tree is a classification and regression method based on a tree structure for decision-making. It uses a series of "judgment conditions" to determine which class a sample belongs to.

Application Scenarios: Customer classification, credit scoring, etc.

Using a decision tree for a classification task:

Example

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Data splitting
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the decision tree model
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate the model
print(f"Decision Tree Classification Accuracy: {accuracy_score(y_test, y_pred):.2f}")

Output result:

Decision Tree Classification Accuracy: 1.00
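
The "judgment conditions" a tree learns can be printed as text. The sketch below is one way to do that, assuming a deliberately shallow tree (`max_depth=2`, chosen here only to keep the output short) and scikit-learn's `export_text` helper.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed rules short
model = DecisionTreeClassifier(max_depth=2, random_state=42)
model.fit(iris.data, iris.target)

# Render the learned splits as nested if/else style rules
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is a threshold test on one feature, and each leaf reports the predicted class, so a prediction can be traced by following the branches from the top.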

Unsupervised Learning Algorithms

K-means Clustering

K-means is a centroid-based clustering algorithm. It continuously adjusts the cluster centers so that the data points in each cluster are as close as possible to the cluster center.

Application Scenarios: Customer segmentation, market analysis, image compression.

Using K-means to cluster a synthetic 2D dataset, standing in for customer segmentation data:

Example

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate a simple 2D dataset
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Train the K-means model
model = KMeans(n_clusters=4)
model.fit(X)

# Predict the clustering result
y_kmeans = model.predict(X)

# Visualize the clustering result
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
plt.show()

The output graph is as follows:

Image 4
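
The repeated adjustment of cluster centers can be sketched in plain NumPy. This is a simplified illustration of the basic assign-then-update loop, not scikit-learn's implementation (which, among other things, defaults to k-means++ initialization); the function name `kmeans` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_blobs

def kmeans(X, k, n_iters=100, seed=0):
    """Basic K-means loop: assign points, then move centers, until stable."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: index of the nearest center for every point
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its points
        # (an empty cluster keeps its previous center)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
centers, labels = kmeans(X, k=4)
print(centers)
```

Because the starting centers are random, a single run can settle on a poor grouping; running the loop from several seeds and keeping the tightest result is a common remedy.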

Principal Component Analysis (PCA)

PCA is a dimensionality reduction technique. It transforms data into a new coordinate system through linear transformation, so that most of the variance is concentrated on the first few principal components.

Application Scenarios: Image dimensionality reduction, feature extraction, data visualization.

Using PCA for dimensionality reduction and visualizing high-dimensional data:

Example

from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Reduce dimensionality to 2 dimensions
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Visualize the result
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis')
plt.title('PCA of Iris Dataset')
plt.show()

The output graph is as follows:

Image 5
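
How much variance the first few components actually capture can be checked numerically. The sketch below assumes the same Iris data and two components, and reads scikit-learn's `explained_variance_ratio_` attribute; the variable name `ratios` is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

pca = PCA(n_components=2)
pca.fit(X)

# Fraction of the total variance captured by each principal component
ratios = pca.explained_variance_ratio_
print(f"Per component: {ratios}")
print(f"Total retained: {ratios.sum():.3f}")
```

A total close to 1 suggests the 2D plot loses little information; a much lower total would be a reason to keep more components.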

Machine Learning Algorithms

Traditional Machine Learning Algorithms

| Full Name | Abbreviation | Core Applicable Scenarios |
| --- | --- | --- |
| Decision Tree | DT | Classification, Regression, Feature Importance Analysis |
| Random Forest | RF | Classification, Regression, Anomaly Detection, Feature Selection |
| Logistic Regression | LR | Binary Classification Tasks, Probability Prediction, Credit Scoring |
| Support Vector Machine | SVM | Classification, High-Dimensional Small Sample Data, Text Classification |
| Naive Bayes | NB | Text Classification, Spam Detection, Sentiment Analysis |
| Gradient Boosting Decision Tree | GBDT | Classification, Regression, Ranking Tasks |
| Extreme Gradient Boosting | XGBoost | High-Precision Classification/Regression, Competition-grade Tasks, Click-Through Rate Prediction |
| Light Gradient Boosting Machine | LightGBM | Large-Scale Data Classification/Regression, Real-time Prediction, Recommendation Systems |
| K-Nearest Neighbor | KNN | Simple Classification/Regression, Recommendation Systems, Anomaly Detection |
| K-Means Clustering | K-Means | Data Clustering, User Segmentation, Image Segmentation |
| Principal Component Analysis | PCA | Data Dimensionality Reduction, High-Dimensional Data Visualization, Feature Denoising |

Deep Learning Algorithms

| Full Name | Abbreviation | Core Applicable Scenarios |
| --- | --- | --- |
| Artificial Neural Network | ANN | Simple Classification/Regression, Baseline Model Validation |
| Convolutional Neural Network | CNN | Image Recognition, Object Detection, Video Analysis, Medical Image Diagnosis |
| Recurrent Neural Network | RNN | Sequence Data Processing, Text Generation, Speech Recognition |
| Long Short-Term Memory | LSTM | Long Sequence Text Translation, Speech Synthesis, Time Series Prediction |
| Gated Recurrent Unit | GRU | Sequence Classification, Sentiment Analysis, Dialogue Systems |
| Generative Adversarial Network | GAN | Image Generation, Style Transfer, Data Augmentation, Super-Resolution Reconstruction |
| Transformer | Transformer | Natural Language Translation, Text Summarization, Multimodal Tasks, Large Model Foundation Architecture |
| Autoencoder | AE | Data Compression, Anomaly Detection, Feature Extraction |
| Variational Autoencoder | VAE | Generative Tasks, Data Denoising, Image Generation |
| Graph Neural Network | GNN | Social Network Analysis, Molecular Structure Prediction, Knowledge Graph Reasoning |