# How AI Works
You might ask: I just want to use AI, why do I need to understand how it works?
The reason is simple: If you know where AI's capabilities come from, you can use it better.
* You'll know when to trust it and when to question it.
* You'll know what it's good at and what it's not good at.
* You'll know where hallucinations come from and how to reduce their impact.
* * *
## Intuitive Understanding of Neural Networks
The core of modern AI is neural networks, a name derived from the neuron structure of the human brain.
### Analogy of Brain Neurons
The human brain has about 86 billion neurons, which are connected to each other and transmit signals. Each neuron receives inputs from other neurons, processes them, and outputs to other neurons. The learning process is the process of adjusting the strength of these connections.
Artificial neural networks borrow this idea but make significant simplifications.
### Three-Layer Basic Structure
A typical neural network is organized into three kinds of layers: an input layer, one or more hidden layers, and an output layer.

Take the task of deciding whether an image shows a cat or a dog as an example:
![](https://example.com/wp-content/uploads/2024/12/68747470733a2f2f6d69726f25674a6b50587563386f672e676966.gif)
| Layer | Function | In This Example |
| --- | --- | --- |
| Input Layer | Receives raw data | Each pixel and color information of the image |
| Hidden Layer | Extracts features layer by layer | Edges → Textures → Ears, eyes, and other parts |
| Output Layer | Gives the final result | "85% probability it's a cat", "15% probability it's a dog" |
Each layer has many neurons, each receiving output from the previous layer, doing a simple calculation, and passing it to the next layer.
* The first hidden layer might detect simple shapes: there's a vertical line here, there's a circle there.
* The second hidden layer combines these: a vertical line plus a circle might be an ear.
* The third hidden layer keeps combining: two pointed ears, whiskers, cat eyes - this is very likely a cat.
The remarkable part is that these features are not designed by humans; the model learns them from data.
### The Simplest Neuron
Let's use a few lines of Python code to show what a neuron does:
```python
# ============================================
# Basic computational logic of an artificial neuron
# No complex math, just weighted sum + activation
# ============================================
def simple_neuron(inputs: list, weights: list, bias: float) -> float:
    """
    The simplest possible neuron.

    inputs: input values (from neurons in the previous layer)
    weights: weights (importance of each input, learned during training)
    bias: bias (shifts the activation threshold, learned during training)
    """
    # Step 1: Weighted sum
    # Multiply each input by its corresponding weight, then sum
    weighted_sum = 0.0
    for input_value, weight in zip(inputs, weights):
        weighted_sum += input_value * weight
    # Add bias
    weighted_sum += bias

    # Step 2: Activation function (makes the output non-linear)
    # Using the simplest one, ReLU: negative becomes 0, positive stays the same
    output = max(0.0, weighted_sum)
    return output


# Simulation: a neuron that judges "is this a cat's ear"
# Input: [degree of pointiness, height position, has fur or not]
inputs = [0.8, 0.9, 0.7]  # These three features are quite prominent
# Weights: in this tutorial example, we assume these values were learned in training
weights = [0.5, 0.4, 0.3]
# Bias: threshold
bias = -0.6

result = simple_neuron(inputs, weights, bias)
print(f"Neuron output: {result:.3f}")
print(f"Judgment: {'Possibly cat ears' if result > 0 else 'Not very similar'}")
# Output: Neuron output: 0.370
# Output: Judgment: Possibly cat ears
```
What this neuron does is very simple: it computes a weighted sum of its inputs, passes that sum through an activation function, and outputs the result.
But when thousands or millions of such neurons are connected together, with each layer learning different features, the system as a whole can show surprisingly capable behavior.
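
To make "connected together" concrete, here is a minimal sketch that stacks a few neurons into a hidden layer and an output layer. The helper names (`dense_layer`, `softmax`) and all weight values are assumptions made up for illustration; a real network would learn its weights from data and would be far larger.

```python
import math


def relu(x: float) -> float:
    """ReLU activation: negative becomes 0, positive stays the same."""
    return max(0.0, x)


def dense_layer(inputs, weights, biases, activation=None):
    """One layer of neurons: each row of `weights` belongs to one neuron."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(x * w for x, w in zip(inputs, neuron_weights)) + bias
        outputs.append(activation(total) if activation else total)
    return outputs


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical, hand-picked weights (not learned from anything).
hidden_weights = [[0.5, 0.4, 0.3], [-0.2, 0.1, 0.6]]  # 2 hidden neurons
hidden_biases = [-0.6, 0.0]
output_weights = [[1.5, 0.5], [-1.0, 0.2]]  # row 0 = "cat", row 1 = "dog"
output_biases = [0.0, 0.0]

features = [0.8, 0.9, 0.7]  # input layer: three made-up feature values
hidden = dense_layer(features, hidden_weights, hidden_biases, relu)
scores = dense_layer(hidden, output_weights, output_biases)
cat_prob, dog_prob = softmax(scores)
print(f"cat: {cat_prob:.2f}, dog: {dog_prob:.2f}")
```

The output of one layer is simply the input of the next, and the final softmax step is one common way to turn raw scores into the kind of percentages shown in the table above.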
> Remember this intuition: Neural network = many simple computational units connected together, learning by adjusting connection weights.
* * *
## Training vs Inference: Two Different Phases
The AI lifecycle is divided into two completely different phases: training and inference.
Understanding the difference between these two phases helps you understand many things - like why training is so expensive while inference is relatively cheap.
### Training: Letting AI Learn
Training is the process of showing the model large amounts of data, letting it continuously adjust parameters, and making predictions increasingly accurate.
For example, training a model to recognize cats and dogs:
1. Prepare millions of labeled images (this is a cat, that is a dog).
2. Let the model guess "what is this". At first it will guess wrong a lot.
3. Tell it "you guessed wrong, it should be a cat", and let it adjust the weights in the network.
4. Repeat millions of times until the model's predictions are accurate enough.
The training phase requires enormous computing power and data. A large model might need thousands of GPUs to train for months, costing millions of dollars.
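
The four steps above can be sketched with a single neuron and a handful of made-up data points. This is only an illustration under simplifying assumptions: the dataset, the feature names, the learning rate, and the number of passes are all invented, and real training uses far more data and far larger models.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(features, weights, bias):
    """The model's guess: probability that the example is a cat."""
    total = sum(x * w for x, w in zip(features, weights)) + bias
    return sigmoid(total)


# Step 1: labeled data (made up). [ear pointiness, snout length] -> 1 = cat, 0 = dog
dataset = [
    ([0.9, 0.2], 1),
    ([0.8, 0.3], 1),
    ([0.3, 0.9], 0),
    ([0.2, 0.8], 0),
]

random.seed(0)
weights = [random.uniform(-0.1, 0.1) for _ in range(2)]  # start almost random
bias = 0.0
learning_rate = 0.5  # assumed value, chosen for this toy example

# Step 4: repeat many times
for epoch in range(1000):
    for features, label in dataset:
        guess = predict(features, weights, bias)  # Step 2: let the model guess
        error = guess - label                     # Step 3: how wrong was it?
        # Nudge each weight against the error (the gradient of the
        # cross-entropy loss for a sigmoid neuron is error * input).
        for i, x in enumerate(features):
            weights[i] -= learning_rate * error * x
        bias -= learning_rate * error

for features, label in dataset:
    print(f"label={label}, predicted cat probability={predict(features, weights, bias):.3f}")
```

After training, the predicted probability should be close to 1 for the cat examples and close to 0 for the dog examples. The cost of real training comes from running this same loop over billions of parameters and enormous datasets.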
### Inference: Putting AI to Use
Inference is the process of using a trained model, giving it new input, and getting output.
You send a message to ChatGPT and it replies to you - that's inference.
You take a photo with your phone to identify a plant - that's also inference.
The characteristics of inference are:
* No need to adjust parameters, just use the trained weights for computation.
* Often needs only a single GPU, or even a phone chip for smaller models.
* Cost is much lower than training.
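
As a rough sketch of what "just use the trained weights" means, the snippet below runs a single forward pass with fixed weights. The values in `TRAINED_WEIGHTS` and `TRAINED_BIAS` are hypothetical stand-ins for the result of an earlier training run, and the two features are made up.

```python
import math

# Hypothetical weights, standing in for the result of an earlier training run.
# They are stored in a tuple to stress that inference never modifies them.
TRAINED_WEIGHTS = (4.0, -4.0)
TRAINED_BIAS = 0.0


def infer(features):
    """Forward pass only: read the weights, compute, return an answer."""
    total = sum(x * w for x, w in zip(features, TRAINED_WEIGHTS)) + TRAINED_BIAS
    cat_probability = 1.0 / (1.0 + math.exp(-total))
    label = "cat" if cat_probability >= 0.5 else "dog"
    return label, cat_probability


# A new input the model has never seen: [ear pointiness, snout length]
new_animal = [0.85, 0.25]
label, probability = infer(new_animal)
print(f"{label} (cat probability {probability:.2f})")
```

There is no loop and no weight update here, which is why a single inference call is so much cheaper than training.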
### Comparison of the Two
| Dimension | Training | Inference |
| --- | --- | --- |
| Goal | Learn knowledge, adjust weights | Apply learned knowledge, give answers |
| Data Volume | Needs massive data | Single input is enough |
| Computing Needs | Very high (thousands of GPUs) | Relatively low (single GPU or phone) |
| Cost | Very high (million-dollar level) | Relatively low (a few cents or less per request) |
| Frequency | A few to a few dozen times per model | Continuously, at very high volume |
| Who Does It | Companies like OpenAI, Anthropic | Regular users or applications |
To make an analogy: training is like "studying hard for ten years", inference is like "taking an exam".
Studying requires a lot of time and effort, but once you've learned, answering questions becomes fast.
> When you use ChatGPT, you are doing "inference" - the model doesn't "learn" or "get smarter" from your conversation. Its knowledge is frozen at the moment training completed.
* * *
## Introduction to Transformer Architecture
In 2017, researchers at Google published the paper "Attention Is All You Need", which proposed the Transformer architecture.
This paper changed the entire AI field. Today's large language models are almost all based on Transformers.
### Why the Transformer Is So Important
Before the Transformer, RNNs and LSTMs were the standard way to process sequential data (like sentences).
Their problem: they process one word at a time, in order, which makes it difficult to capture long-distance connections.
For example, take this sentence: "I left my wallet at a coffee shop in Beijing. When I went back the next day, ____ was still there." The blank should be filled with "it". You know "it" refers to "wallet" because you remembered the earlier content.
By the time older models reached "was still there", they may have already forgotten "wallet".
The Transformer's breakthrough is that it can look at the entire sentence at once and, through the attention mechanism, work out which words to focus on.
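
A minimal sketch of this idea follows, assuming tiny hand-made 2-dimensional word vectors. In a real Transformer these vectors are learned and much larger, and the resulting weights are then used to mix "value" vectors, which this sketch leaves out. The function name `attention_weights` and every number here are illustrative assumptions.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product scores between one query and every key."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


# Made-up vectors for a few earlier words in the sentence.
words = ["wallet", "coffee", "shop", "Beijing"]
keys = [[2.0, 0.1], [0.3, 1.0], [0.2, 1.2], [0.0, 0.4]]
# Made-up query vector for the word that fills the blank ("it").
query_for_it = [1.8, 0.0]

weights = attention_weights(query_for_it, keys)
for word, weight in zip(words, weights):
    print(f"{word:>8}: {weight:.2f}")
```

Every earlier word gets a weight in one step, and with these invented vectors "wallet" receives the largest share, no matter how far back in the sentence it appears.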
### Encoder and Decoder
A complete Transformer is divided into two parts:
| Component | Function | Typical Application Scenarios |
| --- | --- | --- |
| Encoder | Understands input, converts text to vector representations | Text classification, sentiment analysis, semantic search |
| Decoder | Based on understanding, generates output text | Writing, translation, dialogue generation |
Some models use only the encoder (like BERT), some use only the decoder (like GPT), and some use both (like T5).
The GPT series, Claude, and Llama are all "decoder-only" architectures - their strength is generating fluent text.
* * *
## What Is the Attention Mechanism
Attention is the core of the Transformer, and it is the key to why the Transformer can handle long texts.
### Analogy: Human Attention When Reading
When you read a sentence, you don't spend equal effort on every word.
For example: "It has a long tail and triangular ears" -