# AI Agent Working Principles
Before diving deep into AI Agents, let's answer a fundamental question: **Why do we need Agents when we have LLMs?**
Imagine you have a very knowledgeable friend who has read countless books and can answer almost any question. However, they haven't had access to new information since 2024, have no phone, can't go online, can't order food for you, and can't check today's weather. This is a Large Language Model (LLM): knowledgeable, but **its knowledge is limited to its training data**, and it can't actively retrieve real-time information or perform specific actions.
An **AI Agent** is like giving this friend a phone with internet access, a calculator, and a calendar, enabling them not only to think but to **actually get things done**. By combining an LLM with tools and memory, an AI Agent breaks through this limitation and achieves the **unity of thinking and action**.
* * *
## The Three Core Components of an AI Agent
A typical AI Agent consists of three key parts working together. Let's continue using the analogy above to understand:
**1. The Brain - Large Language Model (LLM)**
* **Role**: The Agent's decision-making center and reasoning engine.
* **Function**: Understand the user's **goals** and **context**, analyze the current situation, and then decide what to do next: answer the question directly, or call a tool? It is also responsible for planning and breaking down complex tasks.
* **Analogy**: Like a company's **CEO or commander**, responsible for strategic thinking, task planning, and issuing instructions. It knows what to do but needs tools to actually accomplish it.
**2. Tools - Executable Actions**
* **Role**: The Agent's hands and feet, an extension of its capabilities.
* **Function**: Specific functions or APIs that allow the Agent to interact with the external world. For example: `search_web` (search the web), `execute_python_code` (run code), `read_file` (read files), `send_email` (send emails), etc.
* **Analogy**: Like the **office software and equipment** on an employee's desk, such as Excel, a browser, a phone, and a printer. The CEO (brain) gives instructions, and the employee (tool) carries them out.
* **Key Understanding**: Without tools, LLMs can only speak; with tools, Agents can actually do.
**3. Memory - Storage of Dialogue and Experience**
* **Role**: Records the work process and ensures task continuity.
* **Function**:
* **Short-term Memory**: Saves the history of the current conversation, allowing the Agent to remember what was said and done before. Just like chatting with a friend, you don't need to reintroduce yourself with every sentence.
* **Long-term Memory**: Can store more persistent information (such as user preferences, historical task results) for future task reference. It's like having a user profile dedicated to you.
* **Analogy**: Like an employee's **work notes and project files**, avoiding repetitive work and allowing each task to continue based on previous experience.
**One-sentence Summary**: The brain is responsible for thinking, the tools for action, and memory for continuity. All three are indispensable.
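Here is a minimal Python sketch that makes the three roles concrete. It is not a real framework. The `MiniAgent` class, the `keyword_brain` function (a keyword rule standing in for an LLM), and the canned weather tool are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class MiniAgent:
    # Brain: picks a tool name, or None when no tool fits (an LLM in a real Agent)
    brain: Callable[[str], Optional[str]]
    # Tools: named functions the brain can ask for
    tools: Dict[str, Callable[[str], str]]
    # Memory: short-term history of the conversation
    memory: List[str] = field(default_factory=list)

    def handle(self, user_input: str) -> str:
        self.memory.append(f"User: {user_input}")
        tool_name = self.brain(user_input)       # think
        if tool_name in self.tools:
            reply = self.tools[tool_name](user_input)  # act
        else:
            reply = "Sorry, I have no tool for that."
        self.memory.append(f"Agent: {reply}")    # remember
        return reply


def keyword_brain(text: str) -> Optional[str]:
    """Hypothetical stand-in for an LLM: choose a tool by keyword."""
    return "weather" if "weather" in text.lower() else None


agent = MiniAgent(
    brain=keyword_brain,
    tools={"weather": lambda text: "Beijing: Sunny, 25℃ (canned data)."},
)
print(agent.handle("What's the weather?"))  # Beijing: Sunny, 25℃ (canned data).
print(agent.handle("Tell me a joke"))       # Sorry, I have no tool for that.
print(len(agent.memory))                    # 4
```

The brain is a plain function, so it could later be swapped for a call to a real LLM without changing how the tools and memory are wired.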
* * *
## Types of AI Agents
Based on different design goals and complexity, AI Agents can be divided into multiple types:
| Agent Type | Characteristics | Typical Applications |
| --- | --- | --- |
| **Reactive Agent** | Makes immediate responses based on current perception without maintaining internal state. Like a customer service representative who only looks at the present and doesn't remember the past. | Simple Q&A, Game AI |
| **Goal-based Agent** | Plans actions around specific goals and can evaluate whether those goals have been achieved. Like an employee with KPIs who knows that their work serves a specific goal. | Task assistants, Automated workflows |
| **Utility-based Agent** | Scores multiple possible actions through a "utility function" and chooses the optimal solution. Like a decision-maker who weighs pros and cons and pursues optimal solutions. | Resource optimization, Path planning |
| **Learning Agent** | Can learn from experience and continuously optimize decision strategies. The more it's used, the smarter it gets, like an employee who diligently reviews their work. | Recommendation systems, Personalized assistants |
| **Multi-Agent System** | Multiple Agents cooperate with division of labor, each with their own responsibilities. Like a team with clear division of work, each person responsible for one area. | Complex task decomposition, Team collaboration |
Beginners only need to master the first two, which are the most common basic forms.
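The difference between the first two types shows up clearly in code. In this toy sketch (both functions are hypothetical), the reactive agent maps each input straight to a response and keeps no memory of earlier inputs. The goal-based agent keeps a target in view, picks each step to move toward it, and checks whether the goal has been reached.

```python
from typing import List


def reactive_agent(percept: str) -> str:
    """Reactive: maps the current input directly to a response and keeps no state."""
    rules = {"hello": "Hi there!", "bye": "Goodbye!"}
    return rules.get(percept.lower(), "Sorry, I don't understand.")


def goal_based_agent(start: int, goal: int, max_steps: int = 100) -> List[int]:
    """Goal-based: chooses each step to move toward the goal and stops once it is reached."""
    position, path = start, []
    while position != goal and len(path) < max_steps:
        position += 1 if goal > position else -1  # pick the move that reduces the distance
        path.append(position)
    return path


print(reactive_agent("Hello"))   # Hi there!
print(goal_based_agent(2, 5))    # [3, 4, 5]
```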
* * *
## Workflow: The ReAct Loop
Now that we understand the components of an Agent, the next question is: **how does it operate?**
AI Agents typically follow a classic thinking paradigm called **ReAct = Reasoning + Acting**. As the name suggests, it's **think first, then act, then think again, then act again**... This cycle repeats until the task is completed.
You can think of it as the way a conscientious new employee handles a task: _"Let me think about how to do this → Go look up information → See what I found → Think about the next step → Continue acting..."_ rather than acting recklessly without thinking.
### Tracking This Cycle with a Complete Example
The user gives the Agent a command: **"Help me find an Italian restaurant in Beijing with a rating above 4.5, and tell me its address and signature dishes."**
Let's follow the Agent's "thinking" step by step:
**Step 1: Think/Reason**
* **Brain analyzes the goal**: The Agent's brain (LLM) receives this task and first organizes it: _"This is an information query task that requires two types of information: ① find restaurants that meet the criteria; ② get the address and signature dishes. I don't know anything yet, so I should search first."_
* **Generate action command**: The brain decides: _"Call the `search_web` tool, with the keyword set to 'Beijing Italian restaurant rating above 4.5'."_
**Note**: At this step, the Agent hasn't "done anything" yet; it is only "thinking." This is where the LLM adds value: understanding intent, breaking down tasks, and making plans.
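Nothing has run yet, so the output of this step is just data that describes an action. Many LLM tool-calling setups express this as a JSON object that names a tool and its arguments. The exact schema differs between providers, so the `{"tool": ..., "arguments": ...}` shape and the `parse_action` helper below are assumptions for illustration.

```python
import json
from typing import Any, Dict, Tuple

# Tools the brain is allowed to request (names taken from the walkthrough)
AVAILABLE_TOOLS = {"search_web", "get_restaurant_details"}


def parse_action(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Turn the brain's JSON output into (tool_name, arguments), rejecting unknown tools."""
    action = json.loads(raw)
    tool = action["tool"]
    if tool not in AVAILABLE_TOOLS:
        raise ValueError(f"Unknown tool: {tool}")
    return tool, action.get("arguments", {})


# Assumed shape of the brain's output at Step 1
llm_output = ('{"tool": "search_web", '
              '"arguments": {"query": "Beijing Italian restaurant rating above 4.5"}}')
tool, args = parse_action(llm_output)
print(tool, "|", args["query"])  # search_web | Beijing Italian restaurant rating above 4.5
```

Validating the tool name before anything executes keeps a mistaken or invented tool request from reaching the action stage.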
**Step 2: Act**
* **Call the tool**: Based on the decision from the previous step, the Agent actually calls the `search_web` tool and passes in the keyword.
* **Tool execution**: The search tool performs a search on the internet and returns a batch of raw search results (web page titles, summaries, links, etc.).
**Note**: The tool itself doesn't "think"; it just executes faithfully. Thinking is the brain's job and execution is the tool's job, a clear division of labor.
**Step 3: Observe**
* **Receive feedback**: The Agent obtains the search results returned by the tool and stores them in **memory** for the next round of reasoning.
* Assume the results include several restaurants: Bottega (rating 4.7), Da Vittorio, Le Marche, etc.
**Step 4: Think Again**
* **Brain analyzes again**: After seeing the search results, the brain continues reasoning: _"Good, I found several candidate restaurants. But what the user wants is the address and signature dishes, which aren't in the search summaries. I need to further query Bottega's detailed information."_
* **Generate new command**: _"Call the `get_restaurant_details` tool to query the detailed information of 'Bottega'."_
**Step 5: Act Again → Observe Again (Cycle Continues)**
* This **Think → Act → Observe** cycle keeps repeating, with each round bringing the Agent closer to the goal, until the brain determines "I have enough information to answer the user."
**Step 6: Output Final Answer**
* When the brain confirms the task is complete, it integrates all the collected information (stored in memory) and generates a structured, human-friendly response:
* _"Found a restaurant that meets the requirements: **Bottega** (rating 4.7). Address: XX Sanlitun Road, Chaoyang District, Beijing. Signature dishes: Truffle pizza, Handmade tiramisu."_
This **Think → Act → Observe → Think Again...** cycle is the core mechanism by which AI Agents autonomously complete complex tasks. Rather than producing an answer all at once, the Agent **advances step by step, thinking while doing**, like a human.
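The whole restaurant walkthrough can be simulated in a few dozen lines. This sketch rests on clear assumptions. `search_web` and `get_restaurant_details` are stubs that return the canned data from the example and have no real web access. `scripted_brain` is a hard-coded stand-in for the LLM that decides the next step from what is already in memory.

```python
from typing import Callable, Dict, List


# Stub tools returning the canned data from the example (no real web access)
def search_web(query: str) -> str:
    return "Bottega (rating 4.7), Da Vittorio, Le Marche"


def get_restaurant_details(name: str) -> str:
    details = {
        "Bottega": "Address: XX Sanlitun Road, Chaoyang District, Beijing. "
                   "Signature dishes: Truffle pizza, Handmade tiramisu."
    }
    return details.get(name, "No details found.")


TOOLS: Dict[str, Callable[[str], str]] = {
    "search_web": search_web,
    "get_restaurant_details": get_restaurant_details,
}


def scripted_brain(memory: List[str]) -> Dict[str, str]:
    """Hard-coded stand-in for the LLM: picks the next step from what memory already holds."""
    if not memory:  # Step 1: nothing known yet, so search first
        return {"tool": "search_web", "input": "Beijing Italian restaurant rating above 4.5"}
    if len(memory) == 1:  # Step 4: candidates found, but no address or dishes yet
        return {"tool": "get_restaurant_details", "input": "Bottega"}
    best = memory[0].split(",")[0]  # "Bottega (rating 4.7)"
    return {"answer": f"Found: {best}. {memory[-1]}"}  # Step 6: enough information


def run_react(max_turns: int = 5) -> str:
    memory: List[str] = []
    for _ in range(max_turns):
        decision = scripted_brain(memory)                          # Think
        if "answer" in decision:
            return decision["answer"]                              # Output final answer
        observation = TOOLS[decision["tool"]](decision["input"])   # Act
        memory.append(observation)                                 # Observe: store in memory
    return "Stopped: turn limit reached."


print(run_react())
```

The `max_turns` limit is a common safeguard. If the brain never decides it has enough information, the loop stops instead of running forever.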
* * *
## Implementing AI Agent in Python
Now that the theory is covered, let's look at the code. An AI Agent system is typically built from several core modules that work together. Understanding this architecture helps us see how the Agent thinks and acts.
We'll break down each module and use simple Python code to demonstrate its responsibilities. Beginners don't need to delve into every line of codeβthe key is to understand **what each module does**.
### 1. Perception Module: The Agent's "Eyes and Ears"
The perception module is responsible for obtaining information from the **environment**, i.e., "what input was received." The environment can be:
* **Digital world**: A piece of text, a web page, records in a database, data returned by an API.
* **Physical world** (through hardware): Camera images, microphone audio, sensor data.
For the most common text-based Agent, perception is "receiving a sentence of user input."
```python
# Perception Module: Responsible for "receiving" external information
def perceive_from_environment():
    """
    Perceive information from the environment.
    In this example, the environment is the text input by the user in the command line.
    """
    user_input = input("Please enter your command or question: ")
    print(f"Received information: '{user_input}'")
    return user_input  # Pass the perceived content to the next module (decision module)

# Get perceived information
current_observation = perceive_from_environment()
```
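Perception does not have to mean keyboard input. As a sketch of perceiving the digital world, the hypothetical function below turns a JSON payload, such as one a weather API might return, into a text observation for the decision module. The field names `city`, `condition`, and `temp_c` are assumptions and do not follow any particular API's format.

```python
import json


def perceive_from_api_response(raw_json: str) -> str:
    """Hypothetical perception step: turn a JSON payload into a one-line text observation."""
    data = json.loads(raw_json)  # parse the raw text into a Python dict
    return f"{data['city']}: {data['condition']}, {data['temp_c']}℃"


# Sample payload with assumed field names (not a real API's schema)
sample = '{"city": "Beijing", "condition": "Sunny", "temp_c": 25}'
print(perceive_from_api_response(sample))  # Beijing: Sunny, 25℃
```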
### 2. Decision Module (Brain): The Agent's "Commander"
This is the core of the Agent, usually driven by an **AI model (such as an LLM)**. It takes the information from the perception module and is responsible for three things:
* **Understand**: What does this information mean? What does the user want?
* **Reason**: Given the current situation, what should I do?
* **Plan**: What tool should be called next (or in the following steps), and what operations should be performed?
The code below uses the simplest approach, keyword matching, to simulate the decision-making process. In a real Agent, an LLM performs this step and can handle far more complex semantic understanding.
```python
# Decision Module: Responsible for "thinking" and "planning"
def make_decision(observation):
    """
    Make simple decisions based on perceived information.
    Here keyword matching is used to simulate; actual reasoning is done by an LLM in real scenarios.
    """
    print(f"Analyzing information: '{observation}'")
    text = observation.lower()  # Case-insensitive matching, so "Weather" also counts
    # Determine which tool to call based on keywords in user input
    if "weather" in text:
        decision = "Call weather query tool"
    elif "calculate" in text:
        decision = "Call calculator tool"
    elif "end" in text.split():  # Whole-word match, so words like "recommend" don't end the program
        decision = "Execute termination action"
    else:
        decision = "Generate general conversation response"  # If no tool is matched, respond directly
    print(f"Decision result: {decision}")
    return decision  # Pass the decision result to the action module

# Make decision based on perception
current_decision = make_decision(current_observation)
```
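An `if`/`elif` chain becomes hard to maintain as more tools are added. One common alternative keeps the tools in a dictionary, so the decision step only has to look them up, and adding a tool means adding one entry. The sketch below uses hypothetical names (`TOOL_REGISTRY`, `choose_tool`, and two placeholder tools).

```python
from typing import Callable, Dict, Optional


def weather_tool(text: str) -> str:
    return "Beijing: Sunny, 25℃ (canned data)."  # placeholder instead of a real weather API


def calculator_stub(text: str) -> str:
    return "1+1=2"  # placeholder result


# Hypothetical registry: trigger keyword -> tool function
TOOL_REGISTRY: Dict[str, Callable[[str], str]] = {
    "weather": weather_tool,
    "calculate": calculator_stub,
}


def choose_tool(observation: str) -> Optional[Callable[[str], str]]:
    """Return the first tool whose keyword appears in the input, or None for plain conversation."""
    text = observation.lower()
    for keyword, tool in TOOL_REGISTRY.items():  # dicts keep insertion order (Python 3.7+)
        if keyword in text:
            return tool
    return None


user_text = "What's the Weather in Beijing?"
tool = choose_tool(user_text)
print(tool(user_text) if tool else "General conversation")  # Beijing: Sunny, 25℃ (canned data).
```

LLM-based Agents follow a broadly similar idea. The model receives a list of available tools and chooses among them, instead of relying on hard-coded branches.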
### 3. Action Module: The Agent's "Executor"
The decision module outputs "ideas" (what to do), while the action module is responsible for turning ideas into "reality" (actually doing it). It performs specific operations to affect the external environment. Common actions include:
* **Digital actions**: Output answers on screen, call a function, make API requests, write to files.
* **Physical actions** (through hardware control): Control robotic arm movement, make speakers play sound.
```python
import sys

# Action Module: Responsible for "executing" commands from the decision module
def execute_action(decision):
    """
    Execute the command given by the decision module and return the execution result.
    The execution result will be stored in memory for the next round of reasoning.
    """
    print(f"Executing: {decision}")
    if decision == "Call weather query tool":
        # This can be replaced with a real weather API call
        result = "Beijing: Sunny, 25℃."
    elif decision == "Call calculator tool":
        result = "1+1=2"
    elif decision == "Execute termination action":
        result = "Task ended."
        print(result)
        sys.exit()  # End the entire program (sys.exit also works outside the interactive shell)
    else:
        result = f"I understand what you mean: '{decision}'"
    print(f"Action result: {result}")
    return result  # Return result, enter "observation" phase

# Execute decision, get result
action_result = execute_action(current_decision)
```
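When the action module itself terminates the program, it stops everything from deep inside a helper, which makes the module hard to reuse or test. An alternative is to return a flag and let the caller's loop decide when to stop. The following sketch shows this; the name `execute_action_safely` is hypothetical.

```python
from typing import Tuple


def execute_action_safely(decision: str) -> Tuple[str, bool]:
    """Return (result, keep_running) instead of exiting, so the caller decides when to stop."""
    if decision == "Call weather query tool":
        return "Beijing: Sunny, 25℃.", True
    if decision == "Call calculator tool":
        return "1+1=2", True
    if decision == "Execute termination action":
        return "Task ended.", False  # signal the loop to stop instead of exiting here
    return f"I understand what you mean: '{decision}'", True


# The loop, not the action module, ends the program
for decision in ["Call calculator tool", "Execute termination action", "Call weather query tool"]:
    result, keep_running = execute_action_safely(decision)
    print(result)
    if not keep_running:
        break
```

Only `1+1=2` and `Task ended.` are printed here. The third decision never executes.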
### 4. Memory Module: The Agent's "Work Notes"
Without memory, the Agent would be "amnesiac" in every response, forgetting what you said before and what it had already done. The memory module solves this problem. It comes in two types:
* **Short-term Memory / Conversation History**: Records what was said in the current conversation, allowing the Agent to maintain coherence across multiple rounds of dialogue. Just like chatting with a friend, you don't need to re-explain the context with every sentence.
* **Long-term Memory / Knowledge Base**: Specialized knowledge stored through technologies like vector databases (e.g., company internal documents, user preferences), used to enhance the model's capabilities. Beginners can temporarily ignore this part and first master short-term memory.
In the complete example below, we use a Python list to simulate short-term memory.
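Two refinements are worth sketching. First, a plain list grows without limit, but a model can only read a bounded amount of context. The hypothetical `ShortTermMemory` class therefore keeps only the last few turns, using `collections.deque(maxlen=...)`. Second, the `recall` function is a toy stand-in for long-term memory. It picks the stored note that shares the most words with the query, whereas a real system would typically use embeddings and a vector database.

```python
from collections import deque
from typing import List


class ShortTermMemory:
    """Keeps only the most recent turns so the context passed to the model stays bounded."""

    def __init__(self, max_turns: int = 3):
        self.turns = deque(maxlen=max_turns)  # the oldest entry is dropped automatically when full

    def add(self, role: str, text: str) -> None:
        self.turns.append(f"{role}: {text}")

    def as_context(self) -> str:
        return "\n".join(self.turns)


def recall(notes: List[str], query: str) -> str:
    """Toy long-term lookup: return the note sharing the most words with the query."""
    query_words = set(query.lower().split())
    return max(notes, key=lambda note: len(query_words & set(note.lower().split())))


memory = ShortTermMemory(max_turns=3)
for i in range(1, 5):
    memory.add("User", f"message {i}")
print(memory.as_context())  # only messages 2, 3 and 4 remain

notes = ["The user prefers vegetarian food", "The user lives in Shanghai"]
print(recall(notes, "what food does the user like"))  # The user prefers vegetarian food
```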
### 5. Tool Module: The Agent's "Swiss Army Knife"
The model's own capabilities are limited. For example, it doesn't know the real-time weather, can't reliably perform complex calculations, and can't manipulate files. The tool module gives the Agent a set of "add-on skills" that greatly expand what it can do. A tool can be a Python function, a third-party API, or a complete external program.
Below is the simplest tool example:
```python
# Tool Module Example: A simple calculator tool
def calculator_tool(expression):
    """
    Receive a mathematical expression string and return the calculation result.
    This is a "tool": it waits to be called by the Agent's brain as needed.
    """
    try:
        # Note: eval() has security risks in production environments; this is for demonstration only.
        result = eval(expression)
        return f"Calculation result: {expression} = {result}"
    except Exception as e:
        return f"Calculation error: {e}"

# Simulate: After the brain makes a decision, call this tool
tool_result = calculator_tool("3 + 5 * 2")
print(tool_result)  # Output: Calculation result: 3 + 5 * 2 = 13
# Note: Python performs multiplication and division before addition and subtraction, so it's 3 + 10 = 13, not 8 * 2 = 16
```
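Because `eval()` is unsafe in production, here is one way to keep the same tool interface without it. The expression is parsed with Python's `ast` module, and only numbers and the four arithmetic operators are evaluated; everything else is rejected. This is a minimal sketch (`safe_calculator_tool` is a hypothetical name) and not a complete expression evaluator. For example, it does not support exponents.

```python
import ast
import operator

# Whitelist of allowed operators; any other syntax (names, calls, attributes) is rejected.
_ALLOWED_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def _evaluate(node):
    """Recursively evaluate a parsed expression tree containing only whitelisted parts."""
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_OPS:
        return _ALLOWED_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _ALLOWED_OPS:
        return _ALLOWED_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("Unsupported expression")


def safe_calculator_tool(expression: str) -> str:
    try:
        tree = ast.parse(expression, mode="eval")  # parse only; nothing is executed yet
        result = _evaluate(tree.body)
        return f"Calculation result: {expression} = {result}"
    except (ValueError, SyntaxError, ZeroDivisionError) as e:
        return f"Calculation error: {e}"


print(safe_calculator_tool("3 + 5 * 2"))         # Calculation result: 3 + 5 * 2 = 13
print(safe_calculator_tool("__import__('os')"))  # Calculation error: Unsupported expression
```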
* * *
## Practice Exercise: Building a Simple Command-Line AI Agent
We introduced each module separately above. Now, let's **combine** them to create a complete mini-Agent capable of continuous dialogue and tool calling.
Before running, please go through this structure in your mind:
* User input → **Perception** → **Decision** (keyword matching) → **Action** (call tool or dialogue) → Output result → **Memory** (store in history) → Wait for next round of input
```python
# Complete Mini AI Agent Example
import random

# ==================== Tool Definitions ====================
def get_weather(city):
    """Tool 1: Simulate weather query (can be replaced with real weather API)"""
    weather_options = ["Sunny", "Cloudy", "Light rain", "Strong wind"]
    temperature = random.randint(15, 35)
    return f"Weather in {city} is {random.choice(weather_options)}, temperature {temperature}℃."

def simple_calculator(a, b, operator):
    """Tool 2: Simple calculator"""
    if operator == '+':
        return f"{a} + {b} = {a + b}"
    elif operator == '-':
        return f"{a} - {b} = {a - b}"
    else:
        return "Operation not supported."

# ==================== Memory (Short-term) ====================
# Use a list to simulate conversation history, each entry is a string
conversation_history = []

# ==================== Agent Main Loop ====================
def run_simple_agent():
    print("Enter 'exit' to end the conversation.")
    print("Tip: You can ask about weather (e.g., 'Beijing weather today') or do calculations (e.g., 'calculate 1+1')\n")
    while True:
        # ---- Perception Stage: Get user input ----
        user_input = input("You: ")
        conversation_history.append(f"User: {user_input}")

        # Special command: exit
        if user_input.lower() in ["exit", "quit"]:
            print("Agent: Goodbye!")
            break

        # ---- Decision + Action Stage: Choose tool or dialogue based on input ----
        response = ""
        if "weather" in user_input:
            # Try to identify city name from input, default to Beijing
            city = "Beijing"
            for c in ["Beijing", "Shanghai", "Guangzhou", "Shenzhen"]:
                if c in user_input:
                    city = c
                    break
```