AI Agent Core Components

If we compare an AI Agent to a smart restaurant, how does it transform your request into a dish served at your table? This relies on its four core components: Brain, Tools, Memory, and Planning.

Brain: Understands orders, sets goals, and decides the sequence of work; it serves as the restaurant’s command center.
Tools: Carry out the actual work, including chopping, cooking, and procurement, turning decisions into concrete operations.
Memory: Records customer preferences, the current step, and what has already been processed, so the workflow stays organized and nothing is repeated.
Planning: Breaks the whole dish down into steps, fixes their order, and keeps the task moving through to completion.

Overall Architecture

The diagram below illustrates the five layers of an AI Agent (the four core components plus a Perception layer) and how they work together. The Perception layer receives external inputs; the Brain handles understanding and decision-making; the Planning layer decomposes tasks; the Tools layer executes actions; and the Memory layer runs throughout, supplying state to every stage.

0. Perception Layer: The Restaurant’s Front Desk

Role: Receives customers and takes in all inputs from the outside world.

Before an Agent acts, it must first “see” and “hear” external information. Modern Agents are no longer limited to plain-text input; many have multimodal perception capabilities:

Text input: Natural-language instructions from the user, document content, code.
Images / Videos: Screenshots, design drafts, charts, which the Agent can directly “see” and interpret.
Structured data: Tables, JSON, database query results.
Environment state: For computer-operation Agents, the current screen state, the web page’s DOM structure, and so on.
Tool return results: Outputs of earlier tool calls, which become new perception inputs for the next loop iteration.

Taken together, the inputs gathered by the Perception layer form the Agent’s “current context,” which is passed to the Brain for understanding and decision-making.
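
One way to picture how mixed inputs become a single “current context” is the minimal sketch below. The `Percept` class, its field names, and `build_context` are illustrative assumptions rather than any framework’s API, and the sketch only covers inputs that can be serialized to text; images would normally be passed to a multimodal model in their own format.

```python
from dataclasses import dataclass
import json


@dataclass
class Percept:
    """One input item; `kind` and `content` are illustrative field names."""
    kind: str        # e.g. "text", "structured", "tool_result"
    content: object


def build_context(percepts: list[Percept]) -> str:
    """Merge heterogeneous inputs into one text context for the model."""
    parts = []
    for p in percepts:
        if isinstance(p.content, (dict, list)):
            # Structured data is serialized so the model can read it as text.
            body = json.dumps(p.content, ensure_ascii=False, sort_keys=True)
        else:
            body = str(p.content)
        parts.append(f"[{p.kind}] {body}")
    return "\n".join(parts)


context = build_context([
    Percept("text", "Check tomorrow's Beijing weather"),
    Percept("structured", {"city": "Beijing", "unit": "celsius"}),
    Percept("tool_result", "rain tomorrow"),
])
print(context)
```

Tagging each part with its kind keeps tool results distinguishable from user instructions once everything is flattened into one prompt.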

1. Brain: The Large Model

Role: The restaurant’s head chef and manager.

This is the most critical part of the Agent: a large language model such as GPT-4, Claude, DeepSeek, or Qwen.

It understands what you want to eat (intent understanding).
It coordinates everyone else to get the work done (decision-making).
Without it, the entire restaurant would come to a halt.

Three Core Functions of the Brain

Capability	Description	Restaurant Analogy
Intent Understanding	Parsing user input to clarify the goal	Understanding what the customer ordered
Reasoning & Decision-Making	Integrating context and memory to decide next steps	The chef decides which dish to prepare first
Tool Call Determination	Deciding whether external tools are needed, which tool to use, and what parameters to pass	Choosing which pot to use, who to send for ingredients

Key Concept: The Brain’s “intelligence ceiling” sets the upper bound for the entire Agent. With the same tools and planning framework, swapping in a more capable foundation model often produces a marked improvement in task completion quality.

2. Tools: Kitchen Equipment

Role: Kitchen utensils and assistants.

A head chef (Brain) alone is not enough; cooking also takes pots and pans. For an AI Agent, tools are the execution units that turn decisions into real-world actions.

Tools can be grouped into four categories by function:

Category	Common Tools	Function
Information Retrieval	Web search, web scraping, document reading, database queries	Acquiring real-time or domain-specific information beyond the Agent’s internal knowledge
Computation & Execution	Code interpreter, math engine, sandbox environment	Handling tasks requiring precise computation or program logic
Content Generation	Image generation, speech synthesis, document export	Producing non-textual content
System Interaction	APIs, email, calendar, file operations, message sending	Interacting with external systems, services, and the real world

Common tool examples:

Web search (information retrieval): like buying fresh ingredients at the market.
Code interpreter (computation): like a precise oven that handles complex calculations.
Drawing tool (content generation): like a plating artist focused on presentation.
API (system interaction): like a delivery rider connecting the kitchen to the outside world.

Function Calling: Modern large models invoke tools through a “function calling” mechanism. Developers declare each tool’s name and parameter descriptions in advance; during inference, the model outputs structured JSON stating which tool to call and what parameters to pass; an external program then executes the call and returns the result to the model.
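
The function-calling round trip described above can be sketched as follows. This is a simplified illustration, not any provider’s exact wire format: the tool name `get_weather`, the `TOOL_SPECS` layout, and the `{"name": ..., "arguments": ...}` shape of the model output are assumptions, and the model’s reply is hard-coded rather than produced by a real model.

```python
import json

# Tool declarations the developer would send to the model (hypothetical tool).
TOOL_SPECS = [{
    "name": "get_weather",
    "description": "Look up the forecast for a city on a given day.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}, "day": {"type": "string"}},
        "required": ["city", "day"],
    },
}]


def get_weather(city: str, day: str) -> str:
    """Stand-in implementation; a real tool would call a weather service."""
    return f"{city} {day}: rain"


# Maps declared tool names to the Python callables that implement them.
REGISTRY = {"get_weather": get_weather}


def dispatch(model_output: str) -> str:
    """Parse the model's JSON tool call, run the tool, and return its result."""
    call = json.loads(model_output)
    func = REGISTRY.get(call["name"])
    if func is None:
        return f"error: unknown tool {call['name']!r}"
    return func(**call["arguments"])


# Pretend the model emitted this JSON during inference.
fake_model_output = '{"name": "get_weather", "arguments": {"city": "Beijing", "day": "tomorrow"}}'
print(dispatch(fake_model_output))
```

The model never executes anything itself; it only names the tool and arguments, and the host program stays in control of what actually runs.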

3. Memory: Customer Record Book

Role: The waiter’s memory.

Nobody likes having to repeat their preferences on every restaurant visit: “I don’t eat coriander!” Memory lets the Agent retain such details.

Agent memory is divided into the following types:

Short-term Memory (In-Context Memory): The context window of the current conversation. It remembers what you just said (e.g., if you just ordered fish and then say “light spice,” it knows this refers to the fish). It is limited by the model’s context length, commonly between 8K and 200K tokens, with some models offering more.
Long-term Memory (External Memory): Remembers your long-term preferences (e.g., that you’re vegetarian, or your home address). Usually implemented with persistent storage such as a vector database (e.g., Pinecone, Milvus, Chroma).
Episodic Memory: Records of how past tasks were carried out, including “how I handled this situation last time,” which help the Agent learn from experience.
Semantic Memory: Abstract knowledge and facts, usually internalized during pretraining or supplemented dynamically via RAG (Retrieval-Augmented Generation).
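
The difference between short-term and long-term memory can be sketched as below. Both classes are hypothetical simplifications: the short-term window is measured in conversation turns rather than tokens, and a plain dictionary stands in for an external store such as a vector database.

```python
from collections import deque


class ShortTermMemory:
    """Keeps only the most recent turns, mimicking a bounded context window."""

    def __init__(self, max_turns: int):
        self.turns = deque(maxlen=max_turns)  # oldest turns fall off automatically

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))

    def as_prompt(self) -> str:
        return "\n".join(f"{role}: {text}" for role, text in self.turns)


class LongTermMemory:
    """Facts that outlive a conversation; a dict stands in for an external store."""

    def __init__(self):
        self.facts = {}

    def remember(self, key: str, value: str) -> None:
        self.facts[key] = value

    def recall(self, key: str, default: str = "") -> str:
        return self.facts.get(key, default)


short = ShortTermMemory(max_turns=2)
short.add("user", "I'd like the fish")
short.add("assistant", "Noted: fish")
short.add("user", "light spice")      # pushes the first turn out of the window

long_term = LongTermMemory()
long_term.remember("diet", "no coriander")

print(short.as_prompt())
print(long_term.recall("diet"))
```

Once the first turn has dropped out of the window, only the long-term store can still answer questions about it, which is why preferences worth keeping are written there.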
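
RAG: Giving the Agent an “External Knowledge Base”

Retrieval-Augmented Generation (RAG) is currently the mainstream approach to implementing long-term memory. Its core workflow is as follows: documents are split into chunks, embedded, and stored in a vector database; at query time the question is embedded, the most similar chunks are retrieved and added to the prompt, and the model generates an answer grounded in them.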
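
A toy version of the retrieve-then-augment part of RAG might look like the sketch below. It is only an illustration under loud assumptions: word counts replace a real embedding model, an in-memory list replaces the vector database, the sample knowledge entries are made up, and the final generation step is omitted because no model is called.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real system uses a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Augment the question with retrieved text before it goes to the model."""
    context = "\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}"


knowledge = [
    "Regular customers: vegetarian, no coriander.",
    "The restaurant opens at 10 am and closes at 9 pm.",
]
prompt = build_prompt("When does the restaurant open?", knowledge)
print(prompt)
```

Only the retrieved chunk enters the prompt, so the knowledge base can grow far beyond the context window without crowding it.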

4. Planning: Cooking Process Checklist

Role: The kitchen’s SOP (standard operating procedure) for getting a dish out.

When you order Buddha Jumps Over the Wall, the head chef won’t just start cooking at random; he first runs through a mental checklist:

First, prepare the ingredients (abalone, sea cucumber…)
Then, simmer the broth
Finally, slow-cook

An Agent works the same way. Given a complex task (e.g., “write a competitor analysis report”), it decomposes the task on its own:

Step 1: Gather data on competitors A, B, and C.
Step 2: Compare their pricing and features.
Step 3: Write the report based on the comparison.
Step 4: Proofread for typos.
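
The four steps above can be represented as an ordered plan in which each step reads the results of earlier ones. The sketch below is hypothetical: in a real Agent the model would generate the plan and each step would call tools or the model, whereas here the plan is hard-coded and the step bodies are placeholders with made-up prices.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[dict], object]   # receives the results of earlier steps


def execute_plan(steps: list[Step]) -> dict:
    """Run steps in order; each step can read what earlier steps produced."""
    results = {}
    for step in steps:
        results[step.name] = step.run(results)
    return results


plan = [
    # Step 1: gather data (made-up prices for competitors A, B, C).
    Step("gather", lambda r: {"A": 10, "B": 12, "C": 9}),
    # Step 2: compare, here simply picking the cheapest competitor.
    Step("compare", lambda r: min(r["gather"], key=r["gather"].get)),
    # Step 3: write a draft (the typo is deliberate, for step 4 to catch).
    Step("write", lambda r: f"Competitor {r['compare']} has the lowst price."),
    # Step 4: proofread the draft.
    Step("proofread", lambda r: r["write"].replace("lowst", "lowest")),
]

results = execute_plan(plan)
print(results["proofread"])
```

Keeping every intermediate result in one dictionary also gives the Agent a record of how far the plan has progressed.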

Mainstream Planning Strategies

Planning strategies determine how the Agent “thinks before acting”; they differ in reasoning depth and in the scenarios they suit:

Strategy	Full Name	Core Idea	Applicable Scenarios
CoT	Chain-of-Thought	Write out reasoning steps before giving the final answer	Mathematical reasoning, logical analysis
ReAct	Reasoning + Acting	Alternate between “reasoning” and “acting,” re-reasoning after each action based on its result	Dynamic tasks requiring tool calls
ToT	Tree-of-Thoughts	Explore multiple reasoning branches and select the most promising path	Complex decision-making, creative tasks
Reflection	Self-Reflection	After completing the task, critically review and correct its own output	Code generation, long-form writing

ReAct Example: The Agent receives the task “Check tomorrow’s Beijing weather and send a reminder if it will rain” → Think: need to check the weather first → Act: call the weather API → Observe: it returns “rain tomorrow” → Think: condition met, need to draft a reminder → Act: call the message-sending tool → Task complete.
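
The ReAct trace in this example can be reproduced with a small loop. This is a sketch under stated assumptions: `scripted_policy` is a hand-written stand-in for the model’s reasoning, and `weather_api` and `send_message` are hypothetical tools that return canned strings.

```python
def weather_api(city: str) -> str:
    return "rain tomorrow"            # canned result standing in for a real API


def send_message(text: str) -> str:
    return f"sent: {text}"


TOOLS = {"weather_api": weather_api, "send_message": send_message}


def scripted_policy(observations: list[str]) -> dict:
    """Stand-in for the model: picks the next thought and action from what it has seen."""
    if not observations:
        return {"thought": "Need to check the weather first",
                "action": "weather_api", "input": "Beijing"}
    if len(observations) == 1 and "rain" in observations[0]:
        return {"thought": "Condition met, draft and send the reminder",
                "action": "send_message", "input": "Rain tomorrow, bring an umbrella"}
    return {"thought": "Nothing left to do", "action": "finish", "input": ""}


def react(max_steps: int = 5) -> list[str]:
    """Alternate Think -> Act -> Observe until the policy decides to finish."""
    trace, observations = [], []
    for _ in range(max_steps):
        step = scripted_policy(observations)
        trace.append(f"Think: {step['thought']}")
        if step["action"] == "finish":
            break
        result = TOOLS[step["action"]](step["input"])
        trace.append(f"Act: {step['action']}")
        trace.append(f"Observe: {result}")
        observations.append(result)
    return trace


trace = react()
print("\n".join(trace))
```

Because the next thought is chosen only after the latest observation arrives, a forecast without rain would lead the policy straight to the finish action instead of sending a message.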

5. Agent Loop

These components do not work in isolation; together they form a continuously iterating Perception → Thinking → Acting → Observing loop, known as the “Agent Loop.” The Agent repeats this loop until the task is complete or a termination condition is met.

This loop lets the Agent self-correct after a failure: if a tool call returns an error or an unexpected result, the “Observing” stage feeds that information back to the Brain, which adjusts its strategy in the next “Thinking” round.
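
A minimal sketch of this self-correcting behavior is shown below. Everything in it is hypothetical: `flaky_search` is an invented tool that fails on its first attempt, the “Thinking” step is reduced to one rule (simplify the query after an error), and `max_iterations` plays the role of the termination condition.

```python
def flaky_search(query: str, attempt: int) -> str:
    """Hypothetical tool that fails on the first attempt to simulate an error."""
    if attempt == 0:
        raise TimeoutError("search timed out")
    return f"results for {query!r}"


def agent_loop(task: str, max_iterations: int = 3) -> dict:
    """Think -> act -> observe, feeding errors back into the next round."""
    history = []                      # observations visible to the next "Thinking" round
    for attempt in range(max_iterations):
        # Thinking: adjust the query if the previous observation was an error.
        failed_before = bool(history) and history[-1].startswith("error")
        query = f"{task} (simplified)" if failed_before else task
        try:
            observation = flaky_search(query, attempt)      # Acting
        except TimeoutError as exc:
            observation = f"error: {exc}"                   # Observing a failure
        history.append(observation)
        if not observation.startswith("error"):
            return {"done": True, "answer": observation, "history": history}
    # Termination condition: iteration budget exhausted without success.
    return {"done": False, "answer": None, "history": history}


outcome = agent_loop("competitor pricing")
print(outcome)
```

Capping the number of iterations matters as much as the retry itself, since without it a tool that keeps failing would keep the Agent looping indefinitely.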

Summary

Suppose you tell the Agent: “Check tomorrow’s Beijing weather, and if it’s raining, write a reminder and send it to Xiao Wang.” Internally, the Agent operates as follows:

Perception Layer: Receives the natural-language instruction and identifies the key entities: “Beijing,” “tomorrow,” “Xiao Wang.”
Brain: Understands the instruction and breaks it into two tasks: check the weather and, if rain is forecast, send a reminder.
Planning: Check the weather → determine whether it will rain → (if yes) draft the reminder → send it.
Tools: Calls the “weather query tool” and gets the result: rain tomorrow.
Memory: Looks up Xiao Wang’s contact info in the address book (memory store).
Tools: Calls the “send message tool” to dispatch the reminder.
Observation: Confirms the message was sent successfully; the task is complete and the loop terminates.

Execution Process Diagram:

Overview of the Five Components

Component	Restaurant Analogy	Core Responsibility	Key Technologies
Perception Layer	Front Desk Reception	Receive multimodal inputs, build context	Multimodal models, OCR, ASR
Brain	Head Chef & Manager	Understand intent, reason, decide, issue tool calls	LLM, Function Calling
Planning	Dish Delivery SOP	Task decomposition, step ordering, self-reflection	ReAct, CoT, ToT, Reflection
Tools	Kitchen Utensils & Assistants	Execute specific operations, connect to external world	Search / Code / API / File System
Memory	Customer Record Book	Manage context, store long-term knowledge	Vector Database, RAG, Context Window
