AI Agent Core Components

If we compare an AI Agent to a smart restaurant, how does it transform your request into a dish served at your table? This relies on its four core components: Brain, Tools, Memory, and Planning.

  • Brain: Understands orders, determines goals, and decides the sequence of work. It serves as the restaurant’s command center.
  • Tools: Carry out the actual work, such as chopping, cooking, and procurement, turning decisions into concrete operations.
  • Memory: Records customer preferences, current steps, and processed content, keeping the workflow organized and free of repetition.
  • Planning: Breaks the dish down into steps, defines their order, and keeps tasks moving smoothly to completion.

Image 1

Overall Architecture

The diagram below illustrates the five layers of an AI Agent (the four core components plus a Perception layer that feeds them) and how they collaborate. The Perception layer receives external inputs; the Brain handles understanding and decision-making; the Planning layer decomposes tasks; the Tools layer executes actions; and the Memory layer runs throughout, providing state support for all stages.

Image 2

0. Perception Layer: The Restaurant’s Front Desk

Role: Responsible for receiving customers and understanding all inputs from the external world.

Before an Agent acts, it must first “see” and “hear” external information. Modern Agents are no longer limited to pure text inputs but possess multimodal perception capabilities:

  • Text input: User natural language instructions, document content, code.
  • Images / Videos: Screenshots, design drafts, charts. The Agent can directly “see” and understand images.
  • Structured data: Tables, JSON, database query results.
  • Environment state: For Agents that operate a computer, the current screen state, web page DOM structure, etc.
  • Tool return results: Outputs from previous tool calls, which become new perception inputs for the next loop.

After integration, inputs from the Perception layer form the Agent’s “current context,” which is fed into the Brain for understanding and decision-making.
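
As a rough illustration of how mixed inputs might be merged into one “current context,” here is a minimal Python sketch. The `Percept` and `Context` classes and the `kind` labels are hypothetical names chosen for this example, not part of any particular framework.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Percept:
    """One input item; `kind` mirrors the input categories listed above."""
    kind: str     # e.g. "text", "image", "structured", "environment", "tool_result"
    content: str  # the text itself, or a textual description of a non-text input


@dataclass
class Context:
    """Hypothetical container for everything the Agent has perceived so far."""
    items: List[Percept] = field(default_factory=list)

    def add(self, percept: Percept) -> None:
        self.items.append(percept)

    def render(self) -> str:
        """Flatten all percepts into a single prompt-ready string."""
        return "\n".join(f"[{p.kind}] {p.content}" for p in self.items)


context = Context()
context.add(Percept("text", "Check tomorrow's Beijing weather"))
context.add(Percept("tool_result", '{"city": "Beijing", "forecast": "rain"}'))
prompt = context.render()
print(prompt)
```

Tagging each item with its kind is one simple way to let the Brain tell a user instruction apart from a tool result when both arrive in the same context.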

1. Brain: Also Known as the Large Model

Role: The restaurant’s head chef and manager.

This is the most critical part of the Agent: the large language model itself (e.g., GPT-4, Claude, DeepSeek, Qwen).

  • It understands what you want to eat (intent understanding).
  • It coordinates others to get work done (decision-making).
  • Without it, the entire restaurant would come to a halt.

Three Core Functions of the Brain

| Capability | Description | Restaurant Analogy |
| --- | --- | --- |
| Intent Understanding | Parsing user input to clarify the goal | Understanding what the customer ordered |
| Reasoning & Decision-Making | Integrating context and memory to decide next steps | The chef decides which dish to prepare first |
| Tool Call Determination | Deciding whether external tools are needed, which tool to use, and what parameters to pass | Choosing which pot to use, who to send for ingredients |

Key Concept: The “intelligence ceiling” of the Brain determines the upper bound of the entire Agent. With the same tools and planning framework, swapping in a more capable foundation model often yields a marked improvement in task completion quality.

2. Tools: Kitchen Equipment

Role: Kitchen utensils and assistants.

Having just the head chef (Brain) is insufficient — you also need pots and pans to cook. For an AI Agent, tools are execution units that turn decisions into real-world actions.

Tools can be categorized into four types by function:

| Category | Common Tools | Function |
| --- | --- | --- |
| Information Retrieval | Web search, web scraping, document reading, database queries | Acquiring real-time or domain-specific information beyond the Agent’s internal knowledge |
| Computation & Execution | Code interpreter, math engine, sandbox environment | Handling tasks requiring precise computation or program logic |
| Content Generation | Image generation, speech synthesis, document export | Producing non-textual content |
| System Interaction | API interfaces, email, calendar, file operations, message sending | Interacting with external systems, services, and the real world |

Common tool examples:

  • Web search (information retrieval: like buying fresh ingredients at the market)
  • Code interpreter (computation: like a precise oven handling complex calculations)
  • Drawing tool (content generation: like a plating artist focusing on aesthetics)
  • API interface (system interaction: like a delivery rider connecting to the outside world)

Function Calling: Modern large models invoke tools through a “function calling” mechanism. Developers predefine tool names and parameter descriptions; during inference, the model outputs structured JSON indicating “which tool to call and what parameters to pass,” and an external program performs the actual execution and returns the result to the model.
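
The division of labor described here (the model names a tool, an external program runs it) can be sketched in a few lines of Python. Everything below is illustrative: `get_weather`, the `TOOLS` registry layout, and the JSON shape with `name` and `arguments` keys are assumptions for this example rather than any vendor’s actual API.

```python
import json
from typing import Any, Dict


def get_weather(city: str, date: str) -> str:
    """Hypothetical tool: a stub that returns a canned forecast."""
    return f"{city} on {date}: rain"


# Registry of predefined tools: name -> description, parameters, implementation.
TOOLS: Dict[str, Dict[str, Any]] = {
    "get_weather": {
        "description": "Look up the weather forecast for a city on a date.",
        "parameters": {"city": "string", "date": "string"},
        "function": get_weather,
    }
}


def dispatch(model_output: str) -> str:
    """Parse the model's structured JSON and run the tool it names."""
    call = json.loads(model_output)
    tool = TOOLS.get(call["name"])
    if tool is None:
        # Return the error as text so it can be fed back to the model.
        return f"error: unknown tool {call['name']!r}"
    return tool["function"](**call["arguments"])


# Hand-written stand-in for what a model might emit during inference.
model_output = '{"name": "get_weather", "arguments": {"city": "Beijing", "date": "tomorrow"}}'
result = dispatch(model_output)
print(result)
```

Note that the model never executes anything itself; it only produces the JSON, which is why unknown tool names are handled on the program side.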

3. Memory: Customer Record Book

Role: The waiter’s memory.

You probably dislike going to a restaurant and having to repeat your preferences every time: “I don’t eat coriander!”

Agent memory is divided into the following types:

  • Short-term Memory (In-Context Memory): The current conversation’s context window. It remembers what you just said (e.g., if you just ordered fish and then say “light spice,” it knows this refers to the fish). It is limited by the model’s context length, commonly somewhere from 8K to 200K tokens, with some models offering more.
  • Long-term Memory (External Memory): Remembers your long-term preferences (e.g., you’re vegetarian, or your home address). Usually implemented with persistent storage such as a vector database (e.g., Pinecone, Milvus, Chroma).
  • Episodic Memory: Records of how past tasks were executed, including “how I handled this situation last time,” which help the Agent learn from experience.
  • Semantic Memory: Abstract knowledge and facts, usually internalized during pretraining or dynamically supplemented via RAG (Retrieval-Augmented Generation).
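
To make the short-term versus long-term distinction concrete, here is a minimal sketch. It assumes one whitespace-separated word counts as one token and uses a plain dictionary in place of a real persistent store; the class names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List


class ShortTermMemory:
    """Keeps the most recent messages that fit within a token budget."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.messages: Deque[str] = deque()

    @staticmethod
    def count_tokens(text: str) -> int:
        # Assumption: one whitespace-separated word is roughly one token.
        return len(text.split())

    def total_tokens(self) -> int:
        return sum(self.count_tokens(m) for m in self.messages)

    def add(self, message: str) -> None:
        self.messages.append(message)
        # Evict the oldest messages once the budget is exceeded.
        while self.total_tokens() > self.max_tokens and len(self.messages) > 1:
            self.messages.popleft()


class LongTermMemory:
    """Persistent facts keyed by user; a dict stands in for a real store."""

    def __init__(self) -> None:
        self.facts: Dict[str, List[str]] = {}

    def remember(self, user: str, fact: str) -> None:
        self.facts.setdefault(user, []).append(fact)

    def recall(self, user: str) -> List[str]:
        return self.facts.get(user, [])


short = ShortTermMemory(max_tokens=8)
short.add("I would like the fish")  # 5 tokens
short.add("light spice please")     # 3 tokens, total 8: both still fit
short.add("and a pot of tea")       # 5 more tokens: the oldest message is evicted

long_term = LongTermMemory()
long_term.remember("guest_42", "no coriander")
print(list(short.messages), long_term.recall("guest_42"))
```

The eviction step shows why short-term memory alone is not enough: once the fish order scrolls out of the window, only a long-term store can bring a preference back.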

RAG: Giving the Agent an “External Knowledge Base”

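Retrieval-Augmented Generation (RAG) is one of the most widely used approaches for implementing long-term memory. Its core workflow: documents are split into chunks, embedded, and stored in a vector index; at query time the question is embedded, the most similar chunks are retrieved, and they are added to the prompt before the model generates its answer.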
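
A toy version of the retrieve-then-augment flow is sketched below. It is only an illustration: word counts stand in for a real embedding model, a Python list stands in for a vector database, and the function names are invented for this example.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real system would use an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], k: int = 1) -> List[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: List[str]) -> str:
    """Augment the question with retrieved context before calling the model."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"


documents = [
    "The guest at table 5 is vegetarian",
    "The kitchen closes at 10 pm",
    "Deliveries arrive every Monday morning",
]
query = "when does the kitchen close"
top = retrieve(query, documents)
print(build_prompt(query, documents))
```

The generation step is left out on purpose: the resulting prompt would simply be passed to whichever model serves as the Brain.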

4. Planning: Cooking Process Checklist

Role: The kitchen’s standard operating procedure (SOP) for getting a dish out.

When you order Buddha Jumps Over the Wall, the head chef won’t just start cooking randomly — instead, he mentally generates a checklist:

  1. First, prepare ingredients (abalone, sea cucumber…)
  2. Then, simmer the broth
  3. Finally, slow-cook

An Agent works the same way. Given a complex task (e.g., “write a competitor analysis report”), it decomposes the task on its own:

  • Step 1: Gather data on competitors A, B, and C.
  • Step 2: Compare their pricing and features.
  • Step 3: Write an article based on the comparison.
  • Step 4: Proofread for typos.
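
One way to represent such a plan in code is as a set of steps with dependencies, from which an execution order is derived. The sketch below uses Python’s standard `graphlib` module (Python 3.9+); the step names are made up for this example, and a real Agent would have the model produce the plan rather than hard-coding it.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it can start.
plan = {
    "gather_data": set(),
    "compare_pricing_and_features": {"gather_data"},
    "write_article": {"compare_pricing_and_features"},
    "proofread": {"write_article"},
}

# A topological sort yields an order that respects every dependency.
order = list(TopologicalSorter(plan).static_order())
for number, step in enumerate(order, start=1):
    print(f"Step {number}: {step}")
```

Expressing the plan as dependencies rather than a fixed list leaves room for steps that could run in parallel, such as gathering data on each competitor separately.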

Mainstream Planning Strategies

Planning strategies determine how the Agent “thinks before acting”; different strategies vary in reasoning depth and applicable scenarios:

| Strategy | Full Name | Core Idea | Applicable Scenarios |
| --- | --- | --- | --- |
| CoT | Chain-of-Thought | Write out reasoning steps before giving the final answer | Mathematical reasoning, logical analysis |
| ReAct | Reasoning + Acting | Alternating between “reasoning” and “acting”; re-reason after each action based on results | Dynamic tasks requiring tool calls |
| ToT | Tree-of-Thoughts | Explore multiple reasoning branches simultaneously and select the optimal path | Complex decision-making, creative tasks |
| Reflection | Self-Reflection | After task completion, critically review and correct its own output | Code generation, long-form writing |
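
Of these strategies, Reflection is easy to sketch without a model in the loop. In the example below, `critique` and `revise` are rule-based stand-ins for what would normally be model calls, and the two checks they perform are arbitrary choices for illustration.

```python
from typing import List


def critique(draft: str) -> List[str]:
    """Stand-in for a model's self-review: flags simple, checkable problems."""
    issues = []
    if not draft.endswith("."):
        issues.append("missing final period")
    if "teh" in draft.split():
        issues.append("typo: teh")
    return issues


def revise(draft: str, issues: List[str]) -> str:
    """Stand-in for a model's rewrite: applies one fix per reported issue."""
    for issue in issues:
        if issue == "missing final period":
            draft += "."
        elif issue == "typo: teh":
            draft = " ".join("the" if word == "teh" else word for word in draft.split())
    return draft


def reflect(draft: str, max_rounds: int = 3) -> str:
    """Review and correct the draft until no issues remain or rounds run out."""
    for _ in range(max_rounds):
        issues = critique(draft)
        if not issues:
            break
        draft = revise(draft, issues)
    return draft


final = reflect("Compare teh pricing of competitors A, B, and C")
print(final)
```

The round limit matters in practice: a critic that never reports a clean draft would otherwise keep the Agent revising forever.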

ReAct Example: Agent receives task “Check tomorrow’s Beijing weather and, if rain is forecast, send a reminder” → Think: Need to check weather first → Act: Call weather API → Observe: Returns “rain tomorrow” → Think: Condition met, need to draft reminder → Act: Call message-sending tool → Observe: Message sent → Task complete.
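
The same Think → Act → Observe trace can be reproduced with a small sketch. The `reason` function below is a scripted stand-in for the model, and `weather_api` and `send_message` are hypothetical stub tools that return canned values; none of this reflects a specific framework.

```python
from typing import Callable, Dict, List, Optional, Tuple


def weather_api(city: str) -> str:
    """Hypothetical tool: returns a canned forecast."""
    return "rain tomorrow"


def send_message(text: str) -> str:
    """Hypothetical tool: pretends to send a message."""
    return f"sent: {text}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "weather_api": weather_api,
    "send_message": send_message,
}


def reason(observations: List[str]) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Scripted stand-in for the model: a thought plus the next action, or None to stop."""
    if not observations:
        return "Need to check the weather first", ("weather_api", "Beijing")
    if len(observations) == 1:
        if "rain" in observations[0]:
            return "Rain expected, send a reminder", ("send_message", "Bring an umbrella tomorrow")
        return "No rain, nothing to send", None
    return "Reminder sent, task complete", None


def react(max_steps: int = 5) -> List[str]:
    """Alternate reasoning and acting, re-reasoning after every observation."""
    trace: List[str] = []
    observations: List[str] = []
    for _ in range(max_steps):
        thought, action = reason(observations)
        trace.append(f"Think: {thought}")
        if action is None:
            break
        tool, argument = action
        trace.append(f"Act: {tool}({argument!r})")
        observation = TOOLS[tool](argument)
        observations.append(observation)
        trace.append(f"Observe: {observation}")
    return trace


trace = react()
print("\n".join(trace))
```

The key property of ReAct shows up in `reason`: the second decision depends on what the first tool call returned, so the plan is not fixed in advance.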

5. Agent Loop

The above components are not isolated — they form a continuously iterative Perception → Thinking → Acting → Observing loop, known as the “Agent Loop.” The Agent repeats this loop until the task completes or a termination condition is met.

This loop enables the Agent to self-correct upon failure: If a tool call returns an error or unexpected result, the “Observing” stage feeds this information back to the Brain, which then adjusts its strategy in the next “Thinking” round.
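
A minimal sketch of this self-correcting behavior follows. It assumes a hypothetical `flaky_search` tool that fails on its first call, and a hard-coded retry in place of a real model’s decision; the step limit plays the role of the termination condition.

```python
from typing import Dict, List


def flaky_search(query: str, attempt: int) -> Dict[str, str]:
    """Hypothetical tool that fails on its first call to simulate an error."""
    if attempt == 0:
        return {"status": "error", "detail": "timeout"}
    return {"status": "ok", "detail": f"results for {query!r}"}


def agent_loop(task: str, max_steps: int = 3) -> List[str]:
    """Repeat think, act, observe until success or the step limit is reached."""
    log: List[str] = []
    for step in range(max_steps):
        # Thinking: a real Agent would ask the model, passing in earlier
        # observations; here the decision is hard-coded to try the search tool.
        log.append(f"think: step {step}, calling search")
        # Acting: run the chosen tool.
        result = flaky_search(task, attempt=step)
        # Observing: record the outcome so it can inform the next round.
        log.append(f"observe: {result['status']} ({result['detail']})")
        if result["status"] == "ok":
            log.append("done")
            return log
    log.append("stopped: step limit reached")
    return log


log = agent_loop("competitor pricing")
print("\n".join(log))
```

Without the step limit, a tool that keeps failing would trap the Agent in the loop, which is why a termination condition belongs next to the success check.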

Summary

Suppose you tell the Agent: “Check tomorrow’s Beijing weather, and if it’s raining, write a reminder and send it to Xiao Wang.”

Internally, the Agent operates as follows:

  1. Perception Layer: Receives the natural language instruction and identifies key entities: “Beijing,” “tomorrow,” “Xiao Wang.”
  2. Brain: Understands the instruction and decomposes it into two tasks, one conditional on the other: check the weather, and if it will rain, send a reminder.
  3. Planning: Check weather → determine whether it will rain → (if yes) draft reminder → send.
  4. Tools: Call the “weather query tool” and obtain the result: rain tomorrow.
  5. Memory: Look up Xiao Wang’s contact info in the address book (memory store).
  6. Tools: Call the “send message tool” to dispatch the reminder.
  7. Observation: Confirm the message was sent successfully; the task is complete and the loop terminates.

Execution Process Diagram:

Image 3

Overview of the Five Components

| Component | Restaurant Analogy | Core Responsibility | Key Technologies |
| --- | --- | --- | --- |
| Perception Layer | Front Desk Reception | Receive multimodal inputs, build context | Multimodal models, OCR, ASR |
| Brain | Head Chef & Manager | Understand intent, reason, decide, issue tool calls | LLM, Function Calling |
| Planning | Dish Delivery SOP | Task decomposition, step ordering, self-reflection | ReAct, CoT, ToT, Reflection |
| Tools | Kitchen Utensils & Assistants | Execute specific operations, connect to external world | Search / Code / API / File System |
| Memory | Customer Record Book | Manage context, store long-term knowledge | Vector Database, RAG, Context Window |