Build an AI Agent From Scratch: Architecture Tutorial with Code
Most AI agent tutorials teach you to use a framework. This one teaches you how agents actually work by building the core architecture from scratch. No LangChain, no CrewAI -- just you, an API client, and the fundamental patterns that every agent framework uses internally.
Understanding these internals helps you debug agent issues, optimize performance, and make informed decisions about which framework to use (or whether to skip frameworks entirely).
In this tutorial:
- Agent architecture overview
- The core agent loop
- State management
- Tool execution engine
- Memory system
- Putting it all together
- Production considerations
Related tutorials: AI Agent Python Tutorial · Autonomous AI Agent Tutorial · AI Agent Tools Tutorial
Agent Architecture Overview
Every AI agent has the same fundamental architecture, regardless of framework:
┌──────────────┐
│ User Input │
└──────┬───────┘
│
┌──────▼───────┐
│ Controller │ ◄── The "brain" that decides what to do
└──────┬───────┘
│
┌────────────┼────────────┐
│ │ │
┌──────▼──────┐ ┌──▼───────┐ ┌──▼──────┐
│ LLM Client │ │ Memory │ │ Tools │
└─────────────┘ └──────────┘ └─────────┘
The Controller runs the agent loop: think → decide → act → observe → repeat.
The LLM Client handles communication with the language model.
The Memory stores conversation history and working context.
The Tools are functions the agent can call to interact with the outside world.
For a broader overview of agent concepts, see our Complete Guide to AI Agent Orchestration.
The Core Agent Loop
The agent loop is the heart of every agent system. Here's the minimal version:
from openai import OpenAI
import json
client = OpenAI()
class Agent:
    def __init__(self, system_prompt: str):
        self.messages = [
            {"role": "system", "content": system_prompt}
        ]

    def run(self, user_input: str, max_steps: int = 10) -> str:
        self.messages.append({"role": "user", "content": user_input})
        for step in range(max_steps):
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=self.messages,
                tools=self.get_tool_definitions()  # provided by the tool registry, covered below
            )
            message = response.choices[0].message
            self.messages.append(message.to_dict())
            if message.tool_calls:
                for tool_call in message.tool_calls:
                    result = self.execute_tool(
                        tool_call.function.name,
                        json.loads(tool_call.function.arguments)
                    )
                    self.messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": result
                    })
            else:
                return message.content
        return "Agent reached maximum steps without completing the task."
This 30-line class is the foundation. Every agent framework -- LangChain, CrewAI, AutoGen -- implements this same loop with varying degrees of abstraction.
The Decision Point
The key insight is the if message.tool_calls branch. At every step, the LLM makes a decision:
- Use a tool -- the agent calls a function and continues the loop
- Respond to the user -- the agent produces final output and the loop ends
This is what makes it an agent rather than a chatbot: the LLM decides when to act (tool call) versus when to respond (text output).
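You can exercise this branch without an API call by mocking the message object. SimpleNamespace below is a stand-in for the SDK's message type, not the real class:

```python
from types import SimpleNamespace

def next_action(message) -> str:
    # mirrors the loop's branch: tool calls mean "keep acting",
    # plain text means "return the final answer"
    if message.tool_calls:
        return "act"
    return "respond"

tool_msg = SimpleNamespace(tool_calls=[SimpleNamespace(id="call_1")], content=None)
text_msg = SimpleNamespace(tool_calls=None, content="The answer is 42.")

print(next_action(tool_msg))  # act
print(next_action(text_msg))  # respond
```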
State Management
State management determines how the agent tracks context across the conversation. There are three levels:
Level 1: Simple Message List
The basic approach -- store all messages in a list:
class SimpleState:
    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})

    def get_context(self, max_messages: int = 20) -> list:
        # slice from index 1 so the system message isn't duplicated
        # when the history is shorter than max_messages
        return [self.messages[0]] + self.messages[1:][-max_messages:]
Limitation: Token costs grow linearly. A 20-step agent run can consume 20,000+ tokens of context.
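The windowing itself is worth sketching in isolation. Note the `[1:]` in the slice: it keeps the system prompt from being duplicated when the history is still shorter than the window, a detail that's easy to miss:

```python
def windowed_context(messages: list[dict], max_messages: int = 20) -> list[dict]:
    # always keep the system prompt, then the most recent turns;
    # slicing from index 1 avoids duplicating the system message
    return [messages[0]] + messages[1:][-max_messages:]

history = [{"role": "system", "content": "You are helpful."}]
history += [{"role": "user", "content": f"message {i}"} for i in range(30)]

ctx = windowed_context(history, max_messages=5)
print(len(ctx))            # 6: system prompt + 5 most recent
print(ctx[1]["content"])   # message 25
```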
Level 2: Summarized Context
Summarize older messages to keep token usage bounded:
class SummarizedState:
    def __init__(self, system_prompt: str, max_raw_messages: int = 10):
        self.system_prompt = system_prompt
        self.summary = ""
        self.recent_messages = []
        self.max_raw = max_raw_messages

    def add(self, role: str, content: str):
        self.recent_messages.append({"role": role, "content": content})
        if len(self.recent_messages) > self.max_raw:
            self._summarize_oldest()

    def _summarize_oldest(self):
        oldest = self.recent_messages[:5]
        self.recent_messages = self.recent_messages[5:]
        summary_response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Summarize the key facts, decisions, and results from this conversation segment."},
                {"role": "user", "content": json.dumps(oldest)}
            ]
        )
        self.summary += "\n" + summary_response.choices[0].message.content

    def get_context(self) -> list:
        context = [{"role": "system", "content": f"{self.system_prompt}\n\nPrevious context: {self.summary}"}]
        return context + self.recent_messages
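The compaction pattern can be tested without an LLM by injecting a stand-in summarizer function (here, one that just counts the messages it was handed). The invariant to check is that the raw message list never grows past its cap:

```python
class RollingSummary:
    # summarize_fn stands in for the gpt-4o-mini call in _summarize_oldest
    def __init__(self, summarize_fn, max_raw: int = 6, chunk: int = 3):
        self.summarize_fn = summarize_fn
        self.summary = ""
        self.recent_messages = []
        self.max_raw = max_raw
        self.chunk = chunk

    def add(self, role: str, content: str):
        self.recent_messages.append({"role": role, "content": content})
        if len(self.recent_messages) > self.max_raw:
            # compact the oldest chunk into the running summary
            oldest = self.recent_messages[:self.chunk]
            self.recent_messages = self.recent_messages[self.chunk:]
            self.summary += "\n" + self.summarize_fn(oldest)

state = RollingSummary(lambda msgs: f"[{len(msgs)} messages compacted]")
for i in range(20):
    state.add("user", f"message {i}")

print(len(state.recent_messages))  # stays at or below max_raw
```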
Level 3: Structured Working Memory
For complex agents, use structured memory with separate stores:
class StructuredMemory:
    def __init__(self):
        self.facts = []
        self.decisions = []
        self.artifacts = {}
        self.errors = []

    def add_fact(self, fact: str):
        self.facts.append(fact)

    def add_decision(self, decision: str, reasoning: str):
        self.decisions.append({"decision": decision, "reasoning": reasoning})

    def add_artifact(self, name: str, content: str):
        self.artifacts[name] = content

    def add_error(self, error: str):
        self.errors.append(error)

    def get_context_string(self) -> str:
        parts = []
        if self.facts:
            parts.append("Known facts:\n" + "\n".join(f"- {f}" for f in self.facts))
        if self.decisions:
            parts.append("Decisions made:\n" + "\n".join(
                f"- {d['decision']} (because: {d['reasoning']})" for d in self.decisions
            ))
        if self.errors:
            parts.append("Previous errors:\n" + "\n".join(f"- {e}" for e in self.errors))
        return "\n\n".join(parts)
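The rendering idea is simple enough to sketch generically: each non-empty store becomes a titled bullet section, and the joined string is injected into the system prompt. The store names and contents below are illustrative:

```python
def render_memory(sections: dict[str, list[str]]) -> str:
    # each non-empty store becomes a titled bullet list;
    # empty stores are skipped entirely
    parts = [
        f"{title}:\n" + "\n".join(f"- {item}" for item in items)
        for title, items in sections.items() if items
    ]
    return "\n\n".join(parts)

memory = {
    "Known facts": ["France's population is about 68 million"],
    "Decisions made": ["use web search (because: training data may be stale)"],
    "Previous errors": [],
}
print(render_memory(memory))
```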
Tool Execution Engine
The tool execution engine translates LLM tool calls into real function calls. This is where you control what agents can do.
Tool Registry
from dataclasses import dataclass
from typing import Callable
import inspect

@dataclass
class Tool:
    name: str
    description: str
    parameters: dict
    function: Callable

class ToolRegistry:
    def __init__(self):
        self.tools: dict[str, Tool] = {}

    def register(self, name: str, description: str, func: Callable):
        sig = inspect.signature(func)
        parameters = self._extract_parameters(sig)
        self.tools[name] = Tool(
            name=name,
            description=description,
            parameters=parameters,
            function=func
        )

    def execute(self, name: str, arguments: dict) -> str:
        tool = self.tools.get(name)
        if not tool:
            return f"Error: Unknown tool '{name}'"
        try:
            result = tool.function(**arguments)
            return str(result)
        except TypeError as e:
            return f"Error: Invalid arguments for {name}: {e}"
        except Exception as e:
            return f"Error executing {name}: {e}"

    def get_openai_tools(self) -> list[dict]:
        return [
            {
                "type": "function",
                "function": {
                    "name": tool.name,
                    "description": tool.description,
                    "parameters": tool.parameters
                }
            }
            for tool in self.tools.values()
        ]

    def _extract_parameters(self, sig) -> dict:
        properties = {}
        required = []
        for name, param in sig.parameters.items():
            prop = {"type": "string"}
            if param.default is inspect.Parameter.empty:
                required.append(name)
            properties[name] = prop
        return {
            "type": "object",
            "properties": properties,
            "required": required
        }
Using the Registry
registry = ToolRegistry()

def search_web(query: str) -> str:
    import requests
    # replace "key" with your Tavily API key
    resp = requests.get("https://api.tavily.com/search", params={"query": query, "api_key": "key"})
    return str(resp.json())

def calculate(expression: str) -> str:
    # eval() is fine for a demo but unsafe on untrusted input;
    # use ast.literal_eval or a proper math parser in real systems
    return str(eval(expression))

registry.register("search_web", "Search the web for information", search_web)
registry.register("calculate", "Evaluate a math expression", calculate)

# Get OpenAI-compatible tool definitions
tool_defs = registry.get_openai_tools()
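It helps to see what the registry actually derives for the API. Here is a standalone version of the parameter extraction; in this minimal tutorial version every parameter is typed "string", and a fuller implementation would map Python type annotations to JSON Schema types. The `search_web` signature below is illustrative:

```python
import inspect

def extract_parameters(func) -> dict:
    # a parameter is required iff it has no default value
    sig = inspect.signature(func)
    return {
        "type": "object",
        "properties": {name: {"type": "string"} for name in sig.parameters},
        "required": [name for name, p in sig.parameters.items()
                     if p.default is inspect.Parameter.empty],
    }

def search_web(query: str, max_results: str = "5") -> str:
    ...

schema = extract_parameters(search_web)
print(schema["required"])            # ['query']
print(sorted(schema["properties"]))  # ['max_results', 'query']
```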
Memory System
Short-Term Memory (Conversation Context)
Short-term memory is the message list we've already built. It persists within a single agent run.
Long-Term Memory (Persistent Storage)
Long-term memory persists across sessions. Here's a file-based implementation:
import json
import os
from typing import Any

class LongTermMemory:
    def __init__(self, storage_path: str = "agent_memory"):
        self.path = storage_path
        os.makedirs(storage_path, exist_ok=True)

    def store(self, key: str, value: Any):
        filepath = os.path.join(self.path, f"{key}.json")
        with open(filepath, "w") as f:
            json.dump(value, f)

    def retrieve(self, key: str) -> Any:
        filepath = os.path.join(self.path, f"{key}.json")
        if os.path.exists(filepath):
            with open(filepath) as f:
                return json.load(f)
        return None

    def list_keys(self) -> list[str]:
        return [f.removesuffix(".json") for f in os.listdir(self.path) if f.endswith(".json")]

    def search(self, query: str) -> list[str]:
        results = []
        for key in self.list_keys():
            value = self.retrieve(key)
            if query.lower() in str(value).lower():
                results.append(key)
        return results
Episodic Memory (Past Interactions)
from datetime import datetime

class EpisodicMemory:
    def __init__(self, memory: LongTermMemory):
        self.memory = memory

    def save_episode(self, task: str, steps: list, outcome: str):
        episodes = self.memory.retrieve("episodes") or []
        episodes.append({
            "task": task,
            "steps": steps,
            "outcome": outcome,
            "timestamp": datetime.now().isoformat()
        })
        self.memory.store("episodes", episodes)

    def find_similar(self, task: str) -> list:
        episodes = self.memory.retrieve("episodes") or []
        return [e for e in episodes
                if any(word in e["task"].lower() for word in task.lower().split())]
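The similarity check is plain keyword overlap, which you can sanity-check in isolation (embedding-based retrieval would be the natural upgrade path). The sample episodes are illustrative:

```python
def find_similar(episodes: list[dict], task: str) -> list[dict]:
    # an episode matches if any word of the new task appears as a
    # substring of its task string -- crude, but cheap
    words = task.lower().split()
    return [e for e in episodes if any(w in e["task"].lower() for w in words)]

episodes = [
    {"task": "Research the population of France", "outcome": "done"},
    {"task": "Draft a tweet thread", "outcome": "done"},
]
matches = find_similar(episodes, "population of Germany")
print([e["task"] for e in matches])  # matches the France episode via "population"
```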
Putting It All Together
Here's the complete agent with all systems integrated:
class ProductionAgent:
    def __init__(self, system_prompt: str):
        self.state = SummarizedState(system_prompt)
        self.tools = ToolRegistry()
        self.memory = LongTermMemory()
        self.episodes = EpisodicMemory(self.memory)
        self.max_steps = 15

    def add_tool(self, name: str, description: str, func):
        self.tools.register(name, description, func)

    def run(self, user_input: str) -> str:
        self.state.add("user", user_input)
        steps_taken = []
        for step in range(self.max_steps):
            kwargs = {"model": "gpt-4o", "messages": self.state.get_context()}
            if self.tools.tools:
                kwargs["tools"] = self.tools.get_openai_tools()
            response = client.chat.completions.create(**kwargs)
            message = response.choices[0].message
            # record the assistant turn (including its tool calls) so the
            # API sees a valid message sequence on the next iteration
            self.state.recent_messages.append(message.to_dict())
            if message.tool_calls:
                for tool_call in message.tool_calls:
                    args = json.loads(tool_call.function.arguments)
                    result = self.tools.execute(tool_call.function.name, args)
                    steps_taken.append({
                        "tool": tool_call.function.name,
                        "args": args,
                        "result": result[:200]
                    })
                    # tool results must reference the call they answer
                    self.state.recent_messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": result
                    })
            else:
                self.episodes.save_episode(user_input, steps_taken, message.content)
                return message.content
        return "Maximum steps reached."

agent = ProductionAgent(
    system_prompt="You are a helpful research assistant. Use tools to find information."
)
agent.add_tool("search_web", "Search the web", search_web)
agent.add_tool("calculate", "Calculate math", calculate)

result = agent.run("What is the population of France multiplied by 2?")
print(result)
Production Considerations
What This Architecture Is Missing
This is a complete agent, but production systems need more:
| Component | Tutorial Version | Production Version |
|---|---|---|
| Error handling | Basic try/catch | Circuit breakers, dead letter queues |
| Logging | Print statements | Structured logging, distributed tracing |
| Authentication | None | API key rotation, OAuth |
| Scaling | Single process | Queue-based, horizontal scaling |
| Persistence | File-based | Database (Postgres, Redis) |
| Monitoring | None | Metrics, alerts, dashboards |
| Testing | Manual | Unit tests, integration tests, evals |
When to Use a Framework
Build from scratch when:
- You need to understand agent internals
- You have simple requirements (1-3 tools, linear workflows)
- You need maximum control over behavior
Use a framework when:
- You need multi-agent orchestration
- You want built-in connectors to databases, APIs, and services
- Your team doesn't have deep agent expertise
- You need to ship fast
For framework comparisons, see our AutoGen vs CrewAI vs LangGraph comparison.
The No-Code Alternative
If building agents from scratch or using frameworks sounds like too much work, Ivern AI handles the entire architecture for you:
- No infrastructure -- agent loop, state management, and memory are built in
- BYOK model -- bring your API key, pay only for what you use
- Multi-agent coordination -- create squads of specialized agents in minutes
- Production ready -- monitoring, error handling, and scaling are included
Try it free: ivern.ai/signup
Key Takeaways
- Every agent is a loop -- think → decide → act → observe → repeat
- The LLM is the decision maker -- it chooses when to use tools vs respond
- State management scales with complexity -- from message lists to structured memory
- Tools are your control surface -- what you register determines what the agent can do
- Build from scratch to learn, use frameworks to ship -- and consider no-code for speed
Next tutorials: AI Agent Python Tutorial · AI Agent Tools Tutorial · Autonomous Agent Tutorial
Related Articles
What Is Agentic AI? How It Works and Why It Matters in 2026
Agentic AI refers to AI systems that can plan, reason, and take actions autonomously. Learn what makes AI 'agentic,' how it differs from chatbots, and why coordinated agentic teams are the next evolution.
AI Agent API Integration Tutorial: Connect Agents to Any External Service
Step-by-step tutorial for connecting AI agents to external APIs and services. Covers REST API integration, authentication, error handling, rate limiting, and building a tool layer that lets agents interact with any service.
AI Agent Collaboration Tutorial: How to Make Multiple Agents Work Together
Learn how to build collaborative AI agent systems where multiple specialized agents share context, hand off tasks, and produce results together. Covers communication patterns, context sharing, and real implementation examples.