Build an AI Agent From Scratch: Architecture Tutorial with Code

Tutorials · By Ivern AI Team · 16 min read


Most AI agent tutorials teach you to use a framework. This one teaches you how agents actually work by building the core architecture from scratch. No LangChain, no CrewAI -- just you, an API client, and the fundamental patterns that every agent framework uses internally.

Understanding these internals helps you debug agent issues, optimize performance, and make informed decisions about which framework to use (or whether to skip frameworks entirely).

In this tutorial: Agent Architecture Overview · The Core Agent Loop · State Management · Tool Execution Engine · Memory System · Putting It All Together · Production Considerations

Related tutorials: AI Agent Python Tutorial · Autonomous AI Agent Tutorial · AI Agent Tools Tutorial

Agent Architecture Overview

Every AI agent has the same fundamental architecture, regardless of framework:

                    ┌──────────────┐
                    │   User Input  │
                    └──────┬───────┘
                           │
                    ┌──────▼───────┐
                    │  Controller   │ ◄── The "brain" that decides what to do
                    └──────┬───────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
       ┌──────▼──────┐ ┌──▼───────┐ ┌──▼──────┐
       │  LLM Client  │ │  Memory   │ │  Tools  │
       └─────────────┘ └──────────┘ └─────────┘

The Controller runs the agent loop: think → decide → act → observe → repeat.

The LLM Client handles communication with the language model.

The Memory stores conversation history and working context.

The Tools are functions the agent can call to interact with the outside world.

For a broader overview of agent concepts, see our Complete Guide to AI Agent Orchestration.

The Core Agent Loop

The agent loop is the heart of every agent system. Here's the minimal version:

from openai import OpenAI
import json

client = OpenAI()

class Agent:
    # Note: get_tool_definitions() and execute_tool() are referenced below
    # but not defined here; the Tool Execution Engine section shows one way
    # to implement them.
    def __init__(self, system_prompt: str):
        self.messages = [
            {"role": "system", "content": system_prompt}
        ]

    def run(self, user_input: str, max_steps: int = 10) -> str:
        self.messages.append({"role": "user", "content": user_input})

        for step in range(max_steps):
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=self.messages,
                tools=self.get_tool_definitions()
            )

            message = response.choices[0].message
            # Append the assistant turn (including any tool calls) so the
            # model sees its own actions on the next request. The SDK accepts
            # the message object directly in the messages list.
            self.messages.append(message)

            if message.tool_calls:
                for tool_call in message.tool_calls:
                    result = self.execute_tool(
                        tool_call.function.name,
                        json.loads(tool_call.function.arguments)
                    )
                    self.messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": result
                    })
            else:
                return message.content

        return "Agent reached maximum steps without completing the task."

This 30-line class is the foundation. Every agent framework -- LangChain, CrewAI, AutoGen -- implements this same loop with varying degrees of abstraction.

The Decision Point

The key insight is the if message.tool_calls branch. At every step, the LLM makes a decision:

  1. Use a tool -- the agent calls a function and continues the loop
  2. Respond to the user -- the agent produces final output and the loop ends

This is what makes it an agent rather than a chatbot: the LLM decides when to act (tool call) versus when to respond (text output).

State Management

State management determines how the agent tracks context across the conversation. There are three levels:

Level 1: Simple Message List

The basic approach -- store all messages in a list:

class SimpleState:
    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})

    def get_context(self, max_messages: int = 20) -> list:
        # System prompt plus the most recent messages. Slicing after index 0
        # avoids duplicating the system prompt when the history is short.
        return [self.messages[0]] + self.messages[1:][-max_messages:]

Limitation: Token costs grow linearly. A 20-step agent run can consume 20,000+ tokens of context.
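A rough way to see the growth, using the common heuristic of about 4 characters per token for English text (exact counts require the model's tokenizer):

```python
def estimate_context_tokens(messages: list[dict], chars_per_token: int = 4) -> int:
    # Rough heuristic; exact counts require the model's tokenizer.
    return sum(len(m.get("content", "")) // chars_per_token for m in messages)

# If each step adds ~800 characters of tool output, a 20-step run adds
# roughly 20 * 800 / 4 = 4,000 tokens on top of the base prompt.
steps = [{"role": "tool", "content": "x" * 800} for _ in range(20)]
print(estimate_context_tokens(steps))  # 4000
```

And every one of those tokens is re-sent on every subsequent API call, so cost grows faster than the raw message count suggests.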

Level 2: Summarized Context

Summarize older messages to keep token usage bounded:

class SummarizedState:
    def __init__(self, system_prompt: str, max_raw_messages: int = 10):
        self.system_prompt = system_prompt
        self.summary = ""
        self.recent_messages = []
        self.max_raw = max_raw_messages

    def add(self, role: str, content: str):
        self.recent_messages.append({"role": role, "content": content})
        
        if len(self.recent_messages) > self.max_raw:
            self._summarize_oldest()

    def _summarize_oldest(self):
        oldest = self.recent_messages[:5]
        self.recent_messages = self.recent_messages[5:]
        
        summary_response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Summarize the key facts, decisions, and results from this conversation segment."},
                {"role": "user", "content": json.dumps(oldest)}
            ]
        )
        self.summary += "\n" + summary_response.choices[0].message.content

    def get_context(self) -> list:
        context = [{"role": "system", "content": f"{self.system_prompt}\n\nPrevious context: {self.summary}"}]
        return context + self.recent_messages
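The windowing behavior is easier to see with the LLM call stubbed out. This sketch keeps the same keep-last-N structure as SummarizedState but replaces the gpt-4o-mini summarization with a placeholder string:

```python
recent: list[str] = []
summary: list[str] = []
MAX_RAW = 4  # smaller than the class default, to trigger summarization quickly

def add(message: str):
    recent.append(message)
    if len(recent) > MAX_RAW:
        oldest = recent[:2]
        del recent[:2]
        # In SummarizedState this chunk would go to gpt-4o-mini; stubbed here.
        summary.append(f"[summary of {len(oldest)} messages]")

for i in range(6):
    add(f"message {i}")

print(summary)  # ['[summary of 2 messages]']
print(recent)   # ['message 2', 'message 3', 'message 4', 'message 5']
```

Old messages get folded into the summary while the working window stays bounded, which is the whole point: context cost stops growing linearly.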

Level 3: Structured Working Memory


For complex agents, use structured memory with separate stores:

class StructuredMemory:
    def __init__(self):
        self.facts = []
        self.decisions = []
        self.artifacts = {}
        self.errors = []

    def add_fact(self, fact: str):
        self.facts.append(fact)

    def add_decision(self, decision: str, reasoning: str):
        self.decisions.append({"decision": decision, "reasoning": reasoning})

    def add_artifact(self, name: str, content: str):
        self.artifacts[name] = content

    def get_context_string(self) -> str:
        parts = []
        if self.facts:
            parts.append("Known facts:\n" + "\n".join(f"- {f}" for f in self.facts))
        if self.decisions:
            parts.append("Decisions made:\n" + "\n".join(
                f"- {d['decision']} (because: {d['reasoning']})" for d in self.decisions
            ))
        if self.errors:
            parts.append("Previous errors:\n" + "\n".join(f"- {e}" for e in self.errors))
        return "\n\n".join(parts)

Tool Execution Engine

The tool execution engine translates LLM tool calls into real function calls. This is where you control what agents can do.

Tool Registry

from dataclasses import dataclass
from typing import Callable
import inspect

@dataclass
class Tool:
    name: str
    description: str
    parameters: dict
    function: Callable

class ToolRegistry:
    def __init__(self):
        self.tools: dict[str, Tool] = {}

    def register(self, name: str, description: str, func: Callable):
        sig = inspect.signature(func)
        parameters = self._extract_parameters(sig)
        
        self.tools[name] = Tool(
            name=name,
            description=description,
            parameters=parameters,
            function=func
        )

    def execute(self, name: str, arguments: dict) -> str:
        tool = self.tools.get(name)
        if not tool:
            return f"Error: Unknown tool '{name}'"
        
        try:
            result = tool.function(**arguments)
            return str(result)
        except TypeError as e:
            return f"Error: Invalid arguments for {name}: {e}"
        except Exception as e:
            return f"Error executing {name}: {e}"

    def get_openai_tools(self) -> list[dict]:
        return [
            {
                "type": "function",
                "function": {
                    "name": tool.name,
                    "description": tool.description,
                    "parameters": tool.parameters
                }
            }
            for tool in self.tools.values()
        ]

    def _extract_parameters(self, sig) -> dict:
        properties = {}
        required = []
        for name, param in sig.parameters.items():
            # Simplification: every parameter is typed as a string. A fuller
            # version would map Python type hints to JSON Schema types.
            prop = {"type": "string"}
            if param.default is inspect.Parameter.empty:
                required.append(name)
            properties[name] = prop
        
        return {
            "type": "object",
            "properties": properties,
            "required": required
        }

Using the Registry

registry = ToolRegistry()

def search_web(query: str) -> str:
    import requests
    # Replace "key" with your Tavily API key (in practice, read it from an
    # environment variable rather than hardcoding it).
    resp = requests.get("https://api.tavily.com/search", params={"query": query, "api_key": "key"})
    return str(resp.json())

def calculate(expression: str) -> str:
    # Warning: eval() executes arbitrary code. Use a safe expression
    # evaluator for anything touching untrusted input.
    return str(eval(expression))

registry.register("search_web", "Search the web for information", search_web)
registry.register("calculate", "Evaluate a math expression", calculate)

# Get OpenAI-compatible tool definitions
tool_defs = registry.get_openai_tools()
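The error handling in execute() is worth seeing in isolation. Here is the same name-to-function dispatch pattern in miniature, independent of the classes above:

```python
# Miniature version of ToolRegistry.execute: name -> function dispatch
# with the same error handling, minus the class machinery.
tools = {"calculate": lambda expression: str(eval(expression))}

def execute(name: str, arguments: dict) -> str:
    func = tools.get(name)
    if func is None:
        return f"Error: Unknown tool '{name}'"
    try:
        return str(func(**arguments))
    except TypeError as e:
        return f"Error: Invalid arguments for {name}: {e}"

print(execute("calculate", {"expression": "2 + 3"}))  # 5
print(execute("get_weather", {}))                     # Error: Unknown tool 'get_weather'
```

Notice that errors are returned as strings rather than raised: the agent loop feeds them back to the LLM as tool results, giving the model a chance to retry with corrected arguments.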

Memory System

Short-Term Memory (Conversation Context)

Short-term memory is the message list we've already built. It persists within a single agent run.

Long-Term Memory (Persistent Storage)

Long-term memory persists across sessions. Here's a file-based implementation:

import json
import os
from typing import Any

class LongTermMemory:
    def __init__(self, storage_path: str = "agent_memory"):
        self.path = storage_path
        os.makedirs(storage_path, exist_ok=True)

    def store(self, key: str, value: Any):
        filepath = os.path.join(self.path, f"{key}.json")
        with open(filepath, "w") as f:
            json.dump(value, f)

    def retrieve(self, key: str) -> Any:
        filepath = os.path.join(self.path, f"{key}.json")
        if os.path.exists(filepath):
            with open(filepath) as f:
                return json.load(f)
        return None

    def list_keys(self) -> list[str]:
        return [f.replace(".json", "") for f in os.listdir(self.path)]

    def search(self, query: str) -> list[str]:
        results = []
        for key in self.list_keys():
            value = self.retrieve(key)
            if query.lower() in str(value).lower():
                results.append(key)
        return results
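Usage is a plain store/search round trip. This sketch inlines the same pattern as LongTermMemory, pointed at a temporary directory so it is safe to run anywhere:

```python
import json
import os
import tempfile

# Same store/search pattern as LongTermMemory, against a temp directory.
path = tempfile.mkdtemp()

def store(key: str, value) -> None:
    with open(os.path.join(path, f"{key}.json"), "w") as f:
        json.dump(value, f)

def search(query: str) -> list[str]:
    hits = []
    for fname in os.listdir(path):
        with open(os.path.join(path, fname)) as f:
            if query.lower() in str(json.load(f)).lower():
                hits.append(fname.removesuffix(".json"))
    return hits

store("user_prefs", {"language": "French", "tone": "formal"})
store("api_notes", {"rate_limit": "60 requests/min"})
print(search("french"))  # ['user_prefs']
```

Substring search like this is fine for small stores; at scale you would swap it for a database query or a vector search.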

Episodic Memory (Past Interactions)

from datetime import datetime

class EpisodicMemory:
    def __init__(self, memory: LongTermMemory):
        self.memory = memory

    def save_episode(self, task: str, steps: list, outcome: str):
        episodes = self.memory.retrieve("episodes") or []
        episodes.append({
            "task": task,
            "steps": steps,
            "outcome": outcome,
            "timestamp": datetime.now().isoformat()
        })
        self.memory.store("episodes", episodes)

    def find_similar(self, task: str) -> list:
        episodes = self.memory.retrieve("episodes") or []
        return [e for e in episodes if any(word in e["task"].lower() for word in task.lower().split())]
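The matching in find_similar is deliberately naive: any shared word counts as similar. Here it is in isolation (embedding-based similarity would give far better recall, at the cost of a model call):

```python
# The keyword-overlap match from find_similar, against in-memory data.
episodes = [
    {"task": "research France population", "outcome": "success"},
    {"task": "summarize quarterly report", "outcome": "success"},
]

def find_similar(task: str) -> list:
    words = task.lower().split()
    return [e for e in episodes if any(w in e["task"].lower() for w in words)]

print([e["task"] for e in find_similar("population of Germany")])
# ['research France population']
```

Because the check is a substring test per word, short common words can produce spurious matches; filtering out stopwords is a cheap first improvement.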

Putting It All Together

Here's the complete agent with all systems integrated:

class ProductionAgent:
    def __init__(self, system_prompt: str):
        self.state = SummarizedState(system_prompt)
        self.tools = ToolRegistry()
        self.memory = LongTermMemory()
        self.episodes = EpisodicMemory(self.memory)
        self.max_steps = 15

    def add_tool(self, name: str, description: str, func):
        self.tools.register(name, description, func)

    def run(self, user_input: str) -> str:
        self.state.add("user", user_input)
        steps_taken = []

        for step in range(self.max_steps):
            context = self.state.get_context()

            # Build kwargs so the tools parameter is omitted entirely when
            # no tools are registered, rather than sent as null.
            kwargs = {"model": "gpt-4o", "messages": context}
            if self.tools.tools:
                kwargs["tools"] = self.tools.get_openai_tools()
            response = client.chat.completions.create(**kwargs)

            message = response.choices[0].message

            if message.tool_calls:
                # Record the assistant turn first: the API rejects "tool"
                # messages that don't follow an assistant message carrying
                # the matching tool_call_ids. We append to recent_messages
                # directly because these entries have structure that
                # SummarizedState.add() doesn't handle.
                self.state.recent_messages.append(message.model_dump(exclude_none=True))
                for tool_call in message.tool_calls:
                    args = json.loads(tool_call.function.arguments)
                    result = self.tools.execute(tool_call.function.name, args)
                    steps_taken.append({
                        "tool": tool_call.function.name,
                        "args": args,
                        "result": result[:200]
                    })
                    self.state.recent_messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": result
                    })
            else:
                self.episodes.save_episode(user_input, steps_taken, message.content)
                return message.content

        return "Maximum steps reached."

agent = ProductionAgent(
    system_prompt="You are a helpful research assistant. Use tools to find information."
)

agent.add_tool("search_web", "Search the web", search_web)
agent.add_tool("calculate", "Calculate math", calculate)

result = agent.run("What is the population of France multiplied by 2?")
print(result)

Production Considerations

What This Architecture Is Missing

This is a complete agent, but production systems need more:


Component      | Tutorial Version  | Production Version
Error handling | Basic try/catch   | Circuit breakers, dead letter queues
Logging        | Print statements  | Structured logging, distributed tracing
Authentication | None              | API key rotation, OAuth
Scaling        | Single process    | Queue-based, horizontal scaling
Persistence    | File-based        | Database (Postgres, Redis)
Monitoring     | None              | Metrics, alerts, dashboards
Testing        | Manual            | Unit tests, integration tests, evals

When to Use a Framework

Build from scratch when:

  • You need to understand agent internals
  • You have simple requirements (1-3 tools, linear workflows)
  • You need maximum control over behavior

Use a framework when:

  • You need multi-agent orchestration
  • You want built-in connectors to databases, APIs, and services
  • Your team doesn't have deep agent expertise
  • You need to ship fast

For framework comparisons, see our AutoGen vs CrewAI vs LangGraph comparison.

The No-Code Alternative

If building agents from scratch or using frameworks sounds like too much work, Ivern AI handles the entire architecture for you:

  • No infrastructure -- agent loop, state management, and memory are built in
  • BYOK model -- bring your API key, pay only for what you use
  • Multi-agent coordination -- create squads of specialized agents in minutes
  • Production ready -- monitoring, error handling, and scaling are included

Try it free: ivern.ai/signup

Key Takeaways

  1. Every agent is a loop -- think → decide → act → observe → repeat
  2. The LLM is the decision maker -- it chooses when to use tools vs respond
  3. State management scales with complexity -- from message lists to structured memory
  4. Tools are your control surface -- what you register determines what the agent can do
  5. Build from scratch to learn, use frameworks to ship -- and consider no-code for speed

Next tutorials: AI Agent Python Tutorial · AI Agent Tools Tutorial · Autonomous Agent Tutorial
