Build an AI Agent From Scratch: Architecture Tutorial with Code

Tutorials · By Ivern AI Team · 16 min read


Most AI agent tutorials teach you to use a framework. This one teaches you how agents actually work by building the core architecture from scratch. No LangChain, no CrewAI -- just you, an API client, and the fundamental patterns that every agent framework uses internally.

Understanding these internals helps you debug agent issues, optimize performance, and make informed decisions about which framework to use (or whether to skip frameworks entirely).

In this tutorial: Agent Architecture Overview · The Core Agent Loop · State Management · Tool Execution Engine · Memory System · Putting It All Together · Production Considerations

Related tutorials: AI Agent Python Tutorial · Autonomous AI Agent Tutorial · AI Agent Tools Tutorial

Agent Architecture Overview

Every AI agent has the same fundamental architecture, regardless of framework:

                    ┌──────────────┐
                    │   User Input  │
                    └──────┬───────┘
                           │
                    ┌──────▼───────┐
                    │  Controller   │ ◄── The "brain" that decides what to do
                    └──────┬───────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
       ┌──────▼──────┐ ┌──▼───────┐ ┌──▼──────┐
       │  LLM Client  │ │  Memory   │ │  Tools  │
       └─────────────┘ └──────────┘ └─────────┘

The Controller runs the agent loop: think → decide → act → observe → repeat.

The LLM Client handles communication with the language model.

The Memory stores conversation history and working context.

The Tools are functions the agent can call to interact with the outside world.

For a broader overview of agent concepts, see our Complete Guide to AI Agent Orchestration.

The Core Agent Loop

The agent loop is the heart of every agent system. Here's the minimal version:

from openai import OpenAI
import json

client = OpenAI()

class Agent:
    # Note: get_tool_definitions() and execute_tool() are referenced below
    # but not defined here; the Tool Execution Engine section shows one way
    # to implement them.
    def __init__(self, system_prompt: str):
        self.messages = [
            {"role": "system", "content": system_prompt}
        ]

    def run(self, user_input: str, max_steps: int = 10) -> str:
        self.messages.append({"role": "user", "content": user_input})

        for step in range(max_steps):
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=self.messages,
                tools=self.get_tool_definitions()
            )

            message = response.choices[0].message
            # Append the assistant turn (including any tool calls) so the
            # model sees its own actions on the next request. The SDK accepts
            # the message object directly in the messages list.
            self.messages.append(message)

            if message.tool_calls:
                for tool_call in message.tool_calls:
                    result = self.execute_tool(
                        tool_call.function.name,
                        json.loads(tool_call.function.arguments)
                    )
                    self.messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": result
                    })
            else:
                return message.content

        return "Agent reached maximum steps without completing the task."

This 30-line class is the foundation. Every agent framework -- LangChain, CrewAI, AutoGen -- implements this same loop with varying degrees of abstraction.

The Decision Point

The key insight is the if message.tool_calls branch. At every step, the LLM makes a decision:

  1. Use a tool -- the agent calls a function and continues the loop
  2. Respond to the user -- the agent produces final output and the loop ends

This is what makes it an agent rather than a chatbot: the LLM decides when to act (tool call) versus when to respond (text output).

State Management

State management determines how the agent tracks context across the conversation. There are three levels:

Level 1: Simple Message List

The basic approach -- store all messages in a list:

class SimpleState:
    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})

    def get_context(self, max_messages: int = 20) -> list:
        # System prompt plus the most recent messages. Slicing after index 0
        # avoids duplicating the system prompt when the history is short.
        return [self.messages[0]] + self.messages[1:][-max_messages:]

Limitation: Token costs grow linearly. A 20-step agent run can consume 20,000+ tokens of context.
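A rough way to see the growth, using the common heuristic of about 4 characters per token for English text (exact counts require the model's tokenizer):

```python
def estimate_context_tokens(messages: list[dict], chars_per_token: int = 4) -> int:
    # Rough heuristic; exact counts require the model's tokenizer.
    return sum(len(m.get("content", "")) // chars_per_token for m in messages)

# If each step adds ~800 characters of tool output, a 20-step run adds
# roughly 20 * 800 / 4 = 4,000 tokens on top of the base prompt.
steps = [{"role": "tool", "content": "x" * 800} for _ in range(20)]
print(estimate_context_tokens(steps))  # 4000
```

And every one of those tokens is re-sent on every subsequent API call, so cost grows faster than the raw message count suggests.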

Level 2: Summarized Context

Summarize older messages to keep token usage bounded:

class SummarizedState:
    def __init__(self, system_prompt: str, max_raw_messages: int = 10):
        self.system_prompt = system_prompt
        self.summary = ""
        self.recent_messages = []
        self.max_raw = max_raw_messages

    def add(self, role: str, content: str):
        self.recent_messages.append({"role": role, "content": content})
        
        if len(self.recent_messages) > self.max_raw:
            self._summarize_oldest()

    def _summarize_oldest(self):
        oldest = self.recent_messages[:5]
        self.recent_messages = self.recent_messages[5:]
        
        summary_response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Summarize the key facts, decisions, and results from this conversation segment."},
                {"role": "user", "content": json.dumps(oldest)}
            ]
        )
        self.summary += "\n" + summary_response.choices[0].message.content

    def get_context(self) -> list:
        context = [{"role": "system", "content": f"{self.system_prompt}\n\nPrevious context: {self.summary}"}]
        return context + self.recent_messages
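The windowing behavior is easier to see with the LLM call stubbed out. This sketch keeps the same keep-last-N structure as SummarizedState but replaces the gpt-4o-mini summarization with a placeholder string:

```python
recent: list[str] = []
summary: list[str] = []
MAX_RAW = 4  # smaller than the class default, to trigger summarization quickly

def add(message: str):
    recent.append(message)
    if len(recent) > MAX_RAW:
        oldest = recent[:2]
        del recent[:2]
        # In SummarizedState this chunk would go to gpt-4o-mini; stubbed here.
        summary.append(f"[summary of {len(oldest)} messages]")

for i in range(6):
    add(f"message {i}")

print(summary)  # ['[summary of 2 messages]']
print(recent)   # ['message 2', 'message 3', 'message 4', 'message 5']
```

Old messages get folded into the summary while the working window stays bounded, which is the whole point: context cost stops growing linearly.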

Level 3: Structured Working Memory


For complex agents, use structured memory with separate stores:

class StructuredMemory:
    def __init__(self):
        self.facts = []
        self.decisions = []
        self.artifacts = {}
        self.errors = []

    def add_fact(self, fact: str):
        self.facts.append(fact)

    def add_decision(self, decision: str, reasoning: str):
        self.decisions.append({"decision": decision, "reasoning": reasoning})

    def add_artifact(self, name: str, content: str):
        self.artifacts[name] = content

    def get_context_string(self) -> str:
        parts = []
        if self.facts:
            parts.append("Known facts:\n" + "\n".join(f"- {f}" for f in self.facts))
        if self.decisions:
            parts.append("Decisions made:\n" + "\n".join(
                f"- {d['decision']} (because: {d['reasoning']})" for d in self.decisions
            ))
        if self.errors:
            parts.append("Previous errors:\n" + "\n".join(f"- {e}" for e in self.errors))
        return "\n\n".join(parts)

Tool Execution Engine

The tool execution engine translates LLM tool calls into real function calls. This is where you control what agents can do.

Tool Registry

from dataclasses import dataclass
from typing import Callable
import inspect

@dataclass
class Tool:
    name: str
    description: str
    parameters: dict
    function: Callable

class ToolRegistry:
    def __init__(self):
        self.tools: dict[str, Tool] = {}

    def register(self, name: str, description: str, func: Callable):
        sig = inspect.signature(func)
        parameters = self._extract_parameters(sig)
        
        self.tools[name] = Tool(
            name=name,
            description=description,
            parameters=parameters,
            function=func
        )

    def execute(self, name: str, arguments: dict) -> str:
        tool = self.tools.get(name)
        if not tool:
            return f"Error: Unknown tool '{name}'"
        
        try:
            result = tool.function(**arguments)
            return str(result)
        except TypeError as e:
            return f"Error: Invalid arguments for {name}: {e}"
        except Exception as e:
            return f"Error executing {name}: {e}"

    def get_openai_tools(self) -> list[dict]:
        return [
            {
                "type": "function",
                "function": {
                    "name": tool.name,
                    "description": tool.description,
                    "parameters": tool.parameters
                }
            }
            for tool in self.tools.values()
        ]

    def _extract_parameters(self, sig) -> dict:
        properties = {}
        required = []
        for name, param in sig.parameters.items():
            # Simplification: every parameter is typed as a string. A fuller
            # version would map Python type hints to JSON Schema types.
            prop = {"type": "string"}
            if param.default is inspect.Parameter.empty:
                required.append(name)
            properties[name] = prop
        
        return {
            "type": "object",
            "properties": properties,
            "required": required
        }

Using the Registry

registry = ToolRegistry()

def search_web(query: str) -> str:
    import requests
    # Replace "key" with your Tavily API key (in practice, read it from an
    # environment variable rather than hardcoding it).
    resp = requests.get("https://api.tavily.com/search", params={"query": query, "api_key": "key"})
    return str(resp.json())

def calculate(expression: str) -> str:
    # Warning: eval() executes arbitrary code. Use a safe expression
    # evaluator for anything touching untrusted input.
    return str(eval(expression))

registry.register("search_web", "Search the web for information", search_web)
registry.register("calculate", "Evaluate a math expression", calculate)

# Get OpenAI-compatible tool definitions
tool_defs = registry.get_openai_tools()
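The error handling in execute() is worth seeing in isolation. Here is the same name-to-function dispatch pattern in miniature, independent of the classes above:

```python
# Miniature version of ToolRegistry.execute: name -> function dispatch
# with the same error handling, minus the class machinery.
tools = {"calculate": lambda expression: str(eval(expression))}

def execute(name: str, arguments: dict) -> str:
    func = tools.get(name)
    if func is None:
        return f"Error: Unknown tool '{name}'"
    try:
        return str(func(**arguments))
    except TypeError as e:
        return f"Error: Invalid arguments for {name}: {e}"

print(execute("calculate", {"expression": "2 + 3"}))  # 5
print(execute("get_weather", {}))                     # Error: Unknown tool 'get_weather'
```

Notice that errors are returned as strings rather than raised: the agent loop feeds them back to the LLM as tool results, giving the model a chance to retry with corrected arguments.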

Memory System

Short-Term Memory (Conversation Context)

Short-term memory is the message list we've already built. It persists within a single agent run.

Long-Term Memory (Persistent Storage)

Long-term memory persists across sessions. Here's a file-based implementation:

import json
import os
from typing import Any

class LongTermMemory:
    def __init__(self, storage_path: str = "agent_memory"):
        self.path = storage_path
        os.makedirs(storage_path, exist_ok=True)

    def store(self, key: str, value: Any):
        filepath = os.path.join(self.path, f"{key}.json")
        with open(filepath, "w") as f:
            json.dump(value, f)

    def retrieve(self, key: str) -> Any:
        filepath = os.path.join(self.path, f"{key}.json")
        if os.path.exists(filepath):
            with open(filepath) as f:
                return json.load(f)
        return None

    def list_keys(self) -> list[str]:
        return [f.replace(".json", "") for f in os.listdir(self.path)]

    def search(self, query: str) -> list[str]:
        results = []
        for key in self.list_keys():
            value = self.retrieve(key)
            if query.lower() in str(value).lower():
                results.append(key)
        return results
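Usage is a plain store/search round trip. This sketch inlines the same pattern as LongTermMemory, pointed at a temporary directory so it is safe to run anywhere:

```python
import json
import os
import tempfile

# Same store/search pattern as LongTermMemory, against a temp directory.
path = tempfile.mkdtemp()

def store(key: str, value) -> None:
    with open(os.path.join(path, f"{key}.json"), "w") as f:
        json.dump(value, f)

def search(query: str) -> list[str]:
    hits = []
    for fname in os.listdir(path):
        with open(os.path.join(path, fname)) as f:
            if query.lower() in str(json.load(f)).lower():
                hits.append(fname.removesuffix(".json"))
    return hits

store("user_prefs", {"language": "French", "tone": "formal"})
store("api_notes", {"rate_limit": "60 requests/min"})
print(search("french"))  # ['user_prefs']
```

Substring search like this is fine for small stores; at scale you would swap it for a database query or a vector search.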

Episodic Memory (Past Interactions)

from datetime import datetime

class EpisodicMemory:
    def __init__(self, memory: LongTermMemory):
        self.memory = memory

    def save_episode(self, task: str, steps: list, outcome: str):
        episodes = self.memory.retrieve("episodes") or []
        episodes.append({
            "task": task,
            "steps": steps,
            "outcome": outcome,
            "timestamp": datetime.now().isoformat()
        })
        self.memory.store("episodes", episodes)

    def find_similar(self, task: str) -> list:
        episodes = self.memory.retrieve("episodes") or []
        return [e for e in episodes if any(word in e["task"].lower() for word in task.lower().split())]
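The matching in find_similar is deliberately naive: any shared word counts as similar. Here it is in isolation (embedding-based similarity would give far better recall, at the cost of a model call):

```python
# The keyword-overlap match from find_similar, against in-memory data.
episodes = [
    {"task": "research France population", "outcome": "success"},
    {"task": "summarize quarterly report", "outcome": "success"},
]

def find_similar(task: str) -> list:
    words = task.lower().split()
    return [e for e in episodes if any(w in e["task"].lower() for w in words)]

print([e["task"] for e in find_similar("population of Germany")])
# ['research France population']
```

Because the check is a substring test per word, short common words can produce spurious matches; filtering out stopwords is a cheap first improvement.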

Putting It All Together

Here's the complete agent with all systems integrated:

class ProductionAgent:
    def __init__(self, system_prompt: str):
        self.state = SummarizedState(system_prompt)
        self.tools = ToolRegistry()
        self.memory = LongTermMemory()
        self.episodes = EpisodicMemory(self.memory)
        self.max_steps = 15

    def add_tool(self, name: str, description: str, func):
        self.tools.register(name, description, func)

    def run(self, user_input: str) -> str:
        self.state.add("user", user_input)
        steps_taken = []

        for step in range(self.max_steps):
            context = self.state.get_context()

            # Build kwargs so the tools parameter is omitted entirely when
            # no tools are registered, rather than sent as null.
            kwargs = {"model": "gpt-4o", "messages": context}
            if self.tools.tools:
                kwargs["tools"] = self.tools.get_openai_tools()
            response = client.chat.completions.create(**kwargs)

            message = response.choices[0].message

            if message.tool_calls:
                # Record the assistant turn first: the API rejects "tool"
                # messages that don't follow an assistant message carrying
                # the matching tool_call_ids. We append to recent_messages
                # directly because these entries have structure that
                # SummarizedState.add() doesn't handle.
                self.state.recent_messages.append(message.model_dump(exclude_none=True))
                for tool_call in message.tool_calls:
                    args = json.loads(tool_call.function.arguments)
                    result = self.tools.execute(tool_call.function.name, args)
                    steps_taken.append({
                        "tool": tool_call.function.name,
                        "args": args,
                        "result": result[:200]
                    })
                    self.state.recent_messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": result
                    })
            else:
                self.episodes.save_episode(user_input, steps_taken, message.content)
                return message.content

        return "Maximum steps reached."

agent = ProductionAgent(
    system_prompt="You are a helpful research assistant. Use tools to find information."
)

agent.add_tool("search_web", "Search the web", search_web)
agent.add_tool("calculate", "Calculate math", calculate)

result = agent.run("What is the population of France multiplied by 2?")
print(result)

Production Considerations

What This Architecture Is Missing

This is a complete agent, but production systems need more:


Component      | Tutorial Version  | Production Version
Error handling | Basic try/catch   | Circuit breakers, dead letter queues
Logging        | Print statements  | Structured logging, distributed tracing
Authentication | None              | API key rotation, OAuth
Scaling        | Single process    | Queue-based, horizontal scaling
Persistence    | File-based        | Database (Postgres, Redis)
Monitoring     | None              | Metrics, alerts, dashboards
Testing        | Manual            | Unit tests, integration tests, evals

When to Use a Framework

Build from scratch when:

  • You need to understand agent internals
  • You have simple requirements (1-3 tools, linear workflows)
  • You need maximum control over behavior

Use a framework when:

  • You need multi-agent orchestration
  • You want built-in connectors to databases, APIs, and services
  • Your team doesn't have deep agent expertise
  • You need to ship fast

For framework comparisons, see our AutoGen vs CrewAI vs LangGraph comparison.

The No-Code Alternative

If building agents from scratch or using frameworks sounds like too much work, Ivern AI handles the entire architecture for you:

  • No infrastructure -- agent loop, state management, and memory are built in
  • BYOK model -- bring your API key, pay only for what you use
  • Multi-agent coordination -- create squads of specialized agents in minutes
  • Production ready -- monitoring, error handling, and scaling are included

Try it free: ivern.ai/signup

Key Takeaways

  1. Every agent is a loop -- think → decide → act → observe → repeat
  2. The LLM is the decision maker -- it chooses when to use tools vs respond
  3. State management scales with complexity -- from message lists to structured memory
  4. Tools are your control surface -- what you register determines what the agent can do
  5. Build from scratch to learn, use frameworks to ship -- and consider no-code for speed

Next tutorials: AI Agent Python Tutorial · AI Agent Tools Tutorial · Autonomous Agent Tutorial
