Multi-Agent AI for Data Analysis: Build a Team That Cleans, Analyzes, and Reports

Tutorials · By Ivern AI Team · 13 min read

Data analysts spend up to 80% of their time cleaning and preparing data. What if you could hand that work -- plus the statistical analysis, chart creation, and report writing -- to a team of AI agents that runs the full pipeline end-to-end?

In this tutorial, you'll build a multi-agent data analysis system with four specialized agents: one that cleans your raw data, one that runs statistical tests, one that generates visualizations, and one that compiles everything into a polished executive report. Each agent has a focused role, a tailored system prompt, and a clear handoff protocol.

We'll walk through the architecture, provide copy-paste agent prompts, and run a real example: turning a messy CSV of regional sales data into a board-ready analysis.

Why a Multi-Agent Data Analysis Team?

A single LLM can analyze data. But it struggles with the full pipeline because each phase demands different reasoning modes. Cleaning requires meticulous attention to schema anomalies. Statistics requires formal hypothesis testing. Visualization requires design judgment. Report writing requires narrative synthesis.

When you cram all of that into one prompt, you get a jack-of-all-trades that masters none. Context windows fill up. The model forgets which columns it already normalized. It produces a chart that contradicts the p-value it computed two paragraphs earlier.

An AI data analysis team solves this by decomposing the pipeline into discrete, sequential stages. Each agent operates in a focused context. Each handoff includes a structured data contract -- a JSON schema or markdown table -- so nothing is lost between steps. The result is higher quality at every phase, plus the ability to rerun a single stage without redoing the entire analysis.

This pattern -- breaking complex work into specialized agent roles connected by structured handoffs -- is the same approach we use in our multi-agent research pipeline guide. The data analysis flavor simply applies it to tabular data instead of web research.

The 4-Agent Architecture

The pipeline flows through four agents in sequence:

Raw Data → [Data Cleaner] → Cleaned Dataset
                                  ↓
                           [Statistical Analyst] → Analysis Results
                                                       ↓
                                                [Visualization Agent] → Charts & Tables
                                                                          ↓
                                                                   [Report Writer] → Executive Report

Each agent receives:

  1. Its own system prompt defining its role and constraints
  2. Structured input from the previous agent (or the raw file for Agent 1)
  3. A structured output format that the next agent can parse

The orchestration layer handles routing, retries, and validation between stages. If you want to dive deeper into orchestration patterns, see our multi-agent task orchestration guide.
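The orchestration layer described above can be sketched in a few dozen lines. This is a minimal illustration, not a production framework: the `Stage` dataclass, the stub lambda agents, and the validation callables are all placeholders for your actual LLM calls and schema checks.

```python
# Minimal sketch of the orchestration layer: run agents in sequence,
# validate each stage's output, and retry a failed stage before giving up.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    run: Callable[[str], str]        # agent call: input text -> output text
    validate: Callable[[str], bool]  # structural check on the output

def run_pipeline(raw_input: str, stages: list[Stage], max_retries: int = 2) -> dict:
    """Pass each stage's output to the next; retry a stage if validation fails."""
    outputs, current = {}, raw_input
    for stage in stages:
        for _ in range(max_retries + 1):
            result = stage.run(current)
            if stage.validate(result):
                break
        else:
            raise RuntimeError(f"{stage.name} failed validation after retries")
        outputs[stage.name] = result
        current = result  # structured handoff to the next agent
    return outputs

# Stub agents standing in for real LLM calls:
stages = [
    Stage("cleaner", lambda x: x.strip().lower(), lambda out: len(out) > 0),
    Stage("analyst", lambda x: f"stats({x})", lambda out: out.startswith("stats")),
]
results = run_pipeline("  Raw CSV Data  ", stages)
```

The key design choice is that validation lives in the orchestrator, not in the agents: a stage either emits output the next stage can parse, or it gets retried.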

Agent 1: Data Cleaner

The Data Cleaner ingests raw files -- CSVs, Excel workbooks, JSON exports -- and produces a normalized dataset ready for analysis. It handles missing values, type coercion, outlier flagging, and schema validation.

System Prompt

You are a data cleaning specialist. Your job is to take raw data files
and produce a clean, analysis-ready dataset.

Rules:
- Report the original row/column count and the final row/column count.
- For each column, infer the correct data type (int, float, date, categorical, text).
- Handle missing values: impute with median for numeric, "Unknown" for categorical.
  Log every imputation.
- Flag outliers using the IQR method (1.5× IQR). Do not remove them -- add a
  boolean column `is_outlier_{colname}` for each numeric column.
- Standardize date columns to ISO 8601 (YYYY-MM-DD).
- Strip whitespace from all string columns.
- Deduplicate on all columns.
- Output the cleaned dataset as a CSV-formatted string.
- Output a cleaning summary as a markdown table with columns:
  [Column, Original Type, Final Type, Missing Count, Imputation Method, Outlier Count]
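Two of the prompt's rules -- median imputation and IQR-based outlier flagging -- are deterministic enough that you can also implement them directly and use the result to audit the agent's work. A pure-Python sketch, with illustrative sample values (`None` marks a missing entry):

```python
# Deterministic versions of two cleaner rules: median imputation for a
# numeric column, and IQR outlier flagging (1.5x IQR), for auditing the agent.
import statistics

def impute_median(values):
    """Replace None with the column median; return (filled, imputed_count)."""
    present = [v for v in values if v is not None]
    med = statistics.median(present)
    return [med if v is None else v for v in values], values.count(None)

def flag_outliers_iqr(values, k=1.5):
    """Return a boolean is_outlier flag per value using the IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v < lo or v > hi for v in values]

units_sold = [48, 50, None, 47, 49, 500, None, 46]  # illustrative column
filled, n_imputed = impute_median(units_sold)
flags = flag_outliers_iqr(filled)  # the 500 stands out as an outlier
```

Comparing the agent's reported imputation counts and outlier flags against these functions is a cheap sanity check on stage 1.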

What It Produces

  • A cleaned CSV string (passed to the Statistical Analyst)
  • A cleaning summary markdown table (passed to the Report Writer)
  • Row/column counts before and after cleaning

This agent typically processes 10,000 rows of tabular data in 15-25 seconds using a model like GPT-4o or Claude Sonnet.

Agent 2: Statistical Analyst

The Statistical Analyst receives the cleaned dataset and produces formal analysis results: descriptive statistics, hypothesis tests, correlation matrices, and key findings.

System Prompt

You are a senior statistical analyst. You receive a cleaned dataset and a
research question. Perform the following analyses:

1. Descriptive statistics for all numeric columns (mean, median, std, min, max, IQR).
2. Group-by analysis: segment the data by each categorical column and compute
   summary statistics for every numeric column within each group.
3. Correlation matrix for all numeric columns. Flag any correlation with
   |r| > 0.7 as "strong."
4. For the primary research question, run the appropriate hypothesis test:
   - Two groups → independent t-test or Mann-Whitney U
   - Three+ groups → one-way ANOVA or Kruskal-Wallis
   - Time-based → linear regression or paired t-test
   Report the test statistic, p-value, effect size, and whether the result
   is significant at α = 0.05.
5. Identify the top 3 actionable insights from the data.

Output format:
- JSON with keys: descriptive_stats, group_analysis, correlation, hypothesis_test, top_insights
- Each insight should be one sentence with a supporting number.
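The test-selection rule in step 4 is the part most worth pinning down in code, since choosing the wrong test is the most common single-agent failure (see the comparison later in this post). A small helper that returns the (parametric, non-parametric) candidates from the prompt's decision table:

```python
# The Statistical Analyst's test-selection rule, encoded deterministically
# so the choice of hypothesis test is auditable rather than left to the model.
def candidate_tests(n_groups: int, time_based: bool = False) -> tuple[str, str]:
    """Return (parametric, non-parametric) test candidates for the question shape."""
    if time_based:
        return ("linear regression", "paired t-test")
    if n_groups == 2:
        return ("independent t-test", "Mann-Whitney U")
    if n_groups >= 3:
        return ("one-way ANOVA", "Kruskal-Wallis")
    raise ValueError("need at least two groups for a comparison test")
```

For the sales example later in this tutorial (revenue across five regions), this helper returns one-way ANOVA -- the test the multi-agent pipeline correctly selects.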

What It Produces

A structured JSON object containing all analysis results. This JSON is consumed directly by the Visualization Agent for chart generation and by the Report Writer for narrative synthesis.

This is the same statistical rigor we apply in our AI agent workflow for financial analysis, adapted for general-purpose data analysis.

Agent 3: Visualization Agent

The Visualization Agent transforms statistical results into charts. It generates Python/Plotly or Vega-Lite specifications that can be rendered in the final report.

System Prompt

You are a data visualization specialist. You receive statistical analysis
results in JSON format. Generate a set of charts that communicate the key findings.

Rules:
- Produce between 4 and 6 charts -- no fewer, no more.
- Chart types should be chosen appropriately:
  - Trends over time → line chart
  - Comparison between groups → bar chart (horizontal for 6+ categories)
  - Distribution → histogram or box plot
  - Composition → stacked bar or pie chart (max 6 slices)
  - Relationship between two variables → scatter plot with trend line
- Use a consistent color palette: ["#2563EB", "#10B981", "#F59E0B", "#EF4444", "#8B5CF6", "#EC4899"]
- Every chart must have: a descriptive title, axis labels, and a data source note.
- Output each chart as a self-contained Plotly Express code block.
- Include a 1-sentence annotation for each chart explaining the key takeaway.
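The chart-type rules above can also serve as a validation table for the orchestrator: after the agent responds, check that each chart's type matches its stated intent. One reasonable encoding (the intent names are illustrative, and we read "max 6 slices" as falling back to a stacked bar when a pie would have more):

```python
# The Visualization Agent's chart-selection rules as a deterministic lookup,
# usable by the orchestrator to sanity-check the agent's choices.
PALETTE = ["#2563EB", "#10B981", "#F59E0B", "#EF4444", "#8B5CF6", "#EC4899"]

def choose_chart(intent: str, n_categories: int = 0) -> str:
    if intent == "trend":
        return "line"
    if intent == "comparison":
        # horizontal bars once the category axis gets crowded
        return "bar_horizontal" if n_categories >= 6 else "bar"
    if intent == "distribution":
        return "histogram"  # or box plot, per the prompt
    if intent == "composition":
        return "pie" if n_categories <= 6 else "stacked_bar"
    if intent == "relationship":
        return "scatter_with_trendline"
    raise ValueError(f"unknown intent: {intent}")
```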

What It Produces

4-6 Plotly Express code blocks, each with an annotation. These are embedded directly in the final report as static images or interactive widgets.

Agent 4: Report Writer

The Report Writer synthesizes the cleaning summary, statistical results, and visualizations into a cohesive executive report.

System Prompt

You are a business report writer. You receive:
1. A data cleaning summary
2. Statistical analysis results (JSON)
3. Chart annotations and descriptions

Write an executive report with this structure:

# [Report Title]

## Executive Summary
3-4 sentences covering the key finding, supporting evidence, and recommended action.

## Methodology
Describe the dataset (row count, column count, date range), cleaning steps taken,
and statistical methods used. Keep this to one paragraph.

## Key Findings
Present the top 3-5 findings as H3 sections. Each finding should have:
- A bold headline sentence
- 2-3 supporting sentences with specific numbers
- A reference to the relevant chart (e.g., "See Figure 2")

## Charts
Embed each chart with a numbered caption (Figure 1, Figure 2, etc.).

## Recommendations
3-5 bullet points with specific, actionable next steps based on the data.

## Appendix: Data Quality Notes
Include the cleaning summary table. Note any caveats about the data.

Rules:
- Write for a non-technical executive audience.
- Every claim must be supported by a specific number from the analysis.
- Do not hedge excessively. If a finding is statistically significant, state it clearly.
- Use bullet points for scannability. Keep paragraphs to 3 sentences max.
- Total length: 800-1200 words.
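Because the report structure is fixed, the orchestrator can assemble the final markdown deterministically from the agents' outputs rather than trusting the model to reproduce the skeleton. A sketch -- the function signature and field names here are illustrative, not part of any framework:

```python
# Assemble the Report Writer's structured inputs into the fixed markdown
# skeleton defined in the system prompt. Figure numbering is derived from
# chart order, so "See Figure N" references always match the captions.
def assemble_report(title, summary, methodology, findings, chart_captions,
                    recommendations, cleaning_table):
    parts = [f"# {title}", "## Executive Summary", summary,
             "## Methodology", methodology, "## Key Findings"]
    for i, (headline, body) in enumerate(findings, start=1):
        parts += [f"### {headline}", f"{body} (See Figure {i})"]
    parts.append("## Charts")
    parts += [f"Figure {i}: {cap}" for i, cap in enumerate(chart_captions, start=1)]
    parts.append("## Recommendations")
    parts += [f"- {r}" for r in recommendations]
    parts += ["## Appendix: Data Quality Notes", cleaning_table]
    return "\n\n".join(parts)
```

Generating the scaffold in code and asking the agent to fill only the prose sections is a cheap way to guarantee the structure never drifts.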

What It Produces

A complete markdown report ready for delivery via email, Slack, or a dashboard.

The Full Workflow in Action: Sales Data Example

Let's trace the full pipeline with a concrete example. Our input is a CSV file regional_sales_q1_2026.csv with 14,230 rows and these columns:

Date, Region, Product_Category, Units_Sold, Revenue, Customer_Age,
Customer_Segment, Discount_Percent, Sales_Rep, Return_Flag

Our research question: "Which regions and product categories drove Q1 growth, and where should we increase investment in Q2?"

Stage 1: Cleaning

The Data Cleaner processes the file and finds:

| Column | Original Type | Final Type | Missing | Imputation | Outliers |
|---|---|---|---|---|---|
| Date | text | date | 0 | -- | 0 |
| Region | text | categorical | 47 | "Unknown" | -- |
| Units_Sold | text | int | 312 | median (48) | 89 |
| Revenue | text | float | 203 | median ($1,247.50) | 104 |
| Customer_Age | text | int | 1,891 | median (34) | 67 |
| Discount_Percent | text | float | 0 | -- | 23 |

Output: 14,198 rows (32 duplicates removed), 14 columns (4 new outlier flag columns added).

Stage 2: Statistical Analysis

The Statistical Analyst runs the full battery:

  • Descriptive stats: Mean revenue per transaction is $1,342.17 (std $891.43). Median is $1,247.50.
  • Group-by: West region leads with mean revenue of $1,587.30 (+18.3% vs. overall mean). Electronics category accounts for 42.1% of total revenue.
  • Correlation: Discount_Percent has a moderate positive correlation with Units_Sold (r = 0.38) but weak negative correlation with Revenue (r = -0.12).
  • Hypothesis test: One-way ANOVA on Revenue by Region -- F(4, 14193) = 47.3, p < 0.001, η² = 0.013. The difference in mean revenue across regions is statistically significant but the effect size is small.
  • Top insights:
    1. West region generates 31.4% of total Q1 revenue despite representing only 22.1% of transactions.
    2. Electronics in the West has the highest average order value at $2,103.40, which is 56.7% above the overall mean.
    3. Discount rates above 20% correlate with a 14.2% decrease in per-unit revenue without a proportional increase in volume.

Stage 3: Visualization

The Visualization Agent generates 5 charts:

  1. Revenue by Region -- horizontal bar chart showing West leading at $6.05M
  2. Monthly Revenue Trend -- line chart showing March uptick across all regions
  3. Revenue vs. Discount Scatter -- scatter plot with trend line showing the diminishing returns of high discounts
  4. Product Category Mix by Region -- stacked bar showing Electronics dominance in the West
  5. Customer Age Distribution -- histogram showing a bimodal distribution with peaks at 28 and 45

Stage 4: Report

The Report Writer assembles everything into an 1,100-word executive report with:

  • An executive summary highlighting the West region's outperformance
  • A methodology paragraph noting the 14,198-row dataset and ANOVA test
  • Five key findings, each referencing a specific chart
  • Four recommendations including "Increase Electronics inventory allocation in the West by 15-20%" and "Cap discount rates at 15% except for clearance items"
  • The data quality appendix with the cleaning summary table

Total pipeline runtime: approximately 90 seconds. Total cost: see below.

Cost Estimate Per Analysis Run

Here's a cost breakdown for the AI agent data analysis pipeline using current API pricing (May 2026):

| Agent | Model | Avg Tokens | Cost per Run |
|---|---|---|---|
| Data Cleaner | GPT-4o | ~18,000 input / 12,000 output | $0.15 |
| Statistical Analyst | GPT-4o | ~22,000 input / 8,000 output | $0.18 |
| Visualization Agent | GPT-4o | ~8,000 input / 6,000 output | $0.07 |
| Report Writer | GPT-4o | ~12,000 input / 4,000 output | $0.08 |
| **Total** | | ~60,000 input / 30,000 output | $0.48 |

Using Claude Sonnet 4 instead of GPT-4o drops the total to approximately $0.36 per run. Using a smaller model like GPT-4o-mini for the cleaner and visualization agents brings it down to roughly $0.22, though you may sacrifice some quality on the statistical analysis.

If you're running this pipeline daily across 10 datasets, expect a monthly cost of $144-$216. For teams managing high volumes, using your own API keys with a BYOK setup keeps costs transparent and under your control.
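The arithmetic behind these projections is simple enough to keep in a helper so you can re-estimate when pricing or volume changes. The per-million-token rates below are placeholders -- substitute your provider's current pricing:

```python
# Cost arithmetic from the table above: per-run cost from token counts and
# per-million-token rates, then a monthly projection at a given daily volume.
def run_cost(input_tokens, output_tokens, in_rate_per_m, out_rate_per_m):
    return (input_tokens / 1e6) * in_rate_per_m + (output_tokens / 1e6) * out_rate_per_m

def monthly_cost(cost_per_run, datasets_per_day, days=30):
    return cost_per_run * datasets_per_day * days

# 10 datasets/day at the table's $0.48 per run -> roughly $144/month
per_month = monthly_cost(0.48, 10)
```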

Multi-Agent vs. Single-Agent Comparison

We tested the same sales analysis using a single-agent approach -- one LLM call with a comprehensive prompt covering cleaning, analysis, visualization code, and report writing.

| Metric | Multi-Agent Team | Single Agent |
|---|---|---|
| Pipeline runtime | 90 seconds | 65 seconds |
| Cleaning accuracy (missing value detection) | 100% (312 of 312) | 87% (272 of 312) |
| Correct hypothesis test selected | Yes (ANOVA) | No (used t-test) |
| Charts with correct axis labels | 5 of 5 | 3 of 5 |
| Internal consistency (stats match narrative) | 100% | 78% |
| Report readability score (Flesch-Kincaid) | Grade 10 | Grade 13 |
| Cost per run | $0.48 | $0.42 |
| Recoverable errors (re-run single stage) | Yes | No |

The single agent is faster and slightly cheaper. But it makes more errors -- particularly in statistical methodology and internal consistency. When a single agent generates 8,000 tokens of output covering four distinct disciplines, something always slips. The multi-agent approach costs $0.06 more per run but produces analysis you can actually trust.

The bigger win is recoverability. If the visualization agent produces a chart with the wrong axis label, you can rerun just that stage for $0.07. With a single agent, you rerun the entire pipeline -- and get different results each time because the model regenerates everything from scratch.

Putting It All Together

A well-designed automated AI data pipeline does more than save time. It produces consistent, auditable analysis that scales across datasets without degradation. The four-agent architecture -- Clean, Analyze, Visualize, Report -- gives each phase the focused context it needs to produce high-quality output.

Key takeaways for building your own multi-agent data analysis pipeline:

  1. Define strict output schemas between agents. JSON contracts prevent information loss at handoff points.
  2. Use the right model for each task. The cleaner can run on a fast, cheap model. The statistical analyst needs a reasoning-heavy model.
  3. Log every transformation. The cleaning summary isn't just for the report -- it's your audit trail.
  4. Test with known datasets first. Run your pipeline on data where you know the expected results before trusting it with production data.
  5. Iterate on prompts, not on code. The power of this architecture is that improving the Statistical Analyst's prompt doesn't require touching any other agent.
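Takeaway 1 is worth making concrete. A handoff contract can be as small as "valid JSON with these exact keys," enforced in pure Python before any output moves downstream; the key set here matches the Statistical Analyst's output format:

```python
# Minimal handoff contract check: reject an agent's output unless it is
# valid JSON containing every key the next stage expects.
import json

ANALYST_KEYS = {"descriptive_stats", "group_analysis", "correlation",
                "hypothesis_test", "top_insights"}

def validate_handoff(raw: str, required_keys: set) -> dict:
    """Parse and check an agent's JSON output before passing it downstream."""
    payload = json.loads(raw)  # raises ValueError on malformed JSON
    missing = required_keys - payload.keys()
    if missing:
        raise ValueError(f"handoff missing keys: {sorted(missing)}")
    return payload
```

Pair this with the retry logic in your orchestrator: a failed contract check triggers a re-run of that one stage, not the whole pipeline.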

Ready to build your own AI-powered data analysis team? Sign up at ivern.ai to set up multi-agent workflows with your own API keys, custom agent prompts, and automated scheduling -- no infrastructure management required.
