Building Arc: A Self-Learning ERP Agent That Audits Clients Before Writing a Single Line of Code

After years of building ERPs for SMB clients, I kept running into the same problem: I’d spend weeks building the wrong thing. The client would describe their process one way, the actual data would tell a different story, and I’d end up rebuilding 40% of the system after the first demo.

Arc is my answer to that problem. It’s a phase-gated AI agent that audits before it builds — and won’t advance to the next phase until I give it explicit approval.

The Core Idea

Arc follows a 5-phase blueprint:

DISCOVER — read everything the client hands over (CSVs, PDFs, screenshots, notes)
AUDIT — map the actual data structure, flag inconsistencies, identify gaps
DESIGN — propose an architecture and wait for approval
BUILD — implement the agreed design
DELIVER — package, document, and hand off

The key word in all of that is wait. Most AI coding agents barrel forward. Arc stops at each gate and writes a report. Nothing happens until I read it and type APPROVE.
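To make the gating concrete, here is a minimal sketch of the five phases as an ordered sequence in which advancing requires explicit approval and no phase can be skipped. `PHASES`, `advance` and `canEnter` are hypothetical names for illustration, not Arc's actual code.

```typescript
// Hypothetical sketch: the five Arc phases as an ordered, gated sequence.
const PHASES = ["DISCOVER", "AUDIT", "DESIGN", "BUILD", "DELIVER"] as const;
type Phase = (typeof PHASES)[number];

// Without approval the agent stays in its current phase; with approval it
// moves exactly one step forward. DELIVER is terminal.
function advance(current: Phase, approved: boolean): Phase {
  if (!approved) return current;
  const i = PHASES.indexOf(current);
  return PHASES[Math.min(i + 1, PHASES.length - 1)] ?? current;
}

// A phase may start only when every earlier phase has been approved.
function canEnter(target: Phase, approved: ReadonlySet<Phase>): boolean {
  return PHASES.slice(0, PHASES.indexOf(target)).every((p) => approved.has(p));
}

console.log(advance("AUDIT", true));  // DESIGN
console.log(advance("AUDIT", false)); // AUDIT
```

The point of `canEnter` is that approval is cumulative: DESIGN stays locked until both DISCOVER and AUDIT have been signed off.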

The SOUL.md File

Every Arc instance has a SOUL.md at the root of the project. It looks like this:

SOUL.md

# Arc — Project Soul

## Identity
- Name: Arc
- Role: Business Systems Builder
- Client: SMB_004 (retail, 3 locations)
- Phase: AUDIT

## Current Knowledge Base
- kb_entries: 47
- last_updated: 2024-12-09T14:32:00Z
- confidence: 0.73

## Phase Gate Status
- DISCOVER: APPROVED
- AUDIT: IN_PROGRESS
- DESIGN: PENDING
- BUILD: PENDING
- DELIVER: PENDING

This file is Arc’s persistent memory. Every tool call updates it. If the session restarts, Arc reads SOUL.md and picks up exactly where it left off.
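A minimal sketch of the resume step, assuming SOUL.md keeps the `- PHASE: STATUS` lines under the "Phase Gate Status" heading exactly as in the example above. `parseGateStatus` and `resumePhase` are hypothetical helpers; Arc's real loader may work differently.

```typescript
// Hypothetical sketch: recover gate state from the "Phase Gate Status"
// section of SOUL.md, assuming the "- PHASE: STATUS" line format shown above.
type GateStatus = "APPROVED" | "IN_PROGRESS" | "PENDING";

interface Gate {
  phase: string;
  status: GateStatus;
}

function parseGateStatus(soul: string): Gate[] {
  const gates: Gate[] = [];
  let inSection = false;
  for (const raw of soul.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("## ")) {
      // Only read lines that sit under the "## Phase Gate Status" heading.
      inSection = line === "## Phase Gate Status";
      continue;
    }
    const m = inSection ? /^- (\w+): (APPROVED|IN_PROGRESS|PENDING)$/.exec(line) : null;
    if (m) gates.push({ phase: m[1], status: m[2] as GateStatus });
  }
  return gates;
}

// The phase to resume is the first one that has not been approved yet.
function resumePhase(gates: Gate[]): string | undefined {
  const open = gates.find((g) => g.status !== "APPROVED");
  return open?.phase; // undefined once every gate is approved
}
```

Run against the SOUL.md above, this returns `AUDIT`, which agrees with the `Phase` field in the Identity section.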

The Knowledge Base

Arc uses SQLite for its knowledge base. Not because SQLite is glamorous, but because a single-file, serverless database is exactly the right tool for per-project memory:

-- kb/schema.sql
CREATE TABLE observations (
  id INTEGER PRIMARY KEY,
  phase TEXT NOT NULL,
  category TEXT NOT NULL, -- 'data_structure' | 'process' | 'gap' | 'inconsistency'
  content TEXT NOT NULL,
  confidence REAL DEFAULT 0.5,
  source TEXT,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE phase_reports (
  id INTEGER PRIMARY KEY,
  phase TEXT NOT NULL,
  report_markdown TEXT NOT NULL,
  status TEXT DEFAULT 'draft', -- 'draft' | 'submitted' | 'approved' | 'rejected'
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

When Arc discovers that the client’s “product codes” in the inventory sheet don’t match the “item codes” in the sales export, it logs that as an inconsistency observation with a confidence score. By the end of AUDIT, I have a full picture of every data problem before a single schema gets designed.
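A rough sketch of how a mismatch like the product-code one could be detected and turned into an `observations` row. The shape inference (letters become `A`, digits become `9`), the `Observation` and `KeyColumn` interfaces, and the default confidence of 0.8 are all assumptions for illustration, not Arc's actual audit logic.

```typescript
// Hypothetical sketch: compare the "shape" of two key columns and, if they
// share no format at all, emit an observation mirroring the table's columns.
interface Observation {
  phase: string;
  category: "data_structure" | "process" | "gap" | "inconsistency";
  content: string;
  confidence: number;
  source: string;
}

interface KeyColumn {
  source: string;  // e.g. a file name
  codes: string[]; // sampled key values
}

// Reduce a code to its shape: letters -> "A", digits -> "9", others kept.
function shapeOf(code: string): string {
  return code.replace(/[A-Za-z]/g, "A").replace(/[0-9]/g, "9");
}

function compareKeyFormats(
  left: KeyColumn,
  right: KeyColumn,
  confidence = 0.8, // assumed default; tune per project
): Observation | null {
  if (left.codes.length === 0 || right.codes.length === 0) return null; // nothing to compare

  const leftShapes = new Set(left.codes.map(shapeOf));
  const rightShapes = new Set(right.codes.map(shapeOf));
  const shared = Array.from(leftShapes).filter((s) => rightShapes.has(s));
  if (shared.length > 0) return null; // the columns share at least one format

  return {
    phase: "AUDIT",
    category: "inconsistency",
    content:
      `Codes in ${left.source} (${Array.from(leftShapes).join(", ")}) never match the format ` +
      `of codes in ${right.source} (${Array.from(rightShapes).join(", ")}). Are these the same entities?`,
    confidence,
    source: `${left.source} vs ${right.source}`,
  };
}

const obs = compareKeyFormats(
  { source: "inventory.csv", codes: ["P-0012", "P-0458"] },
  { source: "sales_2024.xlsx", codes: ["ITM-0012", "ITM-0458"] },
);
console.log(obs?.content);
```

Comparing shapes rather than raw values keeps the check cheap and catches the structural mismatch even when the numeric parts happen to line up.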


Phase Gates in Practice

The gate mechanism is dead simple: a single query against the phase_reports table.

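// lib/gate.ts
// Assumes better-sqlite3, whose prepare()/get() calls are synchronous.
import Database from 'better-sqlite3';

// A phase's gate is open only if its most recent report has been approved.
export async function checkGate(phase: string, db: Database.Database): Promise<boolean> {
  // id breaks ties, since CURRENT_TIMESTAMP only has one-second resolution.
  const row = db.prepare(
    'SELECT status FROM phase_reports WHERE phase = ? ORDER BY created_at DESC, id DESC LIMIT 1'
  ).get(phase) as { status: string } | undefined;

  return row?.status === 'approved';
}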

If the gate is locked, Arc writes a human-readable report and stops:

AUDIT PHASE COMPLETE — AWAITING APPROVAL

I found 3 structural inconsistencies in the client data that need a design decision before I can proceed:

1. Product codes in inventory.csv (format: "P-XXXX") don't match item codes in sales_2024.xlsx (format: "ITM-XXXX"). Are these the same entities?
2. The "supplier" field in 23% of inventory rows is blank. Is this data that exists elsewhere, or genuinely unknown?
3. Location data exists in two formats across three sheets. I recommend normalising to a Location table — confirm?

Reply APPROVE to proceed to DESIGN, or provide corrections.
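One way a report like this could be assembled from the logged observations. `renderGateReport` is a hypothetical helper whose layout loosely mirrors the example above; treating only `inconsistency` and `gap` rows as needing a decision is an assumption.

```typescript
// Hypothetical sketch: build the gate report text from observation rows.
interface ObservationRow {
  category: string;
  content: string;
}

function renderGateReport(phase: string, next: string, rows: ObservationRow[]): string {
  // Assumption: only gaps and inconsistencies need a human decision.
  const open = rows.filter((r) => r.category === "inconsistency" || r.category === "gap");
  const lines = [
    `${phase} PHASE COMPLETE - AWAITING APPROVAL`,
    "",
    `I found ${open.length} issue(s) that need a decision before I can proceed:`,
    "",
    ...open.map((r, i) => `${i + 1}. ${r.content}`),
    "",
    `Reply APPROVE to proceed to ${next}, or provide corrections.`,
  ];
  return lines.join("\n");
}
```

Generating the report from the knowledge base rather than free-writing it means every numbered item traces back to a stored observation.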

The Gate Philosophy

That’s it. That’s the whole gate: one query against a SQLite row. And it works better than any elaborate approval workflow I’ve tried. Simple, auditable, and hard to bypass, because nothing moves forward until the latest report for a phase says approved.
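The approval side can stay just as small. A minimal sketch, assuming the `status` values from the phase_reports schema and assuming a reply only counts once a report has been submitted; `applyReply` and `ReplyOutcome` are hypothetical names.

```typescript
// Hypothetical sketch: how a reply changes a report's status, using the
// status values from the phase_reports table.
type ReportStatus = "draft" | "submitted" | "approved" | "rejected";

interface ReplyOutcome {
  status: ReportStatus;
  corrections?: string;
}

// Assumption: replies are only meaningful once the report has been submitted.
function applyReply(current: ReportStatus, reply: string): ReplyOutcome {
  if (current !== "submitted") {
    throw new Error(`cannot reply to a ${current} report`);
  }
  const text = reply.trim();
  if (text === "APPROVE") return { status: "approved" };
  // Anything else is treated as corrections; the phase stays locked.
  return { status: "rejected", corrections: text };
}
```

Persisting the outcome is then a single `UPDATE phase_reports SET status = ? WHERE id = ?`, and the next `checkGate` call sees the new status.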

Results

Since deploying Arc on client projects:

Zero major architecture rebuilds after the design phase
Average AUDIT report catches 4-7 structural issues that would have become bugs
Client onboarding time down ~35% because discovery is systematic, not conversational

The biggest win is confidence. When I sit down to build, I know exactly what the data looks like, where the gaps are, and what the client approved. No guessing.

What’s Next

I’m working on Arc v2 which adds:

Automatic data profiling (row counts, null rates, cardinality) during AUDIT
A web UI for clients to review and annotate the phase reports directly
Multi-agent mode where Arc can spawn sub-agents for specific audit tasks (financial reconciliation, inventory analysis, etc.)
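Since v2 is still in progress, this is only a sketch of what such profiling involves, assuming rows have already been parsed into plain objects (CSV/XLSX parsing omitted). `profileColumns` and the rule that blank strings count as nulls are assumptions; a figure like the 23% of blank supplier fields in the report above is the kind of number it would surface.

```typescript
// Hypothetical sketch: per-column profile of already-parsed rows.
type Row = Record<string, string | null | undefined>;

interface ColumnProfile {
  column: string;
  rowCount: number;
  nullRate: number;    // share of rows where the value is missing or blank
  cardinality: number; // number of distinct non-null values
}

function profileColumns(rows: Row[]): ColumnProfile[] {
  // Collect every column name that appears in any row.
  const columns = new Set<string>();
  for (const row of rows) Object.keys(row).forEach((k) => columns.add(k));

  return Array.from(columns).map((column) => {
    let nulls = 0;
    const distinct = new Set<string>();
    for (const row of rows) {
      const v = row[column];
      // Assumption: missing keys, null, and whitespace-only cells all count as null.
      if (v == null || v.trim() === "") nulls++;
      else distinct.add(v);
    }
    return {
      column,
      rowCount: rows.length,
      nullRate: rows.length === 0 ? 0 : nulls / rows.length,
      cardinality: distinct.size,
    };
  });
}
```

Cardinality is what separates a usable key column (close to the row count) from a categorical one, which matters when deciding whether something like location deserves its own table.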

The Arc agent framework is available on GitHub. The full prompt templates for all five phases are in the repo if you want to adapt the pattern for your own client work.

If you’re building something similar — phase-gated agents, knowledge-base patterns, approval workflows — drop me a line.

For another take on agent architecture, see how I replaced traditional NLP pipelines with a Batch-and-Specialize LLM agent pattern — the same philosophy of audit-before-build, applied to unstructured text intelligence.