Mason Scurry · June 2026

How I built a 30-agent AI operating system for my consulting firm

I'm a 24-year-old who runs a consulting firm out of my apartment. Here's exactly what I built, how I built it, and what it actually cost me.

Why I did this

I run Scurry Consulting solo. I had three big problems: I was the only person doing the thinking, I was forgetting things I knew weeks ago, and I had no one to catch my blind spots.

I didn't want a pile of disconnected AI tools. I wanted something that ran like a staff. So I built one.

What I have now

30 AI agents

14 active workflows

13 divisions

119 total nodes in the system

The actual architecture

The system runs on a TypeScript runtime I built from scratch. Each agent is a plain markdown file that defines its role, its tools, and its rules. At runtime, a loader reads those files and injects them as system prompts into Claude API calls.

There's no magic. It's file reads, API calls, and JSON.
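Here is a minimal sketch of what a loader like that could look like. It assumes one markdown file per agent in a single directory (for example `agents/penny.md`). The directory layout and the names `loadAgents` and `buildRequest` are hypothetical. The request fields (`model`, `max_tokens`, `system`, `messages`) follow the body shape of Anthropic's Messages API. The model string is left as a parameter rather than guessing an ID.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { basename, join } from "node:path";

// Body shape of a Messages API request (only the fields used here).
interface MessagesRequest {
  model: string;
  max_tokens: number;
  system: string;
  messages: { role: "user" | "assistant"; content: string }[];
}

// Read every *.md file in `dir` into a map of agent id -> markdown definition.
// Assumption: one file per agent, named after the agent (e.g. "penny.md").
function loadAgents(dir: string): Map<string, string> {
  const agents = new Map<string, string>();
  for (const file of readdirSync(dir)) {
    if (!file.endsWith(".md")) continue;
    agents.set(basename(file, ".md"), readFileSync(join(dir, file), "utf8"));
  }
  return agents;
}

// Inject an agent's markdown as the system prompt; the task becomes the user turn.
function buildRequest(
  agentMarkdown: string,
  task: string,
  model: string,
  maxTokens = 1024,
): MessagesRequest {
  return {
    model,
    max_tokens: maxTokens,
    system: agentMarkdown,
    messages: [{ role: "user", content: task }],
  };
}
```

The returned object would be sent as the JSON body of a POST to the Messages API. The HTTP call and error handling are left out to keep the sketch small.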

Runtime: Node.js + TypeScript, running on my Mac via launchd

AI models: Claude API (Haiku for routing and classification, Sonnet for most work, Opus for hard reasoning)

Memory: 117,000-row personal database + vector embeddings (all-MiniLM-L6-v2, on-device)

Spine: events.jsonl, an append-only log that every agent action writes to

Scheduling: scheduler.ts + macOS launchd for timed jobs

Notifications: Slack (every agent can post to #improvements and #alerts)

Cost auditing: usage.jsonl; every API call logs its tokens there, and Penny reviews it daily
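The spine is the simplest piece to picture. Here is a minimal sketch. It assumes each line of events.jsonl is one JSON object holding a timestamp, an agent id, and an action name. The field names and the `appendEvent`/`readEvents` helpers are hypothetical.

```typescript
import { appendFileSync, existsSync, readFileSync } from "node:fs";

// One line in the (hypothetical) events.jsonl format.
interface SpineEvent {
  ts: string;     // ISO-8601 timestamp
  agent: string;  // e.g. "penny"
  action: string; // e.g. "flagged_waste"
  data?: Record<string, unknown>;
}

// Append-only: a new event is always written at the end; nothing is rewritten.
function appendEvent(
  path: string,
  agent: string,
  action: string,
  data?: Record<string, unknown>,
): SpineEvent {
  const event: SpineEvent = { ts: new Date().toISOString(), agent, action, data };
  appendFileSync(path, JSON.stringify(event) + "\n", "utf8");
  return event;
}

// Replay the log in order, optionally keeping only one agent's events.
function readEvents(path: string, agent?: string): SpineEvent[] {
  if (!existsSync(path)) return [];
  return readFileSync(path, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as SpineEvent)
    .filter((e) => agent === undefined || e.agent === agent);
}
```

Lines are only ever appended, so the file doubles as an audit trail. Replaying it in order reconstructs what each agent did and when.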

I can override any agent. Nothing sends, posts, or ships without my approval. The system proposes. I decide.
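One way to make "the system proposes, I decide" structural rather than a convention is a gate. Agents can create proposals, but every side effect has to pass through a check for a human decision. Here is a minimal sketch. The `ApprovalGate` class and its status names are hypothetical.

```typescript
// Hypothetical approval gate: agents can only propose; a human decides;
// only approved proposals ever reach the side effect.
type Status = "pending" | "approved" | "rejected" | "executed";

interface Proposal {
  id: number;
  agent: string;
  action: string;
  status: Status;
}

class ApprovalGate {
  private proposals = new Map<number, Proposal>();
  private nextId = 1;

  // Called by agents: records the intent, does nothing else.
  propose(agent: string, action: string): Proposal {
    const p: Proposal = { id: this.nextId++, agent, action, status: "pending" };
    this.proposals.set(p.id, p);
    return p;
  }

  // Called by the human: the only way a proposal leaves "pending".
  decide(id: number, approve: boolean): void {
    const p = this.proposals.get(id);
    if (!p) throw new Error(`unknown proposal ${id}`);
    if (p.status !== "pending") throw new Error(`proposal ${id} already decided`);
    p.status = approve ? "approved" : "rejected";
  }

  // Runs the side effect only for an approved proposal, and only once.
  execute(id: number, run: (p: Proposal) => void): boolean {
    const p = this.proposals.get(id);
    if (!p || p.status !== "approved") return false;
    run(p);
    p.status = "executed"; // one approval covers exactly one execution
    return true;
  }
}
```

The proposal is marked as executed after it runs, so a single approval can't be replayed to send the same message twice.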

The agents

I gave every agent a real name and a specific job. No generic "assistant." Each one owns exactly one domain and one set of tools. Where domains touch, the overlap is intentional: Frankie (operations) and Danny (engineering) argue about what to build next, and that tension is useful.

Agent	Role	What they actually do
Frankie Russo	COO · Ops	Manages build queue, coordinates agent work, flags what's stuck
Dahlia Okonkwo	CMO · Content	Runs the content calendar, briefs Jo and Mateo, reviews before publish
Marcus Bell	CRO · Revenue	Tracks pipeline, flags stale deals, drafts SOWs with Nadia's research
Ruth Abramowitz	CFO · Finance	Categorizes expenses, audits receipts, flags anomalies for my review
Priya Raghavan	COO · Delivery	Tracks active client work, surfaces delays, manages milestone cadence
Danny Kessler	CTO · Engineering	Reviews agent code, proposes architectural changes, runs the build runner autonomously
Wendy Cho	CIO · Intel	Synthesizes market intelligence, monitors competitive signals
Eleanor Voss	CSO · Strategy	Chairs the advisor council, surfaces strategic tradeoffs before I commit
Theo Mensah	CLO · Learning	Runs the self-improvement loop: reviews what worked, proposes system changes
Carmen Ruiz	Personal chief	The only agent with access to my personal database; keeps the personal wing separate
Jo Castellano	LinkedIn writer	Drafts posts from Dahlia's briefs using creativity-engine QD selection
Nell Brennan	Relationship keeper	Flags contacts gone cold past 90 days, drafts outreach for my approval
Penny Okafor	Token auditor	Reviews usage.jsonl daily, flags waste, proposes caching fixes
Aria Santos	Outbound writer	Pulls leads from Apollo, writes cold emails, Marcus reviews before I approve

The other 16 agents handle things like podcast scouting, press outreach, speaking pitch writing, video production, financial advisory, legal review, and brand. None of them do anything I haven't explicitly approved first.

The workflows that run constantly

A workflow is a named path through the system. I can trigger any of them, trace them on the map, or let them run on schedule.

Content engine: knowledge base → Dahlia briefs → Jo drafts → Buffer → LinkedIn

Cold outbound: Apollo pulls leads → Aria writes → Marcus reviews → I approve → send

Network tending: email + iMessage → Nell flags cold contacts → Nell drafts → I approve

Token audit: usage.jsonl → Penny audits → proposes caching fixes → Danny implements on a branch

Self-improvement loop: Simone identifies gaps → Theo proposes changes → Danny builds → Frankie ships

Scurryville nightly: at 2am, Sonnet simulates staff dynamics and files a morning newspaper

Build exhaust → income: Packager bundles builds → Gumroad listing drafted → I approve → live
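If each workflow is stored as a named, ordered list of node ids, tracing one on the map becomes a lookup. Here is a minimal sketch. It assumes lowercase ids and a literal `approval` step standing in for the human gate. The encoding and the helper names are hypothetical. The three paths come from the list above, with email and iMessage collapsed into one source node.

```typescript
// Hypothetical encoding: each workflow is a named, ordered path of node ids.
interface Workflow {
  name: string;
  steps: string[];
}

const workflows: Workflow[] = [
  { name: "Cold outbound", steps: ["apollo", "aria", "marcus", "approval", "send"] },
  { name: "Network tending", steps: ["email-imessage", "nell", "approval"] },
  { name: "Token audit", steps: ["usage.jsonl", "penny", "danny"] },
];

// Render a workflow the way it reads in the list above: "a → b → c".
function render(w: Workflow): string {
  return w.steps.join(" → ");
}

// Names of every workflow that passes through a node, i.e. what to
// highlight when that node is selected on the map.
function workflowsThrough(node: string, all: Workflow[]): string[] {
  return all.filter((w) => w.steps.includes(node)).map((w) => w.name);
}
```

Keeping workflows as plain data means the same list can drive the scheduler, the map's "trace" view, and a check that every outward-facing path contains an approval step.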

The things I got wrong first

I built agents with no names and no personalities. They were interchangeable. I couldn't tell which one had said what or why. Every agent now has a specific identity: a name, a job title, and a clear domain boundary. When Frankie and Danny disagree, I know what they each care about.

I gave agents too much autonomy too fast. One sent a message I hadn't seen. I pulled that back immediately. Now the rule is hard: the system proposes, I decide. The approval gate is not optional.

I didn't track costs. I had no idea what each agent was spending until I built Penny. Now every API call logs tokens to usage.jsonl and she reviews it every morning. I've cut my API spend in half since then.
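A daily review of that log can start as a per-agent rollup. Here is a minimal sketch. It assumes each usage.jsonl line carries `agent`, `model`, `input_tokens`, and `output_tokens`, which are hypothetical field names. It also treats "waste" as anything over a fixed per-agent token budget. That is an assumption for illustration, not Penny's actual rule.

```typescript
// One line of the (hypothetical) usage.jsonl format.
interface UsageLine {
  agent: string;
  model: string;
  input_tokens: number;
  output_tokens: number;
}

interface AgentTotals {
  agent: string;
  calls: number;
  input: number;
  output: number;
}

// Parse raw JSONL text and total calls and tokens per agent.
function totalsByAgent(jsonl: string): AgentTotals[] {
  const totals = new Map<string, AgentTotals>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const u = JSON.parse(line) as UsageLine;
    const t = totals.get(u.agent) ?? { agent: u.agent, calls: 0, input: 0, output: 0 };
    t.calls += 1;
    t.input += u.input_tokens;
    t.output += u.output_tokens;
    totals.set(u.agent, t);
  }
  // Biggest spenders first, so the report leads with what matters.
  return Array.from(totals.values()).sort(
    (a, b) => b.input + b.output - (a.input + a.output),
  );
}

// Agents whose combined tokens exceed the (assumed) daily budget.
function flagOverBudget(totals: AgentTotals[], budget: number): string[] {
  return totals.filter((t) => t.input + t.output > budget).map((t) => t.agent);
}
```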

I wrote agents as features, not as roles. The first version had a "content agent" that did writing, research, scheduling, and distribution. It was incoherent. Breaking that into Dahlia (strategy), Jo (writing), Mateo (visuals), Zoe (video), and Otis (podcast scouting) made each one dramatically better.

The turning point was realizing I wasn't building tools. I was building a staff.

What it cost to build this

The runtime took about three months of evenings and weekends. I wrote it in TypeScript because I knew it better than Python. Claude Code did about 60% of the implementation work. I directed, reviewed, and tested.

API costs run under $40/month. Penny's audit keeps that flat. The models are tiered: Haiku handles anything that's classification or routing, Sonnet handles most agent work, Opus gets called only for the hard reasoning tasks. Output tokens cost five times what input tokens cost, so I keep agent outputs tight.
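The tiering and the 5:1 output-to-input price ratio can be written down directly. Here is a minimal sketch. The task categories are hypothetical. Cost is expressed in units of one input token's price, because the only pricing fact taken from the text is the ratio, not any per-model dollar figure.

```typescript
// Hypothetical task categories mapped onto the three tiers described above.
type Tier = "haiku" | "sonnet" | "opus";
type TaskKind = "classification" | "routing" | "drafting" | "analysis" | "hard_reasoning";

function pickTier(kind: TaskKind): Tier {
  switch (kind) {
    case "classification":
    case "routing":
      return "haiku"; // cheap, fast tier for sorting and dispatch
    case "hard_reasoning":
      return "opus"; // reserved for the hard cases
    default:
      return "sonnet"; // most agent work
  }
}

// Output tokens are priced at 5x input tokens (per the text); real per-model
// prices differ and would come from Anthropic's published pricing.
const OUTPUT_TO_INPUT_RATIO = 5;

// Relative cost of one call, in units of "one input token's price".
function relativeCost(inputTokens: number, outputTokens: number): number {
  return inputTokens + OUTPUT_TO_INPUT_RATIO * outputTokens;
}
```

At 2,000 input tokens, trimming a reply from 500 to 250 output tokens drops `relativeCost` from 4,500 to 3,250 units, roughly 28% cheaper. The saving is that large even though the output is the smaller of the two counts, which is why tight outputs matter more than they look.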

The system map you may have seen on LinkedIn took one afternoon to build. It's a single self-contained HTML file with about 120 nodes. I can update it in minutes when the system changes.

What I'd tell someone starting from scratch

Build one agent. Give it a real name, a specific job, and exactly one tool. Make it draft something. Review it yourself. Ship it only after you've seen ten outputs and understand where it fails.

Do not start with the org chart. The org chart comes after you've run a few things and know what you actually need. I had thirty agents in my head before I knew what one agent should do. That was backwards.

Write a privacy wall before you give agents access to anything personal. My personal database has 117,000 rows of my own life. Carmen is the only agent with direct read access to it, and Walter is the only agent she is allowed to pass any of it to. That boundary is enforced in code, not just stated as policy.
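A privacy wall counts as "in code" when the check lives in the only path to the data, not in an agent's prompt. Here is a minimal sketch. It assumes agent ids `carmen` and `walter` and uses an in-memory stand-in for the personal database. `PersonalStore`, `handOff`, and the record shape are hypothetical.

```typescript
// Hypothetical privacy wall: only "carmen" may read personal records,
// and she may hand them only to "walter".
const PERSONAL_READER = "carmen";
const ALLOWED_HANDOFFS: ReadonlySet<string> = new Set(["walter"]);

class PrivacyError extends Error {}

interface PersonalRecord {
  id: number;
  text: string;
}

class PersonalStore {
  constructor(private readonly rows: PersonalRecord[]) {}

  // The read path itself enforces the wall; no prompt can bypass it.
  read(agent: string, query: string): PersonalRecord[] {
    if (agent !== PERSONAL_READER) {
      throw new PrivacyError(`${agent} may not read the personal database`);
    }
    const q = query.toLowerCase();
    return this.rows.filter((r) => r.text.toLowerCase().includes(q));
  }
}

// The only sanctioned way personal data leaves Carmen.
function handOff(from: string, to: string, records: PersonalRecord[]): PersonalRecord[] {
  if (from !== PERSONAL_READER || !ALLOWED_HANDOFFS.has(to)) {
    throw new PrivacyError(`hand-off ${from} -> ${to} is not allowed`);
  }
  return records;
}
```

The check sits inside the store rather than in an agent's instructions, so a misbehaving prompt can't talk its way past it. A stronger version could also restrict file permissions so that only the process running Carmen can open the database.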

Track the money from day one. Anthropic's API is cheap at hobby scale and expensive at production scale. The math changes faster than you expect. Build usage logging before you build the third agent, not after the tenth.
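The cheapest way to get that logging in early is to wrap every API call once. Here is a minimal sketch. It assumes the response exposes `usage.input_tokens` and `usage.output_tokens`, as Messages API responses do. `toUsageEntry`, `withUsageLog`, and the record layout are hypothetical.

```typescript
import { appendFileSync } from "node:fs";

// The slice of a Messages API response that matters for cost tracking.
interface ApiResponse {
  model: string;
  usage: { input_tokens: number; output_tokens: number };
}

// One record in the (hypothetical) usage.jsonl layout.
interface UsageEntry {
  ts: string;
  agent: string;
  model: string;
  input_tokens: number;
  output_tokens: number;
}

// Build the log record for a call made on behalf of `agent`.
function toUsageEntry(agent: string, res: ApiResponse, now: Date = new Date()): UsageEntry {
  return {
    ts: now.toISOString(),
    agent,
    model: res.model,
    input_tokens: res.usage.input_tokens,
    output_tokens: res.usage.output_tokens,
  };
}

// Wrap any API call so its usage is appended to the log before the result
// is handed back; if every call goes through this, nothing escapes logging.
async function withUsageLog<T extends ApiResponse>(
  path: string,
  agent: string,
  call: () => Promise<T>,
): Promise<T> {
  const res = await call();
  appendFileSync(path, JSON.stringify(toUsageEntry(agent, res)) + "\n", "utf8");
  return res;
}
```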

Your approval gate is not a bottleneck. It's the product. The system is useful because I trust it. I trust it because nothing ships without me seeing it first. Don't optimize that away.

The map

The system map I shared on LinkedIn is an interactive radial diagram of the full system — 119 nodes, 77 connections, 14 workflows. Each shape tells you what a thing is: circles are agents, cylinders are databases, hexagons are infrastructure, dashed pills are external services. Color tells you which division it belongs to.

You can click any node, trace any workflow, filter by type, search by name, or follow connections hop by hop. It's the most useful internal tool I've built. I open it when something breaks and I need to understand what's upstream.
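Following connections hop by hop to find what's upstream is a breadth-first search over reversed edges. Here is a minimal sketch. It assumes the map's data can be expressed as typed nodes plus directed `from → to` edges. The types, the helper names, and the sample node ids are hypothetical.

```typescript
// Node kinds mirror the map's shapes: circle, cylinder, hexagon, dashed pill.
type NodeKind = "agent" | "database" | "infrastructure" | "external";

interface MapNode {
  id: string;
  kind: NodeKind;
  division: string; // drives the node's color on the map
}

interface Edge {
  from: string;
  to: string;
}

// "Filter by type" on the map.
function filterByKind(nodes: MapNode[], kind: NodeKind): MapNode[] {
  return nodes.filter((n) => n.kind === kind);
}

// Everything that can reach `target`, with its distance in hops.
// BFS over reversed edges; cycles are safe because visited nodes are skipped.
function upstream(target: string, edges: Edge[]): Map<string, number> {
  const incoming = new Map<string, string[]>();
  for (const { from, to } of edges) {
    incoming.set(to, [...(incoming.get(to) ?? []), from]);
  }
  const hops = new Map<string, number>([[target, 0]]);
  const queue: string[] = [target];
  for (let i = 0; i < queue.length; i++) {
    const node = queue[i];
    for (const prev of incoming.get(node) ?? []) {
      if (!hops.has(prev)) {
        hops.set(prev, hops.get(node)! + 1);
        queue.push(prev);
      }
    }
  }
  hops.delete(target); // report only the upstream nodes, not the target itself
  return hops;
}
```

The hop counts mirror clicking outward one ring at a time. Everything at distance 1 feeds the broken node directly, everything at distance 2 feeds those, and so on.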

I built it because I couldn't hold the system in my head anymore. That's a good sign — it means the system is big enough to need a map.