Side Project · 2026

Paperclip + Hermes — a chain of command of agents that runs three websites.

A multi-agent operating layer that runs LawnCare.Center, TechMeetups.io, and BuildFeed.tech from one inbox.

Paperclip dashboard — live agent cards across the top, summary tiles for agents enabled, tasks in progress, monthly spend, and pending approvals; charts for run activity, issues by priority and status, and success rate; recent activity and recent tasks lists below.

Paperclip is the control plane: issues, agents, heartbeats, approvals. Hermes is the local harness: secrets, scripts, cron, AgentMail.

Scope at a glance

One CEO. Six directors. Three websites. Every assignment is an issue: parent goal, chain of command, audit trail. Boss sets strategy. WebOps sequences. Specialists ship. Email in, deliverable out.

By the numbers

Agents: 7
Specialist roles: 6
Issues shipped: 166
Runs / 14d: 4,360
Success rate: 98%
Routines: 6
Web properties: 3
Cadence: 24/7

The chain of command

Boss is the CEO: 90-day goals, monetization, brand, hires — anything irreversible. Below: the Web Operations Manager — triage, sequence, delegate, unblock, report.

Agent org chart — Boss (Chief of Staff for Agents) at the top, branching down to Analytics Lead and Web Operations Manager, then to Automation & DevOps, Content & Editorial, Social & Distribution Lead, and SEO Agent.
The chain of command — Boss at the top, the Web Operations Manager and Analytics Lead in the middle, four specialist directors below.

Five directors cover SEO, Content, Social, DevOps, and Analytics. Four sit below the Web Operations Manager; the Analytics Lead reports to Boss. Each owns a discipline across all three sites. Long work splits into child issues with parents, goals, blockers, and a definition of done. Agents don’t poll. Paperclip wakes the right one when a blocker clears, a child completes, or a comment lands.
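A minimal sketch of how the issue tree and event-driven wake-ups might be modeled, assuming a simple in-memory structure. The names `Issue` and `wake_targets` and all field names are hypothetical; none of this is Paperclip's actual schema. It covers two of the three triggers named above (a blocker clears, a child completes).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Issue:
    """Hypothetical issue record; field names are illustrative only."""
    id: str
    assignee: str
    goal: str
    definition_of_done: str
    parent: Optional["Issue"] = None
    blocked_by: set = field(default_factory=set)  # ids of blocking issues
    done: bool = False


def wake_targets(issues: dict, completed_id: str) -> list:
    """Mark one issue done and return the agents that should be woken."""
    completed = issues[completed_id]
    completed.done = True
    woken = []

    # A child completed: wake the owner of the parent issue.
    if completed.parent is not None:
        woken.append(completed.parent.assignee)

    # A blocker cleared: wake owners whose last remaining blocker was this issue.
    for issue in issues.values():
        if completed_id in issue.blocked_by:
            issue.blocked_by.discard(completed_id)
            if not issue.blocked_by and issue.assignee not in woken:
                woken.append(issue.assignee)
    return woken


parent = Issue("P1", "webops", "Launch city pages", "All children closed")
child = Issue("C1", "seo", "Confirm indexing", "Pages indexed", parent=parent)
followup = Issue("C2", "content", "Write intros", "Copy merged",
                 parent=parent, blocked_by={"C1"})
issues = {i.id: i for i in (parent, child, followup)}
print(wake_targets(issues, "C1"))  # ['webops', 'content']
```

Nothing here loops or sleeps: the agent list is computed only at the moment an issue changes state, which is the point of waking agents instead of having them poll.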

Analytics Lead agent configuration — name, title (Analytics & Insights Lead), reports to Boss, and a capabilities paragraph covering measurement, weekly and monthly scorecards, anomaly investigation, and decision-ready summaries for CEO and WebOps.
Automation & DevOps agent configuration — name, title (Site Reliability / Operations Lead), reports to Web Operations Manager, and a capabilities paragraph covering sitemap and canonical hygiene, cron jobs, broken-link sweeps, schema validation, uptime monitoring, and deploy safety.
Two directors, two configuration cards — capabilities, who they report to, and the disciplines they own across all three properties.

How an agent actually runs

Agents wake on heartbeats — every five minutes, or on any event: assignment, comment, unblock, approval, email. Pick up the issue. Do the work. Close with a status and a next action. Exit. The next heartbeat starts fresh.
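The run lifecycle above can be sketched as a single stateless function. This is an illustration under stated assumptions: `heartbeat`, `should_wake`, `RunResult`, the queue shape, and the status strings are all hypothetical. Only the five-minute interval and the five event names come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional

HEARTBEAT_SECONDS = 5 * 60  # "every five minutes"
WAKE_EVENTS = frozenset({"assignment", "comment", "unblock", "approval", "email"})


@dataclass
class RunResult:
    status: str       # illustrative values: "done", "blocked", "needs_approval"
    next_action: str  # what should happen next, in plain words


def should_wake(seconds_since_last_run: float, event: Optional[str] = None) -> bool:
    """Wake on a known event, or when the heartbeat interval has elapsed."""
    return event in WAKE_EVENTS or seconds_since_last_run >= HEARTBEAT_SECONDS


def heartbeat(agent: str, queue: dict,
              work: Callable[[str], RunResult]) -> Optional[tuple]:
    """One run: pick up an issue, do the work, close it out, exit."""
    pending = queue.get(agent, [])
    if not pending:
        return None              # nothing assigned; exit immediately
    issue_id = pending.pop(0)    # pick up the oldest issue
    result = work(issue_id)      # do the work
    return issue_id, result      # close with a status and a next action


queue = {"seo": ["ISSUE-1"]}
print(heartbeat("seo", queue, lambda issue_id: RunResult("done", "report to WebOps")))
```

The function keeps no state between calls, so each heartbeat starts fresh; anything that must persist has to live in the issue itself.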

Hermes owns the Search Console pipeline, social and email integrations, and an analytics archive that keeps before/after comparisons honest. AgentMail is the human relay. Every director has an inbox. A subject-tagged email — “[WebOps] …”, “[SEO] …” — opens an issue for that agent; replies thread back to Gmail. This case study was requested that way.

Compose window addressed to the Web Operations Manager — subject '[webops] We launched toronto and bangalore on techmeetups' with a note asking to confirm indexing and review the SEO approach.
An email to the Web Operations Manager — subject-tagged, addressed like a director. The reply threads back to Gmail.
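A minimal sketch of subject-tag routing, assuming the tag is the first bracketed token in the subject line. The `AGENT_BY_TAG` table and `route_email` are hypothetical, and case-insensitive matching is an assumption made here so that “[WebOps]” and “[webops]” reach the same agent. This is not AgentMail's API.

```python
import re
from typing import Optional

# Hypothetical tag-to-agent table; only the WebOps and SEO tags appear in the text.
AGENT_BY_TAG = {
    "webops": "Web Operations Manager",
    "seo": "SEO Agent",
}

# Leading "[tag]" followed by the rest of the subject line.
TAG_PATTERN = re.compile(r"^\s*\[(?P<tag>[^\]]+)\]\s*(?P<title>.*)$")


def route_email(subject: str) -> Optional[tuple]:
    """Return (agent, issue title) for a tagged subject, or None if untagged."""
    match = TAG_PATTERN.match(subject)
    if match is None:
        return None
    agent = AGENT_BY_TAG.get(match.group("tag").strip().lower())
    if agent is None:
        return None  # unknown tag: leave the message for a human
    return agent, match.group("title").strip()


print(route_email("[webops] We launched toronto and bangalore on techmeetups"))
```

Returning `None` for untagged or unknown subjects keeps stray mail from opening issues by accident.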

Model Triage

Not every task deserves a frontier model. Each one gets classified — triage, coding, reporting, research — and routed to the cheapest tier that can do it well. Local Qwen takes the free, private, fast work. Mid-tier models — Kimi K2.6, DeepSeek V4 — carry most of the load. Codex owns sandboxed code review. Claude is held back for the final, high-stakes pass. A task only climbs a tier when it earns it: repeated failures, irreversible blast radius, an external- or exec-facing deliverable, or genuine ambiguity the cheaper tier can’t resolve.

Model-triage flowchart — an incoming task is classified into triage, coding, reporting, or research, then routed through tiers from T0 (local Qwen) up to T3 (Claude), with a legend listing the four triggers that justify escalating a tier.
Model triage: every task routed to the cheapest tier that can do it well, escalating only when it earns it.
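The routing rule can be sketched as a small function. Several parts are assumptions: the starting tier for each task class (`BASE_TIER`), the failure threshold, and the choice to climb exactly one tier when any trigger fires. The text names only T0 (local Qwen) and T3 (Claude), so the sketch returns a tier number and does not assign models to T1 or T2.

```python
from dataclasses import dataclass

MAX_TIER = 3  # T0 is local Qwen, T3 is Claude; T1 and T2 are not spelled out

# Hypothetical starting tier per task class; the real mapping is not given.
BASE_TIER = {"triage": 0, "reporting": 1, "research": 1, "coding": 2}


@dataclass
class Task:
    kind: str                      # "triage", "coding", "reporting", or "research"
    failures: int = 0              # attempts that already failed at the current tier
    irreversible: bool = False     # irreversible blast radius
    external_facing: bool = False  # external- or exec-facing deliverable
    ambiguous: bool = False        # ambiguity the cheaper tier can't resolve


def route(task: Task, failure_threshold: int = 2) -> int:
    """Return the cheapest tier for the task, one tier higher if it has earned it."""
    earned = (
        task.failures >= failure_threshold
        or task.irreversible
        or task.external_facing
        or task.ambiguous
    )
    tier = BASE_TIER[task.kind] + (1 if earned else 0)
    return min(tier, MAX_TIER)


print(route(Task("triage")))                     # 0
print(route(Task("reporting", failures=2)))      # 2
print(route(Task("coding", irreversible=True)))  # 3
```

Defaulting to the base tier and requiring an explicit trigger to climb keeps the expensive models out of the routine path.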

Always being tuned

Most of the work is refinement. A retry policy that gives up faster. A prompt that lost a beat after a model upgrade. A budget cap generous in March, cramped by April. None of it glamorous. All of it compounds.

Tuning sticks. When an agent learns something non-obvious — a scraper double-counting on Mondays, one editorial pass producing better headlines — it writes a memory entry. The next heartbeat loads it. Across models. Across agents.
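A minimal sketch of a shared memory store, assuming one JSON object per line in a local file. The file name, the record fields, and both function names are hypothetical; the actual memory format is not described in the text.

```python
import json
import tempfile
from pathlib import Path


def write_memory(store: Path, agent: str, lesson: str) -> None:
    """Append one lesson to the shared store as a JSON line."""
    with store.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"agent": agent, "lesson": lesson}) + "\n")


def load_memories(store: Path) -> list:
    """Load every entry, regardless of which agent or model wrote it."""
    if not store.exists():
        return []
    with store.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "memory.jsonl"
    write_memory(store, "analytics", "Scraper double-counts on Mondays")
    write_memory(store, "content", "One editorial pass produces better headlines")
    for entry in load_memories(store):
        print(entry["agent"], "->", entry["lesson"])
```

The loader does not filter by agent, which is what lets a lesson learned by one director reach the others on their next heartbeat.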

Steady state: 4,360 runs over fourteen days at a 98% success rate. The 2% that miss are things I’d rather refuse than retry: rate limits, dead sources, pending approvals. The numbers move every week. The system is built to be moved. The tuning is the work.