shaneohanlon.dev / learning / ai-agents

Managing Teams of AI Agents

A 10-week self-directed plan covering agent fundamentals, single-agent mastery, multi-agent orchestration, and production reliability. Built for evenings and weekends using personal tools.

Claude Code · LiteLLM · OpenRouter · Multi-agent · 10 weeks · Interactive

Learning Journey

Foundations · Wks 1–2 · Reading
Single Agent · Wks 3–4 · Hands-on
Orchestration · Wks 5–7 · Multi-agent
Production · Wks 8–10 · Reliability

Phase 01 · Weeks 1–2

How Agents Actually Work

Reading-first. No tooling yet.

Understand the ReAct loop, orchestrator vs subagent, context accumulation, and how to evaluate non-deterministic systems. These building blocks underpin everything that follows.

Books to Read

AI Engineering — Chip Huyen ↗ Ch 1–3 (architecture, tool use), Ch 7–8 (ReAct, agent patterns), Ch 10 (evals). Read Ch 10 even if you skip others.
Prompt Engineering for Generative AI ↗ ReAct patterns, chain-of-thought, why agents fail. Tool use and multi-step behaviour sections only.

Concepts to Nail

ReAct loop — Reason → Act → Observe → repeat
Orchestrator vs subagent — planner vs executor
Context accumulation — why long sessions degrade
Evals — testing non-deterministic output
Tool use — how agents interact with systems
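To make the Reason → Act → Observe cycle concrete, here is a minimal sketch of the loop. Everything in it is an assumption for illustration: `scripted_model` stands in for a real LLM call, `TOOLS` is a hypothetical one-entry tool registry, and `max_turns` is a simple guard rather than any particular product's setting.

```python
from typing import Callable

# Hypothetical tool registry: plain functions keyed by name.
TOOLS: dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
}

def scripted_model(history: list[str]) -> dict:
    """Stand-in for an LLM call: picks the next step from the history so far."""
    observations = [line for line in history if line.startswith("Observation:")]
    if not observations:
        return {"thought": "I need the word count.",
                "action": "word_count", "input": "agents reason then act"}
    count = observations[-1].split(": ")[1]
    return {"thought": "I have what I need.",
            "final": f"The text has {count} words."}

def react(task: str, model=scripted_model, max_turns: int = 5) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_turns):
        step = model(history)                               # Reason
        history.append(f"Thought: {step['thought']}")
        if "final" in step:
            return step["final"]
        observation = TOOLS[step["action"]](step["input"])  # Act
        history.append(f"Observation: {observation}")       # Observe
    raise RuntimeError("max_turns reached without a final answer")

answer = react("count the words")
print(answer)
```

Note that `history` only ever grows: each turn appends a thought and an observation, which is the context accumulation listed above.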

Hands-On Validation

Run Claude Code against a personal project. Don't intervene — just observe how it reasons and where it gets stuck.

# Watch the full agentic loop without touching the keyboard
claude "read this repo, find all functions with no docstrings, add them"

Ran Claude Code on a personal project end to end without intervening
Observed the full tool call sequence in the output
Noted at least two places where the agent made a wrong assumption

Phase 02 · Weeks 3–4

Single Agent Mastery

Hands-on with real projects. No multi-agent yet.

Run a real task from one of your own projects end to end with a single agent. Review every diff before accepting. Build intuition for where agents succeed and fail before adding complexity.

Read First

Claude Code — Official Docs ↗ Read in full. Custom instructions, permissions, hooks, Task tool.
Code Agent Orchestra — Addy Osmani ↗ Conductor vs orchestrator mental model. Short but essential framing.

Project Checklist

Set up LiteLLM proxy pointing at OpenRouter
Pointed Claude Code at localhost:4000
Completed a real task end to end with a single agent
Reviewed every diff before accepting
Wrote CLAUDE.md files for 2 personal repos

The Most Important Skill: CLAUDE.md

Your agent's standing brief. Every project needs one. The quality of your CLAUDE.md largely determines the quality of your agent's output.

# CLAUDE.md — place in repo root

## What this is
Brief description, language, purpose.

## Standards
- Type hints on all functions
- Tests in /tests mirroring /src
- No print() — use logging module

## You can
- Read and edit /src and /tests
- Run pytest, ruff, mypy
- Commit to feature branches

## You cannot
- Push to main
- Delete files without confirming
- Modify dependencies without asking

## When unsure
Ask. Don't guess on architecture.

Phase 03 · Weeks 5–7

Multi-Agent Patterns

Orchestrators, subagents, git worktrees.

Run 3+ parallel agents on your own repo without conflicts. Understand when to reach for each tier. Complete both hands-on projects before moving to Phase 4.

Read

Oh My Claude Code ↗ Read in full. Teams setup, parallel orchestration, swarm patterns.
Claude Code Multi-Agent — Shipyard ↗ Git worktree patterns and parallel agent coordination.
Dive into Claude Code — VILA Lab ↗ Systematic reference. Skim for orchestration patterns.

Hands-On Projects

Set up git worktrees for parallel agent isolation
Ran 3 parallel agents on same repo without merge conflicts
Built orchestrator → subagent pipeline using Task tool
Parallel PR review — 3 subagents from different angles
Parallel codebase audit across worktrees

Orchestrator Pattern

Orchestrator (smart model): plans & delegates

Audit: existing code

Implement: changes

Test: coverage

Docs: update

Alias	Role	Rationale
smart	Orchestrator	Needs strong reasoning to plan and delegate
default	Subagents	Execute well-scoped tasks, cost-efficient
free	Exploratory	Throwaway tasks, zero API spend
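The split between a planning orchestrator and cheaper executing subagents can be sketched as a simple fan-out. This is a toy under stated assumptions: `plan` and `run_subagent` are hypothetical stubs where real model calls would go, and running the four stages in parallel assumes they are independent, which is not always true in practice (tests usually depend on the implementation).

```python
from concurrent.futures import ThreadPoolExecutor

# Alias per role, taken from the table above.
MODEL_FOR_ROLE = {"orchestrator": "smart", "subagent": "default", "exploratory": "free"}

def plan(goal: str) -> list[str]:
    """Hypothetical orchestrator step: split a goal into well-scoped subtasks."""
    return [f"{stage}: {goal}" for stage in ("audit", "implement", "test", "docs")]

def run_subagent(subtask: str) -> dict:
    """Stub for a real agent call; records which alias would handle the subtask."""
    return {"subtask": subtask, "model": MODEL_FOR_ROLE["subagent"], "status": "done"}

def orchestrate(goal: str) -> list[dict]:
    subtasks = plan(goal)  # in a real run, a call to the "smart" alias
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        return list(pool.map(run_subagent, subtasks))  # map preserves input order

results = orchestrate("add retry logic")
for result in results:
    print(result["model"], "|", result["subtask"])
```

The design choice worth noticing is that only `plan` needs the expensive model; every subtask is scoped tightly enough for the cheaper alias.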

The Three Tiers — Know Which to Reach For

Tier 01 — Start Here

Subagents in Session

Single feature, you're present, fast iteration

Claude Code Task tool

Tier 02 — When Tier 1 Hits Ceiling

Local Parallel

3–10 agents, you review diffs after the fact

Claude Squad + worktrees

Tier 03 — Advanced

Autonomous

Overnight runs, large backlogs, no human in loop

Claude Code Web

Git Worktrees — The Enabling Primitive

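Multiple agents editing the same checkout will overwrite each other's changes. Worktrees give each agent its own working copy on its own branch. Merge conflicts are still possible if two branches touch the same files, so scope each agent's task to a separate area of the code.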

# Create isolated worktrees per agent (run from the main repo root)
git worktree add ../branch-tests -b feature/add-test-coverage
git worktree add ../branch-types -b feature/add-type-hints
git worktree add ../branch-docs  -b feature/update-docs

# Run each agent in its own terminal so the sessions proceed in parallel.
# The subshell parentheses keep your current directory at the repo root.
(cd ../branch-tests && claude "add test coverage for module_x.py")
(cd ../branch-types && claude "add type hints to all functions in /src")
(cd ../branch-docs  && claude "rewrite README with usage examples")

# Clean up after merging, then confirm only the main worktree remains
git worktree remove ../branch-tests
git worktree remove ../branch-types
git worktree remove ../branch-docs
git worktree list

Phase 04 · Weeks 8–10

Reliability & Cost Control

Evals, observability, failure modes.

Build an eval harness for something your agents do repeatedly. Wire spend logging. Run an autonomous overnight task with no human in the loop. Revisit the AI Engineering evals chapter now that you have context.

Read

LiteLLM Docs — Callbacks & Budget ↗ Callbacks, budget management, logging. Understand your proxy fully.
AI Engineering Ch 10 — Evals (revisit) ↗ Lands differently once you've run real agents. Read it again.

Hands-On Projects

Wired LiteLLM spend logging to terminal
Wrote 10 deterministic test cases for a repeatable task
Ran the eval harness and tracked pass rate across a model swap
Ran autonomous health check with no human in loop
Set a hard budget cap in LiteLLM and verified it
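One possible shape for the eval harness in this checklist is sketched below. It is not a prescribed design: `make_flaky_agent` is a hypothetical stand-in for a real model (it answers correctly except at a configurable error rate), and the two cases are placeholders for your own deterministic checks.

```python
import random
from typing import Callable

# Each case pairs an input with a deterministic check on the output.
CASES: list[tuple[str, Callable[[str], bool]]] = [
    ("2+2", lambda out: out.strip() == "4"),
    ("capital of France", lambda out: "paris" in out.lower()),
]

def make_flaky_agent(error_rate: float, seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for a model: wrong at a fixed rate, right otherwise."""
    rng = random.Random(seed)
    answers = {"2+2": "4", "capital of France": "Paris"}
    def agent(prompt: str) -> str:
        return "not sure" if rng.random() < error_rate else answers[prompt]
    return agent

def pass_rate(agent, cases=CASES, runs_per_case: int = 20) -> float:
    """Run every case several times; one run says little about a non-deterministic system."""
    results = [check(agent(prompt)) for prompt, check in cases for _ in range(runs_per_case)]
    return sum(results) / len(results)

# Compare two "models" on the same cases, as you would across a model swap.
baseline = pass_rate(make_flaky_agent(error_rate=0.1, seed=1))
candidate = pass_rate(make_flaky_agent(error_rate=0.4, seed=1))
print(f"baseline {baseline:.0%} vs candidate {candidate:.0%}")
```

The checks are deterministic even though the agent is not, which is what lets the pass rate be tracked as a single number from one model to the next.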

Failure Modes to Design Around

Context Overflow

Long sessions accumulate history and degrade. Start fresh sessions for new tasks — don't continue indefinitely.
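A rough back-of-the-envelope model shows why fresh sessions help. It assumes every turn adds a fixed number of tokens and that the full history is resent on each turn; it ignores prompt caching and compaction, so treat the numbers as illustrative only.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens sent when every turn resends the full history so far."""
    return sum(turn * tokens_per_turn for turn in range(1, turns + 1))

# Same 40 turns of work: one long session vs four fresh 10-turn sessions.
one_long_session = cumulative_input_tokens(turns=40, tokens_per_turn=500)
four_fresh_sessions = 4 * cumulative_input_tokens(turns=10, tokens_per_turn=500)

print(one_long_session)     # 410000
print(four_fresh_sessions)  # 110000
```

Under these assumptions the total grows roughly with the square of the session length, so splitting work into shorter sessions cuts both cost and the amount of stale history the model has to read.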

Tool Call Loops

An agent can retry a failing tool indefinitely. Cap the number of turns (for example with Claude Code's --max-turns flag in non-interactive runs) to hard-stop runaway loops.
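If you drive agents from your own script, the same idea can be sketched as a small guard object. `LoopGuard` and its thresholds are hypothetical names for illustration, not part of Claude Code.

```python
from collections import Counter

class LoopGuard:
    """Stops a run when the turn budget is spent or the same call keeps failing."""

    def __init__(self, max_turns: int = 20, max_repeat_failures: int = 3):
        self.max_turns = max_turns
        self.max_repeat_failures = max_repeat_failures
        self.turns = 0
        self.failures: Counter = Counter()

    def record(self, tool: str, args: str, ok: bool) -> None:
        """Call once per tool invocation; raises when the run should stop."""
        self.turns += 1
        if not ok:
            self.failures[(tool, args)] += 1
        if self.failures[(tool, args)] >= self.max_repeat_failures:
            raise RuntimeError(f"stopped: {tool}({args}) failed {self.max_repeat_failures} times")
        if self.turns >= self.max_turns:
            raise RuntimeError(f"stopped: reached {self.max_turns} turns")

guard = LoopGuard(max_turns=10)
reason = ""
try:
    for _ in range(10):
        guard.record("run_tests", "tests/", ok=False)  # simulated failing tool call
except RuntimeError as stop:
    reason = str(stop)
print(reason)
```

Counting identical failing calls catches a stuck agent sooner than a turn limit alone, since a loop of three retries is already a strong signal.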

Silent Hallucination

Agent claims done but output is wrong. Always verify with tests or a linter — never trust the summary alone.
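A sketch of "verify, don't trust the summary": run checks as subprocesses and accept the task only if every exit code is zero. The two commands here are trivial Python stand-ins so the example runs anywhere; in a real repo they might be `["pytest", "-q"]` and `["ruff", "check", "."]`.

```python
import subprocess
import sys

def verify(checks: dict[str, list[str]]) -> dict[str, bool]:
    """Run each check command and report pass/fail by exit code."""
    results = {}
    for name, command in checks.items():
        completed = subprocess.run(command, capture_output=True, text=True)
        results[name] = completed.returncode == 0
    return results

# Stand-in commands (assumptions): one that passes and one that fails.
checks = {
    "passing_check": [sys.executable, "-c", "assert 1 + 1 == 2"],
    "failing_check": [sys.executable, "-c", "assert 1 + 1 == 3"],
}
report = verify(checks)
task_accepted = all(report.values())
print(report, "accepted:", task_accepted)
```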

Cost Runaway

One poorly scoped task burns your monthly budget. The hard cap in LiteLLM catches this — check spend weekly.
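The logic of a hard cap is simple enough to sketch in a few lines. This is a client-side illustration only, with hypothetical class names; it is not how LiteLLM implements its budget limits, which are configured in the proxy.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the cap."""

class BudgetGuard:
    """Tracks spend in USD and refuses any call that would exceed the cap."""

    def __init__(self, cap_usd: float):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        if self.spent_usd + cost_usd > self.cap_usd:
            raise BudgetExceeded(
                f"cap ${self.cap_usd:.2f} would be exceeded "
                f"(spent ${self.spent_usd:.2f}, next call ${cost_usd:.2f})"
            )
        self.spent_usd += cost_usd

guard = BudgetGuard(cap_usd=1.00)
blocked = False
try:
    for call_cost in (0.40, 0.40, 0.40):  # made-up per-call costs
        guard.charge(call_cost)
except BudgetExceeded as exc:
    blocked = True
    print(exc)
```

The check happens before the spend is recorded, so the cap is never overshot; a cap that only alerts after the fact would not stop a runaway task.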

Prompt Injection

Agent reads adversarial instructions from files or web content. Scope tool permissions tightly per agent role.

Scope Creep

Agent modifies files outside its intended scope. The CLAUDE.md "You cannot" section must be explicit, not implied.
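Explicit instructions can be backed by a mechanical check. The sketch below flags changed paths outside an allowlist; the directory names mirror the CLAUDE.md example ("Read and edit /src and /tests"), the file list is made up, and in practice the input could come from `git diff --name-only`.

```python
from pathlib import PurePosixPath

ALLOWED_DIRS = ("src", "tests")  # mirrors the "You can" section of CLAUDE.md

def out_of_scope(changed_files: list[str], allowed=ALLOWED_DIRS) -> list[str]:
    """Return the changed paths that fall outside the allowed top-level directories."""
    violations = []
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Reject paths outside the allowlist and any that climb out with "..".
        if not parts or parts[0] not in allowed or ".." in parts:
            violations.append(path)
    return violations

# Made-up example of files an agent touched in one run.
changed = ["src/parser.py", "tests/test_parser.py", "pyproject.toml", "src/../.github/ci.yml"]
violations = out_of_scope(changed)
print(violations)
```

Run as a pre-merge step, a non-empty result is a reason to reject the diff regardless of what the agent's summary says.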

Observability — Map to What You Already Know

Infra Concept	Agent Equivalent
Request latency	Task completion time
Error rate	Task failure / hallucination rate
Resource utilisation	Token spend per agent role
Distributed tracing	Tool call chain audit log
SLO	Task success rate threshold
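The mapping above can be turned into numbers with very little code. This sketch assumes you log one record per task; `TaskRecord`, its fields, and the sample values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class TaskRecord:
    role: str        # e.g. "orchestrator" or "subagent"
    seconds: float   # task completion time
    tokens: int      # prompt + completion tokens
    succeeded: bool  # verified by tests, not by the agent's own summary

def summarise(records: list[TaskRecord], slo: float = 0.9) -> dict:
    """Compute the agent equivalents of latency, error rate, utilisation, and SLO."""
    tokens_by_role: dict[str, int] = defaultdict(int)
    for record in records:
        tokens_by_role[record.role] += record.tokens
    success_rate = sum(r.succeeded for r in records) / len(records)
    return {
        "mean_seconds": sum(r.seconds for r in records) / len(records),
        "failure_rate": 1 - success_rate,
        "tokens_by_role": dict(tokens_by_role),
        "slo_met": success_rate >= slo,
    }

# Made-up sample records for illustration.
records = [
    TaskRecord("orchestrator", 12.0, 4000, True),
    TaskRecord("subagent", 30.0, 9000, True),
    TaskRecord("subagent", 18.0, 7000, False),
    TaskRecord("subagent", 20.0, 8000, True),
]
summary = summarise(records)
print(summary)
```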

LiteLLM Spend Logging

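# ~/.litellm/callbacks.py
def log_spend(kwargs, response, start_time, end_time):
    """Print model, token counts, duration, and cost for each successful call."""
    model    = kwargs.get("model")
    usage    = getattr(response, "usage", None)          # may be missing on some responses
    duration = (end_time - start_time).total_seconds()   # .seconds would drop fractions of a second
    cost     = kwargs.get("response_cost") or 0          # guard against a None cost
    tokens_in  = getattr(usage, "prompt_tokens", 0)
    tokens_out = getattr(usage, "completion_tokens", 0)
    print(
        f"[{model}] "
        f"in:{tokens_in} out:{tokens_out} "
        f"time:{duration:.1f}s cost:${cost:.4f}"
    )

# Add to litellm_config.yaml
# (the callback reference format can vary by LiteLLM version; confirm it against the proxy docs)
litellm_settings:
  success_callback: ["~/.litellm/callbacks.py:log_spend"]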

All Links

Resources

Books

AI Engineering ↗ Chip Huyen, O'Reilly 2025. Start with Ch 7–8 and Ch 10.
Prompt Engineering for Generative AI ↗ Phoenix & Taylor. ReAct patterns and failure modes.

Official Docs

Claude Code Docs ↗ Official. Read in full before Phase 3.
LiteLLM Docs ↗ Callbacks, budget management, your proxy reference.
OpenRouter Docs ↗ API reference, model list, spend dashboard.

Articles

Oh My Claude Code ↗ Best practical Claude Code teams resource.
Code Agent Orchestra ↗ Conductor vs orchestrator mental model.
Claude Code Multi-Agent ↗ Git worktrees and parallel agent coordination.
Dive into Claude Code ↗ VILA Lab. Systematic orchestration reference.

Courses

LangGraph Specialisation ↗ Coursera. Orchestration, tool calling, memory.
CrewAI + MCP Course ↗ Coursera. Model Context Protocol, agent coordination.

Community

Anthropic Discord ↗ Claude Code channel. Active, practical troubleshooting.
Latent Space Podcast ↗ Best audio for AI engineering depth. Weekly.

Quick Reference

Claude Code Settings ↗ Permissions, hooks, MCP server config.
LiteLLM Budget Manager ↗ Hard caps, per-model limits, spend alerts.
OpenRouter ↗ Single API key, 300+ models, spend dashboard.
