Managing Teams of AI Agents
A 10-week self-directed plan covering agent fundamentals, single-agent mastery, multi-agent orchestration, and production reliability. Built for evenings and weekends using personal tools.
Phase 01
How Agents Actually Work
Weeks 1–2
Phase 02
Single Agent Mastery
Weeks 3–4
Phase 03
Multi-Agent Patterns
Weeks 5–7
Phase 04
Reliability & Cost Control
Weeks 8–10
Learning Journey
1. Foundations · Wks 1–2 · Reading
2. Single Agent · Wks 3–4 · Hands-on
3. Orchestration · Wks 5–7 · Multi-agent
4. Production · Wks 8–10 · Reliability
Phase 01 · Weeks 1–2
How Agents Actually Work
Reading-first. No tooling yet.
Understand the ReAct loop, orchestrator vs subagent, context accumulation, and how to evaluate non-deterministic systems. These building blocks underpin everything that follows.
Books to Read
- AI Engineering — Chip Huyen ↗ Ch 1–3 (architecture, tool use), Ch 7–8 (ReAct, agent patterns), Ch 10 (evals). Read Ch 10 even if you skip others.
- Prompt Engineering for Generative AI ↗ ReAct patterns, chain-of-thought, why agents fail. Tool use and multi-step behaviour sections only.
Concepts to Nail
- ReAct loop — Reason → Act → Observe → repeat
- Orchestrator vs subagent — planner vs executor
- Context accumulation — why long sessions degrade
- Evals — testing non-deterministic output
- Tool use — how agents interact with systems
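The ReAct loop and context accumulation from the list above can be sketched in a few lines. This is a toy, not how any particular agent is implemented: `scripted_policy` stands in for a model call, and the `TOOLS` registry, the `act:`/`observe:` history format, and the `max_turns` default are all assumptions made for illustration.

```python
from typing import Callable

# Hypothetical tool registry: the name and behaviour are illustrative only.
TOOLS: dict[str, Callable[[str], str]] = {
    "count_lines": lambda text: str(len(text.splitlines())),
}

def scripted_policy(history: list[str]) -> tuple[str, str]:
    """Stand-in for a model call: pick the next action from the history so far."""
    if not any(entry.startswith("observe:") for entry in history):
        return ("count_lines", "a\nb\nc")  # Reason: no data yet, so call a tool
    return ("finish", history[-1].removeprefix("observe:"))  # Reason: enough to answer

def react_loop(policy, max_turns: int = 5) -> tuple[str, list[str]]:
    history: list[str] = []                # context grows on every turn
    for _ in range(max_turns):             # hard stop against runaway loops
        action, arg = policy(history)      # Reason
        if action == "finish":
            return arg, history
        observation = TOOLS[action](arg)   # Act
        history.append(f"act:{action}")
        history.append(f"observe:{observation}")  # Observe
    raise RuntimeError("max_turns reached without an answer")

answer, trace = react_loop(scripted_policy)
print(answer, trace)
```

Note how `history` only ever grows: that is the mechanism behind long sessions degrading, and the `max_turns` bound is the same idea as capping turns in a real agent.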
Hands-On Validation
Run Claude Code against a personal project. Don't intervene — just observe how it reasons and where it gets stuck.
# Watch the full agentic loop without touching the keyboard
claude "read this repo, find all functions with no docstrings, add them"
- Ran Claude Code on a personal project end to end without intervening
- Observed the full tool call sequence in the output
- Noted at least two places where the agent made a wrong assumption
Phase 02 · Weeks 3–4
Single Agent Mastery
Hands-on with real projects. No multi-agent yet.
Run a real task from one of your own projects end to end with a single agent. Review every diff before accepting. Build intuition for where agents succeed and fail before adding complexity.
Read First
- Claude Code — Official Docs ↗ Read in full. Custom instructions, permissions, hooks, Task tool.
- Code Agent Orchestra — Addy Osmani ↗ Conductor vs orchestrator mental model. Short but essential framing.
Project Checklist
- Set up LiteLLM proxy pointing at OpenRouter
- Pointed Claude Code at localhost:4000
- Completed a real task end to end with a single agent
- Reviewed every diff before accepting
- Wrote CLAUDE.md files for 2 personal repos
The Most Important Skill: CLAUDE.md
Your agent's standing brief. Every project needs one. The quality of your CLAUDE.md directly determines the quality of your agent's output.
# CLAUDE.md — place in repo root
## What this is
Brief description, language, purpose.
## Standards
- Type hints on all functions
- Tests in /tests mirroring /src
- No print() — use logging module
## You can
- Read and edit /src and /tests
- Run pytest, ruff, mypy
- Commit to feature branches
## You cannot
- Push to main
- Delete files without confirming
- Modify dependencies without asking
## When unsure
Ask. Don't guess on architecture.
Phase 03 · Weeks 5–7
Multi-Agent Patterns
Orchestrators, subagents, git worktrees.
Run 3+ parallel agents on your own repo without conflicts. Understand when to reach for each tier. Complete both hands-on projects before moving to Phase 4.
Read
- Oh My Claude Code ↗ Read in full. Teams setup, parallel orchestration, swarm patterns.
- Claude Code Multi-Agent — Shipyard ↗ Git worktree patterns and parallel agent coordination.
- Dive into Claude Code — VILA Lab ↗ Systematic reference. Skim for orchestration patterns.
Hands-On Projects
- Set up git worktrees for parallel agent isolation
- Ran 3 parallel agents on same repo without merge conflicts
- Built orchestrator → subagent pipeline using Task tool
- Parallel PR review — 3 subagents from different angles
- Parallel codebase audit across worktrees
Orchestrator Pattern
Orchestrator (smart model): plans and delegates to four subagents
- Audit existing code
- Implement changes
- Test coverage
- Docs update
| Alias | Role | Rationale |
|---|---|---|
| smart | Orchestrator | Needs strong reasoning to plan and delegate |
| default | Subagents | Execute well-scoped tasks, cost-efficient |
| free | Exploratory | Throwaway tasks, zero API spend |
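A rough sketch of the fan-out shape described above, assuming a planner that splits a goal into the four subtasks and workers that run on the `default` alias. `plan` and `run_subagent` are hypothetical stand-ins: a real pipeline would call a model on the `smart` alias to plan and launch an actual agent per subtask.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass(frozen=True)
class Subtask:
    name: str
    prompt: str
    model_alias: str = "default"  # subagents run on the cheaper tier

def plan(goal: str) -> list[Subtask]:
    """Stand-in for the 'smart' orchestrator call: split a goal into scoped subtasks."""
    return [
        Subtask("audit", f"Audit existing code for: {goal}"),
        Subtask("implement", f"Implement changes for: {goal}"),
        Subtask("test", f"Add test coverage for: {goal}"),
        Subtask("docs", f"Update docs for: {goal}"),
    ]

def run_subagent(task: Subtask) -> str:
    """Hypothetical executor: a real one would invoke an agent with task.prompt."""
    return f"[{task.model_alias}] {task.name}: done"

def orchestrate(goal: str) -> list[str]:
    tasks = plan(goal)
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        # pool.map returns results in the same order as the input tasks
        return list(pool.map(run_subagent, tasks))

results = orchestrate("type hints in /src")
print("\n".join(results))
```

This runs all four subtasks at once for simplicity. In practice some steps depend on others (implementation usually follows the audit), so the planner may need to emit stages rather than one flat list.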
The Three Tiers — Know Which to Reach For
Tier 01 — Start Here
Subagents in Session
Single feature, you're present, fast iteration
Claude Code Task tool
Tier 02 — When Tier 1 Hits Ceiling
Local Parallel
3–10 agents, you review diffs after the fact
Claude Squad + worktrees
Tier 03 — Advanced
Autonomous
Overnight runs, large backlogs, no human in loop
Claude Code Web
Git Worktrees — The Enabling Primitive
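Multiple agents editing the same checkout will overwrite each other's changes. Worktrees give each agent its own working directory and branch on top of one shared repository. Branches that touch the same files can still conflict at merge time, so scope each task to distinct files.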
# Create isolated worktrees per agent
git worktree add ../branch-tests -b feature/add-test-coverage
git worktree add ../branch-types -b feature/add-type-hints
git worktree add ../branch-docs -b feature/update-docs
# Run each agent in its own terminal; separate worktrees keep their edits from colliding
cd ../branch-tests && claude "add test coverage for module_x.py"
cd ../branch-types && claude "add type hints to all functions in /src"
cd ../branch-docs && claude "rewrite README with usage examples"
# Clean up after merging
git worktree remove ../branch-tests
git worktree list
Phase 04 · Weeks 8–10
Reliability & Cost Control
Evals, observability, failure modes.
Build an eval harness for something your agents do repeatedly. Wire spend logging. Run an autonomous overnight task with no human in the loop. Revisit the AI Engineering evals chapter now that you have context.
Read
- LiteLLM Docs — Callbacks & Budget ↗ Callbacks, budget management, logging. Understand your proxy fully.
- AI Engineering Ch 10 — Evals (revisit) ↗ Lands differently once you've run real agents. Read it again.
Hands-On Projects
- Wired LiteLLM spend logging to terminal
- Wrote 10 deterministic test cases for a repeatable task
- Ran the eval harness and tracked pass rate across a model swap
- Ran autonomous health check with no human in loop
- Set and verified a hard budget cap in LiteLLM
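One possible shape for the eval harness in the checklist above: a fixed set of cases, each checked by a deterministic property of the output rather than an exact string, run several times per case to get a pass rate. The cases, the two stub "agents", and `runs_per_case` are hypothetical; swap the stubs for calls to the models you are comparing.

```python
from typing import Callable

# Hypothetical cases: (input, checker). Checkers test a property of the output,
# since agent output varies between runs.
CASES: list[tuple[str, Callable[[str], bool]]] = [
    ("def add(a, b): return a + b", lambda out: '"""' in out),
    ("def sub(a, b): return a - b", lambda out: "def sub" in out),
]

def pass_rate(agent: Callable[[str], str], runs_per_case: int = 3) -> float:
    """Run every case several times and return the fraction of passing runs."""
    passed = total = 0
    for prompt, check in CASES:
        for _ in range(runs_per_case):
            total += 1
            passed += bool(check(agent(prompt)))
    return passed / total

def docstring_stub(source: str) -> str:
    """Stand-in for one model: inserts a docstring marker after the signature."""
    head, _, body = source.partition(":")
    return f'{head}:\n    """TODO: describe."""\n   {body}'

def echo_stub(source: str) -> str:
    """Stand-in for a weaker model: returns the input unchanged."""
    return source

rates = {
    "docstring_stub": pass_rate(docstring_stub),
    "echo_stub": pass_rate(echo_stub),
}
print(rates)
```

Comparing `rates` before and after a model swap is the "tracked pass rate" step. With real models, raise `runs_per_case` so a single lucky or unlucky run does not dominate the number.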
Failure Modes to Design Around
Context Overflow
Long sessions accumulate history and degrade. Start fresh sessions for new tasks — don't continue indefinitely.
Tool Call Loops
An agent retries a failing tool indefinitely. Cap the number of agentic turns to hard-stop runaway loops; Claude Code exposes a --max-turns flag for non-interactive runs.
Silent Hallucination
Agent claims done but output is wrong. Always verify with tests or a linter — never trust the summary alone.
Cost Runaway
One poorly scoped task burns your monthly budget. The hard cap in LiteLLM catches this — check spend weekly.
Prompt Injection
Agent reads adversarial instructions from files or web content. Scope tool permissions tightly per agent role.
Scope Creep
Agent modifies files outside its intended scope. The CLAUDE.md "You cannot" section must be explicit, not implied.
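Several of these failure modes can be caught by a mechanical check after the run instead of trusting the agent's summary. As one example, here is a minimal scope-creep check. It assumes you already have the list of changed paths (for instance from `git diff --name-only`) and a list of directories the agent is allowed to touch; the function name and sample paths are made up for illustration.

```python
from pathlib import PurePosixPath

def out_of_scope(changed: list[str], allowed_dirs: list[str]) -> list[str]:
    """Return changed paths that are not inside any allowed directory."""
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    violations = []
    for raw in changed:
        path = PurePosixPath(raw)
        # Reject ".." segments outright: they can escape an allowed directory
        # while still looking like they start inside it.
        escapes = ".." in path.parts
        if escapes or not any(path.is_relative_to(a) for a in allowed):
            violations.append(raw)
    return violations

changed_files = ["src/app.py", "tests/test_app.py", "pyproject.toml", "src/../.env"]
violations = out_of_scope(changed_files, ["src", "tests"])
print(violations)
```

A non-empty result is a reason to reject the diff before reading it in detail. The allowed list should mirror the "You can" section of the repo's CLAUDE.md.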
Observability — Map to What You Already Know
| Infra Concept | Agent Equivalent |
|---|---|
| Request latency | Task completion time |
| Error rate | Task failure / hallucination rate |
| Resource utilisation | Token spend per agent role |
| Distributed tracing | Tool call chain audit log |
| SLO | Task success rate threshold |
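To make the mapping concrete, the sketch below computes the agent-side metrics from a list of task records. The `TaskRecord` fields, the sample data, and the 0.9 threshold are assumptions for illustration; in practice the records would come from your own logs.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class TaskRecord:
    role: str        # e.g. "orchestrator" or "subagent"
    succeeded: bool  # did the task pass its verification step?
    seconds: float   # task completion time
    tokens: int      # total tokens spent on the task

def summarise(records: list[TaskRecord], slo: float = 0.9) -> dict:
    """Aggregate task records into the metrics from the table above."""
    tokens_by_role: dict[str, int] = defaultdict(int)
    for r in records:
        tokens_by_role[r.role] += r.tokens
    success_rate = sum(r.succeeded for r in records) / len(records)
    return {
        "success_rate": success_rate,                 # inverse of error rate
        "meets_slo": success_rate >= slo,             # SLO check
        "mean_seconds": sum(r.seconds for r in records) / len(records),
        "tokens_by_role": dict(tokens_by_role),       # resource utilisation
    }

records = [
    TaskRecord("orchestrator", True, 12.0, 4000),
    TaskRecord("subagent", True, 30.0, 1500),
    TaskRecord("subagent", False, 48.0, 2500),
    TaskRecord("subagent", True, 30.0, 1000),
]
report = summarise(records)
print(report)
```

The `succeeded` flag is only as good as the check behind it: set it from tests or a linter, not from the agent's own claim of completion.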
LiteLLM Spend Logging
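# ~/.litellm/callbacks.py
def log_spend(kwargs, response, start_time, end_time):
    """Print model, token counts, duration, and cost for one completed call."""
    model = kwargs.get("model")
    usage = response.usage
    # total_seconds() keeps sub-second precision; .seconds drops the fraction
    duration = (end_time - start_time).total_seconds()
    # response_cost may be missing or None, so fall back to 0
    cost = kwargs.get("response_cost") or 0
    print(
        f"[{model}] "
        f"in:{usage.prompt_tokens} out:{usage.completion_tokens} "
        f"time:{duration:.1f}s cost:${cost:.4f}"
    )
# Add to litellm_config.yaml
# Verify the callback reference format against the LiteLLM custom callback docs for your proxy version
litellm_settings:
  success_callback: ["~/.litellm/callbacks.py:log_spend"]
All Links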
Resources
Books
AI Engineering ↗
Chip Huyen, O'Reilly 2025. Start with Ch 7–8 and Ch 10.
Prompt Engineering for Generative AI ↗
Phoenix & Taylor. ReAct patterns and failure modes.
Official Docs
Claude Code Docs ↗
Official. Read in full before Phase 3.
LiteLLM Docs ↗
Callbacks, budget management, your proxy reference.
OpenRouter Docs ↗
API reference, model list, spend dashboard.
Articles
Oh My Claude Code ↗
Best practical Claude Code teams resource.
Code Agent Orchestra ↗
Conductor vs orchestrator mental model.
Claude Code Multi-Agent ↗
Git worktrees and parallel agent coordination.
Dive into Claude Code ↗
VILA Lab. Systematic orchestration reference.
Courses
LangGraph Specialisation ↗
Coursera. Orchestration, tool calling, memory.
CrewAI + MCP Course ↗
Coursera. Model Context Protocol, agent coordination.