
Managing Teams of AI Agents

A 10-week self-directed plan covering agent fundamentals, single-agent mastery, multi-agent orchestration, and production reliability. Built for evenings and weekends using personal tools.

Claude Code · LiteLLM · OpenRouter · Multi-agent · 10 weeks
Phase 01 · How Agents Actually Work · Weeks 1–2
Phase 02 · Single Agent Mastery · Weeks 3–4
Phase 03 · Multi-Agent Patterns · Weeks 5–7
Phase 04 · Reliability & Cost Control · Weeks 8–10
Learning Journey
  1. Foundations · Wks 1–2 · Reading
  2. Single Agent · Wks 3–4 · Hands-on
  3. Orchestration · Wks 5–7 · Multi-agent
  4. Production · Wks 8–10 · Reliability
Phase 01 · Weeks 1–2

How Agents Actually Work

Reading-first. No tooling yet.
Understand the ReAct loop, orchestrator vs subagent, context accumulation, and how to evaluate non-deterministic systems. These building blocks underpin everything that follows.
Books to Read
Concepts to Nail
  • ReAct loop — Reason → Act → Observe → repeat
  • Orchestrator vs subagent — planner vs executor
  • Context accumulation — why long sessions degrade
  • Evals — testing non-deterministic output
  • Tool use — how agents interact with systems
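The ReAct loop and context accumulation listed above can be sketched in a few lines. This is a toy illustration, not how Claude Code is implemented: `scripted_model`, the `TOOLS` registry, and the `word_count` tool are hypothetical stand-ins for a real LLM call and real tools. The point is the shape of the loop and the way `history` grows on every turn.

```python
from typing import Callable

# Hypothetical tool registry: plain functions keyed by name.
TOOLS: dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
}


def scripted_model(history: list[str]) -> dict:
    """Stand-in for an LLM call: decides the next step from the history so far."""
    if not any(line.startswith("Observation:") for line in history):
        return {
            "thought": "I need the word count.",
            "tool": "word_count",
            "input": "agents reason then act",
        }
    # An observation exists, so answer with the most recent one.
    return {
        "thought": "I have what I need.",
        "answer": history[-1].removeprefix("Observation: "),
    }


def react_loop(task: str, model=scripted_model, max_turns: int = 5):
    """Reason -> Act -> Observe until the model answers or max_turns is hit."""
    history = [f"Task: {task}"]  # everything the model sees; grows each turn
    for _ in range(max_turns):
        step = model(history)  # Reason
        history.append(f"Thought: {step['thought']}")
        if "answer" in step:
            return step["answer"], history
        result = TOOLS[step["tool"]](step["input"])  # Act
        history.append(f"Action: {step['tool']}({step['input']!r})")
        history.append(f"Observation: {result}")  # Observe
    raise RuntimeError("max_turns reached without an answer")


answer, history = react_loop("Count the words in the phrase")
print(answer)
print(len(history), "history entries after two turns")
```

Every turn appends to `history`, which is why long sessions degrade: the model re-reads an ever-longer transcript. The `max_turns` cap is the same idea as a hard stop on runaway loops.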
Hands-On Validation

Run Claude Code against a personal project. Don't intervene — just observe how it reasons and where it gets stuck.

# Watch the full agentic loop without touching the keyboard
claude "read this repo, find all functions with no docstrings, add them"
  • Ran Claude Code on a personal project end to end without intervening
  • Observed the full tool call sequence in the output
  • Noted at least two places where the agent made a wrong assumption
Phase 02 · Weeks 3–4

Single Agent Mastery

Hands-on with real projects. No multi-agent yet.
Run a real task from one of your own projects end to end with a single agent. Review every diff before accepting. Build intuition for where agents succeed and fail before adding complexity.
Read First
Project Checklist
  • Set up LiteLLM proxy pointing at OpenRouter
  • Pointed Claude Code at localhost:4000
  • Completed a real task end to end with a single agent
  • Reviewed every diff before accepting
  • Wrote CLAUDE.md files for 2 personal repos
The Most Important Skill: CLAUDE.md

Your agent's standing brief. Every project needs one. The quality of your CLAUDE.md directly determines the quality of your agent's output.

# CLAUDE.md: place in repo root

## What this is
Brief description, language, purpose.

## Standards
- Type hints on all functions
- Tests in /tests mirroring /src
- No print(); use the logging module

## You can
- Read and edit /src and /tests
- Run pytest, ruff, mypy
- Commit to feature branches

## You cannot
- Push to main
- Delete files without confirming
- Modify dependencies without asking

## When unsure
Ask. Don't guess on architecture.
Phase 03 · Weeks 5–7

Multi-Agent Patterns

Orchestrators, subagents, git worktrees.
Run 3+ parallel agents on your own repo without conflicts. Understand when to reach for each tier. Complete both hands-on projects before moving to Phase 4.
Read
Hands-On Projects
  • Set up git worktrees for parallel agent isolation
  • Ran 3 parallel agents on same repo without merge conflicts
  • Built orchestrator → subagent pipeline using Task tool
  • Parallel PR review — 3 subagents from different angles
  • Parallel codebase audit across worktrees
Orchestrator Pattern
Orchestrator (smart model): plans & delegates
  • Audit: existing code
  • Implement: changes
  • Test: coverage
  • Docs: update
Alias | Role | Rationale
smart | Orchestrator | Needs strong reasoning to plan and delegate
default | Subagents | Execute well-scoped tasks, cost-efficient
free | Exploratory | Throwaway tasks, zero API spend
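One way to keep the alias choice out of individual prompts is a small lookup keyed by role. The sketch below is illustrative: the alias names come from the table, but `ROLE_TO_ALIAS`, `alias_for`, and `assign_models` are hypothetical helpers, and the subtask names simply echo the orchestrator pattern above.

```python
# Role -> model alias, taken from the table above.
ROLE_TO_ALIAS = {
    "orchestrator": "smart",
    "subagent": "default",
    "exploratory": "free",
}


def alias_for(role: str) -> str:
    """Return the model alias for a role, failing loudly on typos."""
    try:
        return ROLE_TO_ALIAS[role]
    except KeyError:
        raise ValueError(f"unknown role: {role!r}") from None


def assign_models(subtasks: list[str]) -> dict[str, str]:
    """Give the orchestrator the strong model and every subtask the cheaper one."""
    plan = {"orchestrator": alias_for("orchestrator")}
    plan.update({task: alias_for("subagent") for task in subtasks})
    return plan


print(assign_models(["audit", "implement", "test", "docs"]))
```

Raising on an unknown role is deliberate: a silent fallback to the expensive alias is an easy way to overspend.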
The Three Tiers — Know Which to Reach For
Tier 01 — Start Here
Subagents in Session
Single feature, you're present, fast iteration
Claude Code Task tool
Tier 02 — When Tier 1 Hits Ceiling
Local Parallel
3–10 agents, you review diffs after the fact
Claude Squad + worktrees
Tier 03 — Advanced
Autonomous
Overnight runs, large backlogs, no human in loop
Claude Code Web
Git Worktrees — The Enabling Primitive

Multiple agents on the same repo need isolation or they'll conflict. Worktrees give each agent its own working copy.

# Create isolated worktrees per agent
git worktree add ../branch-tests -b feature/add-test-coverage
git worktree add ../branch-types -b feature/add-type-hints
git worktree add ../branch-docs -b feature/update-docs

# Each agent runs independently, so no conflicts
cd ../branch-tests && claude "add test coverage for module_x.py"
cd ../branch-types && claude "add type hints to all functions in /src"
cd ../branch-docs && claude "rewrite README with usage examples"

# Clean up after merging
git worktree remove ../branch-tests
git worktree list
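Once there are more than a handful of agents, it may help to generate the worktree commands from a task list rather than typing them. The sketch below is a dry run only: it builds and prints the `git worktree add` commands without executing anything. `AGENT_TASKS` and `worktree_commands` are hypothetical names; the directories, branches, and prompts are copied from the shell example.

```python
import shlex

# Each entry: (worktree directory, branch name, prompt for the agent).
AGENT_TASKS = [
    ("../branch-tests", "feature/add-test-coverage", "add test coverage for module_x.py"),
    ("../branch-types", "feature/add-type-hints", "add type hints to all functions in /src"),
    ("../branch-docs", "feature/update-docs", "rewrite README with usage examples"),
]


def worktree_commands(tasks):
    """Build one `git worktree add` command per agent, rejecting duplicates."""
    seen_dirs, seen_branches = set(), set()
    commands = []
    for directory, branch, _prompt in tasks:
        # Two agents sharing a directory or branch would defeat the isolation.
        if directory in seen_dirs or branch in seen_branches:
            raise ValueError(f"duplicate worktree or branch: {directory} / {branch}")
        seen_dirs.add(directory)
        seen_branches.add(branch)
        commands.append(["git", "worktree", "add", directory, "-b", branch])
    return commands


for cmd in worktree_commands(AGENT_TASKS):
    print(shlex.join(cmd))  # dry run: print instead of executing
```

To actually run the commands, each list could be passed to `subprocess.run(cmd, check=True)`; that step is left out here so the sketch has no side effects.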
Phase 04 · Weeks 8–10

Reliability & Cost Control

Evals, observability, failure modes.
Build an eval harness for something your agents do repeatedly. Wire spend logging. Run an autonomous overnight task with no human in the loop. Revisit the AI Engineering evals chapter now that you have context.
Read
Hands-On Projects
  • Wired LiteLLM spend logging to terminal
  • Wrote 10 deterministic test cases for a repeatable task
  • Ran eval harness, tracked pass rate across a model swap
  • Ran autonomous health check with no human in loop
  • Set and verified a hard budget cap in LiteLLM
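An eval harness for the checklist above can start very small. The sketch below assumes an "agent" is just a callable from prompt to output string. `Case`, `run_evals`, and the two stub agents are hypothetical; real cases would call a model and check its output with tests or a linter.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    name: str
    prompt: str
    check: Callable[[str], bool]  # deterministic pass/fail on the output


def run_evals(agent: Callable[[str], str], cases: list[Case]) -> dict:
    """Run every case against one agent and report the pass rate."""
    results = {}
    for case in cases:
        try:
            results[case.name] = bool(case.check(agent(case.prompt)))
        except Exception:
            results[case.name] = False  # a crash counts as a failure
    passed = sum(results.values())
    return {
        "pass_rate": passed / len(cases),
        "failures": [name for name, ok in results.items() if not ok],
    }


CASES = [
    Case("uppercases", "hello", lambda out: out == "HELLO"),
    Case("keeps length", "abc", lambda out: len(out) == 3),
    Case("non-empty", "x", lambda out: bool(out)),
]

# Two stub "models" standing in for a model swap.
model_a = str.upper
model_b = lambda prompt: prompt

for label, agent in [("model_a", model_a), ("model_b", model_b)]:
    print(label, run_evals(agent, CASES))
```

Keeping the cases fixed and swapping only the agent is what makes the pass rate comparable across models.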
Failure Modes to Design Around
Context Overflow
Long sessions accumulate history and degrade. Start fresh sessions for new tasks — don't continue indefinitely.
Tool Call Loops
An agent can retry a failing tool forever. Set max_turns in Claude Code to hard-stop runaway loops.
Silent Hallucination
Agent claims done but output is wrong. Always verify with tests or a linter — never trust the summary alone.
Cost Runaway
One poorly scoped task burns your monthly budget. The hard cap in LiteLLM catches this — check spend weekly.
Prompt Injection
Agent reads adversarial instructions from files or web content. Scope tool permissions tightly per agent role.
Scope Creep
Agent modifies files outside its intended scope. The CLAUDE.md "You cannot" section must be explicit, not implied.
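Scope creep in particular is cheap to check mechanically after a run. The sketch below assumes you already have the list of changed paths (for example from `git diff --name-only`) and that the agent is only allowed under `src` and `tests`, as in the CLAUDE.md example. `ALLOWED_DIRS` and `out_of_scope` are hypothetical names.

```python
from pathlib import PurePosixPath

# Assumption: mirrors the "You can" section of the CLAUDE.md example.
ALLOWED_DIRS = ("src", "tests")


def out_of_scope(changed_files, allowed=ALLOWED_DIRS) -> list[str]:
    """Return the changed paths that fall outside the allowed top-level dirs."""
    violations = []
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Reject empty paths, anything using "..", and any other top-level entry
        # (absolute paths fail too, because their first part is "/").
        if not parts or ".." in parts or parts[0] not in allowed:
            violations.append(path)
    return violations


changed = ["src/app.py", "tests/test_app.py", "pyproject.toml", "src/../setup.py"]
print(out_of_scope(changed))
```

A non-empty result is a reason to reject the diff before reading it, independent of what the agent's summary claims.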
Observability — Map to What You Already Know
Infra Concept | Agent Equivalent
Request latency | Task completion time
Error rate | Task failure / hallucination rate
Resource utilisation | Token spend per agent role
Distributed tracing | Tool call chain audit log
SLO | Task success rate threshold
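The mapping above can be turned into numbers from a simple task log. This is a sketch under assumptions: `TaskRecord` is a hypothetical record format, the 0.9 SLO is an arbitrary example threshold, and the sample data is made up purely to exercise the function.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskRecord:
    role: str         # e.g. "orchestrator" or "subagent"
    succeeded: bool   # did the task pass its checks?
    seconds: float    # task completion time
    tokens: int       # total tokens spent on the task


def summarise(records: list[TaskRecord], slo: float = 0.9) -> dict:
    """Compute the agent equivalents of latency, error rate, utilisation, and SLO."""
    tokens_by_role = defaultdict(int)
    for record in records:
        tokens_by_role[record.role] += record.tokens
    success_rate = sum(r.succeeded for r in records) / len(records)
    return {
        "mean_seconds": sum(r.seconds for r in records) / len(records),
        "error_rate": 1 - success_rate,
        "tokens_by_role": dict(tokens_by_role),
        "success_rate": success_rate,
        "slo_met": success_rate >= slo,
    }


# Made-up sample log.
LOG = [
    TaskRecord("orchestrator", True, 30.0, 1000),
    TaskRecord("subagent", True, 10.0, 400),
    TaskRecord("subagent", False, 20.0, 600),
    TaskRecord("subagent", True, 20.0, 500),
]

print(summarise(LOG))
```

Distributed tracing is the one row this does not cover; it would need the per-task tool call chain rather than one summary record per task.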
LiteLLM Spend Logging
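# ~/.litellm/callbacks.py
def log_spend(kwargs, response, start_time, end_time):
    """Print one line of token usage, duration, and cost per successful call."""
    model = kwargs.get("model")
    usage = response.usage
    # total_seconds() keeps sub-second precision; .seconds truncates to whole seconds
    duration = (end_time - start_time).total_seconds()
    # response_cost can be missing or None, so fall back to 0
    cost = kwargs.get("response_cost") or 0
    print(
        f"[{model}] "
        f"in:{usage.prompt_tokens} out:{usage.completion_tokens} "
        f"time:{duration:.1f}s cost:${cost:.4f}"
    )

# Add to litellm_config.yaml
litellm_settings:
  success_callback: ["~/.litellm/callbacks.py:log_spend"]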