When a multi-agent system gives a wrong answer, the bug is usually in the context flow: an agent didn't get what it needed, got something it shouldn't have, or a hand-off dropped a field. Debugging this requires tracing what each agent saw and produced — a context flow log across the whole system, not just one window. Without it, multi-agent bugs are nearly impossible to diagnose because the failure surfaces in one agent but originates in another's context. Building observability into the context flow from the start is what makes these systems maintainable.
The demo logs each agent's input context keys, a preview of its output, and the order of hand-offs, producing a trace you can read top-to-bottom to see where context went wrong: the field that was dropped, the agent that saw too much.
TRACE = []  # one record per agent call, in execution order

def traced(agent_name):
    """Decorator: record which context keys the agent saw and what it produced."""
    def wrap(fn):
        def inner(ctx_in, *a, **k):
            out = fn(ctx_in, *a, **k)
            TRACE.append({"agent": agent_name,
                          "in_keys": list(ctx_in.keys()) if isinstance(ctx_in, dict) else "str",
                          "out_preview": str(out)[:60]})
            return out
        return inner
    return wrap

@traced("researcher")
def researcher(ctx): return {"findings": "supply risk high"}

@traced("writer")
def writer(ctx): return f"Report: {ctx.get('findings', 'MISSING!')}"

task = {"task": "Q2"}
research = researcher(task)
# Buggy hand-off: the task is forwarded but research["findings"] is dropped,
# so the trace shows the writer's in_keys as ['task'] and its output as 'Report: MISSING!'.
writer(task)
for t in TRACE: print(t)

python3 main.py

Use these three in order. Each builds on the one before.
Why are multi-agent systems hard to debug, and what is 'context flow' tracing?
Explain how logging each agent's input context, output, and hand-offs lets me localize a multi-agent bug to the agent whose context was wrong.
Design observability for a multi-agent system: per-agent context traces, hand-off validation, scope-violation detection, and how I'd reconstruct the full context flow to debug a wrong final answer.
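The hand-off validation and scope-violation detection named in the third prompt can be sketched as a check that runs before each agent is called. This is a minimal illustration, not part of the demo: the CONTRACTS table, the check_handoff helper, and the api_key field are all hypothetical names chosen for the example, and it assumes each agent's context is a flat dict.

```python
# Hypothetical per-agent contracts: keys the agent needs, and keys it may see.
CONTRACTS = {
    "writer": {"required": {"findings"}, "allowed": {"findings", "task"}},
}

def check_handoff(agent_name, ctx):
    """Return a list of problems with the context about to be handed to agent_name."""
    contract = CONTRACTS[agent_name]
    keys = set(ctx)
    problems = []
    # A required key that never arrived is a dropped field.
    for key in sorted(contract["required"] - keys):
        problems.append(f"{agent_name}: missing required key '{key}'")
    # A key outside the allowed set is a scope violation (the agent saw too much).
    for key in sorted(keys - contract["allowed"]):
        problems.append(f"{agent_name}: out-of-scope key '{key}'")
    return problems

dropped = check_handoff("writer", {"task": "Q2"})
leaked = check_handoff("writer", {"findings": "supply risk high", "api_key": "secret"})
clean = check_handoff("writer", {"task": "Q2", "findings": "supply risk high"})
for report in (dropped, leaked, clean):
    print(report)
```

Appending these problem lists to the same trace as the per-agent records would place the failure next to the hand-off that caused it, instead of leaving it to surface later in the final answer.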
Working within free-tier limits. Free / low-tier provider keys rate-limit aggressively, and eval or agent loops that fan out calls will hit 429 Too Many Requests fast. Survive it: read Retry-After and the x-ratelimit-* headers and back off (exponential backoff with jitter + a max-retry cap) instead of hammering; cap in-flight requests with a small concurrency limiter so you stay under the RPM/TPM ceiling; cache identical requests so retries don't re-spend quota; downshift to a smaller/cheaper model for practice runs; use the provider Batch API for non-interactive jobs; or sidestep hosted limits entirely by running a small model locally (Ollama / llama.cpp) or on a free Colab/Kaggle GPU while you learn.
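The backoff advice above can be sketched without tying it to any particular provider SDK. In this illustration, RateLimitError is a hypothetical stand-in for whatever exception your client raises on a 429, retry_after is assumed to carry the Retry-After value in seconds when the server sent one, and call_with_backoff is an invented helper name. The jitter used here is the "full jitter" variant (a uniform draw between zero and the exponential ceiling), which is one common choice among several.

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for a provider client's 429 error."""
    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds from Retry-After, if the server sent it

def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # retry cap reached: surface the error instead of hammering
            if err.retry_after is not None:
                delay = err.retry_after  # the server's hint takes priority
            else:
                ceiling = min(max_delay, base_delay * 2 ** attempt)
                delay = random.uniform(0, ceiling)  # exponential backoff with full jitter
            sleep(delay)

# Usage with a fake call that is rate-limited twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RateLimitError(retry_after=0.5)
    if calls["n"] == 2:
        raise RateLimitError()
    return "ok"

delays = []  # record delays instead of actually sleeping
result = call_with_backoff(flaky, sleep=delays.append)
print(result, delays)
```

Passing sleep as a parameter keeps the helper testable: the usage example records the delays it would have waited rather than pausing the program.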