Debugging multi-agent context flow

Difficulty: hard

Why this matters

When a multi-agent system gives a wrong answer, the bug is usually in the context flow: an agent didn't get what it needed, got something it shouldn't have, or a hand-off dropped a field. Debugging this requires tracing what each agent saw and produced, which means a context flow log across the whole system, not just one agent's window. Without that log, multi-agent bugs are very hard to diagnose, because the failure surfaces in one agent but originates in another's context. Building observability into the context flow from the start is what keeps these systems maintainable.

Demo

The demo logs each agent's input keys and a preview of its output, producing a trace you can read top to bottom to see where context went wrong: the field that was dropped, the agent that saw too much.

from functools import wraps

TRACE = []  # one entry per agent call, in call order

def traced(agent_name):
    """Record each call's input keys and a preview of its output in TRACE."""
    def wrap(fn):
        @wraps(fn)
        def inner(ctx_in, *a, **k):
            out = fn(ctx_in, *a, **k)
            TRACE.append({"agent": agent_name,
                          "in_keys": list(ctx_in.keys()) if isinstance(ctx_in, dict) else "str",
                          "out_preview": str(out)[:60]})
            return out
        return inner
    return wrap

@traced("researcher")
def researcher(ctx):
    return {"findings": "supply risk high"}

@traced("writer")
def writer(ctx):
    return f"Report: {ctx.get('findings', 'MISSING!')}"

task = {"task": "Q2"}
researcher(task)  # BUG (deliberate): the researcher's output is discarded here
writer(task)      # so the writer never receives 'findings' -> visible in the trace
for t in TRACE:
    print(t)

Run: python3 main.py

Try it yourself

Run it and read the trace: the writer's in_keys contain no 'findings' key (a dropped hand-off), which makes the bug obvious.

Fix the hand-off so findings flow through, and confirm the trace now shows the writer receiving them.

Add token counts per agent to the trace to spot which agent's context is bloated.

Add a 'saw forbidden key' assertion to catch an agent receiving data outside its scope.
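
One way the last two exercises might look, as a rough sketch and not a reference solution. It extends the demo's traced decorator with a size estimate and a scope check. The names rough_tokens, forbidden, in_tokens, and forbidden_seen are hypothetical, the "about 4 characters per token" rule is only an assumption standing in for a real tokenizer, and the api_key field is made up for illustration.

```python
import json

TRACE = []

def rough_tokens(obj):
    """Crude size estimate: assumes ~4 characters per token (not a real tokenizer)."""
    text = obj if isinstance(obj, str) else json.dumps(obj, default=str)
    return max(1, len(text) // 4)

def traced(agent_name, forbidden=()):
    """Like the demo's decorator, plus a token estimate and a scope check."""
    def wrap(fn):
        def inner(ctx_in, *a, **k):
            keys = list(ctx_in.keys()) if isinstance(ctx_in, dict) else []
            leaked = sorted(set(keys) & set(forbidden))
            entry = {"agent": agent_name,
                     "in_keys": keys,
                     "in_tokens": rough_tokens(ctx_in),
                     "forbidden_seen": leaked}
            TRACE.append(entry)  # logged first, so a violation still shows up in the trace
            # Fail loudly before the agent runs on out-of-scope data.
            assert not leaked, f"{agent_name} saw forbidden key(s): {leaked}"
            out = fn(ctx_in, *a, **k)
            entry["out_preview"] = str(out)[:60]
            return out
        return inner
    return wrap

@traced("writer", forbidden=("api_key",))  # 'api_key' is a made-up out-of-scope field
def writer(ctx):
    return f"Report: {ctx.get('findings', 'MISSING!')}"

report = writer({"findings": "supply risk high"})

try:
    writer({"findings": "x", "api_key": "secret"})
    scope_error = None
except AssertionError as err:
    scope_error = str(err)

for t in TRACE:
    print(t)
print(scope_error)
```

Note that Python skips assert statements when run with the -O flag, so a check that must hold in production would more safely raise an explicit exception.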

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

Why are multi-agent systems hard to debug, and what is 'context flow' tracing?

2. Why it works (the mechanism)

Explain how logging each agent's input context, output, and hand-offs lets me localize a multi-agent bug to the agent whose context was wrong.

3. Advanced — application & what's next

Design observability for a multi-agent system: per-agent context traces, hand-off validation, scope-violation detection, and how I'd reconstruct the full context flow to debug a wrong final answer.
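
The hand-off validation named in the third prompt could start from something like the sketch below. It assumes each receiving agent declares the keys it needs. The REQUIRED table and the validate_handoff helper are hypothetical names invented for this illustration, not part of the demo.

```python
# Hypothetical per-agent contract: the keys each receiver needs in its input context.
REQUIRED = {"writer": {"findings"}}

def validate_handoff(sender, receiver, payload):
    """Raise if the payload lacks a key the receiver is declared to need."""
    missing = REQUIRED.get(receiver, set()) - set(payload)
    if missing:
        raise ValueError(f"hand-off {sender} -> {receiver} dropped {sorted(missing)}")
    return payload

# A complete hand-off passes through unchanged.
ok = validate_handoff("researcher", "writer", {"findings": "supply risk high"})

# A hand-off that lost 'findings' fails at the boundary, naming both agents.
try:
    validate_handoff("researcher", "writer", {"task": "Q2"})
    handoff_error = None
except ValueError as err:
    handoff_error = str(err)

print(handoff_error)
```

Checking at the boundary means the error names the hand-off that lost the field, so the failure is reported where it originated and not in the downstream agent that merely produced the wrong output.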

References

Working within free-tier limits. Free / low-tier provider keys rate-limit aggressively, and eval or agent loops that fan out calls will hit 429 Too Many Requests fast. Survive it: read Retry-After and the x-ratelimit-* headers and back off (exponential backoff with jitter + a max-retry cap) instead of hammering; cap in-flight requests with a small concurrency limiter so you stay under the RPM/TPM ceiling; cache identical requests so retries don't re-spend quota; downshift to a smaller/cheaper model for practice runs; use the provider Batch API for non-interactive jobs; or sidestep hosted limits entirely by running a small model locally (Ollama / llama.cpp) or on a free Colab/Kaggle GPU while you learn.
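
The back-off advice above can be sketched as follows, under stated assumptions: RateLimited is a hypothetical stand-in for whatever exception your client raises on a 429, retry_after stands for the parsed Retry-After value, and the default base, cap, and max_retries numbers are arbitrary. The sleep parameter exists only so the example can run without actually waiting.

```python
import random
import time

class RateLimited(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 error."""
    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds, parsed from Retry-After if present

def call_with_backoff(call, max_retries=5, base=0.5, cap=8.0, sleep=time.sleep):
    """Retry `call` on RateLimited using capped exponential backoff with jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimited as err:
            if attempt == max_retries:
                raise  # retry budget exhausted: give up instead of hammering
            if err.retry_after is not None:
                delay = err.retry_after  # honour the server's hint
            else:
                # Random delay between 0 and the capped exponential ceiling.
                delay = random.uniform(0, min(cap, base * 2 ** attempt))
            sleep(delay)

# Usage with a fake call that is rate-limited twice, then succeeds.
attempts = []

def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimited(retry_after=0)
    return "ok"

delays = []
result = call_with_backoff(flaky, sleep=delays.append)
print(result, len(attempts), delays)
```

In a real loop you would pass a function that performs the API request and leave sleep at its default. A concurrency limiter and a request cache, also mentioned above, would sit around this helper and are not shown here.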