Parallel tool calls — the latency multiplier

medium

Learn with your AI

Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.

Open in Claude Open in ChatGPT

Why this matters

By default, an agent calls tools sequentially: call 1 returns, model thinks, call 2 returns, model thinks. For independent tools, that's wasted time — parallel execution cuts latency dramatically. Modern Claude / GPT can emit multiple tool_use blocks in a single response; you just need to handle them concurrently and return all results in one tool_result message. This is a free 2-5x latency improvement on multi-tool queries.

Demo

When the model emits 2-4 tool_use blocks in one response, run them in parallel (asyncio, threadpool, whatever your stack uses). Collect all results. Send them back as a list of tool_result blocks in one message. The model now has all the data and produces the answer in one final call. Compare: 'tell me the weather in Paris and Berlin and the time in Tokyo' takes 1 LLM call + 3 parallel tool calls + 1 LLM call = ~3 seconds vs 6+ seconds sequential.

Try it yourself

Switch to async tool execution. Most agent loops are sync by default; this is a one-day refactor that 2-5x's perceived latency.
Test with a query that should trigger parallel tools ('what's the weather in 3 cities?'). Verify the model emits multiple tool_use blocks and your runtime handles them.
Watch for tools with shared state (DB writes, rate limits). Parallel calls to those can collide; add per-tool locks or use one tool that batches.
Add disable_parallel_tool_use=True for any flow where ordering matters (e.g. sequential writes). Most flows don't need it.

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

What does parallel tool calling do and what's the latency win?

2. Why it works (the mechanism)

Walk me through how the model decides to emit multiple tool_use blocks vs sequential. Does it depend on the prompt, the model, or the task?

3. Advanced — application & what's next

I have an agent that calls 3 read tools then 1 write tool. Help me design the orchestration so reads parallelize and the write is sequential, and the model can't accidentally parallelize the write.