Build a small tool that, given a model's context window, a system prompt, tool schemas, and a target output reservation, reports the variable token budget and a recommended history/retrieval split. Run it across three real features you care about and write up which lever (select, order, compress, cache, or measure) each feature needs most.
Count tokens with messages.count_tokens; never estimate them from string length in production.
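As a starting point, the budget arithmetic might be sketched as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the names `Budget`, `plan_budget`, and `history_percent` are hypothetical, the 60/40 default split is an arbitrary placeholder to tune per feature, and the token counts are passed in as plain integers on the assumption that they were measured beforehand with a real token counter rather than derived from string length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Result of the budget calculation (hypothetical structure)."""
    variable_tokens: int   # tokens left over for history plus retrieval
    history_tokens: int    # share of the variable budget given to history
    retrieval_tokens: int  # remainder given to retrieved context


def plan_budget(
    context_window: int,
    system_tokens: int,
    tool_schema_tokens: int,
    output_reservation: int,
    history_percent: int = 60,  # assumed default, not a recommendation
) -> Budget:
    """Subtract the fixed costs from the window and split what remains.

    All token counts are assumed to come from a real token counter,
    measured ahead of time; nothing here estimates tokens from text.
    """
    if not 0 <= history_percent <= 100:
        raise ValueError("history_percent must be between 0 and 100")

    # Fixed costs are paid on every request, whatever the conversation holds.
    fixed = system_tokens + tool_schema_tokens + output_reservation
    variable = context_window - fixed
    if variable < 0:
        raise ValueError(
            f"fixed costs ({fixed}) exceed the context window ({context_window})"
        )

    # Integer arithmetic keeps the split exact; retrieval takes the remainder
    # so the two parts always sum to the variable budget.
    history = variable * history_percent // 100
    return Budget(variable, history, variable - history)


if __name__ == "__main__":
    # Example inputs are illustrative only.
    print(plan_budget(200_000, 1_500, 3_000, 8_000))
```

With those illustrative inputs the fixed costs total 12,500 tokens, leaving 187,500 variable tokens. Keeping the counts as inputs separates measurement from planning, so the same function can be rerun for each of the three features with their own measured numbers.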