<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Llm-Cost-Optimization on René Zander | AI Automation Consultant</title><link>https://renezander.com/tags/llm-cost-optimization/</link><description>Recent content in Llm-Cost-Optimization on René Zander | AI Automation Consultant</description><generator>Hugo</generator><language>en</language><lastBuildDate>Mon, 23 Mar 2026 08:00:00 +0100</lastBuildDate><atom:link href="https://renezander.com/tags/llm-cost-optimization/index.xml" rel="self" type="application/rss+xml"/><item><title>Claude API Prompt Caching: When It Saves Money and When It Doesn't</title><link>https://renezander.com/blog/claude-api-prompt-caching/</link><pubDate>Mon, 23 Mar 2026 08:00:00 +0100</pubDate><guid>https://renezander.com/blog/claude-api-prompt-caching/</guid><description>&lt;p&gt;I have an agent that reads a 15,000-token knowledge base on every turn. Multi-turn conversation, roughly 40 calls per user session. Without caching, every turn pays the full input token cost again for a context prefix that never changes. With Claude API prompt caching, the knowledge base is written once at 1.25x the input price, and every subsequent read within the cache's five-minute lifetime (refreshed on each hit) costs 0.1x. By the second call you are already ahead: 1.25x + 0.1x = 1.35x versus 2.0x uncached. That math is the whole reason the feature exists.&lt;/p&gt;</description></item></channel></rss>