<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>LLM Infrastructure on René Zander | AI Automation Consultant</title><link>https://renezander.com/tags/llm-infrastructure/</link><description>Recent content in LLM Infrastructure on René Zander | AI Automation Consultant</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 27 Mar 2026 10:00:00 +0100</lastBuildDate><atom:link href="https://renezander.com/tags/llm-infrastructure/index.xml" rel="self" type="application/rss+xml"/><item><title>Claude Extended Thinking: budget_tokens &amp; Output Token Costs</title><link>https://renezander.com/blog/claude-extended-thinking/</link><pubDate>Fri, 27 Mar 2026 10:00:00 +0100</pubDate><guid>https://renezander.com/blog/claude-extended-thinking/</guid><description>&lt;p>The first time I turned on Claude extended thinking for a real agent, the run went from 4 seconds to 47. The output was better. The bill was worse. That tradeoff is the whole story.&lt;/p>
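&lt;p>That tradeoff is easy to put in numbers. Below is a minimal sketch: the &lt;code>thinking&lt;/code> parameter with &lt;code>budget_tokens&lt;/code> matches the Anthropic Messages API request shape, but the model id and the $15/MTok output rate are illustrative assumptions, not current pricing.&lt;/p>

```python
# Sketch of enabling Claude extended thinking and estimating its cost.
# The `thinking` parameter with `budget_tokens` is the Messages API knob;
# the model id and per-token rate below are illustrative assumptions.

# Request shape: budget_tokens caps the thinking block and must be
# smaller than max_tokens.
request = {
    "model": "claude-sonnet-4-20250514",  # assumed model id
    "max_tokens": 16_000,
    "thinking": {"type": "enabled", "budget_tokens": 8_000},
    "messages": [{"role": "user", "content": "Plan the rollout."}],
}

def output_cost_usd(thinking_tokens: int, answer_tokens: int,
                    rate_per_mtok: float) -> float:
    """Thinking tokens bill at the same rate as ordinary output tokens."""
    return (thinking_tokens + answer_tokens) / 1_000_000 * rate_per_mtok

# At an assumed $15/MTok output rate, 6,000 thinking tokens plus a
# 1,000-token answer cost 7x the same answer with thinking off.
with_thinking = output_cost_usd(6_000, 1_000, 15.0)  # 0.105
no_thinking = output_cost_usd(0, 1_000, 15.0)        # 0.015
```

&lt;p>The request dict only shows the call shape; the cost function is the point: every thinking token lands on the output side of the invoice.&lt;/p>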
&lt;p>Claude extended thinking lets Opus or Sonnet produce a block of visible reasoning tokens before the final answer. You set a budget via budget_tokens, the model spends up to that budget thinking, and you pay for every thinking token it actually produces at the output-token rate. The upside is measurable quality gains on multi-step problems. The downside is latency and cost that scale with how much of that budget the model uses.&lt;/p></description></item></channel></rss>