<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>LLM Infrastructure on René Zander | AI Automation Consultant</title><link>https://renezander.com/tags/llm-infrastructure/</link><description>Recent content in LLM Infrastructure on René Zander | AI Automation Consultant</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 27 Mar 2026 10:00:00 +0100</lastBuildDate><atom:link href="https://renezander.com/tags/llm-infrastructure/index.xml" rel="self" type="application/rss+xml"/><item><title>Claude Extended Thinking: budget_tokens &amp; Output Token Costs</title><link>https://renezander.com/blog/claude-extended-thinking/</link><pubDate>Fri, 27 Mar 2026 10:00:00 +0100</pubDate><guid>https://renezander.com/blog/claude-extended-thinking/</guid><description>&lt;p>The first time I turned on Claude extended thinking for a real agent, the run went from 4 seconds to 47. The output was better. The bill was worse. That tradeoff is the whole story.&lt;/p>
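&lt;p>That tradeoff is easy to put in numbers. Below is a minimal sketch: the &lt;code>thinking&lt;/code> parameter with &lt;code>budget_tokens&lt;/code> matches the Anthropic Messages API request shape, but the model id and the $15/MTok output rate are illustrative assumptions, not current pricing.&lt;/p>

```python
# Sketch of enabling Claude extended thinking and estimating its cost.
# The `thinking` parameter with `budget_tokens` is the Messages API knob;
# the model id and per-token rate below are illustrative assumptions.

# Request shape: budget_tokens caps the thinking block and must be
# smaller than max_tokens.
request = {
    "model": "claude-sonnet-4-20250514",  # assumed model id
    "max_tokens": 16_000,
    "thinking": {"type": "enabled", "budget_tokens": 8_000},
    "messages": [{"role": "user", "content": "Plan the rollout."}],
}

def output_cost_usd(thinking_tokens: int, answer_tokens: int,
                    rate_per_mtok: float) -> float:
    """Thinking tokens bill at the same rate as ordinary output tokens."""
    return (thinking_tokens + answer_tokens) / 1_000_000 * rate_per_mtok

# At an assumed $15/MTok output rate, 6,000 thinking tokens plus a
# 1,000-token answer cost 7x the same answer with thinking off.
with_thinking = output_cost_usd(6_000, 1_000, 15.0)  # 0.105
no_thinking = output_cost_usd(0, 1_000, 15.0)        # 0.015
```

&lt;p>The request dict only shows the call shape; the cost function is the point: every thinking token lands on the output side of the invoice.&lt;/p>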
&lt;p>Claude extended thinking lets Opus or Sonnet produce a block of visible reasoning tokens before the final answer. You set a budget via budget_tokens, the model spends up to that budget thinking, and you pay for every thinking token it actually produces at the output-token rate. The upside is measurable quality gains on multi-step problems. The downside is latency and cost that scale with how much of that budget the model uses.&lt;/p></description></item></channel></rss>