<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Infrastructure on René Zander | AI Automation Consultant</title><link>https://renezander.com/tags/infrastructure/</link><description>Recent content in Infrastructure on René Zander | AI Automation Consultant</description><generator>Hugo</generator><language>en</language><lastBuildDate>Thu, 16 Apr 2026 09:00:00 +0200</lastBuildDate><atom:link href="https://renezander.com/tags/infrastructure/index.xml" rel="self" type="application/rss+xml"/><item><title>Self-Hosted LLM vs API Cost: Break-Even Analysis (2026)</title><link>https://renezander.com/guides/self-hosted-llm-vs-api/</link><pubDate>Thu, 16 Apr 2026 09:00:00 +0200</pubDate><guid>https://renezander.com/guides/self-hosted-llm-vs-api/</guid><description>&lt;p>Every few months a client asks me the same question. &amp;ldquo;We&amp;rsquo;re burning $8k/mo on Claude. Should we self-host Llama?&amp;rdquo; The answer is almost always no, and the reason has nothing to do with whether the model is good enough. It has to do with what a GPU costs when it&amp;rsquo;s idle, and how much engineering time it takes to keep a serving stack healthy at 3am.&lt;/p>
&lt;p>This guide breaks down self-hosted LLM vs API cost with real numbers: Hetzner GPU pricing, RunPod and Lambda hourly rates, Claude Sonnet 4.6 and Haiku 4.5 token pricing, and the break-even points that actually matter. The goal is to give you a decision framework, not a marketing pitch for either side.&lt;/p></description></item><item><title>GPU Cloud Comparison for AI Inference: 2026 Reality Check</title><link>https://renezander.com/guides/gpu-cloud-comparison-ai-inference/</link><pubDate>Sat, 04 Apr 2026 13:00:00 +0200</pubDate><guid>https://renezander.com/guides/gpu-cloud-comparison-ai-inference/</guid><description>&lt;p>You want to run LLM inference in 2026, and the GPU cloud market has fragmented into roughly three camps: developer-first hourly clouds (Lambda, RunPod, Vast.ai), enterprise Kubernetes clouds (CoreWeave, AWS, GCP, Azure), and fixed-price European hosts (Hetzner, Nebius). The right pick depends less on the raw dollar-per-hour number and more on your utilization pattern, your compliance story, and your network egress shape.&lt;/p>
&lt;p>This is the GPU cloud comparison for AI inference that engineers actually use when planning production workloads. I will not pretend there is one winner. The honest answer is that Hetzner dominates for always-on L40S-class inference in the EU, RunPod Secure is the sweet spot for spiky workloads, CoreWeave and the hyperscalers are the only real answer for compliance-heavy H100 SXM, and Vast.ai only earns a spot in the experimentation phase.&lt;/p></description></item><item><title>Your Vector Database Decision Is Simpler Than You Think</title><link>https://renezander.com/blog/your-vector-database-decision-is-simpler-than-you-think/</link><pubDate>Tue, 17 Mar 2026 07:41:59 +0000</pubDate><guid>https://renezander.com/blog/your-vector-database-decision-is-simpler-than-you-think/</guid><description>&lt;p>Every week someone asks which vector database they should use. The answer is almost always &amp;ldquo;it depends on three things,&amp;rdquo; and none of them are throughput benchmarks.&lt;/p>
&lt;p>I run semantic search in production on a single VPS. Over a thousand items indexed, embeddings generated on the same machine, queries return in under a second. But that setup only works because of the constraints I&amp;rsquo;m operating in. Change the constraints and the answer changes completely.&lt;/p></description></item><item><title>I Run 10 AI Agents in Production. They're All Bash Scripts.</title><link>https://renezander.com/blog/i-run-10-ai-agents-in-production-theyre-all-bash-scripts-df2/</link><pubDate>Thu, 12 Mar 2026 14:29:44 +0000</pubDate><guid>https://renezander.com/blog/i-run-10-ai-agents-in-production-theyre-all-bash-scripts-df2/</guid><description>&lt;p>A week ago I wrote about &lt;a href="https://dev.to/renezander030/lots-of-people-are-demoing-ai-agents-almost-nobodys-shipping-them-the-right-way-5c10">shipping AI agents the right way&lt;/a>. That piece was about the harness: quality gates, token economics, multi-model verification. The stuff that separates demos from production.&lt;/p>
&lt;p>It resonated with a lot of people. But I left out the part that actually eats most of my time: keeping the boring stuff running.&lt;/p>
&lt;p>So let me walk you through what production AI agents actually look like when the conference talk is over.&lt;/p></description></item><item><title>Lots Of People Are Demoing AI Agents. Almost Nobody's Shipping Them The Right Way.</title><link>https://renezander.com/blog/lots-of-people-are-demoing-ai-agents-almost-nobodys-shipping-them-the-right-way/</link><pubDate>Wed, 04 Mar 2026 10:56:24 +0000</pubDate><guid>https://renezander.com/blog/lots-of-people-are-demoing-ai-agents-almost-nobodys-shipping-them-the-right-way/</guid><description>&lt;p>Lots of people are demoing AI agents. Almost nobody&amp;rsquo;s shipping them the right way.&lt;/p>
&lt;p>Conference stages are packed with live demos of agents writing Terraform, spinning up Kubernetes clusters, and generating Helm charts on command. The audience claps. The tweet goes viral. And then&amp;hellip; nothing ships.&lt;/p>
&lt;p>Here&amp;rsquo;s the uncomfortable truth: the gap between &amp;ldquo;look what my agent can do&amp;rdquo; and &amp;ldquo;this runs in production every day&amp;rdquo; is enormous. I&amp;rsquo;ve been on both sides. I spent years as an Enterprise Architect watching organizations spin up AI pilots that never graduated. Now I run my own infrastructure with Claude as the core agent — not as a demo, not as a proof of concept, but as the actual engine that keeps things moving.&lt;/p></description></item></channel></rss>