<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Kubernetes on René Zander | AI Automation Consultant</title><link>https://renezander.com/tags/kubernetes/</link><description>Recent content in Kubernetes on René Zander | AI Automation Consultant</description><generator>Hugo</generator><language>en</language><lastBuildDate>Thu, 23 Apr 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://renezander.com/tags/kubernetes/index.xml" rel="self" type="application/rss+xml"/><item><title>Voice AI in Production: From RunPod to Hosted Kubernetes</title><link>https://renezander.com/blog/voice-ai-production-kubernetes/</link><pubDate>Thu, 23 Apr 2026 09:00:00 +0000</pubDate><guid>https://renezander.com/blog/voice-ai-production-kubernetes/</guid><description>&lt;p>Your voice model works in a demo. The same model in production stalls under concurrent load. The model file is identical. So is the GPU card. Only the deployment changed.&lt;/p>
&lt;p>If your TTS service runs on a single RunPod pod, you&amp;rsquo;ve already met this wall. It serves one request per GPU at a time. A crash costs ninety seconds while the model reloads. Failover isn&amp;rsquo;t part of the setup. Your marketing page says &amp;ldquo;generate narration instantly.&amp;rdquo; Your infrastructure says &amp;ldquo;please form an orderly queue.&amp;rdquo;&lt;/p></description></item><item><title>Self-Hosted LLM on Kubernetes: A Production vLLM Deployment</title><link>https://renezander.com/blog/self-hosted-llm-kubernetes/</link><pubDate>Sun, 05 Apr 2026 07:00:00 +0200</pubDate><guid>https://renezander.com/blog/self-hosted-llm-kubernetes/</guid><description>&lt;p>Most teams asking about self-hosted LLM deployments on Kubernetes should not be running Kubernetes for this at all. The honest answer is that vLLM on a single GPU box, wrapped in systemd or Docker Compose, covers more use cases than anyone wants to admit. Kubernetes earns its keep only when you already run it, or when you need horizontal scaling, multi-tenant isolation, or proper rolling deploys across a GPU node pool.&lt;/p></description></item><item><title>n8n Self-Hosting Guide: Docker, Kubernetes, and Bare Metal in Production</title><link>https://renezander.com/blog/n8n-self-hosting-guide/</link><pubDate>Tue, 31 Mar 2026 09:00:00 +0200</pubDate><guid>https://renezander.com/blog/n8n-self-hosting-guide/</guid><description>&lt;p>I have been running n8n self-hosted since 2022 across three different topologies: a single-VPS Docker Compose setup, a small Kubernetes cluster with queue mode, and a bare systemd install on a hardened Debian box. Each one earns its place, and picking the wrong one costs you weekends. This n8n self-hosting guide is the version I wish I had when I started, written for teams that want production stability, not a demo.&lt;/p>
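&lt;p>To make that first topology concrete, here is a minimal Docker Compose sketch of the kind of single-VPS setup described above. It assumes the official n8nio/n8n image with a Postgres backend; the version pins, volume names, and the change-me password are illustrative placeholders, not values taken from the guide itself.&lt;/p>
&lt;pre>&lt;code># docker-compose.yml — minimal n8n + Postgres sketch (placeholders, adjust before use)
services:
  postgres:
    image: postgres:16
    restart: unless-stopped
    environment:
      POSTGRES_USER: n8n
      POSTGRES_PASSWORD: change-me   # placeholder, use a real secret
      POSTGRES_DB: n8n
    volumes:
      - pg_data:/var/lib/postgresql/data

  n8n:
    image: n8nio/n8n:latest
    restart: unless-stopped
    ports:
      - "5678:5678"                  # n8n default UI/API port
    environment:
      DB_TYPE: postgresdb            # point n8n at Postgres instead of SQLite
      DB_POSTGRESDB_HOST: postgres
      DB_POSTGRESDB_DATABASE: n8n
      DB_POSTGRESDB_USER: n8n
      DB_POSTGRESDB_PASSWORD: change-me
    volumes:
      - n8n_data:/home/node/.n8n     # encryption key and local state live here
    depends_on:
      - postgres

volumes:
  pg_data:
  n8n_data:
&lt;/code>&lt;/pre>
&lt;p>&lt;code>docker compose up -d&lt;/code> brings the pair up; everything past this point is where the choice between the three topologies starts to matter.&lt;/p>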
&lt;p>The short verdict up front: run Docker Compose until you physically cannot. Move to Kubernetes only when you already run Kubernetes for other services, or when you are genuinely north of 50,000 executions per day. The bare systemd path exists for people like me who enjoy minimal stacks and want to understand every moving part. All three paths work. The wrong one for your situation will feel like a second job.&lt;/p></description></item></channel></rss>