<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Gpu on René Zander | AI Automation Consultant</title><link>https://renezander.com/tags/gpu/</link><description>Recent content in Gpu on René Zander | AI Automation Consultant</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sun, 05 Apr 2026 07:00:00 +0200</lastBuildDate><atom:link href="https://renezander.com/tags/gpu/index.xml" rel="self" type="application/rss+xml"/><item><title>Self-Hosted LLM on Kubernetes: A Production vLLM Deployment</title><link>https://renezander.com/blog/self-hosted-llm-kubernetes/</link><pubDate>Sun, 05 Apr 2026 07:00:00 +0200</pubDate><guid>https://renezander.com/blog/self-hosted-llm-kubernetes/</guid><description>&lt;p>Most teams asking about self-hosted LLM Kubernetes deployments should not be running Kubernetes for this at all. The honest answer is that vLLM on a single GPU box, wrapped in systemd or Docker Compose, covers more use cases than anyone wants to admit. Kubernetes earns its keep only when you already run it, or when you need horizontal scaling, multi-tenant isolation, or proper rolling deploys across a GPU node pool.&lt;/p></description></item><item><title>GPU Cloud Comparison for AI Inference: 2026 Reality Check</title><link>https://renezander.com/guides/gpu-cloud-comparison-ai-inference/</link><pubDate>Sat, 04 Apr 2026 13:00:00 +0200</pubDate><guid>https://renezander.com/guides/gpu-cloud-comparison-ai-inference/</guid><description>&lt;p>You want to run LLM inference in 2026 and the GPU cloud market has fragmented into roughly three camps: developer-first hourly clouds (Lambda, RunPod, Vast.ai), enterprise Kubernetes clouds (CoreWeave, AWS, GCP, Azure), and fixed-price European hosts (Hetzner, Nebius). The right pick depends less on the raw dollar-per-hour number and more on your utilization pattern, your compliance story, and your network egress shape.&lt;/p>
&lt;p>This is the GPU cloud comparison for AI inference that engineers actually use when planning production workloads. I will not pretend there is one winner. The honest answer is that Hetzner dominates for always-on L40S-class inference in the EU, RunPod Secure is the sweet spot for spiky workloads, CoreWeave and the hyperscalers are the only real answer for compliance-heavy H100 SXM, and Vast.ai only earns a spot in the experimentation phase.&lt;/p></description></item></channel></rss>