<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Vllm on René Zander | AI Automation Consultant</title><link>https://renezander.com/tags/vllm/</link><description>Recent content in Vllm on René Zander | AI Automation Consultant</description><generator>Hugo</generator><language>en</language><lastBuildDate>Wed, 29 Apr 2026 07:00:00 +0200</lastBuildDate><atom:link href="https://renezander.com/tags/vllm/index.xml" rel="self" type="application/rss+xml"/><item><title>Claude Code with Local LLMs and ANTHROPIC_BASE_URL: Ollama, LM Studio, llama.cpp, vLLM</title><link>https://renezander.com/guides/claude-code-local-llm-anthropic-base-url/</link><pubDate>Wed, 29 Apr 2026 07:00:00 +0200</pubDate><guid>https://renezander.com/guides/claude-code-local-llm-anthropic-base-url/</guid><description>&lt;p>&lt;em>Native Anthropic endpoints, tool-call compatibility, and context-window sizing for local Claude Code.&lt;/em>&lt;/p>
&lt;p>&lt;em>Last tested: April 2026. See Changelog at the bottom.&lt;/em>&lt;/p>
&lt;h2 id="tldr-cheat-sheet">TL;DR cheat sheet&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Goal&lt;/th>
 &lt;th>Use&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>MacBook Air&lt;/td>
 &lt;td>Gemma 4 26B-A4B Q4, &lt;strong>32K context&lt;/strong>, LM Studio or Ollama&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>MacBook Pro&lt;/td>
 &lt;td>Gemma 4 26B-A4B Q4 / UD-Q4, &lt;strong>64K context&lt;/strong>, llama.cpp or LM Studio&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Claude Code minimum&lt;/td>
 &lt;td>&lt;strong>32K context&lt;/strong> (anything below is a chat demo)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Best local backend&lt;/td>
 &lt;td>LM Studio or Ollama first; llama.cpp for advanced setups; vLLM for servers&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Avoid&lt;/td>
 &lt;td>8K / 16K context, dense 31B Gemma 4 on 32 GB machines, old llama.cpp builds&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
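&lt;p>Before pointing Claude Code at any of these backends, it is worth confirming that the server actually answers on the Anthropic Messages API at the URL you plan to put into &lt;code>ANTHROPIC_BASE_URL&lt;/code>. Below is a minimal Python sketch using the &lt;code>anthropic&lt;/code> SDK; the port &lt;code>8080&lt;/code> and the model name &lt;code>local-model&lt;/code> are placeholders for whatever your backend exposes.&lt;/p>
&lt;pre>&lt;code class="language-python"># Sanity-check a local Anthropic-compatible endpoint before wiring it into
# Claude Code (e.g. export ANTHROPIC_BASE_URL=http://localhost:8080).
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8080",  # same value you would export as ANTHROPIC_BASE_URL
    api_key="local",                   # local backends usually ignore the key but still require one
)

reply = client.messages.create(
    model="local-model",               # placeholder; use the model name your backend actually serves
    max_tokens=128,
    messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
)
print(reply.content[0].text)
&lt;/code>&lt;/pre>
&lt;p>If this round trip fails, Claude Code will fail against the same endpoint, so it is the cheapest place to debug the base URL and model name before touching any Claude Code settings.&lt;/p>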
&lt;h2 id="the-local-claude-code-rule-of-thumb">The local-Claude-Code rule of thumb&lt;/h2>
&lt;p>Three things decide whether a local Claude Code session works:&lt;/p></description></item><item><title>Self-Hosted LLM on Kubernetes: A Production vLLM Deployment</title><link>https://renezander.com/blog/self-hosted-llm-kubernetes/</link><pubDate>Sun, 05 Apr 2026 07:00:00 +0200</pubDate><guid>https://renezander.com/blog/self-hosted-llm-kubernetes/</guid><description>&lt;p>Most teams asking about self-hosted LLM Kubernetes deployments should not be running Kubernetes for this at all. The honest answer is that vLLM on a single GPU box, wrapped in systemd or Docker Compose, covers more use cases than anyone wants to admit. Kubernetes earns its keep only when you already run it, or when you need horizontal scaling, multi-tenant isolation, or proper rolling deploys across a GPU node pool.&lt;/p></description></item></channel></rss>