<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Ollama on René Zander | AI Automation Consultant</title><link>https://renezander.com/tags/ollama/</link><description>Recent content in Ollama on René Zander | AI Automation Consultant</description><generator>Hugo</generator><language>en</language><lastBuildDate>Wed, 29 Apr 2026 07:00:00 +0200</lastBuildDate><atom:link href="https://renezander.com/tags/ollama/index.xml" rel="self" type="application/rss+xml"/><item><title>Claude Code with Local LLMs and ANTHROPIC_BASE_URL: Ollama, LM Studio, llama.cpp, vLLM</title><link>https://renezander.com/guides/claude-code-local-llm-anthropic-base-url/</link><pubDate>Wed, 29 Apr 2026 07:00:00 +0200</pubDate><guid>https://renezander.com/guides/claude-code-local-llm-anthropic-base-url/</guid><description>&lt;p>&lt;em>Native Anthropic endpoints, tool-call compatibility, and context-window sizing for local Claude Code.&lt;/em>&lt;/p>
&lt;p>&lt;em>Last tested: April 2026. See the Changelog at the bottom.&lt;/em>&lt;/p>
&lt;h2 id="tldr-cheat-sheet">TL;DR cheat sheet&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Scenario&lt;/th>
 &lt;th>Recommendation&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>MacBook Air&lt;/td>
 &lt;td>Gemma 4 26B-A4B Q4, &lt;strong>32K context&lt;/strong>, LM Studio or Ollama&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>MacBook Pro&lt;/td>
 &lt;td>Gemma 4 26B-A4B Q4 / UD-Q4, &lt;strong>64K context&lt;/strong>, llama.cpp or LM Studio&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Claude Code minimum&lt;/td>
 &lt;td>&lt;strong>32K context&lt;/strong> (anything below is a chat demo)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Best local backend&lt;/td>
 &lt;td>LM Studio or Ollama first; llama.cpp for advanced setups; vLLM for servers&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Avoid&lt;/td>
 &lt;td>8K / 16K context, dense 31B Gemma 4 on 32 GB machines, old llama.cpp builds&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
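&lt;p>A minimal sketch of the wiring behind that cheat sheet (the port, dummy token, and model tag below are assumptions; Ollama&amp;rsquo;s default API port is 11434, and the backend must expose an Anthropic-compatible endpoint):&lt;/p>
&lt;pre>&lt;code># Point Claude Code at a local Anthropic-compatible backend instead of api.anthropic.com
export ANTHROPIC_BASE_URL="http://localhost:11434"   # assumption: your backend listens on Ollama's default port
export ANTHROPIC_AUTH_TOKEN="local-dummy"            # local backends typically ignore the value
export ANTHROPIC_MODEL="gemma-4-26b-a4b-q4"          # hypothetical tag; use whatever model you pulled
claude
&lt;/code>&lt;/pre>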
&lt;h2 id="the-local-claude-code-rule-of-thumb">The local-Claude-Code rule of thumb&lt;/h2>
&lt;p>Three things decide whether a local Claude Code session works:&lt;/p></description></item><item><title>Docker Compose AI ML Development Stack: Local LLM, Vector DB, Full YAML</title><link>https://renezander.com/blog/docker-compose-ai-development-stack/</link><pubDate>Fri, 20 Mar 2026 10:00:00 +0100</pubDate><guid>https://renezander.com/blog/docker-compose-ai-development-stack/</guid><description>&lt;p>Every AI project I start now begins the same way: &lt;code>docker compose up -d&lt;/code> and I have Ollama, Qdrant, Postgres, Redis, and a LiteLLM proxy running in under two minutes. No pyenv conflicts, no homebrew drift, no &amp;ldquo;works on my machine&amp;rdquo;. One YAML file, one command, identical stack across my laptop and my dev VPS.&lt;/p>
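&lt;p>A minimal sketch of the shape of that stack (image tags and host ports here are assumptions, not the full YAML from the tutorial):&lt;/p>
&lt;pre>&lt;code>services:
  ollama:
    image: ollama/ollama            # local LLM runtime, API on 11434
    ports: ["11434:11434"]
  qdrant:
    image: qdrant/qdrant            # vector DB, REST API on 6333
    ports: ["6333:6333"]
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: dev        # dev-only credential
  redis:
    image: redis:7
  litellm:
    image: ghcr.io/berriai/litellm  # OpenAI-compatible proxy in front of the local models
    ports: ["4000:4000"]
&lt;/code>&lt;/pre>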
&lt;p>This is a tutorial for a full Docker Compose AI/ML development stack. Copy the YAML, run it, pull a model, and start building. I use this exact layout for prototyping RAG pipelines, testing MCP servers, and running my cron-driven Claude agents before they ship to production.&lt;/p></description></item></channel></rss>