LiteLLM — Connecting Claude Code to Local LLMs
Use LiteLLM as a translation proxy: Claude Code speaks Anthropic format, local models speak OpenAI format. LiteLLM bridges the two.
Required config.yaml
model_list:
  # Map all Claude model names → your local model
  - model_name: claude-sonnet-4-20250514
    litellm_params:
      model: openai/qwen3_8b # or deepseek-r1-distill-qwen-32B, etc.
      api_base: http://192.168.x.x:8000/v1
      api_key: "your-api-key"
      supports_system_message: false
      timeout: 300
      stream_timeout: 300
  - model_name: claude-opus-4-20250514
    litellm_params:
      model: openai/qwen3_8b
      api_base: http://192.168.x.x:8000/v1
      api_key: "your-api-key"
      supports_system_message: false
  - model_name: claude-haiku-4-5-20251001
    litellm_params:
      model: openai/qwen3_8b
      api_base: http://192.168.x.x:8000/v1
      api_key: "your-api-key"
      supports_system_message: false
general_settings:
  enable_anthropic_pass_through: true # REQUIRED: enables the /v1/messages endpoint

⚠️ Claude Code requests models by exact name (claude-sonnet-4-20250514). If that alias is missing from the config, you get a 404 / auth error.
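
Because the three entries differ only in model_name, the list can be generated instead of copied by hand. The following is a minimal sketch, not part of LiteLLM: gen_model_list is a hypothetical helper, and the arguments are the placeholder values from the config above.

```shell
#!/usr/bin/env bash
# Sketch: print a model_list with one entry per Claude alias, all pointing
# at the same local backend. gen_model_list is a hypothetical helper name.

gen_model_list() {
  local backend_model="$1" api_base="$2" api_key="$3"
  shift 3
  local name
  echo "model_list:"
  for name in "$@"; do
    printf '  - model_name: %s\n' "$name"
    printf '    litellm_params:\n'
    printf '      model: %s\n' "$backend_model"
    printf '      api_base: %s\n' "$api_base"
    printf '      api_key: "%s"\n' "$api_key"
    printf '      supports_system_message: false\n'
  done
}

# Example (values copied from the config above):
# gen_model_list openai/qwen3_8b http://192.168.x.x:8000/v1 your-api-key \
#   claude-sonnet-4-20250514 claude-opus-4-20250514 claude-haiku-4-5-20251001
```

The sketch prints only the model_list section; the timeout lines and the general_settings block still have to be added by hand.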
Start LiteLLM
litellm --config config.yaml --port 4000

Or via Docker Compose (recommended):
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports: ["4000:4000"]
    volumes:
      - ./config.yaml:/app/config.yaml
    env_file: .env
    command: ["--config", "/app/config.yaml", "--port", "4000"]

Connect Claude Code
export ANTHROPIC_BASE_URL="http://172.18.0.1:4000"
export ANTHROPIC_AUTH_TOKEN="your-litellm-key"
export DISABLE_PROMPT_CACHING=1
export DISABLE_INTERLEAVED_THINKING=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1

Add these lines to ~/.bashrc or ~/.zshrc to persist them.
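
If exporting these variables for every shell is undesirable, a wrapper function can scope them to a single invocation. This is a minimal sketch: claude_local is a hypothetical name, and the values are the placeholders used above.

```shell
#!/usr/bin/env bash
# Sketch: run Claude Code against the LiteLLM proxy without exporting
# the variables into the surrounding shell. claude_local is a hypothetical name.

claude_local() {
  # env sets the variables only for the claude process it launches.
  env \
    ANTHROPIC_BASE_URL="http://172.18.0.1:4000" \
    ANTHROPIC_AUTH_TOKEN="your-litellm-key" \
    DISABLE_PROMPT_CACHING=1 \
    DISABLE_INTERLEAVED_THINKING=1 \
    CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 \
    claude "$@"
}
```

Calling claude_local passes any arguments straight through to claude, while a plain claude in the same shell keeps whatever environment it had before.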
Verify endpoints
# Test OpenAI format
curl http://172.18.0.1:4000/v1/chat/completions \
  -H "Authorization: Bearer your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"hi"}],"max_tokens":20}'
# Test Anthropic format (what Claude Code uses)
curl http://172.18.0.1:4000/v1/messages \
  -H "Authorization: Bearer your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"hi"}],"max_tokens":20}'

Both should return a JSON response with generated text, not an error object.
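
The two checks can be wrapped in one script that prints only the HTTP status for each endpoint. This is a minimal sketch: PROXY_URL, PROXY_KEY, MODEL and the function names are hypothetical, with defaults taken from the examples above.

```shell
#!/usr/bin/env bash
# Sketch: probe both proxy endpoints and print the HTTP status for each.
# PROXY_URL, PROXY_KEY and MODEL are hypothetical variable names.

PROXY_URL="${PROXY_URL:-http://172.18.0.1:4000}"
PROXY_KEY="${PROXY_KEY:-your-key}"
MODEL="${MODEL:-claude-sonnet-4-20250514}"

# Build the minimal request body used in the curl examples above.
build_body() {
  printf '{"model":"%s","messages":[{"role":"user","content":"hi"}],"max_tokens":20}' "$1"
}

# POST the body to one path and print only the HTTP status code.
# curl prints 000 when it receives no HTTP response (e.g. connection refused).
probe() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 30 \
    -H "Authorization: Bearer $PROXY_KEY" \
    -H "Content-Type: application/json" \
    -d "$(build_body "$MODEL")" \
    "$PROXY_URL$1"
}

# Probe the OpenAI-format and Anthropic-format endpoints in turn.
check_all() {
  local path
  for path in /v1/chat/completions /v1/messages; do
    printf '%s -> %s\n' "$path" "$(probe "$path")"
  done
}

# Usage: check_all
```

A 200 on /v1/chat/completions but not on /v1/messages suggests the problem is on the Anthropic-format side of the proxy, not the backend.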
Common errors
| Error | Cause | Fix |
|---|---|---|
| 404 NotFoundError | Model alias missing | Add the alias to model_list |
| AuthenticationError | api_key env var not loaded | Hardcode the key in config.yaml, or export the variable before starting LiteLLM |
| Timeout / retrying | Cold-start delay (5s+) | Pre-warm the backend before launching claude |
| 29640500m 59s duration | Wrong timestamp handling | n/a (unrelated, see Discord hook note) |
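
For the 404 case, a quick local check shows which aliases are missing from the config before LiteLLM is restarted. This is a minimal sketch using grep instead of a YAML parser; has_alias and missing_aliases are hypothetical helper names, and it assumes each model_name sits on its own line as in the config above.

```shell
#!/usr/bin/env bash
# Sketch: report Claude aliases that have no model_name entry in config.yaml.
# Line-based matching only; this does not parse YAML.

# has_alias CONFIG ALIAS: succeed if CONFIG has a model_name line for exactly ALIAS.
has_alias() {
  grep -Eq "^[[:space:]]*-?[[:space:]]*model_name:[[:space:]]*[\"']?$2[\"']?[[:space:]]*(#.*)?\$" "$1"
}

# missing_aliases CONFIG ALIAS...: print every ALIAS not found in CONFIG.
missing_aliases() {
  local config="$1"
  shift
  local name
  for name in "$@"; do
    has_alias "$config" "$name" || echo "$name"
  done
}

# Usage:
# missing_aliases config.yaml claude-sonnet-4-20250514 \
#   claude-opus-4-20250514 claude-haiku-4-5-20251001
```

Empty output means every listed alias has an entry; any printed name needs a matching model_list entry.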
Cold-start workaround
Pre-warm the model before launching Claude Code:
curl -s http://your-backend/v1/chat/completions \
  -H "Authorization: Bearer key" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3_8b","messages":[{"role":"user","content":"hi"}],"max_tokens":1}' \
  > /dev/null && claude

Or run a background keepalive loop:
while true; do
  curl -s http://your-backend/v1/chat/completions ... > /dev/null
  sleep 20
done &

See also
- efforts/litellm-local-llm — active setup log
- Codex CLI can also use this proxy:
  export OPENAI_BASE_URL=http://172.18.0.1:4000