LiteLLM — Connecting Claude Code to Local LLMs

Use LiteLLM as a translation proxy: Claude Code speaks the Anthropic Messages format, while most local model servers expose an OpenAI-compatible API. LiteLLM translates between the two.

Required config.yaml

model_list:
  # Map all Claude model names → your local model
  - model_name: claude-sonnet-4-20250514
    litellm_params:
      model: openai/qwen3_8b          # or deepseek-r1-distill-qwen-32B, etc.
      api_base: http://192.168.x.x:8000/v1
      api_key: "your-api-key"
      supports_system_message: false
      timeout: 300
      stream_timeout: 300
 
  - model_name: claude-opus-4-20250514
    litellm_params:
      model: openai/qwen3_8b
      api_base: http://192.168.x.x:8000/v1
      api_key: "your-api-key"
      supports_system_message: false
 
  - model_name: claude-haiku-4-5-20251001
    litellm_params:
      model: openai/qwen3_8b
      api_base: http://192.168.x.x:8000/v1
      api_key: "your-api-key"
      supports_system_message: false
 
general_settings:
  enable_anthropic_pass_through: true   # REQUIRED — enables /v1/messages endpoint

⚠️ Claude Code requests models by exact name (for example `claude-sonnet-4-20250514`). If the requested alias is missing from `model_list`, the request fails with a 404 or an authentication error.
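
Because the three entries differ only in the alias, one way to keep them in sync is to generate the list. The following is a minimal sketch, not part of LiteLLM: `gen_model_list` is a hypothetical helper, and it applies the same timeouts to every alias (the config above sets them only on the first entry).

```shell
# Hypothetical helper: print one model_list entry per Claude alias,
# all pointing at the same local backend.
# Usage: gen_model_list <backend_model> <api_base> <api_key> <alias>...
gen_model_list() {
  local backend_model="$1" api_base="$2" api_key="$3"
  shift 3
  echo "model_list:"
  local name
  for name in "$@"; do
    cat <<EOF
  - model_name: ${name}
    litellm_params:
      model: openai/${backend_model}
      api_base: ${api_base}
      api_key: "${api_key}"
      supports_system_message: false
      timeout: 300
      stream_timeout: 300
EOF
  done
}

# Example (general_settings still has to be appended by hand):
# gen_model_list qwen3_8b http://192.168.x.x:8000/v1 your-api-key \
#   claude-sonnet-4-20250514 claude-opus-4-20250514 claude-haiku-4-5-20251001
```

Adding a new alias is then a one-word change instead of a copied block.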

Start LiteLLM

litellm --config config.yaml --port 4000

Or via Docker Compose (recommended):

services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports: ["4000:4000"]
    volumes:
      - ./config.yaml:/app/config.yaml
    env_file: .env
    command: ["--config", "/app/config.yaml", "--port", "4000"]
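
The container can take a few seconds before it accepts requests, so it may help to wait for the proxy before moving on. A minimal sketch: `wait_until` is a hypothetical helper that retries any command, and the health path in the example is an assumption to check against your LiteLLM version.

```shell
# Hypothetical helper: retry a command once per second until it succeeds
# or the time budget (in seconds) runs out.
# Usage: wait_until <budget_seconds> <command> [args...]
wait_until() {
  local budget="$1"
  shift
  local waited=0
  until "$@"; do
    if [ "$waited" -ge "$budget" ]; then
      return 1   # gave up: command never succeeded within the budget
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Example (the /health/liveliness path is an assumption):
# docker compose up -d
# wait_until 60 curl -sf -o /dev/null http://localhost:4000/health/liveliness \
#   && echo "LiteLLM is up"
```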

Connect Claude Code

export ANTHROPIC_BASE_URL="http://172.18.0.1:4000"
export ANTHROPIC_AUTH_TOKEN="your-litellm-key"
export DISABLE_PROMPT_CACHING=1
export DISABLE_INTERLEAVED_THINKING=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1

Add these lines to ~/.bashrc or ~/.zshrc to persist them across sessions.
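
A forgotten `export` is easy to miss, so a quick preflight can save a confusing auth error. This is a sketch only; `check_claude_env` is a hypothetical name, and it checks just the two connection variables from the list above.

```shell
# Hypothetical preflight: fail if a variable Claude Code needs is not exported.
check_claude_env() {
  local missing=0 var
  for var in ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN; do
    # printenv only sees exported variables, which is what `claude` inherits
    if [ -z "$(printenv "$var")" ]; then
      echo "not exported: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
# check_claude_env && claude
```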

Verify endpoints

# Test OpenAI format
curl http://172.18.0.1:4000/v1/chat/completions \
  -H "Authorization: Bearer your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"hi"}],"max_tokens":20}'

# Test Anthropic format (what Claude Code uses)
curl http://172.18.0.1:4000/v1/messages \
  -H "Authorization: Bearer your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"hi"}],"max_tokens":20}'

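Both should return a JSON completion rather than an error: the first in OpenAI chat-completion format, the second in Anthropic message format.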
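
To check both routes in one pass, the two requests can be wrapped in small functions that print only the HTTP status. A sketch under the same assumptions as the commands above (same host, key, and alias); `build_payload` and `probe` are hypothetical names.

```shell
# Hypothetical helper: build the minimal request body used above.
# $1 = model alias as listed in model_list
build_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"hi"}],"max_tokens":20}' "$1"
}

# Hypothetical helper: send the payload and print only the HTTP status code.
# $1 = base URL, $2 = path, $3 = key, $4 = model alias
probe() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 60 "$1$2" \
    -H "Authorization: Bearer $3" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$4")"
}

# Example:
# for path in /v1/chat/completions /v1/messages; do
#   echo "$path -> $(probe http://172.18.0.1:4000 "$path" your-key claude-sonnet-4-20250514)"
# done
```

A 200 on the first path but not the second suggests the Anthropic route is the problem rather than the backend.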

Common errors

| Error | Cause | Fix |
| --- | --- | --- |
| `404 NotFoundError` | Model alias missing | Add alias to `model_list` |
| `AuthenticationError` | `api_key` env var not loaded | Hardcode in config or `export` before starting |
| Timeout / retrying | Cold-start delay (5s+) | Pre-warm backend before launching `claude` |
| `29640500m 59s` duration | Wrong timestamp handling | n/a (unrelated, see Discord hook note) |

Cold-start workaround

Pre-warm the model before launching Claude Code:

curl -s http://your-backend/v1/chat/completions \
  -H "Authorization: Bearer key" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3_8b","messages":[{"role":"user","content":"hi"}],"max_tokens":1}' \
  > /dev/null && claude

Or run a background keepalive loop:

while true; do
  curl -s http://your-backend/v1/chat/completions ... > /dev/null
  sleep 20
done &
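
A loop started with `&` keeps running after Claude Code exits. One possible variant ties the loop's lifetime to the session; this is a sketch, `with_keepalive` is a hypothetical wrapper, and the ping command is whatever request you use to warm the backend.

```shell
# Hypothetical wrapper: run a command while a keepalive loop pings in the
# background, then stop the loop when the command exits.
# Usage: with_keepalive <interval_seconds> <ping_command_string> <command> [args...]
with_keepalive() {
  local interval="$1" ping_cmd="$2"
  shift 2
  # Background loop; its own output is discarded
  ( while true; do sh -c "$ping_cmd"; sleep "$interval"; done ) >/dev/null 2>&1 &
  local loop_pid=$!
  local status=0
  "$@" || status=$?
  # Stop the loop and reap it; ignore errors if it already exited
  kill "$loop_pid" 2>/dev/null || true
  wait "$loop_pid" 2>/dev/null || true
  return "$status"
}

# Example (PING_CMD holds your own pre-warm request as a string):
# with_keepalive 20 "$PING_CMD" claude
```

The wrapper returns the exit status of the wrapped command, so it can be used in place of a bare `claude` call.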

Phuriwaj
