Open Source Coding LLMs (30B–70B Range)

Best locally-runnable coding models as of mid-2026, focused on the 30B–70B parameter range — the practical sweet spot between quality and hardware requirements.

Why / When to Use

Use when selecting a local LLM as a Claude Code alternative, for agentic workflows (n8n, OpenACP), or for offline coding assistance. These models run in Ollama or llama.cpp on consumer/prosumer GPUs.

Core Concept / Commands

Running in Ollama

# Pull and run a model
ollama pull qwen2.5-coder:32b
ollama run qwen2.5-coder:32b

# List downloaded models
ollama list

# List models currently loaded in memory
ollama ps

# Pull a specific quantization
ollama pull llama3.3:70b-instruct-q4_K_M
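A 70B pull is tens of gigabytes, so it can help to check whether a model is already downloaded before pulling it again. The sketch below is one way to do that; `model_is_pulled` and `ensure_model` are hypothetical helper names, and it assumes `ollama list` prints a header line followed by rows whose first column is the full model tag.

```shell
# Reads `ollama list` output on stdin and succeeds if the given tag
# appears in the first column (the first line is assumed to be the header).
model_is_pulled() {
  awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit found ? 0 : 1 }'
}

# Hypothetical helper: pull the model only when it is missing locally.
ensure_model() {
  if ollama list | model_is_pulled "$1"; then
    echo "$1 already present"
  else
    ollama pull "$1"
  fi
}
```

Usage: `ensure_model qwen2.5-coder:32b`. The match is exact, so pass the full tag including the part after the colon.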

Key Options / Variants

Model	Size	Focus	VRAM (Q4)	License
Qwen2.5-Coder 32B	32B	Code-specialized	~20GB	Apache 2.0
Llama 3.3 70B	70B	General + code	~40GB	Llama 3.3 Community License
DeepSeek R1 70B distill	70B	Reasoning + debug	~40GB	MIT
Kimi-Dev-72B	72B	SWE / repo-level bugs	~45GB	Modified MIT
Nemotron Nano 30B	30B	Fast inference	~20GB	NVIDIA
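The VRAM column can be sanity-checked with simple arithmetic: weight size in GB is roughly parameters (in billions) times bits per weight, divided by 8. The sketch below assumes about 4.5 bits per weight for a Q4-class quantization, which is an illustrative figure rather than an exact one; `estimate_weights_gb` is a hypothetical helper name. The result covers weights only, so real usage is higher once the KV cache and runtime overhead are added.

```shell
# Weights-only size in GB: parameters (billions) * bits per weight / 8.
# Does not include KV cache or runtime overhead.
estimate_weights_gb() {
  awk -v params_b="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", params_b * bits / 8 }'
}

estimate_weights_gb 32 4.5   # prints 18.0
estimate_weights_gb 70 4.5   # prints 39.4
```

These weights-only figures sit a little below the table's numbers, which is expected since the table leaves room for context.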

Recommended by Use Case

Pure coding tasks → Qwen2.5-Coder 32B (Apache 2.0, fits a 24GB GPU, best coding benchmark scores in the 30B range)
General coding + reasoning → Llama 3.3 70B (needs ~40GB VRAM, Llama 3.3 Community License)
Debugging / math / chain-of-thought → DeepSeek R1 70B distill (MIT, reasoning built in)
Repo-level bug fixing (SWE-bench) → Kimi-Dev-72B (needs ~45GB VRAM, best SWE-bench score at this size)
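For scripts that choose a model automatically, the mapping above can be written as a small lookup. This is only a sketch: `recommend_model` and its use-case keywords (`coding`, `general`, `debugging`, `repo`) are hypothetical names, and the function returns the model names from this note, not Ollama tags.

```shell
# Maps a use-case keyword to the model recommended in this note.
recommend_model() {
  case "$1" in
    coding)    echo "Qwen2.5-Coder 32B" ;;
    general)   echo "Llama 3.3 70B" ;;
    debugging) echo "DeepSeek R1 70B distill" ;;
    repo)      echo "Kimi-Dev-72B" ;;
    *)         echo "unknown use case: $1" >&2; return 1 ;;
  esac
}
```

Usage: `recommend_model coding`.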

For Local + Agentic Workflows (Claude Code, n8n)

Practical sweet spot: Qwen2.5-Coder 32B

Fits on a single 24GB GPU at Q4
Apache 2.0 — clean commercial use
Best coding benchmark scores in the 30B range

Quality ceiling if hardware allows: Kimi-Dev-72B or Llama 3.3 70B

Need 40–45GB VRAM or aggressive quantization
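The hardware split above can be turned into a quick check against the GPU's reported memory. The sketch below is a rough guide only: `pick_by_vram` is a hypothetical helper, and the thresholds are the Q4 VRAM figures from the table (~20GB, ~40GB, ~45GB) converted to MiB with no extra headroom for long contexts.

```shell
# Takes total GPU memory in MiB and prints the largest tier from this note
# that should fit at Q4. Thresholds: 20, 40 and 45 GB expressed in MiB.
pick_by_vram() {
  vram_mib="$1"
  if [ "$vram_mib" -ge 46080 ]; then
    echo "Kimi-Dev-72B or Llama 3.3 70B"
  elif [ "$vram_mib" -ge 40960 ]; then
    echo "Llama 3.3 70B"
  elif [ "$vram_mib" -ge 20480 ]; then
    echo "Qwen2.5-Coder 32B"
  else
    echo "smaller model or more aggressive quantization"
  fi
}
```

On an NVIDIA system, the input can come from `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1`, which reports the first GPU's memory in MiB.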

Gotchas

Llama 3.3 70B at Q4_K_M needs ~40GB VRAM, so it won’t fit on a 24GB GPU without more aggressive quantization or offloading some layers to CPU
DeepSeek R1 distill variants (8B, 14B, 32B) exist if 70B is too large
“Modified MIT” on Kimi-Dev-72B — check terms for commercial use before deploying
The Llama 3.3 Community License allows most commercial use, but products with more than 700 million monthly active users need a separate license from Meta

Source

Conversation: “Best open source LLM for coding”, 2026-06-02
Web sources: Whatllm, Hugging Face, Siliconflow