Phuriwaj

Open Source Coding LLMs (30B–70B Range)

Best locally-runnable coding models as of mid-2026, focused on the 30B–70B parameter range — the practical sweet spot between quality and hardware requirements.

Why / When to Use

Use when selecting a local LLM as a Claude Code alternative, for agentic workflows (n8n, OpenACP), or for offline coding assistance. These models run in Ollama or llama.cpp on consumer/prosumer GPUs.

Core Concept / Commands

Running in Ollama

# Pull and run a model
ollama pull qwen2.5-coder:32b
ollama run qwen2.5-coder:32b
 
# List downloaded models (use "ollama ps" to see models currently loaded in memory)
ollama list
 
# Run with specific quantization
ollama pull llama3.3:70b-instruct-q4_K_M
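
Agentic tools generally talk to Ollama over its local HTTP API rather than through the CLI. The following is a minimal sketch, assuming the Ollama server is running on its default address (http://localhost:11434) and the model has already been pulled. `build_payload` and `ask` are hypothetical helper names, and the payload builder does no JSON escaping, so it only suits prompts without double quotes or backslashes.

```shell
#!/bin/sh
# Sketch: send one non-streaming prompt to a local Ollama server.
# Assumption: Ollama is listening on its default address.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# build_payload MODEL PROMPT
# Prints the JSON body for POST /api/generate.
# No escaping is done: PROMPT must not contain double quotes or backslashes.
build_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$1" "$2"
}

# ask MODEL PROMPT
# Posts the payload and prints the raw JSON reply
# (the generated text is in its "response" field).
ask() {
  curl -s "$OLLAMA_URL/api/generate" -d "$(build_payload "$1" "$2")"
}
```

Example call: `ask qwen2.5-coder:32b "Write a Python function that reverses a string"`.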

Key Options / Variants

Model                   | Size | Focus                 | VRAM (Q4) | License
Qwen2.5-Coder 32B       | 32B  | Code-specialized      | ~20GB     | Apache 2.0
Llama 3.3 70B           | 70B  | General + code        | ~40GB     | Meta
DeepSeek R1 70B distill | 70B  | Reasoning + debug     | ~40GB     | MIT
Kimi-Dev-72B            | 72B  | SWE / repo-level bugs | ~45GB     | Modified MIT
Nemotron Nano 30B       | 30B  | Fast inference        | ~20GB     | NVIDIA

  • Pure coding tasks → Qwen2.5-Coder 32B (Apache 2.0, 24GB GPU, best coding benchmark scores in the 30B range)
  • General coding + reasoning → Llama 3.3 70B (40GB GPU, Meta license)
  • Debugging / math / chain-of-thought → DeepSeek R1 70B distill (MIT, reasoning built in)
  • Repo-level bug fixing (SWE-bench) → Kimi-Dev-72B (45GB GPU, best SWE score at this size)
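
The VRAM figures can be sanity-checked with back-of-the-envelope arithmetic: weight memory is roughly parameter count times bits per weight, divided by 8. This is a rough sketch, assuming about 4.5 bits per weight for a Q4-class quantization (an assumption, not a figure from the sources; real quantized files vary) and ignoring KV cache and runtime overhead, which grow with context length. `estimate_weights_gb` is a hypothetical helper name.

```shell
#!/bin/sh
# estimate_weights_gb PARAMS_BILLIONS [BITS_PER_WEIGHT]
# Prints the approximate size of the quantized weights in GB (1 GB = 10^9 bytes).
# Assumption: 4.5 bits per weight stands in for "Q4"; real quantized files vary.
# KV cache and runtime overhead are NOT included.
estimate_weights_gb() {
  awk -v p="$1" -v b="${2:-4.5}" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_weights_gb 32   # Qwen2.5-Coder 32B -> 18.0
estimate_weights_gb 70   # Llama 3.3 70B     -> 39.4
estimate_weights_gb 72   # Kimi-Dev-72B      -> 40.5
```

The estimates come out somewhat below the listed VRAM figures, which plausibly leave room for overhead beyond the weights themselves.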

For Local + Agentic Workflows (Claude Code, n8n)

Practical sweet spot: Qwen2.5-Coder 32B

  • Fits on single 24GB GPU at Q4
  • Apache 2.0 — clean commercial use
  • Best coding benchmark in the 30B range

Quality ceiling if hardware allows: Kimi-Dev-72B or Llama 3.3 70B

  • Need 40–45GB VRAM or aggressive quantization
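
The hardware rule of thumb above can be turned into a small selection helper. This is only a sketch: the thresholds are this note's approximate Q4 VRAM figures (~20GB and ~40GB), `pick_model` is a hypothetical name, and Kimi-Dev-72B is left out because this note does not give an Ollama tag for it.

```shell
#!/bin/sh
# pick_model VRAM_GB
# Prints the Ollama tag this note suggests for the given VRAM (whole GB).
# Thresholds are the note's approximate Q4 figures, not measured values.
pick_model() {
  vram_gb=$1
  if [ "$vram_gb" -ge 40 ]; then
    echo "llama3.3:70b-instruct-q4_K_M"
  elif [ "$vram_gb" -ge 20 ]; then
    echo "qwen2.5-coder:32b"
  else
    echo "none"   # under ~20GB: consider a smaller model or heavier quantization
  fi
}
```

On an NVIDIA system, the input could come from `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1`, which reports MiB for the first GPU; divide by 1024 before passing it to `pick_model`. A nominal 24GB card reports slightly under 24 after integer division, which is why the lower threshold uses the ~20GB figure rather than 24.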

Gotchas

  • Llama 3.3 70B at Q4_K_M = ~40GB VRAM — won’t fit on 24GB GPUs without further quantization
  • DeepSeek R1 distill variants (8B, 14B, 32B) exist if 70B is too large
  • “Modified MIT” on Kimi-Dev-72B — check terms for commercial use before deploying
  • Meta license on Llama 3.3 70B (the Llama 3.3 Community License) allows most commercial use, but products with more than 700 million monthly active users need a separate license from Meta

Source

Conversation: “Best open source LLM for coding”, 2026-06-02
Web sources: Whatllm, Hugging Face, Siliconflow