For AI coding, prioritize large context windows (to process entire codebases), tool calling support, and strong instruction following. The best free coding models include Codestral (Mistral, purpose-built for code), DeepSeek V4, Qwen3-Coder, and Gemini 2.5 Flash (1M context). Models are ranked below by context window and rate limit.
What to Look for in a Coding Model
Not all LLMs are equally good at coding. Here's what separates a coding model from a general-purpose one:
- Context window — The single most important spec for coding. Modern codebases easily exceed 50K tokens. A model with less than 32K context will struggle with multi-file edits, code review, or understanding project structure. Look for at least 128K tokens; 256K+ is ideal for monorepo work.
- Fill-in-the-Middle (FIM) — A specialized training objective where the model learns to fill a gap between prefix and suffix code. Essential for inline code completion in IDEs. Codestral and DeepSeek Coder variants are trained with FIM.
- Tool calling / function calling — Required for agentic coding workflows: "find all files that import X, then refactor them to use Y." Without tool calling, the model can only suggest code, not execute actions. Most OpenAI-compatible endpoints support tool calling if the underlying model does.
- Instruction following — Coding requires precise, unambiguous outputs. Models that drift or hallucinate will introduce bugs. DeepSeek V4 and Qwen3 score particularly well on instruction-following benchmarks.
- Max output tokens — Generating a full file or multiple functions in one shot requires high output limits. 8K output is the practical minimum; 16K+ lets the model generate entire modules at once.
How to Choose a Free Coding Model
Your pick depends on how you code:
- Using Claude Code or Cursor? → Prioritize context window and tool calling. Gemini 2.5 Flash (1M ctx) or DeepSeek V4 (256K) let the agent see your whole project. Both support tool calling via OpenAI-compatible endpoints.
- Inline completion in VS Code / JetBrains? → Look for FIM support. Codestral (Mistral) is purpose-built for this. DeepSeek Coder variants also support FIM.
- Code review / PR review? → Large context is critical — the diff + surrounding code + review guidelines all need to fit in one prompt. Gemini 2.5 Flash's 1M context handles this with room to spare.
- Learning to code? → Prioritize helpfulness and explanation quality. Qwen3 and Llama 3.3 70B are known for clear, educational code explanations.
- Rate limit sensitive? → NVIDIA NIM has 40 RPM with no daily cap, ideal for heavy coding sessions. Groq has 30 RPM / 14,400 RPD — enough for most solo developers.
Try models in the Playground with a real coding task before committing — the same benchmark scores don't always match your specific language or framework.
Top Picks for Coding
1M context, multimodal, strong all-round coding. Free tier: 10 RPM, 250 RPD.
DeepSeek: DeepSeek V4 Flash (free) OpenRouter256K context, latest-gen coding model. Strong instruction following, FIM support.
Codestral Mistral AIPurpose-built for code. 256K context, FIM support, no credit card required.
Qwen: Qwen3 Coder 480B A35B (free) OpenRouterMassive 480B MoE model specialized for code. 262K context.
All Free Coding Models
| Provider | Model | Context | Max Output | Modality | Rate Limit | Released | |
|---|---|---|---|---|---|---|---|
| OpenRouter | Cohere: North Mini Code (free) | 256K | 64K | 200 req/day (free tier) | Jun 9, 2026 | Details | |
| OpenRouter | OpenAI: gpt-oss-safeguard-20b | 131K | 66K | 200 req/day (free tier) | Oct 29, 2025 | Details | |
| OpenRouter | OpenAI: gpt-oss-120b (free) | 131K | 131K | 200 req/day (free tier) | Aug 5, 2025 | Details | |
| OpenRouter | OpenAI: gpt-oss-20b (free) | 131K | 33K | 200 req/day (free tier) | Aug 5, 2025 | Details | |
| OpenRouter | Qwen: Qwen3 Coder 480B A35B (free) | 1.0M | 262K | 200 req/day (free tier) | Jul 23, 2025 | Details | |
| Mistral AI | Codestral | 256K | 256K | ~1 RPS, 500K TPM | — | Details | |
| Cerebras | gpt-oss-120b | 128K | 8K | 30 RPM, 14,400 RPD, 1M TPD | Aug 5, 2025 | Details | |
| Cloudflare Workers AI | @cf/openai/gpt-oss-120b | 128K | 131K | 10K neurons/day (shared) | — | Details | |
| Cloudflare Workers AI | @cf/moonshotai/kimi-k2.7-code | 262K | 131K | 10K neurons/day (shared) | — | Details | |
| Kilo Code | x-ai/grok-code-fast-1:free | 256K | 131K | ~200 req/hr | Aug 28, 2025 | Details | |
| LLM7.io | qwen2.5-coder-32b | 131K | 131K | 30 RPM (120 with token) | Nov 11, 2024 | Details | |
| Ollama Cloud | gpt-oss:120b-cloud | 128K | 131K | Session/weekly limits (unpublished) | — | Details | |
| Ollama Cloud | qwen3-coder:480b-cloud | 128K | 131K | Session/weekly limits (unpublished) | — | Details | |
| OVHcloud AI Endpoints | gpt-oss-20b | 128K | 8K | 2 RPM (anonymous) | Aug 5, 2025 | Details | |
| OVHcloud AI Endpoints | Qwen3-Coder-30B-A3B-Instruct | 262K | 32K | 2 RPM (anonymous) | Jul 31, 2025 | Details | |
| NVIDIA NIM | bigcode/starcoder2-15b | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | deepseek-ai/deepseek-coder-6.7b-instruct | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | google/codegemma-1.1-7b | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | google/codegemma-7b | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | ibm/granite-34b-code-instruct | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | ibm/granite-8b-code-instruct | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | meta/codellama-70b | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | mistralai/codestral-22b-instruct-v0.1 | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/nv-embedcode-7b-v1 | 131K | 8K | Up to 40 RPM | — | Details | |
| Alibaba Cloud Model Studio | Qwen3-Coder-Plus | 256K | 8K | Tiered by region | Sep 23, 2025 | Details | |
| OpenRouter | Baidu Qianfan: CoBuddy | 131K | 65K | 200 req/day (free tier) | — | Details |