Best Free LLM APIs for Coding

26 free models available for coding. How to choose a free LLM for coding →

For AI coding, prioritize large context windows (to process entire codebases), tool calling support, and strong instruction following. The best free coding models include Codestral (Mistral, purpose-built for code), DeepSeek V4, Qwen3-Coder, and Gemini 2.5 Flash (1M context). Models are ranked below by context window and rate limit.

What to Look for in a Coding Model

Not all LLMs are equally good at coding. Here's what separates a coding model from a general-purpose one:

  • Context window — The single most important spec for coding. Modern codebases easily exceed 50K tokens. A model with less than 32K context will struggle with multi-file edits, code review, or understanding project structure. Look for at least 128K tokens; 256K+ is ideal for monorepo work.
  • Fill-in-the-Middle (FIM) — A specialized training objective where the model learns to fill a gap between prefix and suffix code. Essential for inline code completion in IDEs. Codestral and DeepSeek Coder variants are trained with FIM.
  • Tool calling / function calling — Required for agentic coding workflows: "find all files that import X, then refactor them to use Y." Without tool calling, the model can only suggest code, not execute actions. Most OpenAI-compatible endpoints support tool calling if the underlying model does.
  • Instruction following — Coding requires precise, unambiguous outputs. Models that drift or hallucinate will introduce bugs. DeepSeek V4 and Qwen3 score particularly well on instruction-following benchmarks.
  • Max output tokens — Generating a full file or multiple functions in one shot requires high output limits. 8K output is the practical minimum; 16K+ lets the model generate entire modules at once.

How to Choose a Free Coding Model

Your pick depends on how you code:

  • Using Claude Code or Cursor? → Prioritize context window and tool calling. Gemini 2.5 Flash (1M ctx) or DeepSeek V4 (256K) let the agent see your whole project. Both support tool calling via OpenAI-compatible endpoints.
  • Inline completion in VS Code / JetBrains? → Look for FIM support. Codestral (Mistral) is purpose-built for this. DeepSeek Coder variants also support FIM.
  • Code review / PR review? → Large context is critical — the diff + surrounding code + review guidelines all need to fit in one prompt. Gemini 2.5 Flash's 1M context handles this with room to spare.
  • Learning to code? → Prioritize helpfulness and explanation quality. Qwen3 and Llama 3.3 70B are known for clear, educational code explanations.
  • Rate limit sensitive? → NVIDIA NIM has 40 RPM with no daily cap, ideal for heavy coding sessions. Groq has 30 RPM / 14,400 RPD — enough for most solo developers.

Try models in the Playground with a real coding task before committing — the same benchmark scores don't always match your specific language or framework.

Top Picks for Coding

All Free Coding Models

Provider Model Context Max Output Modality Rate Limit Released
OpenRouter Cohere: North Mini Code (free) 256K 64K textcode 200 req/day (free tier) Jun 9, 2026 Details
OpenRouter OpenAI: gpt-oss-safeguard-20b 131K 66K text 200 req/day (free tier) Oct 29, 2025 Details
OpenRouter OpenAI: gpt-oss-120b (free) 131K 131K text 200 req/day (free tier) Aug 5, 2025 Details
OpenRouter OpenAI: gpt-oss-20b (free) 131K 33K text 200 req/day (free tier) Aug 5, 2025 Details
OpenRouter Qwen: Qwen3 Coder 480B A35B (free) 1.0M 262K textcode 200 req/day (free tier) Jul 23, 2025 Details
Mistral AI Codestral 256K 256K textcode ~1 RPS, 500K TPM Details
Cerebras gpt-oss-120b 128K 8K text 30 RPM, 14,400 RPD, 1M TPD Aug 5, 2025 Details
Cloudflare Workers AI @cf/openai/gpt-oss-120b 128K 131K text 10K neurons/day (shared) Details
Cloudflare Workers AI @cf/moonshotai/kimi-k2.7-code 262K 131K textcode 10K neurons/day (shared) Details
Kilo Code x-ai/grok-code-fast-1:free 256K 131K textcode ~200 req/hr Aug 28, 2025 Details
LLM7.io qwen2.5-coder-32b 131K 131K textcode 30 RPM (120 with token) Nov 11, 2024 Details
Ollama Cloud gpt-oss:120b-cloud 128K 131K text Session/weekly limits (unpublished) Details
Ollama Cloud qwen3-coder:480b-cloud 128K 131K textcode Session/weekly limits (unpublished) Details
OVHcloud AI Endpoints gpt-oss-20b 128K 8K text 2 RPM (anonymous) Aug 5, 2025 Details
OVHcloud AI Endpoints Qwen3-Coder-30B-A3B-Instruct 262K 32K textcode 2 RPM (anonymous) Jul 31, 2025 Details
NVIDIA NIM bigcode/starcoder2-15b 131K 8K text Up to 40 RPM Details
NVIDIA NIM deepseek-ai/deepseek-coder-6.7b-instruct 131K 8K text Up to 40 RPM Details
NVIDIA NIM google/codegemma-1.1-7b 131K 8K text Up to 40 RPM Details
NVIDIA NIM google/codegemma-7b 131K 8K text Up to 40 RPM Details
NVIDIA NIM ibm/granite-34b-code-instruct 131K 8K text Up to 40 RPM Details
NVIDIA NIM ibm/granite-8b-code-instruct 131K 8K text Up to 40 RPM Details
NVIDIA NIM meta/codellama-70b 131K 8K text Up to 40 RPM Details
NVIDIA NIM mistralai/codestral-22b-instruct-v0.1 131K 8K text Up to 40 RPM Details
NVIDIA NIM nvidia/nv-embedcode-7b-v1 131K 8K textembedding Up to 40 RPM Details
Alibaba Cloud Model Studio Qwen3-Coder-Plus 256K 8K textcode Tiered by region Sep 23, 2025 Details
OpenRouter Baidu Qianfan: CoBuddy 131K 65K textcode 200 req/day (free tier) Details