Best Free LLM APIs for Coding

26 free models available for coding. How to choose a free LLM for coding →

Coding Chat Vision Audio Reasoning Embedding

For AI coding, prioritize large context windows (to process entire codebases), tool calling support, and strong instruction following. The best free coding models include Codestral (Mistral, purpose-built for code), DeepSeek V4, Qwen3-Coder, and Gemini 2.5 Flash (1M context). Models are ranked below by context window and rate limit.

What to Look for in a Coding Model

Not all LLMs are equally good at coding. Here's what separates a coding model from a general-purpose one:

Context window — The single most important spec for coding. Modern codebases easily exceed 50K tokens. A model with less than 32K context will struggle with multi-file edits, code review, or understanding project structure. Look for at least 128K tokens; 256K+ is ideal for monorepo work.
Fill-in-the-Middle (FIM) — A specialized training objective where the model learns to fill a gap between prefix and suffix code. Essential for inline code completion in IDEs. Codestral and DeepSeek Coder variants are trained with FIM.
Tool calling / function calling — Required for agentic coding workflows: "find all files that import X, then refactor them to use Y." Without tool calling, the model can only suggest code, not execute actions. Most OpenAI-compatible endpoints support tool calling if the underlying model does.
Instruction following — Coding requires precise, unambiguous outputs. Models that drift or hallucinate will introduce bugs. DeepSeek V4 and Qwen3 score particularly well on instruction-following benchmarks.
Max output tokens — Generating a full file or multiple functions in one shot requires high output limits. 8K output is the practical minimum; 16K+ lets the model generate entire modules at once.

How to Choose a Free Coding Model

Your pick depends on how you code:

Using Claude Code or Cursor? → Prioritize context window and tool calling. Gemini 2.5 Flash (1M ctx) or DeepSeek V4 (256K) let the agent see your whole project. Both support tool calling via OpenAI-compatible endpoints.
Inline completion in VS Code / JetBrains? → Look for FIM support. Codestral (Mistral) is purpose-built for this. DeepSeek Coder variants also support FIM.
Code review / PR review? → Large context is critical — the diff + surrounding code + review guidelines all need to fit in one prompt. Gemini 2.5 Flash's 1M context handles this with room to spare.
Learning to code? → Prioritize helpfulness and explanation quality. Qwen3 and Llama 3.3 70B are known for clear, educational code explanations.
Rate limit sensitive? → NVIDIA NIM has 40 RPM with no daily cap, ideal for heavy coding sessions. Groq has 30 RPM / 14,400 RPD — enough for most solo developers.

Try models in the Playground with a real coding task before committing — the same benchmark scores don't always match your specific language or framework.

Top Picks for Coding

Google: Gemini 2.5 Flash Google

1M context, multimodal, strong all-round coding. Free tier: 10 RPM, 250 RPD.

DeepSeek: DeepSeek V4 Flash (free) OpenRouter

256K context, latest-gen coding model. Strong instruction following, FIM support.

Codestral Mistral AI

Purpose-built for code. 256K context, FIM support, no credit card required.

Qwen: Qwen3 Coder 480B A35B (free) OpenRouter

Massive 480B MoE model specialized for code. 262K context.

All Free Coding Models

Provider	Model	Context	Max Output	Modality	Rate Limit	Released
OpenRouter	Cohere: North Mini Code (free)	256K	64K	textcode	200 req/day (free tier)	Jun 9, 2026	Details
OpenRouter	OpenAI: gpt-oss-safeguard-20b	131K	66K	text	200 req/day (free tier)	Oct 29, 2025	Details
OpenRouter	OpenAI: gpt-oss-120b (free)	131K	131K	text	200 req/day (free tier)	Aug 5, 2025	Details
OpenRouter	OpenAI: gpt-oss-20b (free)	131K	33K	text	200 req/day (free tier)	Aug 5, 2025	Details
OpenRouter	Qwen: Qwen3 Coder 480B A35B (free)	1.0M	262K	textcode	200 req/day (free tier)	Jul 23, 2025	Details
Mistral AI	Codestral	256K	256K	textcode	~1 RPS, 500K TPM	—	Details
Cerebras	gpt-oss-120b	128K	8K	text	30 RPM, 14,400 RPD, 1M TPD	Aug 5, 2025	Details
Cloudflare Workers AI	@cf/openai/gpt-oss-120b	128K	131K	text	10K neurons/day (shared)	—	Details
Cloudflare Workers AI	@cf/moonshotai/kimi-k2.7-code	262K	131K	textcode	10K neurons/day (shared)	—	Details
Kilo Code	x-ai/grok-code-fast-1:free	256K	131K	textcode	~200 req/hr	Aug 28, 2025	Details
LLM7.io	qwen2.5-coder-32b	131K	131K	textcode	30 RPM (120 with token)	Nov 11, 2024	Details
Ollama Cloud	gpt-oss:120b-cloud	128K	131K	text	Session/weekly limits (unpublished)	—	Details
Ollama Cloud	qwen3-coder:480b-cloud	128K	131K	textcode	Session/weekly limits (unpublished)	—	Details
OVHcloud AI Endpoints	gpt-oss-20b	128K	8K	text	2 RPM (anonymous)	Aug 5, 2025	Details
OVHcloud AI Endpoints	Qwen3-Coder-30B-A3B-Instruct	262K	32K	textcode	2 RPM (anonymous)	Jul 31, 2025	Details
NVIDIA NIM	bigcode/starcoder2-15b	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	deepseek-ai/deepseek-coder-6.7b-instruct	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	google/codegemma-1.1-7b	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	google/codegemma-7b	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	ibm/granite-34b-code-instruct	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	ibm/granite-8b-code-instruct	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	meta/codellama-70b	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	mistralai/codestral-22b-instruct-v0.1	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/nv-embedcode-7b-v1	131K	8K	textembedding	Up to 40 RPM	—	Details
Alibaba Cloud Model Studio	Qwen3-Coder-Plus	256K	8K	textcode	Tiered by region	Sep 23, 2025	Details
OpenRouter	Baidu Qianfan: CoBuddy	131K	65K	textcode	200 req/day (free tier)	—	Details

See our FAQ for common questions about free LLM APIs

Best Free LLM APIs for Coding

What to Look for in a Coding Model

How to Choose a Free Coding Model

Top Picks for Coding

All Free Coding Models

Export to Chat Client 🚀