## The Short List
| Model | Provider | Context | Best For | No Card Needed? |
|---|---|---|---|---|
| Gemini 2.5 Flash | Google AI Studio | 1M | All-round, long docs | No |
| Llama 3.3 70B | Groq | 131K | Speed, chat, quick coding | Yes |
| Nemotron 3 Super | NVIDIA NIM | 262K | Reasoning, math, coding | No |
| Codestral | Mistral AI | 256K | Coding (best-in-class) | Yes |
| GPT-OSS-120B | Cerebras | 128K | Heavy workloads, big model | Yes |
## 1. Google Gemini 2.5 Flash — The All-Rounder
Provider: Google AI Studio · Context: 1,000,000 tokens · Rate: 10 RPM, 250 RPD
Google's flagship free model with a staggering 1 million token context window — enough to process entire codebases in a single prompt. Multimodal: understands text, images, audio, and video. Best all-rounder for serious work.
### Key Highlights
- 1M context window
- Multimodal (text + image + audio + video)
- 10 RPM, 250 RPD
- Excellent for long documents and large codebases
```shell
ANTHROPIC_BASE_URL="https://generativelanguage.googleapis.com/v1beta"
ANTHROPIC_AUTH_TOKEN="YOUR_GOOGLE_API_KEY"
```

Google requires account verification but no credit card.
## 2. Groq Llama 3.3 70B — The Speed King
Provider: Groq · Context: 131,000 tokens · Rate: 30 RPM, 14,400 RPD
Groq's LPU inference delivers the fastest tokens-per-second among all free providers. Llama 3.3 70B is Meta's refined 70B model with strong reasoning and coding. Perfect when latency matters.
### Key Highlights
- Fastest inference (Groq LPU)
- 131K context
- 30 RPM, 14,400 RPD
- No credit card required
- Great for chat and quick coding tasks
```shell
ANTHROPIC_BASE_URL="https://api.groq.com/openai/v1"
ANTHROPIC_AUTH_TOKEN="YOUR_GROQ_API_KEY"
```

## 3. NVIDIA Nemotron 3 Super — The Reasoner
Provider: NVIDIA NIM · Context: 262,000 tokens · Rate: Free tier
NVIDIA's own model with 262K context and strong reasoning capabilities. Free tier via NVIDIA NIM with solid rate limits. Great for technical reasoning, math, and coding.
### Key Highlights
- 262K context window
- Strong reasoning and math
- OpenAI-compatible API
- Free tier with generous limits
```shell
ANTHROPIC_BASE_URL="https://integrate.api.nvidia.com/v1"
ANTHROPIC_AUTH_TOKEN="YOUR_NVIDIA_API_KEY"
```

## 4. Mistral Codestral — The Coder
Provider: Mistral AI · Context: 256,000 tokens · Rate: ~1 RPS, 500K TPM
Mistral's dedicated coding model, purpose-built for code generation, debugging, and refactoring. 256K context, no credit card required. The go-to choice for developers.
### Key Highlights
- Purpose-built for coding
- 256K context
- No credit card required
- ~1 RPS, 500K TPM
- Supports FIM (fill-in-the-middle)
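To show what fill-in-the-middle looks like in practice, here is a minimal request sketch against Mistral's FIM completions endpoint. The `codestral-latest` model name and the `/v1/fim/completions` path follow Mistral's public API docs, but treat both as assumptions to verify against current documentation; the key comes from the `MISTRAL_API_KEY` environment variable.

```shell
# Sketch: ask Codestral to fill in a function body between a prompt
# (the code before the gap) and a suffix (the code after it).
curl -s "https://api.mistral.ai/v1/fim/completions" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codestral-latest",
    "prompt": "def is_even(n):\n    ",
    "suffix": "\n\nprint(is_even(4))",
    "max_tokens": 32
  }'
```

The response's `choices[0].message.content` holds only the middle piece, which your editor splices between the prompt and suffix.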
```shell
ANTHROPIC_BASE_URL="https://api.mistral.ai/v1"
ANTHROPIC_AUTH_TOKEN="YOUR_MISTRAL_API_KEY"
```

## 5. Cerebras GPT-OSS-120B — The Heavy Lifter
Provider: Cerebras · Context: 128,000 tokens · Rate: 30 RPM, 14,400 RPD, 1M TPD
Cerebras hosts this massive 120B-parameter open-source model with blazing-fast inference. No credit card is required, and the 30 RPM limit makes it the pick for heavy workloads at zero cost.
### Key Highlights
- 120B parameters
- 128K context
- 30 RPM, 1M TPD
- No credit card required
- Open-source model, fully transparent
```shell
ANTHROPIC_BASE_URL="https://api.cerebras.ai/v1"
ANTHROPIC_AUTH_TOKEN="YOUR_CEREBRAS_API_KEY"
```

## How We Picked These 5
We maintain a directory of 138+ free LLM models from 18 providers. These 5 were selected based on:
- Context window — all above 128K, enough for real work
- Rate limits — practical for daily use (not 1 RPM)
- Provider reliability — from established companies with stable APIs
- No credit card — 3 of 5 don't even ask for one
- Use case coverage — coding, reasoning, chat, and long documents
## How to Use These with Claude Code (cc)
Set two environment variables and you're done:
```shell
# Pick your model's config from above
export ANTHROPIC_BASE_URL="https://api.groq.com/openai/v1"
export ANTHROPIC_AUTH_TOKEN="your-api-key"

# Then run cc normally — it now uses the free backend
claude
```

For Cursor, OpenCode, and other tools, grab the one-click config from each model's detail page on freellm.net.
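Before launching cc, it can help to sanity-check that the key and base URL actually respond. A minimal sketch using Groq's OpenAI-compatible chat endpoint (the `llama-3.3-70b-versatile` model ID is an assumption; substitute whatever model your key can access):

```shell
# Sketch: confirm the free backend answers before wiring up cc.
# Reuses the two environment variables exported above.
curl -s "$ANTHROPIC_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $ANTHROPIC_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Reply with the word ok."}],
    "max_tokens": 8
  }'
```

A JSON response with a `choices` array means the credentials work; an `error` object usually points at a bad key or a model name your tier cannot use.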
Browse all 138 free models → Model Directory