Free LLM API Models — Browse & Filter 133+ Models
| Provider | Model | Context | Max Output | Rate Limit |
|---|---|---|---|---|
| OpenRouter | inclusionAI: Ring-2.6-1T (free) | 262K | 66K | See provider page |
| OpenRouter | Baidu Qianfan: CoBuddy (free) | 131K | 66K | See provider page |
| OpenRouter | Owl Alpha | 1.0M | 262K | See provider page |
| OpenRouter | NVIDIA: Nemotron 3 Nano Omni (free) | 256K | 66K | See provider page |
| OpenRouter | Poolside: Laguna XS.2 (free) | 131K | 8K | See provider page |
| OpenRouter | Poolside: Laguna M.1 (free) | 131K | 8K | See provider page |
| OpenRouter | Baidu: Qianfan-OCR-Fast (free) | 66K | 29K | See provider page |
| OpenRouter | Google: Gemma 4 26B A4B (free) | 262K | 33K | See provider page |
| OpenRouter | Google: Gemma 4 31B (free) | 262K | 33K | See provider page |
| OpenRouter | Google: Lyria 3 Pro Preview | 1.0M | 66K | See provider page |
| OpenRouter | Google: Lyria 3 Clip Preview | 1.0M | 66K | See provider page |
| OpenRouter | NVIDIA: Nemotron 3 Super (free) | 262K | 262K | See provider page |
| OpenRouter | MiniMax: MiniMax M2.5 (free) | 197K | 8K | See provider page |
| OpenRouter | Free Models Router | 200K | 8K | See provider page |
| OpenRouter | LiquidAI: LFM2.5-1.2B-Thinking (free) | 33K | 8K | See provider page |
| OpenRouter | LiquidAI: LFM2.5-1.2B-Instruct (free) | 33K | 8K | See provider page |
| OpenRouter | NVIDIA: Nemotron 3 Nano 30B A3B (free) | 256K | 8K | See provider page |
| OpenRouter | NVIDIA: Nemotron Nano 12B 2 VL (free) | 128K | 128K | See provider page |
| OpenRouter | Qwen: Qwen3 Next 80B A3B Instruct (free) | 262K | 8K | See provider page |
| OpenRouter | NVIDIA: Nemotron Nano 9B V2 (free) | 128K | 8K | See provider page |
| OpenRouter | OpenAI: gpt-oss-120b (free) | 131K | 131K | See provider page |
| OpenRouter | OpenAI: gpt-oss-20b (free) | 131K | 8K | See provider page |
| OpenRouter | Z.ai: GLM 4.5 Air (free) | 131K | 96K | See provider page |
| OpenRouter | Qwen: Qwen3 Coder 480B A35B (free) | 262K | 262K | See provider page |
| OpenRouter | Venice: Uncensored (free) | 33K | 8K | See provider page |
| OpenRouter | Meta: Llama 3.3 70B Instruct (free) | 66K | 8K | See provider page |
| OpenRouter | Meta: Llama 3.2 3B Instruct (free) | 131K | 8K | See provider page |
| OpenRouter | Nous: Hermes 3 405B Instruct (free) | 131K | 8K | See provider page |
| NVIDIA NIM | Various open models | 131K | 8K | See provider page |
| Mistral (La Plateforme) | Open and Proprietary Mistral models | 256K | 8K | See provider page |
| Cohere | Command A (111B) | 256K | 4K | 20 RPM |
| Cohere | Command R+ | 128K | 4K | 20 RPM |
| Cohere | Command R7B | 128K | 4K | 20 RPM |
| Cohere | Embed 4 | 131K | 131K | 2,000 inputs/min |
| Cohere | Rerank 3.5 | 131K | 131K | 10 RPM |
| Google Gemini | Gemini 2.5 Flash | 1.0M | 65K | 10 RPM, 250 RPD |
| Google Gemini | Gemini 2.5 Flash-Lite | 1.0M | 65K | 15 RPM, 1,000 RPD |
| Mistral AI | Mistral Small 4 | 256K | 256K | ~1 RPS, 500K TPM |
| Mistral AI | Mistral Medium 3 | 128K | 128K | ~1 RPS, 500K TPM |
| Mistral AI | Mistral Large 3 | 256K | 256K | ~1 RPS, 500K TPM |
| Mistral AI | Mistral Nemo (12B) | 128K | 128K | ~1 RPS, 500K TPM |
| Mistral AI | Codestral | 256K | 256K | ~1 RPS, 500K TPM |
| Mistral AI | Pixtral Large | 128K | 128K | ~1 RPS, 500K TPM |
| Z AI (Zhipu AI) | GLM-4.7-Flash | 200K | 128K | 1 concurrent request |
| Z AI (Zhipu AI) | GLM-4.5-Flash | 128K | 8K | 1 concurrent request |
| Z AI (Zhipu AI) | GLM-4.6V-Flash | 128K | 4K | 1 concurrent request |
| Cerebras | llama3.1-8b | 128K | 8K | 30 RPM, 14,400 RPD, 1M TPD |
| Cerebras | gpt-oss-120b | 128K | 8K | 30 RPM, 14,400 RPD, 1M TPD |
| Cerebras | qwen-3-235b-a22b-instruct-2507 | 131K | 8K | 30 RPM, 14,400 RPD, 1M TPD |
| Cerebras | zai-glm-4.7 | 128K | 8K | 10 RPM, 100 RPD, 1M TPD |
| Cloudflare Workers AI | @cf/meta/llama-3.3-70b-instruct-fp8-fast | 131K | 131K | 10K neurons/day (shared) |
| Cloudflare Workers AI | @cf/meta/llama-3.1-8b-instruct-fp8-fast | 131K | 131K | 10K neurons/day (shared) |
| Cloudflare Workers AI | @cf/meta/llama-3.2-11b-vision-instruct | 131K | 131K | 10K neurons/day (shared) |
| Cloudflare Workers AI | @cf/meta/llama-4-scout-17b-16e-instruct | 10.0M | 131K | 10K neurons/day (shared) |
| Cloudflare Workers AI | @cf/mistralai/mistral-small-3.1-24b-instruct | 128K | 131K | 10K neurons/day (shared) |
| Cloudflare Workers AI | @cf/google/gemma-4-26b-a4b-it | 256K | 131K | 10K neurons/day (shared) |
| Cloudflare Workers AI | @cf/qwen/qwq-32b | 32K | 131K | 10K neurons/day (shared) |
| Cloudflare Workers AI | @cf/deepseek-ai/deepseek-r1-distill-qwen-32b | 32K | 131K | 10K neurons/day (shared) |
| Cloudflare Workers AI | + 42 more models | 131K | 131K | 10K neurons/day (shared) |
| GitHub Models | gpt-4.1 | 1.0M | 32K | 10 RPM, 50 RPD |
| GitHub Models | gpt-4.1-mini | 1.0M | 32K | 15 RPM, 150 RPD |
| GitHub Models | gpt-4o | 128K | 16K | 10 RPM, 50 RPD |
| GitHub Models | o3-mini | 200K | 100K | 10 RPM, 50 RPD |
| GitHub Models | o4-mini | 200K | 100K | 10 RPM, 50 RPD |
| GitHub Models | Llama-4-Scout-17B-16E | 512K | 4K | 15 RPM, 150 RPD |
| GitHub Models | Llama-4-Maverick-17B-128E | 256K | 4K | 10 RPM, 50 RPD |
| GitHub Models | Meta-Llama-3.3-70B | 131K | 4K | 15 RPM, 150 RPD |
| GitHub Models | DeepSeek-R1 | 64K | 8K | 15 RPM, 150 RPD |
| GitHub Models | Mistral-Small-3.1 | 128K | 4K | 15 RPM, 150 RPD |
| GitHub Models | + 35 more models | 131K | 131K | Varies by tier |
| Groq | llama-3.3-70b-versatile | 131K | 32K | 30 RPM, 14,400 RPD |
| Groq | llama-3.1-8b-instant | 131K | 131K | 30 RPM, 14,400 RPD |
| Groq | llama-4-scout-17b-16e-instruct | 131K | 8K | 30 RPM, 14,400 RPD |
| Groq | llama-4-maverick-17b-128e-instruct | 131K | 8K | 15 RPM, 500 RPD |
| Groq | qwen3-32b | 131K | 131K | 30 RPM, 14,400 RPD |
| Groq | kimi-k2-instruct | 262K | 262K | 30 RPM, 14,400 RPD |
| Groq | deepseek-r1-distill-70b | 131K | 8K | 30 RPM, 14,400 RPD |
| Groq | whisper-large-v3 | 131K | 131K | 20 RPM, 2,000 RPD |
| Groq | whisper-large-v3-turbo | 131K | 131K | 20 RPM, 2,000 RPD |
| Hugging Face | Meta-Llama-3.1-8B-Instruct | 128K | 4K | ~1,000 RPD |
| Hugging Face | Mistral-7B-Instruct-v0.3 | 32K | 4K | ~1,000 RPD |
| Hugging Face | Mixtral-8x7B-Instruct-v0.1 | 32K | 4K | ~1,000 RPD |
| Hugging Face | Phi-3.5-mini-instruct | 128K | 4K | ~1,000 RPD |
| Hugging Face | Qwen2.5-7B-Instruct | 131K | 4K | ~1,000 RPD |
| Hugging Face | + thousands of community models | 131K | 131K | ~$0.10/month free credits |
| Kilo Code | bytedance-seed/dola-seed-2.0-pro:free | 131K | 131K | ~200 req/hr |
| Kilo Code | x-ai/grok-code-fast-1:optimized:free | 131K | 131K | ~200 req/hr |
| Kilo Code | nvidia/nemotron-3-super-120b-a12b:free | 262K | 32K | ~200 req/hr |
| Kilo Code | arcee-ai/trinity-large-thinking:free | 131K | 131K | ~200 req/hr |
| Kilo Code | openrouter/free | 131K | 131K | ~200 req/hr |
| LLM7.io | deepseek-r1-0528 | 131K | 131K | 30 RPM (120 with token) |
| LLM7.io | deepseek-v3-0324 | 131K | 131K | 30 RPM (120 with token) |
| LLM7.io | gpt-4o-mini | 131K | 131K | 30 RPM (120 with token) |
| LLM7.io | mistral-small-3.1-24b | 32K | 131K | 30 RPM (120 with token) |
| LLM7.io | qwen2.5-coder-32b | 131K | 131K | 30 RPM (120 with token) |
| LLM7.io | + ~24 more models | 131K | 131K | 30 RPM (120 with token) |
| ModelScope | Qwen/Qwen3.5-35B-A3B | 131K | 131K | 2,000 RPD total; <=500 RPD/model (dynamic) |
| ModelScope | Qwen/Qwen3.5-27B | 131K | 131K | 2,000 RPD total; <=500 RPD/model (dynamic) |
| ModelScope | Qwen/Qwen-Image | 131K | 131K | 2,000 RPD total; model/AIGC-specific caps |
| ModelScope | + API-Inference-enabled models | 131K | 131K | Dynamic quotas + dynamic concurrency |
| NVIDIA NIM | deepseek-ai/deepseek-r1 | 128K | 163K | ~40 RPM |
| NVIDIA NIM | nvidia/llama-3.1-nemotron-ultra-253b-v1 | 128K | 4K | ~40 RPM |
| NVIDIA NIM | nvidia/nemotron-3-super-120b-a12b | 262K | 262K | ~40 RPM |
| NVIDIA NIM | nvidia/nemotron-3-nano-30b-a3b | 128K | 32K | ~40 RPM |
| NVIDIA NIM | meta/llama-3.1-405b-instruct | 128K | 4K | ~40 RPM |
| NVIDIA NIM | qwen/qwen2.5-72b-instruct | 128K | 8K | ~40 RPM |
| NVIDIA NIM | google/gemma-4-31b | 128K | 8K | ~40 RPM |
| NVIDIA NIM | mistralai/mistral-large-2-instruct | 128K | 4K | ~40 RPM |
| NVIDIA NIM | nvidia/nemotron-nano-2-vl | 128K | 8K | ~40 RPM |
| NVIDIA NIM | minimax/minimax-m2.7 | 128K | 8K | ~40 RPM |
| NVIDIA NIM | + 90 more models | 131K | 131K | ~40 RPM |
| Ollama Cloud | llama3.1:cloud | 128K | 131K | Session/weekly limits (unpublished) |
| Ollama Cloud | deepseek-r1:cloud | 128K | 131K | Session/weekly limits (unpublished) |
| Ollama Cloud | qwen2.5:cloud | 128K | 131K | Session/weekly limits (unpublished) |
| Ollama Cloud | gemma2:cloud | 8K | 131K | Session/weekly limits (unpublished) |
| Ollama Cloud | mistral:cloud | 32K | 131K | Session/weekly limits (unpublished) |
| Ollama Cloud | + 400 more models | 131K | 131K | Session/weekly limits (unpublished) |
| OVHcloud AI Endpoints | Meta-Llama-3_3-70B-Instruct | 131K | 4K | 2 RPM (anonymous) |
| OVHcloud AI Endpoints | DeepSeek-R1-Distill-Llama-70B | 131K | 32K | 2 RPM (anonymous) |
| OVHcloud AI Endpoints | Qwen3-Coder-30B-A3B-Instruct | 262K | 32K | 2 RPM (anonymous) |
| OVHcloud AI Endpoints | Qwen2.5-VL-72B-Instruct | 128K | 8K | 2 RPM (anonymous) |
| OVHcloud AI Endpoints | Mistral-Nemo-Instruct-2407 | 128K | 4K | 2 RPM (anonymous) |
| OVHcloud AI Endpoints | Qwen3Guard-Gen-8B | 32K | 4K | 2 RPM (anonymous) |
| OVHcloud AI Endpoints | Qwen3Guard-Gen-0.6B | 32K | 4K | 2 RPM (anonymous) |
| OVHcloud AI Endpoints | + 30 more models | 131K | 131K | 2 RPM (anonymous) |
| SiliconFlow | Qwen/Qwen3-8B | 131K | 131K | 1,000 RPM, 50K TPM |
| SiliconFlow | deepseek-ai/DeepSeek-R1-0528-Qwen3-8B | 33K | 16K | 1,000 RPM, 50K TPM |
| SiliconFlow | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | 131K | 131K | 1,000 RPM, 50K TPM |
| SiliconFlow | THUDM/glm-4-9b-chat | 32K | 32K | 1,000 RPM, 50K TPM |
| SiliconFlow | THUDM/GLM-4.1V-9B-Thinking | 66K | 66K | 1,000 RPM, 50K TPM |
| SiliconFlow | deepseek-ai/DeepSeek-OCR | 131K | 8K | 1,000 RPM, 50K TPM |
| SiliconFlow | + embedding/speech models | 131K | 131K | 1,000 RPM, 50K TPM |
| SiliconFlow | Abbreviation | 131K | 8K | See provider page |
How to Use Free LLM API Resources
- Pick a model — Click any model name to see its details, rate limits, and the API key signup link.
- Get your API key — Sign up on the provider's website (most require no credit card).
- Copy the config — Each model page includes ready-to-copy config snippets for Claude Code, Cursor, and OpenAI-compatible tools (a minimal sketch follows this list).
- Test it — Use the Playground to test your API key before integrating.
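To make the config step concrete, here is a minimal sketch of what such a snippet typically boils down to: the official `openai` Python SDK pointed at a provider's OpenAI-compatible endpoint. The OpenRouter base URL and model ID come from the table above; the `OPENROUTER_API_KEY` environment variable name is an assumption, so substitute whichever provider, model, and key you actually signed up for.

```python
# Minimal sketch: call a free model through an OpenAI-compatible endpoint.
# Assumes the `openai` package is installed and OPENROUTER_API_KEY holds your key.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",         # OpenRouter's OpenAI-compatible endpoint
    api_key=os.environ["OPENROUTER_API_KEY"],        # key from the provider's signup page
)

response = client.chat.completions.create(
    model="meta-llama/llama-3.3-70b-instruct:free",  # any free model ID from the table above
    messages=[{"role": "user", "content": "Reply with the single word: pong"}],
)
print(response.choices[0].message.content)
```

If this prints a reply, your key works and you can drop the same base URL and model ID into your tool of choice.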
Frequently Asked Questions
How do I get a free LLM API key?
Click any model to see its provider page, then click "Get API Key" to sign up. Most providers (Google AI Studio, Groq, NVIDIA NIM, OpenRouter) require no credit card.
Which models work with Claude Code?
Any OpenAI-compatible model works with Claude Code. Filter by "OpenAI-compatible" or check each model's detail page for config snippets.
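In practice, "OpenAI-compatible" means the provider accepts the same request shape, so switching providers usually comes down to swapping the base URL, API key, and model ID. The sketch below illustrates that pattern; the base URLs are the providers' documented OpenAI-compatibility endpoints and the model IDs appear in the table above, but the environment variable names and the `make_client` helper are illustrative assumptions, so verify details on each model's page.

```python
# Sketch: the same OpenAI-compatible client code targets different free providers;
# only the base URL, API key, and model ID change.
import os

from openai import OpenAI

PROVIDERS = {
    "groq": ("https://api.groq.com/openai/v1", "GROQ_API_KEY", "llama-3.3-70b-versatile"),
    "gemini": ("https://generativelanguage.googleapis.com/v1beta/openai/", "GEMINI_API_KEY", "gemini-2.5-flash"),
    "openrouter": ("https://openrouter.ai/api/v1", "OPENROUTER_API_KEY", "openai/gpt-oss-20b:free"),
}

def make_client(name: str) -> tuple[OpenAI, str]:
    """Build an OpenAI-compatible client for the named provider and return its model ID."""
    base_url, key_env, model = PROVIDERS[name]
    return OpenAI(base_url=base_url, api_key=os.environ[key_env]), model

client, model = make_client("groq")
reply = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(reply.choices[0].message.content)
```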
What do the rate limits mean?
RPM = requests per minute, RPS = requests per second, RPD = requests per day, TPM = tokens per minute, TPD = tokens per day. These are the free-tier limits; check the provider's website for paid-tier options.
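When you exceed one of these caps, providers typically respond with HTTP 429. A simple client-side throttle plus retry keeps you inside an RPM budget; the sketch below is a rough illustration under assumed values (the 10 RPM figure and the `throttled_chat` helper are placeholders, not any provider's official guidance).

```python
# Sketch: stay under a requests-per-minute cap and back off on HTTP 429.
# Set RPM to the limit shown for your model in the table above.
import time

from openai import OpenAI, RateLimitError

RPM = 10
MIN_INTERVAL = 60.0 / RPM   # minimum seconds between requests to stay under the cap
_last_call = 0.0

def throttled_chat(client: OpenAI, model: str, prompt: str, max_retries: int = 5) -> str:
    global _last_call
    for attempt in range(max_retries):
        # Space requests out so we never exceed RPM on our side.
        wait = MIN_INTERVAL - (time.monotonic() - _last_call)
        if wait > 0:
            time.sleep(wait)
        _last_call = time.monotonic()
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content
        except RateLimitError:
            # Provider returned 429: back off exponentially before retrying.
            time.sleep(2 ** attempt)
    raise RuntimeError("still rate-limited after retries")
```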