For general conversation, look for low latency, strong instruction following, and a helpful personality. Gemini 2.5 Flash offers the largest free context window (1M tokens) with multimodal support. Llama 3.3 70B via Groq delivers the fastest tokens-per-second. Qwen3.5 models on NVIDIA NIM strike a balance of quality and speed.
What to Look for in a Chat Model
Chat models are the most common type of LLM, but they vary significantly in quality for conversation use:
- Latency / tokens per second — Real-time conversation needs fast responses. Groq's LPU hardware delivers the fastest inference (Llama 3.3 70B hits 100+ tok/s). NVIDIA NIM and OpenRouter are slower but offer more model variety.
- Context window — Long conversations or document Q&A need a large context window. Gemini 2.5 Flash (1M ctx) can hold an entire book in memory. Most chat models have 32K–128K, which handles typical back-and-forth conversations easily.
- Instruction following — A good chat model stays on-topic, follows system prompts, and avoids hallucinating. Llama 3.3 70B and Qwen3 are known for strong instruction adherence.
- Multilingual support — If you chat in non-English languages, check the model's training data. Qwen3 has strong Chinese/English bilingual performance. Gemini and Llama support 30+ languages.
- Multimodal input — Want to share images or audio in chat? Gemini 2.5 Flash accepts text, image, audio, and video. Most chat models are text-only.
How to Choose a Free Chat Model
Match the model to your chat use case:
- Casual conversation / chatbot? → Prioritize latency and personality. Llama 3.3 70B via Groq (fastest) or Gemini 2.5 Flash via Google AI Studio (most capable).
- Long-form Q&A / document chat? → Maximize context window. Gemini 2.5 Flash (1M) or Qwen3.5 122B (262K via NVIDIA NIM).
- Multilingual chat? → Qwen3.5 excels in Chinese-English. Gemini supports 30+ languages. Llama covers major European and Asian languages.
- Roleplay / creative conversation? → Look for models with strong creative writing. Llama 3.3 70B and Mistral models tend to have more varied output styles.
- Customer support bot? → Instruction following and safety are critical. Gemini and Qwen3 are well-aligned. Avoid unmoderated open models unless you add guardrails.
Top Picks for Chat
1M context, multimodal, free tier with 10 RPM / 250 RPD. Best all-round chat model.
Meta: Llama 3.3 70B Instruct GroqFastest inference via Groq LPU, strong instruction following, no credit card.
Qwen: Qwen3.5 122B A10B NVIDIA NIM262K context, strong bilingual (Chinese-English), 40 RPM with no daily cap.
NVIDIA: Nemotron 3 Super (free) OpenRouter262K context, strong reasoning, solid chat performance.
All Free Chat Models
| Provider | Model | Context | Max Output | Modality | Rate Limit | Released | |
|---|---|---|---|---|---|---|---|
| OpenRouter | Cohere: North Mini Code (free) | 256K | 64K | 200 req/day (free tier) | Jun 9, 2026 | Details | |
| OpenRouter | Nex AGI: Nex-N2-Pro (free) | 262K | 262K | 200 req/day (free tier) | Jun 8, 2026 | Details | |
| OpenRouter | NVIDIA: Nemotron 3.5 Content Safety (free) | 128K | 8K | 200 req/day (free tier) | Jun 4, 2026 | Details | |
| OpenRouter | NVIDIA: Nemotron 3 Ultra (free) | 1.0M | 66K | 200 req/day (free tier) | Jun 4, 2026 | Details | |
| OpenRouter | MiniMax: MiniMax M3 | 1.0M | 512K | 200 req/day (free tier) | Jun 1, 2026 | Details | |
| OpenRouter | inclusionAI: Ring-2.6-1T | 262K | 66K | 200 req/day (free tier) | May 8, 2026 | Details | |
| OpenRouter | Owl Alpha | 1.0M | 262K | 200 req/day (free tier) | Apr 28, 2026 | Details | |
| OpenRouter | NVIDIA: Nemotron 3 Nano Omni (free) | 256K | 66K | 200 req/day (free tier) | Apr 28, 2026 | Details | |
| OpenRouter | Poolside: Laguna XS.2 (free) | 262K | 33K | 200 req/day (free tier) | Apr 28, 2026 | Details | |
| OpenRouter | Poolside: Laguna M.1 (free) | 262K | 33K | 200 req/day (free tier) | Apr 28, 2026 | Details | |
| OpenRouter | DeepSeek: DeepSeek V4 Flash | 1.0M | 66K | 200 req/day (free tier) | Apr 24, 2026 | Details | |
| OpenRouter | MoonshotAI: Kimi K2.6 | 262K | 262K | 200 req/day (free tier) | Apr 20, 2026 | Details | |
| OpenRouter | Z.ai: GLM 5.1 | 203K | 8K | 200 req/day (free tier) | Apr 7, 2026 | Details | |
| OpenRouter | Google: Gemma 4 26B A4B (free) | 262K | 33K | 200 req/day (free tier) | Apr 2, 2026 | Details | |
| OpenRouter | Google: Gemma 4 31B (free) | 262K | 8K | 200 req/day (free tier) | Apr 2, 2026 | Details | |
| OpenRouter | Arcee AI: Trinity Large Thinking | 262K | 262K | 200 req/day (free tier) | Apr 1, 2026 | Details | |
| OpenRouter | Google: Lyria 3 Pro Preview | 1.0M | 66K | 200 req/day (free tier) | Mar 30, 2026 | Details | |
| OpenRouter | Google: Lyria 3 Clip Preview | 1.0M | 66K | 200 req/day (free tier) | Mar 30, 2026 | Details | |
| OpenRouter | NVIDIA: Nemotron 3 Super (free) | 1.0M | 262K | 200 req/day (free tier) | Mar 11, 2026 | Details | |
| OpenRouter | MiniMax: MiniMax M2.5 | 205K | 197K | 200 req/day (free tier) | Feb 12, 2026 | Details | |
| OpenRouter | Free Models Router | 200K | 8K | 200 req/day (free tier) | Feb 1, 2026 | Details | |
| OpenRouter | LiquidAI: LFM2.5-1.2B-Thinking (free) | 33K | 8K | 200 req/day (free tier) | Jan 20, 2026 | Details | |
| OpenRouter | LiquidAI: LFM2.5-1.2B-Instruct (free) | 33K | 8K | 200 req/day (free tier) | Jan 5, 2026 | Details | |
| OpenRouter | NVIDIA: Nemotron 3 Nano 30B A3B (free) | 256K | 8K | 200 req/day (free tier) | Dec 14, 2025 | Details | |
| OpenRouter | OpenAI: gpt-oss-safeguard-20b | 131K | 66K | 200 req/day (free tier) | Oct 29, 2025 | Details | |
| OpenRouter | NVIDIA: Nemotron Nano 12B 2 VL (free) | 128K | 128K | 200 req/day (free tier) | Oct 28, 2025 | Details | |
| OpenRouter | Qwen: Qwen3 Next 80B A3B Instruct (free) | 262K | 8K | 200 req/day (free tier) | Sep 11, 2025 | Details | |
| OpenRouter | NVIDIA: Nemotron Nano 9B V2 (free) | 128K | 8K | 200 req/day (free tier) | Sep 5, 2025 | Details | |
| OpenRouter | OpenAI: gpt-oss-120b (free) | 131K | 131K | 200 req/day (free tier) | Aug 5, 2025 | Details | |
| OpenRouter | OpenAI: gpt-oss-20b (free) | 131K | 33K | 200 req/day (free tier) | Aug 5, 2025 | Details | |
| OpenRouter | Z.ai: GLM 4.5 Air | 131K | 98K | 200 req/day (free tier) | Jul 28, 2025 | Details | |
| OpenRouter | Qwen: Qwen3 Coder 480B A35B (free) | 1.0M | 262K | 200 req/day (free tier) | Jul 23, 2025 | Details | |
| OpenRouter | Venice: Uncensored (free) | 33K | 8K | 200 req/day (free tier) | Jul 9, 2025 | Details | |
| OpenRouter | Meta: Llama 3.3 70B Instruct (free) | 131K | 8K | 200 req/day (free tier) | Dec 6, 2024 | Details | |
| OpenRouter | Meta: Llama 3.2 3B Instruct (free) | 131K | 8K | 200 req/day (free tier) | Sep 25, 2024 | Details | |
| OpenRouter | Nous: Hermes 3 405B Instruct (free) | 131K | 8K | 200 req/day (free tier) | Aug 16, 2024 | Details | |
| Aion Labs | Aion 2.5 | 128K | 32K | 15 RPM, 20K TPD | — | Details | |
| Aion Labs | Aion 2.0 | 128K | 32K | 15 RPM, 20K TPD | Feb 23, 2026 | Details | |
| Aion Labs | Aion-RP 1.0 (8B) | 32K | 8K | 15 RPM, 20K TPD | — | Details | |
| Cohere | Command A+ (218B) | 128K | 4K | 20 RPM | — | Details | |
| Cohere | Command A (111B) | 256K | 4K | 20 RPM | — | Details | |
| Cohere | Command R+ | 128K | 4K | 20 RPM | — | Details | |
| Cohere | Command R7B | 128K | 4K | 20 RPM | — | Details | |
| Google Gemini | Gemini 3.5 Flash | 1.0M | 64K | 15 RPM, 1,500 RPD | May 19, 2026 | Details | |
| Google Gemini | Gemini 3.1 Flash-Lite | 1.0M | 65K | 30 RPM, 1,500 RPD | Mar 3, 2026 | Details | |
| Google Gemini | Gemini 2.5 Flash | 1.0M | 65K | 15 RPM, 1,500 RPD | May 20, 2025 | Details | |
| Google Gemini | Gemini 2.5 Pro | 2.0M | 65K | 5 RPM, 50 RPD | Jun 5, 2025 | Details | |
| Mistral AI | Mistral Medium 3.5 (128B) | 256K | 256K | ~1 RPS, 500K TPM | — | Details | |
| Mistral AI | Mistral Small 4 | 256K | 256K | ~1 RPS, 500K TPM | Mar 16, 2026 | Details | |
| Mistral AI | Mistral Large 3 | 256K | 256K | ~1 RPS, 500K TPM | Dec 2, 2025 | Details | |
| Mistral AI | Mistral Nemo (12B) | 128K | 128K | ~1 RPS, 500K TPM | — | Details | |
| Mistral AI | Codestral | 256K | 256K | ~1 RPS, 500K TPM | — | Details | |
| Mistral AI | Pixtral Large | 128K | 128K | ~1 RPS, 500K TPM | Nov 18, 2024 | Details | |
| Z AI (Zhipu AI) | GLM-4.7-Flash | 200K | 128K | 1 concurrent request | Jan 19, 2026 | Details | |
| Z AI (Zhipu AI) | GLM-4.6V-Flash | 128K | 4K | 1 concurrent request | — | Details | |
| Cerebras | gpt-oss-120b | 128K | 8K | 30 RPM, 14,400 RPD, 1M TPD | Aug 5, 2025 | Details | |
| Cerebras | zai-glm-4.7 | 128K | 8K | 10 RPM, 100 RPD, 1M TPD | — | Details | |
| Cloudflare Workers AI | @cf/meta/llama-3.3-70b-instruct-fp8-fast | 131K | 131K | 10K neurons/day (shared) | Dec 6, 2024 | Details | |
| Cloudflare Workers AI | @cf/meta/llama-4-scout-17b-16e-instruct | 10.0M | 131K | 10K neurons/day (shared) | — | Details | |
| Cloudflare Workers AI | @cf/openai/gpt-oss-120b | 128K | 131K | 10K neurons/day (shared) | — | Details | |
| Cloudflare Workers AI | @cf/moonshotai/kimi-k2.7-code | 262K | 131K | 10K neurons/day (shared) | — | Details | |
| Cloudflare Workers AI | @cf/google/gemma-4-26b-a4b-it | 256K | 131K | 10K neurons/day (shared) | Apr 2, 2026 | Details | |
| Cloudflare Workers AI | @cf/zhipuai/glm-4.7-flash | 131K | 131K | 10K neurons/day (shared) | — | Details | |
| Cloudflare Workers AI | @cf/mistralai/mistral-small-3.1-24b-instruct | 128K | 131K | 10K neurons/day (shared) | Mar 17, 2025 | Details | |
| Cloudflare Workers AI | @cf/deepseek-ai/deepseek-r1-distill-qwen-32b | 32K | 131K | 10K neurons/day (shared) | Jan 20, 2025 | Details | |
| GitHub Models | gpt-5 | 200K | 32K | 10 RPM, 50 RPD | Aug 7, 2025 | Details | |
| GitHub Models | gpt-4.1 | 1.0M | 32K | 10 RPM, 50 RPD | Apr 14, 2025 | Details | |
| GitHub Models | gpt-4.1-mini | 1.0M | 32K | 15 RPM, 150 RPD | Apr 14, 2025 | Details | |
| GitHub Models | gpt-4o | 128K | 16K | 10 RPM, 50 RPD | May 13, 2024 | Details | |
| GitHub Models | o4-mini | 200K | 100K | 10 RPM, 50 RPD | Apr 16, 2025 | Details | |
| GitHub Models | Llama-4-Scout-17B-16E | 512K | 4K | 15 RPM, 150 RPD | — | Details | |
| GitHub Models | Llama-4-Maverick-17B-128E | 256K | 4K | 10 RPM, 50 RPD | — | Details | |
| GitHub Models | Meta-Llama-3.3-70B | 131K | 4K | 15 RPM, 150 RPD | Dec 6, 2024 | Details | |
| GitHub Models | DeepSeek-R1 | 64K | 8K | 15 RPM, 150 RPD | May 28, 2025 | Details | |
| GitHub Models | Mistral-Small-3.1 | 128K | 4K | 15 RPM, 150 RPD | Mar 17, 2025 | Details | |
| Groq | llama-3.3-70b-versatile | 131K | 32K | 30 RPM, 1,000 RPD | Dec 6, 2024 | Details | |
| Groq | llama-3.1-8b-instant | 131K | 131K | 30 RPM, 1,000 RPD | Jul 23, 2024 | Details | |
| Groq | llama-4-scout-17b-16e-instruct | 131K | 8K | 30 RPM, 1,000 RPD | — | Details | |
| Groq | qwen3-32b | 131K | 131K | 30 RPM, 1,000 RPD | Apr 28, 2025 | Details | |
| Hugging Face | Meta-Llama-3.1-8B-Instruct | 128K | 4K | Credit-metered | Jul 23, 2024 | Details | |
| Hugging Face | Mistral-7B-Instruct-v0.3 | 32K | 4K | Credit-metered | — | Details | |
| Hugging Face | Mixtral-8x7B-Instruct-v0.1 | 32K | 4K | Credit-metered | — | Details | |
| Hugging Face | Phi-3.5-mini-instruct | 128K | 4K | Credit-metered | — | Details | |
| Hugging Face | Qwen2.5-7B-Instruct | 131K | 4K | Credit-metered | Oct 16, 2024 | Details | |
| Kilo Code | x-ai/grok-code-fast-1:free | 256K | 131K | ~200 req/hr | Aug 28, 2025 | Details | |
| Kilo Code | minimax/minimax-m2.5:free | 196K | 8K | ~200 req/hr | Feb 12, 2026 | Details | |
| Kilo Code | bytedance-seed/dola-seed-2.0-pro:free | 131K | 131K | ~200 req/hr | — | Details | |
| Kilo Code | nvidia/nemotron-3-super-120b-a12b:free | 262K | 32K | ~200 req/hr | Mar 11, 2026 | Details | |
| Kilo Code | arcee-ai/trinity-large-thinking:free | 131K | 131K | ~200 req/hr | Apr 1, 2026 | Details | |
| LLM7.io | deepseek-r1-0528 | 131K | 131K | 30 RPM (120 with token) | May 28, 2025 | Details | |
| LLM7.io | deepseek-v3-0324 | 131K | 131K | 30 RPM (120 with token) | Mar 25, 2025 | Details | |
| LLM7.io | gemini-2.5-flash-lite | 131K | 131K | 30 RPM (120 with token) | Jun 17, 2025 | Details | |
| LLM7.io | gpt-4o-mini | 131K | 131K | 30 RPM (120 with token) | Jul 18, 2024 | Details | |
| LLM7.io | mistral-small-3.1-24b | 32K | 131K | 30 RPM (120 with token) | Mar 17, 2025 | Details | |
| LLM7.io | qwen2.5-coder-32b | 131K | 131K | 30 RPM (120 with token) | Nov 11, 2024 | Details | |
| ModelScope | Qwen/Qwen3.5-35B-A3B | 131K | 131K | 2,000 RPD total; <=500 RPD/model (dynamic) | Feb 24, 2026 | Details | |
| ModelScope | Qwen/Qwen3.5-27B | 131K | 131K | 2,000 RPD total; <=500 RPD/model (dynamic) | Feb 24, 2026 | Details | |
| Ollama Cloud | gpt-oss:120b-cloud | 128K | 131K | Session/weekly limits (unpublished) | — | Details | |
| Ollama Cloud | deepseek-v3.1:671b-cloud | 128K | 131K | Session/weekly limits (unpublished) | — | Details | |
| Ollama Cloud | qwen3-coder:480b-cloud | 128K | 131K | Session/weekly limits (unpublished) | — | Details | |
| Ollama Cloud | kimi-k2:1t-cloud | 262K | 131K | Session/weekly limits (unpublished) | — | Details | |
| Ollama Cloud | glm-4.6:cloud | 128K | 131K | Session/weekly limits (unpublished) | — | Details | |
| Ollama Cloud | deepseek-r1:cloud | 128K | 131K | Session/weekly limits (unpublished) | — | Details | |
| OVHcloud AI Endpoints | Qwen3.5-397B-A17B | 131K | 32K | 2 RPM (anonymous) | Feb 16, 2026 | Details | |
| OVHcloud AI Endpoints | gpt-oss-20b | 128K | 8K | 2 RPM (anonymous) | Aug 5, 2025 | Details | |
| OVHcloud AI Endpoints | Meta-Llama-3_3-70B-Instruct | 131K | 4K | 2 RPM (anonymous) | Dec 6, 2024 | Details | |
| OVHcloud AI Endpoints | Llama-3.1-8B-Instruct | 131K | 4K | 2 RPM (anonymous) | Jul 23, 2024 | Details | |
| OVHcloud AI Endpoints | Qwen3.6-27B | 131K | 32K | 2 RPM (anonymous) | Apr 22, 2026 | Details | |
| OVHcloud AI Endpoints | Qwen3.5-9B | 131K | 8K | 2 RPM (anonymous) | Mar 2, 2026 | Details | |
| OVHcloud AI Endpoints | Qwen3-Coder-30B-A3B-Instruct | 262K | 32K | 2 RPM (anonymous) | Jul 31, 2025 | Details | |
| OVHcloud AI Endpoints | Qwen2.5-VL-72B-Instruct | 128K | 8K | 2 RPM (anonymous) | Feb 1, 2025 | Details | |
| OVHcloud AI Endpoints | Mistral-Small-3.2-24B-Instruct | 128K | 4K | 2 RPM (anonymous) | Jun 20, 2025 | Details | |
| OVHcloud AI Endpoints | Mistral-Nemo-Instruct-2407 | 128K | 4K | 2 RPM (anonymous) | — | Details | |
| SambaNova | DeepSeek-V3.1 | 128K | 8K | 20 RPM, 20 RPD, 200K TPD | Aug 21, 2025 | Details | |
| SambaNova | DeepSeek-V3.2 (Preview) | 128K | 8K | 20 RPM, 20 RPD, 200K TPD | — | Details | |
| SambaNova | MiniMax-M2.7 | 128K | 8K | 20 RPM, 20 RPD, 200K TPD | Mar 18, 2026 | Details | |
| SambaNova | gemma-4-31B-it (Preview) | 128K | 8K | 20 RPM, 20 RPD, 200K TPD | — | Details | |
| SiliconFlow | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | 131K | 131K | 30 RPM, 60K TPM | — | Details | |
| SiliconFlow | Abbreviation | 131K | 8K | See provider page | — | Details | |
| NVIDIA NIM | 01-ai/yi-large | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | adept/fuyu-8b | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | ai21labs/jamba-1.5-large-instruct | 131K | 8K | Up to 40 RPM | Aug 22, 2024 | Details | |
| NVIDIA NIM | aisingapore/sea-lion-7b-instruct | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | baai/bge-m3 | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | bigcode/starcoder2-15b | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | databricks/dbrx-instruct | 131K | 8K | Up to 40 RPM | Mar 27, 2024 | Details | |
| NVIDIA NIM | deepseek-ai/deepseek-coder-6.7b-instruct | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | deepseek-ai/deepseek-v4-flash | 1.0M | 66K | Up to 40 RPM | Apr 24, 2026 | Details | |
| NVIDIA NIM | deepseek-ai/deepseek-v4-pro | 1.0M | 384K | Up to 40 RPM | Apr 24, 2026 | Details | |
| NVIDIA NIM | google/codegemma-1.1-7b | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | google/codegemma-7b | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | google/deplot | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | google/gemma-2b | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | google/recurrentgemma-2b | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | ibm/granite-3.0-3b-a800m-instruct | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | ibm/granite-3.0-8b-instruct | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | ibm/granite-34b-code-instruct | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | ibm/granite-8b-code-instruct | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | meta/codellama-70b | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | meta/llama-3.1-70b-instruct | 131K | 16K | Up to 40 RPM | Jul 23, 2024 | Details | |
| NVIDIA NIM | meta/llama-3.2-11b-vision-instruct | 131K | 16K | Up to 40 RPM | Sep 25, 2024 | Details | |
| NVIDIA NIM | meta/llama-3.2-1b-instruct | 131K | 60K | Up to 40 RPM | Sep 25, 2024 | Details | |
| NVIDIA NIM | meta/llama-3.2-3b-instruct | 131K | 8K | Up to 40 RPM | Sep 25, 2024 | Details | |
| NVIDIA NIM | meta/llama-guard-4-12b | 164K | 16K | Up to 40 RPM | Apr 30, 2025 | Details | |
| NVIDIA NIM | meta/llama2-70b | 131K | 8K | Up to 40 RPM | Jul 18, 2023 | Details | |
| NVIDIA NIM | microsoft/kosmos-2 | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | microsoft/phi-3-vision-128k-instruct | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | microsoft/phi-3.5-moe-instruct | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | microsoft/phi-4-multimodal-instruct | 131K | 8K | Up to 40 RPM | Feb 26, 2025 | Details | |
| NVIDIA NIM | minimaxai/minimax-m2.7 | 205K | 131K | Up to 40 RPM | Mar 18, 2026 | Details | |
| NVIDIA NIM | minimaxai/minimax-m3 | 1.0M | 512K | Up to 40 RPM | Jun 1, 2026 | Details | |
| NVIDIA NIM | mistralai/codestral-22b-instruct-v0.1 | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | mistralai/mistral-7b-instruct-v0.3 | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | mistralai/mistral-large-2-instruct | 131K | 8K | Up to 40 RPM | Nov 18, 2024 | Details | |
| NVIDIA NIM | mistralai/mixtral-8x22b-v0.1 | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | moonshotai/kimi-k2.6 | 262K | 262K | Up to 40 RPM | Apr 20, 2026 | Details | |
| NVIDIA NIM | nv-mistralai/mistral-nemo-12b-instruct | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/cosmos-reason2-8b | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/embed-qa-4 | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/llama-3.1-nemotron-51b-instruct | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/llama-3.1-nemotron-70b-instruct | 131K | 8K | Up to 40 RPM | Oct 15, 2024 | Details | |
| NVIDIA NIM | nvidia/llama-3.1-nemotron-ultra-253b-v1 | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1 | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/llama-3.2-nv-embedqa-1b-v1 | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/llama-3.3-nemotron-super-49b-v1.5 | 131K | 16K | Up to 40 RPM | Oct 10, 2025 | Details | |
| NVIDIA NIM | nvidia/llama-nemotron-embed-1b-v2 | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/llama-nemotron-embed-vl-1b-v2 | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/llama3-chatqa-1.5-70b | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/mistral-nemo-minitron-8b-8k-instruct | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/nemoretriever-parse | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/nemotron-3.5-content-safety | 128K | 8K | Up to 40 RPM | Jun 4, 2026 | Details | |
| NVIDIA NIM | nvidia/nemotron-4-340b-instruct | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/nemotron-4-340b-reward | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/nemotron-nano-3-30b-a3b | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/nemotron-parse | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/neva-22b | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/nv-embed-v1 | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/nv-embedcode-7b-v1 | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/nv-embedqa-e5-v5 | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/nv-embedqa-mistral-7b-v2 | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/nvclip | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/riva-translate-4b-instruct | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | nvidia/vila | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | qwen/qwen3.5-122b-a10b | 262K | 262K | Up to 40 RPM | Feb 24, 2026 | Details | |
| NVIDIA NIM | qwen/qwen3.5-397b-a17b | 256K | 8K | Up to 40 RPM | Feb 16, 2026 | Details | |
| NVIDIA NIM | snowflake/arctic-embed-l | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | stepfun-ai/step-3.5-flash | 262K | 16K | Up to 40 RPM | Feb 2, 2026 | Details | |
| NVIDIA NIM | stepfun-ai/step-3.7-flash | 256K | 256K | Up to 40 RPM | May 29, 2026 | Details | |
| NVIDIA NIM | writer/palmyra-creative-122b | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | writer/palmyra-fin-70b-32k | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | writer/palmyra-med-70b | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | writer/palmyra-med-70b-32k | 131K | 8K | Up to 40 RPM | — | Details | |
| NVIDIA NIM | z-ai/glm-5.1 | 203K | 8K | Up to 40 RPM | Apr 7, 2026 | Details | |
| NVIDIA NIM | zyphra/zamba2-7b-instruct | 131K | 8K | Up to 40 RPM | — | Details | |
| AI21 Labs | Jamba Large 1.7 | 256K | 4K | 200 RPM, 10 RPS | Aug 8, 2025 | Details | |
| AI21 Labs | Jamba Mini 2 | 256K | 4K | 200 RPM, 10 RPS | — | Details | |
| Aion Labs | aion-1.0 | 131K | 32K | Daily token allowance | Feb 4, 2025 | Details | |
| Aion Labs | aion-1.0-mini | 131K | 32K | Daily token allowance | Feb 4, 2025 | Details | |
| Alibaba Cloud Model Studio | Qwen3-Max | 128K | 32K | Tiered by region | Sep 23, 2025 | Details | |
| Alibaba Cloud Model Studio | Qwen3-Plus | 1.0M | 32K | Tiered by region | — | Details | |
| Alibaba Cloud Model Studio | Qwen3-VL-Plus | 128K | 8K | Tiered by region | — | Details | |
| Alibaba Cloud Model Studio | Qwen3-Coder-Plus | 256K | 8K | Tiered by region | Sep 23, 2025 | Details | |
| Alibaba Cloud Model Studio | QwQ-Plus | 131K | 32K | Tiered by region | — | Details | |
| Cohere | Embed 4 | 131K | 131K | 2,000 inputs/min | — | Details | |
| Cohere | Rerank 3.5 | 131K | 131K | 10 RPM | — | Details | |
| DeepSeek | deepseek-chat (V3.2) | 128K | 8K | Dynamic | Dec 1, 2025 | Details | |
| DeepSeek | deepseek-reasoner (R1) | 128K | 8K | Dynamic | — | Details | |
| Google Gemini | Gemini 3 Flash (Preview) | 1.0M | 65K | Preview limits | — | Details | |
| Mistral AI | Mistral Medium 3 | 128K | 128K | ~1 RPS, 500K TPM | May 7, 2025 | Details | |
| xAI | grok-4.3 | 1.0M | 32K | Credit-based | Apr 30, 2026 | Details | |
| xAI | grok-4.1-fast | 2.0M | 32K | Credit-based | Nov 19, 2025 | Details | |
| xAI | grok-3-mini | 131K | 8K | Credit-based | — | Details | |
| Z AI (Zhipu AI) | GLM-4.5-Flash | 128K | 8K | 1 concurrent request | — | Details | |
| Cerebras | llama-3.3-70b | 128K | 8K | 30 RPM, 14,400 RPD, 1M TPD | Dec 6, 2024 | Details | |
| Cerebras | qwen-3-235b-a22b-instruct-2507 | 131K | 8K | 30 RPM, 14,400 RPD, 1M TPD | Apr 28, 2025 | Details | |
| Cerebras | qwen-3-32b | 131K | 8K | 30 RPM, 14,400 RPD, 1M TPD | Apr 28, 2025 | Details | |
| Cloudflare Workers AI | @cf/meta/llama-3.1-8b-instruct-fp8-fast | 131K | 131K | 10K neurons/day (shared) | Jul 23, 2024 | Details | |
| Cloudflare Workers AI | @cf/meta/llama-3.2-11b-vision-instruct | 131K | 131K | 10K neurons/day (shared) | Sep 25, 2024 | Details | |
| Cloudflare Workers AI | @cf/moonshotai/kimi-k2.5 | 256K | 131K | 10K neurons/day (shared) | — | Details | |
| Groq | llama-4-maverick-17b-128e-instruct | 131K | 8K | 15 RPM, 500 RPD | — | Details | |
| Groq | kimi-k2-instruct | 262K | 262K | 30 RPM, 14,400 RPD | Sep 5, 2025 | Details | |
| Groq | deepseek-r1-distill-70b | 131K | 8K | 30 RPM, 14,400 RPD | — | Details | |
| Groq | whisper-large-v3 | 131K | 131K | 20 RPM, 2,000 RPD | — | Details | |
| Groq | whisper-large-v3-turbo | 131K | 131K | 20 RPM, 2,000 RPD | — | Details | |
| ModelScope | Qwen/Qwen-Image | 131K | 131K | 2,000 RPD total; model/AIGC-specific caps | — | Details | |
| Nebius | Qwen3-235B-A22B | 128K | 32K | Tier-based | Apr 28, 2025 | Details | |
| Nscale | Llama-3.3-70B-Instruct | 128K | 8K | Fair-use | Dec 6, 2024 | Details | |
| Nscale | DeepSeek-R1-Distill-Llama-70B | 128K | 32K | Fair-use | Jan 20, 2025 | Details | |
| OVHcloud AI Endpoints | Qwen3Guard-Gen-8B | 32K | 4K | 2 RPM (anonymous) | — | Details | |
| OVHcloud AI Endpoints | Qwen3Guard-Gen-0.6B | 32K | 4K | 2 RPM (anonymous) | — | Details | |
| SiliconFlow | deepseek-ai/DeepSeek-OCR | 131K | 8K | 30 RPM, 60K TPM | — | Details | |
| OpenRouter | Baidu Qianfan: CoBuddy | 131K | 65K | 200 req/day (free tier) | — | Details | |
| OpenRouter | NVIDIA: Llama Nemotron Embed VL 1B V2 (free) | 131K | 8K | 200 req/day (free tier) | Feb 25, 2026 | Details | |
| OpenRouter | NVIDIA: Llama Nemotron Rerank VL 1B V2 (free) | 10K | 8K | 200 req/day (free tier) | Jun 9, 2026 | Details |