Best Free LLM APIs for Chat

234 free models available for chat.

For general conversation, look for low latency, strong instruction following, and a helpful personality. Gemini 2.5 Flash offers one of the largest free context windows (1M tokens) with multimodal support. Llama 3.3 70B via Groq delivers the fastest tokens-per-second. Qwen3.5 models on NVIDIA NIM balance quality and speed.

What to Look for in a Chat Model

Chat models are the most common type of LLM, but they vary significantly in how well they handle conversation:

  • Latency / tokens per second — Real-time conversation needs fast responses. Groq's LPU hardware delivers the fastest inference (Llama 3.3 70B hits 100+ tok/s). NVIDIA NIM and OpenRouter are slower but offer more model variety.
  • Context window — Long conversations or document Q&A need a large context window. Gemini 2.5 Flash (1M ctx) can hold an entire book in memory. Most chat models have 32K–128K, which handles typical back-and-forth conversations easily.
  • Instruction following — A good chat model stays on-topic, follows system prompts, and avoids hallucinating. Llama 3.3 70B and Qwen3 are known for strong instruction adherence.
  • Multilingual support — If you chat in non-English languages, check the model's training data. Qwen3 has strong Chinese/English bilingual performance. Gemini and Llama support 30+ languages.
  • Multimodal input — Want to share images or audio in chat? Gemini 2.5 Flash accepts text, image, audio, and video. Most chat models are text-only.
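
The criteria above can be turned into a simple filter. The sketch below is illustrative only: the `ChatModel` class and the `shortlist` function are hypothetical names, and the context sizes and input types are copied from this guide rather than fetched from any provider, so check the provider pages before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatModel:
    name: str
    provider: str
    context_tokens: int      # advertised context window, in tokens
    modalities: frozenset    # input types the model accepts


# Figures copied from this guide (assumption: they are still current).
CANDIDATES = [
    ChatModel("Gemini 2.5 Flash", "Google AI Studio", 1_000_000,
              frozenset({"text", "image", "audio", "video"})),
    ChatModel("Llama 3.3 70B", "Groq", 131_000, frozenset({"text"})),
    ChatModel("Qwen3.5 122B", "NVIDIA NIM", 262_000,
              frozenset({"text", "image"})),
]


def shortlist(models, min_context=0, needs=("text",)):
    """Keep models whose context window and input types cover the request."""
    required = frozenset(needs)
    return [m for m in models
            if m.context_tokens >= min_context and required <= m.modalities]
```

For example, `shortlist(CANDIDATES, min_context=200_000, needs=("text", "image"))` keeps Gemini 2.5 Flash and Qwen3.5 122B and drops Llama 3.3 70B, which is text-only with a 131K window. Latency is left out on purpose because it depends on load and is better measured than hard-coded.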

How to Choose a Free Chat Model

Match the model to your chat use case:

  • Casual conversation / chatbot? → Prioritize latency and personality. Llama 3.3 70B via Groq (fastest) or Gemini 2.5 Flash via Google AI Studio (most capable).
  • Long-form Q&A / document chat? → Maximize context window. Gemini 2.5 Flash (1M) or Qwen3.5 122B (262K via NVIDIA NIM).
  • Multilingual chat? → Qwen3.5 excels in Chinese-English. Gemini supports 30+ languages. Llama covers major European and Asian languages.
  • Roleplay / creative conversation? → Look for models with strong creative writing. Llama 3.3 70B and Mistral models tend to have more varied output styles.
  • Customer support bot? → Instruction following and safety are critical. Gemini and Qwen3 are well-aligned. Avoid unmoderated open models unless you add guardrails.
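
If you route requests in code, the list above reduces to a lookup table. This is a minimal sketch: the use-case keys and the `recommend` function are hypothetical names chosen for illustration, and the picks are simply the ones named in this guide.

```python
# Use-case keys are hypothetical labels; the picks are the ones named in this guide.
PICKS = {
    "casual": ("Llama 3.3 70B via Groq", "Gemini 2.5 Flash via Google AI Studio"),
    "document": ("Gemini 2.5 Flash", "Qwen3.5 122B via NVIDIA NIM"),
    "multilingual": ("Qwen3.5", "Gemini", "Llama"),
    "roleplay": ("Llama 3.3 70B", "Mistral models"),
    "support": ("Gemini", "Qwen3"),
}


def recommend(use_case):
    """Return the suggested models for a use case, first choice first."""
    key = use_case.strip().lower()
    if key not in PICKS:
        raise ValueError(
            f"unknown use case {use_case!r}; expected one of {sorted(PICKS)}"
        )
    return PICKS[key]
```

Raising on an unknown key, rather than silently falling back to a default model, makes it obvious when a new use case has not been evaluated yet.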

All Free Chat Models

Provider | Model | Context | Max Output | Modality | Rate Limit | Released
OpenRouter | Cohere: North Mini Code (free) | 256K | 64K | text, code | 200 req/day (free tier) | Jun 9, 2026
OpenRouter | Nex AGI: Nex-N2-Pro (free) | 262K | 262K | text, image | 200 req/day (free tier) | Jun 8, 2026
OpenRouter | NVIDIA: Nemotron 3.5 Content Safety (free) | 128K | 8K | text, image | 200 req/day (free tier) | Jun 4, 2026
OpenRouter | NVIDIA: Nemotron 3 Ultra (free) | 1.0M | 66K | text | 200 req/day (free tier) | Jun 4, 2026
OpenRouter | MiniMax: MiniMax M3 | 1.0M | 512K | text, image | 200 req/day (free tier) | Jun 1, 2026
OpenRouter | inclusionAI: Ring-2.6-1T | 262K | 66K | text | 200 req/day (free tier) | May 8, 2026
OpenRouter | Owl Alpha | 1.0M | 262K | text | 200 req/day (free tier) | Apr 28, 2026
OpenRouter | NVIDIA: Nemotron 3 Nano Omni (free) | 256K | 66K | text, image, audio, reasoning | 200 req/day (free tier) | Apr 28, 2026
OpenRouter | Poolside: Laguna XS.2 (free) | 262K | 33K | text | 200 req/day (free tier) | Apr 28, 2026
OpenRouter | Poolside: Laguna M.1 (free) | 262K | 33K | text | 200 req/day (free tier) | Apr 28, 2026
OpenRouter | DeepSeek: DeepSeek V4 Flash | 1.0M | 66K | text | 200 req/day (free tier) | Apr 24, 2026
OpenRouter | MoonshotAI: Kimi K2.6 | 262K | 262K | text, image | 200 req/day (free tier) | Apr 20, 2026
OpenRouter | Z.ai: GLM 5.1 | 203K | 8K | text | 200 req/day (free tier) | Apr 7, 2026
OpenRouter | Google: Gemma 4 26B A4B (free) | 262K | 33K | text, image | 200 req/day (free tier) | Apr 2, 2026
OpenRouter | Google: Gemma 4 31B (free) | 262K | 8K | text, image | 200 req/day (free tier) | Apr 2, 2026
OpenRouter | Arcee AI: Trinity Large Thinking | 262K | 262K | text, reasoning | 200 req/day (free tier) | Apr 1, 2026
OpenRouter | Google: Lyria 3 Pro Preview | 1.0M | 66K | text, image | 200 req/day (free tier) | Mar 30, 2026
OpenRouter | Google: Lyria 3 Clip Preview | 1.0M | 66K | text, image | 200 req/day (free tier) | Mar 30, 2026
OpenRouter | NVIDIA: Nemotron 3 Super (free) | 1.0M | 262K | text | 200 req/day (free tier) | Mar 11, 2026
OpenRouter | MiniMax: MiniMax M2.5 | 205K | 197K | text | 200 req/day (free tier) | Feb 12, 2026
OpenRouter | Free Models Router | 200K | 8K | text, image | 200 req/day (free tier) | Feb 1, 2026
OpenRouter | LiquidAI: LFM2.5-1.2B-Thinking (free) | 33K | 8K | text, reasoning | 200 req/day (free tier) | Jan 20, 2026
OpenRouter | LiquidAI: LFM2.5-1.2B-Instruct (free) | 33K | 8K | text | 200 req/day (free tier) | Jan 5, 2026
OpenRouter | NVIDIA: Nemotron 3 Nano 30B A3B (free) | 256K | 8K | text | 200 req/day (free tier) | Dec 14, 2025
OpenRouter | OpenAI: gpt-oss-safeguard-20b | 131K | 66K | text | 200 req/day (free tier) | Oct 29, 2025
OpenRouter | NVIDIA: Nemotron Nano 12B 2 VL (free) | 128K | 128K | text, image | 200 req/day (free tier) | Oct 28, 2025
OpenRouter | Qwen: Qwen3 Next 80B A3B Instruct (free) | 262K | 8K | text | 200 req/day (free tier) | Sep 11, 2025
OpenRouter | NVIDIA: Nemotron Nano 9B V2 (free) | 128K | 8K | text | 200 req/day (free tier) | Sep 5, 2025
OpenRouter | OpenAI: gpt-oss-120b (free) | 131K | 131K | text | 200 req/day (free tier) | Aug 5, 2025
OpenRouter | OpenAI: gpt-oss-20b (free) | 131K | 33K | text | 200 req/day (free tier) | Aug 5, 2025
OpenRouter | Z.ai: GLM 4.5 Air | 131K | 98K | text | 200 req/day (free tier) | Jul 28, 2025
OpenRouter | Qwen: Qwen3 Coder 480B A35B (free) | 1.0M | 262K | text, code | 200 req/day (free tier) | Jul 23, 2025
OpenRouter | Venice: Uncensored (free) | 33K | 8K | text | 200 req/day (free tier) | Jul 9, 2025
OpenRouter | Meta: Llama 3.3 70B Instruct (free) | 131K | 8K | text | 200 req/day (free tier) | Dec 6, 2024
OpenRouter | Meta: Llama 3.2 3B Instruct (free) | 131K | 8K | text | 200 req/day (free tier) | Sep 25, 2024
OpenRouter | Nous: Hermes 3 405B Instruct (free) | 131K | 8K | text | 200 req/day (free tier) | Aug 16, 2024
Aion Labs | Aion 2.5 | 128K | 32K | text | 15 RPM, 20K TPD | -
Aion Labs | Aion 2.0 | 128K | 32K | text | 15 RPM, 20K TPD | Feb 23, 2026
Aion Labs | Aion-RP 1.0 (8B) | 32K | 8K | text | 15 RPM, 20K TPD | -
Cohere | Command A+ (218B) | 128K | 4K | text | 20 RPM | -
Cohere | Command A (111B) | 256K | 4K | text | 20 RPM | -
Cohere | Command R+ | 128K | 4K | text | 20 RPM | -
Cohere | Command R7B | 128K | 4K | text | 20 RPM | -
Google Gemini | Gemini 3.5 Flash | 1.0M | 64K | text | 15 RPM, 1,500 RPD | May 19, 2026
Google Gemini | Gemini 3.1 Flash-Lite | 1.0M | 65K | text | 30 RPM, 1,500 RPD | Mar 3, 2026
Google Gemini | Gemini 2.5 Flash | 1.0M | 65K | text | 15 RPM, 1,500 RPD | May 20, 2025
Google Gemini | Gemini 2.5 Pro | 2.0M | 65K | text | 5 RPM, 50 RPD | Jun 5, 2025
Mistral AI | Mistral Medium 3.5 (128B) | 256K | 256K | text | ~1 RPS, 500K TPM | -
Mistral AI | Mistral Small 4 | 256K | 256K | text | ~1 RPS, 500K TPM | Mar 16, 2026
Mistral AI | Mistral Large 3 | 256K | 256K | text | ~1 RPS, 500K TPM | Dec 2, 2025
Mistral AI | Mistral Nemo (12B) | 128K | 128K | text | ~1 RPS, 500K TPM | -
Mistral AI | Codestral | 256K | 256K | text, code | ~1 RPS, 500K TPM | -
Mistral AI | Pixtral Large | 128K | 128K | text, image | ~1 RPS, 500K TPM | Nov 18, 2024
Z AI (Zhipu AI) | GLM-4.7-Flash | 200K | 128K | text | 1 concurrent request | Jan 19, 2026
Z AI (Zhipu AI) | GLM-4.6V-Flash | 128K | 4K | text | 1 concurrent request | -
Cerebras | gpt-oss-120b | 128K | 8K | text | 30 RPM, 14,400 RPD, 1M TPD | Aug 5, 2025
Cerebras | zai-glm-4.7 | 128K | 8K | text | 10 RPM, 100 RPD, 1M TPD | -
Cloudflare Workers AI | @cf/meta/llama-3.3-70b-instruct-fp8-fast | 131K | 131K | text | 10K neurons/day (shared) | Dec 6, 2024
Cloudflare Workers AI | @cf/meta/llama-4-scout-17b-16e-instruct | 10.0M | 131K | text | 10K neurons/day (shared) | -
Cloudflare Workers AI | @cf/openai/gpt-oss-120b | 128K | 131K | text | 10K neurons/day (shared) | -
Cloudflare Workers AI | @cf/moonshotai/kimi-k2.7-code | 262K | 131K | text, code | 10K neurons/day (shared) | -
Cloudflare Workers AI | @cf/google/gemma-4-26b-a4b-it | 256K | 131K | text | 10K neurons/day (shared) | Apr 2, 2026
Cloudflare Workers AI | @cf/zhipuai/glm-4.7-flash | 131K | 131K | text | 10K neurons/day (shared) | -
Cloudflare Workers AI | @cf/mistralai/mistral-small-3.1-24b-instruct | 128K | 131K | text | 10K neurons/day (shared) | Mar 17, 2025
Cloudflare Workers AI | @cf/deepseek-ai/deepseek-r1-distill-qwen-32b | 32K | 131K | text, reasoning | 10K neurons/day (shared) | Jan 20, 2025
GitHub Models | gpt-5 | 200K | 32K | text | 10 RPM, 50 RPD | Aug 7, 2025
GitHub Models | gpt-4.1 | 1.0M | 32K | text | 10 RPM, 50 RPD | Apr 14, 2025
GitHub Models | gpt-4.1-mini | 1.0M | 32K | text | 15 RPM, 150 RPD | Apr 14, 2025
GitHub Models | gpt-4o | 128K | 16K | text | 10 RPM, 50 RPD | May 13, 2024
GitHub Models | o4-mini | 200K | 100K | text | 10 RPM, 50 RPD | Apr 16, 2025
GitHub Models | Llama-4-Scout-17B-16E | 512K | 4K | text | 15 RPM, 150 RPD | -
GitHub Models | Llama-4-Maverick-17B-128E | 256K | 4K | text | 10 RPM, 50 RPD | -
GitHub Models | Meta-Llama-3.3-70B | 131K | 4K | text | 15 RPM, 150 RPD | Dec 6, 2024
GitHub Models | DeepSeek-R1 | 64K | 8K | text, reasoning | 15 RPM, 150 RPD | May 28, 2025
GitHub Models | Mistral-Small-3.1 | 128K | 4K | text | 15 RPM, 150 RPD | Mar 17, 2025
Groq | llama-3.3-70b-versatile | 131K | 32K | text | 30 RPM, 1,000 RPD | Dec 6, 2024
Groq | llama-3.1-8b-instant | 131K | 131K | text | 30 RPM, 1,000 RPD | Jul 23, 2024
Groq | llama-4-scout-17b-16e-instruct | 131K | 8K | text | 30 RPM, 1,000 RPD | -
Groq | qwen3-32b | 131K | 131K | text | 30 RPM, 1,000 RPD | Apr 28, 2025
Hugging Face | Meta-Llama-3.1-8B-Instruct | 128K | 4K | text | Credit-metered | Jul 23, 2024
Hugging Face | Mistral-7B-Instruct-v0.3 | 32K | 4K | text | Credit-metered | -
Hugging Face | Mixtral-8x7B-Instruct-v0.1 | 32K | 4K | text | Credit-metered | -
Hugging Face | Phi-3.5-mini-instruct | 128K | 4K | text | Credit-metered | -
Hugging Face | Qwen2.5-7B-Instruct | 131K | 4K | text | Credit-metered | Oct 16, 2024
Kilo Code | x-ai/grok-code-fast-1:free | 256K | 131K | text, code | ~200 req/hr | Aug 28, 2025
Kilo Code | minimax/minimax-m2.5:free | 196K | 8K | text | ~200 req/hr | Feb 12, 2026
Kilo Code | bytedance-seed/dola-seed-2.0-pro:free | 131K | 131K | text | ~200 req/hr | -
Kilo Code | nvidia/nemotron-3-super-120b-a12b:free | 262K | 32K | text | ~200 req/hr | Mar 11, 2026
Kilo Code | arcee-ai/trinity-large-thinking:free | 131K | 131K | text, reasoning | ~200 req/hr | Apr 1, 2026
LLM7.io | deepseek-r1-0528 | 131K | 131K | text, reasoning | 30 RPM (120 with token) | May 28, 2025
LLM7.io | deepseek-v3-0324 | 131K | 131K | text | 30 RPM (120 with token) | Mar 25, 2025
LLM7.io | gemini-2.5-flash-lite | 131K | 131K | text | 30 RPM (120 with token) | Jun 17, 2025
LLM7.io | gpt-4o-mini | 131K | 131K | text | 30 RPM (120 with token) | Jul 18, 2024
LLM7.io | mistral-small-3.1-24b | 32K | 131K | text | 30 RPM (120 with token) | Mar 17, 2025
LLM7.io | qwen2.5-coder-32b | 131K | 131K | text, code | 30 RPM (120 with token) | Nov 11, 2024
ModelScope | Qwen/Qwen3.5-35B-A3B | 131K | 131K | text | 2,000 RPD total; <=500 RPD/model (dynamic) | Feb 24, 2026
ModelScope | Qwen/Qwen3.5-27B | 131K | 131K | text | 2,000 RPD total; <=500 RPD/model (dynamic) | Feb 24, 2026
Ollama Cloud | gpt-oss:120b-cloud | 128K | 131K | text | Session/weekly limits (unpublished) | -
Ollama Cloud | deepseek-v3.1:671b-cloud | 128K | 131K | text | Session/weekly limits (unpublished) | -
Ollama Cloud | qwen3-coder:480b-cloud | 128K | 131K | text, code | Session/weekly limits (unpublished) | -
Ollama Cloud | kimi-k2:1t-cloud | 262K | 131K | text | Session/weekly limits (unpublished) | -
Ollama Cloud | glm-4.6:cloud | 128K | 131K | text | Session/weekly limits (unpublished) | -
Ollama Cloud | deepseek-r1:cloud | 128K | 131K | text, reasoning | Session/weekly limits (unpublished) | -
OVHcloud AI Endpoints | Qwen3.5-397B-A17B | 131K | 32K | text | 2 RPM (anonymous) | Feb 16, 2026
OVHcloud AI Endpoints | gpt-oss-20b | 128K | 8K | text | 2 RPM (anonymous) | Aug 5, 2025
OVHcloud AI Endpoints | Meta-Llama-3_3-70B-Instruct | 131K | 4K | text | 2 RPM (anonymous) | Dec 6, 2024
OVHcloud AI Endpoints | Llama-3.1-8B-Instruct | 131K | 4K | text | 2 RPM (anonymous) | Jul 23, 2024
OVHcloud AI Endpoints | Qwen3.6-27B | 131K | 32K | text | 2 RPM (anonymous) | Apr 22, 2026
OVHcloud AI Endpoints | Qwen3.5-9B | 131K | 8K | text | 2 RPM (anonymous) | Mar 2, 2026
OVHcloud AI Endpoints | Qwen3-Coder-30B-A3B-Instruct | 262K | 32K | text, code | 2 RPM (anonymous) | Jul 31, 2025
OVHcloud AI Endpoints | Qwen2.5-VL-72B-Instruct | 128K | 8K | text, image | 2 RPM (anonymous) | Feb 1, 2025
OVHcloud AI Endpoints | Mistral-Small-3.2-24B-Instruct | 128K | 4K | text | 2 RPM (anonymous) | Jun 20, 2025
OVHcloud AI Endpoints | Mistral-Nemo-Instruct-2407 | 128K | 4K | text | 2 RPM (anonymous) | -
SambaNova | DeepSeek-V3.1 | 128K | 8K | text | 20 RPM, 20 RPD, 200K TPD | Aug 21, 2025
SambaNova | DeepSeek-V3.2 (Preview) | 128K | 8K | text | 20 RPM, 20 RPD, 200K TPD | -
SambaNova | MiniMax-M2.7 | 128K | 8K | text | 20 RPM, 20 RPD, 200K TPD | Mar 18, 2026
SambaNova | gemma-4-31B-it (Preview) | 128K | 8K | text | 20 RPM, 20 RPD, 200K TPD | -
SiliconFlow | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | 131K | 131K | text, reasoning | 30 RPM, 60K TPM | -
SiliconFlow | Abbreviation | 131K | 8K | text | See provider page | -
NVIDIA NIM | 01-ai/yi-large | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | adept/fuyu-8b | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | ai21labs/jamba-1.5-large-instruct | 131K | 8K | text | Up to 40 RPM | Aug 22, 2024
NVIDIA NIM | aisingapore/sea-lion-7b-instruct | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | baai/bge-m3 | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | bigcode/starcoder2-15b | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | databricks/dbrx-instruct | 131K | 8K | text | Up to 40 RPM | Mar 27, 2024
NVIDIA NIM | deepseek-ai/deepseek-coder-6.7b-instruct | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | deepseek-ai/deepseek-v4-flash | 1.0M | 66K | text | Up to 40 RPM | Apr 24, 2026
NVIDIA NIM | deepseek-ai/deepseek-v4-pro | 1.0M | 384K | text | Up to 40 RPM | Apr 24, 2026
NVIDIA NIM | google/codegemma-1.1-7b | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | google/codegemma-7b | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | google/deplot | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | google/gemma-2b | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | google/recurrentgemma-2b | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | ibm/granite-3.0-3b-a800m-instruct | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | ibm/granite-3.0-8b-instruct | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | ibm/granite-34b-code-instruct | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | ibm/granite-8b-code-instruct | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | meta/codellama-70b | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | meta/llama-3.1-70b-instruct | 131K | 16K | text | Up to 40 RPM | Jul 23, 2024
NVIDIA NIM | meta/llama-3.2-11b-vision-instruct | 131K | 16K | text, image | Up to 40 RPM | Sep 25, 2024
NVIDIA NIM | meta/llama-3.2-1b-instruct | 131K | 60K | text | Up to 40 RPM | Sep 25, 2024
NVIDIA NIM | meta/llama-3.2-3b-instruct | 131K | 8K | text | Up to 40 RPM | Sep 25, 2024
NVIDIA NIM | meta/llama-guard-4-12b | 164K | 16K | text, image | Up to 40 RPM | Apr 30, 2025
NVIDIA NIM | meta/llama2-70b | 131K | 8K | text | Up to 40 RPM | Jul 18, 2023
NVIDIA NIM | microsoft/kosmos-2 | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | microsoft/phi-3-vision-128k-instruct | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | microsoft/phi-3.5-moe-instruct | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | microsoft/phi-4-multimodal-instruct | 131K | 8K | text | Up to 40 RPM | Feb 26, 2025
NVIDIA NIM | minimaxai/minimax-m2.7 | 205K | 131K | text | Up to 40 RPM | Mar 18, 2026
NVIDIA NIM | minimaxai/minimax-m3 | 1.0M | 512K | text, image | Up to 40 RPM | Jun 1, 2026
NVIDIA NIM | mistralai/codestral-22b-instruct-v0.1 | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | mistralai/mistral-7b-instruct-v0.3 | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | mistralai/mistral-large-2-instruct | 131K | 8K | text | Up to 40 RPM | Nov 18, 2024
NVIDIA NIM | mistralai/mixtral-8x22b-v0.1 | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | moonshotai/kimi-k2.6 | 262K | 262K | text, image | Up to 40 RPM | Apr 20, 2026
NVIDIA NIM | nv-mistralai/mistral-nemo-12b-instruct | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | nvidia/cosmos-reason2-8b | 131K | 8K | text, reasoning | Up to 40 RPM | -
NVIDIA NIM | nvidia/embed-qa-4 | 131K | 8K | text, embedding | Up to 40 RPM | -
NVIDIA NIM | nvidia/llama-3.1-nemotron-51b-instruct | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | nvidia/llama-3.1-nemotron-70b-instruct | 131K | 8K | text | Up to 40 RPM | Oct 15, 2024
NVIDIA NIM | nvidia/llama-3.1-nemotron-ultra-253b-v1 | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1 | 131K | 8K | text, embedding | Up to 40 RPM | -
NVIDIA NIM | nvidia/llama-3.2-nv-embedqa-1b-v1 | 131K | 8K | text, embedding | Up to 40 RPM | -
NVIDIA NIM | nvidia/llama-3.3-nemotron-super-49b-v1.5 | 131K | 16K | text | Up to 40 RPM | Oct 10, 2025
NVIDIA NIM | nvidia/llama-nemotron-embed-1b-v2 | 131K | 8K | text, embedding | Up to 40 RPM | -
NVIDIA NIM | nvidia/llama-nemotron-embed-vl-1b-v2 | 131K | 8K | text, embedding | Up to 40 RPM | -
NVIDIA NIM | nvidia/llama3-chatqa-1.5-70b | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | nvidia/mistral-nemo-minitron-8b-8k-instruct | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | nvidia/nemoretriever-parse | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | nvidia/nemotron-3.5-content-safety | 128K | 8K | text, image | Up to 40 RPM | Jun 4, 2026
NVIDIA NIM | nvidia/nemotron-4-340b-instruct | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | nvidia/nemotron-4-340b-reward | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | nvidia/nemotron-nano-3-30b-a3b | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | nvidia/nemotron-parse | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | nvidia/neva-22b | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | nvidia/nv-embed-v1 | 131K | 8K | text, embedding | Up to 40 RPM | -
NVIDIA NIM | nvidia/nv-embedcode-7b-v1 | 131K | 8K | text, embedding | Up to 40 RPM | -
NVIDIA NIM | nvidia/nv-embedqa-e5-v5 | 131K | 8K | text, embedding | Up to 40 RPM | -
NVIDIA NIM | nvidia/nv-embedqa-mistral-7b-v2 | 131K | 8K | text, embedding | Up to 40 RPM | -
NVIDIA NIM | nvidia/nvclip | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | nvidia/riva-translate-4b-instruct | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | nvidia/vila | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | qwen/qwen3.5-122b-a10b | 262K | 262K | text, image | Up to 40 RPM | Feb 24, 2026
NVIDIA NIM | qwen/qwen3.5-397b-a17b | 256K | 8K | text, image | Up to 40 RPM | Feb 16, 2026
NVIDIA NIM | snowflake/arctic-embed-l | 131K | 8K | text, embedding | Up to 40 RPM | -
NVIDIA NIM | stepfun-ai/step-3.5-flash | 262K | 16K | text | Up to 40 RPM | Feb 2, 2026
NVIDIA NIM | stepfun-ai/step-3.7-flash | 256K | 256K | text, image | Up to 40 RPM | May 29, 2026
NVIDIA NIM | writer/palmyra-creative-122b | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | writer/palmyra-fin-70b-32k | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | writer/palmyra-med-70b | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | writer/palmyra-med-70b-32k | 131K | 8K | text | Up to 40 RPM | -
NVIDIA NIM | z-ai/glm-5.1 | 203K | 8K | text | Up to 40 RPM | Apr 7, 2026
NVIDIA NIM | zyphra/zamba2-7b-instruct | 131K | 8K | text | Up to 40 RPM | -
AI21 Labs | Jamba Large 1.7 | 256K | 4K | text | 200 RPM, 10 RPS | Aug 8, 2025
AI21 Labs | Jamba Mini 2 | 256K | 4K | text | 200 RPM, 10 RPS | -
Aion Labs | aion-1.0 | 131K | 32K | text | Daily token allowance | Feb 4, 2025
Aion Labs | aion-1.0-mini | 131K | 32K | text | Daily token allowance | Feb 4, 2025
Alibaba Cloud Model Studio | Qwen3-Max | 128K | 32K | text | Tiered by region | Sep 23, 2025
Alibaba Cloud Model Studio | Qwen3-Plus | 1.0M | 32K | text | Tiered by region | -
Alibaba Cloud Model Studio | Qwen3-VL-Plus | 128K | 8K | text, image | Tiered by region | -
Alibaba Cloud Model Studio | Qwen3-Coder-Plus | 256K | 8K | text, code | Tiered by region | Sep 23, 2025
Alibaba Cloud Model Studio | QwQ-Plus | 131K | 32K | text | Tiered by region | -
Cohere | Embed 4 | 131K | 131K | text, embedding | 2,000 inputs/min | -
Cohere | Rerank 3.5 | 131K | 131K | text, rerank | 10 RPM | -
DeepSeek | deepseek-chat (V3.2) | 128K | 8K | text | Dynamic | Dec 1, 2025
DeepSeek | deepseek-reasoner (R1) | 128K | 8K | text, reasoning | Dynamic | -
Google Gemini | Gemini 3 Flash (Preview) | 1.0M | 65K | text | Preview limits | -
Mistral AI | Mistral Medium 3 | 128K | 128K | text | ~1 RPS, 500K TPM | May 7, 2025
xAI | grok-4.3 | 1.0M | 32K | text | Credit-based | Apr 30, 2026
xAI | grok-4.1-fast | 2.0M | 32K | text | Credit-based | Nov 19, 2025
xAI | grok-3-mini | 131K | 8K | text | Credit-based | -
Z AI (Zhipu AI) | GLM-4.5-Flash | 128K | 8K | text | 1 concurrent request | -
Cerebras | llama-3.3-70b | 128K | 8K | text | 30 RPM, 14,400 RPD, 1M TPD | Dec 6, 2024
Cerebras | qwen-3-235b-a22b-instruct-2507 | 131K | 8K | text | 30 RPM, 14,400 RPD, 1M TPD | Apr 28, 2025
Cerebras | qwen-3-32b | 131K | 8K | text | 30 RPM, 14,400 RPD, 1M TPD | Apr 28, 2025
Cloudflare Workers AI | @cf/meta/llama-3.1-8b-instruct-fp8-fast | 131K | 131K | text | 10K neurons/day (shared) | Jul 23, 2024
Cloudflare Workers AI | @cf/meta/llama-3.2-11b-vision-instruct | 131K | 131K | text, image | 10K neurons/day (shared) | Sep 25, 2024
Cloudflare Workers AI | @cf/moonshotai/kimi-k2.5 | 256K | 131K | text | 10K neurons/day (shared) | -
Groq | llama-4-maverick-17b-128e-instruct | 131K | 8K | text | 15 RPM, 500 RPD | -
Groq | kimi-k2-instruct | 262K | 262K | text | 30 RPM, 14,400 RPD | Sep 5, 2025
Groq | deepseek-r1-distill-70b | 131K | 8K | text, reasoning | 30 RPM, 14,400 RPD | -
Groq | whisper-large-v3 | 131K | 131K | text | 20 RPM, 2,000 RPD | -
Groq | whisper-large-v3-turbo | 131K | 131K | text | 20 RPM, 2,000 RPD | -
ModelScope | Qwen/Qwen-Image | 131K | 131K | text | 2,000 RPD total; model/AIGC-specific caps | -
Nebius | Qwen3-235B-A22B | 128K | 32K | text | Tier-based | Apr 28, 2025
Nscale | Llama-3.3-70B-Instruct | 128K | 8K | text | Fair-use | Dec 6, 2024
Nscale | DeepSeek-R1-Distill-Llama-70B | 128K | 32K | text, reasoning | Fair-use | Jan 20, 2025
OVHcloud AI Endpoints | Qwen3Guard-Gen-8B | 32K | 4K | text | 2 RPM (anonymous) | -
OVHcloud AI Endpoints | Qwen3Guard-Gen-0.6B | 32K | 4K | text | 2 RPM (anonymous) | -
SiliconFlow | deepseek-ai/DeepSeek-OCR | 131K | 8K | text | 30 RPM, 60K TPM | -
OpenRouter | Baidu Qianfan: CoBuddy | 131K | 65K | text, code | 200 req/day (free tier) | -
OpenRouter | NVIDIA: Llama Nemotron Embed VL 1B V2 (free) | 131K | 8K | text, image, embedding | 200 req/day (free tier) | Feb 25, 2026
OpenRouter | NVIDIA: Llama Nemotron Rerank VL 1B V2 (free) | 10K | 8K | text, image, rerank | 200 req/day (free tier) | Jun 9, 2026
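
Rate limits in the table use the abbreviations RPM, RPD, RPS (requests per minute, day, second) and TPM, TPD (tokens per minute, day). As a rough sketch of how a client might pace itself, the snippet below parses a limit string such as "15 RPM, 1,500 RPD" and computes a fixed gap between requests that stays under every request limit when traffic is spread evenly. The function names are hypothetical, the even-spacing strategy is an assumption rather than any provider's recommendation, and free-form entries such as "200 req/day (free tier)" or "Credit-based" are deliberately not parsed.

```python
import re

# Matches entries like "15 RPM", "1,500 RPD", "20K TPD", "1M TPD", "~1 RPS".
_LIMIT = re.compile(r"(\d[\d,]*)\s*([KM]?)\s+(RPM|RPD|RPS|TPM|TPD)\b")
_SCALE = {"": 1, "K": 1_000, "M": 1_000_000}


def parse_limits(text):
    """Extract {unit: value} pairs from a rate-limit cell of the table."""
    limits = {}
    for number, suffix, unit in _LIMIT.findall(text):
        limits[unit] = int(number.replace(",", "")) * _SCALE[suffix]
    return limits


def seconds_between_requests(limits):
    """Fixed gap that respects every request limit under evenly spread load.

    Returns None when the cell has no RPS/RPM/RPD entry. Token limits
    (TPM, TPD) are ignored here because they depend on request size.
    """
    gaps = []
    if "RPS" in limits:
        gaps.append(1 / limits["RPS"])
    if "RPM" in limits:
        gaps.append(60 / limits["RPM"])
    if "RPD" in limits:
        gaps.append(86_400 / limits["RPD"])
    return max(gaps) if gaps else None
```

With "15 RPM, 1,500 RPD", the per-minute limit alone would allow a request every 4 seconds, but the daily cap is the tighter constraint for an always-on bot: 86,400 / 1,500 is about 57.6 seconds between requests. A bursty chat client that is idle most of the day can instead pace by RPM and simply count requests against the daily cap.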