Best Free LLM APIs for Chat

234 free models available for chat.


For general conversation, look for low latency, strong instruction following, and a helpful personality. Gemini 2.5 Flash offers the largest free context window (1M tokens) with multimodal support. Llama 3.3 70B via Groq delivers the highest tokens-per-second throughput. Qwen3.5 models on NVIDIA NIM strike a balance of quality and speed.

What to Look for in a Chat Model

Chat models are the most common type of LLM, but they vary significantly in how well they handle conversation:

Latency / tokens per second: Real-time conversation needs fast responses. Groq's LPU hardware delivers the fastest inference (Llama 3.3 70B hits 100+ tok/s). NVIDIA NIM and OpenRouter are slower but offer more model variety.
Context window: Long conversations or document Q&A need a large context window. Gemini 2.5 Flash (1M-token context) can hold an entire book in a single prompt. Most chat models have 32K–128K tokens, which handles typical back-and-forth conversations easily.
Instruction following: A good chat model stays on-topic, follows system prompts, and avoids hallucinating. Llama 3.3 70B and Qwen3 are known for strong instruction adherence.
Multilingual support: If you chat in non-English languages, check the model's training data. Qwen3 has strong Chinese/English bilingual performance. Gemini and Llama support 30+ languages.
Multimodal input: Want to share images or audio in chat? Gemini 2.5 Flash accepts text, image, audio, and video. Most chat models are text-only.
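
A minimal sketch of how the context-window and throughput criteria above turn into arithmetic. It assumes roughly four characters per token, which is only a heuristic and not any provider's tokenizer. The helper names (`estimate_tokens`, `trim_history`, `reply_seconds`) are hypothetical.

```python
from typing import Dict, List

CHARS_PER_TOKEN = 4  # assumption: rough average for English text, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate; real tokenizers will give different counts."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def trim_history(messages: List[Dict[str, str]], context_window: int,
                 reserved_output: int) -> List[Dict[str, str]]:
    """Keep system messages plus the newest turns that fit the input budget."""
    budget = context_window - reserved_output
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept: List[Dict[str, str]] = []
    for message in reversed(turns):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break  # everything older than this turn is dropped
        kept.append(message)
        used += cost
    return system + list(reversed(kept))


def reply_seconds(reply_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a reply at a given generation speed (ignores network latency)."""
    return reply_tokens / tokens_per_second
```

At 100 tok/s, for example, `reply_seconds(300, 100.0)` gives 3.0 seconds for a 300-token reply. With a 32K window the trimming step matters far sooner than with a 1M window.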

How to Choose a Free Chat Model

Match the model to your chat use case:

Casual conversation / chatbot? → Prioritize latency and personality. Llama 3.3 70B via Groq (fastest) or Gemini 2.5 Flash via Google AI Studio (most capable).
Long-form Q&A / document chat? → Maximize context window. Gemini 2.5 Flash (1M) or Qwen3.5 122B (262K via NVIDIA NIM).
Multilingual chat? → Qwen3.5 excels in Chinese-English. Gemini supports 30+ languages. Llama covers major European and Asian languages.
Roleplay / creative conversation? → Look for models with strong creative writing. Llama 3.3 70B and Mistral models tend to have more varied output styles.
Customer support bot? → Instruction following and safety are critical. Gemini and Qwen3 are well-aligned. Avoid unmoderated open models unless you add guardrails.
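
The mapping above can be kept as a small lookup table in an application that lets users pick a chat mode. This is only a sketch: the use-case keys and the `recommend` function are hypothetical labels, and the picks are copied from the list above rather than derived from any benchmark.

```python
from typing import Dict, List

# Picks copied from the list above; the dictionary keys are hypothetical labels.
RECOMMENDATIONS: Dict[str, List[str]] = {
    "casual": ["Llama 3.3 70B via Groq", "Gemini 2.5 Flash via Google AI Studio"],
    "long_document": ["Gemini 2.5 Flash", "Qwen3.5 122B via NVIDIA NIM"],
    "multilingual": ["Qwen3.5", "Gemini", "Llama"],
    "roleplay": ["Llama 3.3 70B", "Mistral models"],
    "customer_support": ["Gemini", "Qwen3"],
}


def recommend(use_case: str) -> List[str]:
    """Return the suggested models for a use case, in the order listed above."""
    key = use_case.strip().lower().replace(" ", "_").replace("-", "_")
    if key not in RECOMMENDATIONS:
        known = ", ".join(sorted(RECOMMENDATIONS))
        raise KeyError(f"unknown use case {use_case!r}; expected one of: {known}")
    return RECOMMENDATIONS[key]
```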

Top Picks for Chat

Google: Gemini 2.5 Flash (via Google)

1M context, multimodal, free tier with 10 RPM / 250 RPD. Best all-round chat model.

Meta: Llama 3.3 70B Instruct (via Groq)

Fastest inference via Groq LPU, strong instruction following, no credit card.

Qwen: Qwen3.5 122B A10B (via NVIDIA NIM)

262K context, strong bilingual (Chinese-English), 40 RPM with no daily cap.

NVIDIA: Nemotron 3 Super (free) (via OpenRouter)

262K context, strong reasoning, solid chat performance.
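
Free-tier limits like the ones quoted above (10 RPM / 250 RPD, or 40 RPM with no daily cap) are easy to exceed from a chat loop. The sketch below is one way to guard calls on the client side. It assumes sliding 60-second and 24-hour windows, which may not match how a given provider counts requests. The `RateLimiter` class is hypothetical and not part of any SDK.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class RateLimiter:
    """Client-side guard for a requests-per-minute and optional requests-per-day cap."""

    def __init__(self, rpm: int, rpd: Optional[int] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm = rpm
        self.rpd = rpd          # None means no daily cap
        self.clock = clock      # injectable so the limiter can be tested without sleeping
        self.minute: Deque[float] = deque()  # request times within the last 60 s
        self.day: Deque[float] = deque()     # request times within the last 24 h

    def _expire(self, now: float) -> None:
        """Drop timestamps that have aged out of each window."""
        while self.minute and now - self.minute[0] >= 60:
            self.minute.popleft()
        while self.day and now - self.day[0] >= 86_400:
            self.day.popleft()

    def try_acquire(self) -> bool:
        """Record a request and return True if both limits allow it right now."""
        now = self.clock()
        self._expire(now)
        if len(self.minute) >= self.rpm:
            return False
        if self.rpd is not None and len(self.day) >= self.rpd:
            return False
        self.minute.append(now)
        self.day.append(now)
        return True
```

A caller would check `limiter.try_acquire()` before each request and wait or queue the message when it returns `False`. Published limits change, so the numbers should come from the provider's current documentation rather than from this page.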

All Free Chat Models

Provider	Model	Context	Max Output	Modality	Rate Limit	Released
OpenRouter	Cohere: North Mini Code (free)	256K	64K	text, code	200 req/day (free tier)	Jun 9, 2026	Details
OpenRouter	Nex AGI: Nex-N2-Pro (free)	262K	262K	text, image	200 req/day (free tier)	Jun 8, 2026	Details
OpenRouter	NVIDIA: Nemotron 3.5 Content Safety (free)	128K	8K	text, image	200 req/day (free tier)	Jun 4, 2026	Details
OpenRouter	NVIDIA: Nemotron 3 Ultra (free)	1.0M	66K	text	200 req/day (free tier)	Jun 4, 2026	Details
OpenRouter	MiniMax: MiniMax M3	1.0M	512K	text, image	200 req/day (free tier)	Jun 1, 2026	Details
OpenRouter	inclusionAI: Ring-2.6-1T	262K	66K	text	200 req/day (free tier)	May 8, 2026	Details
OpenRouter	Owl Alpha	1.0M	262K	text	200 req/day (free tier)	Apr 28, 2026	Details
OpenRouter	NVIDIA: Nemotron 3 Nano Omni (free)	256K	66K	text, image, audio, reasoning	200 req/day (free tier)	Apr 28, 2026	Details
OpenRouter	Poolside: Laguna XS.2 (free)	262K	33K	text	200 req/day (free tier)	Apr 28, 2026	Details
OpenRouter	Poolside: Laguna M.1 (free)	262K	33K	text	200 req/day (free tier)	Apr 28, 2026	Details
OpenRouter	DeepSeek: DeepSeek V4 Flash	1.0M	66K	text	200 req/day (free tier)	Apr 24, 2026	Details
OpenRouter	MoonshotAI: Kimi K2.6	262K	262K	text, image	200 req/day (free tier)	Apr 20, 2026	Details
OpenRouter	Z.ai: GLM 5.1	203K	8K	text	200 req/day (free tier)	Apr 7, 2026	Details
OpenRouter	Google: Gemma 4 26B A4B (free)	262K	33K	text, image	200 req/day (free tier)	Apr 2, 2026	Details
OpenRouter	Google: Gemma 4 31B (free)	262K	8K	text, image	200 req/day (free tier)	Apr 2, 2026	Details
OpenRouter	Arcee AI: Trinity Large Thinking	262K	262K	text, reasoning	200 req/day (free tier)	Apr 1, 2026	Details
OpenRouter	Google: Lyria 3 Pro Preview	1.0M	66K	text, image	200 req/day (free tier)	Mar 30, 2026	Details
OpenRouter	Google: Lyria 3 Clip Preview	1.0M	66K	text, image	200 req/day (free tier)	Mar 30, 2026	Details
OpenRouter	NVIDIA: Nemotron 3 Super (free)	1.0M	262K	text	200 req/day (free tier)	Mar 11, 2026	Details
OpenRouter	MiniMax: MiniMax M2.5	205K	197K	text	200 req/day (free tier)	Feb 12, 2026	Details
OpenRouter	Free Models Router	200K	8K	text, image	200 req/day (free tier)	Feb 1, 2026	Details
OpenRouter	LiquidAI: LFM2.5-1.2B-Thinking (free)	33K	8K	text, reasoning	200 req/day (free tier)	Jan 20, 2026	Details
OpenRouter	LiquidAI: LFM2.5-1.2B-Instruct (free)	33K	8K	text	200 req/day (free tier)	Jan 5, 2026	Details
OpenRouter	NVIDIA: Nemotron 3 Nano 30B A3B (free)	256K	8K	text	200 req/day (free tier)	Dec 14, 2025	Details
OpenRouter	OpenAI: gpt-oss-safeguard-20b	131K	66K	text	200 req/day (free tier)	Oct 29, 2025	Details
OpenRouter	NVIDIA: Nemotron Nano 12B 2 VL (free)	128K	128K	text, image	200 req/day (free tier)	Oct 28, 2025	Details
OpenRouter	Qwen: Qwen3 Next 80B A3B Instruct (free)	262K	8K	text	200 req/day (free tier)	Sep 11, 2025	Details
OpenRouter	NVIDIA: Nemotron Nano 9B V2 (free)	128K	8K	text	200 req/day (free tier)	Sep 5, 2025	Details
OpenRouter	OpenAI: gpt-oss-120b (free)	131K	131K	text	200 req/day (free tier)	Aug 5, 2025	Details
OpenRouter	OpenAI: gpt-oss-20b (free)	131K	33K	text	200 req/day (free tier)	Aug 5, 2025	Details
OpenRouter	Z.ai: GLM 4.5 Air	131K	98K	text	200 req/day (free tier)	Jul 28, 2025	Details
OpenRouter	Qwen: Qwen3 Coder 480B A35B (free)	1.0M	262K	text, code	200 req/day (free tier)	Jul 23, 2025	Details
OpenRouter	Venice: Uncensored (free)	33K	8K	text	200 req/day (free tier)	Jul 9, 2025	Details
OpenRouter	Meta: Llama 3.3 70B Instruct (free)	131K	8K	text	200 req/day (free tier)	Dec 6, 2024	Details
OpenRouter	Meta: Llama 3.2 3B Instruct (free)	131K	8K	text	200 req/day (free tier)	Sep 25, 2024	Details
OpenRouter	Nous: Hermes 3 405B Instruct (free)	131K	8K	text	200 req/day (free tier)	Aug 16, 2024	Details
Aion Labs	Aion 2.5	128K	32K	text	15 RPM, 20K TPD	—	Details
Aion Labs	Aion 2.0	128K	32K	text	15 RPM, 20K TPD	Feb 23, 2026	Details
Aion Labs	Aion-RP 1.0 (8B)	32K	8K	text	15 RPM, 20K TPD	—	Details
Cohere	Command A+ (218B)	128K	4K	text	20 RPM	—	Details
Cohere	Command A (111B)	256K	4K	text	20 RPM	—	Details
Cohere	Command R+	128K	4K	text	20 RPM	—	Details
Cohere	Command R7B	128K	4K	text	20 RPM	—	Details
Google Gemini	Gemini 3.5 Flash	1.0M	64K	text	15 RPM, 1,500 RPD	May 19, 2026	Details
Google Gemini	Gemini 3.1 Flash-Lite	1.0M	65K	text	30 RPM, 1,500 RPD	Mar 3, 2026	Details
Google Gemini	Gemini 2.5 Flash	1.0M	65K	text	15 RPM, 1,500 RPD	May 20, 2025	Details
Google Gemini	Gemini 2.5 Pro	2.0M	65K	text	5 RPM, 50 RPD	Jun 5, 2025	Details
Mistral AI	Mistral Medium 3.5 (128B)	256K	256K	text	~1 RPS, 500K TPM	—	Details
Mistral AI	Mistral Small 4	256K	256K	text	~1 RPS, 500K TPM	Mar 16, 2026	Details
Mistral AI	Mistral Large 3	256K	256K	text	~1 RPS, 500K TPM	Dec 2, 2025	Details
Mistral AI	Mistral Nemo (12B)	128K	128K	text	~1 RPS, 500K TPM	—	Details
Mistral AI	Codestral	256K	256K	text, code	~1 RPS, 500K TPM	—	Details
Mistral AI	Pixtral Large	128K	128K	text, image	~1 RPS, 500K TPM	Nov 18, 2024	Details
Z AI (Zhipu AI)	GLM-4.7-Flash	200K	128K	text	1 concurrent request	Jan 19, 2026	Details
Z AI (Zhipu AI)	GLM-4.6V-Flash	128K	4K	text	1 concurrent request	—	Details
Cerebras	gpt-oss-120b	128K	8K	text	30 RPM, 14,400 RPD, 1M TPD	Aug 5, 2025	Details
Cerebras	zai-glm-4.7	128K	8K	text	10 RPM, 100 RPD, 1M TPD	—	Details
Cloudflare Workers AI	@cf/meta/llama-3.3-70b-instruct-fp8-fast	131K	131K	text	10K neurons/day (shared)	Dec 6, 2024	Details
Cloudflare Workers AI	@cf/meta/llama-4-scout-17b-16e-instruct	10.0M	131K	text	10K neurons/day (shared)	—	Details
Cloudflare Workers AI	@cf/openai/gpt-oss-120b	128K	131K	text	10K neurons/day (shared)	—	Details
Cloudflare Workers AI	@cf/moonshotai/kimi-k2.7-code	262K	131K	text, code	10K neurons/day (shared)	—	Details
Cloudflare Workers AI	@cf/google/gemma-4-26b-a4b-it	256K	131K	text	10K neurons/day (shared)	Apr 2, 2026	Details
Cloudflare Workers AI	@cf/zhipuai/glm-4.7-flash	131K	131K	text	10K neurons/day (shared)	—	Details
Cloudflare Workers AI	@cf/mistralai/mistral-small-3.1-24b-instruct	128K	131K	text	10K neurons/day (shared)	Mar 17, 2025	Details
Cloudflare Workers AI	@cf/deepseek-ai/deepseek-r1-distill-qwen-32b	32K	131K	text, reasoning	10K neurons/day (shared)	Jan 20, 2025	Details
GitHub Models	gpt-5	200K	32K	text	10 RPM, 50 RPD	Aug 7, 2025	Details
GitHub Models	gpt-4.1	1.0M	32K	text	10 RPM, 50 RPD	Apr 14, 2025	Details
GitHub Models	gpt-4.1-mini	1.0M	32K	text	15 RPM, 150 RPD	Apr 14, 2025	Details
GitHub Models	gpt-4o	128K	16K	text	10 RPM, 50 RPD	May 13, 2024	Details
GitHub Models	o4-mini	200K	100K	text	10 RPM, 50 RPD	Apr 16, 2025	Details
GitHub Models	Llama-4-Scout-17B-16E	512K	4K	text	15 RPM, 150 RPD	—	Details
GitHub Models	Llama-4-Maverick-17B-128E	256K	4K	text	10 RPM, 50 RPD	—	Details
GitHub Models	Meta-Llama-3.3-70B	131K	4K	text	15 RPM, 150 RPD	Dec 6, 2024	Details
GitHub Models	DeepSeek-R1	64K	8K	text, reasoning	15 RPM, 150 RPD	May 28, 2025	Details
GitHub Models	Mistral-Small-3.1	128K	4K	text	15 RPM, 150 RPD	Mar 17, 2025	Details
Groq	llama-3.3-70b-versatile	131K	32K	text	30 RPM, 1,000 RPD	Dec 6, 2024	Details
Groq	llama-3.1-8b-instant	131K	131K	text	30 RPM, 1,000 RPD	Jul 23, 2024	Details
Groq	llama-4-scout-17b-16e-instruct	131K	8K	text	30 RPM, 1,000 RPD	—	Details
Groq	qwen3-32b	131K	131K	text	30 RPM, 1,000 RPD	Apr 28, 2025	Details
Hugging Face	Meta-Llama-3.1-8B-Instruct	128K	4K	text	Credit-metered	Jul 23, 2024	Details
Hugging Face	Mistral-7B-Instruct-v0.3	32K	4K	text	Credit-metered	—	Details
Hugging Face	Mixtral-8x7B-Instruct-v0.1	32K	4K	text	Credit-metered	—	Details
Hugging Face	Phi-3.5-mini-instruct	128K	4K	text	Credit-metered	—	Details
Hugging Face	Qwen2.5-7B-Instruct	131K	4K	text	Credit-metered	Oct 16, 2024	Details
Kilo Code	x-ai/grok-code-fast-1:free	256K	131K	text, code	~200 req/hr	Aug 28, 2025	Details
Kilo Code	minimax/minimax-m2.5:free	196K	8K	text	~200 req/hr	Feb 12, 2026	Details
Kilo Code	bytedance-seed/dola-seed-2.0-pro:free	131K	131K	text	~200 req/hr	—	Details
Kilo Code	nvidia/nemotron-3-super-120b-a12b:free	262K	32K	text	~200 req/hr	Mar 11, 2026	Details
Kilo Code	arcee-ai/trinity-large-thinking:free	131K	131K	text, reasoning	~200 req/hr	Apr 1, 2026	Details
LLM7.io	deepseek-r1-0528	131K	131K	text, reasoning	30 RPM (120 with token)	May 28, 2025	Details
LLM7.io	deepseek-v3-0324	131K	131K	text	30 RPM (120 with token)	Mar 25, 2025	Details
LLM7.io	gemini-2.5-flash-lite	131K	131K	text	30 RPM (120 with token)	Jun 17, 2025	Details
LLM7.io	gpt-4o-mini	131K	131K	text	30 RPM (120 with token)	Jul 18, 2024	Details
LLM7.io	mistral-small-3.1-24b	32K	131K	text	30 RPM (120 with token)	Mar 17, 2025	Details
LLM7.io	qwen2.5-coder-32b	131K	131K	text, code	30 RPM (120 with token)	Nov 11, 2024	Details
ModelScope	Qwen/Qwen3.5-35B-A3B	131K	131K	text	2,000 RPD total; <=500 RPD/model (dynamic)	Feb 24, 2026	Details
ModelScope	Qwen/Qwen3.5-27B	131K	131K	text	2,000 RPD total; <=500 RPD/model (dynamic)	Feb 24, 2026	Details
Ollama Cloud	gpt-oss:120b-cloud	128K	131K	text	Session/weekly limits (unpublished)	—	Details
Ollama Cloud	deepseek-v3.1:671b-cloud	128K	131K	text	Session/weekly limits (unpublished)	—	Details
Ollama Cloud	qwen3-coder:480b-cloud	128K	131K	text, code	Session/weekly limits (unpublished)	—	Details
Ollama Cloud	kimi-k2:1t-cloud	262K	131K	text	Session/weekly limits (unpublished)	—	Details
Ollama Cloud	glm-4.6:cloud	128K	131K	text	Session/weekly limits (unpublished)	—	Details
Ollama Cloud	deepseek-r1:cloud	128K	131K	text, reasoning	Session/weekly limits (unpublished)	—	Details
OVHcloud AI Endpoints	Qwen3.5-397B-A17B	131K	32K	text	2 RPM (anonymous)	Feb 16, 2026	Details
OVHcloud AI Endpoints	gpt-oss-20b	128K	8K	text	2 RPM (anonymous)	Aug 5, 2025	Details
OVHcloud AI Endpoints	Meta-Llama-3_3-70B-Instruct	131K	4K	text	2 RPM (anonymous)	Dec 6, 2024	Details
OVHcloud AI Endpoints	Llama-3.1-8B-Instruct	131K	4K	text	2 RPM (anonymous)	Jul 23, 2024	Details
OVHcloud AI Endpoints	Qwen3.6-27B	131K	32K	text	2 RPM (anonymous)	Apr 22, 2026	Details
OVHcloud AI Endpoints	Qwen3.5-9B	131K	8K	text	2 RPM (anonymous)	Mar 2, 2026	Details
OVHcloud AI Endpoints	Qwen3-Coder-30B-A3B-Instruct	262K	32K	text, code	2 RPM (anonymous)	Jul 31, 2025	Details
OVHcloud AI Endpoints	Qwen2.5-VL-72B-Instruct	128K	8K	text, image	2 RPM (anonymous)	Feb 1, 2025	Details
OVHcloud AI Endpoints	Mistral-Small-3.2-24B-Instruct	128K	4K	text	2 RPM (anonymous)	Jun 20, 2025	Details
OVHcloud AI Endpoints	Mistral-Nemo-Instruct-2407	128K	4K	text	2 RPM (anonymous)	—	Details
SambaNova	DeepSeek-V3.1	128K	8K	text	20 RPM, 20 RPD, 200K TPD	Aug 21, 2025	Details
SambaNova	DeepSeek-V3.2 (Preview)	128K	8K	text	20 RPM, 20 RPD, 200K TPD	—	Details
SambaNova	MiniMax-M2.7	128K	8K	text	20 RPM, 20 RPD, 200K TPD	Mar 18, 2026	Details
SambaNova	gemma-4-31B-it (Preview)	128K	8K	text	20 RPM, 20 RPD, 200K TPD	—	Details
SiliconFlow	deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	131K	131K	text, reasoning	30 RPM, 60K TPM	—	Details
SiliconFlow	Abbreviation	131K	8K	text	See provider page	—	Details
NVIDIA NIM	01-ai/yi-large	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	adept/fuyu-8b	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	ai21labs/jamba-1.5-large-instruct	131K	8K	text	Up to 40 RPM	Aug 22, 2024	Details
NVIDIA NIM	aisingapore/sea-lion-7b-instruct	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	baai/bge-m3	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	bigcode/starcoder2-15b	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	databricks/dbrx-instruct	131K	8K	text	Up to 40 RPM	Mar 27, 2024	Details
NVIDIA NIM	deepseek-ai/deepseek-coder-6.7b-instruct	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	deepseek-ai/deepseek-v4-flash	1.0M	66K	text	Up to 40 RPM	Apr 24, 2026	Details
NVIDIA NIM	deepseek-ai/deepseek-v4-pro	1.0M	384K	text	Up to 40 RPM	Apr 24, 2026	Details
NVIDIA NIM	google/codegemma-1.1-7b	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	google/codegemma-7b	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	google/deplot	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	google/gemma-2b	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	google/recurrentgemma-2b	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	ibm/granite-3.0-3b-a800m-instruct	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	ibm/granite-3.0-8b-instruct	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	ibm/granite-34b-code-instruct	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	ibm/granite-8b-code-instruct	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	meta/codellama-70b	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	meta/llama-3.1-70b-instruct	131K	16K	text	Up to 40 RPM	Jul 23, 2024	Details
NVIDIA NIM	meta/llama-3.2-11b-vision-instruct	131K	16K	text, image	Up to 40 RPM	Sep 25, 2024	Details
NVIDIA NIM	meta/llama-3.2-1b-instruct	131K	60K	text	Up to 40 RPM	Sep 25, 2024	Details
NVIDIA NIM	meta/llama-3.2-3b-instruct	131K	8K	text	Up to 40 RPM	Sep 25, 2024	Details
NVIDIA NIM	meta/llama-guard-4-12b	164K	16K	text, image	Up to 40 RPM	Apr 30, 2025	Details
NVIDIA NIM	meta/llama2-70b	131K	8K	text	Up to 40 RPM	Jul 18, 2023	Details
NVIDIA NIM	microsoft/kosmos-2	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	microsoft/phi-3-vision-128k-instruct	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	microsoft/phi-3.5-moe-instruct	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	microsoft/phi-4-multimodal-instruct	131K	8K	text	Up to 40 RPM	Feb 26, 2025	Details
NVIDIA NIM	minimaxai/minimax-m2.7	205K	131K	text	Up to 40 RPM	Mar 18, 2026	Details
NVIDIA NIM	minimaxai/minimax-m3	1.0M	512K	text, image	Up to 40 RPM	Jun 1, 2026	Details
NVIDIA NIM	mistralai/codestral-22b-instruct-v0.1	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	mistralai/mistral-7b-instruct-v0.3	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	mistralai/mistral-large-2-instruct	131K	8K	text	Up to 40 RPM	Nov 18, 2024	Details
NVIDIA NIM	mistralai/mixtral-8x22b-v0.1	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	moonshotai/kimi-k2.6	262K	262K	text, image	Up to 40 RPM	Apr 20, 2026	Details
NVIDIA NIM	nv-mistralai/mistral-nemo-12b-instruct	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/cosmos-reason2-8b	131K	8K	text, reasoning	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/embed-qa-4	131K	8K	text, embedding	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/llama-3.1-nemotron-51b-instruct	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/llama-3.1-nemotron-70b-instruct	131K	8K	text	Up to 40 RPM	Oct 15, 2024	Details
NVIDIA NIM	nvidia/llama-3.1-nemotron-ultra-253b-v1	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1	131K	8K	text, embedding	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/llama-3.2-nv-embedqa-1b-v1	131K	8K	text, embedding	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/llama-3.3-nemotron-super-49b-v1.5	131K	16K	text	Up to 40 RPM	Oct 10, 2025	Details
NVIDIA NIM	nvidia/llama-nemotron-embed-1b-v2	131K	8K	text, embedding	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/llama-nemotron-embed-vl-1b-v2	131K	8K	text, embedding	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/llama3-chatqa-1.5-70b	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/mistral-nemo-minitron-8b-8k-instruct	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/nemoretriever-parse	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/nemotron-3.5-content-safety	128K	8K	text, image	Up to 40 RPM	Jun 4, 2026	Details
NVIDIA NIM	nvidia/nemotron-4-340b-instruct	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/nemotron-4-340b-reward	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/nemotron-nano-3-30b-a3b	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/nemotron-parse	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/neva-22b	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/nv-embed-v1	131K	8K	text, embedding	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/nv-embedcode-7b-v1	131K	8K	text, embedding	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/nv-embedqa-e5-v5	131K	8K	text, embedding	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/nv-embedqa-mistral-7b-v2	131K	8K	text, embedding	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/nvclip	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/riva-translate-4b-instruct	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	nvidia/vila	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	qwen/qwen3.5-122b-a10b	262K	262K	text, image	Up to 40 RPM	Feb 24, 2026	Details
NVIDIA NIM	qwen/qwen3.5-397b-a17b	256K	8K	text, image	Up to 40 RPM	Feb 16, 2026	Details
NVIDIA NIM	snowflake/arctic-embed-l	131K	8K	text, embedding	Up to 40 RPM	—	Details
NVIDIA NIM	stepfun-ai/step-3.5-flash	262K	16K	text	Up to 40 RPM	Feb 2, 2026	Details
NVIDIA NIM	stepfun-ai/step-3.7-flash	256K	256K	text, image	Up to 40 RPM	May 29, 2026	Details
NVIDIA NIM	writer/palmyra-creative-122b	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	writer/palmyra-fin-70b-32k	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	writer/palmyra-med-70b	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	writer/palmyra-med-70b-32k	131K	8K	text	Up to 40 RPM	—	Details
NVIDIA NIM	z-ai/glm-5.1	203K	8K	text	Up to 40 RPM	Apr 7, 2026	Details
NVIDIA NIM	zyphra/zamba2-7b-instruct	131K	8K	text	Up to 40 RPM	—	Details
AI21 Labs	Jamba Large 1.7	256K	4K	text	200 RPM, 10 RPS	Aug 8, 2025	Details
AI21 Labs	Jamba Mini 2	256K	4K	text	200 RPM, 10 RPS	—	Details
Aion Labs	aion-1.0	131K	32K	text	Daily token allowance	Feb 4, 2025	Details
Aion Labs	aion-1.0-mini	131K	32K	text	Daily token allowance	Feb 4, 2025	Details
Alibaba Cloud Model Studio	Qwen3-Max	128K	32K	text	Tiered by region	Sep 23, 2025	Details
Alibaba Cloud Model Studio	Qwen3-Plus	1.0M	32K	text	Tiered by region	—	Details
Alibaba Cloud Model Studio	Qwen3-VL-Plus	128K	8K	text, image	Tiered by region	—	Details
Alibaba Cloud Model Studio	Qwen3-Coder-Plus	256K	8K	text, code	Tiered by region	Sep 23, 2025	Details
Alibaba Cloud Model Studio	QwQ-Plus	131K	32K	text	Tiered by region	—	Details
Cohere	Embed 4	131K	131K	text, embedding	2,000 inputs/min	—	Details
Cohere	Rerank 3.5	131K	131K	text, rerank	10 RPM	—	Details
DeepSeek	deepseek-chat (V3.2)	128K	8K	text	Dynamic	Dec 1, 2025	Details
DeepSeek	deepseek-reasoner (R1)	128K	8K	text, reasoning	Dynamic	—	Details
Google Gemini	Gemini 3 Flash (Preview)	1.0M	65K	text	Preview limits	—	Details
Mistral AI	Mistral Medium 3	128K	128K	text	~1 RPS, 500K TPM	May 7, 2025	Details
xAI	grok-4.3	1.0M	32K	text	Credit-based	Apr 30, 2026	Details
xAI	grok-4.1-fast	2.0M	32K	text	Credit-based	Nov 19, 2025	Details
xAI	grok-3-mini	131K	8K	text	Credit-based	—	Details
Z AI (Zhipu AI)	GLM-4.5-Flash	128K	8K	text	1 concurrent request	—	Details
Cerebras	llama-3.3-70b	128K	8K	text	30 RPM, 14,400 RPD, 1M TPD	Dec 6, 2024	Details
Cerebras	qwen-3-235b-a22b-instruct-2507	131K	8K	text	30 RPM, 14,400 RPD, 1M TPD	Apr 28, 2025	Details
Cerebras	qwen-3-32b	131K	8K	text	30 RPM, 14,400 RPD, 1M TPD	Apr 28, 2025	Details
Cloudflare Workers AI	@cf/meta/llama-3.1-8b-instruct-fp8-fast	131K	131K	text	10K neurons/day (shared)	Jul 23, 2024	Details
Cloudflare Workers AI	@cf/meta/llama-3.2-11b-vision-instruct	131K	131K	text, image	10K neurons/day (shared)	Sep 25, 2024	Details
Cloudflare Workers AI	@cf/moonshotai/kimi-k2.5	256K	131K	text	10K neurons/day (shared)	—	Details
Groq	llama-4-maverick-17b-128e-instruct	131K	8K	text	15 RPM, 500 RPD	—	Details
Groq	kimi-k2-instruct	262K	262K	text	30 RPM, 14,400 RPD	Sep 5, 2025	Details
Groq	deepseek-r1-distill-70b	131K	8K	text, reasoning	30 RPM, 14,400 RPD	—	Details
Groq	whisper-large-v3	131K	131K	text	20 RPM, 2,000 RPD	—	Details
Groq	whisper-large-v3-turbo	131K	131K	text	20 RPM, 2,000 RPD	—	Details
ModelScope	Qwen/Qwen-Image	131K	131K	text	2,000 RPD total; model/AIGC-specific caps	—	Details
Nebius	Qwen3-235B-A22B	128K	32K	text	Tier-based	Apr 28, 2025	Details
Nscale	Llama-3.3-70B-Instruct	128K	8K	text	Fair-use	Dec 6, 2024	Details
Nscale	DeepSeek-R1-Distill-Llama-70B	128K	32K	text, reasoning	Fair-use	Jan 20, 2025	Details
OVHcloud AI Endpoints	Qwen3Guard-Gen-8B	32K	4K	text	2 RPM (anonymous)	—	Details
OVHcloud AI Endpoints	Qwen3Guard-Gen-0.6B	32K	4K	text	2 RPM (anonymous)	—	Details
SiliconFlow	deepseek-ai/DeepSeek-OCR	131K	8K	text	30 RPM, 60K TPM	—	Details
OpenRouter	Baidu Qianfan: CoBuddy	131K	65K	text, code	200 req/day (free tier)	—	Details
OpenRouter	NVIDIA: Llama Nemotron Embed VL 1B V2 (free)	131K	8K	text, image, embedding	200 req/day (free tier)	Feb 25, 2026	Details
OpenRouter	NVIDIA: Llama Nemotron Rerank VL 1B V2 (free)	10K	8K	text, image, rerank	200 req/day (free tier)	Jun 9, 2026	Details
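
With this many rows, filtering is easier in code than by eye. The sketch below parses a few rows copied from the table above and selects models by context size. It assumes the columns are tab-separated in the order shown in the header and that `K` and `M` mean decimal thousands and millions of tokens. The function names are hypothetical.

```python
from typing import Any, Dict, List

# A few rows copied from the table above, tab-separated as in the listing.
ROWS = [
    "Groq\tllama-3.3-70b-versatile\t131K\t32K\ttext\t30 RPM, 1,000 RPD\tDec 6, 2024",
    "Google Gemini\tGemini 2.5 Flash\t1.0M\t65K\ttext\t15 RPM, 1,500 RPD\tMay 20, 2025",
    "Cerebras\tgpt-oss-120b\t128K\t8K\ttext\t30 RPM, 14,400 RPD, 1M TPD\tAug 5, 2025",
    "OpenRouter\tVenice: Uncensored (free)\t33K\t8K\ttext\t200 req/day (free tier)\tJul 9, 2025",
]

COLUMNS = ["provider", "model", "context", "max_output",
           "modality", "rate_limit", "released"]


def parse_size(value: str) -> int:
    """Convert sizes such as '262K' or '1.0M' to a token count (decimal K/M assumed)."""
    multipliers = {"K": 1_000, "M": 1_000_000}
    suffix = value[-1].upper()
    if suffix in multipliers:
        return int(float(value[:-1]) * multipliers[suffix])
    return int(value)


def parse_row(line: str) -> Dict[str, Any]:
    """Split one table row into named fields, ignoring any extra trailing column."""
    fields = line.split("\t")[: len(COLUMNS)]
    row: Dict[str, Any] = dict(zip(COLUMNS, fields))
    row["context"] = parse_size(row["context"])
    row["max_output"] = parse_size(row["max_output"])
    return row


def models_with_context(rows: List[Dict[str, Any]], minimum: int) -> List[str]:
    """Names of models whose context window is at least `minimum` tokens."""
    return [row["model"] for row in rows if row["context"] >= minimum]


parsed = [parse_row(line) for line in ROWS]
long_context = models_with_context(parsed, 128_000)
```

The same pattern extends to filtering on provider or modality. The rate-limit column is free text, so it would need per-provider handling.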
