How to Get a Free Cerebras API Key (2026)
2 free models available — no credit card required. Get your Cerebras API key → Test free models →
Cerebras FreeLLM Score
A usable option, though it may have noticeable restrictions or older models.
What is Cerebras?
Ultra-fast inference on Cerebras WSE chips — 1M tokens/day.
Cerebras Cloud offers free API access to Llama and GPT-OSS models running on the Cerebras Wafer-Scale Engine, one of the fastest AI accelerators available. The free tier provides 1 million tokens/day and 14,400 requests/day per model with no credit card required. Context window is limited to 8K on the free tier.
- Ultra-fast inference on WSE chips
- 1M tokens/day free
- No credit card required
- Llama 3.1 8B + GPT-OSS 120B available
API Compatibility: OpenAI SDK-compatible (Chat Completions)
How to Get a Cerebras API Key
- 1
- 2 Go to API Keys
- 3 Generate an API key
- 4 Choose a model Llama 3.3 70B or GPT-OSS 120B available for free.
- 5 Configure OpenAI client Base URL: https://api.cerebras.ai/v1
All Free Cerebras Models — Context Windows & Rate Limits
| Model | Context | Max Output | Modality | Rate Limit | Released | Status |
|---|---|---|---|---|---|---|
| zai-glm-4.7 | 128K | 8K | 10 RPM, 100 RPD, 1M TPD | — | Online | |
| gpt-oss-120b | 128K | 8K | 30 RPM, 14,400 RPD, 1M TPD | Aug 5, 2025 | Online |
Cerebras Free Tier Limits & Pricing
Cerebras API Setup Tutorial & Tools
Cerebras is fully compatible with popular AI coding assistants like Cursor, Claude Code, and more. To see step-by-step API configuration instructions for your favorite tool, please visit our Global Configuration Guide →
Use Cases
What Cerebras's free models are best for, based on aggregated model capabilities:
Limitations & Caveats
- 8K context window on free tier (vs 128K on paid)
- Limited model selection — Llama and GPT-OSS only
- 1M tokens/day shared across models
Frequently Asked Questions
Why is Cerebras limited to 8K context on the free tier?
Cerebras' WSE chips are optimized for throughput, not context length. The free tier caps context at 8K tokens to ensure fair resource allocation. Paid tier unlocks 128K context.
What is GPT-OSS 120B on Cerebras?
GPT-OSS (Open Source Software) is a 120B parameter model trained by Cerebras and made available through their platform. It's a general-purpose model comparable to GPT-4 class models for many tasks.
Is Cerebras as fast as Groq?
Both are hardware-accelerated (Cerebras WSE vs Groq LPU). In benchmarks, Cerebras achieves similar throughput to Groq on comparable models — typically 1,500-2,500 tokens/second.