llama-4-scout-17b-16e-instruct — Free AI Model & API

cerebras/llama-4-scout-17b-16e-instruct
chat
Context Window: 128K
Max Output: 8K
Rate Limit: 30 RPM, 14,400 RPD, 1M TPD
Cost: $0.00 (free)
Free Period: Since May 10, 2026
Credit Card: Not required
Phone Verification: Required
Status: Online

Overview

Llama 4 Scout 17B on Cerebras runs Meta's Llama 4 mixture-of-experts (MoE) model on Cerebras's wafer-scale engine (WSE) chips. The Scout variant has 16 experts with a compact 17B active-parameter footprint, and this endpoint returns up to 8K output tokens per request. Fast inference makes it responsive for interactive chat and agent workflows. Rate limits are 30 requests per minute, 14,400 requests per day, and 1M tokens per day, which is enough for sustained prototyping and light production use. The API is OpenAI SDK compatible; registration and phone verification are required, but no credit card is needed.

Model ID: llama-4-scout-17b-16e-instruct
Base URL: https://api.cerebras.ai/v1
Specifications
Context: 128K · Output: 8K · Modality: text · OpenAI Compat: Yes
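
The two limits above interact: a long prompt leaves less room for output. The following is a minimal sketch of picking a safe output cap, assuming that "128K" and "8K" mean 128 × 1,024 and 8 × 1,024 tokens and that prompt and output tokens share the context window (the page states neither). `plan_max_tokens` is a hypothetical helper name, not part of any SDK.

```python
# Assumed numeric values for the "128K" and "8K" figures; the page does not
# say whether K means 1,000 or 1,024.
CONTEXT_WINDOW = 128 * 1024
MAX_OUTPUT = 8 * 1024


def plan_max_tokens(prompt_tokens: int, requested_output: int) -> int:
    """Return an output-token cap that respects both limits.

    Raises ValueError when the prompt alone leaves no room for output.
    """
    # Assumption: prompt and output tokens count against the same window.
    room = CONTEXT_WINDOW - prompt_tokens
    if room <= 0:
        raise ValueError("prompt does not fit in the context window")
    # Never ask for more than the output cap or the remaining room.
    return max(1, min(requested_output, MAX_OUTPUT, room))
```

The returned value could be passed as the `max_tokens` argument of a chat completion request.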

Quick Start

Integrate llama-4-scout-17b-16e-instruct with a few lines of code using the OpenAI SDK for Python or JavaScript, or plain curl. See the config generator for Claude Code, Cursor, and more.

Python

from openai import OpenAI

# Point the OpenAI client at the Cerebras endpoint.
client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key="YOUR_API_KEY"
)

# Send a single-turn chat request and print the reply text.
response = client.chat.completions.create(
    model="llama-4-scout-17b-16e-instruct",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

JavaScript

import OpenAI from "openai";

// Point the OpenAI client at the Cerebras endpoint.
const openai = new OpenAI({
  baseURL: "https://api.cerebras.ai/v1",
  apiKey: "YOUR_API_KEY",
});

// Top-level await requires an ES module (for example, a .mjs file).
const completion = await openai.chat.completions.create({
  model: "llama-4-scout-17b-16e-instruct",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message.content);

curl

curl https://api.cerebras.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "llama-4-scout-17b-16e-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
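
Because the free tier is rate limited, requests like the ones above can be rejected when a limit is hit. The following is a minimal sketch of retrying with exponential backoff. `call_with_backoff` and its parameters are hypothetical names chosen for illustration, and the delays are arbitrary defaults rather than values recommended by the provider.

```python
import random
import time


def call_with_backoff(call, is_rate_limited, max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Run call(); on a rate-limit error, wait and retry with backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Re-raise anything that is not a rate-limit error, and give up
            # after the final attempt.
            if not is_rate_limited(exc) or attempt == max_attempts - 1:
                raise
            # Base delay doubles each attempt (1s, 2s, 4s, ...), plus up to
            # 25% random jitter so parallel clients do not retry in lockstep.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.25))
```

With the OpenAI Python SDK, `call` could wrap `client.chat.completions.create(...)` and `is_rate_limited` could check `isinstance(exc, openai.RateLimitError)`.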

Rate Limits & Constraints

Rate Limit: 30 RPM, 14,400 RPD, 1M TPD
Context Window: 128K
Max Output Tokens: 8K
Cost: Free since May 10, 2026
Credit Card: Not required
OpenAI Compatible: Yes (drop-in replacement)
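
A client can stay under the request limits by pacing itself before each call. The following is a minimal sketch of a sliding-window limiter for the 30 RPM and 14,400 RPD caps. It assumes the provider counts requests over rolling 60-second and 24-hour windows, which the page does not confirm, and `RequestLimiter` is a hypothetical class name.

```python
import time
from collections import deque


class RequestLimiter:
    """Client-side guard for a per-minute and a per-day request cap."""

    def __init__(self, per_minute=30, per_day=14_400, clock=time.monotonic):
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock
        self.minute = deque()  # timestamps of requests in the last 60 s
        self.day = deque()     # timestamps of requests in the last 24 h

    def _prune(self, now):
        # Drop timestamps that have aged out of each window.
        while self.minute and now - self.minute[0] >= 60:
            self.minute.popleft()
        while self.day and now - self.day[0] >= 86_400:
            self.day.popleft()

    def wait_time(self):
        """Seconds until the next request is allowed (0.0 means go now)."""
        now = self.clock()
        self._prune(now)
        wait = 0.0
        if len(self.minute) >= self.per_minute:
            wait = max(wait, 60 - (now - self.minute[0]))
        if len(self.day) >= self.per_day:
            wait = max(wait, 86_400 - (now - self.day[0]))
        return wait

    def record(self):
        """Note that a request was just sent."""
        now = self.clock()
        self.minute.append(now)
        self.day.append(now)
```

A caller would sleep for `wait_time()` seconds, send the request, and then call `record()`.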

Cerebras Platform Limitations

  • 8K context window on free tier (vs 128K on paid)
  • Limited model selection — Llama and GPT-OSS only
  • 1M tokens/day shared across models
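
Since the daily token quota is shared across models, it can help to track usage in one place. The following is a minimal sketch of such a tracker, assuming "1M" means exactly 1,000,000 tokens and that both prompt and completion tokens count toward the quota (the page states neither). `TokenBudget` is a hypothetical class name.

```python
from collections import defaultdict

DAILY_TOKEN_BUDGET = 1_000_000  # "1M TPD", assumed to mean exactly 1,000,000


class TokenBudget:
    """Tracks tokens spent today across every model sharing one quota."""

    def __init__(self, limit=DAILY_TOKEN_BUDGET):
        self.limit = limit
        self.used_by_model = defaultdict(int)

    @property
    def used(self):
        return sum(self.used_by_model.values())

    @property
    def remaining(self):
        return max(0, self.limit - self.used)

    def can_afford(self, estimated_tokens):
        """True if a request of this size still fits in today's quota."""
        return estimated_tokens <= self.remaining

    def record(self, model, prompt_tokens, completion_tokens):
        # Assumption: prompt and completion tokens both count.
        self.used_by_model[model] += prompt_tokens + completion_tokens

    def reset(self):
        # Call once per day; the provider's actual reset time is not stated.
        self.used_by_model.clear()
```

With the OpenAI SDK, the token counts for `record` are available on each response as `response.usage.prompt_tokens` and `response.usage.completion_tokens`.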

Features & Use Cases

Best For

Chat

Modality Support

text

Cerebras Highlights

  • Ultra-fast inference on WSE chips
  • 1M tokens/day free
  • No credit card required
  • Llama 3.1 8B + GPT-OSS 120B available

How to Get a Free Cerebras API Key

Follow these steps to get your free API key for llama-4-scout-17b-16e-instruct. No credit card required — just sign up and start using the API.

  1. Sign up at cloud.cerebras.ai: use email or GitHub. No credit card is needed.
  2. Go to API Keys.
  3. Generate an API key.
  4. Choose a model: this page covers llama-4-scout-17b-16e-instruct; Llama 3.3 70B and GPT-OSS 120B are also available for free.
  5. Configure the OpenAI client: set the base URL to https://api.cerebras.ai/v1.
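
Once a key exists, the last step can be sketched without hard-coding the key in source. The example below reads it from an environment variable and builds, but does not send, a chat completion request using only the Python standard library. `CEREBRAS_API_KEY` and `build_chat_request` are hypothetical names chosen for this sketch; the base URL and model ID come from this page.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.cerebras.ai/v1"
MODEL_ID = "llama-4-scout-17b-16e-instruct"


def build_chat_request(prompt, api_key=None):
    """Build (but do not send) a chat completion request."""
    # CEREBRAS_API_KEY is a hypothetical variable name used for illustration.
    key = api_key or os.environ.get("CEREBRAS_API_KEY")
    if not key:
        raise RuntimeError("set CEREBRAS_API_KEY or pass api_key")
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {key}",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send the request; the same environment-variable pattern works with the `api_key` argument of the OpenAI client.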

Frequently Asked Questions

How do I get an API key for llama-4-scout-17b-16e-instruct?

Sign up at Cerebras with an email address or GitHub account to get your API key. No credit card is required, but phone verification is. Once you have the key, use the code snippets in the Quick Start section above.

Is llama-4-scout-17b-16e-instruct really free?

Yes. llama-4-scout-17b-16e-instruct is available on Cerebras's free tier and has been free since May 10, 2026. Rate limits apply: 30 RPM, 14,400 RPD, 1M TPD. Always check the provider's terms for any changes to the free tier.

What are llama-4-scout-17b-16e-instruct's rate limits?

30 RPM, 14,400 RPD, and 1M TPD. Context window: 128K. Max output: 8K. No credit card required.
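
To see how these limits combine, a short calculation helps. This sketch assumes "1M TPD" means exactly 1,000,000 tokens per day; the variable names are illustrative only.

```python
RPM, RPD, TPD = 30, 14_400, 1_000_000  # 1M TPD assumed to be exactly 1,000,000

# Minutes of continuous use at the full 30 RPM before the daily cap is hit.
minutes_at_full_rate = RPD / RPM

# Request rate that spreads the daily cap evenly over 24 hours.
sustained_rpm = RPD / (24 * 60)

# Average tokens available per request if every daily request is used.
tokens_per_request = TPD / RPD

print(minutes_at_full_rate, sustained_rpm, round(tokens_per_request, 2))
# prints: 480.0 10.0 69.44
```

In other words, running at the full per-minute rate uses up the daily request cap in 8 hours, and the token quota is likely to run out well before the request cap for prompts of ordinary length.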

What are the best free alternatives to llama-4-scout-17b-16e-instruct?

Popular free alternatives include inclusionAI: Ring-2.6-1T; Owl Alpha; and NVIDIA: Nemotron 3 Nano Omni (free). You can also browse all 164+ free models on our site.
