How to Get a Free Ollama Cloud API Key (2026)

6 free models available — no credit card required. Get your Ollama Cloud API key →

Overview

Ollama Cloud — run Llama, Qwen, Gemma via Ollama API in the cloud.

Ollama Cloud provides a hosted version of the popular Ollama runtime, exposing Llama, Qwen, Gemma, and other open models through the familiar Ollama API format. Free tier has unpublished session/weekly limits. Useful for developers already using Ollama locally who want a zero-config cloud option.

  • Ollama-native API format
  • Llama, Qwen, Gemma models
  • Familiar Ollama tooling
  • OpenAI-compatible endpoint available

API Compatibility: Ollama API + OpenAI-compatible wrapper

Quick Start Guide

  1. 1
    Sign up at ollama.com Email registration. No credit card.
  2. 2
    Go to Settings → API Keys
  3. 3
    Create your free Ollama API key
  4. 4
    Choose a model Llama, Qwen, Gemma available. Familiar Ollama API format.
  5. 5
    Configure client Base URL: https://api.ollama.com. OpenAI-compatible wrapper available.

All Free Ollama Cloud Models — Context Windows & Rate Limits

Model Context Max Output Modality Rate Limit Released Status
`gpt-oss:120b-cloud` 128K 131K text Session/weekly limits (unpublished) Online Details
`deepseek-v3.1:671b-cloud` 128K 131K text Session/weekly limits (unpublished) Online Details
`qwen3-coder:480b-cloud` 128K 131K textcode Session/weekly limits (unpublished) Online Details
`kimi-k2:1t-cloud` 262K 131K text Session/weekly limits (unpublished) Online Details
`glm-4.6:cloud` 128K 131K text Session/weekly limits (unpublished) Online Details
`deepseek-r1:cloud` 128K 131K text Session/weekly limits (unpublished) Online Details

Free Tier Pricing & Rate Limits

Credit Card Not required
Free Tier Permanently free
Context Range 128K – 262K
Total Models 6 free
Rate Limits Session/weekly limits (unpublished)
API Compatibility Ollama API + OpenAI-compatible wrapper

Use Cases

What Ollama Cloud's free models are best for, based on aggregated model capabilities:

Chat 6 models Coding 2 models Reasoning 1 model

Limitations & Caveats

  • Rate limits are unpublished — hard to plan capacity
  • Limited model selection compared to Ollama self-hosted
  • Newer/smaller provider with limited track record
See our FAQ for common questions about free LLM APIs