How to Use Free LLM Models with OpenHuman — Complete config.toml Setup

OpenHuman (24K+ GitHub stars) is a desktop AI agent with 118+ integrations, persistent memory, and a built-in model router. By default it uses a paid subscription — but you can point it at a free LLM API instead. Here's how.

OpenHuman's Model Architecture (in Plain English)

OpenHuman uses hints to route each task to the right model, instead of sending everything to one LLM:

HintUsed ForWhat Matters
reasoningPlanning, math, complex debuggingIntelligence, long output
codingCode generation, refactoring, FIMCoding benchmarks, speed
fastUI helpers, autocomplete, quick chatLatency, cost
visionScreenshots, image analysis, OCRMultimodal capability
agentMulti-step tool chains, autonomous tasksTool use, long-context coherence

By default, OpenHuman's backend handles this routing under a paid subscription. But with two lines of config, you can redirect all inference to a free API.

Step 1: Pick Your Free API Backend

OpenHuman talks to any OpenAI-compatible API. The best free option is OpenRouter — one API key gives access to 28+ free models, making it ideal for hint-based routing where each hint needs a different model. One inference_url. All hints covered.

Alternatives if you prefer a single-provider backend:

ProviderBest ForNo Credit Card?inference_url
OpenRouterAll hints (widest model selection)Yeshttps://openrouter.ai/api/v1
NVIDIA NIMReasoning, coding (DeepSeek, Qwen, Nemotron)Yeshttps://integrate.api.nvidia.com/v1
GroqFast inference (Llama 4 Maverick, Kimi K2)Yeshttps://api.groq.com/openai/v1
Google AI StudioVision, long context (Gemini 2.5 Flash)Yeshttps://generativelanguage.googleapis.com/v1beta

OpenRouter is the recommended starting point. Get a key at openrouter.ai/keys — no credit card required for free models.

Step 2: Edit config.toml

OpenHuman stores its configuration in config.toml inside its workspace directory. Open Settings → AI & Skills → Local AI in the desktop app to locate it, or edit directly:

Minimal config — all tasks through OpenRouter

# config.toml — OpenHuman workspace config
inference_url = "https://openrouter.ai/api/v1"
api_key = "sk-or-v1-YOUR_OPENROUTER_KEY"
default_model = "nvidia/nemotron-3-super-120b-a12b:free"

This routes everything through OpenRouter using NVIDIA Nemotron 3 Super as the default. OpenHuman's hint system still works — it requests different model IDs from OpenRouter for each task, and OpenRouter serves the right model.

Full config — pin each hint to a specific free model

# config.toml — OpenHuman with free LLM backend (OpenRouter)
inference_url = "https://openrouter.ai/api/v1"
api_key = "sk-or-v1-YOUR_OPENROUTER_KEY"
default_model = "nvidia/nemotron-3-super-120b-a12b:free"

# Optional: override models for specific teams
[teams.reasoning]
lead_model = "nvidia/nemotron-3-super-120b-a12b:free"    # NVIDIA Nemotron 3 Super — 120B MoE reasoning

[teams.coding]
lead_model = "deepseek/deepseek-v4-flash:free"   # DeepSeek V4 Flash — 384K output, coding-optimized
agent_model = "deepseek/deepseek-v4-flash:free"

[teams.vision]
lead_model = "google/lyria-3-pro-preview"       # Google Lyria 3 Pro — multimodal, 1M ctx

Step 3: Recommended Free Models by Task

With free models tracked across 20+ providers, picking the right model for each hint matters. Here are the best free models as of May 2026:

HintBest Free ModelContextKey StrengthOpenRouter Model ID
reasoning NVIDIA Nemotron 3 Super 1M 120B MoE, best free reasoning model nvidia/nemotron-3-super-120b-a12b:free
coding DeepSeek V4 Flash 1M 384K output — generate entire files in one shot deepseek/deepseek-v4-flash:free
fast Gemini 2.5 Flash 1M Google's reliable workhorse, low latency google/gemini-2.5-flash
vision Google Lyria 3 Pro 1M Google's newest multimodal flagship google/lyria-3-pro-preview
agent Owl Alpha 1M Purpose-built for tool use & multi-step agents openrouter/owl-alpha

Alternative free models (if you hit rate limits)

Fallback ModelAvailable OnBest For
NVIDIA Nemotron 3 Super 120BOpenRouter, NVIDIA NIMReasoning, math
Qwen3 Coder 480B A35BOpenRouterCoding, FIM support
Qwen3.5 397B A17BNVIDIA NIMVision, reasoning
Llama 4 Maverick 17BGroqFastest inference on free tier
GPT-OSS-120BCerebrasOpenAI-compatible, 120B, generous limits
GLM 5.1OpenRouter, NVIDIA NIMLong autonomous sessions (8h+)
CodestralMistral AIDedicated coding, FIM
Baidu CoBuddyOpenRouterAgent-optimized, fast

Why OpenRouter as the inference_url Makes Sense

OpenHuman's power comes from routing different tasks to different models. If you point inference_url at a single-model provider (e.g., Google AI Studio for just Gemini), you give up that routing. OpenRouter is the only free backend that lets you:

  • Use one API key for hundreds of models
  • Route reasoning → Nemotron 3 Super, coding → DeepSeek V4 Flash, vision → Lyria 3 Pro
  • Stay under rate limits by spreading load across different models
  • No credit card required for free models

If you prefer not using OpenRouter, Groq is a solid single-provider alternative — Llama 4 Maverick handles most tasks reasonably, and Groq's LPU hardware makes it the fastest free option.

Step 4: Verify It Works

After editing config.toml:

  1. Restart OpenHuman
  2. Open Settings → AI & Skills → check that your model appears in the status panel
  3. Send a test message — if the response comes back, the custom backend is working

If you see connection errors, double-check that your inference_url includes the full path (/v1) and that your API key is valid. Test the key directly:

curl -s https://openrouter.ai/api/v1/models \\
  -H "Authorization: Bearer YOUR_KEY" | head -20

Free + OpenHuman: What You Get vs. Paid

FeaturePaid SubscriptionFree API (OpenRouter)
Model accessAll providers (OpenAI, Anthropic, Google…)Free models only (no GPT-5, Claude)
Rate limitsHigh, guaranteedVaries, may queue during peak
Hint routingAutomatic optimal routingManual model pin per hint
Memory treeFull featureFull feature (local processing)
Integrations (OAuth)Managed by OpenHumanStill works (proxied through backend)
Cost$20-30/month$0

The free setup gives you the core OpenHuman experience — desktop agent, memory tree, 118+ integrations, voice — without the monthly cost. The trade-off: no access to Claude or GPT-5 quality models, and you manage which model does what. For many developers, that's a good trade.

FAQ

Can I use multiple free providers at once?

OpenHuman supports a single inference_url. If you want to mix providers (e.g., Groq for speed + NVIDIA for reasoning), use OpenRouter — it aggregates all of them behind one endpoint.

Does the memory tree work with a free API?

Yes. The memory tree and Obsidian wiki run locally on your machine using SQLite. They call the LLM for summarization — which goes through your custom inference_url. A fast model like Gemini 2.5 Flash handles summarization well.

What if I need Claude or GPT-5 for a specific task?

You can keep the paid subscription for the main orchestrator and set up specific teams in config.toml to use free models for less critical tasks. This hybrid approach gives you the best of both worlds.

Is this officially supported by OpenHuman?

Yes — inference_url is a documented config option in OpenHuman's schema. It's designed for self-hosted/alternative backends. The OpenHuman backend (auth, integrations, OAuth proxy) still runs on their infrastructure; only inference is redirected.

Get your free API key and start using OpenHuman today →

Get OpenRouter key (no credit card)

OpenHuman config generator →

Or browse all 164+ free models across all providers.