How to Use Free LLM Models with OpenHuman — config.toml Setup Guide

OpenHuman's Model Architecture (in Plain English)

OpenHuman uses hints to route each task to the right model, instead of sending everything to one LLM:

Hint	Used For	What Matters
`reasoning`	Planning, math, complex debugging	Intelligence, long output
`coding`	Code generation, refactoring, FIM	Coding benchmarks, speed
`fast`	UI helpers, autocomplete, quick chat	Latency, cost
`vision`	Screenshots, image analysis, OCR	Multimodal capability
`agent`	Multi-step tool chains, autonomous tasks	Tool use, long-context coherence

By default, OpenHuman's backend handles this routing under a paid subscription. But with two lines of config, you can redirect all inference to a free API.

Step 1: Pick Your Free API Backend

OpenHuman talks to any OpenAI-compatible API. The best free option is OpenRouter — one API key gives access to 287+ free models, making it ideal for hint-based routing where each hint needs a different model. One inference_url. All hints covered.

Alternatives if you prefer a single-provider backend:

Provider	Best For	No Credit Card?	inference_url
OpenRouter	All hints (widest model selection)	Yes	`https://openrouter.ai/api/v1`
NVIDIA NIM	Reasoning, coding (DeepSeek, Qwen, Nemotron)	Yes	`https://integrate.api.nvidia.com/v1`
Groq	Fast inference (Llama 4 Maverick, Kimi K2)	Yes	`https://api.groq.com/openai/v1`
Google AI Studio	Vision, long context (Gemini 2.5 Flash)	Yes	`https://generativelanguage.googleapis.com/v1beta`

OpenRouter is the recommended starting point. Get a key at openrouter.ai/keys — no credit card required for free models.

Step 2: Edit config.toml

OpenHuman stores its configuration in config.toml inside its workspace directory. Open Settings → AI & Skills → Local AI in the desktop app to locate it, or edit directly:

Minimal config — all tasks through OpenRouter

# config.toml — OpenHuman workspace config
inference_url = "https://openrouter.ai/api/v1"
api_key = "sk-or-v1-YOUR_OPENROUTER_KEY"
default_model = "nvidia/nemotron-3-super-120b-a12b:free"

This routes everything through OpenRouter using NVIDIA Nemotron 3 Super as the default. OpenHuman's hint system still works — it requests different model IDs from OpenRouter for each task, and OpenRouter serves the right model.

Full config — pin each hint to a specific free model

# config.toml — OpenHuman with free LLM backend (OpenRouter)
inference_url = "https://openrouter.ai/api/v1"
api_key = "sk-or-v1-YOUR_OPENROUTER_KEY"
default_model = "nvidia/nemotron-3-super-120b-a12b:free"

# Optional: override models for specific teams
[teams.reasoning]
lead_model = "nvidia/nemotron-3-super-120b-a12b:free"    # NVIDIA Nemotron 3 Super — 120B MoE reasoning

[teams.coding]
lead_model = "deepseek/deepseek-v4-flash:free"   # DeepSeek V4 Flash — 384K output, coding-optimized
agent_model = "deepseek/deepseek-v4-flash:free"

[teams.vision]
lead_model = "google/lyria-3-pro-preview"       # Google Lyria 3 Pro — multimodal, 1M ctx

Step 3: Recommended Free Models by Task

With free models tracked across 20+ providers, picking the right model for each hint matters. Here are the best free models as of May 2026:

Hint	Best Free Model	Context	Key Strength	OpenRouter Model ID
reasoning	NVIDIA Nemotron 3 Super	1M	120B MoE, best free reasoning model	`nvidia/nemotron-3-super-120b-a12b:free`
coding	DeepSeek V4 Flash	1M	384K output — generate entire files in one shot	`deepseek/deepseek-v4-flash:free`
fast	Gemini 2.5 Flash	1M	Google's reliable workhorse, low latency	`google/gemini-2.5-flash`
vision	Google Lyria 3 Pro	1M	Google's newest multimodal flagship	`google/lyria-3-pro-preview`
agent	Owl Alpha	1M	Purpose-built for tool use & multi-step agents	`openrouter/owl-alpha`

 Alternative free models (if you hit rate limits)
    Fallback Model Available On Best For
 
  NVIDIA Nemotron 3 Super 120B OpenRouter, NVIDIA NIM Reasoning, math
 Qwen3 Coder 480B A35B OpenRouter Coding, FIM support
 Qwen3.5 397B A17B NVIDIA NIM Vision, reasoning
 Llama 4 Maverick 17B Groq Fastest inference on free tier
 GPT-OSS-120B Cerebras OpenAI-compatible, 120B, generous limits
 GLM 5.1 OpenRouter, NVIDIA NIM Long autonomous sessions (8h+)
 Codestral Mistral AI Dedicated coding, FIM
 Baidu CoBuddy OpenRouter Agent-optimized, fast
 
 
 
 Why OpenRouter as the inference_url Makes Sense
 
OpenHuman's power comes from routing different tasks to different models. If you point inference_url at a single-model provider (e.g., Google AI Studio for just Gemini), you give up that routing. OpenRouter is the only free backend that lets you:
  Use one API key for hundreds of models
 Route reasoning → Nemotron 3 Super, coding → DeepSeek V4 Flash, vision → Lyria 3 Pro
 Stay under rate limits by spreading load across different models
 No credit card required for free models
 
 
If you prefer not using OpenRouter, Groq is a solid single-provider alternative — Llama 4 Maverick handles most tasks reasonably, and Groq's LPU hardware makes it the fastest free option.
 Step 4: Verify It Works
 After editing config.toml:
  Restart OpenHuman
 Open Settings → AI & Skills → check that your model appears in the status panel
 Send a test message — if the response comes back, the custom backend is working
 
 
If you see connection errors, double-check that your inference_url includes the full path (/v1) and that your API key is valid. Test the key directly:
 curl -s https://openrouter.ai/api/v1/models \\
  -H "Authorization: Bearer YOUR_KEY" | head -20
 Free + OpenHuman: What You Get vs. Paid
   Feature Paid Subscription Free API (OpenRouter)
 
  Model access All providers (OpenAI, Anthropic, Google…) Free models only (no GPT-5, Claude)
 Rate limits High, guaranteed Varies, may queue during peak
 Hint routing Automatic optimal routing Manual model pin per hint
 Memory tree Full feature Full feature (local processing)
 Integrations (OAuth) Managed by OpenHuman Still works (proxied through backend)
 Cost $20-30/month $0
 
 
 
The free setup gives you the core OpenHuman experience — desktop agent, memory tree, 118+ integrations, voice — without the monthly cost. The trade-off: no access to Claude or GPT-5 quality models, and you manage which model does what. For many developers, that's a good trade.
 FAQ
 Can I use multiple free providers at once?
 
OpenHuman supports a single inference_url. If you want to mix providers (e.g., Groq for speed + NVIDIA for reasoning), use OpenRouter — it aggregates all of them behind one endpoint.
 Does the memory tree work with a free API?
 
Yes. The memory tree and Obsidian wiki run locally on your machine using SQLite. They call the LLM for summarization — which goes through your custom inference_url. A fast model like Gemini 2.5 Flash handles summarization well.
 What if I need Claude or GPT-5 for a specific task?
 
You can keep the paid subscription for the main orchestrator and set up specific teams in config.toml to use free models for less critical tasks. This hybrid approach gives you the best of both worlds.
 Is this officially supported by OpenHuman?
 
Yes — inference_url is a documented config option in OpenHuman's schema. It's designed for self-hosted/alternative backends. The OpenHuman backend (auth, integrations, OAuth proxy) still runs on their infrastructure; only inference is redirected.
  Get your free API key and start using OpenHuman today →
 Get OpenRouter key (no credit card)
 OpenHuman config generator →
 Or browse all 648+ free models across all providers.

Fallback Model	Available On	Best For
NVIDIA Nemotron 3 Super 120B	OpenRouter, NVIDIA NIM	Reasoning, math
Qwen3 Coder 480B A35B	OpenRouter	Coding, FIM support
Qwen3.5 397B A17B	NVIDIA NIM	Vision, reasoning
Llama 4 Maverick 17B	Groq	Fastest inference on free tier
GPT-OSS-120B	Cerebras	OpenAI-compatible, 120B, generous limits
GLM 5.1	OpenRouter, NVIDIA NIM	Long autonomous sessions (8h+)
Codestral	Mistral AI	Dedicated coding, FIM
Baidu CoBuddy	OpenRouter	Agent-optimized, fast

Feature	Paid Subscription	Free API (OpenRouter)
Model access	All providers (OpenAI, Anthropic, Google…)	Free models only (no GPT-5, Claude)
Rate limits	High, guaranteed	Varies, may queue during peak
Hint routing	Automatic optimal routing	Manual model pin per hint
Memory tree	Full feature	Full feature (local processing)
Integrations (OAuth)	Managed by OpenHuman	Still works (proxied through backend)
Cost	$20-30/month	$0