OpenHuman's Model Architecture (in Plain English)
OpenHuman uses hints to route each task to the right model, instead of sending everything to one LLM:
| Hint | Used For | What Matters |
|---|---|---|
reasoning | Planning, math, complex debugging | Intelligence, long output |
coding | Code generation, refactoring, FIM | Coding benchmarks, speed |
fast | UI helpers, autocomplete, quick chat | Latency, cost |
vision | Screenshots, image analysis, OCR | Multimodal capability |
agent | Multi-step tool chains, autonomous tasks | Tool use, long-context coherence |
By default, OpenHuman's backend handles this routing under a paid subscription. But with two lines of config, you can redirect all inference to a free API.
Step 1: Pick Your Free API Backend
OpenHuman talks to any OpenAI-compatible API. The best free option is OpenRouter — one API key gives access to 28+ free models, making it ideal for hint-based routing where each hint needs a different model. One inference_url. All hints covered.
Alternatives if you prefer a single-provider backend:
| Provider | Best For | No Credit Card? | inference_url |
|---|---|---|---|
| OpenRouter | All hints (widest model selection) | Yes | https://openrouter.ai/api/v1 |
| NVIDIA NIM | Reasoning, coding (DeepSeek, Qwen, Nemotron) | Yes | https://integrate.api.nvidia.com/v1 |
| Groq | Fast inference (Llama 4 Maverick, Kimi K2) | Yes | https://api.groq.com/openai/v1 |
| Google AI Studio | Vision, long context (Gemini 2.5 Flash) | Yes | https://generativelanguage.googleapis.com/v1beta |
OpenRouter is the recommended starting point. Get a key at openrouter.ai/keys — no credit card required for free models.
Step 2: Edit config.toml
OpenHuman stores its configuration in config.toml inside its workspace directory. Open Settings → AI & Skills → Local AI in the desktop app to locate it, or edit directly:
Minimal config — all tasks through OpenRouter
# config.toml — OpenHuman workspace config
inference_url = "https://openrouter.ai/api/v1"
api_key = "sk-or-v1-YOUR_OPENROUTER_KEY"
default_model = "nvidia/nemotron-3-super-120b-a12b:free" This routes everything through OpenRouter using NVIDIA Nemotron 3 Super as the default. OpenHuman's hint system still works — it requests different model IDs from OpenRouter for each task, and OpenRouter serves the right model.
Full config — pin each hint to a specific free model
# config.toml — OpenHuman with free LLM backend (OpenRouter)
inference_url = "https://openrouter.ai/api/v1"
api_key = "sk-or-v1-YOUR_OPENROUTER_KEY"
default_model = "nvidia/nemotron-3-super-120b-a12b:free"
# Optional: override models for specific teams
[teams.reasoning]
lead_model = "nvidia/nemotron-3-super-120b-a12b:free" # NVIDIA Nemotron 3 Super — 120B MoE reasoning
[teams.coding]
lead_model = "deepseek/deepseek-v4-flash:free" # DeepSeek V4 Flash — 384K output, coding-optimized
agent_model = "deepseek/deepseek-v4-flash:free"
[teams.vision]
lead_model = "google/lyria-3-pro-preview" # Google Lyria 3 Pro — multimodal, 1M ctx Step 3: Recommended Free Models by Task
With free models tracked across 20+ providers, picking the right model for each hint matters. Here are the best free models as of May 2026:
| Hint | Best Free Model | Context | Key Strength | OpenRouter Model ID |
|---|---|---|---|---|
| reasoning | NVIDIA Nemotron 3 Super | 1M | 120B MoE, best free reasoning model | nvidia/nemotron-3-super-120b-a12b:free |
| coding | DeepSeek V4 Flash | 1M | 384K output — generate entire files in one shot | deepseek/deepseek-v4-flash:free |
| fast | Gemini 2.5 Flash | 1M | Google's reliable workhorse, low latency | google/gemini-2.5-flash |
| vision | Google Lyria 3 Pro | 1M | Google's newest multimodal flagship | google/lyria-3-pro-preview |
| agent | Owl Alpha | 1M | Purpose-built for tool use & multi-step agents | openrouter/owl-alpha |
Alternative free models (if you hit rate limits)
Fallback Model Available On Best For NVIDIA Nemotron 3 Super 120B OpenRouter, NVIDIA NIM Reasoning, math Qwen3 Coder 480B A35B OpenRouter Coding, FIM support Qwen3.5 397B A17B NVIDIA NIM Vision, reasoning Llama 4 Maverick 17B Groq Fastest inference on free tier GPT-OSS-120B Cerebras OpenAI-compatible, 120B, generous limits GLM 5.1 OpenRouter, NVIDIA NIM Long autonomous sessions (8h+) Codestral Mistral AI Dedicated coding, FIM Baidu CoBuddy OpenRouter Agent-optimized, fast
Why OpenRouter as the inference_url Makes Sense
OpenHuman's power comes from routing different tasks to different models. If you point inference_url at a single-model provider (e.g., Google AI Studio for just Gemini), you give up that routing. OpenRouter is the only free backend that lets you:
- Use one API key for hundreds of models
- Route reasoning → Nemotron 3 Super, coding → DeepSeek V4 Flash, vision → Lyria 3 Pro
- Stay under rate limits by spreading load across different models
- No credit card required for free models
If you prefer not using OpenRouter, Groq is a solid single-provider alternative — Llama 4 Maverick handles most tasks reasonably, and Groq's LPU hardware makes it the fastest free option.
Step 4: Verify It Works
After editing config.toml:
- Restart OpenHuman
- Open Settings → AI & Skills → check that your model appears in the status panel
- Send a test message — if the response comes back, the custom backend is working
If you see connection errors, double-check that your inference_url includes the full path (/v1) and that your API key is valid. Test the key directly:
curl -s https://openrouter.ai/api/v1/models \\
-H "Authorization: Bearer YOUR_KEY" | head -20
Free + OpenHuman: What You Get vs. Paid
Feature Paid Subscription Free API (OpenRouter) Model access All providers (OpenAI, Anthropic, Google…) Free models only (no GPT-5, Claude) Rate limits High, guaranteed Varies, may queue during peak Hint routing Automatic optimal routing Manual model pin per hint Memory tree Full feature Full feature (local processing) Integrations (OAuth) Managed by OpenHuman Still works (proxied through backend) Cost $20-30/month $0
The free setup gives you the core OpenHuman experience — desktop agent, memory tree, 118+ integrations, voice — without the monthly cost. The trade-off: no access to Claude or GPT-5 quality models, and you manage which model does what. For many developers, that's a good trade.
FAQ
Can I use multiple free providers at once?
OpenHuman supports a single inference_url. If you want to mix providers (e.g., Groq for speed + NVIDIA for reasoning), use OpenRouter — it aggregates all of them behind one endpoint.
Does the memory tree work with a free API?
Yes. The memory tree and Obsidian wiki run locally on your machine using SQLite. They call the LLM for summarization — which goes through your custom inference_url. A fast model like Gemini 2.5 Flash handles summarization well.
What if I need Claude or GPT-5 for a specific task?
You can keep the paid subscription for the main orchestrator and set up specific teams in config.toml to use free models for less critical tasks. This hybrid approach gives you the best of both worlds.
Is this officially supported by OpenHuman?
Yes — inference_url is a documented config option in OpenHuman's schema. It's designed for self-hosted/alternative backends. The OpenHuman backend (auth, integrations, OAuth proxy) still runs on their infrastructure; only inference is redirected.
Get your free API key and start using OpenHuman today →
Get OpenRouter key (no credit card)
Or browse all 164+ free models across all providers.