Models

Any OpenAI-compatible server works. For agent work, though, models are not interchangeable: what matters is how reliably they call tools.

The real criterion: tool calling

At startup, Mini-claude checks two things about your model:

Size (via Ollama’s /api/show): models under 7B fall back to chat-only mode, with tool calls disabled. That is enough to test the UI, but not for real agent work.
Native tool calling (via a startup probe): some models advertise the tools capability but actually emit tool calls as plain text instead of in the structured field. Mini-claude recovers many of these via a fallback parser, but such models stay unreliable on multi-step tasks. When this is detected, the header shows weak tool calling.
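As a rough illustration of the two checks above, here is a minimal Python sketch. It is not Mini-claude's actual code. It assumes the /api/show response carries a details.parameter_size string such as "7.2B", and that the probe returns an OpenAI-style assistant message with an optional tool_calls field. The function names, the text heuristic, and the choice to treat "no call at all" as chat-only are all hypothetical.

```python
import re

MIN_AGENT_SIZE_B = 7.0  # threshold taken from the text: under 7B means chat-only


def parse_param_size(size: str) -> float:
    """Convert a size string such as '7.2B' or '494.03M' to billions of parameters."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([MB])\s*", size, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized parameter size: {size!r}")
    value = float(match.group(1))
    return value / 1000 if match.group(2).upper() == "M" else value


def classify_probe(message: dict) -> str:
    """Classify one assistant message returned by a tool-calling probe."""
    if message.get("tool_calls"):
        return "native"  # the call arrived in the structured field
    content = message.get("content") or ""
    # Hypothetical heuristic: a JSON-looking object with a "name" key in the
    # plain text is treated as a tool call written out as text.
    if re.search(r'\{\s*"name"\s*:', content):
        return "weak"
    return "none"


def choose_mode(show_response: dict, probe_message: dict) -> str:
    """Combine the size check and the probe result into a mode label."""
    size = parse_param_size(show_response["details"]["parameter_size"])
    if size < MIN_AGENT_SIZE_B:
        return "chat-only"
    kind = classify_probe(probe_message)
    # Assumption: a model that emits no call at all is also treated as chat-only.
    labels = {"native": "agent", "weak": "agent (weak tool calling)", "none": "chat-only"}
    return labels[kind]
```

With these assumptions, a 3.2B model lands in chat-only regardless of the probe, while a 7B+ model that writes its call as text gets the weak label.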

Verified for native tool calling

ollama pull mistral:7b     # solid agent, native calls, ~5 GB
ollama pull qwen3:8b       # newer, strong at both code and tools
ollama pull llama3.1:8b    # reliable tool use

The qwen2.5-coder caveat

qwen2.5-coder on Ollama is an excellent code model, but its chat template does not emit native tool calls; it writes them as plain text instead. That makes it a poor agent despite being a great coder. The fallback parser handles simple single-tool requests, but complex audits and refactors will be flaky.
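To make the fallback idea concrete, the sketch below shows one way a text-form tool call could be recovered. It is an illustration only, not Mini-claude's parser: it assumes the model writes a JSON object with "name" and "arguments" keys somewhere in its reply, and the function name extract_text_tool_call is hypothetical.

```python
import json


def extract_text_tool_call(text: str):
    """Return the first {"name": ..., "arguments": {...}} object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores
            # whatever prose or markup follows it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if (
            isinstance(obj, dict)
            and isinstance(obj.get("name"), str)
            and isinstance(obj.get("arguments"), dict)
        ):
            return {"name": obj["name"], "arguments": obj["arguments"]}
        # Not a tool call here: try the next opening brace.
        start = text.find("{", start + 1)
    return None
```

A parser like this copes with a single well-formed call surrounded by prose, which matches the simple single-tool case. It returns only the first call and gives up on malformed JSON, which hints at why multi-step tasks remain fragile.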

Chat-only (tools disabled)

llama3.2:3b, phi4-mini, and any other sub-7B model. Fine for chatting or testing, not for driving tools.

When tool calling is weak: `/audit`

Even without reliable tool calling, /audit [path] works: Mini-claude gathers the project data itself (tree + README + manifest) and only asks the model to synthesize. Any model can do that. See Slash commands.
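The gather-then-synthesize approach can be sketched as follows. This is a hedged illustration, not the actual /audit implementation: the manifest filenames, the skipped directories, the size limits, and the prompt wording are all assumptions, and build_audit_prompt is a hypothetical name.

```python
from pathlib import Path

# Assumed manifest names and ignored directories; the real command may differ.
MANIFESTS = ("package.json", "pyproject.toml", "Cargo.toml", "go.mod")
SKIP_DIRS = {".git", "node_modules", "__pycache__"}


def build_audit_prompt(root: str, max_entries: int = 200, max_chars: int = 4000) -> str:
    """Collect tree + README + manifest into one prompt, with no tool calls needed."""
    base = Path(root)
    tree = []
    for path in sorted(base.rglob("*")):
        rel = path.relative_to(base)
        if any(part in SKIP_DIRS for part in rel.parts):
            continue
        tree.append(rel.as_posix() + ("/" if path.is_dir() else ""))
        if len(tree) >= max_entries:
            break  # keep the prompt bounded on large projects

    sections = ["## File tree\n" + "\n".join(tree)]
    for name in ("README.md", *MANIFESTS):
        file = base / name
        if file.is_file():
            # Truncate long files so the prompt fits a small context window.
            text = file.read_text(encoding="utf-8", errors="replace")[:max_chars]
            sections.append(f"## {name}\n{text}")

    return "Summarize this project and flag likely problems.\n\n" + "\n\n".join(sections)
```

Because the client does all the file access, the model only has to read one prompt and write a summary, which is why this path does not depend on tool calling.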

Switching at runtime

Open the picker with /model or switch directly with /model mistral:7b. See Slash commands.
