Models
Models
mini-claude doesn’t care which model you run, as long as the inference server speaks the OpenAI-compatible API. Here’s how to think about the trade-offs.
Recommended starters
If you’re new to local LLMs and want something snappy on a laptop CPU:
llama3.2:1b— ~1.3 GB, very fast, decent for chat and simple tasks.llama3.2:3b— ~2 GB, our default. Good balance of speed and quality.qwen2.5:0.5b— ~400 MB. Stress-test streaming with near-instant responses.
Stepping up
If you have a GPU or plenty of RAM:
qwen2.5:7b— strong general-purpose 7B.llama3.1:8b— solid all-rounder.mistral:7b— fast, focused.
Performance notes
This section is a stub. We’ll add benchmarks once we have repeatable measurements on a few reference machines.
Switching at runtime
Open the picker with /model or switch directly with /model qwen2.5:7b. See Slash commands.