Skip to content

Models

mini-claude doesn’t care which model you run, as long as the inference server speaks the OpenAI-compatible API. Here’s how to think about the trade-offs.

Recommended starters

If you’re new to local LLMs and want something snappy on a laptop CPU:

  • llama3.2:1b — ~1.3 GB, very fast, decent for chat and simple tasks.
  • llama3.2:3b — ~2 GB, our default. Good balance of speed and quality.
  • qwen2.5:0.5b — ~400 MB. Stress-test streaming with near-instant responses.

Stepping up

If you have a GPU or plenty of RAM:

  • qwen2.5:7b — strong general-purpose 7B.
  • llama3.1:8b — solid all-rounder.
  • mistral:7b — fast, focused.

Performance notes

This section is a stub. We’ll add benchmarks once we have repeatable measurements on a few reference machines.

Switching at runtime

Open the picker with /model or switch directly with /model qwen2.5:7b. See Slash commands.