# LLM Models

Nurvus uses Qwen models via Ollama or any OpenAI-compatible server (LM Studio, vLLM, llama.cpp). The install script auto-detects your hardware and recommends a model.

## Model Tiers

| Tier | Hardware | Model | What to Expect |
| --- | --- | --- | --- |
| Minimum | 4-7GB RAM | qwen2.5:3b | Basic tasks only. Will struggle with multi-step plans and tool calling. |
| Functional | 8-15GB RAM | qwen2.5:7b | Can deploy containers but may output tool calls as text or narrate excessively. |
| Recommended | 16GB+ RAM | qwen3:30b-a3b | MoE model (3B active params). Fast CPU inference, reliable tool calling, follows the plan-confirm-execute workflow correctly. |
| High | 32-64GB RAM | qwen2.5:32b | Strong reasoning and planning. |
| Ultra | 64GB+ RAM | qwen2.5:72b | Best reasoning available locally. |
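The tier boundaries above can be sketched as a simple RAM check. This is only an illustration of the selection logic, not the install script's actual detection code, and it reads `/proc/meminfo`, so it is Linux-only:

```shell
# Pick a model tier from total system RAM, mirroring the table above.
total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
total_gb=$((total_kb / 1024 / 1024))

if   [ "$total_gb" -ge 64 ]; then model="qwen2.5:72b"    # Ultra
elif [ "$total_gb" -ge 32 ]; then model="qwen2.5:32b"    # High
elif [ "$total_gb" -ge 16 ]; then model="qwen3:30b-a3b"  # Recommended
elif [ "$total_gb" -ge 8 ];  then model="qwen2.5:7b"     # Functional
else                              model="qwen2.5:3b"     # Minimum
fi

echo "Suggested model: $model"
# ollama pull "$model"   # then fetch it (requires Ollama installed)
```

Note that these thresholds assume the model has most of the machine's RAM to itself; on a busy server you may want to step down a tier.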

## Why qwen3:30b-a3b?

The qwen3:30b-a3b model is a Mixture of Experts (MoE) model with 30B total parameters but only 3B active per token. This means:

- It runs fast even on CPU-only hardware
- Quality is much higher than a dense 7B model
- Tool calling works reliably (the main failure mode of smaller models)
- 16GB RAM is enough to run it comfortably

Smaller models (3b, 7b) can technically run Demi but will frequently fail to call tools correctly or follow the multi-step workflow.

## GPU Acceleration

A GPU helps but isn't required. The install script auto-detects AMD and NVIDIA GPUs and offers to enable acceleration:

- AMD: ROCm (`services.nurvus.acceleration = "rocm"`)
- NVIDIA: CUDA (`services.nurvus.acceleration = "cuda"`)
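As a minimal sketch, enabling acceleration in the NixOS module looks like this, using the `acceleration` option named above (swap the value for your GPU vendor):

```nix
services.nurvus = {
  enable = true;
  model = "qwen3:30b-a3b";
  acceleration = "rocm";  # AMD; use "cuda" for NVIDIA
};
```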

## Remote Ollama

If your homelab server doesn't have enough RAM, you can run Ollama on a separate machine and point Demi at it:

```nix
services.nurvus = {
  enable = true;
  model = "qwen3:30b-a3b";
  llmUrl = "http://my-beefy-server:11434";
  contextSize = 32768;
};
```

The NixOS module will skip installing local Ollama when using a remote URL.
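Before pointing Demi at a remote URL, it can help to confirm the Ollama instance is actually reachable. A quick probe, assuming the `my-beefy-server` hostname from the example config (Ollama's `/api/tags` endpoint lists the models it has pulled):

```shell
# Probe a remote Ollama instance; prints "unreachable" instead of erroring out.
OLLAMA_URL="http://my-beefy-server:11434"
out=$(curl -fsS --max-time 5 "$OLLAMA_URL/api/tags" 2>/dev/null || echo unreachable)
echo "$out"
```

If this prints `unreachable`, check your firewall and make sure Ollama is listening on the network interface rather than only on localhost (by default it binds 127.0.0.1; set `OLLAMA_HOST=0.0.0.0` on the server to change that).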

## LM Studio and Other OpenAI-Compatible Servers

Demi works with any server that exposes an OpenAI-compatible /v1/chat/completions endpoint, including LM Studio, vLLM, and llama.cpp.

Set the NURVUS_LLM_URL environment variable to point to your server. The API protocol is auto-detected from the URL (port 1234 defaults to OpenAI), or you can set it explicitly with NURVUS_API:

```shell
# LM Studio on your Mac, accessed from a NixOS VM
NURVUS_LLM_URL=http://192.168.64.1:1234 NURVUS_MODEL=qwen/qwen3-30b-a3b demi

# Explicit API override
NURVUS_LLM_URL=http://my-server:8080 NURVUS_API=openai demi
```

This is useful when running the LLM on a more powerful machine (like a Mac with Apple Silicon) while Demi runs on a lightweight NixOS server or VM.
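The same kind of reachability check works for OpenAI-compatible servers, which expose a `/v1/models` listing endpoint. The address here is the hypothetical LM Studio host from the examples above:

```shell
# List models from an OpenAI-compatible server (LM Studio's default port is 1234).
LLM_URL="http://192.168.64.1:1234"
models=$(curl -fsS --max-time 5 "$LLM_URL/v1/models" 2>/dev/null || echo unreachable)
echo "$models"
```

A successful response is a JSON object whose `data` array should include the model id you pass as `NURVUS_MODEL`.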

## Context Size

Context size determines how much conversation Demi can remember within a session. 32K tokens (32768) is recommended and sufficient for any homelab management session. Going higher uses more RAM for minimal benefit.

| Context | RAM Impact | Use Case |
| --- | --- | --- |
| 4096 | Minimal | Very low RAM systems |
| 8192 | Low | Default for small models |
| 32768 | Moderate | Recommended for qwen3:30b-a3b |