# LLM Models

Nurvus uses Qwen models via Ollama or any OpenAI-compatible server (LM Studio, vLLM, llama.cpp). The install script auto-detects your hardware and recommends a model.

## Model Tiers

| Tier | Hardware | Model | What to Expect |
| --- | --- | --- | --- |
| Minimum | 4-7GB RAM | qwen2.5:3b | Basic tasks only. Will struggle with multi-step plans and tool calling. |
| Functional | 8-15GB RAM | qwen2.5:7b | Can deploy containers but may output tool calls as text or narrate excessively. |
| Recommended | 16GB+ RAM | qwen3:30b-a3b | MoE model (3B active params). Fast CPU inference, reliable tool calling, follows the plan-confirm-execute workflow correctly. |
| High | 32-64GB RAM | qwen2.5:32b | Strong reasoning and planning. |
| Ultra | 64GB+ RAM | qwen2.5:72b | Best reasoning available locally. |
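The tier boundaries above can be sketched as a simple RAM check. This is only an illustration of the selection logic, not the install script's actual detection code, and it reads `/proc/meminfo`, so it is Linux-only:

```shell
# Pick a model tier from total system RAM, mirroring the table above.
total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
total_gb=$((total_kb / 1024 / 1024))

if   [ "$total_gb" -ge 64 ]; then model="qwen2.5:72b"    # Ultra
elif [ "$total_gb" -ge 32 ]; then model="qwen2.5:32b"    # High
elif [ "$total_gb" -ge 16 ]; then model="qwen3:30b-a3b"  # Recommended
elif [ "$total_gb" -ge 8 ];  then model="qwen2.5:7b"     # Functional
else                              model="qwen2.5:3b"     # Minimum
fi

echo "Suggested model: $model"
# ollama pull "$model"   # then fetch it (requires Ollama installed)
```

Note that these thresholds assume the model has most of the machine's RAM to itself; on a busy server you may want to step down a tier.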

## Why qwen3:30b-a3b?

The qwen3:30b-a3b model is a Mixture of Experts (MoE) model with 30B total parameters but only 3B active per token. This means:

- It runs fast even on CPU-only hardware
- Quality is much higher than a dense 7B model
- Tool calling works reliably (the main failure mode of smaller models)
- 16GB RAM is enough to run it comfortably

Smaller models (3b, 7b) can technically run Demi but will frequently fail to call tools correctly or follow the multi-step workflow.

## GPU Acceleration

A GPU helps but isn't required. The install script auto-detects AMD and NVIDIA GPUs and offers to enable acceleration:

- AMD: ROCm (`services.nurvus.acceleration = "rocm"`)
- NVIDIA: CUDA (`services.nurvus.acceleration = "cuda"`)
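As a minimal sketch, enabling acceleration in the NixOS module looks like this, using the `acceleration` option named above (swap the value for your GPU vendor):

```nix
services.nurvus = {
  enable = true;
  model = "qwen3:30b-a3b";
  acceleration = "rocm";  # AMD; use "cuda" for NVIDIA
};
```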

## Remote Ollama

If your homelab server doesn't have enough RAM, you can run Ollama on a separate machine and point Demi at it:

```nix
services.nurvus = {
  enable = true;
  model = "qwen3:30b-a3b";
  llmUrl = "http://my-beefy-server:11434";
  contextSize = 32768;
};
```

The NixOS module will skip installing local Ollama when using a remote URL.
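Before pointing Demi at a remote URL, it can help to confirm the Ollama instance is actually reachable. A quick probe, assuming the `my-beefy-server` hostname from the example config (Ollama's `/api/tags` endpoint lists the models it has pulled):

```shell
# Probe a remote Ollama instance; prints "unreachable" instead of erroring out.
OLLAMA_URL="http://my-beefy-server:11434"
out=$(curl -fsS --max-time 5 "$OLLAMA_URL/api/tags" 2>/dev/null || echo unreachable)
echo "$out"
```

If this prints `unreachable`, check your firewall and make sure Ollama is listening on the network interface rather than only on localhost (by default it binds 127.0.0.1; set `OLLAMA_HOST=0.0.0.0` on the server to change that).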

## LM Studio and Other OpenAI-Compatible Servers

Demi works with any server that exposes an OpenAI-compatible /v1/chat/completions endpoint, including LM Studio, vLLM, and llama.cpp.

Set the NURVUS_LLM_URL environment variable to point to your server. The API protocol is auto-detected from the URL (port 1234 defaults to OpenAI), or you can set it explicitly with NURVUS_API:

```shell
# LM Studio on your Mac, accessed from a NixOS VM
NURVUS_LLM_URL=http://192.168.64.1:1234 NURVUS_MODEL=qwen/qwen3-30b-a3b demi

# Explicit API override
NURVUS_LLM_URL=http://my-server:8080 NURVUS_API=openai demi
```

This is useful when running the LLM on a more powerful machine (like a Mac with Apple Silicon) while Demi runs on a lightweight NixOS server or VM.
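The same kind of reachability check works for OpenAI-compatible servers, which expose a `/v1/models` listing endpoint. The address here is the hypothetical LM Studio host from the examples above:

```shell
# List models from an OpenAI-compatible server (LM Studio's default port is 1234).
LLM_URL="http://192.168.64.1:1234"
models=$(curl -fsS --max-time 5 "$LLM_URL/v1/models" 2>/dev/null || echo unreachable)
echo "$models"
```

A successful response is a JSON object whose `data` array should include the model id you pass as `NURVUS_MODEL`.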

## Context Size

Context size determines how much conversation Demi can remember within a session. 32K tokens (32768) is recommended and sufficient for any homelab management session. Going higher uses more RAM for minimal benefit.

| Context | RAM Impact | Use Case |
| --- | --- | --- |
| 4096 | Minimal | Very low RAM systems |
| 8192 | Low | Default for small models |
| 32768 | Moderate | Recommended for qwen3:30b-a3b |