# LLM Models
Nurvus uses Qwen models via Ollama or any OpenAI-compatible server (LM Studio, vLLM, llama.cpp). The install script auto-detects your hardware and recommends a model.
## Model Tiers
| Tier | Hardware | Model | What to Expect |
|---|---|---|---|
| Minimum | 4-7GB RAM | qwen2.5:3b | Basic tasks only. Will struggle with multi-step plans and tool calling. |
| Functional | 8-15GB RAM | qwen2.5:7b | Can deploy containers but may output tool calls as text or narrate excessively. |
| Recommended | 16GB+ RAM | qwen3:30b-a3b | MoE model (3B active params). Fast CPU inference, reliable tool calling, follows the plan-confirm-execute workflow correctly. |
| High | 32-64GB RAM | qwen2.5:32b | Strong reasoning and planning. |
| Ultra | 64GB+ RAM | qwen2.5:72b | Best reasoning available locally. |
## Why qwen3:30b-a3b?
The qwen3:30b-a3b model is a Mixture of Experts (MoE) model with 30B total parameters but only 3B active per token. This means:
- It runs fast even on CPU-only hardware
- Quality is much higher than a dense 7B model
- Tool calling works reliably (the main failure mode of smaller models)
- 16GB RAM is enough to run it comfortably
Smaller models (3b, 7b) can technically run Demi but will frequently fail to call tools correctly or follow the multi-step workflow.
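As a rough sketch, the installer's RAM-to-model mapping could look like the following (a hypothetical helper, not the actual install script; the thresholds mirror the tier table above):

```python
def recommend_model(ram_gb: float) -> str:
    """Map detected system RAM (in GB) to a model tier from the table above."""
    if ram_gb >= 64:
        return "qwen2.5:72b"    # Ultra: best reasoning available locally
    if ram_gb >= 32:
        return "qwen2.5:32b"    # High: strong reasoning and planning
    if ram_gb >= 16:
        return "qwen3:30b-a3b"  # Recommended: MoE, reliable tool calling
    if ram_gb >= 8:
        return "qwen2.5:7b"     # Functional: works, with rough edges
    return "qwen2.5:3b"         # Minimum: basic tasks only

print(recommend_model(16))  # → qwen3:30b-a3b
```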
## GPU Acceleration
A GPU helps but isn't required. The install script auto-detects AMD and NVIDIA GPUs and offers to enable acceleration:
- AMD: ROCm (`services.nurvus.acceleration = "rocm"`)
- NVIDIA: CUDA (`services.nurvus.acceleration = "cuda"`)
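Put together, a module configuration with acceleration enabled might look like this (a sketch combining the `acceleration` option above with the `enable` and `model` options shown in the Remote Ollama example):

```nix
services.nurvus = {
  enable = true;
  model = "qwen3:30b-a3b";
  acceleration = "rocm";  # or "cuda" for NVIDIA GPUs
};
```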
## Remote Ollama
If your homelab server doesn't have enough RAM, you can run Ollama on a separate machine and point Demi at it:
```nix
services.nurvus = {
  enable = true;
  model = "qwen3:30b-a3b";
  llmUrl = "http://my-beefy-server:11434";
  contextSize = 32768;
};
```
The NixOS module will skip installing local Ollama when using a remote URL.
## LM Studio and Other OpenAI-Compatible Servers
Demi works with any server that exposes an OpenAI-compatible `/v1/chat/completions` endpoint, including LM Studio, vLLM, and llama.cpp.
Set the `NURVUS_LLM_URL` environment variable to point to your server. The API protocol is auto-detected from the URL (port 1234 defaults to OpenAI), or you can set it explicitly with `NURVUS_API`:
```bash
# LM Studio on your Mac, accessed from a NixOS VM
NURVUS_LLM_URL=http://192.168.64.1:1234 NURVUS_MODEL=qwen/qwen3-30b-a3b demi

# Explicit API override
NURVUS_LLM_URL=http://my-server:8080 NURVUS_API=openai demi
```
This is useful when running the LLM on a more powerful machine (like a Mac with Apple Silicon) while Demi runs on a lightweight NixOS server or VM.
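The port-based auto-detection described above can be sketched as follows (a hypothetical helper, not the actual Nurvus code; the "fall back to Ollama's native API" branch is an assumption):

```python
from urllib.parse import urlparse

def detect_api(url: str) -> str:
    """Guess the API protocol from the server URL.

    Port 1234 is LM Studio's default, so it implies the
    OpenAI-compatible API; otherwise assume native Ollama.
    """
    port = urlparse(url).port
    return "openai" if port == 1234 else "ollama"

print(detect_api("http://192.168.64.1:1234"))   # → openai
print(detect_api("http://my-beefy-server:11434"))  # → ollama
```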
## Context Size
Context size determines how much conversation Demi can remember within a session. 32K is recommended and sufficient for any homelab management session. Going higher uses more RAM for minimal benefit.
| Context | RAM Impact | Use Case |
|---|---|---|
| 4096 | Minimal | Very low RAM systems |
| 8192 | Low | Default for small models |
| 32768 | Moderate | Recommended for qwen3:30b-a3b |
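On a very low RAM system running a small model, the context size can be reduced via the same module option used in the Remote Ollama example (a sketch; the model and value are illustrative):

```nix
services.nurvus = {
  enable = true;
  model = "qwen2.5:3b";
  contextSize = 4096;  # minimal RAM impact, per the table above
};
```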