Blame

75dffb Damien 2026-06-15 10:02:38
Flesh out all pages — add Home Assistant, LAN Machines, GC Jobs, Known Issues; expand Infrastructure, Docker Swarm, Network/Media/Documents/Traefik pages
1
# LAN Machines
2
3
← [[Home]]
4
5
Physical machines on the LAN that are not LXC containers — primarily Ollama inference servers.
6
7
---
8
9
## Machines
10
11
| IP | Hardware | Role |
12
|----|----------|------|
13
| `192.168.2.11` | RTX 3060 (12 GB VRAM) | Ollama — large models |
14
| `192.168.2.40` | RTX 2060 Super (16 GB RAM) | Ollama — small models |
15
| `192.168.2.73` | GTX 1050 | Local workstation |
16
17
---
18
19
## Ollama @ 192.168.2.11 (RTX 3060)
20
21
Primary inference server for large models. API: `http://192.168.2.40:11434`
22
23
**Running models:**
24
25
| Model | Notes |
26
|-------|-------|
27
| `qwen3.6:27b` | 27B MoE — fits in 12 GB VRAM |
28
| `qwen3.5` | |
29
| `ministral-3` | |
30
| `llama3.2` | |
31
| `llama3.1:8b` | Aliased as `llama3.1:8b-gpu` in LiteLLM |
32
| `llava:7b` | Vision/OCR — used by paperless-gpt |
33
| `nomic-embed-text` | Embeddings for Qdrant (vector size 192) |
34
35
---
36
37
## Ollama @ 192.168.2.40 (RTX 2060 Super)
38
39
Secondary inference server for small models. API: `http://192.168.2.40:11434`
40
41
Models stored at `C:\Users\Damien\.ollama\` (Windows machine).
42
43
**Running models:**
44
45
| Model | Notes |
46
|-------|-------|
47
| `llama3.1:8b` | |
48
| `llama3.2:3b` | |
49
50
---
51
52
## Model Sizing Notes
53
54
- **RTX 3060 (12 GB):** fits up to ~14B dense or ~27B MoE at Q4_K_M
55
- **Qwen3-coder-30b-a3b** (MoE) needs ~22 GB VRAM at Q4_K_M — exceeds both `.11` and `.40`. Not runnable locally.
56
- **Rule of thumb:** Q4_K_M quantization uses roughly 0.5–0.6 GB per billion parameters for dense models; MoE models use much less because only a fraction of params activate per token.
57
58
---
59
60
## LiteLLM Integration
61
62
Both Ollama servers are configured as backends in LiteLLM on PCT 109. See [[AI Stack]] for the full model list and proxy config. Reference Ollama models in LiteLLM as their configured model names (e.g. `qwen3.6:27b`, `llama3.1:8b`).