LAN Machines - blame – An Otter Wiki

Blame

75dffb	Damien	2026-06-15 10:02:38
Flesh out all pages — add Home Assistant, LAN Machines, GC Jobs, Known Issues; expand Infrastructure, Docker Swarm, Network/Media/Documents/Traefik pages

# LAN Machines

← [[Home]]

Physical machines on the LAN that are not LXC containers — primarily Ollama inference servers.

---

## Machines

| IP | Hardware | Role |

|----|----------|------|

| `192.168.2.11` | RTX 3060 (12 GB VRAM) | Ollama — large models |

| `192.168.2.40` | RTX 2060 Super (16 GB RAM) | Ollama — small models |

| `192.168.2.73` | GTX 1050 | Local workstation |

---

## Ollama @ 192.168.2.11 (RTX 3060)

Primary inference server for large models. API: `http://192.168.2.40:11434`

**Running models:**

| Model | Notes |

|-------|-------|

| `qwen3.6:27b` | 27B MoE — fits in 12 GB VRAM |

| `qwen3.5` | |

| `ministral-3` | |

| `llama3.2` | |

| `llama3.1:8b` | Aliased as `llama3.1:8b-gpu` in LiteLLM |

| `llava:7b` | Vision/OCR — used by paperless-gpt |

| `nomic-embed-text` | Embeddings for Qdrant (vector size 192) |

---

## Ollama @ 192.168.2.40 (RTX 2060 Super)

Secondary inference server for small models. API: `http://192.168.2.40:11434`

Models stored at `C:\Users\Damien\.ollama\` (Windows machine).

**Running models:**

| Model | Notes |

|-------|-------|

| `llama3.1:8b` | |

| `llama3.2:3b` | |

---

## Model Sizing Notes

- **RTX 3060 (12 GB):** fits up to ~14B dense or ~27B MoE at Q4_K_M

- **Qwen3-coder-30b-a3b** (MoE) needs ~22 GB VRAM at Q4_K_M — exceeds both `.11` and `.40`. Not runnable locally.

- **Rule of thumb:** Q4_K_M quantization uses roughly 0.5–0.6 GB per billion parameters for dense models; MoE models use much less because only a fraction of params activate per token.

---

## LiteLLM Integration

Both Ollama servers are configured as backends in LiteLLM on PCT 109. See [[AI Stack]] for the full model list and proxy config. Reference Ollama models in LiteLLM as their configured model names (e.g. `qwen3.6:27b`, `llama3.1:8b`).