mGPUmanager/config/consumers.yaml
mAi 167999cecf build: deploy as systemd --user unit on mRock
Convention on mRock is user-units for ML services (whisper-server,
mvoice-launcher, comfyui as of today). Switching mGPUmanager too:

- systemd/mgpumanager.service: rewritten as a user unit (%h-based
  WorkingDirectory + ExecStart, WantedBy=default.target). Drops the
  ProtectSystem/ProtectHome hardening that came from the system-unit
  template — user units don't need it, and ProtectHome=read-only
  blocks a user unit's own working dir.
- Makefile deploy target: rsync to ~/.config/systemd/user/ on the
  remote and use systemctl --user, no sudo. README documents the
  lingering prerequisite (loginctl enable-linger m).
- config/consumers.yaml: bind on 0.0.0.0:8770 instead of localhost so
  mRiver / Tailscale peers can actually reach the broker.

Refs: m/mGPUmanager#1 (deploy task).
2026-05-15 16:50:04 +02:00

listen: 0.0.0.0:8770

gpu:
  total_mib: 16376     # RTX 4070 Ti SUPER
  reserved_mib: 1024   # headroom for system/desktop
  poll_interval_seconds: 2

routing:
  tts: mvoice
  stt: mvoice          # whisper-server is alternative if explicitly requested
  llm: ollama
  image: comfyui

# Audio download proxy: any GET under audio_path_prefix is forwarded to this
# consumer at the same path. wa.sh fetches mvoice's generated WAV this way.
audio_proxy: mvoice
audio_path_prefix: /api/audio/

consumers:
  mvoice:
    url: http://localhost:8766
    health:
      method: GET
      path: /api/health
    paths:
      tts:
        method: POST
        path: /api/synthesize
      stt:
        method: POST
        path: /api/transcribe
    vram_resident_mib: 2800
    load:
      method: POST
      path: /api/admin/load
    unload:
      method: POST
      path: /api/admin/unload
    can_coexist_with: [whisper-server, ollama]
    priority: 3
    max_concurrency: 1

  whisper-server:
    url: http://localhost:8178
    health:
      method: GET
      path: /
    paths:
      stt:
        method: POST
        path: /inference
    vram_resident_mib: 2050
    # No HTTP unload; mGPUmanager evicts via systemd restart (step 5).
    systemd_unit: whisper-server.service
    can_coexist_with: [mvoice, ollama]
    priority: 2
    max_concurrency: 1

  ollama:
    url: http://localhost:11434
    health:
      method: GET
      path: /api/tags
    paths:
      llm:
        method: POST
        path: /api/generate
    # Ollama runs its own LRU keep_alive; we don't track resident VRAM.
    vram_managed: true
    can_coexist_with: [mvoice, whisper-server]
    priority: 2
    max_concurrency: 1

  comfyui:
    url: http://localhost:8188
    health:
      method: GET
      path: /system_stats
    paths:
      image:
        method: POST
        path: /prompt
    vram_resident_mib: 13000
    unload:
      method: POST
      path: /api/free
      body: '{"unload_models":true,"free_memory":true}'
    can_coexist_with: []
    priority: 1
    max_concurrency: 1
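
The VRAM figures and can_coexist_with lists above imply a simple budget: 16376 MiB total minus 1024 MiB reserved leaves 15352 MiB for consumers. The sketch below shows one way an admission check over these values could look. It is not taken from mGPUmanager: the names Consumer, usable_mib and can_admit are hypothetical, the rule (mutual coexistence plus a sum of tracked footprints) is an assumption, and ollama is counted as 0 MiB only because the config says its resident VRAM is not tracked.

```python
from dataclasses import dataclass, field

# Values copied from config/consumers.yaml.
TOTAL_MIB = 16376
RESERVED_MIB = 1024


@dataclass(frozen=True)
class Consumer:
    name: str
    # Tracked footprint in MiB; 0 is an assumption for consumers whose
    # resident VRAM the config does not track (ollama).
    vram_resident_mib: int
    can_coexist_with: frozenset = field(default_factory=frozenset)


CONSUMERS = {
    "mvoice": Consumer("mvoice", 2800, frozenset({"whisper-server", "ollama"})),
    "whisper-server": Consumer("whisper-server", 2050, frozenset({"mvoice", "ollama"})),
    "ollama": Consumer("ollama", 0, frozenset({"mvoice", "whisper-server"})),
    "comfyui": Consumer("comfyui", 13000, frozenset()),
}


def usable_mib() -> int:
    """VRAM available to consumers after the reserved headroom."""
    return TOTAL_MIB - RESERVED_MIB


def can_admit(candidate: str, resident: set) -> bool:
    """Hypothetical admission rule, not mGPUmanager's actual logic.

    The candidate is admitted only if it and every resident consumer list
    each other in can_coexist_with, and the sum of tracked footprints fits
    into the usable budget.
    """
    cand = CONSUMERS[candidate]
    for name in resident:
        other = CONSUMERS[name]
        if name not in cand.can_coexist_with:
            return False
        if candidate not in other.can_coexist_with:
            return False
    tracked = cand.vram_resident_mib + sum(
        CONSUMERS[name].vram_resident_mib for name in resident
    )
    return tracked <= usable_mib()


if __name__ == "__main__":
    print(usable_mib())
    print(can_admit("whisper-server", {"mvoice"}))
    print(can_admit("comfyui", {"mvoice"}))
```

Under these assumptions mvoice and whisper-server fit together (2800 + 2050 = 4850 MiB), while comfyui, with its empty can_coexist_with list and 13000 MiB footprint, is only admitted when nothing else is resident.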