mGPUmanager

m/mGPUmanager

Commit Graph


Author: mAi

SHA1: 167999cecf

build: deploy as systemd --user unit on mRock

Convention on mRock is user-units for ML services (whisper-server,
mvoice-launcher, comfyui as of today). Switching mGPUmanager too:

- systemd/mgpumanager.service: rewritten as a user unit (%h-based
  WorkingDirectory + ExecStart, WantedBy=default.target). Drops the
  ProtectSystem/ProtectHome hardening that came from the system-unit
  template — user units don't need it, and ProtectHome=read-only
  blocks a user unit's own working dir.
- Makefile deploy target: rsync to ~/.config/systemd/user/ on the
  remote and use systemctl --user, no sudo. README documents the
  lingering prerequisite (loginctl enable-linger m).
- config/consumers.yaml: bind on 0.0.0.0:8770 instead of localhost so
  mRiver / Tailscale peers can actually reach the broker.

Refs: m/mGPUmanager#1 (deploy task).

Date: 2026-05-15 16:50:04 +02:00

Author: mAi

SHA1: c81c145163

feat: Step 2: mGPUmanager MVP routing + /v1/status

Go daemon listening on :8770 that fronts mvoice (8766), whisper-server
(8178), ollama (11434), comfyui (8188) behind a single /v1 façade.

What this MVP does:
- Loads config/consumers.yaml: routing table, per-consumer URL + health +
  paths + vram_resident_mib + can_coexist_with + load/unload routes.
- Background health probe (every 5s) on every consumer; a request fails
  fast with a structured 503 if the consumer's last probe failed (no
  Felix-Banholzer-style silent fallback).
- POST /v1/{tts,stt,llm,image} proxies the request body + Content-Type
  to the routed consumer's path and streams the response back.
- GET /audio/* proxies to audio_proxy consumer (wa.sh fetches its WAV
  this way).
- GET /v1/status exposes live GPU sample (nvidia-smi every 2s),
  per-consumer health/loaded/gpu_resident_mib/active/total_requests,
  scheduler stats.
- GET /healthz, GET / — broker liveness.
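
The health-gated refusal described in the list above could look roughly like the sketch below. It is an illustration under assumptions, not the code from this commit: `healthTable`, `gate`, `statusFor` and the JSON fields `error` / `consumer` are hypothetical names, and the 5s background probe is reduced to a `Set` call that a ticker would make.

```go
package main

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"sync"
)

// healthTable stores the outcome of the most recent probe per consumer.
// (Hypothetical type; the commit does not show the real data structure.)
type healthTable struct {
	mu sync.RWMutex
	ok map[string]bool
}

func newHealthTable() *healthTable {
	return &healthTable{ok: make(map[string]bool)}
}

// Set records a probe result; a background ticker would call this.
func (h *healthTable) Set(consumer string, healthy bool) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.ok[consumer] = healthy
}

// Healthy reports the last recorded result.
// Consumers that were never probed count as unhealthy.
func (h *healthTable) Healthy(consumer string) bool {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return h.ok[consumer]
}

// gate answers with a JSON 503 when the consumer's last probe failed;
// otherwise it hands the request to next (for example a reverse proxy).
func gate(h *healthTable, consumer string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !h.Healthy(consumer) {
			w.Header().Set("Content-Type", "application/json")
			w.WriteHeader(http.StatusServiceUnavailable)
			// Assumed error shape, chosen only for this sketch.
			_ = json.NewEncoder(w).Encode(map[string]string{
				"error":    "consumer_unhealthy",
				"consumer": consumer,
			})
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one POST through the gate in front of a stub upstream
// and returns the HTTP status code the caller would see.
func statusFor(h *healthTable, consumer string) int {
	upstream := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/v1/tts", nil)
	gate(h, consumer, upstream).ServeHTTP(rec, req)
	return rec.Code
}
```

With a fresh table, `statusFor(h, "mvoice")` yields 503 until `h.Set("mvoice", true)` has been called. Treating an unprobed consumer as unhealthy is a design choice of this sketch that keeps the failure mode explicit rather than silently forwarding.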

The Scheduler interface is in place, but the only implementation is
'Passthrough': every job runs immediately, with no lock and no queue.
Step 4 replaces it with a serialising mutex; Step 5 adds VRAM-pressure
eviction. The interface boundary means server.go stays unchanged.
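
To make the interface boundary concrete, here is a minimal sketch of a scheduler interface with a passthrough implementation and a mutex-based one. Only the names `Scheduler` and `Passthrough` come from the commit message; the `Job` type, the `Run` signature, the `Serial` type and the `maxConcurrent` helper are assumptions made for illustration.

```go
package main

import (
	"context"
	"sync"
	"time"
)

// Job is one unit of GPU work (hypothetical type for this sketch).
type Job func(ctx context.Context) error

// Scheduler decides when a job may run. The method set is assumed;
// the commit only states that such an interface exists.
type Scheduler interface {
	Run(ctx context.Context, consumer string, job Job) error
}

// Passthrough runs every job immediately: no lock, no queue.
type Passthrough struct{}

func (Passthrough) Run(ctx context.Context, _ string, job Job) error {
	return job(ctx)
}

// Serial lets one job run at a time by holding a global mutex,
// in the spirit of the serialising mutex planned for a later step.
type Serial struct{ mu sync.Mutex }

func (s *Serial) Run(ctx context.Context, _ string, job Job) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	return job(ctx)
}

// maxConcurrent pushes n short jobs through s from n goroutines and
// returns the highest number of jobs that were in flight at once.
func maxConcurrent(s Scheduler, n int) int {
	var (
		mu           sync.Mutex
		active, peak int
		wg           sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = s.Run(context.Background(), "demo", func(context.Context) error {
				mu.Lock()
				active++
				if active > peak {
					peak = active
				}
				mu.Unlock()
				time.Sleep(time.Millisecond) // stand-in for GPU work
				mu.Lock()
				active--
				mu.Unlock()
				return nil
			})
		}()
	}
	wg.Wait()
	return peak
}
```

Because callers depend only on `Scheduler`, swapping `Passthrough{}` for `&Serial{}` changes the concurrency behaviour without touching the calling code: `maxConcurrent(&Serial{}, 8)` never observes more than one job in flight, while `Passthrough{}` may observe several.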

Out of scope here:
- Step 3: wa.sh migration (parallel work in mAi).
- Step 4: queue + global GPU lock.
- Step 5: nvidia-smi-driven LRU eviction.

Tests: config validation (good/bad), proxy forwards body, audio proxy
streams bytes, unhealthy consumer returns 503, /v1/status JSON shape.

Refs: m/mGPUmanager#1

Date: 2026-05-11 13:30:17 +02:00

2 Commits