feat: Schritt 2 — mGPUmanager MVP routing + /v1/status
Go daemon listening on :8770 that fronts mvoice (8766), whisper-server
(8178), ollama (11434), comfyui (8188) behind a single /v1 façade.
What this MVP does:
- Loads config/consumers.yaml: routing table, per-consumer URL + health +
paths + vram_resident_mib + can_coexist_with + load/unload routes.
- Background health probe (5s) on every consumer; refuses fast with a
structured 503 if the last probe failed (no Felix-Banholzer-style
silent fallback).
- POST /v1/{tts,stt,llm,image} proxies the request body + Content-Type
to the routed consumer's path and streams the response back.
- GET /audio/* proxies to audio_proxy consumer (wa.sh fetches its WAV
this way).
- GET /v1/status exposes live GPU sample (nvidia-smi every 2s),
per-consumer health/loaded/gpu_resident_mib/active/total_requests,
scheduler stats.
- GET /healthz, GET / — broker liveness.
The Scheduler interface is in place but the implementation is
'Passthrough' — every job runs immediately, no lock, no queue. Schritt 4
replaces it with a serialising mutex; Schritt 5 adds VRAM-pressure
eviction. The interface boundary means server.go stays unchanged.
Out of scope here:
- Schritt 3: wa.sh migration (parallel work in mAi).
- Schritt 4: queue + global GPU lock.
- Schritt 5: nvidia-smi-driven LRU eviction.
Tests: config validation (good/bad), proxy forwards body, audio proxy
streams bytes, unhealthy consumer returns 503, /v1/status JSON shape.
Refs: m/mGPUmanager#1
This commit is contained in:
30
systemd/mgpumanager.service
Normal file
30
systemd/mgpumanager.service
Normal file
@@ -0,0 +1,30 @@
|
||||
[Unit]
|
||||
Description=mGPUmanager — GPU-Inference-Control-Plane for mRock
|
||||
Documentation=https://mgit.msbls.de/m/mGPUmanager
|
||||
After=network-online.target
|
||||
Wants=network-online.target
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
User=m
|
||||
Group=m
|
||||
WorkingDirectory=/home/m/dev/mGPUmanager
|
||||
ExecStart=/home/m/dev/mGPUmanager/bin/mgpumanager \
|
||||
--config /home/m/dev/mGPUmanager/config/consumers.yaml \
|
||||
--log-level info
|
||||
Restart=on-failure
|
||||
RestartSec=3
|
||||
TimeoutStopSec=10
|
||||
|
||||
# Hardening — broker has no need for elevated capabilities.
|
||||
NoNewPrivileges=true
|
||||
PrivateTmp=true
|
||||
ProtectSystem=strict
|
||||
ProtectHome=read-only
|
||||
ReadWritePaths=/home/m/dev/mGPUmanager
|
||||
|
||||
# The broker only proxies; nvidia-smi is the only GPU-touching call.
|
||||
PrivateDevices=false
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
Reference in New Issue
Block a user