Live deploy on mRock surfaced a Step 5 bug: comfyui was always
treated as preloaded at scheduler startup, which made ensureFits()
short-circuit on the very first /v1/image request, exactly the
scenario eviction is supposed to handle. mvoice was never picked as
a victim, so ComfyUI OOM'd loading FLUX on top of the still-resident
mvoice.
Fix: replace the blanket 'every consumer starts loaded' init with a
heuristic — initialLoaded(cons):
- VRAMManaged (ollama): true. We never track/evict it; the consumer
runs its own LRU.
- Load+Unload both present (mvoice): true. Designed to be controllable;
typically preloads in its own lifespan.
- Unload only, no Load (comfyui): false. Lazy — FLUX isn't resident
until the first /prompt, so we shouldn't bill its 13 GiB against the
GPU budget until then.
- SystemdUnit only (whisper-server): true. Always-on, model loaded at
process start.
- Empty: true. Safe fallback.
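
A minimal sketch of the heuristic above, for illustration only. The
Consumer struct and its string-typed Load/Unload/SystemdUnit fields are
assumptions made for this example and may not match the real config
types; only the decision order follows the list above. A consumer with
Load but no Unload is not covered by the list, so the sketch lets it
fall through to the safe default.

```go
package main

import "fmt"

// Consumer is a hypothetical stand-in for the scheduler's consumer config.
// Field names and types are assumptions made for this sketch.
type Consumer struct {
	Name        string
	VRAMManaged bool   // consumer manages its own VRAM (e.g. ollama)
	Load        string // load hook; empty if the consumer has none
	Unload      string // unload hook; empty if the consumer has none
	SystemdUnit string // systemd unit name; empty if not managed that way
}

// initialLoaded reports whether a consumer should be counted as resident
// on the GPU when the scheduler starts.
func initialLoaded(c Consumer) bool {
	switch {
	case c.VRAMManaged:
		// Never tracked or evicted; the consumer runs its own LRU.
		return true
	case c.Load != "" && c.Unload != "":
		// Controllable; typically preloads in its own lifespan.
		return true
	case c.Unload != "" && c.Load == "":
		// Lazy: the model is not resident until the first request.
		return false
	default:
		// SystemdUnit only, or nothing configured: assume loaded.
		return true
	}
}

func main() {
	consumers := []Consumer{
		{Name: "ollama", VRAMManaged: true},
		{Name: "mvoice", Load: "/load", Unload: "/unload"},
		{Name: "comfyui", Unload: "/unload"},
		{Name: "whisper-server", SystemdUnit: "whisper-server.service"},
		{Name: "empty"},
	}
	for _, c := range consumers {
		fmt.Printf("%s: initialLoaded=%v\n", c.Name, initialLoaded(c))
	}
}
```

Only the Unload-only case starts as not loaded, which is what lets
ensureFits() see comfyui as a new arrival on the first /v1/image.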
Verified live on mRock (2026-05-15):
- Before /v1/image: nvidia-smi 8963 MiB used; mvoice gpu_resident_mib 2345
- POST /v1/image: HTTP 400 from upstream (empty workflow), but the
broker did trigger eviction before forwarding
- After: nvidia-smi 6547 MiB used; mvoice gpu_resident_mib 9
(~CUDA context only); scheduler.evictions = 2
- POST /v1/tts: audio_url returned, tts_ms 670, audio 3.5 s
- After reload: nvidia-smi 8943 MiB used; mvoice gpu_resident_mib 2917
Test: TestInitialLoadedHeuristic pins the four cases down so this
doesn't regress when someone adds a fifth consumer type.
Refs: m/mGPUmanager#1 (live deploy).