Broker reports per-consumer gpu_resident_mib=0 for externally-started consumers → eviction finds "no candidates" and never reclaims their VRAM #4

Open
opened 2026-06-07 13:12:26 +00:00 by mAi · 0 comments

Symptom

A /v1/lease (kind=image) acquire fails with insufficient_vram even when an idle consumer is holding evictable VRAM. Broker log:

level=WARN msg="no eviction candidates" target=comfyui need_mib=13000 free_mib=10590 strict=true
POST /v1/lease status=503 (insufficient_vram)

Meanwhile nvidia-smi shows whisper-server holding 1996 MiB — clearly evictable (comfyui can_coexist_with: []). But /v1/status reports it as not-resident:

whisper-server  loaded=true  gpu_resident_mib=0   <- actually holds 1996 MiB
mvoice          loaded=false gpu_resident_mib=0
comfyui         loaded=true  gpu_resident_mib=0

Every consumer reads gpu_resident_mib=0, so eviction-candidate selection (which picks consumers with resident VRAM) finds nothing and the broker can't free space — even though stopping whisper manually frees ~2 GB and lets the lease grant.
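To make the failure mode concrete, here is a minimal sketch of a residency-based candidate filter. It is not the code in internal/scheduler/evicting.go; the `Consumer` type and `evictionCandidates` function are hypothetical names, and the values mirror the /v1/status output above.

```go
package main

import "fmt"

// Consumer is a hypothetical, simplified view of one /v1/status row.
type Consumer struct {
	Name           string
	Loaded         bool
	GPUResidentMiB int
}

// evictionCandidates returns every consumer other than target that
// reports resident VRAM. If all consumers report 0, the result is empty.
func evictionCandidates(consumers []Consumer, target string) []Consumer {
	var out []Consumer
	for _, c := range consumers {
		if c.Name == target || c.GPUResidentMiB <= 0 {
			continue
		}
		out = append(out, c)
	}
	return out
}

func main() {
	// Broker's view from the status output: everything reads 0.
	reported := []Consumer{
		{"whisper-server", true, 0}, // nvidia-smi shows 1996 MiB here
		{"mvoice", false, 0},
		{"comfyui", true, 0},
	}
	fmt.Println(len(evictionCandidates(reported, "comfyui"))) // no candidates

	// With the live figure filled in, whisper-server becomes a candidate.
	reported[0].GPUResidentMiB = 1996
	fmt.Println(evictionCandidates(reported, "comfyui")[0].Name)
}
```

Any filter of this shape returns nothing when the residency field is zero across the board, which matches the "no eviction candidates" log line.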

Likely cause

The broker appears to attribute per-consumer VRAM from something other than live nvidia-smi per-PID usage, e.g. a load/unload bookkeeping counter that is updated only when the broker itself loads or unloads a consumer. Consumers started outside the broker (whisper via its own systemd unit, mvoice/ollama pre-resident) are tracked as resident=0, so the broker won't evict them. After a broker restart this affects every consumer that is already running, since the restarted broker loaded none of them. knuth's May deploy verified eviction only for the case where the broker itself had loaded mvoice; the regression shows when consumers are externally resident.

Fix direction

Drive eviction-candidate residency from live per-PID nvidia-smi (map each consumer's process/port to its actual GPU memory), not only the broker's own load/unload bookkeeping. A consumer holding real VRAM must be an eviction candidate regardless of who started it. whisper-server in particular has no HTTP unload — evict it via its systemd_unit (stop, not restart; restart reloads the model and frees nothing).
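One possible shape for this, as a sketch only: parse the CSV that `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits` prints, then sum usage over the PIDs belonging to each consumer. The function names, the sample PIDs, and the PID-to-consumer map are assumptions for illustration; how the existing internal/gpu poller works and how PIDs are discovered (systemd MainPID, listening port, ...) is not covered here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseComputeApps parses "pid, used_mib" CSV lines into PID -> MiB.
// Lines that do not parse (blank, "[N/A]", etc.) are skipped.
func parseComputeApps(out string) map[int]int {
	usage := make(map[int]int)
	for _, line := range strings.Split(out, "\n") {
		fields := strings.Split(line, ",")
		if len(fields) != 2 {
			continue
		}
		pid, errPID := strconv.Atoi(strings.TrimSpace(fields[0]))
		mib, errMiB := strconv.Atoi(strings.TrimSpace(fields[1]))
		if errPID != nil || errMiB != nil {
			continue
		}
		usage[pid] += mib
	}
	return usage
}

// residentMiB sums live usage over each consumer's PIDs, regardless of
// whether the broker or something else started the process.
func residentMiB(pidsByConsumer map[string][]int, usage map[int]int) map[string]int {
	resident := make(map[string]int)
	for name, pids := range pidsByConsumer {
		total := 0
		for _, pid := range pids {
			total += usage[pid]
		}
		resident[name] = total
	}
	return resident
}

func main() {
	sample := "4242, 1996\n5151, 9800\n" // made-up PIDs for illustration
	usage := parseComputeApps(sample)
	resident := residentMiB(map[string][]int{
		"whisper-server": {4242},
		"mvoice":         {}, // not running, so no PIDs
	}, usage)
	fmt.Println(resident["whisper-server"], resident["mvoice"])
}
```

Feeding these figures into candidate selection, instead of or alongside the bookkeeping counter, would make an externally started whisper-server visible as holding VRAM.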

Also (smaller)

  • comfyui.vram_resident_mib was 13000, which is larger than the realistically-reclaimable free VRAM on a desktop (Brave/Wayland/etc. hold ~1.5 GB that is not evictable). Lowered to 11000 in config/consumers.yaml during debugging — FLUX schnell fp8 generated fine at that budget (ComfyUI offloads T5/CLIP to RAM). Validate/keep this value.
  • whisper eviction must use systemctl stop, not restart.
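Both notes can be checked with a small sketch. `fitsAfterEviction` and `stopUnitCmd` are hypothetical helpers, not broker code, and the arithmetic assumes the free_mib=10590 log value and the 1996 MiB nvidia-smi reading describe the same moment.

```go
package main

import (
	"fmt"
	"os/exec"
)

// fitsAfterEviction reports whether needMiB fits once reclaimableMiB
// has been added back to freeMiB.
func fitsAfterEviction(needMiB, freeMiB, reclaimableMiB int) bool {
	return freeMiB+reclaimableMiB >= needMiB
}

// stopUnitCmd builds, but does not run, the eviction command for a
// systemd-managed consumer. It uses "stop": "restart" would reload the
// model and free nothing.
func stopUnitCmd(unit string, userScope bool) *exec.Cmd {
	var args []string
	if userScope {
		args = append(args, "--user")
	}
	args = append(args, "stop", unit)
	return exec.Command("systemctl", args...)
}

func main() {
	// 10590 free + 1996 reclaimable = 12586 MiB.
	fmt.Println(fitsAfterEviction(13000, 10590, 1996)) // old budget
	fmt.Println(fitsAfterEviction(11000, 10590, 1996)) // lowered budget

	fmt.Println(stopUnitCmd("whisper-server", true).Args)
}
```

Under that assumption, evicting whisper-server alone yields 12586 MiB, which covers the 11000 budget but not the original 13000, so the residency fix and the lowered budget are both needed for this lease to be granted.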

Impact

Until fixed, ImaGen restyle (which acquires its lease via ImaGen #15) only succeeds when the GPU is already fairly clear; the broker will not auto-reclaim idle services that it did not load itself. The workaround that landed the first successful restyle (image produced, lineage correct) was a manual systemctl --user stop whisper-server.

Refs

  • ImaGen #15 (the lease consumer), mGPUmanager #2 (the lease primitive)
  • internal/scheduler/evicting.go (ensureFits / candidate selection), internal/gpu (nvidia-smi poller)
Reference: m/mGPUmanager#4