Add a generic GPU-lease primitive (acquire/renew/release) for long-running async consumers (ComfyUI/FLUX) #2

mAi · 2026-06-07T08:49:29Z

mAi commented

2026-06-07 08:49:29 +00:00

Why

ImaGen's restyle/img2img OOMs on mRock because FLUX never goes through the broker, and the obvious fix (pointing ImaGen at /v1/image) does not work. ImaGen's generation is a multi-step async cycle: POST /upload/image -> POST /prompt (returns a prompt_id immediately) -> poll GET /history/{id} (up to 300s) -> GET /view. The broker holds the global GPU lock only for the duration of the proxied call (scheduler.Run(ctx, consumer, fn)), and the fn for /v1/image is just the POST /prompt proxy, which returns within milliseconds. So the broker acquires the lock, evicts, and releases it, and only THEN does FLUX render, unprotected. A TTS request can reload immediately and race FLUX into an OOM. Routing only /prompt through the broker therefore pays the eviction cost with none of the protection.

The docs/design.md Schritt-6 (step 6) note ("ImaGen base_url umstellen … One-Line-Config-Change", i.e. "switch the ImaGen base_url … one-line config change") is stale and is flagged here. The lock must be held across the whole generate-poll-fetch cycle.

This is the cross-project counterpart to ImaGen #15 (design + ImaGen-side client are being built there in parallel). Full design + rationale: ImaGen docs/design-broker-gpu-lease.md (branch mai/prometheus/design-route-comfyui).

What to build — a generic GPU lease

A small, protocol-agnostic lease resource. The broker keeps doing exactly what it is good at (evict + ensureLoaded + hold the global lock); it stays ignorant of ComfyUI's wire format. The consumer (ImaGen) acquires a lease, runs its own multi-step cycle directly against its backend, then releases. Reusable by any future long-running GPU consumer (F5-TTS voice-clone, Furbotto, batch jobs).
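The consumer-side flow described above can be sketched in Go roughly as follows. The `Leaser` interface, the `withLease` helper, and `fakeBroker` are hypothetical names used for illustration only, not the broker's or ImaGen's actual API. The sketch only shows the release being deferred around the whole multi-step cycle.

```go
package main

import (
	"errors"
	"fmt"
)

// Leaser is a hypothetical client-side view of the lease endpoints.
type Leaser interface {
	Acquire(kind string) (token string, err error)
	Release(token string) error
}

// withLease holds a lease for the whole duration of cycle and always
// releases it afterwards, even when cycle fails.
func withLease(l Leaser, kind string, cycle func() error) error {
	token, err := l.Acquire(kind)
	if err != nil {
		return fmt.Errorf("acquire lease: %w", err)
	}
	defer l.Release(token)
	return cycle()
}

// fakeBroker records calls so the ordering can be inspected without a network.
type fakeBroker struct{ calls []string }

func (f *fakeBroker) Acquire(kind string) (string, error) {
	f.calls = append(f.calls, "acquire:"+kind)
	return "tok-1", nil
}

func (f *fakeBroker) Release(token string) error {
	f.calls = append(f.calls, "release:"+token)
	return nil
}

func main() {
	broker := &fakeBroker{}
	err := withLease(broker, "image", func() error {
		// The upload -> prompt -> poll history -> view cycle would run here.
		broker.calls = append(broker.calls, "cycle")
		return errors.New("render failed")
	})
	fmt.Println(broker.calls, err)
}
```

Because the release is deferred, a failed render still frees the lock right away instead of waiting for the TTL.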

Acquire

POST /v1/lease   { "kind": "image", "ttl_seconds": 120, "wait_seconds": 120 }

- Resolve kind via the existing routing.* (image -> comfyui). Accept consumer as a direct alternative.
- Run the EXISTING scheduler.Run(ctx, consumer, fn), where fn blocks while holding the lock until release or TTL expiry. Eviction (evicting.go), ensureLoaded, and the global-lock acquire are reused verbatim; see the implementation sketch in ImaGen docs/design-broker-gpu-lease.md section 3.5 (a LeaseManager goroutine per live lease; no Scheduler change is needed because the scheduler's queued acquire is already ctx-cancellable).
- Clamp ttl_seconds server-side (e.g. [10, 600]). wait_seconds maps to the cancellable queue wait.
- Returns: { token, consumer, granted_at, expires_at, ttl_seconds }.
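A minimal Go sketch of decoding the acquire body and clamping the TTL, assuming the example bounds [10, 600] are the ones used. The type and function names (`acquireRequest`, `clampTTL`, `parseAcquire`) are illustrative, and the rule that one of kind or consumer must be present is an assumption rather than something the design states.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// acquireRequest mirrors the JSON body of POST /v1/lease shown above.
type acquireRequest struct {
	Kind        string `json:"kind,omitempty"`
	Consumer    string `json:"consumer,omitempty"`
	TTLSeconds  int    `json:"ttl_seconds"`
	WaitSeconds int    `json:"wait_seconds"`
}

const (
	minTTLSeconds = 10  // lower bound taken from the "e.g. [10, 600]" example
	maxTTLSeconds = 600 // upper bound taken from the same example
)

// clampTTL forces a caller-supplied TTL into [minTTLSeconds, maxTTLSeconds].
func clampTTL(seconds int) int {
	if seconds < minTTLSeconds {
		return minTTLSeconds
	}
	if seconds > maxTTLSeconds {
		return maxTTLSeconds
	}
	return seconds
}

// parseAcquire decodes a request body and applies the server-side clamp.
// Requiring kind or consumer is an assumption made for this sketch.
func parseAcquire(body []byte) (acquireRequest, error) {
	var req acquireRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return req, err
	}
	if req.Kind == "" && req.Consumer == "" {
		return req, fmt.Errorf("either kind or consumer is required")
	}
	req.TTLSeconds = clampTTL(req.TTLSeconds)
	return req, nil
}

func main() {
	req, err := parseAcquire([]byte(`{"kind":"image","ttl_seconds":3600,"wait_seconds":120}`))
	fmt.Println(req.Kind, req.TTLSeconds, req.WaitSeconds, err)
}
```

Clamping on the server means a misconfigured client cannot pin the GPU for longer than the upper bound without renewing.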

Renew (heartbeat)

POST /v1/lease/{token}/renew   -> 200 { token, expires_at: now+ttl, ttl_seconds } | 404 lease_unknown

Resets the safety expiry to now+ttl. Holder calls this every ~ttl/3 so a legitimate long generation never false-expires, while a crashed holder stops renewing and the lock frees within one TTL.
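On the holder side, the heartbeat could look roughly like the Go sketch below. `renewEvery`, `heartbeat`, and `countRenewals` are hypothetical helpers; in a real client the `renew` callback would issue POST /v1/lease/{token}/renew, which this sketch replaces with a counter so it runs without a broker.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// renewEvery returns the heartbeat period for a TTL: roughly one third of it.
func renewEvery(ttl time.Duration) time.Duration {
	return ttl / 3
}

// heartbeat calls renew once per period until ctx is cancelled or renew fails.
// It returns nil on cancellation, or the renew error so the caller can abort.
func heartbeat(ctx context.Context, period time.Duration, renew func() error) error {
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return nil
		case <-ticker.C:
			if err := renew(); err != nil {
				return err
			}
		}
	}
}

// countRenewals runs heartbeat for runFor and counts how often renew fired.
func countRenewals(ttl, runFor time.Duration) int {
	ctx, cancel := context.WithTimeout(context.Background(), runFor)
	defer cancel()
	n := 0
	_ = heartbeat(ctx, renewEvery(ttl), func() error {
		n++
		return nil
	})
	return n
}

func main() {
	fmt.Println("renew every:", renewEvery(120*time.Second))
	fmt.Println("renewals in 200ms with a 30ms TTL:",
		countRenewals(30*time.Millisecond, 200*time.Millisecond))
}
```

Renewing at a third of the TTL leaves room for a missed or slow heartbeat before the safety expiry fires.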

Release

DELETE /v1/lease/{token}   -> 200 { released: true } | 200 { released:false, reason:"unknown_or_expired" }

Idempotent. Dropping the lock unblocks the next queued consumer immediately.
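The idempotent release can be sketched in Go as a table lookup that signals the holder at most once. `leaseTable` and its methods are hypothetical, and the real LeaseManager may be structured differently; only the two response shapes are taken from the endpoint description above.

```go
package main

import (
	"fmt"
	"sync"
)

// releaseResult mirrors the two response bodies of DELETE /v1/lease/{token}.
type releaseResult struct {
	Released bool   `json:"released"`
	Reason   string `json:"reason,omitempty"`
}

// leaseTable is a hypothetical in-memory registry of live leases. Each entry
// holds a channel that is closed to tell the holder to drop the lock.
type leaseTable struct {
	mu   sync.Mutex
	live map[string]chan struct{}
}

func newLeaseTable() *leaseTable {
	return &leaseTable{live: make(map[string]chan struct{})}
}

// add registers a live lease and returns the channel its holder waits on.
func (lt *leaseTable) add(token string) <-chan struct{} {
	lt.mu.Lock()
	defer lt.mu.Unlock()
	done := make(chan struct{})
	lt.live[token] = done
	return done
}

// release is idempotent: the first call signals the holder, while repeated
// calls or unknown tokens report unknown_or_expired without failing.
func (lt *leaseTable) release(token string) releaseResult {
	lt.mu.Lock()
	defer lt.mu.Unlock()
	done, ok := lt.live[token]
	if !ok {
		return releaseResult{Released: false, Reason: "unknown_or_expired"}
	}
	delete(lt.live, token)
	close(done)
	return releaseResult{Released: true}
}

func main() {
	tbl := newLeaseTable()
	tbl.add("tok-1")
	fmt.Println(tbl.release("tok-1"), tbl.release("tok-1"))
}
```

Removing the entry before closing the channel is what keeps a second DELETE from closing it twice.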

Behaviour change: fail the lease when VRAM cannot fit (m-approved)

Today ensureFits (evicting.go:~776) logs "no eviction candidates" and returns nil — proceeds optimistically. For the lease path, if ensureFits exhausts all evictable consumers and still does not fit (e.g. an untracked GPU app like a game holding VRAM), fail the acquire with a structured 503 insufficient_vram (retryable:false) instead of granting a lease that is guaranteed to OOM. This converts an opaque downstream torch.OutOfMemoryError into a clean broker rejection. The synchronous /v1/{tts,stt,llm,image} proxy path keeps its current optimistic behaviour — this stricter check is lease-specific.
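The difference between the optimistic and the strict path can be illustrated with a simplified Go stand-in. This `ensureFits` is not the real function in evicting.go: its signature, the `resident` type, and all MiB figures are made up for the example, and eviction is reduced to adding each consumer's footprint back to free memory.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrInsufficientVRAM is returned on the lease path when eviction cannot
// free enough memory for the target.
var ErrInsufficientVRAM = errors.New("insufficient_vram")

// resident is a hypothetical record of an evictable consumer and its VRAM use.
type resident struct {
	name string
	mib  int
}

// ensureFits evicts residents in order until needMiB fits into freeMiB.
// With strict=false it behaves like the optimistic proxy path and returns nil
// even when nothing more can be evicted; with strict=true it fails closed.
func ensureFits(freeMiB, needMiB int, evictable []resident, strict bool) ([]string, error) {
	var evicted []string
	for _, r := range evictable {
		if freeMiB >= needMiB {
			break
		}
		freeMiB += r.mib
		evicted = append(evicted, r.name)
	}
	if freeMiB < needMiB && strict {
		return evicted, ErrInsufficientVRAM
	}
	return evicted, nil
}

func main() {
	// Hypothetical numbers: an untracked app holds VRAM the broker cannot evict.
	evictable := []resident{{"mvoice", 4000}, {"whisper", 3000}, {"ollama", 8000}}
	_, errLease := ensureFits(2000, 20000, evictable, true)
	_, errProxy := ensureFits(2000, 20000, evictable, false)
	fmt.Println("lease path:", errLease, "| proxy path:", errProxy)
}
```

Both paths still evict everything they can; only the final decision differs.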

Stays the same

- routing.image -> comfyui and the comfyui consumer block (can_coexist_with: [], vram_resident_mib, /api/free unload) are unchanged.
- The existing /v1/image synchronous proxy can stay (for non-async callers) or be retired once ImaGen uses the lease; your call.
- Auth: Tailscale boundary, no token for now; an Authorization: Bearer <env> hook is reserved for Phase 2. Lease endpoints inherit the same posture.

Nice-to-have

/v1/status additionally reports live lease holders (token, consumer, expires_at) so m gpu can show "comfyui leased until …".

Acceptance

- A held lease keeps the GPU lock across an arbitrary caller-controlled window (acquire ... renew ... release), evicting non-coexistent consumers for its duration.
- A crashed holder (one that stops renewing) -> the lock auto-frees within one TTL.
- insufficient_vram is returned (no lease granted) when nothing evictable makes FLUX fit.
- Written in Go, with tests next to the packages; go build ./... && go test ./... run clean.

Refs

ImaGen #15 (design + ImaGen-side lease client, in parallel). Design doc: ImaGen docs/design-broker-gpu-lease.md sections 3 + 3.5.
mGPUmanager: internal/scheduler/ (Run, locked.go, evicting.go), config/consumers.yaml.

m referenced this issue from a commit

2026-06-07 09:03:21 +00:00

feat(lease): generic GPU-lease primitive (acquire/renew/release) — #2

mAi commented

2026-06-07 09:04:23 +00:00

shift-1 — lease primitive implemented ✅

Branch mai/vulcan/add-generic-gpu-lease · commit f6b8b19 · go build/vet/test ./... clean incl. -race.

What landed

- LeaseManager (internal/scheduler/lease.go): goroutine-per-live-lease layer on top of the existing Scheduler, with no Scheduler-interface change. The holder goroutine parks inside RunLease's fn, holding the global GPU lock until release / TTL expiry / broker shutdown (Close). Token via crypto/rand; ClampTTL enforces [10,600].
- Evicting.RunLease: strict-fit variant of Run. When eviction exhausts every evictable consumer and the target still does not fit → ErrInsufficientVRAM (m-approved, lease-path only). The optimistic proxy Run is unchanged. Passthrough/Locked also satisfy LeaseScheduler (delegate to Run).
- HTTP (internal/server/server.go):
  - POST /v1/lease: kind|consumer, ttl_seconds, wait_seconds → {token, consumer, granted_at, expires_at, ttl_seconds}. Health-gated. 503 insufficient_vram (retryable:false), 503 scheduler_timeout (retryable:true).
  - POST /v1/lease/{token}/renew: resets safety expiry; 404 lease_unknown.
  - DELETE /v1/lease/{token}: idempotent; {released} / {released:false, reason:"unknown_or_expired"}.
  - GET /v1/status now lists live lease holders.
- main.go wires the LeaseManager + Close() on shutdown. README endpoint table + error codes updated.
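The goroutine-per-lease idea can be sketched in Go as below. This is a simplified stand-in under stated assumptions, not the code in lease.go: `gpuLock` and `run` imitate the scheduler's global lock and Run, and the `lease` type with its `Renew`/`Release` methods is hypothetical. Eviction, the queue wait, tokens, and Close are left out.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// gpuLock stands in for the broker's global GPU lock.
var gpuLock sync.Mutex

// run is a simplified stand-in for scheduler.Run: it holds the lock while fn runs.
func run(fn func()) {
	gpuLock.Lock()
	defer gpuLock.Unlock()
	fn()
}

// lease is a hypothetical handle for one live lease.
type lease struct {
	once    sync.Once
	release chan struct{} // closed to ask the holder to drop the lock
	renew   chan struct{} // one send per heartbeat
	done    chan struct{} // closed once the lock has been dropped
}

// acquire starts a holder goroutine that parks inside fn until release or
// TTL expiry, and returns only after the lock has actually been taken.
func acquire(ttl time.Duration) *lease {
	l := &lease{
		release: make(chan struct{}),
		renew:   make(chan struct{}),
		done:    make(chan struct{}),
	}
	granted := make(chan struct{})
	go func() {
		defer close(l.done)
		run(func() {
			close(granted)
			deadline := time.Now().Add(ttl)
			for {
				select {
				case <-l.release:
					return
				case <-l.renew:
					deadline = time.Now().Add(ttl) // heartbeat pushes the expiry out
				case <-time.After(time.Until(deadline)):
					return // safety expiry: the holder stopped renewing
				}
			}
		})
	}()
	<-granted
	return l
}

// Renew reports false when the lease is already released or expired.
func (l *lease) Renew() bool {
	select {
	case l.renew <- struct{}{}:
		return true
	case <-l.done:
		return false
	}
}

// Release is idempotent and returns once the lock has been dropped.
func (l *lease) Release() {
	l.once.Do(func() { close(l.release) })
	<-l.done
}

func main() {
	l := acquire(2 * time.Second)
	fmt.Println("renewed while held:", l.Renew())
	l.Release()
	fmt.Println("renewed after release:", l.Renew())
}
```

Parking the holder inside fn is what lets the existing lock, queue, and eviction logic apply to the lease without any change to the scheduler itself.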

Acceptance criteria

✅ Held lease keeps the GPU lock across an arbitrary acquire…renew…release window, evicting non-coexistent consumers for its duration.
✅ Crashed holder (stops renewing) → lock auto-frees within one TTL (TTL-expiry test).
✅ insufficient_vram returned (not granted) when nothing evictable makes the target fit; optimistic proxy path still proceeds.
✅ go build ./... && go test ./... clean (also -race and go vet).

Tests

Lease lifecycle (acquire/release, lock serialization, wait-timeout, TTL auto-expiry, renew-prevents-expiry, unknown-token, Close-releases-holders), strict-fit fail-closed vs optimistic Run, full HTTP flow incl. TTL clamp + status holder.

Contract matches ImaGen docs/design-broker-gpu-lease.md §3 + §3.5 — the ImaGen #15 client can integrate directly.

Next

1. ImaGen #15 client integration against the deployed broker.
2. Deploy to mrock:8770, verify restyle-while-TTS eviction with no OOM (comfyui.total_requests > 0, eviction recorded).

Note for the head

The repo was committed not gofmt-clean (pre-existing struct-tag misalignments in config.go, registry.go, scheduler.go, …). I kept my own additions gofmt-clean but did not reformat the pre-existing files: gofmt would even mangle a doc comment in locked.go, and a blanket reformat would create noise + conflict with parallel shifts. A separate gofmt -w ./... cleanup commit could be filed if desired.

m referenced this issue from a commit

2026-06-07 09:06:14 +00:00

Merge mai/vulcan/add-generic-gpu-lease: #2 generic GPU-lease primitive

mAi commented

2026-06-07 09:09:45 +00:00

Merged (4d69b2b) + deployed to mrock:8770, verified live. The generic GPU lease (POST /v1/lease / renew / DELETE) is in production. End-to-end test from ImaGen #15: an img2img request acquired a lease, the broker evicted every evictable consumer (mvoice/whisper/ollama), and — because an untracked game (BG3) held VRAM that nothing could evict — returned 503 insufficient_vram rather than granting a doomed lease. The ImaGen client surfaced it as a clean error. Lease acquire + LRU eviction + the lease-path insufficient_vram fail-fast all confirmed working against a real consumer.
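A client could tell the two 503 cases apart roughly as in the Go sketch below. The JSON body shape (`error` and `retryable` fields) is an assumption: this thread names the error codes and the retryable flag but does not show the exact wire format, so the field names should be checked against the broker's README.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// brokerError is an ASSUMED shape for the structured error body; only the
// code strings and the retryable flag are named in this thread.
type brokerError struct {
	Code      string `json:"error"`
	Retryable bool   `json:"retryable"`
}

// shouldRetry decides whether a failed lease acquire is worth retrying.
// Anything that is not a well-formed 503 body is treated as non-retryable.
func shouldRetry(status int, body []byte) (retry bool, code string) {
	if status != http.StatusServiceUnavailable {
		return false, ""
	}
	var be brokerError
	if err := json.Unmarshal(body, &be); err != nil {
		return false, ""
	}
	return be.Retryable, be.Code
}

func main() {
	retry, code := shouldRetry(503, []byte(`{"error":"insufficient_vram","retryable":false}`))
	fmt.Println(code, "retry:", retry)
	retry, code = shouldRetry(503, []byte(`{"error":"scheduler_timeout","retryable":true}`))
	fmt.Println(code, "retry:", retry)
}
```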

Note: deployed by binary-copy to ~/dev/mGPUmanager/bin/mgpumanager on mRock (that dir is a non-git deploy copy) + systemctl --user restart mgpumanager.service. Old binary backed up to bin/mgpumanager.bak-pre-lease.

mAi referenced this issue

2026-06-07 13:12:26 +00:00

Broker reports per-consumer gpu_resident_mib=0 for externally-started consumers → eviction finds "no candidates" and never reclaims their VRAM #4

mAi referenced this issue

2026-07-10 10:51:06 +00:00

Broker must be the SOLE VRAM authority — close bypass gaps so out-of-band ollama holds get evicted (imagen starved by 24h keep-alive) #6

Reference: m/mGPUmanager#2