Add a generic GPU-lease primitive (acquire/renew/release) for long-running async consumers (ComfyUI/FLUX) #2
Why
ImaGen's restyle/img2img OOMs on mRock because FLUX never goes through the broker, and the obvious fix (point ImaGen at `/v1/image`) does not work. ImaGen's generation is a multi-step async cycle: `POST /upload/image` -> `POST /prompt` (returns a prompt_id immediately) -> poll `GET /history/{id}` (up to 300s) -> `GET /view`. The broker holds the global GPU lock only for the duration of the proxied call (`scheduler.Run(ctx, consumer, fn)`), and `/v1/image`'s `fn` is just the `POST /prompt` proxy, which returns in ms. So the lock acquires, evicts, releases, and only THEN does FLUX render, unprotected. A TTS request can immediately reload and OOM-race it. Routing only `/prompt` through the broker pays the eviction cost with none of the protection.
The `docs/design.md` Step 6 note ("switch ImaGen base_url … one-line config change") is stale; flagged here. The lock must be held across the whole generate-poll-fetch cycle.
This is the cross-project counterpart to ImaGen #15 (design + ImaGen-side client are being built there in parallel). Full design + rationale: ImaGen `docs/design-broker-gpu-lease.md` (branch `mai/prometheus/design-route-comfyui`).
What to build: a generic GPU lease
A small, protocol-agnostic lease resource. The broker keeps doing exactly what it is good at (evict + ensureLoaded + hold the global lock); it stays ignorant of ComfyUI's wire format. The consumer (ImaGen) acquires a lease, runs its own multi-step cycle directly against its backend, then releases. Reusable by any future long-running GPU consumer (F5-TTS voice-clone, Furbotto, batch jobs).
Acquire
- Resolve `kind` via existing `routing.*` (image -> comfyui). Accept `consumer` as a direct alternative.
- Built on `scheduler.Run(ctx, consumer, fn)`, where `fn` blocks holding the lock until release or TTL. Eviction (`evicting.go`) + ensureLoaded + global-lock acquire are reused verbatim; see the implementation sketch in ImaGen `docs/design-broker-gpu-lease.md` section 3.5 (a LeaseManager goroutine per live lease; no Scheduler change needed, the scheduler's queued acquire is already ctx-cancellable).
- Clamp `ttl_seconds` server-side (e.g. [10, 600]). `wait_seconds` maps to the cancellable queue wait.
- Return `{ token, consumer, granted_at, expires_at, ttl_seconds }`.
Renew (heartbeat)
Resets the safety expiry to now+ttl. Holder calls this every ~ttl/3 so a legitimate long generation never false-expires, while a crashed holder stops renewing and the lock frees within one TTL.
Release
Idempotent. Dropping the lock unblocks the next queued consumer immediately.
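One way to get idempotent release is to make removal from the token table the single point of truth, as in the sketch below. The `registry` type and its methods are hypothetical, and a plain `sync.Mutex` stands in for the global GPU lock.

```go
package main

import (
	"fmt"
	"sync"
)

// registry tracks live leases by token. Each value is a channel that is
// closed to tell the holder goroutine to drop the GPU lock.
type registry struct {
	mu   sync.Mutex
	live map[string]chan struct{}
}

func newRegistry() *registry {
	return &registry{live: make(map[string]chan struct{})}
}

// add registers a token and returns the channel its holder waits on.
func (r *registry) add(token string) <-chan struct{} {
	r.mu.Lock()
	defer r.mu.Unlock()
	done := make(chan struct{})
	r.live[token] = done
	return done
}

// Release reports whether a live lease was released. An unknown or already
// released token returns false without error and never double-closes.
func (r *registry) Release(token string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	done, ok := r.live[token]
	if !ok {
		return false
	}
	delete(r.live, token)
	close(done)
	return true
}

func main() {
	var gpu sync.Mutex // stand-in for the global GPU lock
	r := newRegistry()
	done := r.add("tok-1")

	held := make(chan struct{})
	go func() { // holder goroutine: parks while the lease is live
		gpu.Lock()
		close(held)
		<-done
		gpu.Unlock()
	}()
	<-held

	fmt.Println("first release:", r.Release("tok-1"))
	fmt.Println("second release:", r.Release("tok-1"))

	gpu.Lock() // the next consumer proceeds once the holder lets go
	fmt.Println("next consumer acquired the lock")
	gpu.Unlock()
}
```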
Behaviour change: fail the lease when VRAM cannot fit (m-approved)
Today `ensureFits` (`evicting.go:~776`) logs "no eviction candidates" and returns nil, proceeding optimistically. For the lease path, if `ensureFits` exhausts all evictable consumers and still does not fit (e.g. an untracked GPU app like a game holding VRAM), fail the acquire with a structured `503 insufficient_vram` (retryable:false) instead of granting a lease that is guaranteed to OOM. This converts an opaque downstream `torch.OutOfMemoryError` into a clean broker rejection. The synchronous `/v1/{tts,stt,llm,image}` proxy path keeps its current optimistic behaviour; this stricter check is lease-specific.
Stays the same
- `routing.image -> comfyui`, the `comfyui` consumer block (`can_coexist_with: []`, `vram_resident_mib`, `/api/free` unload): unchanged.
- `/v1/image` synchronous proxy can stay (non-async callers) or be retired once ImaGen uses the lease; your call.
- `Authorization: Bearer <env>` hook reserved for Phase 2. Lease endpoints inherit the same posture.
Nice-to-have
- `/v1/status` additionally reports live lease holders (token, consumer, expires_at) so `m gpu` can show "comfyui leased until …".
Acceptance
- `insufficient_vram` returned (not granted) when nothing evictable makes FLUX fit.
- `go build ./... && go test ./...` clean.
Refs
- `docs/design-broker-gpu-lease.md` sections 3 + 3.5.
- `internal/scheduler/` (Run, locked.go, evicting.go), `config/consumers.yaml`.
shift-1: lease primitive implemented ✅
Branch `mai/vulcan/add-generic-gpu-lease` · commit `f6b8b19` · `go build/vet/test ./...` clean incl. `-race`.
What landed
- `internal/scheduler/lease.go`: goroutine-per-live-lease layer on top of the existing `Scheduler`, with no `Scheduler`-interface change. The holder goroutine parks inside `RunLease`'s `fn`, holding the global GPU lock until release / TTL expiry / broker shutdown (`Close`). Token via `crypto/rand`; `ClampTTL` enforces [10,600].
- `Evicting.RunLease`: strict-fit variant of `Run`. When eviction exhausts every evictable consumer and the target still does not fit → `ErrInsufficientVRAM` (m-approved, lease-path only). The optimistic proxy `Run` is unchanged. `Passthrough`/`Locked` also satisfy `LeaseScheduler` (delegate to `Run`).
- `internal/server/server.go`:
  - `POST /v1/lease`: `kind|consumer`, `ttl_seconds`, `wait_seconds` → `{token, consumer, granted_at, expires_at, ttl_seconds}`. Health-gated. `503 insufficient_vram` (retryable:false), `503 scheduler_timeout` (retryable:true).
  - `POST /v1/lease/{token}/renew`: resets safety expiry; `404 lease_unknown`.
  - `DELETE /v1/lease/{token}`: idempotent; `{released}` / `{released:false, reason:"unknown_or_expired"}`.
  - `GET /v1/status` now lists live lease holders.
- `main.go` wires the LeaseManager + `Close()` on shutdown. README endpoint table + error codes updated.
Acceptance criteria
- `insufficient_vram` returned (not granted) when nothing evictable makes the target fit; optimistic proxy path still proceeds.
- `go build ./... && go test ./...` clean (also `-race` and `go vet`).
Tests
Lease lifecycle (acquire/release, lock serialization, wait-timeout, TTL auto-expiry, renew-prevents-expiry, unknown-token, Close-releases-holders), strict-fit fail-closed vs optimistic `Run`, full HTTP flow incl. TTL clamp + status holder.
Contract matches ImaGen `docs/design-broker-gpu-lease.md` §3 + §3.5, so the ImaGen #15 client can integrate directly.
Next
- Deploy to `mrock:8770`, verify restyle-while-TTS eviction with no OOM (`comfyui.total_requests > 0`, eviction recorded).
Note for the head
The repo was committed not gofmt-clean (pre-existing struct-tag misalignments in `config.go`, `registry.go`, `scheduler.go`, …). I kept my own additions gofmt-clean but did not reformat the pre-existing files: `gofmt` would even mangle a doc comment in `locked.go`, and a blanket reformat would create noise + conflict with parallel shifts. A separate `gofmt -w .` cleanup commit could be filed if desired.
Merged (`4d69b2b`) + deployed to mrock:8770, verified live. The generic GPU lease (`POST /v1/lease` / renew / `DELETE`) is in production. End-to-end test from ImaGen #15: an img2img request tried to acquire a lease, the broker evicted every evictable consumer (mvoice/whisper/ollama), and, because an untracked game (BG3) held VRAM that nothing could evict, returned `503 insufficient_vram` rather than granting a doomed lease. The ImaGen client surfaced it as a clean error. Lease acquire + LRU eviction + the lease-path `insufficient_vram` fail-fast all confirmed working against a real consumer.
Note: deployed by binary-copy to `~/dev/mGPUmanager/bin/mgpumanager` on mRock (that dir is a non-git deploy copy) + `systemctl --user restart mgpumanager.service`. Old binary backed up to `bin/mgpumanager.bak-pre-lease`.