feat: Schritt 4 — Locked scheduler (global GPU lock, queue, stats)
Replaces the MVP Passthrough with scheduler.Locked: a capacity-1 channel
serialises every consumer's GPU work end-to-end. main.go switches to it.

Behavioural contract:
- Jobs that arrive while another job holds the GPU block on the channel
  until the holder finishes. Context cancellation aborts the wait
  cleanly (no leaked tokens, queue depth decremented).
- Stats track queue_depth, in_flight, total_jobs, last_wait_ms,
  last_run_ms and oldest_queued, surfaced through /v1/status.
- One lock for ALL consumers (not per-consumer): the design (§4.3) is
  explicit that coarse-grained locking beats GPU-stream-granular locking
  on single-GPU single-user hardware. mvoice + ollama + comfyui never
  run truly concurrently any more, which is the whole point: running
  them concurrently is what produced the CUDA-OOM under load.

Tests:
- 5 goroutines hammer the scheduler concurrently → max in-flight = 1.
- Cancellation while parked on the lock returns ctx.Err() and frees the
  queue slot.
- Stats reflect in-flight + queue-depth transitions correctly.
- Race detector clean.

Schritt 5 will compose this with VRAM-pressure eviction: before
acquiring the lock, check if the target consumer's resident cost fits
under the current GPU headroom; if not, unload the LRU non-coexistent
consumer first.

Refs: m/mGPUmanager#1 (Schritt 4).
@@ -61,7 +61,9 @@ func main() {
 
 	reg := registry.New(cfg, logger.With("component", "registry"))
 	gpuPoller := gpu.NewPoller(cfg.GPU.PollInterval(), logger.With("component", "gpu"))
-	sched := scheduler.NewPassthrough(reg)
+	// Phase 1 always runs a single-slot global GPU lock. Schritt 5's
+	// eviction-aware scheduler wraps this same lock with VRAM pressure logic.
+	sched := scheduler.NewLocked(reg, 1)
 
 	go reg.Run(ctx)
 	go gpuPoller.Run(ctx)
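The "5 goroutines → max in-flight = 1" check from the commit message can be sketched without the real scheduler. The helper `maxInFlight` below is hypothetical, and it assumes that the second argument to `scheduler.NewLocked` is the slot count, which the commit does not state outright. A buffered channel stands in for the lock, and an atomic counter records the peak number of jobs running at once.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// maxInFlight starts `workers` goroutines that each run one short job behind
// a buffered-channel semaphore with `slots` tokens. It returns the highest
// number of jobs observed running at the same time.
func maxInFlight(workers, slots int) int32 {
	sem := make(chan struct{}, slots)
	var cur, peak int32
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it when the job ends

			n := atomic.AddInt32(&cur, 1)
			// Raise peak to n unless another goroutine already went higher.
			for {
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			time.Sleep(2 * time.Millisecond) // stand-in for GPU work
			atomic.AddInt32(&cur, -1)
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("max in-flight:", maxInFlight(5, 1))
}
```

With one slot the peak can never exceed 1, whatever the goroutine interleaving; running the same helper under `go run -race` is a cheap way to mirror the "race detector clean" item.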