mAi: #8 - imagen.jobs queue + worker subcommand (flexsiebels write path)

Async write path for the flexsiebels owner-mode UI: flexsiebels INSERTs into imagen.jobs, the worker on mRiver claims pending rows via LISTEN/NOTIFY + 5s safety poll, runs the same generate pipeline imagen generate uses, and writes the result through internal/cloud into imagen.images.

- Schema migration imagen_jobs_init: table + status CHECK + two indexes + owner-scoped RLS + grants + AFTER INSERT trigger publishing on the imagen_jobs channel via pg_notify.
- internal/worker: DB-agnostic loop over a Queue interface. Drains the whole pending backlog on each wake. Job-scoped contexts are derived from Background so SIGTERM lets the in-flight generation finish (no half-state). ResetStaleRunning at startup unsticks rows left over from a previous crash. Eight unit tests cover the done / failed / missing-id / drain / NOTIFY-wake / shutdown / transient-error paths against a fake queue (no real Postgres in CI).
- cmd/imagen/worker.go: pgx-backed Queue (one dedicated conn for LISTEN + UPDATE), plus the workerPipeline that reuses buildBackend + attachUsageSink + prompt.Apply + buildWriter + maybeCloudSync. The per-job owner_user_id overrides the env-level fallback so each row in imagen.images is attributed correctly.
- maybeCloudSync now returns (*cloud.SyncResult, error) so the worker can link imagen.jobs.image_id to the inserted imagen.images row. The CLI generate path keeps printing its stderr summary unchanged.
- scripts/imagen-worker.service + .env.example for the systemd --user unit on mRiver. EnvironmentFile lives in ~/.dotfiles and is never committed.
- docs/setup-worker-mriver.md walks through installation + the spec's SQL-INSERT smoke; docs/architecture.md grows an "async write path" section.
- worker_integration_test.go (env-guarded by IMAGEN_WORKER_INTEGRATION=1) drives one real job through the full pipeline against msupabase using the mock backend, then verifies imagen.images + Storage object landed and the row flipped to done with image_id linked.

Verified end-to-end: pickup latency ~7ms, total 74ms, failure path captures error text.
2026-05-11 10:23:33 +02:00
parent cb6656c436
commit 2758c5a500
13 changed files with 1205 additions and 27 deletions
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -7,7 +7,7 @@ upstream API. Each adapter only ever sees its own slice of `imagen.yaml`.

 ```
        ┌───────────────────────┐
-        │   cmd/imagen          │   CLI dispatch
+        │   cmd/imagen          │   CLI dispatch (generate / worker / …)
        │   (or HTTP server)    │
        └──────────┬────────────┘
                   │
@@ -18,6 +18,7 @@ upstream API. Each adapter only ever sees its own slice of `imagen.yaml`.
        │   internal/preview    │   tmux-img window spawner
        │   internal/cloud      │   Supabase Storage + imagen.images
        │   internal/usage      │   mai.imagen_usage cost-tracking
+        │   internal/worker     │   imagen.jobs queue consumer
        └──────────┬────────────┘
                   │
        ┌──────────▼────────────┐
@@ -105,9 +106,37 @@ contains the prompt, backend instance name, seed, ISO timestamp, and the
 - Network errors during `Generate` — wrap and return; no retry policy yet
  (decide per-adapter, or move to a shared retry helper if a pattern emerges).

+## Async write path: `imagen worker` + `imagen.jobs`
+
+`imagen generate` is the synchronous CLI. For web callers (flexsiebels'
+owner-mode UI) `cmd/imagen worker` runs as a daemon that consumes the
+`imagen.jobs` table.
+
+```
+flexsiebels POST          imagen worker (mRiver, systemd)
+  → INSERT INTO              LISTEN imagen_jobs  ◄── pg_notify trigger
+    imagen.jobs(pending)     claim row (UPDATE … RETURNING)
+                             dispatch through internal/backend
+                             write disk + cloud-sync via internal/cloud
+                             UPDATE imagen.jobs SET status='done', image_id=…
+```
+
+The queue table lives next to `imagen.images` in the same `imagen` schema.
+Owner-scoped RLS lets the flexsiebels user INSERT + read their own rows;
+the worker writes (status updates + image_id link) via service-role which
+bypasses RLS. A safety poll wakes the worker every 5 seconds to cover
+dropped NOTIFY events and worker cold starts with a non-empty queue. See
+`docs/setup-worker-mriver.md` for the systemd installation.
+
+The worker reuses `internal/backend`, `internal/output`, and
+`internal/cloud` unchanged — it is purely an orchestration layer around
+the same pipeline `imagen generate` drives.
+
 ## Out of scope (today)

 - Image post-processing (cropping, watermarking).
- Cost-tracking (lands with the Replicate adapter, since only API backends bill).
 - Multi-image `n>1` per request — backends that support it can expose it via
  `BackendOpts`; the framework doesn't have a first-class field yet.
+- Job cancellation / kill switch — separate follow-up issue.
+- Concurrent workers / multi-host scale-out — `FOR UPDATE SKIP LOCKED` in
+  the claim query makes it cheap to add, but a single worker is the v1 setup.
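
Reviewer note: the wake/drain/shutdown behaviour described in the commit message can be sketched as follows. This is a minimal illustration, not the code in `internal/worker`: `Job`, `Queue`, `memQueue`, `Run`, `drain`, and `demo` are hypothetical names, the wake channel stands in for a LISTEN notification, and the in-memory queue stands in for the pgx-backed one.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// Job, Queue and memQueue are hypothetical stand-ins, not the repo's types.
type Job struct{ ID, Prompt string }

var errNoJob = errors.New("no pending job")

type Queue interface {
	Claim(ctx context.Context) (Job, error)                    // pending -> running
	Finish(ctx context.Context, id string, runErr error) error // running -> done/failed
}

// Run drains at start-up, then on every wake or poll tick, until ctx is cancelled.
func Run(ctx context.Context, q Queue, wake <-chan struct{}, poll time.Duration, handle func(context.Context, Job) error) {
	ticker := time.NewTicker(poll)
	defer ticker.Stop()
	for {
		drain(ctx, q, handle)
		select {
		case <-ctx.Done():
			return
		case <-wake: // notification arrived
		case <-ticker.C: // safety poll covers a dropped notification
		}
	}
}

// drain claims jobs until the queue is empty or shutdown was requested.
func drain(ctx context.Context, q Queue, handle func(context.Context, Job) error) {
	for ctx.Err() == nil {
		// Job-scoped context from Background: cancelling ctx stops new
		// claims but lets the in-flight job finish and record its status.
		jobCtx := context.Background()
		job, err := q.Claim(jobCtx)
		if err != nil {
			return // empty queue or transient error: retry on the next wake
		}
		_ = q.Finish(jobCtx, job.ID, handle(jobCtx, job))
	}
}

// memQueue is an in-memory fake; only the worker goroutine touches status.
type memQueue struct {
	pending chan Job
	status  map[string]string
}

func (q *memQueue) Claim(context.Context) (Job, error) {
	select {
	case j := <-q.pending:
		return j, nil
	default:
		return Job{}, errNoJob
	}
}

func (q *memQueue) Finish(_ context.Context, id string, runErr error) error {
	q.status[id] = "done"
	if runErr != nil {
		q.status[id] = "failed: " + runErr.Error()
	}
	return nil
}

// demo queues two jobs before start-up and a third one via a wake signal.
func demo() map[string]string {
	q := &memQueue{pending: make(chan Job, 8), status: map[string]string{}}
	q.pending <- Job{ID: "1", Prompt: "a red fox"}
	q.pending <- Job{ID: "2"} // empty prompt: exercises the failure path
	var wg sync.WaitGroup
	wg.Add(3)
	handle := func(_ context.Context, j Job) error {
		defer wg.Done()
		if j.Prompt == "" {
			return errors.New("empty prompt")
		}
		return nil
	}
	wake := make(chan struct{}, 1)
	ctx, cancel := context.WithCancel(context.Background())
	go func() {
		q.pending <- Job{ID: "3", Prompt: "a lighthouse"}
		wake <- struct{}{}
		wg.Wait()
		cancel() // shutdown request once all three handlers have returned
	}()
	Run(ctx, q, wake, 20*time.Millisecond, handle)
	return q.status
}

func main() { fmt.Println(demo()) }
```

In this sketch the shutdown check sits only at the top of the drain loop, so a job that has already been claimed always reaches `Finish`. That is one way to get the "no half-state" property; the real worker may structure it differently.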