mAi: #10 - multi-model backend expansion (workflow templates + compare harness)

Path 1 architecture: one comfyui adapter, workflows as data.

- workflow_template.go: embed.FS + token substitution with type-preserving whole-value placeholders. ${prompt} → string, ${seed} → int64, ${cfg} → float64; no JSON round-tripping. Partial matches ignored.
- comfyui.go: refactored to load the workflow from the embedded FS or a filesystem path. Back-compat preserved: workflow: defaults to flux1-schnell.
- workflows/{flux1-schnell,flux2-klein,sd35-medium}.json: bundled templates. flux1-schnell migrated from the hardcoded workflow with identical node IDs.
- compare.go: new `imagen compare` subcommand. Sequential N-backend run (one GPU on mRock; parallel would OOM), per-backend PNG, sidecar JSON with per-model metadata + errors, composite contact sheet via the Go image package (no ImageMagick dep).
- Sample config gains flux2-klein-local + sd35-medium-local instances.
- docs/backends.md: architecture rationale + per-model HF download paths + how to add a new bundled workflow + compare-harness reference.

Live smoke verified: compare over mock + flux-schnell-local at 768×768 → both PNGs written, sidecar JSON has workflow="flux1-schnell" + full metadata, contact sheet renders.

Worker contract (Request → Generate) unchanged, so the flexsiebels /imagine UI API surface is preserved.

Tests: 11 existing comfyui + 6 new workflow_template + 5 new compare tests, all green.

Adding a new model is now yaml + JSON, never Go.
2026-05-11 17:29:57 +02:00
parent 623dd290c5
commit 8435817ce1
15 changed files with 1638 additions and 122 deletions
--- a/docs/backends.md
+++ b/docs/backends.md
@@ -0,0 +1,310 @@
+# ImaGen backends
+
+This document covers the local-ComfyUI backend plug-in story: how adapters
+are layered, how to add a new model without touching Go, and the per-model
+setup steps for the bundled templates.
+
+For the host-side ComfyUI install (mRock — venv, weights for the default
+FLUX.1-schnell, systemd, VRAM coexistence with Ollama, smoke test against
+the raw HTTP API), see [`setup-comfyui-mrock.md`](setup-comfyui-mrock.md).
+
+## Architecture: Path 1 — workflow-template adapter
+
+`imagen generate` and `imagen compare` dispatch through the `comfyui`
+adapter, which holds the HTTP plumbing (`/prompt`, `/history/{id}`, `/view`,
+`/system_stats`) and treats the workflow itself as data. Each backend
+instance in `imagen.yaml` picks a workflow JSON via the `workflow:` key.
+Adding a new model is yaml + JSON, never Go:
+
+```
+internal/backend/
+  comfyui.go              # one adapter, all ComfyUI models
+  workflow_template.go    # loader + token-substitution
+  workflows/
+    flux1-schnell.json    # bundled templates (embedded with //go:embed)
+    flux2-klein.json
+    sd35-medium.json
+```
+
+### Why Path 1 over Path 2 (per-family adapters: `comfyui-flux.go`, `comfyui-sd3.go`…)
+
+- **Workflow JSON is the natural exchange format**. ComfyUI users export
+  workflows from the ComfyUI GUI as JSON. Anything else means rebuilding
+  the graph by hand in Go for every new model.
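+- **Adding a model is a config change, not a code change**. With Path 2,
+  every new family is a Go file, a new test file, a registry entry, a new
+  worker binary, a redeploy. Path 1 lets us land a new model with one yaml
+  block + one JSON file + one section in this doc. (A bundled template is
+  still embedded at build time, so it needs a rebuild but no Go edits; a
+  workflow loaded from disk needs neither.)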
+- **The HTTP plumbing is identical across families**. `/prompt`,
+  `/history`, `/view`, the retry policy, the "value not in list" hint, VRAM
+  reporting — none of it depends on the workflow shape. Path 2 would
+  duplicate that across files.
+- **Failure isolation stays clean**. The workflow loader fails at adapter
+  construction (`imagen backends` surfaces the error), the HTTP layer
+  fails at `Generate`, and ComfyUI's own validation surfaces missing-model
+  hints. Each layer's error message points at the right config knob.
+
+Path 2's argument was "each family owns its quirks (samplers, schedulers,
+dual-stage etc.)". That argument doesn't survive contact with the
+substitution-map design: per-family knobs are just key/value fields in the
+yaml block and `${shift}`/`${guidance}`/`${cfg}` placeholders in the
+template. No code duplication, no inheritance to debug.
+
+### Token substitution
+
+`SubstituteWorkflow` (in `workflow_template.go`) walks the parsed JSON and
+replaces every whole-value string of the form `"${key}"` with the typed
+value from the substitution map. Numbers stay numbers and strings stay
+strings; nothing is round-tripped through `strings.Replace`.
+
+The substitution map is built per call from:
+
+1. **Request fields** (always present): `${prompt}`, `${negative}`,
+   `${width}`, `${height}`, `${seed}`, `${steps}`, `${sampler}`,
+   `${scheduler}`, `${cfg}`.
+2. **Every scalar field from the yaml block** (string / int / int64 /
+   float64 / bool), minus framework keys (`type`, `base_url`, `workflow`,
+   `default_*`). So `${vae}`, `${clip}`, `${clip_l}`, `${clip_t5}`,
+   `${dtype}`, `${shift}`, `${guidance}` all become substitutable just by
+   being in yaml.
+3. **Sensible defaults** for the common optional knobs above, so a
+   workflow that references `${dtype}` without the user setting one in
+   yaml still substitutes cleanly (`dtype` falls back to `fp8_e4m3fn` for
+   FLUX, `shift` to `3.0` for SD3, etc.). Extra defaults are ignored by
+   workflows that don't reference them.
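+
+A minimal sketch of how the three layers could be merged, assuming a
+hypothetical `Request` struct whose field names mirror the tokens above and
+a yaml block already decoded into `map[string]any`. `BuildSubstitutions` is
+an illustrative name, and the layering order (defaults, then yaml, then
+request) is an assumption rather than what `comfyui.go` necessarily does;
+only the `dtype` and `shift` defaults named above are included.
+
+```go
+package main
+
+import "strings"
+
+// Request is a hypothetical stand-in for backend.Request; the field
+// names and types are assumptions that mirror the ${...} tokens.
+type Request struct {
+	Prompt, Negative   string
+	Width, Height      int
+	Seed               int64
+	Steps              int
+	Sampler, Scheduler string
+	Cfg                float64
+}
+
+// isFrameworkKey reports whether a yaml key configures the adapter
+// itself; such keys never become placeholders.
+func isFrameworkKey(k string) bool {
+	switch k {
+	case "type", "base_url", "workflow":
+		return true
+	}
+	return strings.HasPrefix(k, "default_")
+}
+
+// BuildSubstitutions layers built-in defaults, then yaml scalars, then
+// request fields. A later layer overwrites an earlier one on collision.
+func BuildSubstitutions(req Request, block map[string]any) map[string]any {
+	subs := map[string]any{
+		"dtype": "fp8_e4m3fn", // FLUX weight dtype, used if yaml omits it
+		"shift": 3.0,          // SD3 shift, used if yaml omits it
+	}
+	for k, v := range block {
+		if isFrameworkKey(k) {
+			continue
+		}
+		switch v.(type) {
+		case string, int, int64, float64, bool:
+			subs[k] = v // scalars only; nested maps and lists are skipped
+		}
+	}
+	subs["prompt"] = req.Prompt
+	subs["negative"] = req.Negative
+	subs["width"] = req.Width
+	subs["height"] = req.Height
+	subs["seed"] = req.Seed // int64, so it marshals as a JSON integer
+	subs["steps"] = req.Steps
+	subs["sampler"] = req.Sampler
+	subs["scheduler"] = req.Scheduler
+	subs["cfg"] = req.Cfg // float64
+	return subs
+}
+```
+
+Filtering `default_*` keeps knobs such as `default_steps` out of the token
+set; this sketch assumes those defaults were already folded into `Request`
+before the call.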
+
+Partial matches (e.g. `"prefix ${prompt} suffix"`) are deliberately **not**
+substituted — the placeholder must be the entire value so we can preserve
+its JSON type. This prevents a prompt containing literal `${seed}` text
+from corrupting the workflow.
+
+Unknown placeholders (referenced in the JSON but missing from the
+substitution map) are an error, raised before the workflow is sent to
+ComfyUI.
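+
+The walk can be sketched as a recursive type switch over the decoded JSON.
+This is a hypothetical re-implementation for illustration
+(`SubstituteTemplate` is not the real function and its signature is
+assumed); it shows the three rules above: whole-value replacement with the
+typed value, partial matches left alone, and unknown keys reported as
+errors.
+
+```go
+package main
+
+import (
+	"bytes"
+	"encoding/json"
+	"fmt"
+	"strings"
+)
+
+// placeholderKey returns key when s is exactly "${key}". Any surrounding
+// text makes it a partial match, reported as not-a-placeholder.
+func placeholderKey(s string) (string, bool) {
+	if len(s) < 4 || !strings.HasPrefix(s, "${") || !strings.HasSuffix(s, "}") {
+		return "", false
+	}
+	key := s[2 : len(s)-1]
+	if strings.ContainsAny(key, "${}") {
+		return "", false // e.g. "${a} ${b}" is not one whole-value token
+	}
+	return key, true
+}
+
+// substitute walks a decoded JSON value in place. Replacement values are
+// returned as-is and never walked again, so a prompt that contains
+// "${seed}" as literal text stays literal.
+func substitute(v any, subs map[string]any) (any, error) {
+	switch t := v.(type) {
+	case map[string]any:
+		for k, child := range t {
+			nv, err := substitute(child, subs)
+			if err != nil {
+				return nil, err
+			}
+			t[k] = nv
+		}
+	case []any:
+		for i, child := range t {
+			nv, err := substitute(child, subs)
+			if err != nil {
+				return nil, err
+			}
+			t[i] = nv
+		}
+	case string:
+		key, ok := placeholderKey(t)
+		if !ok {
+			return t, nil // plain string or partial match
+		}
+		val, found := subs[key]
+		if !found {
+			return nil, fmt.Errorf("workflow references unknown placeholder ${%s}", key)
+		}
+		return val, nil // typed: int64 stays int64, float64 stays float64
+	}
+	return v, nil // maps/slices (mutated above), numbers, bools, null
+}
+
+// SubstituteTemplate decodes a workflow, substitutes, and re-encodes it.
+func SubstituteTemplate(raw []byte, subs map[string]any) ([]byte, error) {
+	dec := json.NewDecoder(bytes.NewReader(raw))
+	dec.UseNumber() // keep non-placeholder numeric literals exact
+	var doc any
+	if err := dec.Decode(&doc); err != nil {
+		return nil, fmt.Errorf("parse workflow: %w", err)
+	}
+	out, err := substitute(doc, subs)
+	if err != nil {
+		return nil, err
+	}
+	return json.Marshal(out)
+}
+```
+
+Decoding with `UseNumber` is a choice of this sketch: literal numbers
+elsewhere in the template (denoise, weights) survive the round trip exactly
+instead of passing through `float64`.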
+
+### Back-compat
+
+The `workflow:` field defaults to `flux1-schnell` if omitted. Existing
+yaml blocks like the pre-#10 FLUX.1-schnell instance:
+
+```yaml
+flux-schnell-local:
+  type: comfyui
+  base_url: http://mrock:8188
+  model: flux1-schnell.safetensors
+```
+
+still work unchanged — they implicitly pick up the migrated
+`flux1-schnell.json` template, which keeps the same node IDs (6, 8, 9, 10,
+11, 12, 13, 27, 30, 31) as the historical hardcoded workflow.
+
+## Bundled workflows
+
+### FLUX.1-schnell — the back-compat default
+
+| Field | Default | Notes |
+|---|---|---|
+| `model` | `flux1-schnell.safetensors` | drop in `models/unet/` |
+| `vae` | `ae.safetensors` | `models/vae/` |
+| `clip_l` | `clip_l.safetensors` | `models/clip/` |
+| `clip_t5` | `t5xxl_fp8_e4m3fn.safetensors` | `models/clip/` |
+| `dtype` | `fp8_e4m3fn` | weight dtype for the UNet loader |
+| `default_steps` / `default_cfg` | 4 / 1.0 | schnell is distilled to ~4 steps |
+
+VRAM peak: ~10–12 GB at 1024×1024. Install path:
+[`setup-comfyui-mrock.md`](setup-comfyui-mrock.md). Already shipping.
+
+### FLUX.2 [klein] 4B — direct upgrade
+
+Released by Black Forest Labs in late 2025 / early 2026 under the BFL
+non-commercial license. The distilled 4B "klein" variant generates in under
+a second on the RTX 4070 Ti SUPER and shares the larger family's new
+Qwen-based text encoder and retrained VAE.
+
+```yaml
+flux2-klein-local:
+  type: comfyui
+  base_url: http://mrock:8188
+  workflow: flux2-klein
+  model: flux-2-klein-base-4b-fp8.safetensors    # models/unet/
+  vae: flux2-vae.safetensors                     # models/vae/
+  clip: qwen_3_4b.safetensors                    # models/text_encoders/
+  dtype: fp8_e4m3fn
+  default_steps: 4
+  default_cfg: 1.0
+  guidance: 4.0
+```
+
+**Model downloads** (on mRock, ungated mirrors when available):
+
+```bash
+cd ~/dev/comfyui/models
+# -f: exit non-zero on HTTP errors (e.g. 401) instead of saving the
+# error page under a .safetensors name
+curl -fL -o unet/flux-2-klein-base-4b-fp8.safetensors \
+  https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/flux-2-klein-base-4b-fp8.safetensors
+curl -fL -o vae/flux2-vae.safetensors \
+  https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/flux2-vae.safetensors
+mkdir -p text_encoders
+curl -fL -o text_encoders/qwen_3_4b.safetensors \
+  https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/qwen_3_4b.safetensors
+```
+
+BFL's primary repo is gated: accept the license on the model card first. If
+`curl` then fails with a 401, pass an HF token explicitly, e.g.
+`-H "Authorization: Bearer $HF_TOKEN"`; `curl` does not read
+`~/.cache/huggingface/token` on its own (`huggingface-cli download` does).
+Alternatively use one of the community mirrors (check the official model
+card for the current list). The filenames the template references match
+BFL's canonical names; rename downloads to match if a mirror uses different
+ones.
+
+VRAM peak: ~8.5 GB (4B fp8). With Ollama parked at ~8 GB this still fits;
+unlike FLUX.1-schnell, klein doesn't require stopping Ollama on mRock.
+
+### SD3.5-medium — single-checkpoint variant
+
+Stability AI's 2.5B mid-size model. The `incl_clips_t5xxlfp8scaled`
+checkpoint bundles clip_g, clip_l, and t5xxl_fp8 into one `.safetensors`,
+so the workflow uses `CheckpointLoaderSimple` instead of separate
+UNet/VAE/CLIP loaders.
+
+```yaml
+sd35-medium-local:
+  type: comfyui
+  base_url: http://mrock:8188
+  workflow: sd35-medium
+  model: sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors  # models/checkpoints/
+  default_steps: 28
+  default_sampler: dpmpp_2m
+  default_scheduler: sgm_uniform
+  default_cfg: 4.5
+  shift: 3.0
+```
+
+**Model download** (on mRock; Stability's repo is gated too, so accept the
+license on the model card and pass an HF token if `curl` gets a 401):
+
+```bash
+cd ~/dev/comfyui/models
+curl -fL -o checkpoints/sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors \
+  https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/resolve/main/sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors
+```
+
+VRAM peak: ~9.9 GB at 1024×1024. Same envelope as FLUX.1-schnell — stop
+Ollama before generating, restart after.
+
+## Adding a new bundled workflow
+
+1. **Export from ComfyUI**: load the model in the ComfyUI GUI, build a
+   text-to-image workflow that produces what you want, "Save (API
+   Format)" — the file you get is the right shape.
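+2. **Sprinkle placeholders**: open the JSON and replace per-call values
+   with `${name}` tokens. Whole-value substitution only, and each token is
+   a quoted string even where the original value was a number: a prompt
+   like `"a cat sitting on a chair"` becomes `"${prompt}"`, a seed like
+   `1234567` becomes `"${seed}"`, a step count like `28` becomes
+   `"${steps}"`. Schematically (inputs from several nodes merged here; in a
+   real export they sit under different node IDs):
+
+   ```json
+   {
+     "inputs": {
+       "text": "${prompt}",
+       "seed": "${seed}",
+       "steps": "${steps}",
+       "cfg": "${cfg}",
+       "sampler_name": "${sampler}",
+       "scheduler": "${scheduler}",
+       "width": "${width}",
+       "height": "${height}"
+     }
+   }
+   ```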
+
+   Use `${model}` for the checkpoint / unet filename and any per-template
+   knobs (`${vae}`, `${shift}`, `${guidance}`, `${clip}` …).
+3. **Drop it into `internal/backend/workflows/<name>.json`**. The
+   `//go:embed workflows/*.json` directive in `workflow_template.go`
+   picks it up at build time — no registry entry needed.
+4. **Add a yaml instance** in `internal/config/config.go`'s `Sample` block
+   for `imagen config init` (and `~/.config/imagen.yaml`) so users
+   discover the new backend.
+5. **Document the model files + HF download URLs** in this doc.
+6. **Smoke test**: `imagen generate "test" --backend <new-instance>
+   --size 1024x1024` should produce an image.
+
+Per-call overrides: steps and seed via `--steps` / `--seed` on the command
+line; sampler, scheduler, and cfg only programmatically, via
+`backend.Request.BackendOpts["sampler"]` / `["scheduler"]` / `["cfg"]`.
+The compare harness forwards the knobs that are constant across backends
+verbatim.
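+
+A sketch of resolving one such override, assuming `BackendOpts` is a
+`map[string]any` (its real type isn't shown here) and that the yaml
+`default_cfg` / `default_sampler` value is passed in as the fallback.
+`floatOpt` and `stringOpt` are hypothetical helpers; accepting ints and
+numeric strings is a design choice of the sketch, so a caller passing `4`
+or `"4.5"` still gets a `float64` into `${cfg}`.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strconv"
+)
+
+// floatOpt returns opts[key] as a float64 when present, else fallback
+// (e.g. the yaml default_cfg). ints and numeric strings are accepted.
+func floatOpt(opts map[string]any, key string, fallback float64) (float64, error) {
+	v, ok := opts[key]
+	if !ok {
+		return fallback, nil
+	}
+	switch t := v.(type) {
+	case float64:
+		return t, nil
+	case int:
+		return float64(t), nil
+	case int64:
+		return float64(t), nil
+	case string:
+		return strconv.ParseFloat(t, 64)
+	}
+	return 0, fmt.Errorf("BackendOpts[%q]: want a number, got %T", key, v)
+}
+
+// stringOpt does the same for sampler / scheduler names.
+func stringOpt(opts map[string]any, key, fallback string) (string, error) {
+	v, ok := opts[key]
+	if !ok {
+		return fallback, nil
+	}
+	s, isStr := v.(string)
+	if !isStr || s == "" {
+		return "", fmt.Errorf("BackendOpts[%q]: want a non-empty string, got %T", key, v)
+	}
+	return s, nil
+}
+```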
+
+## Loading a workflow from disk (one-off)
+
+Pass an absolute filesystem path as `workflow:` and the adapter reads it
+from disk instead of the embedded FS. Handy for prototyping a new model
+before committing it:
+
+```yaml
+my-experimental:
+  type: comfyui
+  base_url: http://mrock:8188
+  workflow: /home/m/dev/comfyui/workflows/my-test.json
+  model: my-test-model.safetensors
+```
+
+The fallback chain is: filesystem path (if the string looks like a path
+or ends in `.json`), then bundled lookup by name, then bundled lookup
+with `.json` appended.
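+
+The fallback chain could look like the sketch below, including the
+`flux1-schnell` back-compat default. `LoadWorkflow`, the `looksLikePath`
+heuristic (contains a `/` or ends in `.json`), and passing the embedded FS
+as a `readBundled` callback are all assumptions made to keep the example
+self-contained; with an `embed.FS` the callback would be something like
+`func(n string) ([]byte, error) { return workflowFS.ReadFile("workflows/" + n) }`,
+where `workflowFS` is a hypothetical variable name.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+	"os"
+	"strings"
+)
+
+// defaultWorkflow is the back-compat default when `workflow:` is omitted.
+const defaultWorkflow = "flux1-schnell"
+
+// looksLikePath is a guessed heuristic for "the string looks like a path".
+func looksLikePath(ref string) bool {
+	return strings.Contains(ref, "/") || strings.HasSuffix(ref, ".json")
+}
+
+// LoadWorkflow resolves a yaml `workflow:` value: filesystem first (for
+// path-like values), then bundled by name, then bundled by name + ".json".
+// readBundled reads a file name relative to the embedded workflows/ dir.
+func LoadWorkflow(ref string, readBundled func(name string) ([]byte, error)) ([]byte, error) {
+	if ref == "" {
+		ref = defaultWorkflow
+	}
+	if looksLikePath(ref) {
+		b, err := os.ReadFile(ref)
+		if err == nil {
+			return b, nil
+		}
+		if !errors.Is(err, os.ErrNotExist) {
+			return nil, fmt.Errorf("workflow %q: %w", ref, err) // don't mask EACCES etc.
+		}
+		// Not on disk: fall through to the bundled lookups.
+	}
+	for _, name := range []string{ref, ref + ".json"} {
+		if b, err := readBundled(name); err == nil {
+			return b, nil
+		}
+	}
+	return nil, fmt.Errorf("workflow %q: no such file and no bundled template", ref)
+}
+```
+
+Reporting non-"not found" read errors instead of falling through is a
+choice of this sketch: a permission problem on a prototype file then
+surfaces as such rather than as a confusing "no bundled template".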
+
+## `imagen compare`: cross-backend evaluation
+
+```bash
+imagen compare "a wizard casting a spell" \
+  --models flux-schnell-local,flux2-klein-local,sd35-medium-local \
+  --size 1024x1024 \
+  --output ~/Pictures/imagen/compare
+```
+
+Per run, `compare`:
+
+- creates `<output>/<YYYYMMDD-HHMMSS>-<prompt-slug>/`
+- dispatches each named backend sequentially (mRock has one GPU; parallel
+  would OOM) — one backend's failure doesn't abort the run
+- writes per-backend PNGs as `<prompt-slug>--<backend-slug>.png`
+- writes `compare.json` listing every attempt (success + failure) with
+  per-model `seed`, `latency_ms`, `model`, `vram_used_mib`, full
+  `metadata` map, and the error string for any failure
+- composites a `contact-sheet.png` with the prompt as header and each
+  cell labelled `<backend>` / `<latency>ms · seed <n>`
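+
+The control flow of a run can be sketched as below. The `Backend` and
+`Result` shapes, the `slug` rules, and the `backend` / `error` JSON key
+names are assumptions; the remaining keys follow the list above. The point
+is the loop: strictly sequential, timing each call, and recording a
+failure as an entry instead of returning early.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"errors"
+	"strings"
+	"time"
+)
+
+// Result and Backend are hypothetical stand-ins for the adapter types.
+type Result struct {
+	PNG         []byte
+	Seed        int64
+	Model       string
+	VRAMUsedMiB int
+	Metadata    map[string]any
+}
+
+type Backend struct {
+	Name     string
+	Generate func(prompt string) (Result, error)
+}
+
+// Entry is one compare.json record; "backend" and "error" keys are assumed.
+type Entry struct {
+	Backend     string         `json:"backend"`
+	Seed        int64          `json:"seed,omitempty"`
+	LatencyMs   int64          `json:"latency_ms"`
+	Model       string         `json:"model,omitempty"`
+	VRAMUsedMiB int            `json:"vram_used_mib,omitempty"`
+	Metadata    map[string]any `json:"metadata,omitempty"`
+	Error       string         `json:"error,omitempty"`
+}
+
+var ErrNoBackends = errors.New("compare: no backends given")
+
+// slug lowercases s and collapses each run of non-alphanumerics to "-".
+func slug(s string) string {
+	var b strings.Builder
+	pendingDash := false
+	for _, r := range strings.ToLower(s) {
+		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') {
+			if pendingDash {
+				b.WriteByte('-')
+				pendingDash = false
+			}
+			b.WriteRune(r)
+		} else if b.Len() > 0 {
+			pendingDash = true
+		}
+	}
+	return b.String()
+}
+
+// RunDir names the per-run directory: <YYYYMMDD-HHMMSS>-<prompt-slug>.
+func RunDir(prompt string, now time.Time) string {
+	return now.Format("20060102-150405") + "-" + slug(prompt)
+}
+
+// RunCompare calls each backend strictly in turn (one GPU), saves every
+// successful PNG via save, and returns the compare.json payload. A failing
+// backend becomes an entry with Error set; the loop keeps going.
+func RunCompare(prompt string, backends []Backend, save func(file string, png []byte) error) ([]byte, error) {
+	if len(backends) == 0 {
+		return nil, ErrNoBackends
+	}
+	entries := make([]Entry, 0, len(backends))
+	for _, be := range backends {
+		e := Entry{Backend: be.Name}
+		start := time.Now()
+		res, err := be.Generate(prompt)
+		e.LatencyMs = time.Since(start).Milliseconds()
+		if err == nil {
+			err = save(slug(prompt)+"--"+slug(be.Name)+".png", res.PNG)
+		}
+		if err != nil {
+			e.Error = err.Error()
+		} else {
+			e.Seed, e.Model = res.Seed, res.Model
+			e.VRAMUsedMiB, e.Metadata = res.VRAMUsedMiB, res.Metadata
+		}
+		entries = append(entries, e)
+	}
+	return json.MarshalIndent(entries, "", "  ")
+}
+```
+
+Recording the error string in the entry rather than returning it is what
+lets one backend fail without aborting the run.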
+
+Flags mirror `generate`: `--seed`, `--steps`, `--style`, `--negative`,
+`--size` are shared across all backends. `--no-contact-sheet` skips the
+composite when only the per-image PNGs and sidecar matter (e.g. for a
+worker script that builds its own diff view).
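+
+Compositing needs nothing beyond the standard library's `image` and
+`image/draw` packages. The sketch below tiles the per-backend images into
+a grid on a white background; `ContactSheet` is a hypothetical name, and
+the prompt header and per-cell labels are left out because text rendering
+needs a font package.
+
+```go
+package main
+
+import (
+	"image"
+	"image/color"
+	"image/draw"
+)
+
+// ContactSheet tiles imgs left to right, top to bottom, in a grid with
+// cols columns. Every cell is sized to the largest input, with margin
+// pixels of white between and around cells.
+func ContactSheet(imgs []image.Image, cols, margin int) *image.RGBA {
+	if cols < 1 {
+		cols = 1
+	}
+	cellW, cellH := 0, 0
+	for _, im := range imgs {
+		if dx := im.Bounds().Dx(); dx > cellW {
+			cellW = dx
+		}
+		if dy := im.Bounds().Dy(); dy > cellH {
+			cellH = dy
+		}
+	}
+	rows := (len(imgs) + cols - 1) / cols
+	w := margin + cols*(cellW+margin)
+	h := margin + rows*(cellH+margin)
+	sheet := image.NewRGBA(image.Rect(0, 0, w, h))
+	draw.Draw(sheet, sheet.Bounds(), image.NewUniform(color.White), image.Point{}, draw.Src)
+	for i, im := range imgs {
+		x := margin + (i%cols)*(cellW+margin)
+		y := margin + (i/cols)*(cellH+margin)
+		r := image.Rect(x, y, x+im.Bounds().Dx(), y+im.Bounds().Dy())
+		draw.Draw(sheet, r, im, im.Bounds().Min, draw.Over)
+	}
+	return sheet
+}
+```
+
+Encoding the result with `png.Encode` from `image/png` yields the
+`contact-sheet.png` file.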
+
+## Diagnostics
+
+`imagen backends` shows every instance with its registration state. For
+local ComfyUI, the status is currently just `registered` (we don't probe
+the upstream HTTP endpoint at startup — the boot-helper hint kicks in on
+first generation if mRock is asleep).
+
+Per-backend errors fall into three kinds:
+
+1. **Adapter construction failure** (e.g. workflow JSON not found,
+   missing required yaml field). Caught at `buildBackend` time:
+   `imagen: backend "<name>": <err>`.
+2. **HTTP / runtime failure during Generate**. Wrapped with the boot
+   helper for `connection refused`/`no such host`/timeouts pointing at
+   `boot-whitetower mrock` so a sleeping mRock has an obvious next step.
+3. **ComfyUI workflow-validation failure** (200-with-node_errors or 400).
+   Surfaces with a model-not-found hint (matching `value_not_in_list` +
+   `unet_name`/`ckpt_name`) when applicable, pointing back at this doc.
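+
+The second and third checks can be sketched with `errors.Is` / `errors.As`
+from the standard library and plain substring matching. `BootHint`,
+`ModelHint`, and the hint wording are hypothetical; only the matched
+conditions come from the list above.
+
+```go
+package main
+
+import (
+	"errors"
+	"net"
+	"strings"
+	"syscall"
+)
+
+const bootHint = "mRock may be asleep; try: boot-whitetower mrock"
+
+// BootHint returns bootHint for errors that look like an unreachable host:
+// connection refused, a DNS failure (e.g. "no such host"), or a timeout.
+func BootHint(err error) string {
+	var dnsErr *net.DNSError
+	var netErr net.Error
+	if errors.Is(err, syscall.ECONNREFUSED) ||
+		errors.As(err, &dnsErr) ||
+		(errors.As(err, &netErr) && netErr.Timeout()) {
+		return bootHint
+	}
+	return ""
+}
+
+// ModelHint flags a ComfyUI validation body that rejects a model file name.
+func ModelHint(body string) string {
+	if strings.Contains(body, "value_not_in_list") &&
+		(strings.Contains(body, "unet_name") || strings.Contains(body, "ckpt_name")) {
+		return "model file not on the ComfyUI host; see docs/backends.md"
+	}
+	return ""
+}
+```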
+
+## Worker daemon notes
+
+`imagen worker` (the `imagen.jobs` queue consumer) uses the same adapter
+and workflow lookup as the synchronous CLI. When flexsiebels' `/imagine` UI
+INSERTs a row with `backend = <instance>`, the worker claims it and makes
+the same ComfyUI HTTP calls that `generate` makes. No worker-specific
+changes are required when a new backend lands; the config and workflow are
+the only state that has to be present on the worker host.
+
+After merging a new template or yaml block:
+
+```bash
+# On the worker host (mRiver today):
+systemctl --user restart imagen-worker
+```
+
+The daemon-rebuild trap from issue #9 still applies: if you build the
+imagen binary on the dev machine and `scp` it over, restart the unit so
+systemd picks up the new ELF.