mAi: #10 - multi-model backend expansion (workflow templates + compare harness)

Path 1 architecture: one comfyui adapter, workflows as data.

- workflow_template.go: embed.FS + token substitution with type-preserving
  whole-value placeholders. ${prompt} → string, ${seed} → int64,
  ${cfg} → float64 — no JSON round-tripping. Partial matches ignored.
- comfyui.go: refactored to load workflow from embedded FS or filesystem
  path. Back-compat preserved: workflow: defaults to flux1-schnell.
- workflows/{flux1-schnell,flux2-klein,sd35-medium}.json — bundled
  templates. flux1-schnell migrated from hardcoded with identical node IDs.
- compare.go: new `imagen compare` subcommand. Sequential N-backend run
  (one GPU on mRock — parallel would OOM), per-backend PNG, sidecar JSON
  with per-model metadata + errors, composite contact sheet via Go image
  package (no ImageMagick dep).
- Sample config gains flux2-klein-local + sd35-medium-local instances.
- docs/backends.md: architecture rationale + per-model HF download paths
  + how to add a new bundled workflow + compare-harness reference.

Live smoke verified: compare mock + flux-schnell-local at 768×768 →
both PNGs written, sidecar JSON has workflow="flux1-schnell" + full
metadata, contact sheet renders. Worker contract (Request → Generate)
unchanged, so flexsiebels /imagine UI API surface preserved.

Tests: 11 existing comfyui + 6 new workflow_template + 5 new compare
tests, all green.

Adding a new model is now yaml + JSON, never Go.
mAi
2026-05-11 17:29:57 +02:00
parent 623dd290c5
commit 8435817ce1
15 changed files with 1638 additions and 122 deletions

310
docs/backends.md Normal file

@@ -0,0 +1,310 @@
# ImaGen backends
This document covers the local-ComfyUI backend plug-in story: how adapters
are layered, how to add a new model without touching Go, and the per-model
setup steps for the bundled templates.
For the host-side ComfyUI install (mRock — venv, weights for the default
FLUX.1-schnell, systemd, VRAM coexistence with Ollama, smoke test against
the raw HTTP API), see [`setup-comfyui-mrock.md`](setup-comfyui-mrock.md).
## Architecture: Path 1 — workflow-template adapter
`imagen generate` and `imagen compare` dispatch through the `comfyui`
adapter, which holds the HTTP plumbing (`/prompt`, `/history/{id}`, `/view`,
`/system_stats`) and treats the workflow itself as data. Each backend
instance in `imagen.yaml` picks a workflow JSON via the `workflow:` key.
Adding a new model is yaml + JSON, never Go:
```
internal/backend/
  comfyui.go                # one adapter, all ComfyUI models
  workflow_template.go      # loader + token-substitution
  workflows/
    flux1-schnell.json      # bundled templates (embedded with //go:embed)
    flux2-klein.json
    sd35-medium.json
```
### Why Path 1 over per-family adapters (`comfyui-flux.go`, `comfyui-sd3.go`…)
- **Workflow JSON is the natural exchange format**. ComfyUI users export
workflows from its GUI as JSON. Anything else means rebuilding the graph
by hand in Go for every new model.
- **Adding a model is a config change, not a build change**. With Path 2,
every new family is a Go file, a new test file, a registry entry, a new
worker binary, a redeploy. Path 1 lets us land a new model with one yaml
block + one JSON file + one section in this doc.
- **The HTTP plumbing is identical across families**. `/prompt`,
`/history`, `/view`, the retry policy, the "value not in list" hint, VRAM
reporting — none of it depends on the workflow shape. Path 2 would
duplicate that across files.
- **Failure isolation stays clean**. The workflow loader fails at adapter
construction (`imagen backends` surfaces the error), the HTTP layer
fails at `Generate`, and ComfyUI's own validation surfaces missing-model
hints. Each layer's error message points at the right config knob.
Path 2's argument was "each family owns its quirks (samplers, schedulers,
dual-stage etc.)". That argument doesn't survive contact with the
substitution-map design: per-family knobs are just key/value fields in the
yaml block and `${shift}`/`${guidance}`/`${cfg}` placeholders in the
template. No code duplication, no inheritance to debug.
### Token substitution
`workflow_template.SubstituteWorkflow` walks the parsed JSON and replaces
every whole-value string of the form `"${key}"` with the typed value from
the substitution map. Numbers stay numbers, strings stay strings — no
round-tripping through `strings.Replace`.
The substitution map is built per call from:
1. **Request fields** (always present): `${prompt}`, `${negative}`,
`${width}`, `${height}`, `${seed}`, `${steps}`, `${sampler}`,
`${scheduler}`, `${cfg}`.
2. **Every scalar field from the yaml block** (string / int / int64 /
float64 / bool), minus framework keys (`type`, `base_url`, `workflow`,
`default_*`). So `${vae}`, `${clip}`, `${clip_l}`, `${clip_t5}`,
`${dtype}`, `${shift}`, `${guidance}` all become substitutable just by
being in yaml.
3. **Sensible defaults** for the common optional knobs above, so a
workflow that references `${dtype}` without the user setting one in
yaml still substitutes cleanly (`fp8_e4m3fn` for FLUX, `3.0` for SD3
shift, etc.). Extra defaults are ignored by workflows that don't
reference them.
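The layering of those three sources can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: the `Request` struct, `frameworkKey`, and `buildSubstitutions` are hypothetical names, the two defaults are just the examples mentioned above, and the precedence (defaults, then yaml, then request fields) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Request is a hypothetical stand-in for the per-call fields listed above.
type Request struct {
	Prompt, Negative   string
	Width, Height      int
	Seed               int64
	Steps              int
	Sampler, Scheduler string
	CFG                float64
}

// frameworkKey reports whether a yaml key is consumed by the adapter itself
// and therefore kept out of the substitution map.
func frameworkKey(k string) bool {
	switch k {
	case "type", "base_url", "workflow":
		return true
	}
	return strings.HasPrefix(k, "default_")
}

// buildSubstitutions layers defaults, then yaml scalars, then request fields
// (assumed precedence; later layers overwrite earlier ones).
func buildSubstitutions(req Request, yamlBlock map[string]any) map[string]any {
	// Illustrative defaults only; the real list is not shown in this doc.
	subs := map[string]any{"dtype": "fp8_e4m3fn", "shift": 3.0}
	for k, v := range yamlBlock {
		if frameworkKey(k) {
			continue
		}
		switch v.(type) {
		case string, int, int64, float64, bool: // scalars only
			subs[k] = v
		}
	}
	subs["prompt"] = req.Prompt
	subs["negative"] = req.Negative
	subs["width"] = req.Width
	subs["height"] = req.Height
	subs["seed"] = req.Seed
	subs["steps"] = req.Steps
	subs["sampler"] = req.Sampler
	subs["scheduler"] = req.Scheduler
	subs["cfg"] = req.CFG
	return subs
}

func main() {
	subs := buildSubstitutions(
		Request{Prompt: "a cat", Width: 1024, Height: 1024, Seed: 42, Steps: 4, CFG: 1.0},
		map[string]any{"type": "comfyui", "default_steps": 4, "vae": "ae.safetensors", "shift": 1.5},
	)
	fmt.Println(subs["vae"], subs["shift"], subs["dtype"], subs["seed"])
	_, leaked := subs["type"]
	fmt.Println("framework key leaked:", leaked)
}
```

Non-scalar yaml values (lists, nested maps) are dropped silently in this sketch; whether the real loader skips or rejects them is not stated here.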
Partial matches (e.g. `"prefix ${prompt} suffix"`) are deliberately **not**
substituted — the placeholder must be the entire value so we can preserve
its JSON type. This prevents a prompt containing literal `${seed}` text
from corrupting the workflow.
Unknown placeholders (referenced in JSON but missing from the substitution
map) error out before the workflow leaves the binary.
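As an illustration of the whole-value rule, the sketch below walks a decoded JSON tree and swaps only strings that are exactly one placeholder. It is not the real `SubstituteWorkflow`; the function name `substitute`, the placeholder character set, and the error text are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// placeholder matches a string that is exactly one ${key} token and nothing else.
var placeholder = regexp.MustCompile(`^\$\{([A-Za-z0-9_]+)\}$`)

// substitute returns a copy of node in which every whole-value placeholder
// string is replaced by the typed value from subs. Strings that merely
// contain a placeholder are left untouched; unknown keys are an error.
func substitute(node any, subs map[string]any) (any, error) {
	switch v := node.(type) {
	case map[string]any:
		out := make(map[string]any, len(v))
		for k, child := range v {
			r, err := substitute(child, subs)
			if err != nil {
				return nil, err
			}
			out[k] = r
		}
		return out, nil
	case []any:
		out := make([]any, len(v))
		for i, child := range v {
			r, err := substitute(child, subs)
			if err != nil {
				return nil, err
			}
			out[i] = r
		}
		return out, nil
	case string:
		m := placeholder.FindStringSubmatch(v)
		if m == nil {
			return v, nil // plain string or partial match: keep as-is
		}
		val, ok := subs[m[1]]
		if !ok {
			return nil, fmt.Errorf("unknown placeholder %q", v)
		}
		return val, nil // typed value, so numbers stay numbers
	default:
		return v, nil
	}
}

func main() {
	tmpl := `{"3":{"inputs":{"seed":"${seed}","cfg":"${cfg}","text":"${prompt}","note":"keep ${seed} literal"}}}`
	var tree any
	if err := json.Unmarshal([]byte(tmpl), &tree); err != nil {
		panic(err)
	}
	out, err := substitute(tree, map[string]any{"prompt": "a cat", "seed": int64(42), "cfg": 1.5})
	if err != nil {
		panic(err)
	}
	b, _ := json.Marshal(out)
	fmt.Println(string(b))
	// prints {"3":{"inputs":{"cfg":1.5,"note":"keep ${seed} literal","seed":42,"text":"a cat"}}}
}
```

Because the replacement happens on the decoded tree rather than on the raw text, a prompt value that itself contains `${seed}` is inserted as an ordinary string and never re-scanned.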
### Back-compat
The `workflow:` field defaults to `flux1-schnell` if omitted. Existing
yaml blocks like the pre-#10 FLUX.1-schnell instance:
```yaml
flux-schnell-local:
  type: comfyui
  base_url: http://mrock:8188
  model: flux1-schnell.safetensors
```
still work unchanged — they implicitly pick up the migrated
`flux1-schnell.json` template, which keeps the same node IDs (6, 8, 9, 10,
11, 12, 13, 27, 30, 31) as the historical hardcoded workflow.
## Bundled workflows
### FLUX.1-schnell — the back-compat default
| Field | Default | Notes |
|---|---|---|
| `model` | `flux1-schnell.safetensors` | drop in `models/unet/` |
| `vae` | `ae.safetensors` | `models/vae/` |
| `clip_l` | `clip_l.safetensors` | `models/clip/` |
| `clip_t5` | `t5xxl_fp8_e4m3fn.safetensors` | `models/clip/` |
| `dtype` | `fp8_e4m3fn` | weight dtype for the UNet loader |
| `default_steps` / `default_cfg` | 4 / 1.0 | schnell is distilled to ~4 steps |
VRAM peak: ~10–12 GB at 1024×1024. Install path:
[`setup-comfyui-mrock.md`](setup-comfyui-mrock.md). Already shipping.
### FLUX.2 [klein] 4B — direct upgrade
Released by Black Forest Labs late 2025 / early 2026, BFL non-commercial
license. The distilled 4B "klein" variant lands sub-second on the RTX
4070 Ti SUPER and shares the new Qwen-based text encoder + a re-trained
VAE with the larger family.
```yaml
flux2-klein-local:
  type: comfyui
  base_url: http://mrock:8188
  workflow: flux2-klein
  model: flux-2-klein-base-4b-fp8.safetensors  # models/unet/
  vae: flux2-vae.safetensors                   # models/vae/
  clip: qwen_3_4b.safetensors                  # models/text_encoders/
  dtype: fp8_e4m3fn
  default_steps: 4
  default_cfg: 1.0
  guidance: 4.0
```
**Model downloads** (on mRock, ungated mirrors when available):
```bash
cd ~/dev/comfyui/models
curl -L -o unet/flux-2-klein-base-4b-fp8.safetensors \
  https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/flux-2-klein-base-4b-fp8.safetensors
curl -L -o vae/flux2-vae.safetensors \
  https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/flux2-vae.safetensors
mkdir -p text_encoders
curl -L -o text_encoders/qwen_3_4b.safetensors \
  https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/qwen_3_4b.safetensors
```
BFL's primary repo is gated; if `curl` returns 401, configure an HF token
in `~/.cache/huggingface/token` or use one of the community mirrors
(check the official model card for the current list). The filenames the
template references match BFL's canonical names — rename downloads to
match if a mirror uses different ones.
VRAM peak: ~8.5 GB (4B fp8). With Ollama parked at ~8 GB this still fits;
unlike FLUX.1-schnell, klein doesn't require stopping Ollama on mRock.
### SD3.5-medium — single-checkpoint variant
Stability AI's 2.5B mid-size model with bundled text encoders. The
`incl_clips_t5xxlfp8scaled` variant ships clip_g + clip_l + t5xxl_fp8 all
in one `.safetensors`, so the workflow uses `CheckpointLoaderSimple`
instead of separate UNet/VAE/CLIP loaders.
```yaml
sd35-medium-local:
  type: comfyui
  base_url: http://mrock:8188
  workflow: sd35-medium
  model: sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors  # models/checkpoints/
  default_steps: 28
  default_sampler: dpmpp_2m
  default_scheduler: sgm_uniform
  default_cfg: 4.5
  shift: 3.0
```
**Model download** (on mRock):
```bash
cd ~/dev/comfyui/models
curl -L -o checkpoints/sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors \
  https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/resolve/main/sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors
```
VRAM peak: ~9.9 GB at 1024×1024. Same envelope as FLUX.1-schnell — stop
Ollama before generating, restart after.
## Adding a new bundled workflow
1. **Export from ComfyUI**: load the model in the ComfyUI GUI, build a
text-to-image workflow that produces what you want, "Save (API
Format)" — the file you get is the right shape.
2. **Sprinkle placeholders**: open the JSON and replace per-call values
with `${name}` tokens. Whole-value substitution only:
```json
"inputs": {
  "text": "${prompt}",
  "seed": "${seed}",
  "steps": "${steps}",
  "cfg": "${cfg}",
  "sampler_name": "${sampler}",
  "scheduler": "${scheduler}",
  "width": "${width}",
  "height": "${height}"
}
```
In the exported file these were literal values, for example `"a cat sitting on a chair"` for `text`, `1234567` for `seed`, and `28` for `steps`.
Use `${model}` for the checkpoint / unet filename and any per-template
knobs (`${vae}`, `${shift}`, `${guidance}`, `${clip}` …).
3. **Drop it into `internal/backend/workflows/<name>.json`**. The
`//go:embed workflows/*.json` directive in `workflow_template.go`
picks it up at build time — no registry entry needed.
4. **Add a yaml instance** in `internal/config/config.go`'s `Sample` block
for `imagen config init` (and `~/.config/imagen.yaml`) so users
discover the new backend.
5. **Document the model files + HF download URLs** in this doc.
6. **Smoke test**: `imagen generate "test" --backend <new-instance>
--size 1024x1024` should produce an image.
Per-call overrides for steps and seed go through `--steps` and `--seed` on
the CLI; sampler, scheduler, and cfg are set programmatically via
`backend.Request.BackendOpts["sampler"]` / `["scheduler"]` / `["cfg"]`.
The compare harness forwards the constant-across-backends knobs verbatim.
## Loading a workflow from disk (one-off)
Pass an absolute filesystem path as `workflow:` and the adapter reads it
from disk instead of the embedded FS. Handy for prototyping a new model
before committing it:
```yaml
my-experimental:
  type: comfyui
  base_url: http://mrock:8188
  workflow: /home/m/dev/comfyui/workflows/my-test.json
  model: my-test-model.safetensors
```
The fallback chain is: filesystem path (if the string looks like a path
or ends in `.json`), then bundled lookup by name, then bundled lookup
with `.json` appended.
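The fallback chain can be sketched against the `fs.FS` interface, which an `embed.FS` satisfies. Everything here is illustrative: `loadWorkflow`, `looksLikePath`, and `demoBundle` are hypothetical names, the path heuristic is one possible reading of "looks like a path", and `fstest.MapFS` stands in for the embedded templates so the example runs without any files on disk.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"strings"
	"testing/fstest"
)

// looksLikePath is an assumed heuristic: a separator or a .json suffix.
func looksLikePath(ref string) bool {
	return strings.ContainsAny(ref, `/\`) || strings.HasSuffix(ref, ".json")
}

// loadWorkflow tries the filesystem first (for path-like refs), then the
// bundled templates by exact name, then by name with ".json" appended.
func loadWorkflow(bundled fs.FS, ref string) ([]byte, error) {
	if looksLikePath(ref) {
		if data, err := os.ReadFile(ref); err == nil {
			return data, nil
		}
	}
	for _, name := range []string{ref, ref + ".json"} {
		if data, err := fs.ReadFile(bundled, "workflows/"+name); err == nil {
			return data, nil
		}
	}
	return nil, fmt.Errorf("workflow %q not found on disk or in bundled templates", ref)
}

// demoBundle mimics the layout produced by `//go:embed workflows/*.json`.
func demoBundle() fs.FS {
	return fstest.MapFS{
		"workflows/flux1-schnell.json": {Data: []byte(`{"demo":true}`)},
	}
}

func main() {
	for _, ref := range []string{"flux1-schnell", "flux1-schnell.json", "no-such-workflow"} {
		data, err := loadWorkflow(demoBundle(), ref)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %d bytes\n", ref, len(data))
	}
}
```

In this sketch a path-like ref that is missing on disk still falls through to the bundled lookup, which is what lets `flux1-schnell.json` resolve to the embedded template.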
## `imagen compare`: cross-backend evaluation
```bash
imagen compare "a wizard casting a spell" \
  --models flux-schnell-local,flux2-klein-local,sd35-medium-local \
  --size 1024x1024 \
  --output ~/Pictures/imagen/compare
```
Per run, `compare`:
- creates `<output>/<YYYYMMDD-HHMMSS>-<prompt-slug>/`
- dispatches each named backend sequentially (mRock has one GPU; parallel
would OOM) — one backend's failure doesn't abort the run
- writes per-backend PNGs as `<prompt-slug>--<backend-slug>.png`
- writes `compare.json` listing every attempt (success + failure) with
per-model `seed`, `latency_ms`, `model`, `vram_used_mib`, full
`metadata` map, and the error string for any failure
- composites a `contact-sheet.png` with the prompt as header and each
cell labelled `<backend>` / `<latency>ms · seed <n>`
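The sequential, failure-tolerant dispatch described in the list above might look like the sketch below. The `attempt` struct, `generateFunc`, and `runCompare` are hypothetical; only the JSON keys `seed` and `latency_ms` and the idea of recording an error string come from this doc, and the real sidecar carries more fields.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// attempt is a hypothetical sidecar row with a subset of the fields listed above.
type attempt struct {
	Backend   string `json:"backend"`
	Seed      int64  `json:"seed,omitempty"`
	LatencyMS int64  `json:"latency_ms"`
	Error     string `json:"error,omitempty"`
}

// generateFunc stands in for one backend's Generate call.
type generateFunc func(prompt string) (seed int64, err error)

// errDemo simulates a backend whose host is unreachable.
var errDemo = errors.New("connection refused")

// runCompare calls each backend in order, one at a time, and records every
// attempt. A failing backend is logged in the result and the loop continues.
func runCompare(prompt string, order []string, backends map[string]generateFunc) []attempt {
	results := make([]attempt, 0, len(order))
	for _, name := range order {
		start := time.Now()
		a := attempt{Backend: name}
		gen, ok := backends[name]
		if !ok {
			a.Error = "backend not configured"
		} else if seed, err := gen(prompt); err != nil {
			a.Error = err.Error()
		} else {
			a.Seed = seed
		}
		a.LatencyMS = time.Since(start).Milliseconds()
		results = append(results, a)
	}
	return results
}

func main() {
	backends := map[string]generateFunc{
		"mock":   func(string) (int64, error) { return 1234, nil },
		"broken": func(string) (int64, error) { return 0, errDemo },
	}
	results := runCompare("a wizard casting a spell", []string{"mock", "broken"}, backends)
	sidecar, err := json.MarshalIndent(results, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(sidecar))
}
```

A plain loop is enough here: with a single GPU there is nothing to gain from goroutines, and the ordered slice keeps the sidecar rows in the same order as `--models`.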
Flags mirror `generate`: `--seed`, `--steps`, `--style`, `--negative`,
`--size` are shared across all backends. `--no-contact-sheet` skips the
composite when only the per-image PNGs and sidecar matter (e.g. for a
worker script that builds its own diff view).
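For readers curious how a contact sheet can be built with only the standard `image` packages, here is a minimal tiling sketch. It is an assumption-laden illustration rather than the code in `compare.go`: `contactSheet`, `solid`, and `demoCells` are hypothetical, the grid layout and padding are invented, and the prompt header and per-cell labels are left out because text rendering needs a font package outside the standard library.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
	"image/draw"
)

// contactSheet tiles cells left-to-right, top-to-bottom into a grid with the
// given number of columns. Every slot is as large as the largest cell.
func contactSheet(cells []image.Image, cols, pad int) *image.RGBA {
	if len(cells) == 0 || cols < 1 {
		return image.NewRGBA(image.Rect(0, 0, 0, 0))
	}
	cw, ch := 0, 0
	for _, c := range cells {
		if b := c.Bounds(); b.Dx() > cw {
			cw = b.Dx()
		}
		if b := c.Bounds(); b.Dy() > ch {
			ch = b.Dy()
		}
	}
	rows := (len(cells) + cols - 1) / cols // ceiling division
	sheet := image.NewRGBA(image.Rect(0, 0, cols*(cw+pad)+pad, rows*(ch+pad)+pad))
	draw.Draw(sheet, sheet.Bounds(), image.White, image.Point{}, draw.Src) // background
	for i, c := range cells {
		x := pad + (i%cols)*(cw+pad)
		y := pad + (i/cols)*(ch+pad)
		dst := image.Rect(x, y, x+c.Bounds().Dx(), y+c.Bounds().Dy())
		draw.Draw(sheet, dst, c, c.Bounds().Min, draw.Src)
	}
	return sheet
}

// solid returns a w×h image filled with one colour (stand-in for a decoded PNG).
func solid(w, h int, c color.Color) image.Image {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	draw.Draw(img, img.Bounds(), image.NewUniform(c), image.Point{}, draw.Src)
	return img
}

// demoCells returns three 4×4 cells: red, green, blue.
func demoCells() []image.Image {
	return []image.Image{
		solid(4, 4, color.RGBA{R: 255, A: 255}),
		solid(4, 4, color.RGBA{G: 255, A: 255}),
		solid(4, 4, color.RGBA{B: 255, A: 255}),
	}
}

func main() {
	sheet := contactSheet(demoCells(), 2, 1)
	fmt.Println(sheet.Bounds()) // prints (0,0)-(11,11)
}
```

Writing the result to disk would be a call to `png.Encode` from `image/png`; it is omitted so the example has no side effects.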
## Diagnostics
`imagen backends` shows every instance with its registration state. For
local ComfyUI, the status is currently just `registered` (we don't probe
the upstream HTTP endpoint at startup — the boot-helper hint kicks in on
first generation if mRock is asleep).
Per-backend errors fall into three kinds:
1. **Adapter construction failure** (e.g. workflow JSON not found,
missing required yaml field). Caught at `buildBackend` time:
`imagen: backend "<name>": <err>`.
2. **HTTP / runtime failure during Generate**. Wrapped with the boot
helper for `connection refused`/`no such host`/timeouts pointing at
`boot-whitetower mrock` so a sleeping mRock has an obvious next step.
3. **ComfyUI workflow-validation failure** (200-with-node_errors or 400).
Surfaces with a model-not-found hint (matching `value_not_in_list` +
`unet_name`/`ckpt_name`) when applicable, pointing back at this doc.
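The hint selection for kinds 2 and 3 could be approximated by matching on the error text, as in the sketch below. This is only a guess at the shape: `hintFor` and the hint wording are hypothetical, and the real adapter may inspect typed errors or ComfyUI's structured `node_errors` rather than plain strings.

```go
package main

import (
	"fmt"
	"strings"
)

// hintFor maps an error message to a next-step hint, or "" when none applies.
func hintFor(msg string) string {
	lower := strings.ToLower(msg)
	switch {
	case strings.Contains(lower, "connection refused"),
		strings.Contains(lower, "no such host"),
		strings.Contains(lower, "timeout"),
		strings.Contains(lower, "deadline exceeded"):
		// Host unreachable: most likely a sleeping mRock.
		return "ComfyUI host unreachable; try `boot-whitetower mrock`"
	case strings.Contains(lower, "value_not_in_list") &&
		(strings.Contains(lower, "unet_name") || strings.Contains(lower, "ckpt_name")):
		// ComfyUI rejected the model filename: the weights are not installed.
		return "model file not found by ComfyUI; see docs/backends.md for download paths"
	}
	return ""
}

func main() {
	for _, msg := range []string{
		"dial tcp 10.0.0.5:8188: connect: connection refused",
		"400: value_not_in_list: unet_name 'flux1-schnell.safetensors'",
		"unexpected EOF",
	} {
		fmt.Printf("%q -> %q\n", msg, hintFor(msg))
	}
}
```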
## Worker daemon notes
`imagen worker` (the `imagen.jobs` queue consumer) uses the same adapter
+ workflow lookup as the synchronous CLI — flexsiebels' `/imagine` UI
INSERTs a `backend = <instance>` row, the worker claims it, and the
underlying ComfyUI HTTP calls are identical to what `generate` makes. No
worker-specific changes are required when a new backend lands; the
config + workflow are the only state that has to be present on the
worker host.
After merging a new template or yaml block:
```bash
# On the worker host (mRiver today):
systemctl --user restart imagen-worker
```
The daemon-rebuild trap from issue #9 still applies: if you build the
imagen binary on the dev machine and `scp` it over, restart the unit so
systemd picks up the new ELF.