ImaGen/docs/backends.md
mAi 8435817ce1 mAi: #10 - multi-model backend expansion (workflow templates + compare harness)
Path 1 architecture: one comfyui adapter, workflows as data.

- workflow_template.go: embed.FS + token substitution with type-preserving
  whole-value placeholders. ${prompt} → string, ${seed} → int64,
  ${cfg} → float64 — no JSON round-tripping. Partial matches ignored.
- comfyui.go: refactored to load workflow from embedded FS or filesystem
  path. Back-compat preserved: workflow: defaults to flux1-schnell.
- workflows/{flux1-schnell,flux2-klein,sd35-medium}.json — bundled
  templates. flux1-schnell migrated from hardcoded with identical node IDs.
- compare.go: new `imagen compare` subcommand. Sequential N-backend run
  (one GPU on mRock — parallel would OOM), per-backend PNG, sidecar JSON
  with per-model metadata + errors, composite contact sheet via Go image
  package (no ImageMagick dep).
- Sample config gains flux2-klein-local + sd35-medium-local instances.
- docs/backends.md: architecture rationale + per-model HF download paths
  + how to add a new bundled workflow + compare-harness reference.

Live smoke verified: compare mock + flux-schnell-local at 768×768 →
both PNGs written, sidecar JSON has workflow="flux1-schnell" + full
metadata, contact sheet renders. Worker contract (Request → Generate)
unchanged, so flexsiebels /imagine UI API surface preserved.

Tests: 11 existing comfyui + 6 new workflow_template + 5 new compare
tests, all green.

Adding a new model is now yaml + JSON, never Go.
2026-05-11 17:29:57 +02:00


# ImaGen backends
This document covers the local-ComfyUI backend plug-in story: how adapters
are layered, how to add a new model without touching Go, and the per-model
setup steps for the bundled templates.
For the host-side ComfyUI install (mRock — venv, weights for the default
FLUX.1-schnell, systemd, VRAM coexistence with Ollama, smoke test against
the raw HTTP API), see [`setup-comfyui-mrock.md`](setup-comfyui-mrock.md).
## Architecture: Path 1 — workflow-template adapter
`imagen generate` and `imagen compare` dispatch through the `comfyui`
adapter, which holds the HTTP plumbing (`/prompt`, `/history/{id}`, `/view`,
`/system_stats`) and treats the workflow itself as data. Each backend
instance in `imagen.yaml` picks a workflow JSON via the `workflow:` key.
Adding a new model is yaml + JSON, never Go:
```
internal/backend/
  comfyui.go              # one adapter, all ComfyUI models
  workflow_template.go    # loader + token-substitution
  workflows/
    flux1-schnell.json    # bundled templates (embedded with //go:embed)
    flux2-klein.json
    sd35-medium.json
```
### Why Path 1 over per-family adapters (`comfyui-flux.go`, `comfyui-sd3.go`…)
- **Workflow JSON is the natural exchange format**. ComfyUI users export
workflows from its GUI as JSON. Anything else means rebuilding the graph
by hand in Go for every new model.
- **Adding a model is a config change, not a build change**. With Path 2
(per-family adapters), every new family means a Go file, a test file, a
registry entry, a new worker binary, and a redeploy. Path 1 lets us land a
new model with one yaml block + one JSON file + one section in this doc.
- **The HTTP plumbing is identical across families**. `/prompt`,
`/history`, `/view`, the retry policy, the "value not in list" hint, VRAM
reporting — none of it depends on the workflow shape. Path 2 would
duplicate that across files.
- **Failure isolation stays clean**. The workflow loader fails at adapter
construction (`imagen backends` surfaces the error), the HTTP layer
fails at `Generate`, and ComfyUI's own validation surfaces missing-model
hints. Each layer's error message points at the right config knob.
Path 2's argument was "each family owns its quirks (samplers, schedulers,
dual-stage etc.)". That argument doesn't survive contact with the
substitution-map design: per-family knobs are just key/value fields in the
yaml block and `${shift}`/`${guidance}`/`${cfg}` placeholders in the
template. No code duplication, no inheritance to debug.
### Token substitution
`workflow_template.SubstituteWorkflow` walks the parsed JSON and replaces
every whole-value string of the form `"${key}"` with the typed value from
the substitution map. Numbers stay numbers, strings stay strings — no
round-tripping through `strings.Replace`.
The substitution map is built per call from:
1. **Request fields** (always present): `${prompt}`, `${negative}`,
`${width}`, `${height}`, `${seed}`, `${steps}`, `${sampler}`,
`${scheduler}`, `${cfg}`.
2. **Every scalar field from the yaml block** (string / int / int64 /
float64 / bool), minus framework keys (`type`, `base_url`, `workflow`,
`default_*`). So `${vae}`, `${clip}`, `${clip_l}`, `${clip_t5}`,
`${dtype}`, `${shift}`, `${guidance}` all become substitutable just by
being in yaml.
3. **Sensible defaults** for the common optional knobs above, so a
workflow that references `${dtype}` without the user setting one in
yaml still substitutes cleanly (`fp8_e4m3fn` for FLUX, `3.0` for SD3
shift, etc.). Extra defaults are ignored by workflows that don't
reference them.
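The merge of these three sources can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: `buildSubstitutions` and `isFrameworkKey` are hypothetical names, and the precedence shown (defaults lowest, then yaml scalars, then request fields) is an assumption that the text above does not spell out.

```go
package main

import (
	"fmt"
	"strings"
)

// isFrameworkKey reports whether a yaml key is reserved by the adapter
// (type, base_url, workflow, default_*) and so never becomes a placeholder.
func isFrameworkKey(k string) bool {
	switch k {
	case "type", "base_url", "workflow":
		return true
	}
	return strings.HasPrefix(k, "default_")
}

// buildSubstitutions merges the three sources into one map.
// Assumed precedence: defaults < yaml scalars < request fields.
func buildSubstitutions(defaults, yamlBlock, request map[string]any) map[string]any {
	out := make(map[string]any, len(defaults)+len(yamlBlock)+len(request))
	for k, v := range defaults {
		out[k] = v
	}
	for k, v := range yamlBlock {
		if isFrameworkKey(k) {
			continue
		}
		switch v.(type) {
		case string, int, int64, float64, bool: // scalars only; nested values are skipped
			out[k] = v
		}
	}
	for k, v := range request {
		out[k] = v
	}
	return out
}

func main() {
	subs := buildSubstitutions(
		map[string]any{"dtype": "fp8_e4m3fn", "shift": 3.0},
		map[string]any{"type": "comfyui", "default_steps": 4, "vae": "ae.safetensors", "shift": 1.5},
		map[string]any{"prompt": "a cat", "seed": int64(42)},
	)
	fmt.Println(subs["dtype"], subs["shift"], subs["vae"], subs["seed"])
	_, hasType := subs["type"]
	fmt.Println("framework key leaked:", hasType)
}
```

In this sketch a yaml `shift` overrides the built-in default, while `type` and `default_steps` never reach the map.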
Partial matches (e.g. `"prefix ${prompt} suffix"`) are deliberately **not**
substituted — the placeholder must be the entire value so we can preserve
its JSON type. This prevents a prompt containing literal `${seed}` text
from corrupting the workflow.
Unknown placeholders (referenced in JSON but missing from the substitution
map) error out before the workflow leaves the binary.
### Back-compat
The `workflow:` field defaults to `flux1-schnell` if omitted. Existing
yaml blocks like the pre-#10 FLUX.1-schnell instance:
```yaml
flux-schnell-local:
  type: comfyui
  base_url: http://mrock:8188
  model: flux1-schnell.safetensors
```
still work unchanged — they implicitly pick up the migrated
`flux1-schnell.json` template, which keeps the same node IDs (6, 8, 9, 10,
11, 12, 13, 27, 30, 31) as the historical hardcoded workflow.
## Bundled workflows
### FLUX.1-schnell — the back-compat default
| Field | Default | Notes |
|---|---|---|
| `model` | `flux1-schnell.safetensors` | drop in `models/unet/` |
| `vae` | `ae.safetensors` | `models/vae/` |
| `clip_l` | `clip_l.safetensors` | `models/clip/` |
| `clip_t5` | `t5xxl_fp8_e4m3fn.safetensors` | `models/clip/` |
| `dtype` | `fp8_e4m3fn` | weight dtype for the UNet loader |
| `default_steps` / `default_cfg` | 4 / 1.0 | schnell is distilled to ~4 steps |
VRAM peak ~10-12 GB at 1024×1024. Install path:
[`setup-comfyui-mrock.md`](setup-comfyui-mrock.md). Already shipping.
### FLUX.2 [klein] 4B — direct upgrade
Released by Black Forest Labs late 2025 / early 2026, BFL non-commercial
license. The distilled 4B "klein" variant lands sub-second on the RTX
4070 Ti SUPER and shares the new Qwen-based text encoder + a re-trained
VAE with the larger family.
```yaml
flux2-klein-local:
  type: comfyui
  base_url: http://mrock:8188
  workflow: flux2-klein
  model: flux-2-klein-base-4b-fp8.safetensors   # models/unet/
  vae: flux2-vae.safetensors                    # models/vae/
  clip: qwen_3_4b.safetensors                   # models/text_encoders/
  dtype: fp8_e4m3fn
  default_steps: 4
  default_cfg: 1.0
  guidance: 4.0
```
**Model downloads** (on mRock, ungated mirrors when available):
```bash
cd ~/dev/comfyui/models
curl -L -o unet/flux-2-klein-base-4b-fp8.safetensors \
https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/flux-2-klein-base-4b-fp8.safetensors
curl -L -o vae/flux2-vae.safetensors \
https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/flux2-vae.safetensors
mkdir -p text_encoders
curl -L -o text_encoders/qwen_3_4b.safetensors \
https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/qwen_3_4b.safetensors
```
BFL's primary repo is gated; if `curl` returns 401, configure an HF token
in `~/.cache/huggingface/token` or use one of the community mirrors
(check the official model card for the current list). The filenames the
template references match BFL's canonical names — rename downloads to
match if a mirror uses different ones.
VRAM peak: ~8.5 GB (4B fp8). With Ollama parked at ~8 GB this still fits;
unlike FLUX.1-schnell, klein doesn't require stopping Ollama on mRock.
### SD3.5-medium — single-checkpoint variant
Stability AI's 2.5B mid-size model with bundled text encoders. The
`incl_clips_t5xxlfp8scaled` variant ships clip_g + clip_l + t5xxl_fp8 all
in one `.safetensors`, so the workflow uses `CheckpointLoaderSimple`
instead of separate UNet/VAE/CLIP loaders.
```yaml
sd35-medium-local:
  type: comfyui
  base_url: http://mrock:8188
  workflow: sd35-medium
  model: sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors  # models/checkpoints/
  default_steps: 28
  default_sampler: dpmpp_2m
  default_scheduler: sgm_uniform
  default_cfg: 4.5
  shift: 3.0
```
**Model download** (on mRock):
```bash
cd ~/dev/comfyui/models
curl -L -o checkpoints/sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors \
https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/resolve/main/sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors
```
VRAM peak: ~9.9 GB at 1024×1024. Same envelope as FLUX.1-schnell — stop
Ollama before generating, restart after.
## Adding a new bundled workflow
1. **Export from ComfyUI**: load the model in the ComfyUI GUI, build a
text-to-image workflow that produces what you want, "Save (API
Format)" — the file you get is the right shape.
2. **Sprinkle placeholders**: open the JSON and replace per-call values
with `${name}` tokens. Whole-value substitution only:
```json
"inputs": {
  "text": "${prompt}",          // was "a cat sitting on a chair"
  "seed": "${seed}",            // was 1234567
  "steps": "${steps}",          // was 28
  "cfg": "${cfg}",
  "sampler_name": "${sampler}",
  "scheduler": "${scheduler}",
  "width": "${width}",
  "height": "${height}"
}
```
Use `${model}` for the checkpoint / unet filename and any per-template
knobs (`${vae}`, `${shift}`, `${guidance}`, `${clip}` …).
3. **Drop it into `internal/backend/workflows/<name>.json`**. The
`//go:embed workflows/*.json` directive in `workflow_template.go`
picks it up at build time — no registry entry needed.
4. **Add a yaml instance** in `internal/config/config.go`'s `Sample` block
for `imagen config init` (and `~/.config/imagen.yaml`) so users
discover the new backend.
5. **Document the model files + HF download URLs** in this doc.
6. **Smoke test**: `imagen generate "test" --backend <new-instance>
--size 1024x1024` should produce an image.
Per-call overrides go via `--steps` and `--seed` on the CLI; sampler,
scheduler, and cfg are overridden programmatically through
`backend.Request.BackendOpts["sampler"]` / `["scheduler"]` / `["cfg"]`.
The compare harness forwards the constant-across-backends knobs verbatim.
## Loading a workflow from disk (one-off)
Pass an absolute filesystem path as `workflow:` and the adapter reads it
from disk instead of the embedded FS. Handy for prototyping a new model
before committing it:
```yaml
my-experimental:
  type: comfyui
  base_url: http://mrock:8188
  workflow: /home/m/dev/comfyui/workflows/my-test.json
  model: my-test-model.safetensors
```
The fallback chain is: filesystem path (if the string looks like a path
or ends in `.json`), then bundled lookup by name, then bundled lookup
with `.json` appended.
## `imagen compare`: cross-backend evaluation
```bash
imagen compare "a wizard casting a spell" \
--models flux-schnell-local,flux2-klein-local,sd35-medium-local \
--size 1024x1024 \
--output ~/Pictures/imagen/compare
```
Per run, `compare`:
- creates `<output>/<YYYYMMDD-HHMMSS>-<prompt-slug>/`
- dispatches each named backend sequentially (mRock has one GPU; parallel
would OOM) — one backend's failure doesn't abort the run
- writes per-backend PNGs as `<prompt-slug>--<backend-slug>.png`
- writes `compare.json` listing every attempt (success + failure) with
per-model `seed`, `latency_ms`, `model`, `vram_used_mib`, full
`metadata` map, and the error string for any failure
- composites a `contact-sheet.png` with the prompt as header and each
cell labelled `<backend>` / `<latency>ms · seed <n>`
Flags mirror `generate`: `--seed`, `--steps`, `--style`, `--negative`,
`--size` are shared across all backends. `--no-contact-sheet` skips the
composite when only the per-image PNGs and sidecar matter (e.g. for a
worker script that builds its own diff view).
## Diagnostics
`imagen backends` shows every instance with its registration state. For
local ComfyUI, the status is currently just `registered` (we don't probe
the upstream HTTP endpoint at startup — the boot-helper hint kicks in on
first generation if mRock is asleep).
Per-backend errors fall into three kinds:
1. **Adapter construction failure** (e.g. workflow JSON not found,
missing required yaml field). Caught at `buildBackend` time:
`imagen: backend "<name>": <err>`.
2. **HTTP / runtime failure during Generate**. Wrapped with the boot
helper for `connection refused`/`no such host`/timeouts pointing at
`boot-whitetower mrock` so a sleeping mRock has an obvious next step.
3. **ComfyUI workflow-validation failure** (200-with-node_errors or 400).
Surfaces with a model-not-found hint (matching `value_not_in_list` +
`unet_name`/`ckpt_name`) when applicable, pointing back at this doc.
## Worker daemon notes
`imagen worker` (the `imagen.jobs` queue consumer) uses the same adapter
+ workflow lookup as the synchronous CLI — flexsiebels' `/imagine` UI
INSERTs a `backend = <instance>` row, the worker claims it, and the
underlying ComfyUI HTTP calls are identical to what `generate` makes. No
worker-specific changes are required when a new backend lands; the
config + workflow are the only state that has to be present on the
worker host.
After merging a new template or yaml block:
```bash
# On the worker host (mRiver today):
systemctl --user restart imagen-worker
```
The daemon-rebuild trap from issue #9 still applies: if you build the
imagen binary on the dev machine and `scp` it over, restart the unit so
systemd picks up the new ELF.