# ImaGen backends

This document covers the local-ComfyUI backend plug-in story: how adapters
are layered, how to add a new model without touching Go, and the per-model
setup steps for the bundled templates.

For the host-side ComfyUI install (mRock — venv, weights for the default
FLUX.1-schnell, systemd, VRAM coexistence with Ollama, smoke test against
the raw HTTP API), see [`setup-comfyui-mrock.md`](setup-comfyui-mrock.md).

## Architecture: Path 1 — workflow-template adapter

`imagen generate` and `imagen compare` dispatch through the `comfyui`
adapter, which holds the HTTP plumbing (`/prompt`, `/history/{id}`, `/view`,
`/system_stats`) and treats the workflow itself as data. Each backend
instance in `imagen.yaml` picks a workflow JSON via the `workflow:` key.
Adding a new model is yaml + JSON, never Go:

```
internal/backend/
  comfyui.go              # one adapter, all ComfyUI models
  workflow_template.go    # loader + token-substitution
  workflows/
    flux1-schnell.json    # bundled templates (embedded with //go:embed)
    flux2-klein.json
    sd35-medium.json
```

### Why Path 1 over per-family adapters (`comfyui-flux.go`, `comfyui-sd3.go`…)

- **Workflow JSON is the natural exchange format**. ComfyUI users export
  workflows from its GUI as JSON. Anything else means rebuilding the graph
  by hand in Go for every new model.
- **Adding a model is a config change, not a build change**. With Path 2
  (one Go adapter per model family), every new family means a Go file, a
  test file, a registry entry, a new worker binary, and a redeploy. Path 1
  lets us land a new model with one yaml block, one JSON file, and one
  section in this doc.
- **The HTTP plumbing is identical across families**. `/prompt`,
  `/history`, `/view`, the retry policy, the "value not in list" hint, VRAM
  reporting — none of it depends on the workflow shape. Path 2 would
  duplicate that across files.
- **Failure isolation stays clean**. The workflow loader fails at adapter
  construction (`imagen backends` surfaces the error), the HTTP layer
  fails at `Generate`, and ComfyUI's own validation surfaces missing-model
  hints. Each layer's error message points at the right config knob.

Path 2's argument was "each family owns its quirks (samplers, schedulers,
dual-stage etc.)". That argument doesn't survive contact with the
substitution-map design: per-family knobs are just key/value fields in the
yaml block and `${shift}`/`${guidance}`/`${cfg}` placeholders in the
template. No code duplication, no inheritance to debug.

### Token substitution

`workflow_template.SubstituteWorkflow` walks the parsed JSON and replaces
every whole-value string of the form `"${key}"` with the typed value from
the substitution map. Numbers stay numbers, strings stay strings — no
round-tripping through `strings.Replace`.

The substitution map is built per call from:

1. **Request fields** (always present): `${prompt}`, `${negative}`,
   `${width}`, `${height}`, `${seed}`, `${steps}`, `${sampler}`,
   `${scheduler}`, `${cfg}`.
2. **Every scalar field from the yaml block** (string / int / int64 /
   float64 / bool), minus framework keys (`type`, `base_url`, `workflow`,
   `default_*`). So `${vae}`, `${clip}`, `${clip_l}`, `${clip_t5}`,
   `${dtype}`, `${shift}`, `${guidance}` all become substitutable just by
   being in yaml.
3. **Sensible defaults** for the common optional knobs above, so a
   workflow that references `${dtype}` without the user setting one in
   yaml still substitutes cleanly (`fp8_e4m3fn` for FLUX, `3.0` for SD3
   shift, etc.). Extra defaults are ignored by workflows that don't
   reference them.
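
The three sources can be layered with a small merge function. The sketch below is hypothetical: `buildSubs`, `frameworkKey`, and `isScalar` are illustrative names, and the precedence it uses (defaults lowest, then yaml, then request fields) is an assumption rather than documented adapter behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// frameworkKey reports whether a yaml key is reserved for the adapter
// itself and therefore kept out of the substitution map.
func frameworkKey(k string) bool {
	switch k {
	case "type", "base_url", "workflow":
		return true
	}
	return strings.HasPrefix(k, "default_")
}

// isScalar mirrors the scalar types listed above.
func isScalar(v any) bool {
	switch v.(type) {
	case string, int, int64, float64, bool:
		return true
	}
	return false
}

// buildSubs layers defaults, then yaml scalars, then request fields.
// ASSUMPTION: later layers win; the real precedence may differ.
func buildSubs(defaults, yamlBlock, request map[string]any) map[string]any {
	subs := make(map[string]any)
	for k, v := range defaults {
		subs[k] = v
	}
	for k, v := range yamlBlock {
		if frameworkKey(k) || !isScalar(v) {
			continue
		}
		subs[k] = v
	}
	for k, v := range request {
		subs[k] = v
	}
	return subs
}

func main() {
	subs := buildSubs(
		map[string]any{"dtype": "fp8_e4m3fn", "shift": 3.0},
		map[string]any{"type": "comfyui", "default_steps": 28, "shift": 2.5, "model": "m.safetensors"},
		map[string]any{"prompt": "a cat", "seed": int64(7)},
	)
	fmt.Println(subs)
}
```

With this layering, a yaml `shift` replaces the default, while `type` and `default_steps` never reach the template.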

Partial matches (e.g. `"prefix ${prompt} suffix"`) are deliberately **not**
substituted — the placeholder must be the entire value so we can preserve
its JSON type. This prevents a prompt containing literal `${seed}` text
from corrupting the workflow.

Unknown placeholders (referenced in JSON but missing from the substitution
map) error out before the workflow leaves the binary.
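
The rules above (whole-value only, typed replacement, error on unknown keys) fit in one recursive walk over the decoded JSON. This is a minimal sketch, not the project's `SubstituteWorkflow`; the function name `substitute` and its signature are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// substitute walks a decoded JSON value and swaps every string that is
// exactly "${key}" for the typed value in subs. Hypothetical helper.
func substitute(node any, subs map[string]any) (any, error) {
	switch v := node.(type) {
	case map[string]any:
		out := make(map[string]any, len(v))
		for k, child := range v {
			r, err := substitute(child, subs)
			if err != nil {
				return nil, err
			}
			out[k] = r
		}
		return out, nil
	case []any:
		out := make([]any, len(v))
		for i, child := range v {
			r, err := substitute(child, subs)
			if err != nil {
				return nil, err
			}
			out[i] = r
		}
		return out, nil
	case string:
		// Only a string that is exactly "${key}" counts as a placeholder.
		if !strings.HasPrefix(v, "${") || !strings.HasSuffix(v, "}") {
			return v, nil
		}
		key := v[2 : len(v)-1]
		if key == "" || strings.ContainsAny(key, "${}") {
			return v, nil // e.g. "${a} and ${b}" is a partial match, left alone
		}
		val, ok := subs[key]
		if !ok {
			return nil, fmt.Errorf("unknown placeholder %q", v)
		}
		return val, nil // keeps the Go type, so numbers marshal as numbers
	default:
		return v, nil // numbers, bools, and null pass through
	}
}

func main() {
	raw := `{"3":{"inputs":{"seed":"${seed}","text":"${prompt}","note":"uses ${seed}"}}}`
	var wf any
	if err := json.Unmarshal([]byte(raw), &wf); err != nil {
		panic(err)
	}
	out, err := substitute(wf, map[string]any{"seed": 42, "prompt": "a cat"})
	if err != nil {
		panic(err)
	}
	b, err := json.Marshal(out)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

Because the replacement happens on decoded values rather than on the raw text, `seed` is re-encoded as a JSON number and the `note` string, which only contains a placeholder, is left untouched.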

### Back-compat

The `workflow:` field defaults to `flux1-schnell` if omitted. Existing
yaml blocks like the pre-#10 FLUX.1-schnell instance:

```yaml
flux-schnell-local:
  type: comfyui
  base_url: http://mrock:8188
  model: flux1-schnell.safetensors
```

still work unchanged — they implicitly pick up the migrated
`flux1-schnell.json` template, which keeps the same node IDs (6, 8, 9, 10,
11, 12, 13, 27, 30, 31) as the historical hardcoded workflow.

## Bundled workflows

### FLUX.1-schnell — the back-compat default

| Field | Default | Notes |
|---|---|---|
| `model` | `flux1-schnell.safetensors` | drop in `models/unet/` |
| `vae` | `ae.safetensors` | `models/vae/` |
| `clip_l` | `clip_l.safetensors` | `models/clip/` |
| `clip_t5` | `t5xxl_fp8_e4m3fn.safetensors` | `models/clip/` |
| `dtype` | `fp8_e4m3fn` | weight dtype for the UNet loader |
| `default_steps` / `default_cfg` | 4 / 1.0 | schnell is distilled to ~4 steps |

VRAM peak ~10–12 GB at 1024×1024. Install path:
[`setup-comfyui-mrock.md`](setup-comfyui-mrock.md). Already shipping.

### FLUX.2 [klein] 4B — direct upgrade

Released by Black Forest Labs in late 2025 / early 2026 under the BFL
non-commercial license. The distilled 4B "klein" variant generates in under
a second on the RTX 4070 Ti SUPER and shares the new Qwen-based text
encoder and a re-trained VAE with the larger family.

```yaml
flux2-klein-local:
  type: comfyui
  base_url: http://mrock:8188
  workflow: flux2-klein
  model: flux-2-klein-base-4b-fp8.safetensors    # models/unet/
  vae: flux2-vae.safetensors                     # models/vae/
  clip: qwen_3_4b.safetensors                    # models/text_encoders/
  dtype: fp8_e4m3fn
  default_steps: 4
  default_cfg: 1.0
  guidance: 4.0
```

**Model downloads** (on mRock, ungated mirrors when available):

```bash
cd ~/dev/comfyui/models
curl -L -o unet/flux-2-klein-base-4b-fp8.safetensors \
  https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/flux-2-klein-base-4b-fp8.safetensors
curl -L -o vae/flux2-vae.safetensors \
  https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/flux2-vae.safetensors
mkdir -p text_encoders
curl -L -o text_encoders/qwen_3_4b.safetensors \
  https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/qwen_3_4b.safetensors
```

BFL's primary repo is gated; if `curl` returns 401, configure an HF token
in `~/.cache/huggingface/token` or use one of the community mirrors
(check the official model card for the current list). The filenames the
template references match BFL's canonical names — rename downloads to
match if a mirror uses different ones.

VRAM peak: ~8.5 GB (4B fp8). With Ollama parked at ~8 GB this still fits;
unlike FLUX.1-schnell, klein doesn't require stopping Ollama on mRock.

### SD3.5-medium — single-checkpoint variant

Stability AI's 2.5B mid-size model with bundled text encoders. The
`incl_clips_t5xxlfp8scaled` variant ships clip_g + clip_l + t5xxl_fp8 all
in one `.safetensors`, so the workflow uses `CheckpointLoaderSimple`
instead of separate UNet/VAE/CLIP loaders.

```yaml
sd35-medium-local:
  type: comfyui
  base_url: http://mrock:8188
  workflow: sd35-medium
  model: sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors  # models/checkpoints/
  default_steps: 28
  default_sampler: dpmpp_2m
  default_scheduler: sgm_uniform
  default_cfg: 4.5
  shift: 3.0
```

**Model download** (on mRock):

```bash
cd ~/dev/comfyui/models
curl -L -o checkpoints/sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors \
  https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/resolve/main/sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors
```

VRAM peak: ~9.9 GB at 1024×1024. Same envelope as FLUX.1-schnell — stop
Ollama before generating, restart after.

## Adding a new bundled workflow

1. **Export from ComfyUI**: load the model in the ComfyUI GUI, build a
   text-to-image workflow that produces what you want, then choose "Save
   (API Format)". The exported file is already the right shape.
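2. **Sprinkle placeholders**: open the JSON and replace per-call values
   with `${name}` tokens. Whole-value substitution only. The `//` notes
   in the snippet below are annotations for this doc; JSON has no comment
   syntax, so leave them out of the real file: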

   ```json
   "inputs": {
     "text": "${prompt}",         // was "a cat sitting on a chair"
     "seed": "${seed}",            // was 1234567
     "steps": "${steps}",          // was 28
     "cfg": "${cfg}",
     "sampler_name": "${sampler}",
     "scheduler": "${scheduler}",
     "width": "${width}",
     "height": "${height}"
   }
   ```

   Use `${model}` for the checkpoint / unet filename and any per-template
   knobs (`${vae}`, `${shift}`, `${guidance}`, `${clip}` …).
3. **Drop it into `internal/backend/workflows/<name>.json`**. The
   `//go:embed workflows/*.json` directive in `workflow_template.go`
   picks it up at build time — no registry entry needed.
4. **Add a yaml instance** in `internal/config/config.go`'s `Sample` block
   for `imagen config init` (and `~/.config/imagen.yaml`) so users
   discover the new backend.
5. **Document the model files + HF download URLs** in this doc.
6. **Smoke test**: `imagen generate "test" --backend <new-instance>
   --size 1024x1024` should produce an image.
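
Before committing a template, it can help to check that it parses as JSON and to list the placeholders it references, since a placeholder embedded in a longer string is left unsubstituted. The helper below is a hypothetical pre-commit check, not part of the `imagen` CLI; `lintTemplate` and `collect` are illustrative names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// collect sorts string values into whole-value placeholders (keyed by
// name) and strings that merely contain "${" somewhere inside them.
func collect(node any, whole, partial map[string]bool) {
	switch v := node.(type) {
	case map[string]any:
		for _, c := range v {
			collect(c, whole, partial)
		}
	case []any:
		for _, c := range v {
			collect(c, whole, partial)
		}
	case string:
		if !strings.Contains(v, "${") {
			return
		}
		if strings.HasPrefix(v, "${") && strings.HasSuffix(v, "}") {
			key := v[2 : len(v)-1]
			if key != "" && !strings.ContainsAny(key, "${}") {
				whole[key] = true
				return
			}
		}
		partial[v] = true
	}
}

func sortedKeys(m map[string]bool) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// lintTemplate returns the placeholder names a template uses and any
// strings with partial placeholders. It fails if raw is not valid JSON,
// for example because "//" comments were left in.
func lintTemplate(raw []byte) ([]string, []string, error) {
	var doc any
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, nil, fmt.Errorf("template is not valid JSON: %w", err)
	}
	whole, partial := map[string]bool{}, map[string]bool{}
	collect(doc, whole, partial)
	return sortedKeys(whole), sortedKeys(partial), nil
}

func main() {
	raw := []byte(`{"6":{"inputs":{"text":"${prompt}","seed":"${seed}","label":"seed ${seed}"}}}`)
	whole, partial, err := lintTemplate(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("placeholders:", whole)
	fmt.Println("partial (left unsubstituted):", partial)
}
```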

Per-call overrides: `--steps` and `--seed` are CLI flags; sampler,
scheduler, and cfg are set programmatically via
`backend.Request.BackendOpts["sampler"]` / `["scheduler"]` / `["cfg"]`.
The compare harness forwards the constant-across-backends knobs verbatim.

## Loading a workflow from disk (one-off)

Pass an absolute filesystem path as `workflow:` and the adapter reads it
from disk instead of the embedded FS. Handy for prototyping a new model
before committing it:

```yaml
my-experimental:
  type: comfyui
  base_url: http://mrock:8188
  workflow: /home/m/dev/comfyui/workflows/my-test.json
  model: my-test-model.safetensors
```

The fallback chain is: filesystem path (if the string looks like a path
or ends in `.json`), then bundled lookup by name, then bundled lookup
with `.json` appended.
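
The chain can be sketched as follows. This is an illustration under assumptions, not the adapter's loader: `loadWorkflow`, `looksLikePath`, and `demoBundle` are hypothetical names, the "looks like a path" test is a guess, and the bundled FS is assumed to be rooted at the `workflows/` directory (an in-memory `fstest.MapFS` stands in for the embedded FS here).

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"strings"
	"testing/fstest"
)

// looksLikePath is this sketch's guess at "looks like a path".
func looksLikePath(ref string) bool {
	return strings.Contains(ref, "/") || strings.HasSuffix(ref, ".json")
}

// loadWorkflow tries the filesystem first for path-like refs, then the
// bundled FS by name, then by name with ".json" appended.
func loadWorkflow(bundled fs.FS, ref string) ([]byte, error) {
	if looksLikePath(ref) {
		if b, err := os.ReadFile(ref); err == nil {
			return b, nil
		}
	}
	for _, name := range []string{ref, ref + ".json"} {
		if b, err := fs.ReadFile(bundled, name); err == nil {
			return b, nil
		}
	}
	return nil, fmt.Errorf("workflow %q not found on disk or among bundled templates", ref)
}

// demoBundle stands in for the embedded workflows directory.
func demoBundle() fs.FS {
	return fstest.MapFS{
		"flux1-schnell.json": {Data: []byte(`{"bundled":true}`)},
	}
}

func main() {
	for _, ref := range []string{"flux1-schnell", "flux1-schnell.json", "/no/such/dir/x.json"} {
		_, err := loadWorkflow(demoBundle(), ref)
		fmt.Printf("%-22s err=%v\n", ref, err)
	}
}
```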

## `imagen compare`: cross-backend evaluation

```bash
imagen compare "a wizard casting a spell" \
  --models flux-schnell-local,flux2-klein-local,sd35-medium-local \
  --size 1024x1024 \
  --output ~/Pictures/imagen/compare
```

Per run, `compare`:

- creates `<output>/<YYYYMMDD-HHMMSS>-<prompt-slug>/`
- dispatches each named backend sequentially (mRock has one GPU; parallel
  would OOM) — one backend's failure doesn't abort the run
- writes per-backend PNGs as `<prompt-slug>--<backend-slug>.png`
- writes `compare.json` listing every attempt (success + failure) with
  per-model `seed`, `latency_ms`, `model`, `vram_used_mib`, full
  `metadata` map, and the error string for any failure
- composites a `contact-sheet.png` with the prompt as header and each
  cell labelled `<backend>` / `<latency>ms · seed <n>`

Flags mirror `generate`: `--seed`, `--steps`, `--style`, `--negative`,
`--size` are shared across all backends. `--no-contact-sheet` skips the
composite when only the per-image PNGs and sidecar matter (e.g. for a
worker script that builds its own diff view).
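
The sequential, failure-tolerant dispatch and the sidecar rows can be sketched as below. This is a hedged illustration, not the harness itself: `Attempt`, `generateFunc`, and `runCompare` are hypothetical names, and the `backend` and `error` JSON keys are guesses; only `seed`, `latency_ms`, `model`, `vram_used_mib`, and `metadata` come from the list above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Attempt is one row of the sidecar. "backend" and "error" are assumed
// key names; the other JSON keys follow the field list above.
type Attempt struct {
	Backend     string         `json:"backend"`
	Model       string         `json:"model,omitempty"`
	Seed        int64          `json:"seed"`
	LatencyMS   int64          `json:"latency_ms"`
	VRAMUsedMiB int64          `json:"vram_used_mib,omitempty"`
	Metadata    map[string]any `json:"metadata,omitempty"`
	Error       string         `json:"error,omitempty"`
}

// generateFunc stands in for one backend's Generate call.
type generateFunc func(prompt string) (Attempt, error)

// runCompare calls each backend in order, one at a time, and records a
// failure as a row instead of aborting the run.
func runCompare(prompt string, order []string, backends map[string]generateFunc) []Attempt {
	attempts := make([]Attempt, 0, len(order))
	for _, name := range order {
		gen, ok := backends[name]
		if !ok {
			attempts = append(attempts, Attempt{Backend: name, Error: "unknown backend"})
			continue
		}
		a, err := gen(prompt)
		a.Backend = name
		if err != nil {
			a.Error = err.Error()
		}
		attempts = append(attempts, a)
	}
	return attempts
}

// demoBackends returns two fake backends that succeed and one that fails.
func demoBackends() map[string]generateFunc {
	ok := func(model string, ms int64) generateFunc {
		return func(string) (Attempt, error) {
			return Attempt{Model: model, Seed: 42, LatencyMS: ms}, nil
		}
	}
	return map[string]generateFunc{
		"fast": ok("fast.safetensors", 900),
		"slow": ok("slow.safetensors", 4100),
		"broken": func(string) (Attempt, error) {
			return Attempt{}, errors.New("connection refused")
		},
	}
}

func main() {
	attempts := runCompare("a wizard casting a spell",
		[]string{"fast", "broken", "slow"}, demoBackends())
	out, err := json.MarshalIndent(attempts, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A plain loop is enough here because the backends share one GPU; the failing `broken` backend still gets a row, and `slow` runs afterwards.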

## Diagnostics

`imagen backends` shows every instance with its registration state. For
local ComfyUI, the status is currently just `registered` (we don't probe
the upstream HTTP endpoint at startup — the boot-helper hint kicks in on
first generation if mRock is asleep).

Per-backend errors fall into three kinds:

1. **Adapter construction failure** (e.g. workflow JSON not found,
   missing required yaml field). Caught at `buildBackend` time:
   `imagen: backend "<name>": <err>`.
2. **HTTP / runtime failure during Generate**. For `connection refused`,
   `no such host`, and timeouts, the error is wrapped with the boot-helper
   hint pointing at `boot-whitetower mrock`, so a sleeping mRock has an
   obvious next step.
3. **ComfyUI workflow-validation failure** (200-with-node_errors or 400).
   Surfaces with a model-not-found hint (matching `value_not_in_list` +
   `unet_name`/`ckpt_name`) when applicable, pointing back at this doc.
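
The third kind can be detected by scanning the `node_errors` object in ComfyUI's response. The sketch below is an assumption-laden illustration: `modelNotFoundHint` and `sampleRejection` are hypothetical, and the response fields it reads (`class_type`, `errors[].type`, `errors[].details`) reflect the payload shape as understood here, so check them against the ComfyUI version in use.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// validationResponse covers only the fields this sketch reads.
// ASSUMPTION: the response shape matches the ComfyUI version deployed.
type validationResponse struct {
	NodeErrors map[string]struct {
		ClassType string `json:"class_type"`
		Errors    []struct {
			Type    string `json:"type"`
			Details string `json:"details"`
		} `json:"errors"`
	} `json:"node_errors"`
}

// sampleRejection is a hand-written example body, not captured output.
const sampleRejection = `{
  "error": {"type": "prompt_outputs_failed_validation"},
  "node_errors": {
    "12": {
      "class_type": "UNETLoader",
      "errors": [
        {"type": "value_not_in_list", "details": "unet_name: 'missing.safetensors' not in []"}
      ]
    }
  }
}`

// modelNotFoundHint returns a hint when a value_not_in_list error names
// a model-file input (unet_name or ckpt_name).
func modelNotFoundHint(body []byte) (string, bool) {
	var resp validationResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", false // unexpected shape: no hint, caller keeps the raw error
	}
	for id, node := range resp.NodeErrors {
		for _, e := range node.Errors {
			if e.Type != "value_not_in_list" {
				continue
			}
			if strings.Contains(e.Details, "unet_name") || strings.Contains(e.Details, "ckpt_name") {
				return fmt.Sprintf("node %s (%s): model file not found on the ComfyUI host: %s",
					id, node.ClassType, e.Details), true
			}
		}
	}
	return "", false
}

func main() {
	if hint, ok := modelNotFoundHint([]byte(sampleRejection)); ok {
		fmt.Println(hint)
	}
}
```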

## Worker daemon notes

`imagen worker` (the `imagen.jobs` queue consumer) uses the same adapter
+ workflow lookup as the synchronous CLI — flexsiebels' `/imagine` UI
INSERTs a `backend = <instance>` row, the worker claims it, and the
underlying ComfyUI HTTP calls are identical to what `generate` makes. No
worker-specific changes are required when a new backend lands; the
config + workflow are the only state that has to be present on the
worker host.

After merging a new template or yaml block:

```bash
# On the worker host (mRiver today):
systemctl --user restart imagen-worker
```

The daemon-rebuild trap from issue #9 still applies: if you build the
imagen binary on the dev machine and `scp` it over, restart the unit so
systemd picks up the new ELF.