# ImaGen backends

This document covers the local-ComfyUI backend plug-in story: how adapters are layered, how to add a new model without touching Go, and the per-model setup steps for the bundled templates.

For the host-side ComfyUI install (mRock: venv, weights for the default FLUX.1-schnell, systemd, VRAM coexistence with Ollama, smoke test against the raw HTTP API), see [`setup-comfyui-mrock.md`](setup-comfyui-mrock.md).

## Architecture: Path 1 (workflow-template adapter)

`imagen generate` and `imagen compare` dispatch through the `comfyui` adapter, which holds the HTTP plumbing (`/prompt`, `/history/{id}`, `/view`, `/system_stats`) and treats the workflow itself as data. Each backend instance in `imagen.yaml` picks a workflow JSON via the `workflow:` key. Adding a new model is yaml + JSON, never Go:

```
internal/backend/
  comfyui.go              # one adapter, all ComfyUI models
  workflow_template.go    # loader + token-substitution
  workflows/
    flux1-schnell.json    # bundled templates (embedded with //go:embed)
    flux2-klein.json
    sd35-medium.json
```

### Why Path 1 over per-family adapters (`comfyui-flux.go`, `comfyui-sd3.go`…)

- **Workflow JSON is the natural exchange format**. ComfyUI users export workflows from its GUI as JSON. Anything else means rebuilding the graph by hand in Go for every new model.
- **Adding a model is a config change, not a build change**. With Path 2, every new family is a Go file, a new test file, a registry entry, a new worker binary, a redeploy. Path 1 lets us land a new model with one yaml block + one JSON file + one section in this doc.
- **The HTTP plumbing is identical across families**. `/prompt`, `/history`, `/view`, the retry policy, the "value not in list" hint, VRAM reporting: none of it depends on the workflow shape. Path 2 would duplicate that across files.
- **Failure isolation stays clean**.
  The workflow loader fails at adapter construction (`imagen backends` surfaces the error), the HTTP layer fails at `Generate`, and ComfyUI's own validation surfaces missing-model hints. Each layer's error message points at the right config knob.

Path 2's argument was "each family owns its quirks (samplers, schedulers, dual-stage etc.)". That argument doesn't survive contact with the substitution-map design: per-family knobs are just key/value fields in the yaml block and `${shift}`/`${guidance}`/`${cfg}` placeholders in the template. No code duplication, no inheritance to debug.

### Token substitution

`workflow_template.SubstituteWorkflow` walks the parsed JSON and replaces every whole-value string of the form `"${key}"` with the typed value from the substitution map. Numbers stay numbers and strings stay strings, with no round-tripping through `strings.Replace`.

The substitution map is built per call from:

1. **Request fields** (always present): `${prompt}`, `${negative}`, `${width}`, `${height}`, `${seed}`, `${steps}`, `${sampler}`, `${scheduler}`, `${cfg}`.
2. **Every scalar field from the yaml block** (string / int / int64 / float64 / bool), minus framework keys (`type`, `base_url`, `workflow`, `default_*`). So `${vae}`, `${clip}`, `${clip_l}`, `${clip_t5}`, `${dtype}`, `${shift}`, `${guidance}` all become substitutable just by being in yaml.
3. **Sensible defaults** for the common optional knobs above, so a workflow that references `${dtype}` without the user setting one in yaml still substitutes cleanly (`fp8_e4m3fn` for FLUX, `3.0` for SD3 shift, etc.). Extra defaults are ignored by workflows that don't reference them.

Partial matches (e.g. `"prefix ${prompt} suffix"`) are deliberately **not** substituted: the placeholder must be the entire value so we can preserve its JSON type. This prevents a prompt containing literal `${seed}` text from corrupting the workflow.
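
The whole-value rule can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the project's code: `substitute` and the `placeholder` regexp are hypothetical names, and the real `SubstituteWorkflow` signature and error text may differ. It shows the three behaviours described above: typed replacement, partial matches left alone, and an error for a placeholder that is missing from the map.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// placeholder matches a string that is exactly "${key}" and nothing else.
var placeholder = regexp.MustCompile(`^\$\{([A-Za-z0-9_]+)\}$`)

// substitute walks a decoded JSON value and swaps whole-value placeholders
// for the typed entry in subs. Hypothetical helper, for illustration only.
func substitute(node any, subs map[string]any) (any, error) {
	switch v := node.(type) {
	case map[string]any:
		out := make(map[string]any, len(v))
		for k, child := range v {
			r, err := substitute(child, subs)
			if err != nil {
				return nil, err
			}
			out[k] = r
		}
		return out, nil
	case []any:
		out := make([]any, len(v))
		for i, child := range v {
			r, err := substitute(child, subs)
			if err != nil {
				return nil, err
			}
			out[i] = r
		}
		return out, nil
	case string:
		m := placeholder.FindStringSubmatch(v)
		if m == nil {
			return v, nil // partial matches and plain strings pass through
		}
		val, ok := subs[m[1]]
		if !ok {
			return nil, fmt.Errorf("unknown placeholder %q", v)
		}
		return val, nil // keeps the Go type, so 42 is emitted as a number
	default:
		return v, nil
	}
}

func main() {
	const tmpl = `{"3": {"inputs": {"seed": "${seed}", "text": "${prompt}", "note": "keep ${seed} literal"}}}`
	var wf any
	if err := json.Unmarshal([]byte(tmpl), &wf); err != nil {
		panic(err)
	}
	out, err := substitute(wf, map[string]any{"seed": 42, "prompt": "a cat"})
	if err != nil {
		panic(err)
	}
	b, _ := json.Marshal(out)
	fmt.Println(string(b))
	// {"3":{"inputs":{"note":"keep ${seed} literal","seed":42,"text":"a cat"}}}
}
```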
Unknown placeholders (referenced in JSON but missing from the substitution map) error out before the workflow leaves the binary.

### Back-compat

The `workflow:` field defaults to `flux1-schnell` if omitted. Existing yaml blocks like the pre-#10 FLUX.1-schnell instance:

```yaml
flux-schnell-local:
  type: comfyui
  base_url: http://mrock:8188
  model: flux1-schnell.safetensors
```

still work unchanged: they implicitly pick up the migrated `flux1-schnell.json` template, which keeps the same node IDs (6, 8, 9, 10, 11, 12, 13, 27, 30, 31) as the historical hardcoded workflow.

## Bundled workflows

### FLUX.1-schnell: the back-compat default

| Field | Default | Notes |
|---|---|---|
| `model` | `flux1-schnell.safetensors` | drop in `models/unet/` |
| `vae` | `ae.safetensors` | `models/vae/` |
| `clip_l` | `clip_l.safetensors` | `models/clip/` |
| `clip_t5` | `t5xxl_fp8_e4m3fn.safetensors` | `models/clip/` |
| `dtype` | `fp8_e4m3fn` | weight dtype for the UNet loader |
| `default_steps` / `default_cfg` | 4 / 1.0 | schnell is distilled to ~4 steps |

VRAM peak ~10–12 GB at 1024×1024. Install path: [`setup-comfyui-mrock.md`](setup-comfyui-mrock.md). Already shipping.

### FLUX.2 [klein] 4B: direct upgrade

Released by Black Forest Labs late 2025 / early 2026, BFL non-commercial license. The distilled 4B "klein" variant lands sub-second on the RTX 4070 Ti SUPER and shares the new Qwen-based text encoder + a re-trained VAE with the larger family.

```yaml
flux2-klein-local:
  type: comfyui
  base_url: http://mrock:8188
  workflow: flux2-klein
  model: flux-2-klein-base-4b-fp8.safetensors   # models/unet/
  vae: flux2-vae.safetensors                    # models/vae/
  clip: qwen_3_4b.safetensors                   # models/text_encoders/
  dtype: fp8_e4m3fn
  default_steps: 4
  default_cfg: 1.0
  guidance: 4.0
```

**Model downloads** (on mRock, ungated mirrors when available):

```bash
cd ~/dev/comfyui/models
curl -L -o unet/flux-2-klein-base-4b-fp8.safetensors \
  https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/flux-2-klein-base-4b-fp8.safetensors
curl -L -o vae/flux2-vae.safetensors \
  https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/flux2-vae.safetensors
mkdir -p text_encoders
curl -L -o text_encoders/qwen_3_4b.safetensors \
  https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/qwen_3_4b.safetensors
```

BFL's primary repo is gated; if `curl` returns 401, configure an HF token (stored in `~/.cache/huggingface/token`) and pass it to `curl` as an `Authorization: Bearer` header, since `curl` does not read that file on its own, or use one of the community mirrors (check the official model card for the current list). The filenames the template references match BFL's canonical names; rename downloads to match if a mirror uses different ones.

VRAM peak: ~8.5 GB (4B fp8). With Ollama parked at ~8 GB this still fits; unlike FLUX.1-schnell, klein doesn't require stopping Ollama on mRock.

### SD3.5-medium: single-checkpoint variant

Stability AI's 2.5B mid-size model with bundled text encoders. The `incl_clips_t5xxlfp8scaled` variant ships clip_g + clip_l + t5xxl_fp8 all in one `.safetensors`, so the workflow uses `CheckpointLoaderSimple` instead of separate UNet/VAE/CLIP loaders.

```yaml
sd35-medium-local:
  type: comfyui
  base_url: http://mrock:8188
  workflow: sd35-medium
  model: sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors   # models/checkpoints/
  default_steps: 28
  default_sampler: dpmpp_2m
  default_scheduler: sgm_uniform
  default_cfg: 4.5
  shift: 3.0
```

**Model download** (on mRock):

```bash
cd ~/dev/comfyui/models
curl -L -o checkpoints/sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors \
  https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/resolve/main/sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors
```

VRAM peak: ~9.9 GB at 1024×1024. Same envelope as FLUX.1-schnell: stop Ollama before generating, restart after.

## Adding a new bundled workflow

1. **Export from ComfyUI**: load the model in the ComfyUI GUI, build a text-to-image workflow that produces what you want, then use "Save (API Format)". The file you get is the right shape.
2. **Sprinkle placeholders**: open the JSON and replace per-call values with `${name}` tokens. Whole-value substitution only:

   ```json
   "inputs": {
     "text": "${prompt}",          // was "a cat sitting on a chair"
     "seed": "${seed}",            // was 1234567
     "steps": "${steps}",          // was 28
     "cfg": "${cfg}",
     "sampler_name": "${sampler}",
     "scheduler": "${scheduler}",
     "width": "${width}",
     "height": "${height}"
   }
   ```

   The `//` comments are annotations for this doc only; JSON has no comment syntax, so leave them out of the real file. Use `${model}` for the checkpoint / unet filename, plus placeholders for any per-template knobs (`${vae}`, `${shift}`, `${guidance}`, `${clip}` …).
3. **Drop it into `internal/backend/workflows/.json`**. The `//go:embed workflows/*.json` directive in `workflow_template.go` picks it up at build time, so no registry entry is needed.
4. **Add a yaml instance** in `internal/config/config.go`'s `Sample` block for `imagen config init` (and `~/.config/imagen.yaml`) so users discover the new backend.
5. **Document the model files + HF download URLs** in this doc.
6. **Smoke test**: `imagen generate "test" --backend --size 1024x1024` should produce an image.


Per-call overrides go via `--steps` and `--seed` on the command line and, for sampler/scheduler/cfg, programmatically via `backend.Request.BackendOpts["sampler"]` / `["scheduler"]` / `["cfg"]`. The compare harness forwards the constant-across-backends knobs verbatim.

## Loading a workflow from disk (one-off)

Pass an absolute filesystem path as `workflow:` and the adapter reads it from disk instead of the embedded FS. Handy for prototyping a new model before committing it:

```yaml
my-experimental:
  type: comfyui
  base_url: http://mrock:8188
  workflow: /home/m/dev/comfyui/workflows/my-test.json
  model: my-test-model.safetensors
```

The fallback chain is: filesystem path (if the string looks like a path or ends in `.json`), then bundled lookup by name, then bundled lookup with `.json` appended.

## `imagen compare`: cross-backend evaluation

```bash
imagen compare "a wizard casting a spell" \
  --models flux-schnell-local,flux2-klein-local,sd35-medium-local \
  --size 1024x1024 \
  --output ~/Pictures/imagen/compare
```

Per run, `compare`:

- creates `/-/`
- dispatches each named backend sequentially (mRock has one GPU; parallel would OOM); one backend's failure doesn't abort the run
- writes per-backend PNGs as `--.png`
- writes `compare.json` listing every attempt (success + failure) with per-model `seed`, `latency_ms`, `model`, `vram_used_mib`, full `metadata` map, and the error string for any failure
- composites a `contact-sheet.png` with the prompt as header and each cell labelled `` / `ms · seed `

Flags mirror `generate`: `--seed`, `--steps`, `--style`, `--negative`, `--size` are shared across all backends. `--no-contact-sheet` skips the composite when only the per-image PNGs and sidecar matter (e.g. for a worker script that builds its own diff view).

## Diagnostics

`imagen backends` shows every instance with its registration state.
For local ComfyUI, the status is currently just `registered` (we don't probe the upstream HTTP endpoint at startup; the boot-helper hint kicks in on first generation if mRock is asleep).

Per-backend errors fall into three kinds:

1. **Adapter construction failure** (e.g. workflow JSON not found, missing required yaml field). Caught at `buildBackend` time: `imagen: backend "": `.
2. **HTTP / runtime failure during Generate**. Wrapped with the boot helper for `connection refused`/`no such host`/timeouts pointing at `boot-whitetower mrock` so a sleeping mRock has an obvious next step.
3. **ComfyUI workflow-validation failure** (200-with-node_errors or 400). Surfaces with a model-not-found hint (matching `value_not_in_list` + `unet_name`/`ckpt_name`) when applicable, pointing back at this doc.

## Worker daemon notes

`imagen worker` (the `imagen.jobs` queue consumer) uses the same adapter + workflow lookup as the synchronous CLI: flexsiebels' `/imagine` UI INSERTs a `backend = ` row, the worker claims it, and the underlying ComfyUI HTTP calls are identical to what `generate` makes. No worker-specific changes are required when a new backend lands; the config + workflow are the only state that has to be present on the worker host.

After merging a new template or yaml block:

```bash
# On the worker host (mRiver today):
systemctl --user restart imagen-worker
```

The daemon-rebuild trap from issue #9 still applies: if you build the imagen binary on the dev machine and `scp` it over, restart the unit so systemd picks up the new ELF.