ImaGen/docs/backends.md
mAi 8435817ce1 mAi: #10 - multi-model backend expansion (workflow templates + compare harness)
Path 1 architecture: one comfyui adapter, workflows as data.

- workflow_template.go: embed.FS + token substitution with type-preserving
  whole-value placeholders. ${prompt} → string, ${seed} → int64,
  ${cfg} → float64 — no JSON round-tripping. Partial matches ignored.
- comfyui.go: refactored to load workflow from embedded FS or filesystem
  path. Back-compat preserved: workflow: defaults to flux1-schnell.
- workflows/{flux1-schnell,flux2-klein,sd35-medium}.json — bundled
  templates. flux1-schnell migrated from hardcoded with identical node IDs.
- compare.go: new `imagen compare` subcommand. Sequential N-backend run
  (one GPU on mRock — parallel would OOM), per-backend PNG, sidecar JSON
  with per-model metadata + errors, composite contact sheet via Go image
  package (no ImageMagick dep).
- Sample config gains flux2-klein-local + sd35-medium-local instances.
- docs/backends.md: architecture rationale + per-model HF download paths
  + how to add a new bundled workflow + compare-harness reference.

Live smoke verified: compare mock + flux-schnell-local at 768×768 →
both PNGs written, sidecar JSON has workflow="flux1-schnell" + full
metadata, contact sheet renders. Worker contract (Request → Generate)
unchanged, so flexsiebels /imagine UI API surface preserved.

Tests: 11 existing comfyui + 6 new workflow_template + 5 new compare
tests, all green.

Adding a new model is now yaml + JSON, never Go.
2026-05-11 17:29:57 +02:00

ImaGen backends

This document covers the local-ComfyUI backend plug-in story: how adapters are layered, how to add a new model without touching Go, and the per-model setup steps for the bundled templates.

For the host-side ComfyUI install (mRock — venv, weights for the default FLUX.1-schnell, systemd, VRAM coexistence with Ollama, smoke test against the raw HTTP API), see setup-comfyui-mrock.md.

Architecture: Path 1 — workflow-template adapter

imagen generate and imagen compare dispatch through the comfyui adapter, which holds the HTTP plumbing (/prompt, /history/{id}, /view, /system_stats) and treats the workflow itself as data. Each backend instance in imagen.yaml picks a workflow JSON via the workflow: key. Adding a new model is yaml + JSON, never Go:

internal/backend/
  comfyui.go              # one adapter, all ComfyUI models
  workflow_template.go    # loader + token-substitution
  workflows/
    flux1-schnell.json    # bundled templates (embedded with //go:embed)
    flux2-klein.json
    sd35-medium.json

Why Path 1 over per-family adapters (comfyui-flux.go, comfyui-sd3.go…)

  • Workflow JSON is the natural exchange format. ComfyUI users export workflows from its GUI as JSON. Anything else means rebuilding the graph by hand in Go for every new model.
  • Adding a model is a config change, not a build change. With Path 2, every new family is a Go file, a new test file, a registry entry, a new worker binary, a redeploy. Path 1 lets us land a new model with one yaml block + one JSON file + one section in this doc.
  • The HTTP plumbing is identical across families. /prompt, /history, /view, the retry policy, the "value not in list" hint, VRAM reporting — none of it depends on the workflow shape. Path 2 would duplicate that across files.
  • Failure isolation stays clean. The workflow loader fails at adapter construction (imagen backends surfaces the error), the HTTP layer fails at Generate, and ComfyUI's own validation surfaces missing-model hints. Each layer's error message points at the right config knob.

Path 2's argument was "each family owns its quirks (samplers, schedulers, dual-stage etc.)". That argument doesn't survive contact with the substitution-map design: per-family knobs are just key/value fields in the yaml block and ${shift}/${guidance}/${cfg} placeholders in the template. No code duplication, no inheritance to debug.

Token substitution

workflow_template.SubstituteWorkflow walks the parsed JSON and replaces every whole-value string of the form "${key}" with the typed value from the substitution map. Numbers stay numbers, strings stay strings — no round-tripping through strings.Replace.

The substitution map is built per call from:

  1. Request fields (always present): ${prompt}, ${negative}, ${width}, ${height}, ${seed}, ${steps}, ${sampler}, ${scheduler}, ${cfg}.
  2. Every scalar field from the yaml block (string / int / int64 / float64 / bool), minus framework keys (type, base_url, workflow, default_*). So ${vae}, ${clip}, ${clip_l}, ${clip_t5}, ${dtype}, ${shift}, ${guidance} all become substitutable just by being in yaml.
  3. Sensible defaults for the common optional knobs above, so a workflow that references ${dtype} without the user setting one in yaml still substitutes cleanly (fp8_e4m3fn for FLUX, 3.0 for SD3 shift, etc.). Extra defaults are ignored by workflows that don't reference them.
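The three sources above can be layered into a single map. The following is a minimal sketch, not the adapter's actual code: the function names are hypothetical, and the precedence (defaults, then yaml scalars, then request fields) is an assumption that the text implies but does not state outright.

```go
package main

import (
	"fmt"
	"strings"
)

// isFrameworkKey reports whether a yaml key is consumed by the adapter itself
// (type, base_url, workflow, default_*) and so stays out of the map.
func isFrameworkKey(key string) bool {
	switch key {
	case "type", "base_url", "workflow":
		return true
	}
	return strings.HasPrefix(key, "default_")
}

// buildSubstitutions layers the three sources.
// Assumed precedence: defaults < yaml scalars < request fields.
func buildSubstitutions(request, yamlBlock, defaults map[string]any) map[string]any {
	subs := make(map[string]any, len(request)+len(yamlBlock)+len(defaults))
	for k, v := range defaults {
		subs[k] = v
	}
	for k, v := range yamlBlock {
		if isFrameworkKey(k) {
			continue
		}
		switch v.(type) {
		case string, int, int64, float64, bool: // scalars only; lists and maps are skipped
			subs[k] = v
		}
	}
	for k, v := range request {
		subs[k] = v
	}
	return subs
}

func main() {
	subs := buildSubstitutions(
		map[string]any{"prompt": "a wizard", "seed": int64(42)},
		map[string]any{"type": "comfyui", "default_steps": 28, "shift": 3.0},
		map[string]any{"dtype": "fp8_e4m3fn", "shift": 1.0},
	)
	fmt.Println(subs["shift"], subs["dtype"], len(subs))
}
```

Here the yaml `shift` overrides the default, `dtype` falls back to its default, and `type` / `default_steps` never reach the map.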

Partial matches (e.g. "prefix ${prompt} suffix") are deliberately not substituted — the placeholder must be the entire value so we can preserve its JSON type. This prevents a prompt containing literal ${seed} text from corrupting the workflow.

Unknown placeholders (referenced in JSON but missing from the substitution map) error out before the workflow leaves the binary.
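As an illustration of the rules in this section, here is a minimal sketch of a type-preserving walk over parsed JSON. It is not the body of SubstituteWorkflow; the function name `substitute` and the placeholder regexp are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// wholeValue matches a string that is exactly one placeholder, e.g. "${seed}".
// A value like "prefix ${prompt} suffix" does not match and is left alone.
var wholeValue = regexp.MustCompile(`^\$\{([A-Za-z0-9_]+)\}$`)

// substitute returns a copy of node with whole-value placeholders replaced
// by the typed value from subs. Unknown placeholders are an error.
func substitute(node any, subs map[string]any) (any, error) {
	switch v := node.(type) {
	case map[string]any:
		out := make(map[string]any, len(v))
		for key, child := range v {
			replaced, err := substitute(child, subs)
			if err != nil {
				return nil, err
			}
			out[key] = replaced
		}
		return out, nil
	case []any:
		out := make([]any, len(v))
		for i, child := range v {
			replaced, err := substitute(child, subs)
			if err != nil {
				return nil, err
			}
			out[i] = replaced
		}
		return out, nil
	case string:
		m := wholeValue.FindStringSubmatch(v)
		if m == nil {
			return v, nil // plain string or partial match: untouched
		}
		value, ok := subs[m[1]]
		if !ok {
			return nil, fmt.Errorf("unknown placeholder %s", v)
		}
		return value, nil // int64 stays a number, string stays a string
	default:
		return v, nil // numbers, bools, null pass through
	}
}

func main() {
	const tpl = `{"3": {"inputs": {"text": "${prompt}", "seed": "${seed}", "cfg": "${cfg}"}}}`
	var workflow any
	if err := json.Unmarshal([]byte(tpl), &workflow); err != nil {
		panic(err)
	}
	out, err := substitute(workflow, map[string]any{"prompt": "a cat", "seed": int64(42), "cfg": 4.5})
	if err != nil {
		panic(err)
	}
	encoded, _ := json.Marshal(out)
	fmt.Println(string(encoded))
}
```

Because the replacement happens on the parsed tree rather than on the raw text, the seed is marshalled as a JSON number and the prompt needs no escaping.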

Back-compat

The workflow: field defaults to flux1-schnell if omitted. Existing yaml blocks like the pre-#10 FLUX.1-schnell instance:

flux-schnell-local:
  type: comfyui
  base_url: http://mrock:8188
  model: flux1-schnell.safetensors

still work unchanged — they implicitly pick up the migrated flux1-schnell.json template, which keeps the same node IDs (6, 8, 9, 10, 11, 12, 13, 27, 30, 31) as the historical hardcoded workflow.

Bundled workflows

FLUX.1-schnell — the back-compat default

| Field | Default | Notes |
|---|---|---|
| model | flux1-schnell.safetensors | drop in models/unet/ |
| vae | ae.safetensors | models/vae/ |
| clip_l | clip_l.safetensors | models/clip/ |
| clip_t5 | t5xxl_fp8_e4m3fn.safetensors | models/clip/ |
| dtype | fp8_e4m3fn | weight dtype for the UNet loader |
| default_steps / default_cfg | 4 / 1.0 | schnell is distilled to ~4 steps |

VRAM peak: ~10–12 GB at 1024×1024. Install path: setup-comfyui-mrock.md. Already shipping.

FLUX.2 [klein] 4B — direct upgrade

Released by Black Forest Labs late 2025 / early 2026, BFL non-commercial license. The distilled 4B "klein" variant lands sub-second on the RTX 4070 Ti SUPER and shares the new Qwen-based text encoder + a re-trained VAE with the larger family.

flux2-klein-local:
  type: comfyui
  base_url: http://mrock:8188
  workflow: flux2-klein
  model: flux-2-klein-base-4b-fp8.safetensors    # models/unet/
  vae: flux2-vae.safetensors                     # models/vae/
  clip: qwen_3_4b.safetensors                    # models/text_encoders/
  dtype: fp8_e4m3fn
  default_steps: 4
  default_cfg: 1.0
  guidance: 4.0

Model downloads (on mRock, ungated mirrors when available):

cd ~/dev/comfyui/models
curl -L -o unet/flux-2-klein-base-4b-fp8.safetensors \
  https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/flux-2-klein-base-4b-fp8.safetensors
curl -L -o vae/flux2-vae.safetensors \
  https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/flux2-vae.safetensors
mkdir -p text_encoders
curl -L -o text_encoders/qwen_3_4b.safetensors \
  https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/qwen_3_4b.safetensors

BFL's primary repo is gated; if curl returns 401, configure an HF token in ~/.cache/huggingface/token or use one of the community mirrors (check the official model card for the current list). The filenames the template references match BFL's canonical names — rename downloads to match if a mirror uses different ones.

VRAM peak: ~8.5 GB (4B fp8). With Ollama parked at ~8 GB this still fits; unlike FLUX.1-schnell, klein doesn't require stopping Ollama on mRock.

SD3.5-medium — single-checkpoint variant

Stability AI's 2.5B mid-size model with bundled text encoders. The incl_clips_t5xxlfp8scaled variant ships clip_g + clip_l + t5xxl_fp8 all in one .safetensors, so the workflow uses CheckpointLoaderSimple instead of separate UNet/VAE/CLIP loaders.

sd35-medium-local:
  type: comfyui
  base_url: http://mrock:8188
  workflow: sd35-medium
  model: sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors  # models/checkpoints/
  default_steps: 28
  default_sampler: dpmpp_2m
  default_scheduler: sgm_uniform
  default_cfg: 4.5
  shift: 3.0

Model download (on mRock):

cd ~/dev/comfyui/models
curl -L -o checkpoints/sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors \
  https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/resolve/main/sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors

VRAM peak: ~9.9 GB at 1024×1024. Same envelope as FLUX.1-schnell — stop Ollama before generating, restart after.

Adding a new bundled workflow

  1. Export from ComfyUI: load the model in the ComfyUI GUI, build a text-to-image workflow that produces what you want, then choose "Save (API Format)". The exported file already has the right shape.

  2. Sprinkle placeholders: open the JSON and replace per-call values with ${name} tokens. Whole-value substitution only:

    "inputs": {
      "text": "${prompt}",         // was "a cat sitting on a chair"
      "seed": "${seed}",            // was 1234567
      "steps": "${steps}",          // was 28
      "cfg": "${cfg}",
      "sampler_name": "${sampler}",
      "scheduler": "${scheduler}",
      "width": "${width}",
      "height": "${height}"
    }
    

    Use ${model} for the checkpoint / unet filename and any per-template knobs (${vae}, ${shift}, ${guidance}, ${clip} …).

  3. Drop it into internal/backend/workflows/<name>.json. The //go:embed workflows/*.json directive in workflow_template.go picks it up at build time — no registry entry needed.

  4. Add a yaml instance in internal/config/config.go's Sample block for imagen config init (and ~/.config/imagen.yaml) so users discover the new backend.

  5. Document the model files + HF download URLs in this doc.

  6. Smoke test: imagen generate "test" --backend <new-instance> --size 1024x1024 should produce an image.

Per-call overrides go through the --steps and --seed flags; sampler, scheduler, and cfg are overridden programmatically via backend.Request.BackendOpts["sampler"] / ["scheduler"] / ["cfg"]. The compare harness forwards the constant-across-backends knobs verbatim.

Loading a workflow from disk (one-off)

Pass an absolute filesystem path as workflow: and the adapter reads it from disk instead of the embedded FS. Handy for prototyping a new model before committing it:

my-experimental:
  type: comfyui
  base_url: http://mrock:8188
  workflow: /home/m/dev/comfyui/workflows/my-test.json
  model: my-test-model.safetensors

The fallback chain is: filesystem path (if the string looks like a path or ends in .json), then bundled lookup by name, then bundled lookup with .json appended.
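The fallback chain can be sketched as follows. This is an illustration under assumptions, not the adapter's loader: `loadWorkflow`, `looksLikePath`, and `demoBundle` are hypothetical names, the "looks like a path" heuristic (contains a slash or ends in .json) is a guess at what the adapter checks, and an in-memory `fstest.MapFS` stands in for the embedded FS so the example is self-contained.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"strings"
	"testing/fstest"
)

// looksLikePath is an assumed heuristic for "the string looks like a path".
func looksLikePath(ref string) bool {
	return strings.Contains(ref, "/") || strings.HasSuffix(ref, ".json")
}

// loadWorkflow tries the filesystem first (for path-like refs), then the
// bundled templates by exact name, then by name with ".json" appended.
func loadWorkflow(bundled fs.FS, ref string) ([]byte, error) {
	if looksLikePath(ref) {
		if data, err := os.ReadFile(ref); err == nil {
			return data, nil
		}
	}
	for _, name := range []string{ref, ref + ".json"} {
		if data, err := fs.ReadFile(bundled, "workflows/"+name); err == nil {
			return data, nil
		}
	}
	return nil, fmt.Errorf("workflow %q not found on disk or among bundled templates", ref)
}

// demoBundle stands in for the //go:embed FS in this sketch.
func demoBundle() fs.FS {
	return fstest.MapFS{
		"workflows/flux1-schnell.json": &fstest.MapFile{Data: []byte(`{"demo": true}`)},
	}
}

func main() {
	for _, ref := range []string{"flux1-schnell", "flux1-schnell.json", "missing"} {
		data, err := loadWorkflow(demoBundle(), ref)
		fmt.Println(ref, len(data), err)
	}
}
```

Taking an `fs.FS` rather than a concrete embed.FS keeps the lookup testable without real template files.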

imagen compare: cross-backend evaluation

imagen compare "a wizard casting a spell" \
  --models flux-schnell-local,flux2-klein-local,sd35-medium-local \
  --size 1024x1024 \
  --output ~/Pictures/imagen/compare

Per run, compare:

  • creates <output>/<YYYYMMDD-HHMMSS>-<prompt-slug>/
  • dispatches each named backend sequentially (mRock has one GPU; parallel would OOM) — one backend's failure doesn't abort the run
  • writes per-backend PNGs as <prompt-slug>--<backend-slug>.png
  • writes compare.json listing every attempt (success + failure) with per-model seed, latency_ms, model, vram_used_mib, full metadata map, and the error string for any failure
  • composites a contact-sheet.png with the prompt as header and each cell labelled <backend> / <latency>ms · seed <n>
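The sequential, failure-tolerant dispatch described in the list above might look like the sketch below. All names here (`Attempt`, `generateFunc`, `runCompare`, `mockBackend`) are hypothetical and the sidecar fields are reduced to three; the real compare.json carries more metadata.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Attempt is a simplified, hypothetical sidecar entry.
type Attempt struct {
	Backend   string `json:"backend"`
	LatencyMS int64  `json:"latency_ms"`
	Error     string `json:"error,omitempty"`
}

// generateFunc stands in for a backend's Generate call.
type generateFunc func(prompt string) error

// runCompare calls each backend in order, one at a time, and records every
// attempt. A failing or unknown backend is logged and the loop continues.
func runCompare(prompt string, order []string, backends map[string]generateFunc) []Attempt {
	attempts := make([]Attempt, 0, len(order))
	for _, name := range order {
		attempt := Attempt{Backend: name}
		generate, ok := backends[name]
		if !ok {
			attempt.Error = "unknown backend"
			attempts = append(attempts, attempt)
			continue
		}
		start := time.Now()
		if err := generate(prompt); err != nil {
			attempt.Error = err.Error()
		}
		attempt.LatencyMS = time.Since(start).Milliseconds()
		attempts = append(attempts, attempt)
	}
	return attempts
}

// mockBackend returns a backend that fails with the given message,
// or succeeds when the message is empty.
func mockBackend(failure string) generateFunc {
	return func(string) error {
		if failure != "" {
			return errors.New(failure)
		}
		return nil
	}
}

func main() {
	backends := map[string]generateFunc{
		"flux-schnell-local": mockBackend(""),
		"flux2-klein-local":  mockBackend("connection refused"),
		"sd35-medium-local":  mockBackend(""),
	}
	order := []string{"flux-schnell-local", "flux2-klein-local", "sd35-medium-local"}
	for _, a := range runCompare("a wizard casting a spell", order, backends) {
		fmt.Printf("%+v\n", a)
	}
}
```

A plain loop, with no goroutines, is what keeps only one model resident on the GPU at a time.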

Flags mirror generate: --seed, --steps, --style, --negative, --size are shared across all backends. --no-contact-sheet skips the composite when only the per-image PNGs and sidecar matter (e.g. for a worker script that builds its own diff view).

Diagnostics

imagen backends shows every instance with its registration state. For local ComfyUI, the status is currently just registered (we don't probe the upstream HTTP endpoint at startup — the boot-helper hint kicks in on first generation if mRock is asleep).

Per-backend errors fall into three kinds:

  1. Adapter construction failure (e.g. workflow JSON not found, missing required yaml field). Caught at buildBackend time: imagen: backend "<name>": <err>.
  2. HTTP / runtime failure during Generate. Connection-refused, no-such-host, and timeout errors are wrapped with the boot-helper hint pointing at boot-whitetower mrock, so a sleeping mRock has an obvious next step.
  3. ComfyUI workflow-validation failure (200-with-node_errors or 400). Surfaces with a model-not-found hint (matching value_not_in_list + unet_name/ckpt_name) when applicable, pointing back at this doc.
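The model-not-found match in the third case could be sketched like this. The JSON field names (`node_errors`, `errors`, `type`, `extra_info.input_name`, `extra_info.received_value`) reflect the shape of ComfyUI's validation reply as best understood here and should be treated as an assumption; `modelNotFoundHint` and the hint wording are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// validationError mirrors one entry of a node's "errors" list (assumed shape).
type validationError struct {
	Type      string `json:"type"`
	ExtraInfo struct {
		InputName     string `json:"input_name"`
		ReceivedValue any    `json:"received_value"`
	} `json:"extra_info"`
}

// promptReply keeps only the part of the /prompt reply this sketch needs.
type promptReply struct {
	NodeErrors map[string]struct {
		Errors []validationError `json:"errors"`
	} `json:"node_errors"`
}

// modelNotFoundHint returns a hint when the reply contains a
// value_not_in_list error on a unet_name or ckpt_name input.
func modelNotFoundHint(body []byte) (string, bool) {
	var reply promptReply
	if err := json.Unmarshal(body, &reply); err != nil {
		return "", false // not the shape we expect: no hint
	}
	for nodeID, node := range reply.NodeErrors {
		for _, e := range node.Errors {
			if e.Type != "value_not_in_list" {
				continue
			}
			switch e.ExtraInfo.InputName {
			case "unet_name", "ckpt_name":
				return fmt.Sprintf(
					"node %s: model %v is not installed on the ComfyUI host; see docs/backends.md for download paths",
					nodeID, e.ExtraInfo.ReceivedValue), true
			}
		}
	}
	return "", false
}

func main() {
	body := []byte(`{"node_errors": {"12": {"errors": [
		{"type": "value_not_in_list",
		 "extra_info": {"input_name": "unet_name", "received_value": "flux1-schnell.safetensors"}}
	]}}}`)
	hint, ok := modelNotFoundHint(body)
	fmt.Println(ok, hint)
}
```

Matching on both the error type and the input name keeps the hint from firing on unrelated list errors, such as an unsupported sampler name.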

Worker daemon notes

imagen worker (the imagen.jobs queue consumer) uses the same adapter and workflow lookup as the synchronous CLI: flexsiebels' /imagine UI INSERTs a backend = <instance> row, the worker claims it, and the underlying ComfyUI HTTP calls are identical to what generate makes. No worker-specific changes are required when a new backend lands; the config + workflow are the only state that has to be present on the worker host.

After merging a new template or yaml block:

# On the worker host (mRiver today):
systemctl --user restart imagen-worker

The daemon-rebuild trap from issue #9 still applies: if you build the imagen binary on the dev machine and scp it over, restart the unit so systemd picks up the new ELF.