mAi: #10 - multi-model backend expansion (workflow templates + compare harness)
Path 1 architecture: one comfyui adapter, workflows as data.
- workflow_template.go: embed.FS + token substitution with type-preserving
whole-value placeholders. ${prompt} → string, ${seed} → int64,
${cfg} → float64 — no JSON round-tripping. Partial matches ignored.
- comfyui.go: refactored to load workflow from embedded FS or filesystem
path. Back-compat preserved: workflow: defaults to flux1-schnell.
- workflows/{flux1-schnell,flux2-klein,sd35-medium}.json — bundled
templates. flux1-schnell migrated from hardcoded with identical node IDs.
- compare.go: new `imagen compare` subcommand. Sequential N-backend run
(one GPU on mRock — parallel would OOM), per-backend PNG, sidecar JSON
with per-model metadata + errors, composite contact sheet via Go image
package (no ImageMagick dep).
- Sample config gains flux2-klein-local + sd35-medium-local instances.
- docs/backends.md: architecture rationale + per-model HF download paths
+ how to add a new bundled workflow + compare-harness reference.
Live smoke verified: compare mock + flux-schnell-local at 768×768 →
both PNGs written, sidecar JSON has workflow="flux1-schnell" + full
metadata, contact sheet renders. Worker contract (Request → Generate)
unchanged, so flexsiebels /imagine UI API surface preserved.
Tests: 11 existing comfyui + 6 new workflow_template + 5 new compare
tests, all green.
Adding a new model is now yaml + JSON, never Go.
310
docs/backends.md
Normal file
@@ -0,0 +1,310 @@
# ImaGen backends

This document covers the local-ComfyUI backend plug-in story: how adapters
are layered, how to add a new model without touching Go, and the per-model
setup steps for the bundled templates.

For the host-side ComfyUI install (mRock: venv, weights for the default
FLUX.1-schnell, systemd, VRAM coexistence with Ollama, smoke test against
the raw HTTP API), see [`setup-comfyui-mrock.md`](setup-comfyui-mrock.md).

## Architecture: Path 1 (workflow-template adapter)

`imagen generate` and `imagen compare` dispatch through the `comfyui`
adapter, which holds the HTTP plumbing (`/prompt`, `/history/{id}`, `/view`,
`/system_stats`) and treats the workflow itself as data. Each backend
instance in `imagen.yaml` picks a workflow JSON via the `workflow:` key.
Adding a new model is yaml + JSON, never Go:

```
internal/backend/
  comfyui.go               # one adapter, all ComfyUI models
  workflow_template.go     # loader + token-substitution
  workflows/
    flux1-schnell.json     # bundled templates (embedded with //go:embed)
    flux2-klein.json
    sd35-medium.json
```

### Why Path 1 over per-family adapters (`comfyui-flux.go`, `comfyui-sd3.go`…)

- **Workflow JSON is the natural exchange format**. ComfyUI users export
  workflows from its GUI as JSON. Anything else means rebuilding the graph
  by hand in Go for every new model.
- **Adding a model is a config change, not a build change**. With Path 2
  (one adapter per model family), every new family is a Go file, a new test
  file, a registry entry, a new worker binary, and a redeploy. Path 1 lets us
  land a new model with one yaml block + one JSON file + one section in this
  doc.
- **The HTTP plumbing is identical across families**. `/prompt`,
  `/history`, `/view`, the retry policy, the "value not in list" hint, and
  VRAM reporting do not depend on the workflow shape. Path 2 would
  duplicate all of that across files.
- **Failure isolation stays clean**. The workflow loader fails at adapter
  construction (`imagen backends` surfaces the error), the HTTP layer
  fails at `Generate`, and ComfyUI's own validation surfaces missing-model
  hints. Each layer's error message points at the right config knob.

Path 2's argument was "each family owns its quirks (samplers, schedulers,
dual-stage etc.)". That argument doesn't survive contact with the
substitution-map design: per-family knobs are just key/value fields in the
yaml block and `${shift}`/`${guidance}`/`${cfg}` placeholders in the
template. No code duplication, no inheritance to debug.

### Token substitution

`workflow_template.SubstituteWorkflow` walks the parsed JSON and replaces
every whole-value string of the form `"${key}"` with the typed value from
the substitution map. Numbers stay numbers and strings stay strings, with no
round-tripping through `strings.Replace`.

The substitution map is built per call from:

1. **Request fields** (always present): `${prompt}`, `${negative}`,
   `${width}`, `${height}`, `${seed}`, `${steps}`, `${sampler}`,
   `${scheduler}`, `${cfg}`.
2. **Every scalar field from the yaml block** (string / int / int64 /
   float64 / bool), minus framework keys (`type`, `base_url`, `workflow`,
   `default_*`). So `${vae}`, `${clip}`, `${clip_l}`, `${clip_t5}`,
   `${dtype}`, `${shift}`, `${guidance}` all become substitutable just by
   being in yaml.
3. **Sensible defaults** for the common optional knobs above, so a
   workflow that references `${dtype}` without the user setting one in
   yaml still substitutes cleanly (`fp8_e4m3fn` for the FLUX `dtype`, `3.0`
   for the SD3 `shift`, etc.). Extra defaults are ignored by workflows that
   don't reference them.
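
The three sources above can be layered into one map. The following is a minimal sketch, not the shipped code: the `Request` struct, `buildSubstitutions`, and `isFrameworkKey` are hypothetical names, and the precedence (request fields over yaml scalars over defaults) is an assumption that the text does not spell out.

```go
package main

import (
	"fmt"
	"strings"
)

// Request is a hypothetical stand-in for the per-call fields listed above.
type Request struct {
	Prompt, Negative   string
	Width, Height      int
	Seed               int64
	Steps              int
	Sampler, Scheduler string
	CFG                float64
}

// isFrameworkKey reports whether a yaml key is reserved for the adapter
// and must not leak into the substitution map.
func isFrameworkKey(k string) bool {
	switch k {
	case "type", "base_url", "workflow":
		return true
	}
	return strings.HasPrefix(k, "default_")
}

// buildSubstitutions layers defaults, yaml scalars, and request fields.
// Later layers overwrite earlier ones (assumed precedence).
func buildSubstitutions(req Request, yamlBlock map[string]any) map[string]any {
	// Layer 1: defaults for optional knobs (values taken from the text).
	subs := map[string]any{"dtype": "fp8_e4m3fn", "shift": 3.0}

	// Layer 2: scalar yaml fields only; lists and maps are skipped.
	for k, v := range yamlBlock {
		if isFrameworkKey(k) {
			continue
		}
		switch v.(type) {
		case string, int, int64, float64, bool:
			subs[k] = v
		}
	}

	// Layer 3: request fields, always present.
	subs["prompt"] = req.Prompt
	subs["negative"] = req.Negative
	subs["width"] = req.Width
	subs["height"] = req.Height
	subs["seed"] = req.Seed
	subs["steps"] = req.Steps
	subs["sampler"] = req.Sampler
	subs["scheduler"] = req.Scheduler
	subs["cfg"] = req.CFG
	return subs
}

func main() {
	req := Request{Prompt: "a wizard casting a spell", Width: 1024, Height: 1024,
		Seed: 42, Steps: 4, Sampler: "euler", Scheduler: "simple", CFG: 1.0}
	yamlBlock := map[string]any{
		"type": "comfyui", "base_url": "http://mrock:8188", "workflow": "flux2-klein",
		"model": "flux-2-klein-base-4b-fp8.safetensors", "guidance": 4.0, "default_steps": 4,
	}
	subs := buildSubstitutions(req, yamlBlock)
	fmt.Println(subs["model"], subs["guidance"], subs["dtype"], subs["seed"])
}
```

Because `model` is an ordinary scalar in the yaml block, it becomes `${model}` with no special casing in this sketch.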

Partial matches (e.g. `"prefix ${prompt} suffix"`) are deliberately **not**
substituted: the placeholder must be the entire value so we can preserve
its JSON type. This prevents a prompt containing literal `${seed}` text
from corrupting the workflow.

Unknown placeholders (referenced in JSON but missing from the substitution
map) error out before the workflow leaves the binary.
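
To make the whole-value rule concrete, here is a minimal sketch of a type-preserving walker over a decoded JSON tree. It is an illustration under assumptions, not the body of `SubstituteWorkflow`: the names `substitute` and `placeholderKey` are hypothetical, and the rule for what counts as a valid key (no spaces, no nested `$`, `{`, `}`) is a guess.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// placeholderKey returns the key when s is exactly "${key}" and nothing else.
func placeholderKey(s string) (string, bool) {
	if len(s) > 3 && strings.HasPrefix(s, "${") && strings.HasSuffix(s, "}") {
		key := s[2 : len(s)-1]
		if !strings.ContainsAny(key, "${} ") {
			return key, true
		}
	}
	return "", false
}

// substitute returns a copy of node with whole-value placeholders replaced
// by the typed values in subs. Partial matches are left untouched; unknown
// placeholders are an error.
func substitute(node any, subs map[string]any) (any, error) {
	switch v := node.(type) {
	case map[string]any:
		out := make(map[string]any, len(v))
		for k, child := range v {
			r, err := substitute(child, subs)
			if err != nil {
				return nil, err
			}
			out[k] = r
		}
		return out, nil
	case []any:
		out := make([]any, len(v))
		for i, child := range v {
			r, err := substitute(child, subs)
			if err != nil {
				return nil, err
			}
			out[i] = r
		}
		return out, nil
	case string:
		key, ok := placeholderKey(v)
		if !ok {
			return v, nil // plain string or partial match
		}
		val, found := subs[key]
		if !found {
			return nil, fmt.Errorf("unknown placeholder %q", v)
		}
		return val, nil
	default:
		return v, nil // numbers, bools, null pass through
	}
}

func main() {
	tmpl := `{"3": {"inputs": {"text": "${prompt}", "seed": "${seed}",
		"cfg": "${cfg}", "note": "prefix ${prompt} suffix"}}}`
	var wf any
	if err := json.Unmarshal([]byte(tmpl), &wf); err != nil {
		panic(err)
	}
	out, err := substitute(wf, map[string]any{"prompt": "a wizard", "seed": int64(42), "cfg": 1.5})
	if err != nil {
		panic(err)
	}
	b, _ := json.Marshal(out)
	fmt.Println(string(b))
}
```

In the printed result, `seed` and `cfg` are JSON numbers rather than quoted strings, and the `note` value keeps its literal `${prompt}` text.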

### Back-compat

The `workflow:` field defaults to `flux1-schnell` if omitted. Existing
yaml blocks like the pre-#10 FLUX.1-schnell instance:

```yaml
flux-schnell-local:
  type: comfyui
  base_url: http://mrock:8188
  model: flux1-schnell.safetensors
```

still work unchanged: they implicitly pick up the migrated
`flux1-schnell.json` template, which keeps the same node IDs (6, 8, 9, 10,
11, 12, 13, 27, 30, 31) as the historical hardcoded workflow.

## Bundled workflows

### FLUX.1-schnell: the back-compat default

| Field | Default | Notes |
|---|---|---|
| `model` | `flux1-schnell.safetensors` | drop in `models/unet/` |
| `vae` | `ae.safetensors` | `models/vae/` |
| `clip_l` | `clip_l.safetensors` | `models/clip/` |
| `clip_t5` | `t5xxl_fp8_e4m3fn.safetensors` | `models/clip/` |
| `dtype` | `fp8_e4m3fn` | weight dtype for the UNet loader |
| `default_steps` / `default_cfg` | 4 / 1.0 | schnell is distilled to ~4 steps |

VRAM peak ~10–12 GB at 1024×1024. Install path:
[`setup-comfyui-mrock.md`](setup-comfyui-mrock.md). Already shipping.

### FLUX.2 [klein] 4B: direct upgrade

Released by Black Forest Labs in late 2025 / early 2026 under the BFL
non-commercial license. The distilled 4B "klein" variant lands sub-second on
the RTX 4070 Ti SUPER and shares the new Qwen-based text encoder + a
re-trained VAE with the larger family.

```yaml
flux2-klein-local:
  type: comfyui
  base_url: http://mrock:8188
  workflow: flux2-klein
  model: flux-2-klein-base-4b-fp8.safetensors   # models/unet/
  vae: flux2-vae.safetensors                    # models/vae/
  clip: qwen_3_4b.safetensors                   # models/text_encoders/
  dtype: fp8_e4m3fn
  default_steps: 4
  default_cfg: 1.0
  guidance: 4.0
```

**Model downloads** (on mRock, ungated mirrors when available):

```bash
cd ~/dev/comfyui/models
curl -L -o unet/flux-2-klein-base-4b-fp8.safetensors \
  https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/flux-2-klein-base-4b-fp8.safetensors
curl -L -o vae/flux2-vae.safetensors \
  https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/flux2-vae.safetensors
mkdir -p text_encoders
curl -L -o text_encoders/qwen_3_4b.safetensors \
  https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/qwen_3_4b.safetensors
```

BFL's primary repo is gated; if `curl` returns 401, configure an HF token
in `~/.cache/huggingface/token` (plain `curl` does not read that file, so
pass it along as `-H "Authorization: Bearer $(cat ~/.cache/huggingface/token)"`)
or use one of the community mirrors (check the official model card for the
current list). The filenames the template references match BFL's canonical
names; rename downloads to match if a mirror uses different ones.

VRAM peak: ~8.5 GB (4B fp8). With Ollama parked at ~8 GB this still fits;
unlike FLUX.1-schnell, klein doesn't require stopping Ollama on mRock.

### SD3.5-medium: single-checkpoint variant

Stability AI's 2.5B mid-size model with bundled text encoders. The
`incl_clips_t5xxlfp8scaled` variant ships clip_g + clip_l + t5xxl_fp8 all
in one `.safetensors`, so the workflow uses `CheckpointLoaderSimple`
instead of separate UNet/VAE/CLIP loaders.

```yaml
sd35-medium-local:
  type: comfyui
  base_url: http://mrock:8188
  workflow: sd35-medium
  model: sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors   # models/checkpoints/
  default_steps: 28
  default_sampler: dpmpp_2m
  default_scheduler: sgm_uniform
  default_cfg: 4.5
  shift: 3.0
```

**Model download** (on mRock):

```bash
cd ~/dev/comfyui/models
curl -L -o checkpoints/sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors \
  https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/resolve/main/sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors
```

VRAM peak: ~9.9 GB at 1024×1024. Same envelope as FLUX.1-schnell: stop
Ollama before generating, restart after.

## Adding a new bundled workflow

1. **Export from ComfyUI**: load the model in the ComfyUI GUI, build a
   text-to-image workflow that produces what you want, then choose "Save
   (API Format)". The exported file already has the right shape.
2. **Sprinkle placeholders**: open the JSON and replace per-call values
   with `${name}` tokens. Whole-value substitution only. In this fragment
   the exported literals were `"a cat sitting on a chair"` for `text`,
   `1234567` for `seed`, and `28` for `steps`:

   ```json
   "inputs": {
     "text": "${prompt}",
     "seed": "${seed}",
     "steps": "${steps}",
     "cfg": "${cfg}",
     "sampler_name": "${sampler}",
     "scheduler": "${scheduler}",
     "width": "${width}",
     "height": "${height}"
   }
   ```

   Use `${model}` for the checkpoint / unet filename and any per-template
   knobs (`${vae}`, `${shift}`, `${guidance}`, `${clip}` …).
3. **Drop it into `internal/backend/workflows/<name>.json`**. The
   `//go:embed workflows/*.json` directive in `workflow_template.go`
   picks it up at build time, so no registry entry is needed.
4. **Add a yaml instance** in `internal/config/config.go`'s `Sample` block
   for `imagen config init` (and `~/.config/imagen.yaml`) so users
   discover the new backend.
5. **Document the model files + HF download URLs** in this doc.
6. **Smoke test**: `imagen generate "test" --backend <new-instance>
   --size 1024x1024` should produce an image.

Per-call overrides go via `--steps` and `--seed` on the CLI; sampler,
scheduler, and cfg are overridden programmatically via
`backend.Request.BackendOpts["sampler"]` / `["scheduler"]` / `["cfg"]`.
The compare harness forwards the constant-across-backends knobs verbatim.

## Loading a workflow from disk (one-off)

Pass an absolute filesystem path as `workflow:` and the adapter reads it
from disk instead of the embedded FS. Handy for prototyping a new model
before committing it:

```yaml
my-experimental:
  type: comfyui
  base_url: http://mrock:8188
  workflow: /home/m/dev/comfyui/workflows/my-test.json
  model: my-test-model.safetensors
```

The fallback chain is: filesystem path (if the string looks like a path
or ends in `.json`), then bundled lookup by name, then bundled lookup
with `.json` appended.

## `imagen compare`: cross-backend evaluation

```bash
imagen compare "a wizard casting a spell" \
  --models flux-schnell-local,flux2-klein-local,sd35-medium-local \
  --size 1024x1024 \
  --output ~/Pictures/imagen/compare
```

Per run, `compare`:

- creates `<output>/<YYYYMMDD-HHMMSS>-<prompt-slug>/`
- dispatches each named backend sequentially (mRock has one GPU; parallel
  would OOM), and one backend's failure doesn't abort the run
- writes per-backend PNGs as `<prompt-slug>--<backend-slug>.png`
- writes `compare.json` listing every attempt (success + failure) with
  per-model `seed`, `latency_ms`, `model`, `vram_used_mib`, full
  `metadata` map, and the error string for any failure
- composites a `contact-sheet.png` with the prompt as header and each
  cell labelled `<backend>` / `<latency>ms · seed <n>`

Flags mirror `generate`: `--seed`, `--steps`, `--style`, `--negative`,
`--size` are shared across all backends. `--no-contact-sheet` skips the
composite when only the per-image PNGs and sidecar matter (e.g. for a
worker script that builds its own diff view).

## Diagnostics

`imagen backends` shows every instance with its registration state. For
local ComfyUI, the status is currently just `registered` (we don't probe
the upstream HTTP endpoint at startup; the boot-helper hint kicks in on
first generation if mRock is asleep).

Per-backend errors come in at most three kinds:

1. **Adapter construction failure** (e.g. workflow JSON not found,
   missing required yaml field). Caught at `buildBackend` time:
   `imagen: backend "<name>": <err>`.
2. **HTTP / runtime failure during Generate**. Wrapped with the boot
   helper for `connection refused`/`no such host`/timeouts pointing at
   `boot-whitetower mrock` so a sleeping mRock has an obvious next step.
3. **ComfyUI workflow-validation failure** (200-with-node_errors or 400).
   Surfaces with a model-not-found hint (matching `value_not_in_list` +
   `unet_name`/`ckpt_name`) when applicable, pointing back at this doc.

## Worker daemon notes

`imagen worker` (the `imagen.jobs` queue consumer) uses the same adapter
+ workflow lookup as the synchronous CLI: flexsiebels' `/imagine` UI
INSERTs a `backend = <instance>` row, the worker claims it, and the
underlying ComfyUI HTTP calls are identical to what `generate` makes. No
worker-specific changes are required when a new backend lands; the
config + workflow are the only state that has to be present on the
worker host.

After merging a new template or yaml block:

```bash
# On the worker host (mRiver today):
systemctl --user restart imagen-worker
```

The daemon-rebuild trap from issue #9 still applies: if you build the
imagen binary on the dev machine and `scp` it over, restart the unit so
systemd picks up the new ELF.