Multi-model backend expansion + compare harness #10

Open
opened 2026-05-11 15:16:10 +00:00 by mAi · 3 comments
Collaborator

Goal

m: "small platform where I can easily use new models without changing my frontend look and feel"

Test the plug-and-play promise of the imagen platform by adding 3–4 new local models behind the existing CLI / HTTP / skill API, plus an imagen compare harness that runs the same prompt across all enabled backends side-by-side. flexsiebels /imagine UI must stay untouched: that is the deliverable, not a side effect.

Context

State after #1–#9:

  • comfyui adapter is hardcoded to a FLUX.1-schnell workflow (internal/backend/comfyui.go)
  • replicate adapter merged but unused (no token)
  • mock for testing
  • flexsiebels /imagine UI is a pure consumer of imagen via imagen.jobs queue — backend swap should be invisible

The 2026 landscape moved significantly since the FLUX.1 baseline. May 2026 web research surfaced new models that fit our RTX 4070 Ti SUPER (16GB).

Candidate models

Core (do these):

| Model | Params | VRAM | License | Why |
|-------|--------|------|---------|-----|
| FLUX.2 [klein] (4B) | 4B | ~13GB | BFL non-commercial | Direct upgrade from FLUX.1, sub-second |
| HiDream-O1-Image-Dev | 8B | ~14GB | MIT | Released 2026-05-08, claims to beat FLUX.2 Dev at 7× smaller. Native 2K, drops external text encoder |
| SD3.5-medium | 2.5B | 9.9GB | Stability community | Different aesthetic from FLUX, mature LoRA ecosystem, smallest of the lot |

Stretch (only if Core lands cleanly):

  • Z-Image-Turbo (Tongyi-MAI) — distilled, sub-second on 16GB, fast iteration loops
  • Qwen Image — SOTA text rendering (posters, signs, labels)

Scope

Part A — Backend architecture (decide before adding models)

internal/backend/comfyui.go currently embeds a fixed FLUX-schnell workflow. To plug new model families cleanly, choose ONE:

Path 1 (lean general): generic comfyui-workflow adapter

  • New workflow_template: path/to/workflow.json field per yaml instance
  • Substitute ${prompt}, ${negative}, ${seed}, ${steps}, ${width}, ${height}, ${model} slots
  • Each new model = yaml instance + workflow JSON under internal/backend/workflows/
  • Keep comfyui (hardcoded FLUX-schnell) for back-compat or migrate it to a workflow file
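
One way to keep Path 1 honest is a small lint that checks a workflow JSON only uses the slots listed above. The sketch below is an assumption about how such a check might look, not code from the repository; `unknownSlots` and `knownSlots` are hypothetical names, and the regular expression is a rough text-level match rather than a full JSON walk.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// slotRE matches a JSON string whose entire value is one ${name} placeholder.
var slotRE = regexp.MustCompile(`"\$\{([a-z_]+)\}"`)

// knownSlots mirrors the slot list proposed for Path 1 in this issue.
var knownSlots = map[string]bool{
	"prompt": true, "negative": true, "seed": true, "steps": true,
	"width": true, "height": true, "model": true,
}

// unknownSlots returns the sorted, de-duplicated placeholder names found in
// tmpl that are not in knownSlots.
func unknownSlots(tmpl string) []string {
	seen := map[string]bool{}
	for _, m := range slotRE.FindAllStringSubmatch(tmpl, -1) {
		if !knownSlots[m[1]] {
			seen[m[1]] = true
		}
	}
	out := make([]string, 0, len(seen))
	for name := range seen {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func main() {
	tmpl := `{"a":{"text":"${prompt}"},"b":{"seed":"${seed}","sampler":"${sampler}"}}`
	fmt.Println(unknownSlots(tmpl))
}
```

A template that needs a slot outside the list would then either extend the list or move that value into a per-instance yaml knob.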

Path 2 (explicit): one adapter per model family

  • comfyui-flux.go, comfyui-sd3.go, comfyui-hidream.go
  • Each owns its family's quirks (samplers, schedulers, dual-stage etc.)

Worker decides after wiring the first new model. Document the choice in a docs/backends.md rationale section.

Part B — Add the models

For each candidate (Core first):

  1. Land the ComfyUI workflow JSON (export from a working ComfyUI session on mRock)
  2. Add yaml block to sample config
  3. Document the model files (HF repo + path under ~/ComfyUI/models/...)
  4. Smoke test: imagen generate "test" --backend <new-instance> produces an image
  5. Latency + VRAM peak recorded in docs/backends.md

Part C — Compare harness

imagen compare "a wizard casting a spell" \
  --models flux1-schnell,flux2-klein,hidream-o1,sd35-medium \
  --output ~/Pictures/compare/
  • Runs the prompt against each named backend sequentially (mRock has one GPU, serial is correct)
  • Writes each image with a backend-suffixed filename
  • Builds a contact-sheet PNG (Go image.Image composite, no ImageMagick dependency)
  • Writes one sidecar JSON listing all generations + per-model metadata (latency, seed, model file, VRAM peak if available)
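
The contact-sheet step can be done with the standard library alone. The following is a minimal sketch under stated assumptions: `contactSheet`, `solid`, and `demoSheet` are illustrative names, cells are tiled in a single row, and text labels are left out because the standard library ships no font rasterizer.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/draw"
	"image/png"
)

// contactSheet tiles cells left to right on a white canvas, with pad pixels
// between cells and around the border.
func contactSheet(cells []image.Image, pad int) *image.RGBA {
	width, height := pad, 0
	for _, c := range cells {
		b := c.Bounds()
		width += b.Dx() + pad
		if b.Dy() > height {
			height = b.Dy()
		}
	}
	sheet := image.NewRGBA(image.Rect(0, 0, width, height+2*pad))
	draw.Draw(sheet, sheet.Bounds(), image.White, image.Point{}, draw.Src)

	x := pad
	for _, c := range cells {
		b := c.Bounds()
		dst := image.Rect(x, pad, x+b.Dx(), pad+b.Dy())
		draw.Draw(sheet, dst, c, b.Min, draw.Src)
		x += b.Dx() + pad
	}
	return sheet
}

// solid returns a w×h image filled with one opaque colour (stand-in for a
// generated picture).
func solid(w, h int, r, g, b uint8) image.Image {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	fill := image.NewUniform(color.RGBA{R: r, G: g, B: b, A: 255})
	draw.Draw(img, img.Bounds(), fill, image.Point{}, draw.Src)
	return img
}

// demoSheet builds a two-cell sheet from placeholder images.
func demoSheet() *image.RGBA {
	return contactSheet([]image.Image{
		solid(64, 64, 200, 30, 30),
		solid(64, 48, 30, 30, 200),
	}, 8)
}

func main() {
	var buf bytes.Buffer
	if err := png.Encode(&buf, demoSheet()); err != nil {
		panic(err)
	}
	fmt.Printf("sheet %v, %d bytes of PNG\n", demoSheet().Bounds(), buf.Len())
}
```

Drawing a backend label under each cell would need a font package outside the standard library, which is a separate decision from the no-ImageMagick constraint.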

Part D — Optional: tie into series

imagen generate --series --models a,b,c writes the comparison as a real imagen.series row so it surfaces in flexsiebels /imagine/series/<id>. Decide during implementation — useful or scope-creep?

Acceptance

  • Architecture decision (Path 1 vs Path 2) documented in docs/backends.md
  • At least 2 Core backends working end-to-end (FLUX.2-klein + one of HiDream-O1 / SD3.5)
  • imagen compare produces side-by-side output + contact sheet
  • flexsiebels /imagine UI unchanged and still works against the new backends (smoke test the live site)
  • Sample config updated with new instances
  • docs/backends.md covers setup per model + chosen architecture rationale
  • systemctl --user restart imagen-worker after merge (daemon-rebuild trap from #9)

Out of scope

  • FLUX.2 [dev] (32B, too big for 16GB)
  • HunyuanImage 3.0 (80B MoE, server-class only)
  • Video models (LTX-2, Wan)
  • New cloud backends (DALL-E, fal.ai) — separate issues if/when relevant
  • img2img — that's v4, parked

References

  • FLUX.2: https://github.com/black-forest-labs/flux2
  • HiDream-O1: https://huggingface.co/HiDream-ai (May 2026 MIT release)
  • SD3.5: https://huggingface.co/stabilityai/stable-diffusion-3.5-medium
  • Z-Image-Turbo: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
  • Daemon-rebuild trap: see #9 phase 2 + handover memory imagen/handover-2026-05-11
mAi self-assigned this 2026-05-11 15:16:10 +00:00
Author
Collaborator

Shift-1 progress checkpoint

Path 1 (workflow-template) architecture shipped:

  • internal/backend/workflow_template.go — loads + substitutes JSON templates with type-preserving placeholders (${seed} → int64, ${prompt} → string, ${cfg} → float64). Whole-value match only; partial matches are preserved literally so prompts containing the placeholder syntax don't corrupt.
  • internal/backend/comfyui.go — refactored to load workflow from embedded FS or filesystem path. Back-compat preserved: workflow: defaults to flux1-schnell so existing configs keep working.
  • Added bundled templates: flux1-schnell.json (migrated from hardcoded), flux2-klein.json (FLUX.2 Klein 4B w/ EmptyFlux2LatentImage + CLIPLoader type=flux2 + FluxGuidance), sd35-medium.json (CheckpointLoaderSimple bundled-clips variant + ModelSamplingSD3 + KSampler dpmpp_2m/sgm_uniform/cfg 4.5).
  • Sample config (internal/config/config.go) gains flux2-klein-local and sd35-medium-local instances. Per-template knobs (vae, clip, clip_l, clip_t5, dtype, shift, guidance) live in yaml — adding a new model is yaml + JSON, no Go code.
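
The whole-value, type-preserving substitution described in the first bullet can be sketched as follows. This is an illustration under assumptions, not the contents of workflow_template.go: `substitute` and `render` are hypothetical names, and the node IDs and input names in the sample template are made up for the demo.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// substitute walks a decoded JSON value and replaces any string that is
// exactly "${name}" with params[name], keeping the parameter's Go type.
// Strings that merely contain a placeholder are returned unchanged, and
// substituted values are never scanned again.
func substitute(v any, params map[string]any) any {
	switch t := v.(type) {
	case map[string]any:
		out := make(map[string]any, len(t))
		for k, child := range t {
			out[k] = substitute(child, params)
		}
		return out
	case []any:
		out := make([]any, len(t))
		for i, child := range t {
			out[i] = substitute(child, params)
		}
		return out
	case string:
		if len(t) > 3 && t[:2] == "${" && t[len(t)-1] == '}' {
			if val, ok := params[t[2:len(t)-1]]; ok {
				return val
			}
		}
		return t
	default:
		return v
	}
}

// render decodes a template, substitutes placeholders, and re-encodes it.
func render(tmpl []byte, params map[string]any) ([]byte, error) {
	var doc any
	if err := json.Unmarshal(tmpl, &doc); err != nil {
		return nil, fmt.Errorf("parse template: %w", err)
	}
	return json.Marshal(substitute(doc, params))
}

func main() {
	tmpl := []byte(`{"1":{"inputs":{"text":"${prompt}","note":"uses ${prompt} inline"}},
	                 "2":{"inputs":{"seed":"${seed}","cfg":"${cfg}"}}}`)
	out, err := render(tmpl, map[string]any{
		"prompt": "a wizard holding a sign that says ${seed}",
		"seed":   int64(1234567),
		"cfg":    4.5,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because substitution happens on the decoded tree rather than on the raw text, a prompt that itself contains `${seed}` ends up as an ordinary JSON string and cannot change the structure of the workflow.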

Tests: 11 existing comfyui tests + 6 new workflow_template tests all green (go test ./...).

Next:

  1. imagen compare subcommand — sequential N-backend run + contact-sheet PNG + sidecar JSON.
  2. docs/backends.md — architecture rationale + per-model HF download paths + VRAM/latency.
  3. Smoke test flexsiebels /imagine UI against the new backends (the deliverable: UI must stay unchanged).
Author
Collaborator

Shift-1 complete

Branch: mai/hermes/issue-10-multi-model
Commit: 8435817ce1949c9334b3e47c1dcf3889b5dcf069

Architecture decision — Path 1 (workflow-template adapter)

One comfyui adapter, workflows as data. Adding a new model is yaml + JSON — never Go. Rationale + comparison documented in docs/backends.md.
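
A rough sketch of how a single adapter might resolve its workflow: an empty setting falls back to flux1-schnell, a path is read from disk, and anything else is looked up among the bundled templates. The function name `loadWorkflow`, the path heuristic, and the `workflows/` prefix are assumptions for illustration; `fstest.MapFS` stands in for the real `embed.FS` so the example is self-contained.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"strings"
	"testing/fstest"
)

// defaultWorkflow keeps configs written before the refactor working.
const defaultWorkflow = "flux1-schnell"

// loadWorkflow resolves a backend instance's workflow setting.
// Assumed heuristic: a value containing '/' or ending in ".json" is a
// filesystem path; any other value names a bundled template.
func loadWorkflow(bundled fs.FS, setting string) ([]byte, error) {
	if setting == "" {
		setting = defaultWorkflow
	}
	if strings.ContainsRune(setting, '/') || strings.HasSuffix(setting, ".json") {
		return os.ReadFile(setting)
	}
	data, err := fs.ReadFile(bundled, "workflows/"+setting+".json")
	if err != nil {
		return nil, fmt.Errorf("unknown bundled workflow %q: %w", setting, err)
	}
	return data, nil
}

// demoFS is an in-memory stand-in for the embedded workflows directory.
func demoFS() fs.FS {
	return fstest.MapFS{
		"workflows/flux1-schnell.json": {Data: []byte(`{"name":"flux1-schnell"}`)},
		"workflows/sd35-medium.json":   {Data: []byte(`{"name":"sd35-medium"}`)},
	}
}

func main() {
	for _, setting := range []string{"", "sd35-medium", "hidream-o1"} {
		data, err := loadWorkflow(demoFS(), setting)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> %s\n", setting, data)
	}
}
```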

What landed

  1. internal/backend/workflow_template.go: embed.FS-backed loader + token substitution with type-preserving whole-value placeholders. ${prompt} → string, ${seed} → int64, ${cfg} → float64. Partial matches like "prefix ${prompt} suffix" are deliberately left literal so a prompt containing the placeholder syntax can't corrupt the workflow.
  2. internal/backend/comfyui.go: refactored. The hardcoded FLUX-schnell graph migrated to workflows/flux1-schnell.json with identical node IDs (6, 8, 9, 10, 11, 12, 13, 27, 30, 31). Back-compat preserved: workflow: defaults to flux1-schnell so existing configs keep working unchanged.
  3. Bundled templates: flux2-klein.json (FLUX.2 Klein 4B w/ EmptyFlux2LatentImage + CLIPLoader type=flux2 + FluxGuidance), sd35-medium.json (CheckpointLoaderSimple bundled-clips variant + ModelSamplingSD3 + KSampler dpmpp_2m/sgm_uniform/cfg 4.5).
  4. cmd/imagen/compare.go: new subcommand. Sequential N-backend run (one GPU on mRock; parallel would OOM), per-backend PNG, sidecar JSON with per-model metadata + errors, composite contact sheet via the Go image package (no ImageMagick dep).
  5. Sample config: flux2-klein-local + sd35-medium-local instances added to imagen config init output.
  6. docs/backends.md: Path 1 vs Path 2 rationale + per-model HF download paths + how-to-add-a-new-workflow + compare-harness reference.
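
The sequential run with per-backend errors in the sidecar (item 4) might look roughly like this. It is a sketch under assumptions rather than the code in compare.go: the `generateFunc` type, the JSON field names, and the output filename pattern are all hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// generateFunc is a stand-in for a backend's generate call.
type generateFunc func(prompt string, seed int64) ([]byte, error)

// result is one sidecar entry; a failed backend keeps its row and an error.
type result struct {
	Backend   string `json:"backend"`
	File      string `json:"file,omitempty"`
	LatencyMS int64  `json:"latency_ms"`
	Seed      int64  `json:"seed"`
	Error     string `json:"error,omitempty"`
}

type sidecar struct {
	Prompt  string   `json:"prompt"`
	Results []result `json:"results"`
}

// compare runs the prompt against each named backend one after another.
// A failure is recorded and the loop continues with the next backend.
func compare(prompt string, seed int64, names []string, backends map[string]generateFunc) sidecar {
	sc := sidecar{Prompt: prompt}
	for _, name := range names {
		r := result{Backend: name, Seed: seed}
		gen, ok := backends[name]
		if !ok {
			r.Error = "unknown backend"
			sc.Results = append(sc.Results, r)
			continue
		}
		start := time.Now()
		img, err := gen(prompt, seed)
		r.LatencyMS = time.Since(start).Milliseconds()
		if err != nil {
			r.Error = err.Error()
		} else {
			r.File = fmt.Sprintf("compare-%s.png", name)
			_ = img // a real harness would write these bytes to r.File
		}
		sc.Results = append(sc.Results, r)
	}
	return sc
}

// demoBackends returns one working and one failing placeholder backend.
func demoBackends() map[string]generateFunc {
	return map[string]generateFunc{
		"mock":   func(string, int64) ([]byte, error) { return []byte("png"), nil },
		"broken": func(string, int64) ([]byte, error) { return nil, errors.New("no weights") },
	}
}

func main() {
	sc := compare("a wizard casting a spell", 1234567,
		[]string{"mock", "broken", "missing"}, demoBackends())
	out, err := json.MarshalIndent(sc, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Recording the error per row instead of aborting means one backend with missing weights does not cost the results of the others.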

Tests

11 existing comfyui + 6 new workflow_template + 5 new compare tests. All green.

Live smoke proof

imagen compare "a small cat in a tiny fishbowl, photo, soft light" \
  --models mock,flux-schnell-local --size 768x768 --seed 1234567

Both backends produced PNGs. Sidecar JSON has the new workflow: "flux1-schnell" metadata key (additive — old consumers ignore unknown keys). Contact sheet rendered 1584×912 with backend label + latency under each cell. FLUX.1 latency 7.5s @ 768×768 fp8, VRAM peak 13030 MiB.
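
To illustrate why an additive metadata key is safe for older consumers, here is a small sketch assuming a consumer that decodes with Go's encoding/json defaults. The `oldMeta` struct and its fields are hypothetical; a decoder configured with `DisallowUnknownFields` would behave differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oldMeta is what a consumer written before the workflow key existed
// might decode (hypothetical field set).
type oldMeta struct {
	Backend string `json:"backend"`
	Seed    int64  `json:"seed"`
}

// decodeOld uses json.Unmarshal, which silently skips keys that have no
// matching struct field.
func decodeOld(raw []byte) (oldMeta, error) {
	var m oldMeta
	err := json.Unmarshal(raw, &m)
	return m, err
}

func main() {
	raw := []byte(`{"backend":"flux-schnell-local","seed":1234567,"workflow":"flux1-schnell"}`)
	m, err := decodeOld(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", m)
}
```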

The worker pipeline (workerPipeline.Run in cmd/imagen/worker.go) uses the same buildBackend + Request → Generate flow as the CLI just exercised. Metadata is purely additive. flexsiebels /imagine UI API surface is unchanged.

What's parked (deliberately, with operator notes in docs/backends.md)

  • FLUX.2-klein + SD3.5-medium live smoke — needs weights on mRock (FLUX.2: flux-2-klein-base-4b-fp8.safetensors + qwen_3_4b.safetensors + flux2-vae.safetensors under ~/dev/comfyui/models/; SD3.5: sd3.5_medium_incl_clips_t5xxlfp8scaled.safetensors in models/checkpoints/). Once downloaded, imagen generate --backend flux2-klein-local and --backend sd35-medium-local should produce images without any further code changes — that's the deliverable.
  • HiDream-O1 + Z-Image-Turbo + Qwen Image — stretch goals from the issue. The framework supports them today (one more workflow JSON + one more yaml block each); m can pull whichever lands first.

Deploy notes

  • m's active ~/.config/imagen.yaml doesn't yet have the new yaml blocks (the sample is what imagen config init would print). To pick up flux2-klein-local + sd35-medium-local, either re-run imagen config init > ~/.config/imagen.yaml (overwrites) or copy the new blocks manually.
  • After this merges to main, the worker on mRiver needs systemctl --user restart imagen-worker.service to pick up the new binary (daemon-rebuild trap from #9). If the worker isn't yet installed there, the rebuild lands implicitly with the next deploy.

Setting needs-review.

mAi added the needs-review label 2026-05-11 15:30:37 +00:00
Author
Collaborator

Shipped — merged into main as 7caf975 (work commit 8435817).

Architecture: Path 1 (workflow-template adapter). One comfyui adapter, workflows as data. Adding a new model is now yaml + JSON, never Go.

Bundled workflows: flux1-schnell (migrated from hardcoded), flux2-klein, sd35-medium.

Verified:

  • go build ./... clean, go test ./... 6 packages green (including 11 existing comfyui + 6 new workflow_template + 5 new compare tests)
  • Live smoke: imagen compare ran mock + flux1-schnell-local at 768×768, both PNGs written, sidecar JSON correct, contact sheet renders
  • Worker contract preserved → flexsiebels /imagine UI API surface unchanged (the deliverable)
  • imagen-worker.service restarted on the new binary at 17:32:41

Smoke-pending: flux2-klein and sd35-medium model files need to be downloaded onto mRock (paths documented in docs/backends.md). Workflows are JSON, no further Go changes needed.

Out of scope: HiDream-O1-Image-Dev, Z-Image-Turbo, Qwen Image — left as easy follow-ups (just drop a new workflow JSON + yaml block).

Label: done.

mAi added done and removed needs-review labels 2026-05-11 15:33:23 +00:00
Reference: m/ImaGen#10