First step of the model-agnostic image-generation framework. Lands the
plumbing other components (skill, ComfyUI/Replicate adapters, agents)
will plug into:
- internal/backend: Backend interface (Request/Result), thread-safe
Registry with init-time Register, plus a Mock reference adapter that
emits a deterministic gradient PNG for smoke tests.
- internal/config: YAML loader for ~/.config/imagen.yaml. Framework owns
default_backend + output settings + a per-backend block; each adapter
owns the schema below its own block via BackendSpec.Raw.
- internal/output: filename templating ({date}/{time}/{slug}/{seed}/
{backend}/{ext}), JSON metadata sidecar, --output override path.
- internal/prompt: embedded styles.yaml, style-preset suffix application.
- internal/server: 501 stub — HTTP surface lands in a follow-up issue.
- cmd/imagen: generate / backends / config (init|validate|path) / serve
/ version subcommands. Stdlib-only flag parsing with a small helper to
honour positional prompt args ahead of flags (matches the issue spec).
- Tests for output (slug, naming template, sidecar), backend (mock PNG
validity + determinism, registry build + duplicate panic), config
(round-trip + validation), prompt (style apply + unknown-style error).
- CLAUDE.md, README.md, docs/architecture.md, docs/usage.md, Makefile.
Acceptance criteria from #211:
1. go build ./... — clean
2. imagen backends — lists registered backends, exits 0
3. imagen generate "test prompt" --backend mock --output /tmp/x.png —
writes a 1024x1024 PNG plus an x.png.json sidecar
4. imagen config init | imagen config validate — round-trips cleanly
5. CLAUDE.md "Adding a new adapter" — six-step recipe
ImaGen — Project Instructions
ImaGen is a model-agnostic image-generation framework. It has a single
opinionated CLI (imagen) that dispatches to whichever backend the user
configured — local FLUX on mRock via ComfyUI today, Replicate or DALL-E
tomorrow, something else next year. The framework owns plumbing (config,
output, naming, sidecars, prompt enrichment); each adapter owns the schema
and lifecycle of its own block in ~/.config/imagen.yaml.
Architecture
cmd/imagen/        CLI shell: generate, backends, config, serve
internal/backend/  Backend interface + Registry + Mock reference impl
internal/prompt/   Style preset registry (embedded styles.yaml)
internal/output/   Filename templating, image writer, JSON sidecar
internal/config/   YAML loader, validation, sample generator
internal/server/   HTTP stub (not implemented yet; follow-up issue)
docs/              architecture.md, usage.md
Data flow for imagen generate:
1. Parse flags, load config (internal/config).
2. Resolve the requested instance name to a config block, then the block's type to a registered constructor in backend.Default.
3. Apply style preset (internal/prompt) to the prompt.
4. Call backend.Generate(ctx, Request). The adapter returns a *Result with an image stream + metadata.
5. Stream to disk via internal/output. If write_metadata_json is on, a sidecar <image>.json is written next to it.
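The type-to-constructor lookup in the data flow can be pictured with a small registry. This is a minimal sketch, not the code in internal/backend: only Register and the duplicate-registration panic come from the text above, while NewRegistry, Build, Types, the Constructor type and the trimmed-down Backend interface are names assumed for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Backend is a trimmed stand-in for the real interface (Generate omitted).
type Backend interface {
	Name() string
}

// Constructor builds one backend instance from its config block (assumed shape).
type Constructor func(name string, cfg map[string]any) (Backend, error)

// Registry maps adapter type names to constructors and is safe for concurrent use.
type Registry struct {
	mu    sync.RWMutex
	ctors map[string]Constructor
}

// NewRegistry returns an empty registry (hypothetical helper).
func NewRegistry() *Registry {
	return &Registry{ctors: make(map[string]Constructor)}
}

// Register panics on a duplicate type name so conflicts surface at init time.
func (r *Registry) Register(typeName string, c Constructor) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, dup := r.ctors[typeName]; dup {
		panic("backend: duplicate registration for type " + typeName)
	}
	r.ctors[typeName] = c
}

// Build resolves a type name to its constructor and creates the named instance.
func (r *Registry) Build(typeName, instance string, cfg map[string]any) (Backend, error) {
	r.mu.RLock()
	c, ok := r.ctors[typeName]
	r.mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("backend: unknown type %q", typeName)
	}
	return c(instance, cfg)
}

// Types lists the registered type names in sorted order.
func (r *Registry) Types() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	names := make([]string, 0, len(r.ctors))
	for n := range r.ctors {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// mock is a placeholder backend used only by this sketch.
type mock struct{ name string }

func (m mock) Name() string { return m.name }

func main() {
	reg := NewRegistry()
	reg.Register("mock", func(name string, _ map[string]any) (Backend, error) {
		return mock{name: name}, nil
	})
	b, err := reg.Build("mock", "smoke", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(b.Name(), reg.Types())
}
```

Keeping the lock inside the registry lets adapters register from init() without coordinating with each other, and the panic makes a type-name clash fail at start-up rather than at the first generate call.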
Backend contract
type Backend interface {
	Name() string
	Generate(ctx context.Context, req Request) (*Result, error)
}
Request carries the cross-backend fields (prompt, negative, size, steps,
seed, style preset, free-form BackendOpts). Result returns the image
bytes via an io.ReadCloser, the MIME type, and a metadata map (model name,
seed actually used, latency, cost-estimate, …).
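To make the contract concrete, here is a sketch of what Request and Result might look like, together with a tiny backend that returns a deterministic gradient PNG in the spirit of the Mock adapter. The field names (Width, Height, MIME, Metadata and so on), the gradientBackend type and the generateBytes helper are assumptions for illustration; the real definitions live in internal/backend and may differ.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"image"
	"image/color"
	"image/png"
	"io"
)

// Request holds the cross-backend fields; field names are assumed.
type Request struct {
	Prompt      string
	Negative    string
	Width       int
	Height      int
	Steps       int
	Seed        int64
	Style       string
	BackendOpts map[string]any
}

// Result carries the image stream, its MIME type and free-form metadata.
type Result struct {
	Image    io.ReadCloser
	MIME     string
	Metadata map[string]any
}

type Backend interface {
	Name() string
	Generate(ctx context.Context, req Request) (*Result, error)
}

// gradientBackend is a hypothetical stand-in for the Mock reference adapter.
type gradientBackend struct{ name string }

func (g gradientBackend) Name() string { return g.name }

// Generate renders a gradient that depends only on size and seed.
func (g gradientBackend) Generate(ctx context.Context, req Request) (*Result, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	if req.Width <= 0 || req.Height <= 0 {
		return nil, fmt.Errorf("%s: invalid size %dx%d", g.name, req.Width, req.Height)
	}
	img := image.NewRGBA(image.Rect(0, 0, req.Width, req.Height))
	for y := 0; y < req.Height; y++ {
		for x := 0; x < req.Width; x++ {
			img.SetRGBA(x, y, color.RGBA{
				R: uint8(x * 255 / req.Width),
				G: uint8(y * 255 / req.Height),
				B: uint8(req.Seed), // low byte of the seed tints the image
				A: 255,
			})
		}
	}
	var buf bytes.Buffer
	if err := png.Encode(&buf, img); err != nil {
		return nil, err
	}
	return &Result{
		Image:    io.NopCloser(&buf),
		MIME:     "image/png",
		Metadata: map[string]any{"seed": req.Seed},
	}, nil
}

// generateBytes drains the result stream, closing it when done.
func generateBytes(b Backend, req Request) ([]byte, error) {
	res, err := b.Generate(context.Background(), req)
	if err != nil {
		return nil, err
	}
	defer res.Image.Close()
	return io.ReadAll(res.Image)
}

func main() {
	data, err := generateBytes(gradientBackend{name: "sketch"},
		Request{Prompt: "test prompt", Width: 64, Height: 64, Seed: 7})
	if err != nil {
		panic(err)
	}
	fmt.Println("bytes written:", len(data))
}
```

Returning an io.ReadCloser rather than a byte slice lets a network adapter hand over the HTTP response body directly, so the caller streams to disk without buffering the whole image.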
Adding a new adapter
1. Create internal/backend/<adapter>.go (e.g. comfyui.go). Define a struct that holds whatever the adapter needs (HTTP client, model id, token).
2. Add a constructor func New<Adapter>(name string, cfg map[string]any) (Backend, error). Read fields from cfg; that map is the adapter's own block from imagen.yaml minus the type: key. Resolve secrets from env vars (api_token_env, api_key_env); never accept tokens inline.
3. Implement Name() (return the user-facing instance name) and Generate(ctx, Request).
4. In init() call Register("<type-name>", New<Adapter>).
5. Blank-import the package (import _ "...") from cmd/imagen/main.go if it lives in a separate package, so the init() runs.
6. Add a smoke test under internal/backend/<adapter>_test.go. Network tests should be guarded by testing.Short() or an env var.
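The recipe above might come together roughly as follows. This is a skeleton under stated assumptions: the "example" type name, the Example struct, the model key and the EXAMPLE_API_TOKEN variable are all hypothetical, and the Backend, Request, Result and Register definitions are local stand-ins so the sketch compiles on its own. The actual remote call is left out.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"os"
)

// Local stand-ins for the internal/backend types, so the sketch is self-contained.
type Request struct{ Prompt string }

type Result struct {
	Image    io.ReadCloser
	MIME     string
	Metadata map[string]any
}

type Backend interface {
	Name() string
	Generate(ctx context.Context, req Request) (*Result, error)
}

type Constructor func(name string, cfg map[string]any) (Backend, error)

var constructors = map[string]Constructor{}

// Register is a simplified stand-in for the framework's Register.
func Register(typeName string, c Constructor) {
	if _, dup := constructors[typeName]; dup {
		panic("duplicate backend type " + typeName)
	}
	constructors[typeName] = c
}

// Example holds what this hypothetical adapter needs (step 1).
type Example struct {
	name  string
	model string
	token string
}

// NewExample reads the adapter's own config block (step 2).
func NewExample(name string, cfg map[string]any) (Backend, error) {
	model, ok := cfg["model"].(string) // "model" is an assumed key
	if !ok || model == "" {
		return nil, fmt.Errorf("%s: model is required", name)
	}
	envName, ok := cfg["api_token_env"].(string)
	if !ok || envName == "" {
		return nil, fmt.Errorf("%s: api_token_env is required", name)
	}
	token := os.Getenv(envName)
	if token == "" {
		return nil, fmt.Errorf("%s: environment variable %s is not set", name, envName)
	}
	return &Example{name: name, model: model, token: token}, nil
}

// Name returns the user-facing instance name (step 3).
func (e *Example) Name() string { return e.name }

// Generate would call the remote service; the sketch stops short of that.
func (e *Example) Generate(ctx context.Context, req Request) (*Result, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return nil, errors.New(e.name + ": remote call omitted from this sketch")
}

// init registers the constructor under its type name (step 4).
func init() { Register("example", NewExample) }

func main() {
	os.Setenv("EXAMPLE_API_TOKEN", "dummy") // demo only; real tokens come from the environment
	b, err := constructors["example"]("my-example", map[string]any{
		"model":         "demo",
		"api_token_env": "EXAMPLE_API_TOKEN",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(b.Name())
}
```

Failing in the constructor rather than in Generate means a misconfigured block is reported when the backend is built, before any prompt is processed.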
Config
~/.config/imagen.yaml (override with --config). Top-level keys:
- default_backend: instance name used when --backend is omitted.
- output.directory / output.naming / output.write_metadata_json.
- backends: map of instance-name → {type, …adapter-specific…}.
The framework parses type and stuffs the rest into BackendSpec.Raw. The
adapter is free to define any schema it likes inside its block.
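The split between the framework-owned type key and the adapter-owned remainder can be sketched as below. BackendSpec.Raw is named in the text; the Type field, the map[string]any representation of Raw and the splitSpec helper are assumptions, and the block is passed in as an already-decoded map so the sketch stays stdlib-only.

```go
package main

import "fmt"

// BackendSpec separates what the framework reads from what the adapter owns.
// The exact field set is assumed for this sketch.
type BackendSpec struct {
	Type string
	Raw  map[string]any
}

// splitSpec pulls "type" out of a decoded config block and keeps the rest as Raw.
func splitSpec(instance string, block map[string]any) (BackendSpec, error) {
	t, ok := block["type"].(string)
	if !ok || t == "" {
		return BackendSpec{}, fmt.Errorf("backends.%s: missing or non-string type", instance)
	}
	raw := make(map[string]any, len(block))
	for k, v := range block {
		if k != "type" {
			raw[k] = v // copied untouched; the adapter validates these keys
		}
	}
	return BackendSpec{Type: t, Raw: raw}, nil
}

func main() {
	spec, err := splitSpec("flux-dev-replicate", map[string]any{
		"type":          "replicate",
		"api_token_env": "REPLICATE_API_TOKEN",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(spec.Type, len(spec.Raw))
}
```

Because the framework never interprets Raw, adding a new adapter does not require touching the config package.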
Credentials
Never hardcode. Always reference env-var names from the config:
flux-dev-replicate:
  type: replicate
  api_token_env: REPLICATE_API_TOKEN
The adapter then calls os.Getenv("REPLICATE_API_TOKEN") at construction and fails
fast if unset. Tokens never go through imagen.yaml in plaintext.
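One way an adapter could enforce this rule is a small helper that only ever reads the env-var name from config. The secretFromEnv function is hypothetical, and so is its convention of rejecting an inline key derived by stripping the _env suffix (api_token for api_token_env); the framework may handle this differently.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// secretFromEnv resolves a secret via the env-var name stored under envKey.
// It refuses configs that also carry the secret inline (assumed convention).
func secretFromEnv(cfg map[string]any, envKey string) (string, error) {
	inlineKey := strings.TrimSuffix(envKey, "_env")
	if _, inline := cfg[inlineKey]; inline && inlineKey != envKey {
		return "", fmt.Errorf("%s: inline secrets are not accepted, use %s", inlineKey, envKey)
	}
	envName, ok := cfg[envKey].(string)
	if !ok || envName == "" {
		return "", fmt.Errorf("%s must name an environment variable", envKey)
	}
	val, set := os.LookupEnv(envName)
	if !set || val == "" {
		return "", fmt.Errorf("environment variable %s (from %s) is not set", envName, envKey)
	}
	return val, nil
}

func main() {
	os.Setenv("REPLICATE_API_TOKEN", "dummy-value-for-the-sketch") // demo only
	cfg := map[string]any{"api_token_env": "REPLICATE_API_TOKEN"}
	token, err := secretFromEnv(cfg, "api_token_env")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("token length:", len(token)) // never print the token itself
}
```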
How the /imagine skill calls into imagen
The skill (issue #4) wraps imagen generate and post-processes the path it
prints on stdout. Slash-command surface area:
/imagine "a cat in a fishbowl" --style blog-header --size 1024x1024
The skill resolves to imagen generate "<prompt>" --backend <default> … and
returns the image path so otto can attach it to a chat reply.
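The wrapping step could look something like this, whatever language the skill ends up in. It is a sketch under assumptions: the Options struct, buildArgs, imagePath and runImagen are hypothetical names, and treating the last non-empty stdout line as the image path is a guess about the output format. Only the generate subcommand and the --backend, --style and --size flags are taken from the text.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
)

// Options mirrors the flags shown in the slash-command example (assumed struct).
type Options struct {
	Backend string
	Style   string
	Size    string
}

// buildArgs puts the positional prompt ahead of the flags.
func buildArgs(prompt string, o Options) []string {
	args := []string{"generate", prompt}
	if o.Backend != "" {
		args = append(args, "--backend", o.Backend)
	}
	if o.Style != "" {
		args = append(args, "--style", o.Style)
	}
	if o.Size != "" {
		args = append(args, "--size", o.Size)
	}
	return args
}

// imagePath takes the last non-empty stdout line as the path (assumed format).
func imagePath(stdout string) (string, error) {
	lines := strings.Split(strings.TrimSpace(stdout), "\n")
	last := strings.TrimSpace(lines[len(lines)-1])
	if last == "" {
		return "", fmt.Errorf("imagen printed no path")
	}
	return last, nil
}

// runImagen shells out to the imagen binary on PATH; main does not call it
// so the sketch runs without the binary installed.
func runImagen(ctx context.Context, prompt string, o Options) (string, error) {
	out, err := exec.CommandContext(ctx, "imagen", buildArgs(prompt, o)...).Output()
	if err != nil {
		return "", fmt.Errorf("imagen generate: %w", err)
	}
	return imagePath(string(out))
}

func main() {
	args := buildArgs("a cat in a fishbowl", Options{Style: "blog-header", Size: "1024x1024"})
	fmt.Println(args)
	path, err := imagePath("/tmp/x.png\n") // sample stdout, not real output
	if err != nil {
		panic(err)
	}
	fmt.Println(path)
}
```

Passing the prompt as a single argv element avoids shell quoting problems with prompts that contain spaces or quotes.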
References
- mAi project conventions: ~/.m/docs/msystem.md
- Backend follow-ups: ImaGen issues #2 (ComfyUI on mRock), #3 (Replicate), #4 (skill)
- mRock GPU: NVIDIA RTX 4070 Ti SUPER, 16 GB VRAM, runs Ollama + F5-TTS
House rules
- No technical debt. No TODOs in landed code. If something can't be done now, open an issue.
- All user-facing strings: ASCII or proper Unicode (Umlaute), never ae/oe/ue.
- Tests live next to the package they cover (*_test.go). No tests/ dir.
- go build ./... and go test ./... must be clean before any commit.
- Run task build (or make build) for the full build; both call into go build -o bin/imagen ./cmd/imagen.