ImaGen/CLAUDE.md
mAi 237270b204 mAi: #211 - bootstrap ImaGen framework skeleton
First step of the model-agnostic image-generation framework. Lands the
plumbing other components (skill, ComfyUI/Replicate adapters, agents)
will plug into:

- internal/backend: Backend interface (Request/Result), thread-safe
  Registry with init-time Register, plus a Mock reference adapter that
  emits a deterministic gradient PNG for smoke tests.
- internal/config: YAML loader for ~/.config/imagen.yaml. Framework owns
  default_backend + output settings + a per-backend block; each adapter
  owns the schema below its own block via BackendSpec.Raw.
- internal/output: filename templating ({date}/{time}/{slug}/{seed}/
  {backend}/{ext}), JSON metadata sidecar, --output override path.
- internal/prompt: embedded styles.yaml, style-preset suffix application.
- internal/server: 501 stub — HTTP surface lands in a follow-up issue.
- cmd/imagen: generate / backends / config (init|validate|path) / serve
  / version subcommands. Stdlib-only flag parsing with a small helper to
  honour positional prompt args ahead of flags (matches the issue spec).
- Tests for output (slug, naming template, sidecar), backend (mock PNG
  validity + determinism, registry build + duplicate panic), config
  (round-trip + validation), prompt (style apply + unknown-style error).
- CLAUDE.md, README.md, docs/architecture.md, docs/usage.md, Makefile.

Acceptance criteria from #211:
1. go build ./... — clean
2. imagen backends — lists registered backends, exits 0
3. imagen generate "test prompt" --backend mock --output /tmp/x.png —
   writes a 1024x1024 PNG plus an x.png.json sidecar
4. imagen config init | imagen config validate — round-trips cleanly
5. CLAUDE.md "Adding a new adapter" — six-step recipe
2026-05-08 14:37:05 +02:00


ImaGen — Project Instructions

ImaGen is a model-agnostic image-generation framework. It has a single opinionated CLI (imagen) that dispatches to whichever backend the user configured — local FLUX on mRock via ComfyUI today, Replicate or DALL-E tomorrow, something else next year. The framework owns plumbing (config, output, naming, sidecars, prompt enrichment); each adapter owns the schema and lifecycle of its own block in ~/.config/imagen.yaml.

Architecture

cmd/imagen/                CLI shell — generate, backends, config, serve
internal/backend/          Backend interface + Registry + Mock reference impl
internal/prompt/           Style preset registry (embedded styles.yaml)
internal/output/           Filename templating, image writer, JSON sidecar
internal/config/           YAML loader, validation, sample generator
internal/server/           HTTP stub (not implemented yet — follow-up issue)
docs/                      architecture.md, usage.md

Data flow for imagen generate:

  1. Parse flags, load config (internal/config).
  2. Resolve the requested instance name to a config block, then the block's type to a registered constructor in backend.Default.
  3. Apply style preset (internal/prompt) to the prompt.
  4. Call backend.Generate(ctx, Request). The adapter returns a *Result with an image stream + metadata.
  5. Stream to disk via internal/output. If write_metadata_json is on, a sidecar <image>.json is written next to it.
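Step 2 of the data flow can be sketched roughly as below. This is a minimal illustration, not the code in internal/backend: the Registry layout, the Build method, the Constructor type, the Type field on BackendSpec and the stub backend are all assumptions made for the example. Only the ideas come from the text above: a thread-safe registry, a panic on duplicate registration, and an instance name that resolves to a type and then to a constructor.

```go
package main

import (
	"fmt"
	"sync"
)

// Backend is a trimmed stand-in for the real interface (Generate omitted).
type Backend interface {
	Name() string
}

// BackendSpec mirrors the idea of "type plus adapter-owned rest".
// The Type field name is an assumption; Raw is named in the document.
type BackendSpec struct {
	Type string
	Raw  map[string]any
}

// Constructor builds one backend instance from its adapter-owned config.
type Constructor func(name string, cfg map[string]any) (Backend, error)

// Registry maps adapter type names to constructors (hypothetical layout).
type Registry struct {
	mu    sync.RWMutex
	ctors map[string]Constructor
}

func NewRegistry() *Registry {
	return &Registry{ctors: make(map[string]Constructor)}
}

// Register panics on a duplicate type name so conflicts surface at init time.
func (r *Registry) Register(typeName string, c Constructor) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, dup := r.ctors[typeName]; dup {
		panic("backend: duplicate registration for type " + typeName)
	}
	r.ctors[typeName] = c
}

// Build resolves instance name -> config block -> type -> constructor.
func (r *Registry) Build(instance string, specs map[string]BackendSpec) (Backend, error) {
	spec, ok := specs[instance]
	if !ok {
		return nil, fmt.Errorf("backend %q is not configured", instance)
	}
	r.mu.RLock()
	ctor, ok := r.ctors[spec.Type]
	r.mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("backend %q: unknown type %q", instance, spec.Type)
	}
	return ctor(instance, spec.Raw)
}

// stubBackend is a hypothetical adapter used only to exercise the registry.
type stubBackend struct{ name string }

func (s stubBackend) Name() string { return s.name }

func newStub(name string, _ map[string]any) (Backend, error) {
	return stubBackend{name: name}, nil
}
```

In this sketch the instance name (what the user types after --backend) and the type name (what the adapter registers) stay separate, so two instances can share one adapter type with different settings.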

Backend contract

type Backend interface {
    Name() string
    Generate(ctx context.Context, req Request) (*Result, error)
}

Request carries the cross-backend fields (prompt, negative, size, steps, seed, style preset, free-form BackendOpts). Result returns the image bytes via an io.ReadCloser, the MIME type, and a metadata map (model name, seed actually used, latency, cost estimate, …).

Adding a new adapter

  1. Create internal/backend/<adapter>.go (e.g. comfyui.go). Define a struct that holds whatever the adapter needs (HTTP client, model id, token).
  2. Add a constructor func New<Adapter>(name string, cfg map[string]any) (Backend, error). Read fields from cfg — that map is the adapter's own block from imagen.yaml minus the type: key. Resolve secrets from env vars (api_token_env, api_key_env) — never accept tokens inline.
  3. Implement Name() (return the user-facing instance name) and Generate(ctx, Request).
  4. In init() call Register("<type-name>", New<Adapter>).
  5. Blank-import the package (an import using the _ identifier) from cmd/imagen/main.go if it lives in a separate package, so that its init() runs.
  6. Add a smoke test under internal/backend/<adapter>_test.go. Tests that need the network should be skipped when testing.Short() is true or gated behind an opt-in env var.

Config

~/.config/imagen.yaml (override with --config). Top-level keys:

  • default_backend — instance name used when --backend is omitted.
  • output.directory / output.naming / output.write_metadata_json.
  • backends: — map of instance-name → {type, …adapter-specific…}.

The framework parses type and stuffs the rest into BackendSpec.Raw. The adapter is free to define any schema it likes inside its block.
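A possible shape for that split, and for the default_backend fallback, is sketched below. It assumes the backend block has already been decoded from YAML into a map[string]any. splitSpec, pickInstance and the Type field are hypothetical names; only BackendSpec.Raw is taken from the text.

```go
package main

import "fmt"

// BackendSpec separates the framework-owned type from the adapter-owned rest.
type BackendSpec struct {
	Type string
	Raw  map[string]any
}

// splitSpec pulls the type key out of a decoded backend block and keeps
// every other key untouched in Raw for the adapter to interpret.
func splitSpec(instance string, block map[string]any) (BackendSpec, error) {
	typeName, ok := block["type"].(string)
	if !ok || typeName == "" {
		return BackendSpec{}, fmt.Errorf("backends.%s: type is required", instance)
	}
	raw := make(map[string]any, len(block))
	for k, v := range block {
		if k != "type" {
			raw[k] = v
		}
	}
	return BackendSpec{Type: typeName, Raw: raw}, nil
}

// pickInstance prefers the --backend flag value and falls back to
// default_backend when the flag is empty.
func pickInstance(flagValue, defaultBackend string, specs map[string]BackendSpec) (string, error) {
	name := flagValue
	if name == "" {
		name = defaultBackend
	}
	if name == "" {
		return "", fmt.Errorf("no --backend given and default_backend is unset")
	}
	if _, ok := specs[name]; !ok {
		return "", fmt.Errorf("backend %q is not defined under backends", name)
	}
	return name, nil
}
```

Because the framework never inspects Raw, an adapter can add or rename its own keys without touching the loader.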

Credentials

Never hardcode. Always reference env-var names from the config:

flux-dev-replicate:
  type: replicate
  api_token_env: REPLICATE_API_TOKEN

The adapter then calls os.Getenv("REPLICATE_API_TOKEN") at construction and fails fast if the variable is unset. Tokens never appear in imagen.yaml in plaintext.

How the /imagine skill calls into imagen

The skill (issue #4) wraps imagen generate and post-processes the path it prints on stdout. Slash-command surface area:

/imagine "a cat in a fishbowl" --style blog-header --size 1024x1024

The skill resolves to imagen generate "<prompt>" --backend <default> … and returns the image path so otto can attach it to a chat reply.

References

  • mAi project conventions: ~/.m/docs/msystem.md
  • Backend follow-ups: ImaGen issues #2 (ComfyUI on mRock), #3 (Replicate), #4 (skill)
  • mRock GPU: NVIDIA RTX 4070 Ti SUPER, 16 GB VRAM, runs Ollama + F5-TTS

House rules

  • No technical debt. No TODOs in landed code. If something can't be done now, open an issue.
  • User-facing strings use plain ASCII or proper Unicode (umlauts such as ä/ö/ü), never the ae/oe/ue transliterations.
  • Tests live next to the package they cover (*_test.go). No tests/ dir.
  • go build ./... and go test ./... must be clean before any commit.
  • Run task build (or make build) for the full build; both run go build -o bin/imagen ./cmd/imagen.