First step of the model-agnostic image-generation framework. Lands the
plumbing other components (skill, ComfyUI/Replicate adapters, agents)
will plug into:

- internal/backend: Backend interface (Request/Result), thread-safe
Registry with init-time Register, plus a Mock reference adapter that
emits a deterministic gradient PNG for smoke tests.
- internal/config: YAML loader for ~/.config/imagen.yaml. Framework owns
default_backend + output settings + a per-backend block; each adapter
owns the schema below its own block via BackendSpec.Raw.
- internal/output: filename templating ({date}/{time}/{slug}/{seed}/
{backend}/{ext}), JSON metadata sidecar, --output override path.
- internal/prompt: embedded styles.yaml, style-preset suffix application.
- internal/server: 501 stub — HTTP surface lands in a follow-up issue.
- cmd/imagen: generate / backends / config (init|validate|path) / serve
/ version subcommands. Stdlib-only flag parsing with a small helper to
honour positional prompt args ahead of flags (matches the issue spec).
- Tests for output (slug, naming template, sidecar), backend (mock PNG
validity + determinism, registry build + duplicate panic), config
(round-trip + validation), prompt (style apply + unknown-style error).
- CLAUDE.md, README.md, docs/architecture.md, docs/usage.md, Makefile.

Acceptance criteria from #211:
1. go build ./... — clean
2. imagen backends — lists registered backends, exits 0
3. imagen generate "test prompt" --backend mock --output /tmp/x.png —
writes a 1024x1024 PNG plus an x.png.json sidecar
4. imagen config init | imagen config validate — round-trips cleanly
5. CLAUDE.md "Adding a new adapter" — six-step recipe
# ImaGen — Project Instructions

ImaGen is a model-agnostic image-generation framework. It has a single
opinionated CLI (`imagen`) that dispatches to whichever backend the user
configured — local FLUX on mRock via ComfyUI today, Replicate or DALL-E
tomorrow, something else next year. The framework owns plumbing (config,
output, naming, sidecars, prompt enrichment); each adapter owns the schema
and lifecycle of its own block in `~/.config/imagen.yaml`.

## Architecture

```
cmd/imagen/         CLI shell — generate, backends, config, serve
internal/backend/   Backend interface + Registry + Mock reference impl
internal/prompt/    Style preset registry (embedded styles.yaml)
internal/output/    Filename templating, image writer, JSON sidecar
internal/config/    YAML loader, validation, sample generator
internal/server/    HTTP stub (not implemented yet — follow-up issue)
docs/               architecture.md, usage.md
```

Data flow for `imagen generate`:

1. Parse flags, load config (`internal/config`).
2. Resolve the requested **instance name** to a config block, then the block's
   `type` to a registered constructor in `backend.Default`.
3. Apply style preset (`internal/prompt`) to the prompt.
4. Call `backend.Generate(ctx, Request)`. The adapter returns a `*Result`
   with an image stream + metadata.
5. Stream to disk via `internal/output`. If `write_metadata_json` is on, a
   sidecar `<image>.json` is written next to it.

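
The two-step lookup in step 2 (instance name to config block, then `type` to constructor) can be sketched roughly as below. This is a minimal illustration, not the project's code: the `Constructor` type, the `Registry` fields, the `Build` method, and the `mockBackend` stand-in are all assumptions, and the real registry exposes a package-level `Register` plus `backend.Default` rather than the methods used here. Only the mutex-guarded map and the panic on duplicate registration follow the project description.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Request and Result are reduced to placeholders for this sketch.
type Request struct{ Prompt string }
type Result struct{ MIME string }

// Backend mirrors the interface shown in the "Backend contract" section.
type Backend interface {
	Name() string
	Generate(ctx context.Context, req Request) (*Result, error)
}

// Constructor is a hypothetical name for the registered factory signature.
type Constructor func(name string, cfg map[string]any) (Backend, error)

// Registry maps type names to constructors; the mutex guards concurrent use.
type Registry struct {
	mu    sync.RWMutex
	ctors map[string]Constructor
}

func NewRegistry() *Registry { return &Registry{ctors: map[string]Constructor{}} }

// Register panics on a duplicate type name, a programming error that should
// surface at init time.
func (r *Registry) Register(typ string, c Constructor) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, dup := r.ctors[typ]; dup {
		panic("backend: duplicate registration for type " + typ)
	}
	r.ctors[typ] = c
}

// Build resolves instance name -> config block -> type -> constructor and
// passes the block without its "type" key to the constructor.
func (r *Registry) Build(instance string, blocks map[string]map[string]any) (Backend, error) {
	block, ok := blocks[instance]
	if !ok {
		return nil, fmt.Errorf("unknown backend instance %q", instance)
	}
	typ, _ := block["type"].(string)
	r.mu.RLock()
	ctor, ok := r.ctors[typ]
	r.mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("instance %q: unregistered backend type %q", instance, typ)
	}
	cfg := make(map[string]any, len(block))
	for k, v := range block {
		if k != "type" {
			cfg[k] = v
		}
	}
	return ctor(instance, cfg)
}

// mockBackend is a stand-in, not the project's Mock adapter.
type mockBackend struct{ name string }

func (m mockBackend) Name() string { return m.name }
func (m mockBackend) Generate(ctx context.Context, req Request) (*Result, error) {
	return &Result{MIME: "image/png"}, nil
}

func main() {
	reg := NewRegistry()
	reg.Register("mock", func(name string, cfg map[string]any) (Backend, error) {
		return mockBackend{name: name}, nil
	})
	// "smoke" is the instance name; "mock" is the registered type.
	blocks := map[string]map[string]any{"smoke": {"type": "mock"}}
	b, err := reg.Build("smoke", blocks)
	if err != nil {
		panic(err)
	}
	fmt.Println(b.Name()) // prints "smoke"
}
```

Keeping instance names separate from type names lets one config hold several instances of the same adapter type, each with its own settings.
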
## Backend contract

```go
type Backend interface {
	Name() string
	Generate(ctx context.Context, req Request) (*Result, error)
}
```

`Request` carries the cross-backend fields (prompt, negative, size, steps,
seed, style preset, free-form `BackendOpts`). `Result` returns the image
bytes via an `io.ReadCloser`, the MIME type, and a metadata map (model name,
seed actually used, latency, cost-estimate, …).

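
For orientation, the two types might look roughly like the sketch below. Only `BackendOpts` and the `io.ReadCloser` image stream are named in the text above; every other field name and type, the split of size into `Width`/`Height`, and the `NewResult` and `Drain` helpers are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// Request sketches the cross-backend fields. Field names are assumed.
type Request struct {
	Prompt      string
	Negative    string
	Width       int
	Height      int
	Steps       int
	Seed        int64
	Style       string
	BackendOpts map[string]any // free-form, interpreted only by the adapter
}

// Result sketches what an adapter hands back. Field names are assumed.
type Result struct {
	Image    io.ReadCloser  // image bytes as a stream
	MIME     string         // e.g. "image/png"
	Metadata map[string]any // model name, seed actually used, latency, ...
}

// NewResult is a hypothetical helper that wraps in-memory bytes as a stream.
func NewResult(data []byte, mime string, meta map[string]any) *Result {
	return &Result{Image: io.NopCloser(bytes.NewReader(data)), MIME: mime, Metadata: meta}
}

// Drain reads the image stream to the end and always closes it, which is the
// caller's responsibility when a Result carries an io.ReadCloser.
func Drain(res *Result) ([]byte, error) {
	defer res.Image.Close()
	return io.ReadAll(res.Image)
}

func main() {
	req := Request{Prompt: "a cat in a fishbowl", Width: 1024, Height: 1024, Seed: 42}
	res := NewResult([]byte("\x89PNG"), "image/png", map[string]any{"seed": req.Seed})
	data, err := Drain(res)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(data), res.MIME, res.Metadata["seed"]) // prints "4 image/png 42"
}
```

Returning a stream rather than a byte slice means large images need not be buffered in memory before they are written to disk.
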
## Adding a new adapter

1. Create `internal/backend/<adapter>.go` (e.g. `comfyui.go`). Define a struct
   that holds whatever the adapter needs (HTTP client, model id, token).
2. Add a constructor `func New<Adapter>(name string, cfg map[string]any) (Backend, error)`.
   Read fields from `cfg` — that map is the adapter's own block from
   `imagen.yaml` minus the `type:` key. Resolve secrets from env vars
   (`api_token_env`, `api_key_env`) — never accept tokens inline.
3. Implement `Name()` (return the user-facing instance name) and
   `Generate(ctx, Request)`.
4. In `init()` call `Register("<type-name>", New<Adapter>)`.
5. Anonymous-import the package from `cmd/imagen/main.go` if it lives in a
   separate package, so the `init()` runs.
6. Add a smoke test under `internal/backend/<adapter>_test.go`. Network tests
   should be guarded by `testing.Short()` or an env var.

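
Steps 1 to 4 of the recipe above could be sketched as follows. The `example` type name, the `exampleAdapter` struct, the `model` config key, and the `EXAMPLE_API_TOKEN` variable are hypothetical, and the `Backend`, `Request`, `Result`, and `Register` declarations are simplified stand-ins so that the file compiles on its own; in the real tree they come from `internal/backend`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"strings"
)

// Simplified stand-ins for the internal/backend declarations.
type Request struct{ Prompt string }
type Result struct {
	MIME     string
	Metadata map[string]any
}
type Backend interface {
	Name() string
	Generate(ctx context.Context, req Request) (*Result, error)
}

var registry = map[string]func(string, map[string]any) (Backend, error){}

func Register(typ string, ctor func(string, map[string]any) (Backend, error)) {
	registry[typ] = ctor
}

// Step 1: the struct holds whatever the adapter needs.
type exampleAdapter struct {
	name  string
	model string
	token string
}

// Step 2: the constructor reads its own config block and resolves the secret
// from the environment variable named in the block, never from the block itself.
func NewExample(name string, cfg map[string]any) (Backend, error) {
	model, _ := cfg["model"].(string) // "model" is a hypothetical key
	if model == "" {
		return nil, fmt.Errorf("backend %q: missing \"model\"", name)
	}
	envName, _ := cfg["api_token_env"].(string)
	if envName == "" {
		return nil, fmt.Errorf("backend %q: missing \"api_token_env\"", name)
	}
	token := strings.TrimSpace(os.Getenv(envName))
	if token == "" {
		return nil, fmt.Errorf("backend %q: env var %s is unset or empty", name, envName)
	}
	return &exampleAdapter{name: name, model: model, token: token}, nil
}

// Step 3: Name returns the user-facing instance name, not the type name.
func (a *exampleAdapter) Name() string { return a.name }

func (a *exampleAdapter) Generate(ctx context.Context, req Request) (*Result, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	if req.Prompt == "" {
		return nil, errors.New("empty prompt")
	}
	// A real adapter would call its service here; this sketch returns metadata only.
	return &Result{MIME: "image/png", Metadata: map[string]any{"model": a.model}}, nil
}

// Step 4: register the constructor under the type name at init time.
func init() { Register("example", NewExample) }

func main() {
	if err := os.Setenv("EXAMPLE_API_TOKEN", "dummy"); err != nil {
		panic(err)
	}
	b, err := registry["example"]("my-example", map[string]any{
		"model":         "demo-v1",
		"api_token_env": "EXAMPLE_API_TOKEN",
	})
	if err != nil {
		panic(err)
	}
	res, err := b.Generate(context.Background(), Request{Prompt: "a cat"})
	if err != nil {
		panic(err)
	}
	fmt.Println(b.Name(), res.MIME, res.Metadata["model"]) // prints "my-example image/png demo-v1"
}
```
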
## Config

`~/.config/imagen.yaml` (override with `--config`). Top-level keys:

- `default_backend` — instance name used when `--backend` is omitted.
- `output.directory` / `output.naming` / `output.write_metadata_json`.
- `backends:` — map of instance-name → `{type, …adapter-specific…}`.

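
As an illustration of what `output.naming` controls, the sketch below expands a template built from the placeholders `{date}`, `{time}`, `{slug}`, `{seed}`, `{backend}`, and `{ext}`. The function names, the date and time layouts, the 40-character slug limit, and the slug rules are assumptions; the actual behaviour lives in `internal/output` and may differ.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Slug lowercases the prompt, keeps ASCII letters and digits, and collapses
// every other run of characters into a single "-". Rules are assumed.
func Slug(prompt string, maxLen int) string {
	var b strings.Builder
	pendingDash := false
	for _, r := range strings.ToLower(prompt) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') {
			if pendingDash && b.Len() > 0 {
				b.WriteByte('-')
			}
			pendingDash = false
			b.WriteRune(r)
		} else {
			pendingDash = true
		}
	}
	s := b.String()
	if len(s) > maxLen {
		// The slug is pure ASCII here, so slicing by bytes is safe.
		s = strings.TrimRight(s[:maxLen], "-")
	}
	return s
}

// ExpandName substitutes each placeholder in tmpl. Unknown placeholders are
// left untouched.
func ExpandName(tmpl, prompt, backend, ext string, seed int64, now time.Time) string {
	return strings.NewReplacer(
		"{date}", now.Format("2006-01-02"),
		"{time}", now.Format("150405"),
		"{slug}", Slug(prompt, 40),
		"{seed}", fmt.Sprint(seed),
		"{backend}", backend,
		"{ext}", ext,
	).Replace(tmpl)
}

func main() {
	now := time.Date(2025, 1, 2, 15, 4, 5, 0, time.UTC)
	name := ExpandName("{date}_{time}_{slug}_{seed}_{backend}.{ext}",
		"A cat in a fishbowl!", "mock", "png", 42, now)
	fmt.Println(name) // prints "2025-01-02_150405_a-cat-in-a-fishbowl_42_mock.png"
}
```
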
The framework parses `type` and stuffs the rest into `BackendSpec.Raw`. The
adapter is free to define any schema it likes inside its block.

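
The split between the framework-owned `type` key and the adapter-owned remainder might look like the sketch below. It operates on an already-decoded map so that it needs no YAML library; the `Type` field, the `SplitSpec` helper, and the `model` key are assumptions, and only `BackendSpec.Raw` is named in the text above.

```go
package main

import (
	"fmt"
	"sort"
)

// BackendSpec holds the one key the framework interprets plus the untouched
// remainder of the block. The Type field name is assumed.
type BackendSpec struct {
	Type string
	Raw  map[string]any
}

// SplitSpec is a hypothetical helper: it validates "type" and copies every
// other key into Raw without interpreting it.
func SplitSpec(instance string, block map[string]any) (BackendSpec, error) {
	typ, ok := block["type"].(string)
	if !ok || typ == "" {
		return BackendSpec{}, fmt.Errorf("backends.%s: missing or non-string \"type\"", instance)
	}
	raw := make(map[string]any, len(block))
	for k, v := range block {
		if k != "type" {
			raw[k] = v
		}
	}
	return BackendSpec{Type: typ, Raw: raw}, nil
}

func main() {
	block := map[string]any{
		"type":          "replicate",
		"api_token_env": "REPLICATE_API_TOKEN",
		"model":         "flux-dev", // hypothetical adapter-specific key
	}
	spec, err := SplitSpec("flux-dev-replicate", block)
	if err != nil {
		panic(err)
	}
	keys := make([]string, 0, len(spec.Raw))
	for k := range spec.Raw {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so sort for stable output
	fmt.Println(spec.Type, keys) // prints "replicate [api_token_env model]"
}
```

Because the framework never inspects `Raw`, adding a new adapter does not require touching the config loader.
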
## Credentials

Never hardcode. Always reference env-var names from the config:

```yaml
flux-dev-replicate:
  type: replicate
  api_token_env: REPLICATE_API_TOKEN
```

The adapter then calls `os.Getenv("REPLICATE_API_TOKEN")` at construction and
fails fast if it is unset. Tokens never go through `imagen.yaml` in plaintext.

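
One way to enforce this rule at construction time is sketched below. `ResolveSecret` is a hypothetical helper, and rejecting an inline `api_token` or `api_key` key outright is an assumption about how strictly the rule is applied; only the `api_token_env` and `api_key_env` key names come from this document.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// ResolveSecret looks up the secret named by cfg[envKey]. envKey is expected
// to end in "_env" (for example "api_token_env"); the matching inline key
// ("api_token") is rejected so that a plaintext token in the config is an
// error rather than something silently ignored.
func ResolveSecret(cfg map[string]any, envKey string) (string, error) {
	inlineKey := strings.TrimSuffix(envKey, "_env")
	if _, inline := cfg[inlineKey]; inline {
		return "", fmt.Errorf("%q must not be set inline; use %q to name an env var", inlineKey, envKey)
	}
	envName, _ := cfg[envKey].(string)
	if envName == "" {
		return "", fmt.Errorf("missing %q", envKey)
	}
	val, ok := os.LookupEnv(envName)
	if !ok || val == "" {
		return "", fmt.Errorf("env var %s (from %q) is unset or empty", envName, envKey)
	}
	return val, nil
}

func main() {
	// IMAGEN_DEMO_TOKEN is a made-up variable set only for this demonstration.
	if err := os.Setenv("IMAGEN_DEMO_TOKEN", "dummy-value"); err != nil {
		panic(err)
	}
	token, err := ResolveSecret(map[string]any{"api_token_env": "IMAGEN_DEMO_TOKEN"}, "api_token_env")
	if err != nil {
		panic(err)
	}
	// Print only the length so the secret itself never reaches the logs.
	fmt.Println("token length:", len(token))

	_, err = ResolveSecret(map[string]any{"api_token": "plaintext"}, "api_token_env")
	fmt.Println("inline token rejected:", err != nil)
}
```
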
## How the `/imagine` skill calls into imagen

The skill (issue #4) wraps `imagen generate` and post-processes the path it
prints on stdout. Slash-command surface area:

```
/imagine "a cat in a fishbowl" --style blog-header --size 1024x1024
```

The skill resolves to `imagen generate "<prompt>" --backend <default> …` and
returns the image path so otto can attach it to a chat reply.

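
A wrapper along these lines could be sketched as follows. It assumes that `imagen generate` accepts `--style` and `--size` in the same form as the slash command, and that stdout contains nothing but the image path; neither is confirmed here, and `BuildArgs` and `Imagine` are hypothetical names. The prompt is placed first because the CLI takes it as a positional argument ahead of the flags.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// BuildArgs assembles the argument list for "imagen generate". Empty values
// are skipped so the CLI falls back to its configured defaults.
func BuildArgs(prompt, backend, style, size string) []string {
	args := []string{"generate", prompt}
	if backend != "" {
		args = append(args, "--backend", backend)
	}
	if style != "" {
		args = append(args, "--style", style) // assumed flag
	}
	if size != "" {
		args = append(args, "--size", size) // assumed flag
	}
	return args
}

// Imagine runs the CLI and returns the path it printed. It assumes the path
// is the only thing written to stdout.
func Imagine(ctx context.Context, args []string) (string, error) {
	out, err := exec.CommandContext(ctx, "imagen", args...).Output()
	if err != nil {
		return "", fmt.Errorf("imagen generate: %w", err)
	}
	path := strings.TrimSpace(string(out))
	if path == "" {
		return "", errors.New("imagen generate printed no path")
	}
	return path, nil
}

func main() {
	// Only the argument list is shown here; calling Imagine requires the
	// imagen binary on PATH.
	args := BuildArgs("a cat in a fishbowl", "mock", "blog-header", "1024x1024")
	fmt.Printf("imagen %q\n", args)
}
```

Passing the arguments as a slice to `exec.CommandContext` avoids shell quoting problems with prompts that contain spaces or quotes.
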
## References

- mAi project conventions: `~/.m/docs/msystem.md`
- Backend follow-ups: ImaGen issues #2 (ComfyUI on mRock), #3 (Replicate), #4 (skill)
- mRock GPU: NVIDIA RTX 4070 Ti SUPER, 16 GB VRAM, runs Ollama + F5-TTS

## House rules

- No technical debt. No TODOs in landed code. If something can't be done now,
  open an issue.
- All user-facing strings: ASCII or proper Unicode (Umlaute), never `ae/oe/ue`.
- Tests live next to the package they cover (`*_test.go`). No `tests/` dir.
- `go build ./...` and `go test ./...` must be clean before any commit.
- Run `task build` (or `make build`) for the full build; both call into
  `go build -o bin/imagen ./cmd/imagen`.