mAi: #2 - phase 1 PoC: ComfyUI on mRock + first FLUX schnell image

Native systemd install (matches Ollama pattern on Arch — Docker on mRock has no nvidia runtime; native venv via uv is the lighter path). The Black-Forest-Labs FLUX.1-schnell HF repo is gated, so the download script points at ungated mirrors (Comfy-Org/flux1-schnell + sirorable/flux-ae-vae) that ship the same Apache-2.0 weights. First image — cat in a fishbowl, 1024x1024, 4 steps — generated end-to-end in 9.79s via curl + workflow JSON; stored at /home/m/dev/ImaGen/poc/first-image.png on mRiver (not committed; transient PoC artefact). Go adapter is phase 2.
2026-05-08 16:50:16 +02:00
parent 20490913c1
commit a24ac2826f
5 changed files with 330 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -7,3 +7,4 @@
 .env.local
 /imagen
 /coverage.txt
+/.m/
--- a/docs/setup-comfyui-mrock.md
+++ b/docs/setup-comfyui-mrock.md
@@ -0,0 +1,181 @@
+# ComfyUI on mRock — install + ops
+
+ImaGen's `flux-schnell-local` backend talks to ComfyUI on mRock at
+`http://mrock:8188` (Tailscale-internal). This document is the reproducible
+install path from a clean mRock state.
+
+mRock runs Arch Linux + systemd with an NVIDIA RTX 4070 Ti SUPER (16 GB
+VRAM). Ollama is already a native systemd service, so ComfyUI follows the
+same pattern (native Python venv + systemd unit) instead of Docker — Docker
+on mRock has no `nvidia` runtime configured, and adding one is more invasive
+than another systemd unit.
+
+## Prerequisites on mRock
+
+- Python via `uv` (already installed).
+- NVIDIA driver new enough for CUDA 12.4.
+  `nvidia-smi --query-gpu=driver_version --format=csv,noheader` should show >= 550. Driver 595 is what mRock has today.
+- ~35 GB free on `/home` for the model files.
+- `ollama.service` running on port 11434 — coexistence notes below.
+
+## 1. Clone ComfyUI + Python venv
+
+```bash
+mkdir -p ~/dev && cd ~/dev
+git clone --depth 1 https://github.com/comfyanonymous/ComfyUI.git comfyui
+cd comfyui
+uv venv --python 3.12 .venv
+source .venv/bin/activate   # fish users: source .venv/bin/activate.fish
+
+# PyTorch CUDA 12.4 wheels — match the system driver
+uv pip install --no-cache torch torchvision torchaudio \
+    --index-url https://download.pytorch.org/whl/cu124
+
+uv pip install --no-cache -r requirements.txt
+```
+
+Verify CUDA is wired up:
+
+```bash
+.venv/bin/python -c \
+  "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.get_device_name(0))"
+# expected: 2.6.0+cu124 True NVIDIA GeForce RTX 4070 Ti SUPER
+```
+
+## 2. Models — FLUX.1 schnell
+
+The Black-Forest-Labs primary repo (`black-forest-labs/FLUX.1-schnell`) is
+**gated** — `curl` against it without an HF token returns HTTP 401. We pull
+the weights from ungated mirrors of the same Apache-2.0 release.
+
+| File | Where it goes | Source |
+|------|---------------|--------|
+| `flux1-schnell.safetensors` (~23.8 GB, bf16) | `models/unet/` | `Comfy-Org/flux1-schnell` |
+| `ae.safetensors` (~335 MB) | `models/vae/` | `sirorable/flux-ae-vae` |
+| `clip_l.safetensors` (~246 MB) | `models/clip/` | `comfyanonymous/flux_text_encoders` |
+| `t5xxl_fp8_e4m3fn.safetensors` (~4.9 GB) | `models/clip/` | `comfyanonymous/flux_text_encoders` |
+
+```bash
+cd ~/dev/comfyui/models
+
+curl -L -o unet/flux1-schnell.safetensors \
+  https://huggingface.co/Comfy-Org/flux1-schnell/resolve/main/flux1-schnell.safetensors
+curl -L -o vae/ae.safetensors \
+  https://huggingface.co/sirorable/flux-ae-vae/resolve/main/ae.safetensors
+curl -L -o clip/clip_l.safetensors \
+  https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors
+curl -L -o clip/t5xxl_fp8_e4m3fn.safetensors \
+  https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp8_e4m3fn.safetensors
+```
+
+If an HF token is configured later (`~/.cache/huggingface/token`), the
+file in the official `black-forest-labs/FLUX.1-schnell` repo is byte-identical
+and its URL can be swapped in.
+
+## 3. systemd unit
+
+Drop `/etc/systemd/system/comfyui.service`:
+
+```ini
+[Unit]
+Description=ComfyUI image generation server
+Documentation=https://github.com/comfyanonymous/ComfyUI
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=simple
+User=m
+Group=m
+WorkingDirectory=/home/m/dev/comfyui
+ExecStart=/home/m/dev/comfyui/.venv/bin/python /home/m/dev/comfyui/main.py \
+    --listen 0.0.0.0 --port 8188 \
+    --output-directory /home/m/dev/comfyui/output \
+    --temp-directory /home/m/dev/comfyui/temp
+Restart=on-failure
+RestartSec=5
+TimeoutStopSec=30
+NoNewPrivileges=true
+PrivateTmp=true
+LimitNOFILE=65535
+
+[Install]
+WantedBy=multi-user.target
+```
+
+Then:
+
+```bash
+sudo systemctl daemon-reload
+sudo systemctl enable --now comfyui.service
+systemctl status comfyui.service
+```
+
+The service binds `0.0.0.0:8188`, i.e. all interfaces. Tailscale's WireGuard
+fence is the only auth, so do **not** expose port 8188 to the public internet.
+
+## 4. Health check
+
+```bash
+curl -fsS --max-time 5 http://mrock:8188/system_stats | jq '.devices[0]'
+# expected: name "cuda:0 NVIDIA GeForce RTX 4070 Ti SUPER ...", vram_total ~16 GB
+```
+
+`imagen backends` (from a host with the ImaGen CLI installed) should also
+report `flux-schnell-local: ok`.
+
+## 5. VRAM coexistence with Ollama
+
+mRock has 16 GB VRAM total. Ollama parks ~8 GB resident for its current
+model. FLUX schnell, loaded from the 16-bit weights with
+`weight_dtype=fp8_e4m3fn` (the default the adapter requests), needs roughly
+10–12 GB peak for a 1024×1024 generation. Together that is 18–20 GB, so concurrent Ollama + FLUX on mRock will OOM.
+
+Two practical options:
+
+- **Stop Ollama before generating** — `sudo systemctl stop ollama` frees
+  the GPU, run the generation, `sudo systemctl start ollama` afterwards.
+  Adequate while we don't have many concurrent users.
+- **Move Ollama off mRock** — when ImaGen is in regular use, push Ollama to
+  another host so the GPU is dedicated. Tracked separately.
+
+Both decisions live with whoever operates the box; the adapter does not try
+to manage Ollama.
+
+## 6. Smoke test (direct, without the imagen CLI)
+
+```bash
+# 1) Submit a workflow
+curl -fsS --max-time 30 -X POST -H 'Content-Type: application/json' \
+     -d @scripts/flux-schnell-poc.json \
+     http://mrock:8188/prompt
+# returns: {"prompt_id": "...", "number": ..., "node_errors": {}}
+
+# 2) Poll history until the prompt completes
+PID=...   # from above
+until curl -fsS http://mrock:8188/history/$PID | jq -e ".\"$PID\".status.completed == true" >/dev/null; do
+  sleep 1
+done
+
+# 3) Pull the image
+NAME=$(curl -fsS http://mrock:8188/history/$PID \
+       | jq -r ".\"$PID\".outputs[\"9\"].images[0].filename")
+curl -fsS "http://mrock:8188/view?filename=$NAME&type=output" -o /tmp/cat.png
+file /tmp/cat.png       # PNG image data, 1024 x 1024
+```
+
+The full ImaGen smoke test will live in [usage.md](usage.md) once the Go
+adapter ships.
+
+## Troubleshooting
+
+- **`vram_free` < 6 GB in `/system_stats`**: another GPU process is holding
+  memory. Usually Ollama (`sudo systemctl stop ollama`).
+- **Workflow returns `node_errors` with `Required input is missing` for
+  DualCLIPLoader**: text encoder filenames don't match step 2. Check that
+  `clip_l.safetensors` and `t5xxl_fp8_e4m3fn.safetensors` are in
+  `models/clip/`, not `models/text_encoders/`.
+- **`Access to model … is restricted`** during a model pull: the script is
+  hitting the gated official repo. Use the ungated URLs from step 2.
+- **Service won't start**: check `journalctl -u comfyui --since '5 min ago'`.
+  Common cause is a stale `uv pip` install; re-run step 1.
--- a/scripts/comfyui.service
+++ b/scripts/comfyui.service
@@ -0,0 +1,24 @@
+[Unit]
+Description=ComfyUI image generation server
+Documentation=https://github.com/comfyanonymous/ComfyUI
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=simple
+User=m
+Group=m
+WorkingDirectory=/home/m/dev/comfyui
+ExecStart=/home/m/dev/comfyui/.venv/bin/python /home/m/dev/comfyui/main.py \
+    --listen 0.0.0.0 --port 8188 \
+    --output-directory /home/m/dev/comfyui/output \
+    --temp-directory /home/m/dev/comfyui/temp
+Restart=on-failure
+RestartSec=5
+TimeoutStopSec=30
+NoNewPrivileges=true
+PrivateTmp=true
+LimitNOFILE=65535
+
+[Install]
+WantedBy=multi-user.target
--- a/scripts/download-flux-schnell.sh
+++ b/scripts/download-flux-schnell.sh
@@ -0,0 +1,37 @@
+#!/bin/bash
+# Download FLUX.1 schnell + accompanying VAE/text encoders into a ComfyUI tree.
+# Uses ungated mirrors — the official Black-Forest-Labs repo is gated and
+# requires an HF token. See docs/setup-comfyui-mrock.md.
+
+set -euo pipefail
+
+ROOT="${1:-$HOME/dev/comfyui/models}"
+
+if [ ! -d "$ROOT" ]; then
+    echo "models root $ROOT does not exist — pass it as the first argument" >&2
+    exit 1
+fi
+
+mkdir -p "$ROOT/unet" "$ROOT/vae" "$ROOT/clip"
+
+CKPT="https://huggingface.co/Comfy-Org/flux1-schnell/resolve/main/flux1-schnell.safetensors"
+VAE="https://huggingface.co/sirorable/flux-ae-vae/resolve/main/ae.safetensors"
+CLIP_L="https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors"
+T5="https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp8_e4m3fn.safetensors"
+
+dl() {
+    local url=$1 dest=$2
+    if [ -s "$dest" ]; then
+        echo "skip $dest (already present)"
+        return
+    fi
+    echo "downloading $url -> $dest"
+    curl -L --fail --retry 3 --retry-delay 5 -C - -o "$dest.part" "$url" && mv "$dest.part" "$dest"
+}
+
+dl "$CKPT"   "$ROOT/unet/flux1-schnell.safetensors"
+dl "$VAE"    "$ROOT/vae/ae.safetensors"
+dl "$CLIP_L" "$ROOT/clip/clip_l.safetensors"
+dl "$T5"     "$ROOT/clip/t5xxl_fp8_e4m3fn.safetensors"
+
+echo "done"
--- a/scripts/flux-schnell-poc.json
+++ b/scripts/flux-schnell-poc.json
@@ -0,0 +1,87 @@
+{
+  "prompt": {
+    "6": {
+      "class_type": "CLIPTextEncode",
+      "inputs": {
+        "text": "a small fishbowl with a cat staring out, photo, soft light",
+        "clip": ["11", 0]
+      }
+    },
+    "8": {
+      "class_type": "VAEDecode",
+      "inputs": {
+        "samples": ["31", 0],
+        "vae": ["10", 0]
+      }
+    },
+    "9": {
+      "class_type": "SaveImage",
+      "inputs": {
+        "filename_prefix": "imagen-poc",
+        "images": ["8", 0]
+      }
+    },
+    "10": {
+      "class_type": "VAELoader",
+      "inputs": {
+        "vae_name": "ae.safetensors"
+      }
+    },
+    "11": {
+      "class_type": "DualCLIPLoader",
+      "inputs": {
+        "clip_name1": "t5xxl_fp8_e4m3fn.safetensors",
+        "clip_name2": "clip_l.safetensors",
+        "type": "flux"
+      }
+    },
+    "12": {
+      "class_type": "UNETLoader",
+      "inputs": {
+        "unet_name": "flux1-schnell.safetensors",
+        "weight_dtype": "fp8_e4m3fn"
+      }
+    },
+    "13": {
+      "class_type": "CLIPTextEncode",
+      "inputs": {
+        "text": "",
+        "clip": ["11", 0]
+      }
+    },
+    "27": {
+      "class_type": "EmptySD3LatentImage",
+      "inputs": {
+        "width": 1024,
+        "height": 1024,
+        "batch_size": 1
+      }
+    },
+    "30": {
+      "class_type": "ModelSamplingFlux",
+      "inputs": {
+        "model": ["12", 0],
+        "max_shift": 1.15,
+        "base_shift": 0.5,
+        "width": 1024,
+        "height": 1024
+      }
+    },
+    "31": {
+      "class_type": "KSampler",
+      "inputs": {
+        "model": ["30", 0],
+        "seed": 1234567,
+        "steps": 4,
+        "cfg": 1.0,
+        "sampler_name": "euler",
+        "scheduler": "simple",
+        "denoise": 1.0,
+        "positive": ["6", 0],
+        "negative": ["13", 0],
+        "latent_image": ["27", 0]
+      }
+    }
+  },
+  "client_id": "imagen-poc-001"
+}