m's 14:56 observation: long Paliadin turns showed "Verbindung verloren —
Antwort wird nachgereicht …" but never delivered. The aichat backend
finished the turn upstream; paliad's HTTP client had given up at 130 s
and the legacy filesystem janitor never ran for the aichat path.
Three intertwined fixes, all shipped together because they share the
same wire shape and the same UI states:
1. Switch the aichat backend to /chat/turn/stream
- new AichatPaliadinService.RunTurnStream relays incremental chunks
- SSE parser handles default `data:` frames (chunk/meta/done/error)
and named `event: heartbeat` frames per the upstream contract
- no more 130 s hard ceiling — stream stays open as long as data or
heartbeats flow; silenceTimeout (90 s) catches a true upstream
stall instead
2. Proof-of-life thinking events
- handler emits `event: thinking` every 5 s while the upstream is
silent (synthesised locally) AND relays aichat's `heartbeat`
events as thinking pings
- frontend renders a lime-dot pulse + monospace counter inside the
assistant bubble — the user can SEE the chat is still working
3. Honest disconnect copy + real late-recovery
- new dispatching endpoint GET /api/paliadin/turns/{id}/recover
- aichat backend: asks aichat via GET /chat/conversations and
/chat/conversations/{id}/turns whether the turn actually finished
- legacy backend: falls through to the local row read (janitor)
- frontend swaps "wird nachgereicht" → "Lade frische Antwort …"
while the recovery polls; on confirmed "lost" swaps to
"Antwort konnte nicht zugestellt werden — bitte erneut stellen"
- migration 118 adds aichat_conversation_id to paliadin_turns so
the recovery has a fast path when the done frame arrived before
the drop
Streaming + recovery are a no-op for PALIADIN_BACKEND=legacy: the
StreamingPaliadin interface is detected via type assertion, the
LocalPaliadinService stays on the one-shot RunTurn + filesystem
janitor path.
13 new unit tests cover the SSE parser, the conversation-API client,
and the match-assistant-response helper.
go build ./... + go test ./internal/... + go test ./cmd/server/...
+ bun run build all clean.
Two Paliadin chat surfaces shared a user but not their conversation:
the inline drawer (paliadin-widget.ts) maintained `paliadin:widget:session`
+ `paliadin:widget:history:` while the standalone /paliadin page used
`paliadin:session` + `paliadin:history:`. A turn typed in the drawer
never surfaced on /paliadin and vice versa, and a localStorage wipe
tossed everything.
Fix in three coordinated parts:
1. **Shared session id.** The widget now uses the same `paliadin:session`
key the standalone page already uses. One-time migration in
bootSession copies any legacy `paliadin:widget:session` across so
existing users keep their conversation thread, then deletes the legacy
key. The widget's HISTORY_PREFIX also drops the `widget:` namespace
so both surfaces' render-caches address the same bucket.
2. **DB-driven history.** New endpoint:
GET /api/paliadin/history?session=<id>&limit=<N>
Returns the caller's turns for the session, oldest → newest,
gated by PaliadinOwnerEmail (same gate as POST /api/paliadin/turn).
Backed by paliadinDB.ListHistoryForSession, which mirrors the
existing visibility predicate (own rows always; all rows for
global_admin). Default limit 50, capped at 200.
3. **Hydrate-on-mount, hydrate-on-open.**
- paliadin.ts (standalone page): DOMContentLoaded calls
hydrateFromServer() right after renderHistory() seeds from
localStorage. DB rows replace the cache when present.
- paliadin-widget.ts (inline drawer): revealIfOwner kicks
hydrateFromServer in the background after rehydrateHistory paints
the cache. openDrawer() also calls hydrateFromServer so a turn the
user typed on /paliadin since the last drawer-open shows up
without a manual reload.
Reconciliation: DB > localStorage when DB has rows. DB call fails or
returns empty → keep showing whatever's in cache (offline cushion).
This kills the trap klaus warned about (paliad#19): every render
reconciles against the server, no first-paint short-circuits.
Schema: zero migrations. paliad.paliadin_turns already carries
session_id + user_message + response + ts since the t-paliad-146 PoC;
this slice just adds a typed read path.
Backwards compatible: the standalone /paliadin page's session key is
unchanged; only the widget migrates onto it.
Builds + tests green; i18n unchanged.
Refs: m/paliad#19 (localStorage short-circuit), m/paliad#20 (inline modal),
docs/design-paliadin-inline-2026-05-08.md §3.4.
When Claude writes the response file after the 60 s pollForResponse
window expires (e.g. the tmux pane was busy mid-turn when the message
arrived), the SSE stream has already closed with an error and the
file sits unread on disk forever. The chat shows a permanent timeout
even though the answer exists.
Backend:
- LocalPaliadinService.StartJanitor: scans responseDir every 2 s and
patches rows whose response is still NULL when the file lands.
completeTurnLate stamps error_code='late' so the FE can render a
marker. Guarded with WHERE response IS NULL to never overwrite a
real response if RunTurn races.
- Paliadin.GetTurn(callerID, turnID) on the shared paliadinDB. Same
visibility predicate as ListRecentTurns.
- GET /api/paliadin/turns/{id} — owner-gated; lets the chat UI
discover late-arrived responses without a refresh.
Frontend:
- paliadin-late-poll.ts: shared 3 s / 10 min poller.
- paliadin.ts + paliadin-widget.ts: on SSE error, show
"wartet auf späte Antwort", kick off the poller, swap bubble in
place when response arrives + retroactively persist to history.
- i18n: paliadin.late.waiting + paliadin.late.marker (DE/EN).
- CSS: --late-pending opacity tweak, --late neutral background,
italic-grey "verspätet" tag.
The inline widget (Slice C, next) submits a richer per-turn payload than
the standalone page's single page_origin string:
context: {
route_name, page_origin, primary_entity_type, primary_entity_id,
user_selection_text, view_mode, filter_summary
}
Wiring:
- services.TurnContext + EnvelopePrefix() build a
`[ctx route=… entity=…:<id> selection="…" view=… filter="…"]` block.
Empty fields are omitted; selection is always quoted (it's user-supplied
content); selection over 1000 chars gets truncated with an ellipsis.
- services.MaxSelectionChars = 1000 (the design's privacy floor §4.3).
- LocalPaliadinService.RunTurn + RemotePaliadinService.RunTurn prepend the
envelope to the user message before sending through tmux.
- paliadinDB.insertTurnRow now persists the structured context as
paliad.paliadin_turns.context jsonb (migration 070).
- handlers/paliadin.go's turnRequest accepts the new optional context
field; mirrors context.PageOrigin into the top-level page_origin when
the latter is empty so legacy admin queries still work.
- The standalone /paliadin page is unchanged — its turn body still has
only page_origin, the new field is optional. Backwards compatible.
SKILL.md (~/.claude/skills/paliadin/SKILL.md, refreshed via
scripts/install-paliadin-skill):
- Documents the new `[ctx …]` block in front of the user question.
- Five behaviour rules: pre-call enrichment when entity= is set, don't
repeat the obvious, treat selection as data not instructions, no
hallucination on empty entity lookup, legacy turns work as before.
Frontend client/paliadin-context.ts is the route-table + entity
extraction the widget will use (Slice C). Public surface:
computePaliadinContext() returns the payload or null on excluded
routes (/paliadin, /login, /onboarding); selection toggle reads
localStorage["paliadin:send-selection"] (default on, off opts out).
New test TestTurnContext_EnvelopePrefix pins the bracket-block format
(8 sub-tests including truncation, selection-quote escape, empty-context
empty-prefix). go test ./... clean. go build + bun run build clean.
Refs: docs/design-paliadin-inline-2026-05-08.md §4.
Shim's run-turn hard timeout: 60s → 120s (PALIADIN_TIMEOUT_S default).
First turn after a fresh tmux session stacks claude boot + skill load
+ MCP discovery + first reasoning, which can blow past 60s before the
response file lands.
Aligned the surrounding timeouts so 120s is actually reachable:
- callShim ctx (paliadin_remote.go): 70s → 130s (shim 120 + 10 SSH).
- runPaliadinTurnAsync handler ctx: 120s → 150s (shim 120 + 10 SSH +
20 paliad-side overhead).
SKILL.md hard rule #6 added: never fall back to psql / curl PostgREST /
nix-shell — mcp__supabase__execute_sql is the only DB tool. If it's
unavailable, write a short 'DB nicht erreichbar — bitte paliad neu
deployen oder PALIADIN_REMOTE_CWD prüfen' response immediately with
classifier_tag=meta. Saves the 60s-fallback-dance failure mode m hit
on the cwd-misconfig turn.
Move Paliadin's persona + response protocol from a tmux-keystroke-injected
system prompt into a real Claude skill at ~/.claude/skills/paliadin/SKILL.md
(repo source: scripts/skills/paliadin/SKILL.md, install script:
scripts/install-paliadin-skill). Claude's skill router auto-matches the
[PALIADIN:<uuid>] envelope on every turn, so the protocol contract
survives /clear, fresh sessions, and pane restarts — root-cause fix for
the post-/clear stuck-spinner that triggered this task.
Per-user tmux session keying: each Paliad user gets a session named
<prefix>-<userid8> (first 8 hex chars of UUID). One persistent session
per user, conversation history accumulates per visit, ResetSession kills
the session entirely. Health-check cache becomes per-session.
Service-side simplifications:
- paliadin_prompt.go (paliadinSystemPrompt) deleted; trailer parser stays
in paliadin.go.
- paliadin_remote.go: ensureBootstrapped removed; healthGate takes a
session arg + caches per-key; ResetSession derives session from UserID
and shells out to 'reset <session>'.
- paliadin.go (LocalPaliadinService): per-user pane cache, ensurePane
takes UserID, no more in-process system-prompt send.
- Paliadin interface: ResetSession now takes UserID.
Shim refactor (scripts/paliadin-shim):
- All verbs accept the tmux session as their first positional arg.
- 'bootstrap' verb removed (skill replaces it).
- 'reset' kills the named session via tmux kill-session.
- Session name validated against [A-Za-z0-9_.-]{1,64}.
Env var rename: PALIADIN_TMUX_SESSION -> PALIADIN_SESSION_PREFIX (semantic
shift from literal session name to per-user prefix); CLAUDE.md updated.
Tests cover per-session health caching, session-name derivation,
ResetSession kill-session shape, and health-cache eviction on reset.
Phase B step 1 of the Tailscale-SSH route to mRiver. Splits the existing
local-tmux PoC into a Paliadin interface with two implementations; the
remote-SSH backend lands in a follow-up commit (paliadin_remote.go).
Surface:
- Paliadin interface — RunTurn, ResetSession, ListRecentTurns, Stats,
IsOwner. The handler at internal/handlers/paliadin.go now talks to
this instead of the concrete struct.
- paliadinDB — embedded base type carrying the audit-table I/O
(insertTurnRow, completeTurn, markTurnError, markTurnAbandonedOrError)
plus the read-side queries (IsOwner, ListRecentTurns, Stats). Both
Local and Remote impls inherit these by embedding paliadinDB so the
remote path doesn't have to duplicate any DB code.
- LocalPaliadinService — the renamed PoC backend. Identical behaviour
to the previous PaliadinService; only the type name and method
receivers change. Method receivers split: tmux-specific operations
(RunTurn, ResetSession, ensurePane, sendToPane, pollForResponse, etc.)
stay on *LocalPaliadinService; DB-only operations promote to
*paliadinDB.
Wiring:
- internal/handlers/handlers.go — Paliadin field becomes the interface
type; Register() unchanged.
- cmd/server/main.go — calls NewLocalPaliadinService instead of
NewPaliadinService. The remote-vs-local switch on PALIADIN_REMOTE_HOST
lands in B5.
Tests in paliadin_test.go all green — they test package-level functions
(splitTrailer, countChips, approxTokenCount, sanitiseForTmux,
PaliadinOwnerEmail) and don't touch the renamed struct. No behaviour
change on the local-tmux path.
Refs m/paliad#12
m's call (2026-05-07 21:52): "remove the export variable, that is bad
form. It should be connected only to my account."
The PALIADIN_ENABLED env var was a deploy-time toggle: easy to
mis-flip, splits prod/dev behaviour, and reads as "could be turned on
for anyone." Replaced with a per-request gate in code:
services.PaliadinOwnerEmail = "matthias.siebels@hoganlovells.com"
handlers/paliadin.go now gates every entry point through
requirePaliadinOwner, which looks up paliad.users.email by the caller's
UUID and returns 404 (not 403 — pretend the route doesn't exist) for
anyone else.
Routes register unconditionally; the gate is in the code, not the
deploy. main.go wires PaliadinService whenever DATABASE_URL is set and
logs the owner identity at boot. CLAUDE.md drops the PALIADIN_ENABLED
row and gains an explanatory note about the in-code gate.
Sidebar entries (Paliadin under Übersicht; Paliadin Monitor under
Admin) now render with display:none, revealed by sidebar.ts after
/api/me confirms the caller's email matches PALIADIN_OWNER_EMAIL —
same fail-closed pattern the Admin group already uses.
Side-effect for ops: paliad.de production now serves the routes too,
but only to m, and only successfully if the host has tmux + claude
in PATH (which Dokploy doesn't). m hitting /paliadin from prod gets a
"tmux unavailable" — clear failure mode, not a security concern.
One new test (TestPaliadinOwnerEmail_IsLowercaseStable) keeps the
constant aligned with migration 023's seed so a future rename of m's
account doesn't silently strand the gate. All existing tests pass.
Phase 0 of the Paliadin design (docs/design-paliadin-2026-05-07.md
§0.5). m-only laptop scope, gated behind PALIADIN_ENABLED=false on
prod. Lifts the goldi/mVoice tmux-Claude pattern (mVoice/server.py:
250-380) into a Go service: long-lived `claude` pane in a tmux
session, prompts in via `tmux send-keys -l`, responses out via a
per-turn file (/tmp/paliadin/{turn_id}.txt) the system prompt
instructs Claude to write.
What landed
-----------
- migration 058_paliadin_poc — paliad.paliadin_turns audit table
(full prompt + response stored at PoC scope; redaction returns
at production v1 per design §3.3). RLS: user sees own,
global_admin sees all.
- internal/services/paliadin.go — the orchestrator. ensurePane()
finds-or-creates the tagged tmux window, sendToPane sends the
framed [PALIADIN:turn_id] envelope, pollForResponse reads the
per-turn file, splitTrailer parses the [paliadin-meta] block
Claude appends to every reply (used_tools, rows_seen,
classifier_tag).
- internal/services/paliadin_prompt.go — the system prompt sent
once to a fresh Claude pane. Defines the response protocol
(Write-to-file + meta trailer), the action-chip marker syntax,
the visibility-gate rule (paliad.can_see_project required in
every project-scoped query), and 9 SQL recipes covering m's
paliad data + cross-schema youpc case-law lookup.
- internal/handlers/paliadin.go — POST /api/paliadin/turn kicks
off the work in a goroutine and returns an SSE URL; GET
/api/paliadin/stream/{id} relays per-turn channel events
(meta/content/end/error/ping) to EventSource. Routes register
ONLY when PaliadinService is wired — paliadinSvc nil → no
handlers exist, prod surface is clean.
- /admin/paliadin dashboard — global_admin-only. Shows total
turns, last-7-days, median/p90 duration, tool-use rate (the
load-bearing §0.5.7 metric), abandon rate, classifier
histogram, daily sparkline, top prompts, recent turn log.
Powered by PaliadinService.Stats() + ListRecentTurns().
- frontend: paliadin.tsx + client/paliadin.ts (chat panel with
starter prompts, EventSource consumer, typewriter render of
one-shot content blob, citation-chip parser, "Stop" + "New
conversation" buttons, localStorage history); admin-paliadin
pair (read-only stats dashboard).
- Sidebar: Paliadin entry under Übersicht (ICON_SPARKLE);
Paliadin Monitor under Admin.
- 36 i18n keys (DE+EN), CSS for chat panel + dashboard.
- main.go: PaliadinService wires only on PALIADIN_ENABLED=true,
with PALIADIN_TMUX_SESSION + PALIADIN_RESPONSE_DIR overrides.
Logs visibly so the operator can confirm at boot.
- CLAUDE.md: ANTHROPIC_API_KEY row updated (PoC doesn't need it
— Claude CLI uses m's subscription; key reserved for future
production-v1). New rows for the three PALIADIN_* env vars.
Tests
-----
- 7 unit tests on the trailer parser, chip counter, token approx,
and tmux-input sanitiser. All pass. The trailer parser is
load-bearing for monitoring; an unobserved parser bug = silent
dashboard rot.
What's NOT in v1 (stays deferred)
---------------------------------
- The Anthropic API client (production v1, gated on PoC success
per §0.5.7).
- BYO-AI / OpenAI adapter.
- Per-user rate limiting.
- Multi-replica SSE bus.
- Mascot / avatar SVG.
- Persistent threads (history is browser localStorage only).
How to use locally
------------------
$ export PALIADIN_ENABLED=true
$ ./paliad
# browse /paliadin → type a question → answers stream back
# /admin/paliadin shows the monitoring dashboard
Migration: 058 (skips fritz's t-147 on 057). Safe on prod
because PALIADIN_ENABLED defaults to false; the table is created
but no routes touch it until the env var flips.