design(t-paliad-146): Paliadin — in-app AI buddy

Inventor design pass for the Paliadin: a Claude-backed conversational assistant grounded in the user's own paliad data + paliad's static reference (courts, glossary, deadline rules, Fristenrechner concept tree). Long-lived in-process Go service that calls Anthropic's Messages API directly with tool use; every tool is a thin shim over an existing service (Dashboard / Project / Deadline / Appointment / Court / Glossary / DeadlineRule). RLS / visibility inherited from those services, so Paliadin literally cannot see what the caller cannot.

Five coordinated sub-designs answer the issue's 20 open questions:

A. LLM architecture + tool-use + prompts (§2)
B. Data access + RLS + PII (§3)
C. UX (§4)
D. Token budget + cost + audit (§5)
E. Phasing (§7)

Phase 1 v1: /paliadin full page + sidebar entry, SSE stream of Anthropic, 8 read-only tools, session-only history, 30/hour user cap + 1000/hour global cap, audit row per turn (metadata only, no transcript), 4k input + 2k output token caps, no avatar/mascot, no proactive onboarding. Migration 057 introduces paliadin_turns + paliadin_rate_limit. Single PR, ~3500-4500 LoC.

mlex / /lex-* reuse: shape (system-prompt voice, tool-catalog idea, citation style), NOT code. mLex is a workspace, not a Go/TS repo; the /lex-* skills drive Claude against youpc's MCP and cannot be embedded in a paliad service.

Premise verifications surfaced one CLAUDE.md doc-bug: the ANTHROPIC_API_KEY "Reserved for Phase H, do not set" row needs to flip in the implementation PR, because Paliadin un-defers it.

12 open questions for m in §8.5: Anthropic key choice (personal vs HLC enterprise), default model (Sonnet vs Haiku), surface (/paliadin page vs drawer), mascot phase, 2-PA sanity check before locking scope, etc.

Same adoption-risk concern that just parked t-paliad-145: Paliadin's edge over open-Claude-in-another-tab is data grounding, which only works if v1 makes it visible (citation chips + tool-call evidence + tagline).

STOP after design. Awaiting m go/no-go before coder shift.
2026-05-07 20:45:31 +02:00
parent 99f08e3863
commit dc7c807725
1 changed files with 773 additions and 0 deletions
--- a/docs/design-paliadin-2026-05-07.md
+++ b/docs/design-paliadin-2026-05-07.md
@@ -0,0 +1,773 @@
+# Design: Paliadin — in-app AI buddy / pet (t-paliad-146)
+
+**Status:** READY FOR REVIEW
+**Author:** noether (inventor)
+**Issue:** [m/paliad#9](https://mgit.msbls.de/m/paliad/issues/9)
+**Date:** 2026-05-07
+**Branch:** `mai/noether/inventor-paliadin-in-app`
+
+---
+
+## §0 TL;DR
+
+A new conversational surface inside paliad: **Paliadin**, a Claude‑backed assistant that answers questions grounded in the user's own paliad data and paliad's domain knowledge. Paliadin is a long‑lived in‑process Go service, not a per‑session worker spawn. It talks to the Anthropic Messages API directly with **tool use**, where every tool is a thin shim over an existing paliad service (DashboardService, ProjectService, DeadlineService, AppointmentService, CourtService, GlossaryService, DeadlineRuleService). RLS / visibility is enforced at the service layer, exactly as it is for the rest of the app, so Paliadin literally cannot see what the caller cannot see.
+
+Phase 1 surface: **dedicated `/paliadin` page + a sidebar entry under "Übersicht"**, server‑side SSE stream of Anthropic's response (same shape paliad's parked t‑145 chat design specced), session‑only conversation (no DB persistence in v1), 8 read‑only tools, ~30 turns/hour rate limit per user, hard token caps (4 k input + 2 k output per turn), per‑request audit row (no full transcript in v1; store a redacted hash + token counts + tool‑call list).
+
+**No avatar, no mascot SVG, no proactive onboarding pop‑up in v1.** Just a clean chat panel with the name "Paliadin" in the header. Mascot, drawer mode, persistent threads, write‑tools, and youpc.org case‑law lookup all deferred to Phase 2/3.
+
+**mlex / `/lex-*` reuse: pattern, not code.** mLex turns out to be a *workspace* (`extractions/`, `analysis/`, `docs/`) — there is no Go/TS code to fork. The `/lex-*` skills are Claude Code instruction docs that drive *Claude itself* against youpc's MCP tools; they cannot be embedded in a paliad Go service. What carries over is the **shape**: tool catalog (search → fetch → cite), system‑prompt voice (precise, citation‑backed, flag uncertainty honestly), and the "every legal claim needs a citation" guardrail. §2.4 maps the carry‑over precisely.
+
+**Trade‑off flagged up‑front (read §9.1 before approving):** the same adoption‑risk concern that just parked the local‑chat design (t‑paliad‑145, today 17:03) applies here. Paliadin's edge over "open Claude in another tab" is *only* that it sees the user's own data, and that edge collapses if v1 doesn't make the data‑grounding visible (citation chips, tool‑call evidence) and explicit ("Paliadin sees only YOUR projects"). Without those, Paliadin is just a worse Claude. With them, it's the only Claude that can answer "welche Frist ist als nächstes auf dem Müller‑Verfahren?".
+
+---
+
+## §1 Premises verified live (2026-05-07)
+
+Before designing on top, I checked each load‑bearing claim against the running system rather than CLAUDE.md / memory.
+
+| Claim | Source | Verification |
+|---|---|---|
+| **mLex is a workspace, not a code repo** | issue framing "mlex project we could partially reuse" | `~/dev/mLex/` contains only `extractions/`, `analysis/`, `docs/`, plus `CLAUDE.md` + `AGENTS.md`. No `*.go`, no `package.json`, no tools that aren't Claude skills. The "code" is the `/lex-*` skill family in `~/.claude/skills/`, which is instruction docs driving Claude against `mcp__youpc__*` MCP tools. **Carry‑over is shape (system prompt, tool catalog, citation style), not adapters.** |
+| `/lex-*` skill family | brief reference | `~/.claude/skills/{lex-research,lex-extract,lex-classify,lex-classify-patent,mai-lexy}/SKILL.md`. All five inventoried in §2.4. |
+| Paliad has no anthropic / claude code | CLAUDE.md `ANTHROPIC_API_KEY` "do not set" row | `grep -ri anthropic ~/dev/paliad/internal ~/dev/paliad/cmd` → only `internal/branding/firm.go` comment unrelated to AI. `go.mod` has no `anthropic-sdk-go` dep. **This task un‑defers the env var; CLAUDE.md row needs updating in the same PR.** |
+| Paliad has no SSE pattern shipped | substrate scan | `grep -rn 'http.Flusher\|text/event-stream' internal/` returns only references inside the parked t‑145 chat design doc — no live code. We bring our own. |
+| Paliad and youpc share the same physical Postgres | infra | Both run on `100.99.98.201:11833` (port 11833 = ydb). Paliad's schema is `paliad`; youpc's is `data`. **A future "search UPC case law" tool would be a same‑DB cross‑schema SELECT, not an HTTP hop** — but Phase 1 still excludes case‑law lookup (see §3). |
+| Visibility is enforced at service layer (not via SET LOCAL auth.uid) | code | `internal/services/visibility.go` defines `visibilityPredicate(alias)` + `visibilityPredicatePositional(alias, idx)`; every project‑scoped query inlines it. Paliadin's tools call existing services, inheriting the predicate. |
+| `paliad.can_see_project()` is the canonical visibility function in DB (RLS, t‑139) | t‑139 migration 055 | `internal/db/migrations/055_hierarchy_aggregation.up.sql:144` `CREATE OR REPLACE FUNCTION paliad.can_see_project(_project_id uuid)`. Same predicate echoed in `services/visibility.go`. |
+| Migration tracker is at 56 (`056_user_views`) | t‑144 A1 | `paliad_schema_migrations` row. Next migration is **057**. (t‑145 was parked before its `057_chat` shipped, so 057 is open.) |
+| t‑paliad‑145 (local chat) was parked today 2026-05-07 17:03 | memory + commit log | Commit `99f08e3` "Merge: t-paliad-145 design doc only — local chat feature PARKED per m's call". The chat SSE substrate that would have been shared is **not** built — Paliadin builds its own minimal stream. |
+| Sidebar bell pattern (`sidebar-inbox-badge`) is reusable for a chat‑style entry | t‑138 | `frontend/src/components/Sidebar.tsx` — `navItem(href, icon, i18nKey, label, currentPath, badgeID?)` already takes an optional badge id. The same plumbing fits a Paliadin entry. |
+| Sidebar `ICON_SPARKLE` already exists | UI scan | `frontend/src/components/Sidebar.tsx` defines `ICON_SPARKLE` (a star/sparkle SVG). Free icon for the Paliadin nav item. |
+| `auth.UserIDFromContext(r.Context())` is the standard handler‑side user lookup | code | `internal/handlers/dashboard.go:31` is the canonical pattern. Paliadin handlers will use it. |
+| `branding.Name` (default "HLC") is the firm‑name source | t‑paliad‑065 | `internal/branding/firm.go` reads `FIRM_NAME` once at boot. Paliadin's system prompt + greeting must use `branding.Name`, never hardcode "HLC". |
+| Single web replica on Dokploy today | `docker-compose.yml` | One `web` service. SSE state in‑process is fine v1; multi‑replica migration deferred along with chat. |
+
+**Doc‑vs‑live conflicts encountered (must be fixed in the implementation PR):**
+
+1. **CLAUDE.md** still says `ANTHROPIC_API_KEY` is "Reserved for Phase H (AI Frist‑Extraktion) which is deferred per m's 2026-04-16 decision. Do not set." Paliadin un‑defers it. The CLAUDE.md row needs to flip to "Required for Paliadin (read‑only Claude assistant) — set on Dokploy."
+2. The earlier "do not want anthropic API" decision (memory `b6a11b55…`, 2026-04-16) was specifically about *Frist extraction from documents*. Paliadin is a different surface (interactive read‑only Q&A over already‑structured data). It does not silently revive the parked extraction feature — t‑paliad‑011 stays blocked unless m explicitly un‑parks it too.
+
+---
+
+## §2 Sub-design A — LLM architecture, prompt, tool use, mlex/lex reuse
+
+Answers Q1, Q2, Q3, Q4, Q17, Q18.
+
+### 2.1 LLM provider (Q1)
+
+**Recommendation: Anthropic Claude, single provider, accessed directly via the Messages API. Lock to Claude in v1; abstract behind a one‑function interface so future portability is cheap.**
+
+| Provider | v1? | Why |
+|---|---|---|
+| Anthropic Claude (Messages API + tool use) | ✅ | Matches m's "wire into my claude" framing. Tool‑use shape is mature. Streaming via SSE is native. Paliad already has `ANTHROPIC_API_KEY` reserved. |
+| Mixed (Claude reasoning + smaller routing model) | ❌ | Premature optimisation; for ~30 turns/hour/user we don't need the routing layer. Single‑model latency is fine. |
+| OpenAI / open weight | ❌ | No HLC compliance review for those vendors; m's Anthropic key is on file. |
+
+**Model selection within Anthropic:** default to **Claude Sonnet 4.6** (fast, tool‑use‑capable, cheap enough for chat use). Allow override via `PALIADIN_MODEL` env var so we can drop down to Haiku for cost or up to Opus for tricky onboarding sessions without redeploying.
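
A minimal sketch of the override, assuming a hypothetical `resolveModel` helper. The default model ID below is a placeholder, since this document does not give the exact Anthropic identifier.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultModel is a placeholder (assumption): the real Anthropic model ID
// for the chosen Sonnet release must be filled in at implementation time.
const defaultModel = "sonnet-default-placeholder"

// resolveModel returns the PALIADIN_MODEL override when it is set and
// non-blank, otherwise the compiled-in default. The lookup function is
// injected so the logic can be exercised without touching the environment.
func resolveModel(lookup func(string) (string, bool)) string {
	if v, ok := lookup("PALIADIN_MODEL"); ok {
		if v = strings.TrimSpace(v); v != "" {
			return v
		}
	}
	return defaultModel
}

func main() {
	fmt.Println("model:", resolveModel(os.LookupEnv))
}
```

Treating a blank value as "unset" avoids sending an empty model name to the API when the variable is defined but empty in the deployment config.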
+
+**Wire shape:** one Go HTTP client (`internal/services/paliadin/anthropic.go`) that POSTs `/v1/messages` with `stream: true`. We do not adopt `github.com/anthropics/anthropic-sdk-go` in v1 — the API surface we use (one streaming POST + tool‑use loop) is small enough that a hand‑rolled client is shorter than wiring the SDK and safer than depending on a Go SDK that has historically broken on minor version bumps in mAi's experience. Keep the option open for Phase 2 if the token‑accounting / structured tool‑use helpers in the SDK become attractive.
+
+```go
+// internal/services/paliadin/anthropic.go
+type AnthropicClient interface {
+    Stream(ctx context.Context, req MessagesRequest, w StreamWriter) (Usage, error)
+}
+```
+
+The interface is the only swap‑point. Switching providers later means a new implementation, not a rewrite.
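
One possible shape for the `StreamWriter` side, sketched under assumptions: the `sseWriter` type, its `Send` method, and the `delta` event name are illustrative and not part of the design. The frame layout itself follows the Server-Sent Events wire format.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// formatSSE renders one Server-Sent Events frame. Multi-line payloads are
// split into one "data:" field per line, as the SSE format requires.
func formatSSE(event, data string) string {
	var b strings.Builder
	if event != "" {
		b.WriteString("event: " + event + "\n")
	}
	for _, line := range strings.Split(data, "\n") {
		b.WriteString("data: " + line + "\n")
	}
	b.WriteString("\n") // a blank line terminates the frame
	return b.String()
}

// sseWriter is a hypothetical StreamWriter implementation: it flushes each
// frame to the browser immediately instead of letting net/http buffer it.
type sseWriter struct {
	w http.ResponseWriter
	f http.Flusher
}

func newSSEWriter(w http.ResponseWriter) (*sseWriter, error) {
	f, ok := w.(http.Flusher)
	if !ok {
		return nil, fmt.Errorf("response writer does not support flushing")
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	return &sseWriter{w: w, f: f}, nil
}

// Send writes one frame and flushes it.
func (s *sseWriter) Send(event, data string) error {
	if _, err := fmt.Fprint(s.w, formatSSE(event, data)); err != nil {
		return err
	}
	s.f.Flush()
	return nil
}

func main() {
	fmt.Print(formatSSE("delta", "Hallo\nWelt"))
}
```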
+
+### 2.2 System prompt + message shape (Q2)
+
+**Recommendation: single `system` prompt with paliad context + tool definitions; one persistent prompt across pages (no per‑route system prompts in v1).**
+
+#### 2.2.1 System prompt (locked, v1)
+
+The system prompt is rendered *per request* from `branding.Name`, the user's locale (DE/EN), the user's `display_name`, the current date, and the visible‑project count (a single count, not the project list, which keeps the prompt small). Only the template is a process‑wide constant.
+
+```
+You are Paliadin, an AI assistant inside {{firm}}'s patent practice
+platform "Paliad". You help {{display_name}} ({{office}}) answer
+questions about their own work in Paliad and about UPC / EPO / DPMA
+patent practice.
+
+Today is {{today}}. The user's display language is {{language}}; reply
+in {{language}} unless the user switches mid‑conversation.
+
+You have read‑only access to the following tools:
+- whats_on_my_plate     — the user's dashboard (deadline / appointment / matter buckets)
+- list_my_projects      — every project the user can see
+- get_project_detail    — full detail of one project (deadlines, appointments, parties, partner units)
+- search_my_deadlines   — filter the user's deadlines by status / date / project
+- list_my_appointments  — the user's upcoming appointments (next 30 days by default)
+- lookup_court          — Paliad's catalog of patent courts (UPC LDs, German LGs/OLGs/BGH, EPO, DPMA, ...)
+- lookup_glossary_term  — Paliad's bilingual patent glossary
+- lookup_deadline_rule  — Paliad's Fristenrechner concept tree (named deadline rules + their triggers)
+
+Hard rules:
+1. Never invent facts. If a tool returns nothing, say so. Do not guess
+   case numbers, deadline dates, court names, or party names.
+2. Every concrete factual claim about the user's work MUST come from a
+   tool call in the current conversation. Cite using "[#deadline-XXXX]",
+   "[#projekt-XXXX]", "[court: Munich LD]", "[glossary: Klageerwiderung]"
+   so the UI can render citation chips.
+3. You cannot mutate any data. If the user asks you to change something,
+   explain that v1 is read‑only and point them to the right page in
+   Paliad.
+4. Visibility is enforced before tools return — if your tool call comes
+   back empty, the data either doesn't exist OR the user can't see it.
+   Never disclose the latter; just answer "I couldn't find anything
+   matching that".
+5. You cannot answer questions about other users' projects, even if the
+   user names them.
+6. Respect the user's role. If the user has global_role=standard, do not
+   speculate about admin‑only functions.
+
+Style:
+- Direct, professional, slightly warm. Lawyer‑adjacent.
+- Reply in Markdown. Use lists, code blocks, blockquotes.
+- Cite specifically (case numbers, dates, court names) — never "around
+  the 14th".
+- When uncertain, flag it. ("I don't see a deadline matching that
+  description on the projects you can access.")
+- No emojis unless the user uses one first.
+
+You are NOT:
+- A code‑writing assistant
+- A replacement for legal advice
+- A web search
+```
+
+This is roughly 600 input tokens, well under the 4 k per‑turn input cap.
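
A sketch of per-request rendering with Go's `text/template`, assuming the placeholders are rewritten to Go's field syntax (`{{.Firm}}` rather than `{{firm}}`). The `promptVars` struct and `renderSystemPrompt` are hypothetical names, and only the opening lines of the prompt are reproduced.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
	"time"
)

// promptVars holds the per-request values named in the prompt template.
// Field names are illustrative, not taken from the paliad codebase.
type promptVars struct {
	Firm        string // would come from branding.Name, never a literal
	DisplayName string
	Office      string
	Today       string
	Language    string
}

// The template is parsed once at startup; only Execute runs per request.
var systemPrompt = template.Must(template.New("system").Parse(
	`You are Paliadin, an AI assistant inside {{.Firm}}'s patent practice
platform "Paliad". You help {{.DisplayName}} ({{.Office}}).
Today is {{.Today}}. Reply in {{.Language}}.`))

// renderSystemPrompt fills the template for one request.
func renderSystemPrompt(v promptVars) (string, error) {
	var b strings.Builder
	if err := systemPrompt.Execute(&b, v); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := renderSystemPrompt(promptVars{
		Firm: "HLC", DisplayName: "A. Example", Office: "Munich",
		Today: time.Now().Format("2006-01-02"), Language: "German",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

`text/template` is used rather than `html/template` because the output is sent to the model as plain text, so HTML escaping would only corrupt names containing apostrophes or ampersands.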
+
+#### 2.2.2 Per‑message envelope
+
+The browser POSTs to `/api/paliadin/turn` with `{ session_id, user_message, history }`, where `history` is the prior turns *in the current session only* (session = browser tab; localStorage backs it). The server prepends the system prompt and runs the tool‑use loop.
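
The envelope could be decoded roughly as follows. This is a sketch under assumptions: the `role` / `content` shape of a history entry is not specified above, and `parseTurnRequest` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// turnMessage is one prior turn as replayed by the browser (assumed shape).
type turnMessage struct {
	Role    string `json:"role"` // "user" or "assistant"
	Content string `json:"content"`
}

// turnRequest mirrors the { session_id, user_message, history } body.
type turnRequest struct {
	SessionID   string        `json:"session_id"`
	UserMessage string        `json:"user_message"`
	History     []turnMessage `json:"history"`
}

// parseTurnRequest decodes and validates the body. History is supplied by
// the client and therefore untrusted: only the two conversational roles are
// accepted, so a tampered client cannot smuggle in system instructions.
func parseTurnRequest(body []byte) (turnRequest, error) {
	var req turnRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return req, fmt.Errorf("decode turn request: %w", err)
	}
	if req.SessionID == "" || strings.TrimSpace(req.UserMessage) == "" {
		return req, errors.New("session_id and user_message are required")
	}
	for i, m := range req.History {
		if m.Role != "user" && m.Role != "assistant" {
			return req, fmt.Errorf("history[%d]: unexpected role %q", i, m.Role)
		}
	}
	return req, nil
}

func main() {
	body := []byte(`{"session_id":"s1","user_message":"Was steht heute an?","history":[]}`)
	req, err := parseTurnRequest(body)
	fmt.Println(req.SessionID, len(req.History), err)
}
```

Because the history lives in the browser, the server has to treat it like any other user input; the per-turn input token cap applies to it as well.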
+
+#### 2.2.3 Tool use vs RAG‑only (Q2 secondary)
+
+**Tool use, not RAG.** RAG (vector search over chunks of paliad content) is the wrong shape for this surface — paliad data is highly structured, the most useful answers come from filtered SQL queries (e.g. "all deadlines on my projects with `status='pending'` and `due_date<=now()+7d`"), and a vector store would just paraphrase what an SQL query returns more accurately. Tools give the model the same query power the user has, with hard visibility gates. Phase 2 may add RAG over a small static corpus (HL Patents Style guide, Paliadin docs) if onboarding queries don't get good answers from glossary lookups alone.
+
+### 2.3 Long‑lived service vs lexy‑style worker spawn (Q4)
+
+**Recommendation: long‑lived Go service (in‑process) — *not* a per‑session Claude Code worker.**
+
+| Option | Latency to first token | Cost / turn | Operational shape |
+|---|---|---|---|
+| In‑process Go service calling Anthropic API directly | < 1 s (just network + queueing) | Pay only for the model tokens we use | Single binary, single Postgres conn, scales with paliad |
+| `mai hire paliadin` per session (Claude Code worker) | 5–15 s | Worker startup overhead × N concurrent sessions × Claude Code's own context overhead | Operational footprint of running a worker per active user — dozens of tmux panes, tasks, reports |
+
+The lexy / cassandra worker pattern works because it's *batch*: classify N judgments, emit JSON, exit. A chat surface needs sub‑second response times across dozens of HLC users in parallel. A Claude‑Code‑per‑session pattern would give each user their own Claude in the loop, with all the tooling and message‑bus scaffolding that implies — wrong scale of abstraction.
+
+**That said, two things from the worker pattern do carry over:**
+1. **System‑prompt voice.** The lexy / mai-lexy SKILL.md persona ("Sharp, analytical, direct. Cites provisions and case law naturally. Flags uncertainty honestly.") is the right voice for Paliadin. We borrow it — see §2.2.1.
+2. **Tool catalog shape.** The lex-research SKILL.md tool list (search → fetch full text → enrich → analyse → cite) maps cleanly onto Paliadin's read tools — see §3.
+
+### 2.4 mlex / `/lex-*` carry‑over map (Q3, Q18)
+
+**Inventory result, with the shape‑vs‑code split called out for each:**
+
+| Skill / asset | What it does | Carry‑over to Paliadin |
+|---|---|---|
+| `~/dev/mLex/` (workspace) | `extractions/` (per‑case JSON), `analysis/` (markdown reports), `docs/` (legal references), `extractions/queue.json` | **None as code.** Workspace artifacts are the *output* of the skills — they don't give us anything embeddable. |
+| `lex-research` skill | UPC case law search → analysis report. Tool catalog: `mcp__supabase__execute_sql`, `mcp__youpc__*`, `mcp__youpc-memory__*`. Output format: structured markdown with citation tables. | **Voice + tool‑catalog shape.** "Search → enrich → analyse → cite" is the Paliadin flow. The skill's output‑format conventions (case number on first mention, division comparison tables) seed the system prompt's style guidance. |
+| `lex-extract` skill | Read full judgment text → structured holdings / principles / interpretations JSON. | **Not v1.** Phase 2 candidate only if Paliadin gets an `extract_judgment(node_id)` write tool; orthogonal to read‑only v1. |
+| `lex-classify` skill | Classify judgments against a 47‑leaf taxonomy. | **Not v1.** Same as above — write‑surface, batch‑shaped, irrelevant to interactive Q&A. |
+| `lex-classify-patent` skill | Classify patents into IPC technology sectors via Anthropic. | **Pattern reference only.** It's already an Anthropic‑backed pipeline, so its prompt structure is a working example we can crib from for the system‑prompt template — but the actual classification target is paliad‑irrelevant. |
+| `mai-lexy` skill | Lawyer persona that orchestrates the above. "Citation‑backed, flags uncertainty." | **Voice template.** The persona text is the closest thing to a working Paliadin system prompt; §2.2.1 borrows directly from it. |
+| `claude-api` skill | Anthropic SDK / Messages API patterns + prompt caching guidance. | **Implementation reference for the Go client + caching strategy.** §6.4 picks up its prompt caching guidance. |
+
+**Anti‑reuse:** the `mcp__youpc__*` MCP tools that `lex-research` uses are designed for an interactive Claude Code session. Paliadin's tools must instead be Go service calls — same data shape, different transport. Don't try to embed an MCP client in a paliad Go process; rebuild the same SQL queries against the same Postgres directly.
+
+### 2.5 Tool catalog v1 (Q17)
+
+Eight read‑only tools. Each is a thin Go shim around an existing service (or static loader); the user‑scoped tools enforce visibility through the backing service's existing `visibilityPredicate`.
+
+| Tool name | Backing service / method | Inputs | Output (truncated to fit budget) |
+|---|---|---|---|
+| `whats_on_my_plate` | `DashboardService.Get(userID)` | none | `{deadline_summary, appointment_summary, matter_summary, upcoming_deadlines[≤10], upcoming_appointments[≤10], recent_activity[≤10]}` |
+| `list_my_projects` | `ProjectService.ListVisible(userID, filter)` | optional `{status, kind}` | `[{id, kind, label, status, parent_id, path}]` paged 25 |
+| `get_project_detail` | `ProjectService.Get(userID, id) + DeadlineService.ListByProject + AppointmentService.ListByProject + PartyService.ListByProject + DerivationService.AttachedUnits` | `{project_id}` | `{project, deadlines[≤25], appointments[≤25], parties[≤10], partner_units[≤5]}`. Returns 404 if the user can't see it (the LLM gets a clean "not found", the same response as for a truly missing project) |
+| `search_my_deadlines` | new helper on `DeadlineService` (reuses `visibilityPredicate`) | `{q?, status?, project_id?, due_after?, due_before?, limit≤25}` | `[{id, title, due_date, status, project_label, court}]` |
+| `list_my_appointments` | new helper on `AppointmentService` | `{from, to, project_id?}` | `[{id, title, start_at, end_at, location, project_label}]` |
+| `lookup_court` | `CourtService.Search(q)` (firm‑wide; no visibility filter — courts are reference data) | `{q}` | `[{slug, name, country, kind, address, vacation_periods[≤4]}]` truncated 10 |
+| `lookup_glossary_term` | static JSON loader (`internal/handlers/glossary.go` data) | `{q, lang?}` | `[{de, en, definition, category}]` top 5 |
+| `lookup_deadline_rule` | `DeadlineRuleService.SearchConcept(q)` | `{q}` | `[{rule_code, concept_label, trigger_event, deadline_text, legal_source}]` top 5 |
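
To make one catalog row concrete, here is a sketch of how `search_my_deadlines` might be declared for the Messages API, whose tool objects carry `name`, `description`, and `input_schema`. The property types and descriptions are assumptions (the table fixes only the field names), and `clampLimit` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolDef follows the tool shape of the Anthropic Messages API:
// a name, a description, and a JSON Schema for the input.
type toolDef struct {
	Name        string         `json:"name"`
	Description string         `json:"description"`
	InputSchema map[string]any `json:"input_schema"`
}

const maxDeadlineResults = 25 // the limit≤25 cap from the table

// searchMyDeadlinesDef describes search_my_deadlines. All fields are
// optional, matching the "?" markers in the table.
func searchMyDeadlinesDef() toolDef {
	str := func(desc string) map[string]any {
		return map[string]any{"type": "string", "description": desc}
	}
	return toolDef{
		Name:        "search_my_deadlines",
		Description: "Filter the user's deadlines by status, date, or project.",
		InputSchema: map[string]any{
			"type": "object",
			"properties": map[string]any{
				"q":          str("free-text match on the deadline"),
				"status":     str("deadline status, e.g. pending"),
				"project_id": str("restrict to one project"),
				"due_after":  str("ISO date lower bound"),
				"due_before": str("ISO date upper bound"),
				"limit": map[string]any{
					"type": "integer", "minimum": 1, "maximum": maxDeadlineResults,
				},
			},
		},
	}
}

// clampLimit enforces the cap server-side; the schema alone is only a hint
// to the model and must not be relied on.
func clampLimit(requested int) int {
	if requested < 1 || requested > maxDeadlineResults {
		return maxDeadlineResults
	}
	return requested
}

func main() {
	out, _ := json.MarshalIndent(searchMyDeadlinesDef(), "", "  ")
	fmt.Println(string(out))
}
```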
+
+**Bumped out of v1 (Phase 2 candidates):**
+
+- `list_my_pending_approvals` (the inbox bell payload) — useful but adds RLS surface; let v1 stabilise first.
+- `search_youpc_case_law` — m's framing example, but cross‑schema → bigger blast radius. Phase 2 once Paliadin proves its weight on paliad‑internal data.
+- `search_my_audit_log` — high signal but PII heavy.
+- `compute_frist` — would invoke the existing `DeadlineCalculator`. Useful but the user can already do this on `/tools/fristenrechner`; defer until we see queries that actually want it.
+- All write tools (`create_deadline`, `attach_partner_unit`, etc.) — Phase 3 minimum, with hard confirmation gate (see §6).
+
+### 2.6 The tool‑use loop (Q2 tertiary)
+
+Standard Anthropic tool‑use loop:
+
+```
+1. Build messages = [...history, user_message]; the system prompt goes in the top-level `system` field
+2. POST /v1/messages with system=<prompt>, tools=[...catalog], stream=true
+3. Stream assistant reply chunks → relay to client SSE
+4. If stop_reason == "tool_use":
+     for each tool_use block:
+        execute tool(input) on the matching Go service
+        emit tool_result block back into messages
+     goto 2 (with the same stream/SSE connection)
+5. If stop_reason == "end_turn": close stream
+```
+
+**Hard cap on the loop:** ≤ 5 tool‑call rounds per turn. After 5 rounds without `end_turn`, force‑close with "Sorry, I got stuck — try rephrasing." Hitting the cap is a UI red flag we want to see in audit (see §6.3).
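
The loop and its cap can be sketched as below. The types are simplified stand-ins for the API's content blocks, the model call is injected as a function, and `scriptedModel` / `stubTool` are test doubles; none of these names come from the paliad codebase.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

const maxToolRounds = 5

var errStuck = errors.New("paliadin: tool-round cap reached")

type toolCall struct{ ID, Name, Input string }

type modelReply struct {
	StopReason string // "tool_use" or "end_turn"
	Text       string
	ToolCalls  []toolCall
}

type message struct{ Role, Content string }

// runTurn calls the model, executes any requested tools, feeds the results
// back, and stops on end_turn or once maxToolRounds tool rounds have run.
// It returns the final text and the number of tool rounds executed.
func runTurn(
	ctx context.Context,
	msgs []message,
	callModel func(context.Context, []message) (modelReply, error),
	execTool func(context.Context, toolCall) string,
) (string, int, error) {
	for round := 0; ; round++ {
		reply, err := callModel(ctx, msgs)
		if err != nil {
			return "", round, err
		}
		if reply.StopReason != "tool_use" {
			return reply.Text, round, nil
		}
		if round == maxToolRounds {
			return "", round, errStuck // caller shows the "got stuck" message
		}
		msgs = append(msgs, message{"assistant", reply.Text})
		for _, tc := range reply.ToolCalls {
			msgs = append(msgs, message{"user", "tool_result " + tc.ID + ": " + execTool(ctx, tc)})
		}
	}
}

// scriptedModel replays canned replies in place of the Anthropic call.
func scriptedModel(replies ...modelReply) func(context.Context, []message) (modelReply, error) {
	i := 0
	return func(context.Context, []message) (modelReply, error) {
		if i >= len(replies) {
			return modelReply{}, errors.New("script exhausted")
		}
		r := replies[i]
		i++
		return r, nil
	}
}

// stubTool returns an empty JSON object for every call.
func stubTool(context.Context, toolCall) string { return "{}" }

func main() {
	use := modelReply{StopReason: "tool_use", ToolCalls: []toolCall{{ID: "t1", Name: "whats_on_my_plate"}}}
	end := modelReply{StopReason: "end_turn", Text: "done"}
	text, rounds, err := runTurn(context.Background(), nil, scriptedModel(use, end), stubTool)
	fmt.Println(text, rounds, err)
}
```

Returning the round count alongside the text lets the audit row record how close a turn came to the cap.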
+
+---
+
+## §3 Sub-design B — Data access, RLS, PII
+
+Answers Q5, Q6, Q7.
+
+### 3.1 Knowledge sources for v1 (Q5)
+
+**Recommendation: paliad‑internal data + paliad's static reference data ONLY. youpc.org case law deferred to Phase 2.**
+
+| Source | v1 | Reason |
+|---|---|---|
+| **Per‑user paliad data** (deadlines, appointments, projects, parties, partner units, attached units) | ✅ | The whole point of Paliadin. Visibility enforced via `visibilityPredicate` (every backing service already does this; tool inherits it). |
+| **Static reference data** in paliad (court catalog t‑122, glossary, deadline rules, Fristenrechner concept tree) | ✅ | Firm‑wide, no per‑user gating, low blast radius. |
+| **UPC case law** (youpc Postgres `data.judgments`, `data.judgment_markdown_content`) | ❌ Phase 2 | Cross‑schema SELECT is technically trivial (same Postgres) but: (a) inflates the v1 surface; (b) brings in 1700+ judgments → scaling RAG/full‑text question; (c) m's framing called out research as a *use case*, not a v1 must‑have. Ship paliad‑internal Q&A first; layer case‑law on once the substrate is proven. |
+| **HL Patents Style guide / Paliad onboarding docs** | ❌ Phase 2 | No internal corpus exists yet; would need docs‑authoring + indexing. The `lookup_glossary_term` tool already covers the most common onboarding question shape ("was bedeutet X?"). |
+| **External web search** | ❌ | Out of scope; Paliadin is a *grounded* assistant, not a web surfer. m can use the regular Claude for that. |
+
+**Ranking inside the v1 set (when Paliadin has to choose):**
+
+1. User‑data tools first when the question references "my", "the case", "the deadline", or names a project / case number that resolves.
+2. Static reference next when the question is conceptual ("what's a Klageerwiderung?", "which court is the Munich LD?").
+3. Combine when both apply ("when is my Klageerwiderung due?" → `lookup_deadline_rule` for the rule + `search_my_deadlines` for the user's instance).
+
+The system prompt names tools in this priority order; the model's tool‑selection follows.
+
+### 3.2 Auth / visibility boundary (Q6)
+
+**The gate:** every backing service already runs `visibilityPredicate(alias)` against the caller's UUID. The Paliadin tool shim is a 5‑line wrapper that calls the service with `userID` derived from `auth.UserIDFromContext(r.Context())` at the SSE handler boundary. There is no service‑role escape — the shim simply has no other UUID to pass in.
+
+**Belt‑and‑braces:** every tool result is inspected for `project_id` columns; for each distinct `project_id`, the shim asserts `paliad.can_see_project(_project_id)` returns `true`. (Defence‑in‑depth: this catches any future service‑layer regression where someone forgets the predicate, at the cost of one cheap function call per distinct project in a tool result.)
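
A sketch of that second check, assuming tool results can be viewed as rows with an optional `project_id` field. The `canSee` callback stands in for a query against `paliad.can_see_project`, whose exact invocation is not shown here; `assertVisible` and `allowOnly` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
)

// row is a simplified tool-result row; real results would be typed structs.
type row map[string]any

// canSee abstracts the visibility check (assumed to wrap the DB function).
type canSee func(ctx context.Context, projectID string) (bool, error)

// assertVisible checks every distinct project_id in rows exactly once and
// fails closed: an invisible project or a lookup error rejects the result.
func assertVisible(ctx context.Context, rows []row, check canSee) error {
	seen := map[string]bool{}
	for _, r := range rows {
		id, ok := r["project_id"].(string)
		if !ok || id == "" || seen[id] {
			continue // no project reference, or already checked
		}
		seen[id] = true
		visible, err := check(ctx, id)
		if err != nil {
			return fmt.Errorf("visibility check for %s: %w", id, err)
		}
		if !visible {
			return fmt.Errorf("tool result leaked project %s", id)
		}
	}
	return nil
}

// allowOnly is a test double that treats only the listed projects as visible.
func allowOnly(ids ...string) canSee {
	ok := map[string]bool{}
	for _, id := range ids {
		ok[id] = true
	}
	return func(_ context.Context, id string) (bool, error) { return ok[id], nil }
}

func main() {
	rows := []row{{"project_id": "p1"}, {"project_id": "p1"}, {"project_id": "p2"}}
	fmt.Println(assertVisible(context.Background(), rows, allowOnly("p1")))
}
```

A failure here signals a service-layer bug rather than a user error, so it is worth logging loudly while the model still receives the plain "not found" response.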
+
+**The "tell, don't disclose" rule (§2.2.1 hard‑rule 4):** if the user names a project they cannot see, the tool returns `{error: "not found"}` — same response as a project that doesn't exist. The system prompt instructs the model to say "I couldn't find anything matching that" without distinguishing the two cases. This is the same rule the t‑144 ViewService already applies.
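
In the shim this could look like the following sketch. The sentinel errors are hypothetical; the real service layer may signal missing and invisible projects differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors (assumption, not the actual paliad errors).
var (
	errNotFound  = errors.New("not found")
	errForbidden = errors.New("forbidden")
)

// toolError maps a service error to the payload the model sees. Missing and
// invisible projects collapse into the same message, so a reply cannot be
// used to probe for the existence of other users' projects.
func toolError(err error) map[string]string {
	switch {
	case err == nil:
		return nil
	case errors.Is(err, errNotFound), errors.Is(err, errForbidden):
		return map[string]string{"error": "not found"}
	default:
		return map[string]string{"error": "internal error"}
	}
}

func main() {
	fmt.Println(toolError(errForbidden))
	fmt.Println(toolError(fmt.Errorf("get project: %w", errNotFound)))
}
```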
+
+**Cross‑user PII in tool outputs:** tool outputs may legitimately contain other users' display names (e.g. project teams, deadline assignees). These are visible to the caller through the regular UI already, so disclosing them through Paliadin is no worse. We do NOT redact them.
+
+**Approval / partner‑unit derivation:** `get_project_detail` returns the derived team (per t‑139 `DerivationService.AttachedUnits`). Same predicate as the rest of the app.
+
+### 3.3 PII handling, retention, encryption (Q7)
+
+**v1 stance: minimum viable persistence, maximum auditability of the access pattern.**
+
+| Data | Stored where | Retention | Encryption | Notes |
+|---|---|---|---|---|
+| Conversation history (the actual messages) | **Browser localStorage only.** Cleared on browser data wipe / reload‑with‑fresh‑session. | Session only | n/a | Phase 2: opt‑in DB persistence with retention controls. |
+| Per‑request audit row | New `paliad.paliadin_turns` table | Forever (matches audit‑log pattern; soft‑delete only) | At‑rest by Postgres / Supabase volume encryption | Stores: `turn_id, user_id, started_at, finished_at, model, input_tokens, output_tokens, tool_calls (jsonb of tool names + arg hashes — NOT arg values), prompt_hash (sha256 of redacted user message), error_code`. **No prompt body, no completion body.** |
+| Tool‑call inputs (e.g. project_id arguments) | Hashed (sha256) into the audit row's `tool_calls` jsonb | Forever | n/a | The hash is enough to detect "this user kept asking about project X" patterns without storing the readable id. |
+| Anthropic API request/response bodies | **Not stored.** Streamed through the Go service straight to the SSE writer. | n/a | TLS in flight | Anthropic's own retention is governed by the org's API contract — pulling Paliad onto an existing HLC enterprise key would inherit that. |
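
A sketch of how the `tool_calls` entries might be built so that only hashes reach the audit row. The `auditToolCall` struct and its `arg_hash` field name are assumptions; the table specifies only "tool names + arg hashes".

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// auditToolCall is one entry in the tool_calls jsonb column: the tool name
// plus a hash of its arguments, never the arguments themselves.
type auditToolCall struct {
	Name    string `json:"name"`
	ArgHash string `json:"arg_hash"`
}

// hashArgs returns the hex SHA-256 of the JSON-encoded arguments.
// encoding/json sorts map keys, so equal argument maps hash identically.
func hashArgs(args map[string]any) (string, error) {
	b, err := json.Marshal(args)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	h, err := hashArgs(map[string]any{"project_id": "c47bd2", "status": "pending"})
	if err != nil {
		panic(err)
	}
	out, _ := json.Marshal([]auditToolCall{{Name: "search_my_deadlines", ArgHash: h}})
	fmt.Println(string(out))
}
```

Hashing the canonical JSON form means two calls with the same arguments produce the same hash regardless of key order, which is what makes repeated-access patterns detectable from the audit table.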
+
+**Why this shape:**
+
+- **Compliance‑lite v1.** HLC's compliance team has not yet weighed in on AI‑mediated PII (memory says the Phase H decision was "we don't want anthropic API… for a while"). Storing the full transcript opens a retention/disclosure question we don't need to answer to ship Paliadin's MVP. The audit‑metadata row is enough to demonstrate: (a) who used it, (b) how often, (c) what tools they triggered, (d) cost.
+- **Phase 2 transcript persistence** would add a `paliadin_messages` table (turn_id FK, role, content, redact_marks jsonb) and a per‑user setting "keep my history". Default off.
+- **Why no PII redaction in the user prompt?** v1 is opt‑in (the user typed the prompt). Redacting client names / case numbers in the audit hash would defeat the point; we redact by *not storing the prompt*, only its hash.
+
+**The Anthropic side:** if HLC's enterprise contract forbids vendor‑side retention, the Go client must set `metadata: {user_id: "<hash>"}` and ensure the API call is on an org with zero‑retention guarantees. **Open question for m: which Anthropic key are we using — m's personal key (existing `ANTHROPIC_API_KEY` precedent in mAi/youpcms) or a new HLC enterprise key?** This is the single biggest compliance question; see §9.2.
+
+---
+
+## §4 Sub-design C — UX
+
+Answers Q8, Q9, Q10, Q11, Q12.
+
+### 4.1 Surface placement (Q8)
+
+**Recommendation (counter to brief): start with a dedicated `/paliadin` full‑page route + a sidebar entry under the "Übersicht" group. Defer the right‑drawer to Phase 2.**
+
+| Option | v1? | Why |
+|---|---|---|
+| **`/paliadin` full page** + sidebar entry | ✅ | Lowest CSS risk; mobile‑responsive for free (paliad's existing breakpoints work); easy to test via Playwright; matches paliad's "every feature is a top‑level page" pattern; no z‑index / overlay debugging. |
+| Right‑drawer slide‑out from any page | ❌ Phase 2 | Pretty, matches m's "panel docked into UI" framing — but adds: drawer toggle wiring on all 30 pages, scroll‑lock interaction, focus management, mobile small‑screen fallback. Not worth the v1 surface area. Phase 2 wraps the same `/paliadin` UI in a slide‑out container. |
+| Floating bottom‑right bubble | ❌ | Clippy comparison is *visual*, not *positional*. A floating overlay on every page collides with the BottomNav on mobile (already 5/5 slots) and the inbox bell on desktop. |
+| Page‑embedded panel on `/paliadin` only | — | This *is* the v1 recommendation, just framed differently. |
+
+**Sidebar entry:**
+
+```
+Übersicht
+  Start
+  Agenda
+  Inbox 🛎
+  Paliadin ✨   ← new, ICON_SPARKLE
+```
+
+Group placement under Übersicht (not under Tools or Wissen) because Paliadin is conversation about *the user's work*, not a knowledge tool.
+
+**Mobile:** Paliadin is reachable via the sidebar drawer (existing mobile pattern). No BottomNav slot — those are full and the ranking (Start / Projekte / + / Agenda / Menü) is more important than a chat shortcut for v1.
+
+### 4.2 Avatar / personality (Q9)
+
+**Recommendation: no avatar SVG in v1. Just a chat panel with the name "Paliadin" in the header. Mascot is Phase 2.**
+
+Why:
+
+- Mascot design is a real design exercise (3–4 iterations to get something that doesn't read as kitsch in a law firm). Not inventor's call to bash one out in a v1 ship.
+- The brand cue (lime‑green `#c6f41c` accent) is enough to make Paliadin feel like part of paliad without a character.
+- Paliadin's *personality* lives in the system prompt (§2.2.1), not in pixels. Voice carries the buddy framing; mascot makes it visual but isn't load‑bearing.
+
+What we ship in v1 instead:
+
+- Header: "✨ Paliadin" (sparkle icon + name) above the chat panel.
+- Empty‑state prompt: "Was kann ich für dich tun?" (DE) / "How can I help?" (EN).
+- One‑line tagline under the header: "Ich kenne deine Akten und Paliads Wissensbasis." (DE) / "I know your matters and Paliad's knowledge base." (EN). This is the *only* v1 affordance that explicitly tells the user "I see your data" — load‑bearing for the differentiation argument in §0/§9.1.
+
+**Phase 2 mascot brief (for when m greenlights it):** small SVG, friendly, lime‑green primary, no eyes‑darting / animated‑on‑idle (creepy), modular pose set so it can react to "thinking" / "found it" / "stuck" without being an MMORPG pet.
+
+### 4.3 Onboarding hint (Q10)
+
+**Recommendation: silent‑until‑invoked. No proactive pop‑up, no first‑run modal, no toast.**
+
+Why:
+
+- Paliad already has a polished onboarding flow (t‑paliad‑034). Adding a Paliadin pop‑up on top would be the kind of "surprise the user" affordance that erodes trust the first time it misfires.
+- The empty‑state inside `/paliadin` itself is the right onboarding surface: 3 starter‑prompt buttons rendered when the chat is empty.
+
+**Three starter prompts (DE primary):**
+
+1. "Was steht heute an?" → triggers `whats_on_my_plate`
+2. "Welche Fristen sind diese Woche fällig?" → triggers `search_my_deadlines` with `due_before=now()+7d`
+3. "Erkläre mir Klageerwiderung." → triggers `lookup_glossary_term` + `lookup_deadline_rule`
+
+EN equivalents: "What's on my plate?" / "Which deadlines are due this week?" / "Explain Klageerwiderung."
+
+Picking one from the row sends it as if the user typed it. Keeps the surface zero‑weight when ignored.
+
+**Phase 2 candidate:** post‑onboarding email / inbox card "Paliadin ist live, frag ihn, was deine Daten dir sagen." (EN: "Paliadin is live, ask it what your data tells you.") Driven by the existing reminder/email substrate. Out of v1 scope.
+
+### 4.4 Action chips in responses (Q11)
+
+**Recommendation: action chips parsed from a simple inline syntax in the model's reply, rendered client‑side, NOT a tool the model invokes.**
+
+Why simple syntax over a tool: tool invocations cost a round‑trip; we want the model to "suggest" an action without paying for an extra tool turn. The model emits a structured marker in its prose; the frontend client parses it and renders a chip below the bubble.
+
+**Marker format:**
+
+```
+[#deadline-OPEN:c47bd2]
+[#projekt-OPEN:slug-x]
+[#frist-OPEN:c47bd2]
+[#termin-OPEN:abc123]
+[chip:nav:/projects/abc-123]   (for arbitrary navigation)
+[chip:filter:status=pending&due=this_week]   (for parameterised inbox links)
+```
+
+The system prompt teaches the model to emit chips when navigation or filtering would help the user act on the answer. Each marker resolves to one chip, rendered as:
+
+```
+┌──────────────────────────────────────┐
+│ Frist 16.05.2026 fällt morgen.       │
+│ [Frist öffnen] [Akte ansehen]        │
+└──────────────────────────────────────┘
+```
+
+**Client parser** (`frontend/src/client/paliadin.ts`): regex over the accumulated streamed text (a marker can arrive split across two `content_delta` events), replacing each complete marker with a chip. Chips are real `<a>` elements (Cmd‑click works, keyboard works), styled like the existing `.entity-table` row chips.
+
+**Why not let the model embed full URLs?** Two reasons:
+1. URLs change (we renamed `/akten` → `/projekte` mid‑project). Markers are stable; we resolve them at render time.
+2. Hallucinated URLs are real risk. If the model can only emit a marker tied to an id we *know* it just retrieved, the chip can't navigate to a fake page.
+
+### 4.5 Streaming + interruption (Q12)
+
+**Recommendation: SSE stream from `/api/paliadin/stream/{turn_id}` (§6.1), client EventSource, user‑initiated abort via "Stop" button.**
+
+#### 4.5.1 Stream shape
+
+Mirrors Anthropic's native streaming events, adapted for our SSE consumer:
+
+```
+event: meta
+data: {"turn_id":"01H…","model":"claude-sonnet-4-6"}
+
+event: content_delta
+data: {"text":"Auf der Akte Müller…"}
+
+event: tool_call
+data: {"name":"search_my_deadlines","args_hash":"…","status":"running"}
+
+event: tool_result
+data: {"name":"search_my_deadlines","status":"ok","summary":"3 results"}
+
+event: content_delta
+data: {"text":"… ist die Klageerwiderung am 16.05. fällig."}
+
+event: chip
+data: {"kind":"deadline","action":"open","id":"c47bd2"}
+
+event: end
+data: {"input_tokens":342,"output_tokens":88,"tool_calls":1}
+
+# heartbeat every 25 s to keep Traefik from reaping
+event: ping
+data: {}
+```
+
+The `tool_call` / `tool_result` events are visible in the UI as small dim "ran search_my_deadlines (3 results)" lines under the bubble — the **citation evidence** that distinguishes Paliadin from a generic chatbot. (Direct quote from the §0 framing: "the differentiation collapses if v1 doesn't make the data‑grounding visible.")
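+
+A minimal Go sketch of how the server side could frame these events, assuming a hypothetical `writeEvent` helper and `toolResult` struct (neither is part of the design). It only demonstrates the wire format; a real handler would also set `Content-Type: text/event-stream` and flush after every frame.
+
+```go
+package main
+
+import (
+	"bytes"
+	"encoding/json"
+	"fmt"
+	"io"
+)
+
+// writeEvent (hypothetical helper) serialises one SSE frame: an "event:" line,
+// a "data:" line holding the JSON payload, and the blank line that ends the frame.
+func writeEvent(w io.Writer, event string, payload any) error {
+	data, err := json.Marshal(payload)
+	if err != nil {
+		return err
+	}
+	_, err = fmt.Fprintf(w, "event: %s\ndata: %s\n\n", event, data)
+	return err
+}
+
+// toolResult mirrors the tool_result payload from the stream shape above.
+type toolResult struct {
+	Name    string `json:"name"`
+	Status  string `json:"status"`
+	Summary string `json:"summary"`
+}
+
+func main() {
+	var buf bytes.Buffer
+	_ = writeEvent(&buf, "tool_result", toolResult{"search_my_deadlines", "ok", "3 results"})
+	_ = writeEvent(&buf, "ping", struct{}{}) // heartbeat frame with an empty JSON object
+	fmt.Print(buf.String())
+}
+```
+
+Writing to an `io.Writer` keeps the framing testable without an HTTP server; the handler would pass its `http.ResponseWriter`.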
+
+#### 4.5.2 Interruption
+
+- "Stop" button next to the input. Click → `EventSource.close()` + `fetch('/api/paliadin/stream/{turn_id}/abort', {method:'POST'})`.
+- Server abort closes the upstream Anthropic request via context cancellation.
+- Stopped turns still write an audit row with `error_code='user_aborted'` so we see how often users hit it.
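+
+The abort path could be wired roughly as below. This is a sketch under assumptions: `turnRegistry` and its methods are hypothetical names, and the ownership check ("must own the turn") plus the audit‑row write are left out.
+
+```go
+package main
+
+import (
+	"context"
+	"fmt"
+	"sync"
+)
+
+// turnRegistry (hypothetical) maps in-flight turn IDs to the cancel func of their context.
+type turnRegistry struct {
+	mu      sync.Mutex
+	cancels map[string]context.CancelFunc
+}
+
+func newTurnRegistry() *turnRegistry {
+	return &turnRegistry{cancels: make(map[string]context.CancelFunc)}
+}
+
+// Start derives a cancellable context for the turn; the upstream request would use it.
+func (r *turnRegistry) Start(parent context.Context, turnID string) context.Context {
+	ctx, cancel := context.WithCancel(parent)
+	r.mu.Lock()
+	r.cancels[turnID] = cancel
+	r.mu.Unlock()
+	return ctx
+}
+
+// Abort cancels the turn's context. It reports false if the turn is unknown
+// or was already aborted.
+func (r *turnRegistry) Abort(turnID string) bool {
+	r.mu.Lock()
+	cancel, ok := r.cancels[turnID]
+	delete(r.cancels, turnID)
+	r.mu.Unlock()
+	if ok {
+		cancel()
+	}
+	return ok
+}
+
+func main() {
+	reg := newTurnRegistry()
+	ctx := reg.Start(context.Background(), "turn-1")
+	fmt.Println(reg.Abort("turn-1"), ctx.Err()) // true context canceled
+	fmt.Println(reg.Abort("turn-1"))            // false
+}
+```
+
+A turn that finishes normally would need to remove its own entry as well, otherwise the map grows without bound.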
+
+#### 4.5.3 Reconnect
+
+Same Last‑Event‑ID resume pattern the t‑145 chat design specced. Server keeps the in‑flight stream buffered for 30 s after disconnect; reconnect within that window replays missed events. After 30 s, the turn is considered done — reconnect arrives at the start of a fresh session.
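+
+One way to picture the replay is the sketch below. It assumes each frame is tagged with an increasing integer id that the client echoes back as `Last-Event-ID`; the stream shape in §4.5.1 does not show `id:` fields, so that tagging and the `replayBuffer` type are assumptions. The 30 s expiry is omitted.
+
+```go
+package main
+
+import "fmt"
+
+// bufferedEvent is one SSE frame tagged with an increasing ID (assumed scheme).
+type bufferedEvent struct {
+	ID   int
+	Data string
+}
+
+// replayBuffer (hypothetical) keeps the frames of one in-flight turn.
+type replayBuffer struct {
+	events []bufferedEvent
+}
+
+// Append stores a frame and returns the ID assigned to it (1, 2, 3, ...).
+func (b *replayBuffer) Append(data string) int {
+	id := len(b.events) + 1
+	b.events = append(b.events, bufferedEvent{ID: id, Data: data})
+	return id
+}
+
+// Since returns every frame with an ID greater than lastEventID.
+func (b *replayBuffer) Since(lastEventID int) []bufferedEvent {
+	if lastEventID < 0 {
+		lastEventID = 0
+	}
+	if lastEventID >= len(b.events) {
+		return nil
+	}
+	return b.events[lastEventID:]
+}
+
+func main() {
+	var buf replayBuffer
+	for _, frame := range []string{"meta", "content_delta", "tool_call"} {
+		buf.Append(frame)
+	}
+	// The client reconnects having seen event 1, so events 2 and 3 are replayed.
+	for _, e := range buf.Since(1) {
+		fmt.Println(e.ID, e.Data)
+	}
+}
+```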
+
+---
+
+## §5 Sub-design D — Token budget, cost, audit
+
+Answers Q13, Q14, Q15, Q16.
+
+### 5.1 Per‑request token cap (Q13)
+
+**Recommendation: `max_input_tokens=4000` as the soft cap on the model's view of the input (system + history + tool defs + user msg) and `max_tokens=2000` as the model's max output, same as the brief. Above 4 k input, soft‑truncate history; above 6 k, hard‑fail the turn.**
+
+Rationale:
+
+- A typical paliad data tool result is < 500 tokens (truncated lists, capped at 25 rows). Even with system prompt (~250) + tool defs (~600) + 5 prior turns (~600 each on average), the input comes to ~3 850 tokens, still under 4 k.
+- If the conversation runs long (~8+ turns), the client/server soft‑truncates history (drops oldest user/assistant pairs first) before sending. The user sees an "Earlier in this conversation, we discussed X (truncated)" pseudo‑system message. Cleaner than failing the turn.
+- Hard cap at 6 k input tokens — over that, refuse the turn with "Conversation too long, start a new one." Defends against jailbreak attempts that try to balloon the prompt.
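+
+The truncation rule above can be sketched as follows. The `message` type, the `fitHistory` name and the per‑message token estimates are illustrative assumptions; the design does not say how tokens are counted before sending.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+)
+
+// message carries an estimated token count (how it is estimated is out of scope here).
+type message struct {
+	Role   string // "user" or "assistant"
+	Tokens int
+}
+
+var errTooLong = errors.New("conversation too long, start a new one")
+
+// fitHistory drops the oldest user/assistant pairs until the fixed overhead
+// (system prompt + tool defs) plus history fits softCap. The final element of
+// history is assumed to be the new user message and is never dropped.
+// If the total still exceeds hardCap, the turn is refused.
+func fitHistory(fixed int, history []message, softCap, hardCap int) ([]message, bool, error) {
+	total := fixed
+	for _, m := range history {
+		total += m.Tokens
+	}
+	truncated := false
+	for total > softCap && len(history) > 2 {
+		total -= history[0].Tokens + history[1].Tokens
+		history = history[2:]
+		truncated = true
+	}
+	if total > hardCap {
+		return nil, truncated, errTooLong
+	}
+	return history, truncated, nil
+}
+
+func main() {
+	var history []message
+	for i := 0; i < 6; i++ { // six earlier user/assistant pairs
+		history = append(history, message{"user", 100}, message{"assistant", 500})
+	}
+	history = append(history, message{"user", 100}) // the new user message
+	kept, truncated, err := fitHistory(850, history, 4000, 6000)
+	fmt.Println(len(kept), truncated, err) // 11 true <nil>
+}
+```
+
+The `truncated` flag is what would trigger the "Earlier in this conversation…" pseudo‑system message.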
+
+**Cost math at Sonnet 4.6 per‑turn typical (3 k input, 1 k output):** ~$0.012/turn. At 30 turns/hour/user × 38 onboarded HLC users × 5 working hours/day = ~5 700 turns/day = **~$70/day worst case**. Realistic load is probably 10× lower. Phase 2: prompt caching (§5.4.4) drops it further.
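+
+The worst‑case arithmetic, spelled out as a small Go sketch. The per‑turn figure is taken as given from the text rather than derived from a price list, and `dailyTurns` is a hypothetical helper.
+
+```go
+package main
+
+import "fmt"
+
+// dailyTurns is the worst case in which every user exhausts the hourly cap
+// during every working hour.
+func dailyTurns(capPerHour, users, hoursPerDay int) int {
+	return capPerHour * users * hoursPerDay
+}
+
+func main() {
+	const costPerTurnUSD = 0.012 // per-turn estimate quoted in the text, not recomputed here
+	turns := dailyTurns(30, 38, 5)
+	fmt.Printf("%d turns/day -> $%.2f/day\n", turns, float64(turns)*costPerTurnUSD)
+	// prints: 5700 turns/day -> $68.40/day
+}
+```
+
+Note that 30 × 38 = 1 140 turns/hour already exceeds the 1 000/hour firm‑wide ceiling from §5.3, so in practice the global cap would bind slightly before this figure is reached.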
+
+### 5.2 Conversation history persistence (Q14)
+
+**Recommendation: session‑only in v1. Persistent threads in Phase 2.**
+
+| Option | v1? | Why |
+|---|---|---|
+| Session‑only (browser localStorage, cleared on Sign Out) | ✅ | Zero schema. Zero retention question. Aligns with §3.3 "minimum viable persistence." Lets us ship paliadin without compliance review of stored transcripts. |
+| Persistent threads (DB‑stored, named) | ❌ Phase 2 | Real schema (`paliadin_threads`, `paliadin_messages`), retention policy, cross‑device sync, "delete my history" UX, possibly opt‑in toggle. None of which is needed to validate "is Paliadin actually useful". |
+
+**Edge case: page reload during a conversation.** localStorage persists the history *in that browser* (it is scoped to the origin, so every tab in the same browser profile shares it). Reloading, closing and reopening the tab, or restarting the browser all restore it. Sign‑out clears. Multi‑device = different histories. We're explicit about this in the panel header: "Conversation lives in this browser only" tooltip.
+
+**Why opt for slightly worse UX over the easy schema work:** the t‑paliad‑145 chat just got parked over an *adoption*‑risk concern, not a schema concern. Paliadin should ship the smallest possible footprint that proves usefulness. Persistent threads can be a "you asked for this" Phase 2.
+
+### 5.3 Rate limit per user (Q15)
+
+**Recommendation: 30 turns/hour/user (slightly tighter than the brief's 50). Plus a global ceiling of 1 000 turns/hour across the firm. Both configurable.**
+
+Per‑user 30/hour because:
+
+- 30/hour ≈ one turn every two minutes during sustained use. That's heavy use. A reasonable user asks 3–5 questions in a session.
+- Soft hint at 25 ("you've used 25 of 30 messages this hour"), hard block at 30 with retry‑after.
+- Lower than 50 to give us a safety margin for runaway cost in week 1; we can raise it once we see real usage.
+
+Global 1 000/hour ceiling because:
+
+- Global cap = circuit breaker against the long tail (a script that pushes ~1 000 turns/hour through a gap in the per‑user cap, or a developer bug).
+- 1 000 turns × ~$0.012 = $12/hour worst case = $288/day. We tolerate that for a day; we'd notice and tune.
+
+**Storage:** simple Postgres `paliad.paliadin_rate_limit` table with `(user_id, hour_bucket, turn_count)` upserted on every turn start. No Redis, no extra dependency. Fast at this scale.
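+
+A sketch of the counter logic in Go, under stated assumptions: the upsert statement is one plausible way to write the increment against the columns named above, and `hourBucket` / `decide` are hypothetical helpers. The cap is read as "30 turns allowed, the 31st blocked".
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+// upsertTurnSQL (assumed statement) increments the caller's counter for the
+// current hour and returns the new value.
+const upsertTurnSQL = `
+INSERT INTO paliad.paliadin_rate_limit AS rl (user_id, hour_bucket, turn_count)
+VALUES ($1, $2, 1)
+ON CONFLICT (user_id, hour_bucket)
+DO UPDATE SET turn_count = rl.turn_count + 1
+RETURNING turn_count`
+
+// hourBucket truncates a timestamp to the start of its UTC hour.
+func hourBucket(t time.Time) time.Time {
+	return t.UTC().Truncate(time.Hour)
+}
+
+type verdict string
+
+const (
+	allow verdict = "allow"
+	hint  verdict = "hint"  // allowed, but show "you've used N of M"
+	block verdict = "block" // refuse with retry-after
+)
+
+// decide maps the post-increment counter to a verdict.
+func decide(count, hintAt, limit int) verdict {
+	switch {
+	case count > limit:
+		return block
+	case count >= hintAt:
+		return hint
+	default:
+		return allow
+	}
+}
+
+func main() {
+	t := time.Date(2026, 5, 15, 14, 37, 0, 0, time.UTC)
+	fmt.Println(hourBucket(t).Format(time.RFC3339)) // 2026-05-15T14:00:00Z
+	fmt.Println(decide(24, 25, 30), decide(25, 25, 30), decide(31, 25, 30)) // allow hint block
+}
+```
+
+Because the increment and the read happen in one statement, two concurrent turns from the same user cannot both observe the same count.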
+
+**Admin override:** global_admin can lift their own cap (they typically test things). Surface this in the audit row, not in a CLI.
+
+### 5.4 Audit + logging (Q16)
+
+**Recommendation: every turn writes a metadata‑only row to `paliad.paliadin_turns`. Full transcripts are NOT stored in v1. Tool‑call args are hashed. Anthropic vendor side is governed by org‑level retention.**
+
+#### 5.4.1 Schema (migration 057)
+
+```sql
+CREATE TABLE paliad.paliadin_turns (
+    turn_id           uuid PRIMARY KEY,
+    user_id           uuid NOT NULL REFERENCES paliad.users(id),
+    session_id        text NOT NULL,                  -- browser session, opaque
+    started_at        timestamptz NOT NULL DEFAULT now(),
+    finished_at       timestamptz,                    -- NULL until end‑of‑turn
+    model             text NOT NULL,                  -- e.g. 'claude-sonnet-4-6'
+    input_tokens      int,                            -- from Anthropic usage block
+    output_tokens     int,
+    tool_calls        jsonb NOT NULL DEFAULT '[]',    -- [{name, args_hash, status, latency_ms}]
+    prompt_hash       text,                           -- sha256 of user_message after PII redaction (best effort)
+    response_hash     text,                           -- sha256 of full response (citation only, not stored)
+    chip_count        int NOT NULL DEFAULT 0,
+    error_code        text,                           -- NULL on success; 'user_aborted', 'rate_limited', 'token_cap', 'tool_loop_cap', 'upstream_error'
+    estimated_cost_usd numeric(10, 6)                 -- for ops dashboards
+);
+
+CREATE INDEX paliadin_turns_user_started_idx
+    ON paliad.paliadin_turns(user_id, started_at DESC);
+CREATE INDEX paliadin_turns_started_idx
+    ON paliad.paliadin_turns(started_at DESC);
+
+ALTER TABLE paliad.paliadin_turns ENABLE ROW LEVEL SECURITY;
+
+-- User sees their own; global_admin sees all.
+CREATE POLICY paliadin_turns_select
+    ON paliad.paliadin_turns FOR SELECT
+    USING (
+      user_id = auth.uid()
+      OR EXISTS (SELECT 1 FROM paliad.users u
+                  WHERE u.id = auth.uid() AND u.global_role = 'global_admin')
+    );
+
+-- Service-role (paliad backend) writes; no user‑direct INSERT.
+-- (Paliad uses service-role conn, so policies on writes are inert,
+-- but we still ENABLE RLS so future direct‑auth callers are gated.)
+```
+
+Rate‑limit table also lives in this migration:
+
+```sql
+CREATE TABLE paliad.paliadin_rate_limit (
+    user_id     uuid NOT NULL REFERENCES paliad.users(id),
+    hour_bucket timestamptz NOT NULL,
+    turn_count  int NOT NULL DEFAULT 0,
+    PRIMARY KEY (user_id, hour_bucket)
+);
+```
+
+#### 5.4.2 What we DON'T store (v1)
+
+- The user's actual prompt text. Only `prompt_hash`.
+- The model's actual response text. Only `response_hash`.
+- The tool inputs. Only `tool_calls[].args_hash`.
+
+**Phase 2 transcript persistence** unlocks all three — deliberately separate migration so the compliance review sits at *that* boundary.
+
+#### 5.4.3 Vendor retention
+
+The Anthropic side is governed by the org‑level contract. **Open question for m (§8.5, Q‑A):** does HLC have an enterprise / zero‑retention agreement, or are we using m's personal key (matches existing `ANTHROPIC_API_KEY` precedent in mAi/youpcms)? The answer changes whether v1 needs a "data sent to Anthropic" disclosure on first use.
+
+#### 5.4.4 Prompt caching (Phase 2)
+
+The Anthropic API supports prompt caching for repeated system prompts + tool definitions. Our system prompt + 7 tool defs is ~850 tokens and identical on every turn, which makes it the natural cache target, provided that prefix meets the model's minimum cacheable length (to be verified against the caching docs). Phase 2: enable cache_control on the system block; cuts the input cost of the cached prefix by ~90% on repeat turns within the 5‑minute cache window. Skip in v1 to keep the client minimal; pick up after the API surface stabilises.
+
+---
+
+## §6 Schema, endpoints, files
+
+### 6.1 New endpoints
+
+| Method | Path | Purpose | Auth |
+|---|---|---|---|
+| `POST` | `/api/paliadin/turn` | Initiate a turn: assigns `turn_id`, creates the audit row, returns `{turn_id, sse_url}` | logged‑in (302 to /login otherwise) |
+| `GET` | `/api/paliadin/stream/{turn_id}` | SSE stream of the turn's response (opened by the client right after the `POST` returns; the same GET supports reconnect) | logged‑in |
+| `POST` | `/api/paliadin/stream/{turn_id}/abort` | User cancels mid‑turn | logged‑in, must own the turn |
+| `GET` | `/api/paliadin/limits` | Returns `{used_this_hour, hourly_cap, global_cap, global_used}` | logged‑in |
+| `GET` | `/paliadin` | The page shell (server‑renders the panel + initial empty state) | logged‑in |
+| `GET` | `/admin/paliadin` | Per‑user usage / cost dashboard | global_admin |
+
+The `POST /api/paliadin/turn` returns `{turn_id, sse_url}`; the client opens an `EventSource` on `sse_url`. Two‑step keeps the POST cheap for telemetry / audit row creation, while the long‑lived stream lives on a GET that's safe to retry / resume.
+
+### 6.2 New / extended services
+
+| File | Status | Purpose |
+|---|---|---|
+| `internal/services/paliadin/service.go` | NEW | The orchestrator: run loop, history truncation, rate‑limit check, audit‑row writer |
+| `internal/services/paliadin/anthropic.go` | NEW | Hand‑rolled Messages API client (POST `/v1/messages`, stream parser) |
+| `internal/services/paliadin/tools.go` | NEW | Tool catalog declaration + dispatch into existing services |
+| `internal/services/paliadin/prompt.go` | NEW | System prompt template + per‑turn assembly |
+| `internal/handlers/paliadin.go` | NEW | HTTP / SSE handlers |
+| `internal/services/deadline_service.go` | extend | Add `SearchVisible(userID, q, status, projectID, dueAfter, dueBefore, limit)` (currently search is only on the global Fristenrechner matview) |
+| `internal/services/appointment_service.go` | extend | Add `ListVisibleInWindow(userID, from, to, projectID)` |
+| `internal/services/glossary_service.go` | NEW (or refactor of glossary handler data load) | A real service so the tool can call it; today it lives inline in the handler |
+
+### 6.3 Frontend
+
+| File | Status | Purpose |
+|---|---|---|
+| `frontend/src/paliadin.tsx` | NEW | Page shell |
+| `frontend/src/client/paliadin.ts` | NEW | Chat panel, EventSource, history serialise to localStorage, chip parser, "Stop" button |
+| `frontend/src/styles/global.css` | extend | New CSS section: `.paliadin-panel`, `.paliadin-bubble`, `.paliadin-bubble--user/--assistant/--tool`, `.paliadin-chip`, `.paliadin-input`, `.paliadin-meta` |
+| `frontend/src/components/Sidebar.tsx` | extend | Add Paliadin navItem to the Übersicht group with `ICON_SPARKLE` |
+| `frontend/src/i18n-keys.ts` | extend | ~25 new keys: `paliadin.title`, `paliadin.tagline`, `paliadin.starter.*`, `paliadin.empty`, `paliadin.input.placeholder`, `paliadin.stop`, `paliadin.rate_limited`, `paliadin.error.*` |
+
+### 6.4 Migration 057
+
+```
+057_paliadin.up.sql:
+  - paliad.paliadin_turns (audit row, RLS, indexes)
+  - paliad.paliadin_rate_limit (counter table, PK on user+hour)
+  - GRANTs: service-role full, anon read disallowed by RLS
+057_paliadin.down.sql: drop both tables.
+```
+
+### 6.5 Env vars (add to CLAUDE.md table)
+
+| Variable | Required | Purpose |
+|---|---|---|
+| `ANTHROPIC_API_KEY` | for Paliadin | Anthropic Messages API key. **Replaces** the "do not set" row that referred to the parked Phase H. Without it, `/paliadin` returns 503 (server still boots; the rest of paliad keeps working). |
+| `PALIADIN_MODEL` | optional (default `claude-sonnet-4-6`) | Override model for tuning / fallback to Haiku for cost or Opus for accuracy without redeploying. |
+| `PALIADIN_HOURLY_CAP` | optional (default `30`) | Per‑user turn cap per hour. |
+| `PALIADIN_GLOBAL_HOURLY_CAP` | optional (default `1000`) | Firm‑wide turn cap per hour. |
+| `PALIADIN_MAX_INPUT_TOKENS` | optional (default `4000`) | Soft cap; over this we truncate history. |
+| `PALIADIN_MAX_OUTPUT_TOKENS` | optional (default `2000`) | Hard cap; passed straight to Anthropic. |
+
+The Service must boot **without** `ANTHROPIC_API_KEY` (return 503 on `/paliadin*` routes; rest of paliad keeps working). Same pattern as `DATABASE_URL` and `CALDAV_ENCRYPTION_KEY`.
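+
+A minimal sketch of that gating, assuming a hypothetical `requireAPIKey` middleware wrapped around the Paliadin routes only; in the real service the key would come from the environment at startup.
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/http"
+	"net/http/httptest"
+)
+
+// requireAPIKey (hypothetical) answers 503 on the wrapped routes when no key
+// is configured, instead of preventing the server from booting.
+func requireAPIKey(apiKey string, next http.Handler) http.Handler {
+	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		if apiKey == "" {
+			http.Error(w, "Paliadin is not configured", http.StatusServiceUnavailable)
+			return
+		}
+		next.ServeHTTP(w, r)
+	})
+}
+
+func main() {
+	page := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		w.WriteHeader(http.StatusOK)
+	})
+	for _, key := range []string{"", "example-key"} {
+		rec := httptest.NewRecorder()
+		req := httptest.NewRequest(http.MethodGet, "/paliadin", nil)
+		requireAPIKey(key, page).ServeHTTP(rec, req)
+		fmt.Println(rec.Code) // 503 without a key, 200 with one
+	}
+}
+```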
+
+---
+
+## §7 Sub-design E — Phasing
+
+Answers Q19, Q20.
+
+### 7.1 Phase 1 (v1) — confirmed scope
+
+**Single coherent slice that proves the value proposition end‑to‑end.**
+
+| Item | In v1 |
+|---|---|
+| `/paliadin` page + sidebar entry under Übersicht | ✅ |
+| Migration 057 (`paliadin_turns` + `paliadin_rate_limit`) | ✅ |
+| Anthropic client (hand‑rolled, streaming) | ✅ |
+| 7 read‑only tools | ✅ |
+| System prompt with `branding.Name` + visibility rules | ✅ |
+| SSE stream with `meta`/`content_delta`/`tool_call`/`tool_result`/`chip`/`end`/`ping` events | ✅ |
+| Citation chips (parsed from inline markers) | ✅ |
+| Rate limiting (per‑user + global) | ✅ |
+| Audit row per turn (metadata only, no transcript) | ✅ |
+| Session‑only history (browser localStorage) | ✅ |
+| 3 starter prompts in DE+EN | ✅ |
+| Token caps + soft history truncation | ✅ |
+| `/admin/paliadin` cost dashboard (global_admin only) | ✅ |
+| ~25 i18n keys (DE+EN) | ✅ |
+| Mobile responsiveness (uses sidebar drawer like every other page) | ✅ |
+| CLAUDE.md update flipping the `ANTHROPIC_API_KEY` row | ✅ |
+
+**Estimated scope:** ~3 500–4 500 LoC for the bundled v1 ship. Comparable to t‑144 (Custom Views) and t‑145's would‑have‑been chat slice.
+
+**Single PR or split?** Recommend **single PR** for v1. The Anthropic client + tool dispatch + handler + frontend panel are too tightly coupled to ship one without the others — every component is on the critical path of "demonstrate Paliadin actually works". Splitting buys nothing review‑wise (no reviewer can validate "Anthropic client works" without "the tool dispatch that exercises it"). Use the same single‑PR pattern as t‑144 A1+A2 in retrospect.
+
+### 7.2 Phase 2 candidates (post‑v1, prioritised)
+
+In rough order of value:
+
+1. **Persistent threads** + per‑user "keep my history" toggle. Adds `paliadin_threads` + `paliadin_messages` tables, retention policy, cross‑device sync. Compliance review attaches here, not to v1.
+2. **Prompt caching** for system prompt + tool defs. ~90 % input‑cost reduction on the cached prefix for repeat turns. Pure server‑side change.
+3. **`search_youpc_case_law` tool.** Cross‑schema SELECT into `data.judgments` + `data.judgment_markdown_content`. Returns case number, division, date, headnote, top 3 holdings. The "research assistant" use case from m's framing.
+4. **Right‑drawer mode.** Wrap the `/paliadin` panel in a slide‑out container; toggle on every page from a header button.
+5. **Mascot SVG** + idle / thinking / found‑it pose set. Real visual design pass.
+6. **Onboarding tip** — post‑onboarding inbox card or one‑time toast on first dashboard visit after Paliadin lands.
+7. **`list_my_pending_approvals` tool.** Wraps inbox bell payload.
+8. **Voice input / output.** Web Speech API (paliad already has the substrate from the no‑Voice‑v1 t‑paliad‑042 PWA).
+
+### 7.3 Phase 3 candidates (validate first)
+
+- **Write tools.** `create_deadline`, `create_appointment`, `attach_partner_unit`, `add_party`. Each behind a hard confirmation gate ("Paliadin will create a deadline 16.05. on project X — confirm? [Yes / No]"). Audit‑row marks these as mutating turns. Heavy compliance question; not Phase 2.
+- **Per‑deadline / per‑termin micro‑threads.** Long‑lived per‑entity Q&A. Plumbing collision with the (parked) chat design — re‑evaluate when chat un‑parks.
+- **Proactive Paliadin.** Push tips when the user hits a known confused state ("You've been on /tools/fristenrechner for 8 minutes — want me to walk you through it?"). Powerful, but creepy if poorly tuned.
+- **Compliance‑aware redaction layer.** Strip client names from the prompt before it leaves the building, swap stable hashes back in client‑side. Big project; only sensible if HLC compliance forbids vendor‑side PII.
+
+---
+
+## §8 Risks, mitigations, open questions
+
+### 8.1 Adoption risk (the §0 callout, expanded)
+
+**The risk:** Paliadin competes with three things HLC already has:
+1. The user's own Claude / ChatGPT in another tab (for general patent‑practice questions).
+2. "Ask a colleague on Teams" (for paliad‑specific questions about how to use the app).
+3. Just clicking around the UI (for "what's on my plate today").
+
+Paliadin's edge over (1) is data grounding. Edge over (2) is 24/7 + privacy. Edge over (3) is conversational discovery and answering one‑shot natural‑language queries that the structured UI doesn't expose.
+
+**The risk realised:** if v1 doesn't make the data‑grounding visible (citation chips, tool‑call evidence under each bubble, the tagline "I see your data"), users default to ChatGPT for everything, and Paliadin becomes a ghost feature that ate 3 weeks of build. Same pattern that just parked t‑paliad‑145.
+
+**Mitigations baked into v1:**
+
+- **Tool‑call evidence visible** in every bubble. The user *sees* "ran search_my_deadlines (3 results)" — instant differentiation from a generic chatbot.
+- **Citation chips** make answers actionable, not just informative.
+- **Tagline + empty state** explicitly say "I see your projects."
+- **Three starter prompts** demonstrate the data‑grounding immediately on first use.
+
+**Mitigations m should consider before approving:**
+
+- **Sanity‑check with two PA colleagues** before locking v1 scope. Same recommendation t‑145 got. If two PAs say "I'd just open Claude in another tab", the scope shifts toward making the data‑grounding *more* prominent (e.g. ship "Paliadin sees only your data" as a persistent banner above the input, not a tooltip) before shipping at all.
+- **Soft launch + telemetry.** v1's audit row gives us cheap measurement of: (a) total turns/day, (b) turns per user, (c) tool‑call frequency (low = Paliadin is being used like ChatGPT, defeating the differentiation). Watch for two weeks; if tool‑calls/turn < 1.5 average, the feature isn't doing what we shipped it for and Phase 2 priorities change.
+
+### 8.2 Compliance / vendor‑data risk
+
+**The risk:** sending client names + case content to Anthropic's API may not be sanctioned by HLC IT/compliance. The 2026‑04‑16 "we don't want anthropic API… for a while" decision (memory `b6a11b55…`) was about *Frist extraction from documents*; Paliadin is conversational, but the data envelope sent to Anthropic still contains PII whenever a tool returns a project name.
+
+**Mitigations:**
+
+- **HLC enterprise key** (vs m's personal key) if available — gives org‑level retention + DPA coverage.
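+- **Data‑minimising call configuration** on the Anthropic call (`metadata: {user_id: "<hash>"}`, `cache_control` only on the system block, no `eval` enrolment). Zero retention itself is an org‑level arrangement with Anthropic rather than a per‑request setting, so it hinges on Q‑A (§8.5).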
+- **First‑use disclosure** in the panel: "Your messages and the data Paliadin retrieves on your behalf are sent to Anthropic. [Learn more]". Load‑bearing, and required if the answer to Q‑A (§8.5) is "personal key, not enterprise".
+- **Phase 2 hardening:** server‑side redaction layer that swaps client names → stable hashes before the API call, restores them client‑side after. Big project; only sensible if compliance forbids vendor‑side PII.
+
+### 8.3 Rate‑limit / runaway‑cost risk
+
+**The risk:** a user (or a bug) loops fast enough to drain budget before alarms fire.
+
+**Mitigations:**
+
+- Per‑user 30/hour + global 1 000/hour caps (§5.3). Both surfaced on `/admin/paliadin`.
+- Per‑turn token cap (§5.1).
+- Per‑turn tool‑loop cap (≤ 5 rounds, §2.6).
+- Audit row written *before* the upstream call so a rate‑limit‑evading bug still leaves traces.
+- `PALIADIN_HOURLY_CAP` / `PALIADIN_GLOBAL_HOURLY_CAP` are env‑var configurable so we can tighten without a deploy.
+
+### 8.4 Hallucination risk (model invents a deadline)
+
+**The risk:** the model fabricates a deadline date / case number that doesn't exist in the user's data.
+
+**Mitigations:**
+
+- Hard rule in system prompt: "Every concrete factual claim about the user's work MUST come from a tool call in the current conversation."
+- Citation markers tied to tool‑result IDs only. Marker `#deadline-OPEN:c47bd2` resolves only if the id was returned by a real tool call this turn (frontend validates).
+- Tool‑call‑evidence visibility: the user can see that a tool ran and what it returned. Hallucination becomes obvious because the evidence line says "0 results" but the bubble claims a deadline.
+- **Phase 2:** server‑side post‑hoc validation that checks every cited id against the tool‑result set; reject the message and retry if the model invented one.
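+
+A rough Go sketch of the Phase 2 check, using the `[#kind-OPEN:id]` marker form from §4.4. The regular expression, the `inventedIDs` name and the shape of the returned‑id set are assumptions for illustration.
+
+```go
+package main
+
+import (
+	"fmt"
+	"regexp"
+)
+
+// markerRe matches entity markers such as [#deadline-OPEN:c47bd2] (assumed grammar).
+var markerRe = regexp.MustCompile(`\[#([a-z]+)-OPEN:([A-Za-z0-9_-]+)\]`)
+
+// inventedIDs returns the ids cited in reply that no tool call returned.
+func inventedIDs(reply string, returned map[string]bool) []string {
+	var bad []string
+	for _, m := range markerRe.FindAllStringSubmatch(reply, -1) {
+		if id := m[2]; !returned[id] {
+			bad = append(bad, id)
+		}
+	}
+	return bad
+}
+
+func main() {
+	returned := map[string]bool{"c47bd2": true} // ids seen in this turn's tool results
+	reply := "Die Frist [#deadline-OPEN:c47bd2] ist morgen fällig, siehe auch [#termin-OPEN:abc123]."
+	fmt.Println(inventedIDs(reply, returned)) // [abc123]
+}
+```
+
+A non‑empty result would be the trigger for rejecting the message and retrying.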
+
+### 8.5 Open questions for m (please decide before coder shift)
+
+1. **Q‑A:** Anthropic key — m's personal key (existing pattern, fast) or HLC enterprise key (compliant, slower setup)? §3.3 + §8.2.
+2. **Q‑B:** First‑use disclosure required? Yes if (Q‑A = personal key) OR if compliance hasn't reviewed.
+3. **Q‑C:** Default model — Sonnet 4.6 (recommendation) or Haiku 4.5 (cheaper)? Sonnet's tool‑use quality is a meaningful step up; Haiku is fine for "what's on my plate" but weaker on multi‑tool conversations.
+4. **Q‑D:** Sanity‑check with two PAs before locking scope? (Same recommendation that just parked t‑145.) If yes, this is the gate before any coder shift starts.
+5. **Q‑E:** Surface — confirm `/paliadin` full page + sidebar entry, drawer deferred? Or push for drawer in v1?
+6. **Q‑F:** Mascot — defer to Phase 2 (recommendation), or commission an inventor‑separate design doc now so we can ship Paliadin with the visual identity?
+7. **Q‑G:** Starter prompts — are the three I picked the right entry points, or are there better DE‑first one‑liners that map to common HLC PA queries?
+8. **Q‑H:** Should Paliadin know `branding.Name` of the firm in its system prompt? Recommendation: yes (warmer voice, "in HLC's patent practice platform"). Risk: if `FIRM_NAME` rotates, prompt rotates with it; cache invalidates. Acceptable.
+9. **Q‑I:** Per‑user 30/hour cap — too low? Too high? Easy to tune later, but worth a sanity check.
+10. **Q‑J:** youpc case‑law lookup tool — keep it firmly in Phase 2, or fast‑track if HL research is high‑value?
+11. **Q‑K:** Audit row retention — forever (current recommendation, matches audit‑log pattern), or a fixed window (e.g. 90 days for cost rows, forever for compliance‑relevant)?
+12. **Q‑L:** Default language — auto‑detect from user `locale` (`paliad.users.locale` is a known pref), or follow the user's last‑message language? Recommendation: start in user's locale; switch on first non‑locale user message.
+
+---
+
+## §9 What this design does NOT cover (deliberately)
+
+- **The implementation.** This is a design pass; coder shift writes the code. No commits beyond this doc on the inventor branch.
+- **Mascot visual design.** Phase 2; deserves its own design pass (and probably a designer's eye, not an inventor's).
+- **HL Patents Style guide ingestion.** Out of v1; Phase 2 RAG candidate.
+- **Voice input / TTS output.** Phase 2.
+- **Multi‑user collaboration (e.g. share a paliadin chat).** Out of scope; users have their own visibility, and joint chat is a chat‑feature shape (parked).
+- **Offline mode.** Paliadin is online‑only by definition (it calls Anthropic). The PWA service worker should NOT cache `/paliadin` responses.
+- **The renaming question.** "Paliadin" is m's name. Locked.
+
+---
+
+## §10 Recommended implementer
+
+Same recommendation as t‑145: **noether, or a fresh coder Sonnet that has noether's substrate context.** NOT cronus per the standing memory directive on paliad.
+
+Why:
+
+- Substrate touchpoints are the same set the chat design covered: `visibilityPredicate`, `auth.UserIDFromContext`, sidebar entry pattern, migration tracker discipline, Dashboard/Agenda/Project/Deadline service interfaces. noether built half of these; the other half noether mapped during the chat design pass.
+- Anthropic Go client is novel in paliad but is small and well‑specified by §6.2 + the `claude-api` skill.
+- Front‑end SSE consumer + chip parser is a one‑page TS file.
+
+---
+
+## §11 End of design — STOP
+
+This is the inventor deliverable. Per the role brief: **STOP after design. Do not begin implementation. Do not load `/mai-coder`.** Wait for m's explicit go/no‑go on the questions in §8.5 before any coder shift starts.
+
+The completion signal sent to head will use the literal phrase **"DESIGN READY FOR REVIEW"** so the head's gate fires.