Merge: t-paliad-213 — mendel test-strategy design doc

2026-05-19 10:11:26 +02:00
parent 1e1c84b0f6 8414aa4c14
commit 6e8e2e7653
1 changed files with 557 additions and 0 deletions
--- a/docs/design-paliad-test-strategy-2026-05-19.md
+++ b/docs/design-paliad-test-strategy-2026-05-19.md
@@ -0,0 +1,557 @@
+# Design — Paliad Test Strategy (production-grade)
+
+**Author:** mendel (inventor)
+**Date:** 2026-05-19
+**Task:** t-paliad-213
+**Branch:** `mai/mendel/inventor-test-strategy`
+**Status:** DESIGN READY FOR REVIEW. No test files / Make targets / CI configs touched. Awaiting m go/no-go on §5 slice plan + §6 open questions before any coder shift.
+
+---
+
+## 0. TL;DR
+
+Paliad has accidental test discipline today: 58 `_test.go` files / 323 test functions in Go (≈45 % of services tested, ≈12 % of handlers tested) and 4 frontend test files for 90+ client modules (≈4 %). There is no committed end-to-end suite and no CI; every smoke pass is human-driven via the manual reports in `tests/`. The `mig 098` prod crash-loop, the `t-paliad-036` triple-bug after the German→English rename, and a long tail of UX regressions (deadline-done modal, calendar column drift) would all have been caught by a 10-test boot-and-click smoke pass.
+
+This design proposes a six-layer test pyramid on top of a Layer 0 migration gate, with a concrete tool per layer: stdlib `testing`, bun's built-in `bun:test` (plus `happy-dom` for DOM tests), and Playwright for E2E. The only new dependencies are `happy-dom` and `@playwright/test` (§3). It pins three lessons paliad has paid for in commits:
+
+1. **No mocks at the service↔DB boundary.** Live-DB tests against a per-developer Postgres are the floor; in-memory mocks for `paliad.*` would have hidden every rename-after-DROP-CASCADE bug. Project preference is already in this direction (27/44 service tests are live-DB-gated); we double down rather than reverse.
+2. **Migrations must dry-run before they merge.** Every recent prod-down (mig 098, mig 020-after-rename, mig 099 audit_reason gap) was a migration that compiled, passed `go test ./...` (which skips without `TEST_DATABASE_URL`), and broke on first apply against the real schema. A `make verify-migrations` target that does BEGIN/apply/ROLLBACK in CI fixes the entire failure mode.
+3. **Browser-shaped bugs need a browser.** The fristenrechner cascade, shape-timeline render, calendar grid and inline paliadin widget are all JS state machines. Bun's built-in `bun:test` covers the pure parser/codec code; Playwright covers the auth-gated DOM. Don't try to substitute one for the other.
+
+Eight slices (§5) roll the strategy out as tracer-bullet PRs, each independently shippable. Slice 1 (migration dry-run harness) and Slice 4 (Playwright golden-path smoke) buy the most outage-prevention per LoC; the rest is widening proven patterns.
+
+Six open questions for m at §6. Most surface a coverage-vs-cost trade-off — the picks that need m's call before any code lands are CI infrastructure choice (Q2), per-PR run-time budget (Q1), and live-DB-vs-dockerised Postgres (Q3).
+
+---
+
+## 1. Audit — what exists today
+
+Counts taken on `mai/mendel/inventor-test-strategy` @ HEAD (2026-05-19, 100 migrations applied).
+
+### 1.1 Go test inventory
+
+| Package | Source files | Test files | Test functions | Notes |
+|---|---|---|---|---|
+| `internal/services` | 56 | 44 | ~200 | 26 live-DB-gated (`TEST_DATABASE_URL`), 18 pure-Go. 24 services have **no test file at all** — see §1.4. |
+| `internal/handlers` | 59 | 7 | ~30 | Only auth-domain check, search, audit-parse, approval-error-mapping, redirects, verfahrensablauf-redirect, chart-404 covered. **53 handlers have no test file.** |
+| `internal/auth` | small | 2 | ~10 | Session middleware + require-admin. |
+| `internal/branding` | small | 1 | small | Firm-name override. |
+| `internal/offices` | small | 1 | small | Office enum. |
+| `internal/changelog` | small | 1 | small | Pure parser. |
+| `internal/calc` | small | 1 | small | Fees / fee tables. |
+| `cmd/server` | 1 | 1 | small | `main_paliadin_backend_test.go` covers env-gate selection. |
+| **Total** | **133** | **58** | **323** | |
+
+`go test ./...` runs all 58 files. Without `TEST_DATABASE_URL` set, 27 of them silently skip their live-DB cases — the suite still passes, but coverage of mutation paths drops to near zero.
+
+### 1.2 Frontend test inventory
+
+| Path | Test files | Tested |
+|---|---|---|
+| `frontend/src/client/filter-bar/url-codec.test.ts` | 1 | FilterBar URL codec round-trip. |
+| `frontend/src/client/views/format.test.ts` | 1 | Date/time formatters (regression for t-paliad-153). |
+| `frontend/src/client/views/shape-timeline-chart.test.ts` | 1 | Chart layout pure function. |
+| `frontend/src/client/views/shape-timeline-cv.test.ts` | 1 | Continuous-view shape layout. |
+| **Total** | **4** | Out of ~90 client modules (`frontend/src/client/*.ts`). |
+
+All four use bun's built-in `bun:test` (no extra dep). No DOM/jsdom tests. No Playwright. No `bun test` script in `package.json` (`bun run build` is the only script).
+
+### 1.3 End-to-end / smoke
+
+- `tests/smoke-2026-04-25.md`, `tests/smoke-auth-2026-04-25.md`, `tests/smoke-auth-2026-04-26-cleanup.md` — human-written reports with screenshots committed under `tests/screenshots-*`. No code. No re-runnable script.
+- `mai-tester` skill uses Playwright for ad-hoc runs; nothing committed.
+- No `e2e/`, no `.gitea/workflows/`, no `.github/workflows/`, no `Makefile`.
+
+### 1.4 Critical service paths with no test file
+
+These are `internal/services/*.go` for which no `*_test.go` sibling exists:
+
+| Service | Risk class | Why it matters |
+|---|---|---|
+| `caldav_service.go`, `caldav_client.go`, `caldav_crypto.go`, `caldav_ical.go` | High | Per-user push/pull goroutines + AES-GCM at rest. One pure parser test (`caldav_ical_timeline_test.go`) exists but the service + crypto + WebDAV client are blind. |
+| `agenda_service.go` | High | Dashboard agenda query; reused by `/agenda` page. Exercised transitively by visibility tests but no direct test. |
+| `dashboard_service.go` | High | Traffic-light + summary counts. Same story — transitively covered via visibility, no direct test. |
+| `derivation_service.go` | Medium | Project-tree derivation (the new t-paliad-194-era subtree machinery). |
+| `team_service.go` | Medium | Team membership / inheritance. |
+| `partner_unit_service.go` | Medium | Dezernat replacement (t-paliad-070). |
+| `party_service.go`, `note_service.go`, `link_service.go`, `checklist_instance_service.go` | Medium | All do project-scoped CRUD with the same RLS+audit pattern that `t-paliad-036` proved easy to break. |
+| `appointment_service.go` | High | Hot — every calendar mutation. Exercised through approval tests but has no own test file. |
+| `view_service.go` | Medium | Powers the substrate (`/views/*`). |
+| `paliadin_jwt.go` | Medium | Per-turn JWT mint for the aichat path (`t-paliad-194`). No call sites in tests today. |
+| `markdown.go` | Low | Glossary + checklist content render. |
+
+### 1.5 Handlers with no test file
+
+53 of 59. Notably: **`auth.go` itself** (login / logout / session creation), **`projects.go`** (the most-mutated entity), **`deadlines.go` / `appointments.go`** (writes), **`paliadin.go` / `paliadin_suggest.go`** (m-only routes — never click-tested), **`fristenrechner.go` / `fristenrechner_search.go` / `fristenrechner_event_categories.go`** (the cascade users live in), **`dashboard.go` / `agenda.go`** (landing), **`onboarding.go` / `onboarding_gate.go`** (every new user's first three minutes), **`invite.go`** (rate-limited write path). The currently-tested handlers (search, audit-parse, approval error mapping, etc.) are the cheap pure-Go ones; every handler that touches the DB is untested at handler level.
+
+### 1.6 Live-DB test scaffold — is it sound?
+
+The pattern (read from `internal/services/visibility_test.go`):
+
+```go
+url := os.Getenv("TEST_DATABASE_URL")
+if url == "" { t.Skip("TEST_DATABASE_URL not set — skipping live DB test") }
+if err := db.ApplyMigrations(url); err != nil { t.Fatalf(...) }
+pool, _ := sqlx.Connect("postgres", url)
+defer pool.Close()
+// per-test seed + cleanup via DELETE + defer cleanup()
+```
+
+Verdict: **sound, but has rough edges that need addressing before we widen.**
+
+- ✅ Migrations apply at test startup against the test DB — catches every "you forgot to add a CHECK" / "you reference a column that doesn't exist" before a real-DB-touching test runs.
+- ✅ Per-test cleanup via `DELETE FROM ... WHERE id IN ($1,...)` is explicit and idempotent.
+- ✅ The `paliad.paliad_schema_migrations` tracker collision noted in memory `0b900afa…` is a pre-existing issue, not introduced by this design.
+- ⚠️ Cleanup-via-DELETE is fragile: a test that creates a row referenced by FK from another table needs to remember to clean both. A few existing tests (see `audit_service_test.go`) already chain 5+ DELETEs.
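+- ⚠️ Tests can't safely run in parallel against the same `TEST_DATABASE_URL` because they share schema state. `go test ./...` runs the test binaries of different packages concurrently by default (`-p` defaults to `GOMAXPROCS`), so live-DB tests in different packages can interfere with each other; within one package, tests run sequentially unless they call `t.Parallel()`, at which point overlapping cleanup IDs become a problem too.
+- ⚠️ No CI today actually exercises `TEST_DATABASE_URL`, so every live-DB test runs only on the author's laptop or not at all. The tests are paid for, but much of their value goes unrealised.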
+
+### 1.7 Migration tooling
+
+- `internal/db/migrate.go` embeds `migrations/*.sql` and applies on server boot via `golang-migrate/v4` with the `paliad_schema_migrations` tracker in `public` schema.
+- 100 migrations on disk (`001` → `100`).
+- **No dry-run gate today.** A bad migration breaks `paliad.de` at boot (Dokploy crash-loops the container). Recent prod incidents: mig 098 (submission code rename), mig 099 (with_po flag drop missed audit_reason gap), mig 020 (function rename without body rewrite — see memory `49a05cfa…`).
+- `down.sql` exists for every migration but no test ever exercises it.
+
+### 1.8 CI / deploy loop
+
+- No CI. Push-to-main → Gitea webhook → Dokploy auto-builds the Dockerfile and replaces the container. The Dockerfile runs `bun run build` then `go build`. **Neither `go test` nor `bun test` runs in the build pipeline.**
+- Pre-commit hooks: none in repo. Each worker runs `go build / go vet / go test / bun run build` by convention (see memories — every shipped task report ends with "build hygiene held").
+
+---
+
+## 2. Test pyramid — recommended shape
+
+```
+                           ┌─────────────────┐
+                           │  E2E (Playwright)│  ~10 flows
+                           │  L6              │
+                           └─────────────────┘
+                       ┌─────────────────────────┐
+                       │  Handler integration    │  ~30 routes
+                       │  L5 (httptest + real DB)│
+                       └─────────────────────────┘
+                  ┌──────────────────────────────────┐
+                  │  Service-layer (live DB)         │  ~60 tests
+                  │  L4 (BEGIN/ROLLBACK harness)     │
+                  └──────────────────────────────────┘
+              ┌──────────────────────────────────────────┐
+              │  Frontend DOM / cascade (bun:test+jsdom) │  ~15 modules
+              │  L3                                      │
+              └──────────────────────────────────────────┘
+        ┌──────────────────────────────────────────────────────┐
+        │  Frontend unit (bun:test pure TS)                    │  ~30 modules
+        │  L2                                                   │
+        └──────────────────────────────────────────────────────┘
+   ┌──────────────────────────────────────────────────────────────┐
+   │  Go unit (stdlib testing, table-driven, pure functions)      │  ~150 tests
+   │  L1                                                          │
+   └──────────────────────────────────────────────────────────────┘
+   ┌──────────────────────────────────────────────────────────────┐
+   │  Migration dry-run (make verify-migrations)                  │  100 mig
+   │  L0 — gate on every PR                                       │
+   └──────────────────────────────────────────────────────────────┘
+```
+
+### Layer 0 — Migration dry-run
+
+**What:** Every `*.up.sql` in `internal/db/migrations/` is applied inside a single `BEGIN ... ROLLBACK` transaction against a scratch Postgres, in numeric order. The harness asserts each statement succeeds *and* asserts no statement leaves the schema in a `paliad_schema_migrations.dirty=true` state. A second pass applies all up-migrations end-to-end (no rollback) and then re-applies the latest up-migration to assert idempotency (every paliad migration since `t-paliad-070` has been written to be idempotent — this enforces it).
+
+**Tool:** stdlib `testing` package, no third-party. Pattern: `internal/db/migrate_test.go` with a `TestMigrations_DryRun` driven from `TEST_DATABASE_URL`. A `make verify-migrations` target wraps it.
+
+**Why this layer matters most:** Every recent prod-down was a migration. Catching them on a CI run before merge is the highest-leverage test investment paliad can make. Cost: one ~100-line Go file + one Postgres in CI.
+
+**Coverage target:** 100 % of `*.up.sql` files. Hard gate on PR — no exceptions.
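A minimal sketch of the dry-run idea described under Layer 0, assuming plain `database/sql` rather than the `golang-migrate` driver the server uses at boot. The names `migration`, `orderUp` and `dryRun` and the file names in `main` are hypothetical; only the `NNN_title.up.sql` naming follows the golang-migrate convention. Statements that Postgres refuses to run inside a transaction block (for example `CREATE INDEX CONCURRENTLY`) would need handling outside this path.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// migration is one up-migration: numeric version, file name, SQL body.
type migration struct {
	Version int
	Name    string
	SQL     string
}

// parseVersion reads the numeric prefix of a golang-migrate style file
// name such as "098_example.up.sql" (the file names here are made up).
func parseVersion(name string) (int, error) {
	prefix, _, ok := strings.Cut(name, "_")
	if !ok {
		return 0, fmt.Errorf("migration %q has no version prefix", name)
	}
	return strconv.Atoi(prefix)
}

// orderUp keeps only *.up.sql files and sorts them by numeric version.
func orderUp(names []string) ([]migration, error) {
	var out []migration
	for _, n := range names {
		if !strings.HasSuffix(n, ".up.sql") {
			continue
		}
		v, err := parseVersion(n)
		if err != nil {
			return nil, err
		}
		out = append(out, migration{Version: v, Name: n})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Version < out[j].Version })
	return out, nil
}

// dryRun applies every migration inside one transaction and always rolls
// back, so the scratch database is left untouched.
func dryRun(ctx context.Context, db *sql.DB, migs []migration) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // never commit: this is a dry run
	for _, m := range migs {
		if _, err := tx.ExecContext(ctx, m.SQL); err != nil {
			return fmt.Errorf("migration %s: %w", m.Name, err)
		}
	}
	return nil
}

func main() {
	// Only the ordering step runs here; dryRun needs a live connection.
	migs, err := orderUp([]string{"100_c.up.sql", "098_a.up.sql", "098_a.down.sql", "099_b.up.sql"})
	if err != nil {
		panic(err)
	}
	for _, m := range migs {
		fmt.Println(m.Version, m.Name)
	}
}
```

`main` only exercises the ordering step; in `migrate_test.go` the `*sql.DB` passed to `dryRun` would presumably be opened from `TEST_DATABASE_URL`, and a failure would name the offending migration file.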
+
+### Layer 1 — Go unit (pure)
+
+**What:** `go test ./...` against pure functions — formatters, parsers, validators, calculators, fee tables, deadline calculators, projection lookahead clamping, codec round-trips. No DB, no HTTP.
+
+**Tool:** stdlib `testing`. Table-driven `cases := []struct{...}{...}` style is already the house pattern (see `auth_test.go` / `projection_anchor_test.go`). **Do not introduce testify or any matcher library** — the current code reads cleanly without one, and 323 existing test functions don't need a rename pass.
+
+**What's already there:** 19 pure-Go test files (calculator, mapping, codec, holiday, fees, etc.). Density is good; targeted infill rather than re-architecture.
+
+**Coverage target:** Every pure function in `internal/services/`, `internal/handlers/`, `internal/calc/`, `internal/changelog/`. Aim for "every branch in a decision table has at least one test row." Don't chase % — chase "the obvious edge that would burn a coworker".
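As an illustration of "every branch in a decision table has at least one test row", here is a small table-driven sketch. `clampLookahead` is a hypothetical helper invented for this example, not a function from the codebase, and the loop is written as a plain function so the block runs on its own; in a `_test.go` file the same table would be walked with `t.Run` and `t.Errorf`.

```go
package main

import "fmt"

// clampLookahead is a hypothetical pure helper: it forces a requested
// lookahead (in days) into the inclusive range [lo, hi].
func clampLookahead(days, lo, hi int) int {
	if days < lo {
		return lo
	}
	if days > hi {
		return hi
	}
	return days
}

// clampCase is one row of the decision table.
type clampCase struct {
	name               string
	days, lo, hi, want int
}

// One row per branch, plus both boundary values.
var clampCases = []clampCase{
	{"below range", -5, 1, 365, 1},
	{"lower edge", 1, 1, 365, 1},
	{"inside range", 30, 1, 365, 30},
	{"upper edge", 365, 1, 365, 365},
	{"above range", 500, 1, 365, 365},
}

// failures runs every row and returns one message per mismatch. In a
// _test.go file the loop body would be t.Run(c.name, ...) with t.Errorf.
func failures(cases []clampCase) []string {
	var out []string
	for _, c := range cases {
		if got := clampLookahead(c.days, c.lo, c.hi); got != c.want {
			out = append(out, fmt.Sprintf("%s: got %d, want %d", c.name, got, c.want))
		}
	}
	return out
}

func main() {
	for _, f := range failures(clampCases) {
		fmt.Println("FAIL", f)
	}
	fmt.Println("rows checked:", len(clampCases))
}
```

Adding a regression case is then a one-line change to the table, which is the property that keeps this style cheap to extend without a matcher library.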
+
+### Layer 2 — Frontend unit (pure)
+
+**What:** `bun test` against pure TS modules — URL codecs (`filter-bar/url-codec`), formatters, parsers, i18n key correctness (every `data-i18n` attribute used in TSX has a key in `i18n.ts`), view-spec parsers, projection-row mapping helpers.
+
+**Tool:** `bun:test` (built into bun, no install). Already in use in 4 files — extend the same pattern. Add `bun test` to `package.json` `scripts`.
+
+**What to add:**
+- i18n key audit (every `t("foo.bar")` and `data-i18n="foo.bar"` resolves in both `de` and `en`).
+- `filter-bar/` types + render helpers (paliad has shipped 4 FilterBar slices; coverage is one codec test).
+- `paliadin-context.ts` route table + entity extraction (the `[ctx …]` envelope is a stable contract paliadin's SKILL.md depends on; any drift here is a silent failure).
+- `paliadin-starters.ts` registry — every route maps to ≥1 starter; every starter is bilingual.
+- View-spec parsers in `views/`.
+
+**Coverage target:** Every pure TS module in `frontend/src/client/`. Pages (TSX renderers) are E2E concern, not unit concern.
+
+### Layer 3 — Frontend DOM (cascade / jsdom)
+
+**What:** `bun test` with jsdom global, exercising the interactive cascade modules — the fristenrechner cascade builder, the shape-timeline render, the FilterBar UI (chips, panels), the calendar grid, the inline Paliadin widget message stream, the inbox-row click handler, the dashboard activity item navigation.
+
+These modules contain enough state that pure-function tests miss real bugs (e.g. the t-paliad-098 `.entity-table` row-cursor lie was a CSS+DOM bug; t-paliad-099's modal close was a DOM-event bug; t-paliad-103's `::before` overlay click-swallow was a DOM bug).
+
+**Tool:** bun + `happy-dom` is the lighter choice; if it can't handle event ordering, fall back to `jsdom`. Both are ESM-clean and bun-friendly. **Pick one and stick with it — running both means twice the dependency surface.** Default pick: `happy-dom` (smaller, paliad doesn't need legacy IE semantics).
+
+**Pattern:** import the cascade module, build a minimal DOM (`document.body.innerHTML = …`), dispatch synthetic events, assert resulting state. Reuses the production renderers — no test-only fakes.
+
+**Coverage target:** ~15 modules. Specifically:
+- `client/filter-bar/index.ts` chip render + active-state.
+- `client/fristenrechner.ts` cascade: the most complex JS in the codebase; its dependency chains are where every UPC bug we know of has surfaced.
+- `client/shape-timeline.ts` lane mode + track mode (envelope wire shape brittle to refactor).
+- `client/projects-detail.ts` row click + Verlauf render.
+- `client/paliadin-widget.ts` + `paliadin-context.ts` interaction.
+- `client/inbox.ts` row-action click routing.
+- `client/dashboard.ts` activity-item nav.
+- `client/deadlines-calendar.ts` / `appointments-calendar.ts` column layout (the calendar-column-drift bug class).
+
+Not unit tests; not E2E. They are the missing middle.
+
+### Layer 4 — Service-layer (live DB)
+
+**What:** Go service methods against a real Postgres, using the existing `TEST_DATABASE_URL` pattern. Two improvements:
+
+1. **Replace per-test DELETE cleanup with a per-test transaction harness** — open a transaction, run the test inside it, ROLLBACK. Faster, isolating, no cleanup forgotten. Already viable because the service layer accepts `*sqlx.DB`-or-tx-shaped interfaces in many places; needs a small `internal/services/internal/testdb` package that exposes `WithTx(t *testing.T, fn func(*sqlx.Tx))`. Migration is mechanical, can happen alongside infill.
+
+   *Caveat:* some service methods open their own transactions internally (`approval_service.submit` is one). Those keep DELETE cleanup; the tx harness is a default, not a mandate.
+
+2. **Make `TEST_DATABASE_URL` mandatory in CI.** Today these tests are skipped on every machine that doesn't `export TEST_DATABASE_URL=…`, i.e. they never run on automatic pipelines because there is no pipeline. Once CI exists (§3.2), it becomes a required env var.
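One way the second improvement could be expressed is a shared guard that keeps today's skip on a laptop but refuses to skip under CI. This is a sketch under assumptions: the `CI` environment variable is a common convention that the eventual runner may or may not set, and `decide` and `dbDecision` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
)

// dbDecision says what a live-DB test should do at startup.
type dbDecision int

const (
	runWithDB dbDecision = iota // URL present: run the test
	skipLocal                   // no URL on a dev machine: skip, as today
	failInCI                    // no URL under CI: fail instead of skipping
)

// decide takes a getenv function so the logic can be tested without
// touching the real environment. The "CI" variable name is an assumption.
func decide(getenv func(string) string) (string, dbDecision) {
	url := getenv("TEST_DATABASE_URL")
	switch {
	case url != "":
		return url, runWithDB
	case getenv("CI") != "":
		return "", failInCI
	default:
		return "", skipLocal
	}
}

func main() {
	_, d := decide(os.Getenv)
	switch d {
	case runWithDB:
		fmt.Println("run live-DB tests")
	case skipLocal:
		fmt.Println("skip: TEST_DATABASE_URL not set")
	case failInCI:
		fmt.Println("fail: TEST_DATABASE_URL is required in CI")
	}
}
```

In a test helper, `skipLocal` would map to `t.Skip` and `failInCI` to `t.Fatal`, so a misconfigured pipeline shows up as a red build rather than a silently green one.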
+
+**Tool:** stdlib `testing` + `sqlx` (already in `go.mod`). **No mocks at the service↔DB boundary.** This is m's hardest line — see global CLAUDE.md memory pattern and `t-paliad-036` (the bug that masked two other bugs would have been caught instantly by a real-DB test).
+
+**Where to invest first:** Approval (already heavy), Projection (already heavy), Fristenrechner (already heavy), DeadlineService Create/Update/Complete/Delete with `pending_request_id` interplay, AppointmentService same, ProjectService visibility predicate, CalDAV push (the four CalDAV `*.go` files have zero direct test).
+
+**Coverage target:** Every service method that mutates the DB has at least one happy-path live-DB test. RLS predicate (`visibilityPredicatePositional`) has one test per role (global_admin, member, non-member).
+
+### Layer 5 — Handler integration (httptest + real DB)
+
+**What:** Spin a real `services.DBService`, mount the protected mux, drive `httptest.NewRequest` + `ServeHTTP` against it. Auth via a fake session cookie produced by a `testauth.Login(t, userID)` helper that mints the same Supabase JWT shape `auth.UserIDFromContext` expects.
+
+**Why:** The 53 untested handlers are where the request shape ↔ service interaction lives. Examples that would have caught real bugs:
+- `t-paliad-036`'s "`/projects/{id}` 404 while `/api/projects/{id}` 200" mismatch — a 5-line handler test would have failed before the migration ran.
+- mig 020's three-stacked bug — a handler test that POSTs a deadline and asserts a 200 + read-back row would have failed at submit-time, not boot-time.
+- The audit-log query timezone bug — handler test asserts the JSON contains the expected `event_date`.
+
+**Tool:** stdlib `net/http/httptest`. **No new framework.** Pattern: handler tests live next to the handler file (`internal/handlers/deadlines_test.go` next to `deadlines.go`).
+
+**Coverage target:** Every handler that gates a state-changing route — `POST/PATCH/DELETE` flavour. Plus `GET` handlers that compose a non-trivial query (dashboard, agenda, search, audit-log).
+
+### Layer 6 — End-to-end (Playwright)
+
+**What:** A small Playwright suite (~10 flows) committed at `e2e/` with a `bun run e2e` entry. Targets a local `./paliad` against a scratch Postgres (the same `TEST_DATABASE_URL`). Each test logs in, drives the UI through one user journey, asserts visible state.
+
+**Why ~10 not 100:** Per-PR budget caps at ~2 min total (§6 Q1). Playwright tests are the most expensive minute-per-confidence in this stack; they pay for themselves on the *golden path* and nothing else. The deep-coverage layer is L5; E2E is *"is the app still alive end to end?"*.
+
+**Tool:** `playwright` (npm; bun installs cleanly). No third-party test runner — Playwright ships its own. Tests live in `e2e/*.spec.ts`. **Not bun:test.** Playwright's runner is purpose-built for browser-driving and integrates with their tracing — don't fight it.
+
+**Cap:** 10 flows. If a new test wants in, an existing one must drop out (or we have a real reason to widen). This is the cheapest discipline available: it forces the suite to remain a smoke pass, not a regression-test dumping ground.
+
+**Coverage target:** See §4.
+
+---
+
+## 3. Tooling — concrete picks per layer
+
+| Layer | Tool | Already in deps? | Install? |
+|---|---|---|---|
+| L0 — migration dry-run | stdlib `testing` + `migrate/v4` | yes | no |
+| L1 — Go unit | stdlib `testing` | yes | no |
+| L2 — Frontend unit | `bun:test` | yes (built into bun) | no |
+| L3 — Frontend DOM | `bun:test` + `happy-dom` | bun yes, happy-dom **new** | `bun add -d happy-dom` (one dep, ~200 KB) |
+| L4 — Service live-DB | stdlib + sqlx | yes | no |
+| L5 — Handler integration | stdlib `net/http/httptest` + sqlx | yes | no |
+| L6 — E2E | `@playwright/test` | **new** | `bun add -d @playwright/test` + `npx playwright install chromium` |
+
+Net new deps: **2** (happy-dom + playwright). Both are mainstream, both have small surface area, both align with bun's ecosystem.
+
+Explicit rejects:
+- ❌ **testify** — current tests read cleanly with stdlib; adding it forces a rename pass nobody wants.
+- ❌ **vitest** — bun's built-in test runner is faster and the tests are already in `bun:test` shape.
+- ❌ **dockertest / testcontainers-go** — m's preference is real-DB tests against the existing Postgres; spinning ephemeral Docker Postgres per package run adds latency and surface area for marginal isolation gain. See Q3.
+- ❌ **sqlmock / gomock for DB** — banned by §0 lesson 1.
+- ❌ **cypress** — Playwright is the better tool today, and the team's existing skill (`/mai-tester`) already uses it.
+
+### 3.1 Per-PR run-time budget
+
+Target (subject to m's call in Q1): **≤ 90 s for the gating tier (L0+L1+L2+L4 subset+L5 happy-path)**, ≤ 4 min for the full suite (add L3+L4 full+L6). The gating tier blocks merge; the full suite blocks deploy.
+
+Indicative times (estimated, validate when slice 1 lands):
+
+| Tier | Layers | Est. time | Blocks |
+|---|---|---|---|
+| **Gate (every PR)** | L0 + L1 + L2 + L5 happy-path + L4 critical | 60–90 s | merge |
+| **Full (every merge to main)** | + L4 full + L3 + L6 | 3–4 min | deploy |
+
+### 3.2 CI — proposal, not commitment
+
+paliad has no CI today. Two routes:
+
+- **Gitea Actions** (m's stack already runs `mgit.msbls.de`). Self-hosted; same auth model as the rest of mAi. Adds a `.gitea/workflows/test.yml`. Postgres comes from a service container.
+- **Stay click-deploy.** No CI. Workers run tests locally; Dokploy auto-deploys on green-main convention.
+
+Recommendation: **Gitea Actions for the gate tier only** (L0 + L1 + L2), driven by a single short workflow. The L3-L6 expansion can be a follow-up once the gate tier proves stable. Deferred to Q2 for m's call.
+
+### 3.3 Test DB — live YouPC vs ephemeral
+
+The `paliad` schema lives on the shared YouPC Postgres (port 11833). Three options:
+
+| Option | Pros | Cons |
+|---|---|---|
+| **Per-developer separate DB on YouPC** (`TEST_DATABASE_URL` per laptop) | Closest to prod; existing pattern. | Cleanup discipline matters; cross-developer contention possible. |
+| **Ephemeral docker postgres per CI run** | Full isolation; parallel-safe; reset for free. | New infra; ~5 s container startup per CI invocation. |
+| **Dedicated test DB on a paliad-only Postgres** | Isolated; cheap. | New infra to maintain. |
+
+Recommendation: **option 1 for developers (no-op change), option 2 for CI** (Gitea Actions postgres service container). Deferred to Q3 for m's call.
+
+### 3.4 Coverage targets
+
+Don't gate on percentage. Gate on critical-path coverage (§4). Add `go test -coverprofile=` output to CI for visibility, not as a merge gate. Coverage % gating produces tests-for-tests'-sake; we want the tests that catch the bugs we've shipped.
+
+---
+
+## 4. Critical journeys — what MUST be covered
+
+These are the golden-path flows. Anything not on this list is L1-L5 territory, not L6. The list is intentionally short; if it grows beyond 10, we are doing E2E wrong.
+
+| # | Flow | Why it's critical | Layer mix |
+|---|---|---|---|
+| 1 | **Login → dashboard renders → traffic-light counts match** | Every user does this every day; broken auth = paliad is offline. | L6 (Playwright) + L5 handler (auth.go) |
+| 2 | **Create project (Client → Litigation → Patent → Case)** | Hierarchy with team inheritance — the data model's spine. | L6 + L5 + L4 (project_service) |
+| 3 | **Submit deadline → routes to /inbox → approver approves → state flips** | The 4-eye flow (t-paliad-138). Most-mutated paliad surface. | L6 + L5 (deadlines, approvals) + L4 (approval_service) |
+| 4 | **Fristenrechner: pick proceeding → cascade fires → result shows** | The platform's flagship interactive tool. JS cascade. | L6 + L3 (fristenrechner cascade) + L4 (fristenrechner) |
+| 5 | **SmartTimeline: anchor a projected row → predecessor-missing-error handled** | Recent Slice-2 work (t-paliad-173 / #31). High-touch surface. | L6 + L3 (shape-timeline) + L4 (projection_service) |
+| 6 | **CalDAV sync: PUT a Termin → external client sees it, edits there → pull reconciles** | Owned-event semantics + foreign-UID skip rule from Phase F. Untested today. | L4 (caldav_service push/pull) — gated on Q3 (live YouPC vs ephemeral) |
+| 7 | **Paliadin chat: anon visit hits 404; m's session opens widget; turn renders** | Owner-gated `/paliadin` is the only m-only surface. Quiet failures here are silent. | L6 (smoke) + L5 (paliadin_suggest) + L4 (paliadin / aichat_paliadin) |
+| 8 | **/admin/rules: filter → edit one rule → lifecycle transition → audit log row** | Rules drive the cascade; bad edits break every user's fristenrechner. | L6 + L5 (admin_rules) + L4 (rule_editor_service) |
+| 9 | **Onboarding: new user with allowed email → onboarding form → first project membership** | The new-user funnel; gateOnboarded middleware traps. | L6 + L5 (onboarding, invite) |
+| 10 | **Migration boot smoke: spin paliad against an empty DB → server binds 8080** | Catches every mig-N crash-loop. | L0 (migration dry-run) + L4 boot-smoke variant |
+
+Picks 1, 3, 4 and 10 are the highest-value-per-cost — they cover the routes most regressions land on (auth, mutation, cascade, boot).
+
+---
+
+## 5. Slice plan — tracer-bullet roll-out
+
+Each slice is a shippable PR with a concrete deliverable, in order of expected outage-prevention payoff. Sized for a single coder shift unless flagged. No slice depends on a later one being merged. Hour estimates intentionally omitted (per global CLAUDE.md).
+
+### Slice 1 — Migration dry-run harness + boot smoke (highest leverage)
+
+**Branch:** `mai/<coder>/test-strategy-slice-1-migrations`
+
+**Deliverable:**
+- `internal/db/migrate_test.go` — `TestMigrations_DryRun` (per-mig BEGIN/ROLLBACK), `TestMigrations_EndToEnd` (full apply, then re-apply latest to assert idempotency), `TestMigrations_Down` (apply N→0).
+- `Makefile` with `make verify-migrations` (the gate target), `make test` (run everything), `make test-go`, `make test-frontend`.
+- `cmd/server/main_paliadin_backend_test.go` already exists; extend with a `TestMain_BindsHTTPAfterMigrate` that boots the full server against `TEST_DATABASE_URL`, asserts `:8080` is listening, then shuts down. Catches the mig-098-class crash-loop in a single test.
+- README section: how to set `TEST_DATABASE_URL` locally.
+
+**Catches:** Every mig-098-class crash-loop; every drop-cascade-with-stale-policy-name regression (t-paliad-036).
+
+### Slice 2 — Service-layer infill: critical mutators
+
+**Branch:** `mai/<coder>/test-strategy-slice-2-services`
+
+**Deliverable:**
+- Test files for the three highest-impact untested services:
+  - `internal/services/agenda_service_test.go` (live-DB, dashboard agenda query)
+  - `internal/services/dashboard_service_test.go` (traffic-light counts)
+  - `internal/services/team_service_test.go` (membership + inheritance — RLS-load-bearing)
+- Tighten existing `approval_service_test.go` + `deadline_service_test.go` coverage of the create/update/complete/delete × pending-request matrix where there are demonstrable gaps.
+- Add `internal/services/internal/testdb/withtx.go` — the per-test tx harness (optional adoption; existing tests stay).
+
+**Catches:** RLS regressions, approval interplay regressions, dashboard count drift after schema renames.
+
+### Slice 3 — Frontend bun:test setup + L2 infill
+
+**Branch:** `mai/<coder>/test-strategy-slice-3-frontend-unit`
+
+**Deliverable:**
+- `frontend/package.json` `scripts.test = "bun test"`.
+- New tests under `frontend/src/client/`:
+  - `paliadin-context.test.ts` (route table, entity extraction, selection truncation).
+  - `paliadin-starters.test.ts` (every route ≥1 starter, every starter bilingual).
+  - `filter-bar/index.test.ts` (chip render + active state — pure DOM-less helpers).
+  - i18n key audit: `frontend/scripts/i18n-audit.test.ts` parses every `data-i18n="…"` from `dist/` HTML and every `t("…")` call from `src/`, asserts both `de` and `en` resolve. Runs as part of `bun test`.
+- `make test-frontend` wires `cd frontend && bun test`.
+
+**Catches:** i18n drift (untranslated key shipped to user), context-envelope contract drift (paliadin SKILL.md depends on it), starter-registry regressions.
+
+### Slice 4 — Playwright golden-path smoke
+
+**Branch:** `mai/<coder>/test-strategy-slice-4-e2e`
+
+**Deliverable:**
+- `e2e/` directory at repo root.
+- `playwright.config.ts` pointing at `http://localhost:8080` (paliad started by the test, not assumed).
+- Five Playwright `*.spec.ts` files covering critical journeys 1, 3, 4, 7, 9 from §4.
+- `make e2e` target that:
+  1. starts paliad against `TEST_DATABASE_URL`,
+  2. waits for `:8080` to be live,
+  3. runs `npx playwright test`,
+  4. tears the server down.
+- `bun add -d @playwright/test` + `npx playwright install chromium`.
+
+**Catches:** Auth regressions, deadline-mutation regressions, fristenrechner cascade regressions, owner-gated /paliadin leaks, onboarding-gate misbehaviour.
+
+### Slice 5 — Handler integration tests for the 5 most-touched routes
+
+**Branch:** `mai/<coder>/test-strategy-slice-5-handlers`
+
+**Deliverable:**
+- `internal/handlers/auth_test.go` extended with `TestLogin_HappyPath` + `TestLogout_ClearsCookie` (real DB).
+- `internal/handlers/projects_test.go` — `TestProjectsCreate` (POST 200, row inserted, audit emitted), `TestProjectsGetByID_RespectsVisibility` (404 for non-member).
+- `internal/handlers/deadlines_test.go` — `TestDeadlinesCreate_TriggersApproval` (verifies pending pill).
+- `internal/handlers/appointments_test.go` — same shape.
+- `internal/handlers/paliadin_test.go` — `TestPaliadinPage_404ForNonOwner`, `TestPaliadinPage_200ForOwner`.
+- Shared `internal/handlers/testauth/testauth.go` — mints a session cookie for `userID` so handler tests don't reinvent auth seeding.
+
+**Catches:** Handler ↔ service wiring drift, visibility-predicate handler-side bugs (t-paliad-036 bug 2 was exactly this), owner-gate bypass.
+
+### Slice 6 — Frontend L3 (DOM) cascade tests
+
+**Branch:** `mai/<coder>/test-strategy-slice-6-frontend-dom`
+
+**Deliverable:**
+- `bun add -d happy-dom`.
+- DOM-driven tests for the three most-touched cascades:
+  - `client/fristenrechner.test.ts` (cascade activate → row appears → date-set fires fetch).
+  - `client/shape-timeline.test.ts` (lane render, track render, projected-row click).
+  - `client/filter-bar/index.test.ts` (chip click toggles state, URL params update).
+
+**Catches:** The whole class of "the function exists and is unit-tested but the cascade in the browser doesn't fire it" bugs. This is the layer that catches t-paliad-098 / 099 / 102 / 103.
+
+### Slice 7 — CI wiring (deferred — Q2 dependent)
+
+**Branch:** `mai/<coder>/test-strategy-slice-7-ci` (gated on m's Q2 pick)
+
+**Deliverable:**
+- `.gitea/workflows/test.yml` (or stay click-deploy if m picks that).
+- Gate tier runs on every PR; full suite runs on merge to main.
+- Postgres service container provides `TEST_DATABASE_URL`.
+- Slack/Gotify ping on red main.
+
+**Catches:** Drift between "tests pass on my laptop" and prod reality.
+
+### Slice 8 — Coverage reporting + dashboard (lowest priority)
+
+**Branch:** `mai/<coder>/test-strategy-slice-8-coverage`
+
+**Deliverable:**
+- `go test -coverprofile=` output rendered into a single `coverage.html` (e.g. with `go tool cover -html`).
+- Bun's coverage output similarly.
+- A `docs/coverage.md` index updated by CI.
+- **Not a merge gate.** Visibility only.
+
+**Catches:** Slow drift; nice-to-have once the floor is in.
+
+### Slice order rationale
+
+Slices 1, 4, and 5 buy the most outage prevention per line of code: the migration dry-run kills crash-loops, E2E kills user-facing regressions, and handler tests kill wiring drift. Slices 2, 3, and 6 widen the floor; Slices 7 and 8 are infrastructure.
+
+---
+
+## 6. Open questions for m
+
+These need m's call before any coder shift starts (or before specific slices start, where noted).
+
+### Q1 — Per-PR test-run budget
+
+How long is acceptable to wait on the gate tier before merge?
+
+- 30 s — only L0 + L1 (no L2+ on the gate).
+- **60–90 s (recommended)** — L0 + L1 + L2 + L5 happy-path + L4 critical.
+- 2 min — add L3 + L4 full.
+- 4+ min — add L6 (E2E on gate).
+
+The pick determines whether E2E gates merge or only deploy.
+
+### Q2 — CI infrastructure
+
+- **Gitea Actions** (self-hosted, gate tier only, recommended) — minimal new infra; aligns with m's existing stack.
+- **Stay click-deploy** — workers run tests locally; merge discipline enforced by convention. Today's reality; we keep it.
+- **Both:** start with click-deploy, add Gitea Actions in Slice 7 once gate tier proves stable.
+
+### Q3 — Live-DB vs ephemeral docker Postgres for tests
+
+- **Per-developer YouPC DB (current pattern)** — closest to prod; existing tests work unchanged.
+- **Ephemeral docker postgres in CI, YouPC for devs (recommended hybrid)** — keeps local-dev simple, gives CI deterministic isolation.
+- **YouPC everywhere** — simplest, but parallel CI runs would contend.
+
+### Q4 — Coverage targets — % or critical-path?
+
+- **Critical-path only (recommended)** — §4's 10 flows + every state-mutating service method has a test. No % gate.
+- **% gate** — set a floor (e.g. 60 % lines, 50 % branches) and refuse merges below it.
+- **Both** — critical-path is mandatory, % is informational.
+
+m's prior preference (memory pattern: "tests that catch real bugs > coverage theatre") points at critical-path-only; this question asks m to confirm.
+
+### Q5 — Which slices land before paliad is "production-grade"?
+
+paliad is already live at `paliad.de` and being used by HLC colleagues. "Production-grade" here means "next time someone ships, we don't go down."
+
+Picks:
+- **Slices 1 + 4 + 5 are the production-grade floor (recommended).** Migration dry-run + golden-path E2E + handler integration tests cover the failure modes that hit prod since the rebrand.
+- Add Slices 2 + 3 + 6 as widening passes, on their own cadence.
+- Slices 7 and 8 are nice-to-haves.
+
+Confirming the floor pick — and whether m wants all three to land before any new feature work, or whether they roll out alongside.
+
+### Q6 — Who owns each slice?
+
+Recommendation: rotate coder slots so the same person isn't on every slice. Suggested assignment (head can override):
+
+| Slice | Profile fit |
+|---|---|
+| 1 — migrations | Backend-heavy coder (knuth, gauss, cronus). |
+| 2 — service infill | Backend-heavy coder; whoever owns approval/projection. |
+| 3 — frontend unit | Frontend-heavy coder. |
+| 4 — Playwright E2E | Cross-stack coder; ideally one familiar with `/mai-tester`. |
+| 5 — handler integration | Backend coder. |
+| 6 — frontend DOM | Frontend coder (same person as 3 makes sense). |
+
+Inventor does **not** decide assignments; head + m do.
+
+---
+
+## 7. Out of scope (explicit)
+
+- **No rewrite of any existing test.** The 323 existing test functions stay. New tests use the new patterns; old tests are migrated only when their files are touched for unrelated reasons.
+- **No third-party framework where stdlib + bun:test suffice** (testify, vitest, etc. — see §3).
+- **No mocks at the service↔DB boundary.** This is the lock-in. Mocks lie; the live-DB tests we already have are paliad's most useful safety net.
+- **No new feature work in this strategy.** The doc proposes infra; feature scope is unchanged.
+- **No retirement of the `tests/smoke-*.md` human-written reports.** Those are great for one-shot regression hunts; they coexist with the automated suite.
+
+---
+
+## 8. Implementation notes for the eventual coder
+
+(For whichever coder picks up a slice. Not exhaustive.)
+
+- **Test-name collisions in Go's flat package namespace bite when a service grows N implementations.** Memory note from `t-paliad-194` already records this. Prefix tests with the service name (e.g. `TestAichatPaliadin_RunTurn_…` not `TestRunTurn_…`).
+- **`httptest.NewRequest` does not URL-encode** — use `url.QueryEscape` for any `?q=…` argument. Memory note from `t-paliad-026`.
+- **sqlx v1.4.0 `Named` parser strips one colon from `::uuid[]`** — known pitfall, repro lives at `internal/services/project_service.go`. Use `CAST(... AS uuid[])` in new query strings.
+- **Live-DB cleanup must delete child rows before the parent rows their FKs reference.** Order matters (`auth.users` last). Look at `audit_service_test.go` for the chain pattern.
+- **`paliad.paliad_schema_migrations` tracker collision** is documented but unresolved. Slice 1 should add a `make reset-test-db` target that drops both `public.paliad_schema_migrations` *and* `paliad.paliad_schema_migrations` to keep developers unblocked.
+- **`bun:test` matchers are Jest-compatible** — `expect().toEqual()`, `expect().toHaveBeenCalled()`, etc. No deps needed.
+- **happy-dom does not implement** every DOM method (notably some `<dialog>` semantics). If a cascade test fails on something missing, jsdom is the escape hatch.
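+
+The `httptest.NewRequest` note above can be illustrated with a short sketch. The `/search` path and the `newSearchRequest` helper are hypothetical; the point is only that the query value goes through `url.QueryEscape` before it is concatenated into the target, since a raw space or `&` in the value would otherwise break the request line or split the parameter.
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/http"
+	"net/http/httptest"
+	"net/url"
+)
+
+// newSearchRequest (hypothetical helper) builds GET /search?q=<term>,
+// escaping term so spaces, '&', and non-ASCII survive the round trip.
+func newSearchRequest(term string) *http.Request {
+	target := "/search?q=" + url.QueryEscape(term)
+	return httptest.NewRequest(http.MethodGet, target, nil)
+}
+
+func main() {
+	req := newSearchRequest("Müller & Söhne")
+	// The handler sees the original, unescaped value.
+	fmt.Println(req.URL.Query().Get("q")) // prints "Müller & Söhne"
+}
+```
+
+For more than one parameter, building the query with `url.Values{...}.Encode()` does the same escaping in one step.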
+
+---
+
+## 9. Decision summary — pick list for m
+
+| # | Question | Inventor recommends |
+|---|---|---|
+| Q1 | Per-PR budget | 60–90 s gate, 3–4 min full |
+| Q2 | CI infra | Gitea Actions, gate tier only |
+| Q3 | Test DB | YouPC for devs, ephemeral docker for CI |
+| Q4 | Coverage target | Critical-path only, no % gate |
+| Q5 | Production-grade floor | Slices 1 + 4 + 5 before new feature work |
+| Q6 | Slice ownership | Rotate per profile; head decides |
+
+If m's calls match inventor's, the implementer's brief writes itself: Slice 1 first, then 4 + 5 in parallel, then 2/3/6 as widening passes.
+
+---
+
+**Status:** DESIGN READY FOR REVIEW. Awaiting m go/no-go on §5 slice plan + §6 open questions before any coder shift starts.