# Design — Paliad Test Strategy (production-grade)

**Author:** mendel (inventor)
**Date:** 2026-05-19
**Task:** t-paliad-213
**Branch:** `mai/mendel/inventor-test-strategy`
**Status:** DESIGN READY FOR REVIEW. No test files / Make targets / CI configs touched. Awaiting m go/no-go on §5 slice plan + §6 open questions before any coder shift.

---

## 0. TL;DR

Paliad has accidental test discipline today: 58 `_test.go` files / 323 test functions in Go (≈45 % of services tested, ≈12 % of handlers tested) and 4 frontend test files for 90+ client modules (≈4 %). There is no committed end-to-end suite and no CI; every smoke pass is human-driven via the manual reports in `tests/`. The `mig 098` prod crash-loop, the `t-paliad-036` triple-bug after the German→English rename, and a long tail of UX regressions (deadline-done modal, calendar column drift) would all have been caught by a 10-test boot-and-click smoke pass.

This design proposes a six-layer test pyramid sitting on a migration dry-run gate (L0 to L6), with a concrete tool per layer: stdlib `testing`, bun's built-in `bun:test`, and Playwright for E2E. The only new dependencies are `happy-dom` and `@playwright/test` (§3). It pins three lessons paliad has paid for in commits:

1. **No mocks at the service↔DB boundary.** Live-DB tests against a per-developer Postgres are the floor; in-memory mocks for `paliad.*` would have hidden every rename-after-DROP-CASCADE bug. Project preference is already in this direction (26/44 service test files are live-DB-gated); we double down rather than reverse.
2. **Migrations must dry-run before they merge.** Every recent prod-down (mig 098, mig 020-after-rename, mig 099 audit_reason gap) was a migration that compiled, passed `go test ./...` (which skips without `TEST_DATABASE_URL`), and broke on first apply against the real schema. A `make verify-migrations` target that does BEGIN/apply/ROLLBACK in CI fixes the entire failure mode.
3. **Browser-shaped bugs need a browser.** The fristenrechner cascade, shape-timeline render, calendar grid, and inline paliadin widget are JS state machines. Bun's built-in `bun:test` covers the pure parser/codec code; Playwright covers the auth-gated DOM. Don't try to substitute one for the other.

Eight slices roll the strategy out as tracer-bullet PRs, each independently shippable. Slice 1 (migration dry-run harness) and Slice 4 (Playwright golden-path smoke) buy the most outage-prevention per LoC; the rest is widening proven patterns.

Six open questions for m at §6. Most surface a coverage-vs-cost trade-off — the picks that need m's call before any code lands are CI infrastructure choice (Q2), per-PR run-time budget (Q1), and live-DB-vs-dockerised Postgres (Q3).

---

## 1. Audit — what exists today

Counts taken on `mai/mendel/inventor-test-strategy` @ HEAD (2026-05-19, 100 migrations applied).

### 1.1 Go test inventory

| Package | Source files | Test files | Test functions | Notes |
|---|---|---|---|---|
| `internal/services` | 56 | 44 | ~200 | 26 live-DB-gated (`TEST_DATABASE_URL`), 18 pure-Go. 24 services have **no test file at all** — see §1.4. |
| `internal/handlers` | 59 | 7 | ~30 | Only auth-domain check, search, audit-parse, approval-error-mapping, redirects, verfahrensablauf-redirect, chart-404 covered. **53 handlers have no test file.** |
| `internal/auth` | small | 2 | ~10 | Session middleware + require-admin. |
| `internal/branding` | small | 1 | small | Firm-name override. |
| `internal/offices` | small | 1 | small | Office enum. |
| `internal/changelog` | small | 1 | small | Pure parser. |
| `internal/calc` | small | 1 | small | Fees / fee tables. |
| `cmd/server` | 1 | 1 | small | `main_paliadin_backend_test.go` covers env-gate selection. |
| **Total** | **133** | **58** | **323** | |

`go test ./...` runs all 58 files. Without `TEST_DATABASE_URL` set, 27 of them silently skip their live-DB cases — the suite still passes, but coverage of mutation paths drops to near zero.

### 1.2 Frontend test inventory

| Path | Test files | Tested |
|---|---|---|
| `frontend/src/client/filter-bar/url-codec.test.ts` | 1 | FilterBar URL codec round-trip. |
| `frontend/src/client/views/format.test.ts` | 1 | Date/time formatters (regression for t-paliad-153). |
| `frontend/src/client/views/shape-timeline-chart.test.ts` | 1 | Chart layout pure function. |
| `frontend/src/client/views/shape-timeline-cv.test.ts` | 1 | Continuous-view shape layout. |
| **Total** | **4** | Out of ~90 client modules (`frontend/src/client/*.ts`). |

All four use bun's built-in `bun:test` (no extra dep). No DOM/jsdom tests. No Playwright. No `bun test` script in `package.json` (`bun run build` is the only script).

### 1.3 End-to-end / smoke

- `tests/smoke-2026-04-25.md`, `tests/smoke-auth-2026-04-25.md`, `tests/smoke-auth-2026-04-26-cleanup.md` — human-written reports with screenshots committed under `tests/screenshots-*`. No code. No re-runnable script.
- `mai-tester` skill uses Playwright for ad-hoc runs; nothing committed.
- No `e2e/`, no `.gitea/workflows/`, no `.github/workflows/`, no `Makefile`.

### 1.4 Critical service paths with no test file

These are `internal/services/*.go` for which no `*_test.go` sibling exists:

| Service | Risk class | Why it matters |
|---|---|---|
| `caldav_service.go`, `caldav_client.go`, `caldav_crypto.go`, `caldav_ical.go` | High | Per-user push/pull goroutines + AES-GCM at rest. One pure parser test (`caldav_ical_timeline_test.go`) exists but the service + crypto + WebDAV client are blind. |
| `agenda_service.go` | High | Dashboard agenda query; reused by `/agenda` page. Exercised transitively by visibility tests but no direct test. |
| `dashboard_service.go` | High | Traffic-light + summary counts. Same story — transitively covered via visibility, no direct test. |
| `derivation_service.go` | Medium | Project-tree derivation (the new t-paliad-194-era subtree machinery). |
| `team_service.go` | Medium | Team membership / inheritance. |
| `partner_unit_service.go` | Medium | Dezernat replacement (t-paliad-070). |
| `party_service.go`, `note_service.go`, `link_service.go`, `checklist_instance_service.go` | Medium | All do project-scoped CRUD with the same RLS+audit pattern that `t-paliad-036` proved easy to break. |
| `appointment_service.go` | High | Hot — every calendar mutation. Exercised through approval tests but has no own test file. |
| `view_service.go` | Medium | Powers the substrate (`/views/*`). |
| `paliadin_jwt.go` | Medium | Per-turn JWT mint for the aichat path (`t-paliad-194`). No call sites in tests today. |
| `markdown.go` | Low | Glossary + checklist content render. |

### 1.5 Handlers with no test file

53 of 59. Notably: **`auth.go` itself** (login / logout / session creation), **`projects.go`** (the most-mutated entity), **`deadlines.go` / `appointments.go`** (writes), **`paliadin.go` / `paliadin_suggest.go`** (m-only routes — never click-tested), **`fristenrechner.go` / `fristenrechner_search.go` / `fristenrechner_event_categories.go`** (the cascade users live in), **`dashboard.go` / `agenda.go`** (landing), **`onboarding.go` / `onboarding_gate.go`** (every new user's first three minutes), **`invite.go`** (rate-limited write path). The currently-tested handlers (search, audit-parse, approval error mapping, etc.) are the cheap pure-Go ones; every handler that touches the DB is untested at handler level.

### 1.6 Live-DB test scaffold — is it sound?

The pattern (read from `internal/services/visibility_test.go`):

```go
url := os.Getenv("TEST_DATABASE_URL")
if url == "" { t.Skip("TEST_DATABASE_URL not set — skipping live DB test") }
if err := db.ApplyMigrations(url); err != nil { t.Fatalf(...) }
pool, _ := sqlx.Connect("postgres", url)
defer pool.Close()
// per-test seed + cleanup via DELETE + defer cleanup()
```

Verdict: **sound, but has rough edges that need addressing before we widen.**

- ✅ Migrations apply at test startup against the test DB — catches every "you forgot to add a CHECK" / "you reference a column that doesn't exist" before a real-DB-touching test runs.
- ✅ Per-test cleanup via `DELETE FROM ... WHERE id IN ($1,...)` is explicit and idempotent.
- ✅ The `paliad.paliad_schema_migrations` tracker collision noted in memory `0b900afa…` is a pre-existing issue, not introduced by this design.
- ⚠️ Cleanup-via-DELETE is fragile: a test that creates a row referenced by FK from another table needs to remember to clean both. A few existing tests (see `audit_service_test.go`) already chain 5+ DELETEs.
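- ⚠️ Tests can't safely run in parallel against the same `TEST_DATABASE_URL` because they share schema state. `go test ./...` runs the test binaries of different packages concurrently by default (`-p`, default `GOMAXPROCS`), so live-DB tests in different packages can interfere. Within a package, tests run sequentially unless they call `t.Parallel()`; once they do, overlapping cleanup IDs can collide there too.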
- ⚠️ No CI today actually exercises `TEST_DATABASE_URL`, so every live-DB test runs only on the author's laptop or not at all. The tests are paid for, but much of their value goes uncollected.

### 1.7 Migration tooling

- `internal/db/migrate.go` embeds `migrations/*.sql` and applies on server boot via `golang-migrate/v4` with the `paliad_schema_migrations` tracker in `public` schema.
- 100 migrations on disk (`001` → `100`).
- **No dry-run gate today.** A bad migration breaks `paliad.de` at boot (Dokploy crash-loops the container). Recent prod incidents: mig 098 (submission code rename), mig 099 (with_po flag drop missed audit_reason gap), mig 020 (function rename without body rewrite — see memory `49a05cfa…`).
- `down.sql` exists for every migration but no test ever exercises it.

### 1.8 CI / deploy loop

- No CI. Push-to-main → Gitea webhook → Dokploy auto-builds the Dockerfile and replaces the container. The Dockerfile runs `bun run build` then `go build`. **Neither `go test` nor `bun test` runs in the build pipeline.**
- Pre-commit hooks: none in repo. Each worker runs `go build / go vet / go test / bun run build` by convention (see memories — every shipped task report ends with "build hygiene held").

---

## 2. Test pyramid — recommended shape

```
                           ┌─────────────────┐
                           │  E2E (Playwright)│  ~10 flows
                           │  L6              │
                           └─────────────────┘
                       ┌─────────────────────────┐
                       │  Handler integration    │  ~30 routes
                       │  L5 (httptest + real DB)│
                       └─────────────────────────┘
                  ┌──────────────────────────────────┐
                  │  Service-layer (live DB)         │  ~60 tests
                  │  L4 (BEGIN/ROLLBACK harness)     │
                  └──────────────────────────────────┘
              ┌──────────────────────────────────────────┐
              │  Frontend DOM / cascade (bun:test+jsdom) │  ~15 modules
              │  L3                                      │
              └──────────────────────────────────────────┘
        ┌──────────────────────────────────────────────────────┐
        │  Frontend unit (bun:test pure TS)                    │  ~30 modules
        │  L2                                                   │
        └──────────────────────────────────────────────────────┘
   ┌──────────────────────────────────────────────────────────────┐
   │  Go unit (stdlib testing, table-driven, pure functions)      │  ~150 tests
   │  L1                                                          │
   └──────────────────────────────────────────────────────────────┘
   ┌──────────────────────────────────────────────────────────────┐
   │  Migration dry-run (make verify-migrations)                  │  100 mig
   │  L0 — gate on every PR                                       │
   └──────────────────────────────────────────────────────────────┘
```

### Layer 0 — Migration dry-run

**What:** Every `*.up.sql` in `internal/db/migrations/` is applied inside a single `BEGIN ... ROLLBACK` transaction against a scratch Postgres, in numeric order. The harness asserts that each statement succeeds *and* that no migration leaves the `paliad_schema_migrations` tracker with `dirty = true`. A second pass applies all up-migrations end-to-end (no rollback) and then re-applies the latest up-migration to assert idempotency. Every paliad migration since `t-paliad-070` has been written to be idempotent; this pass enforces it.

**Tool:** stdlib `testing` package, no third-party. Pattern: `internal/db/migrate_test.go` with a `TestMigrations_DryRun` driven from `TEST_DATABASE_URL`. A `make verify-migrations` target wraps it.

**Why this layer matters most:** Every recent prod-down was a migration. Catching them on a CI run before merge is the highest-leverage test investment paliad can make. Cost: one ~100-line Go file + one Postgres in CI.

**Coverage target:** 100 % of `*.up.sql` files. Hard gate on PR — no exceptions.
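
The ordering half of the harness needs no database. A minimal sketch, assuming golang-migrate's `NNN_title.up.sql` naming and a gap-free sequence starting at 1 (both assumptions, as is the name `orderMigrations`): sort the up-files by numeric prefix and reject duplicates or gaps before any SQL runs.

```go
package main

import (
	"fmt"
	"io/fs"
	"sort"
	"strconv"
	"strings"
	"testing/fstest"
)

// orderMigrations sorts *.up.sql file names by their numeric prefix.
// It rejects a non-numeric prefix, a duplicate version, or a gap in the
// sequence. The NNN_title.up.sql layout and the start-at-1 rule are
// assumptions of this sketch, not confirmed repo conventions.
func orderMigrations(names []string) ([]string, error) {
	byVersion := make(map[int]string, len(names))
	versions := make([]int, 0, len(names))
	for _, name := range names {
		prefix, _, ok := strings.Cut(name, "_")
		if !ok {
			return nil, fmt.Errorf("%s: expected NNN_title.up.sql", name)
		}
		v, err := strconv.Atoi(prefix)
		if err != nil {
			return nil, fmt.Errorf("%s: non-numeric version prefix %q", name, prefix)
		}
		if prev, dup := byVersion[v]; dup {
			return nil, fmt.Errorf("version %d used by both %s and %s", v, prev, name)
		}
		byVersion[v] = name
		versions = append(versions, v)
	}
	sort.Ints(versions)
	ordered := make([]string, 0, len(versions))
	for i, v := range versions {
		if v != i+1 {
			return nil, fmt.Errorf("gap in sequence: expected version %d, found %d", i+1, v)
		}
		ordered = append(ordered, byVersion[v])
	}
	return ordered, nil
}

func main() {
	// In-memory stand-in for the embedded migrations directory.
	fsys := fstest.MapFS{
		"002_add_deadlines.up.sql": {Data: []byte("SELECT 1;")},
		"001_init.up.sql":          {Data: []byte("SELECT 1;")},
		"001_init.down.sql":        {Data: []byte("SELECT 1;")},
	}
	names, err := fs.Glob(fsys, "*.up.sql")
	if err != nil {
		panic(err)
	}
	ordered, err := orderMigrations(names)
	fmt.Println(ordered, err)
}
```

In the real harness the ordered list would feed the `BEGIN ... ROLLBACK` pass against `TEST_DATABASE_URL`; this sketch only covers the step that runs without Postgres.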

### Layer 1 — Go unit (pure)

**What:** `go test ./...` against pure functions — formatters, parsers, validators, calculators, fee tables, deadline calculators, projection lookahead clamping, codec round-trips. No DB, no HTTP.

**Tool:** stdlib `testing`. Table-driven `cases := []struct{...}{...}` style is already the house pattern (see `auth_test.go` / `projection_anchor_test.go`). **Do not introduce testify or any matcher library** — the current code reads cleanly without one, and 323 existing test functions don't need a rename pass.

**What's already there:** 19 pure-Go test files (calculator, mapping, codec, holiday, fees, etc.). Density is good; targeted infill rather than re-architecture.

**Coverage target:** Every pure function in `internal/services/`, `internal/handlers/`, `internal/calc/`, `internal/changelog/`. Aim for "every branch in a decision table has at least one test row." Don't chase % — chase "the obvious edge that would burn a coworker".
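
The "one row per branch" target can be read as a table shaped like the sketch below. `clampLookahead` and its 1 to 365 day bounds are hypothetical stand-ins, not paliad's real projection helper; in a `_test.go` file the loop body would sit inside `t.Run(tc.name, ...)` instead of collecting messages.

```go
package main

import "fmt"

// clampLookahead is a hypothetical pure function: it forces a requested
// lookahead (in days) into the inclusive range [lo, hi].
func clampLookahead(days, lo, hi int) int {
	if days < lo {
		return lo
	}
	if days > hi {
		return hi
	}
	return days
}

// One row per branch of the decision table, plus both boundaries.
var clampCases = []struct {
	name string
	days int
	want int
}{
	{"below range clamps up", -5, 1},
	{"lower boundary kept", 1, 1},
	{"inside range kept", 30, 30},
	{"upper boundary kept", 365, 365},
	{"above range clamps down", 9000, 365},
}

// failedCases runs every row and returns one message per mismatch.
func failedCases() []string {
	var failures []string
	for _, tc := range clampCases {
		if got := clampLookahead(tc.days, 1, 365); got != tc.want {
			failures = append(failures,
				fmt.Sprintf("%s: got %d, want %d", tc.name, got, tc.want))
		}
	}
	return failures
}

func main() {
	fmt.Println("failing rows:", len(failedCases()))
}
```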

### Layer 2 — Frontend unit (pure)

**What:** `bun test` against pure TS modules — URL codecs (`filter-bar/url-codec`), formatters, parsers, i18n key correctness (every `data-i18n` attribute used in TSX has a key in `i18n.ts`), view-spec parsers, projection-row mapping helpers.

**Tool:** `bun:test` (built into bun, no install). Already in use in 4 files — extend the same pattern. Add `bun test` to `package.json` `scripts`.

**What to add:**
- i18n key audit (every `t("foo.bar")` and `data-i18n="foo.bar"` resolves in both `de` and `en`).
- `filter-bar/` types + render helpers (paliad has shipped 4 FilterBar slices; coverage is one codec test).
- `paliadin-context.ts` route table + entity extraction (the `[ctx …]` envelope is a stable contract paliadin's SKILL.md depends on; any drift here is a silent failure).
- `paliadin-starters.ts` registry — every route maps to ≥1 starter; every starter is bilingual.
- View-spec parsers in `views/`.

**Coverage target:** Every pure TS module in `frontend/src/client/`. Pages (TSX renderers) are E2E concern, not unit concern.

### Layer 3 — Frontend DOM (cascade / jsdom)

**What:** `bun test` with a DOM global (`happy-dom` by default, `jsdom` as the fallback; see **Tool** below), exercising the interactive cascade modules: the fristenrechner cascade builder, the shape-timeline render, the FilterBar UI (chips, panels), the calendar grid, the inline Paliadin widget message stream, the inbox-row click handler, the dashboard activity item navigation.

These modules contain enough state that pure-function tests miss real bugs (e.g. the t-paliad-098 `.entity-table` row-cursor lie was a CSS+DOM bug; t-paliad-099's modal close was a DOM-event bug; t-paliad-103's `::before` overlay click-swallow was a DOM bug).

**Tool:** bun + `happy-dom` is the lighter choice; if it can't handle event ordering, fall back to `jsdom`. Both are ESM-clean and bun-friendly. **Pick one and stick with it — running both means twice the dependency surface.** Default pick: `happy-dom` (smaller, paliad doesn't need legacy IE semantics).

**Pattern:** import the cascade module, build a minimal DOM (`document.body.innerHTML = …`), dispatch synthetic events, assert resulting state. Reuses the production renderers — no test-only fakes.

**Coverage target:** ~15 modules. Specifically:
- `client/filter-bar/index.ts` chip render + active-state.
- `client/fristenrechner.ts` cascade — most complex JS in the codebase; depend chains light up every UPC bug we know.
- `client/shape-timeline.ts` lane mode + track mode (envelope wire shape brittle to refactor).
- `client/projects-detail.ts` row click + Verlauf render.
- `client/paliadin-widget.ts` + `paliadin-context.ts` interaction.
- `client/inbox.ts` row-action click routing.
- `client/dashboard.ts` activity-item nav.
- `client/deadlines-calendar.ts` / `appointments-calendar.ts` column layout (the calendar-column-drift bug class).

Not unit tests; not E2E. They are the missing middle.

### Layer 4 — Service-layer (live DB)

**What:** Go service methods against a real Postgres, using the existing `TEST_DATABASE_URL` pattern. Two improvements:

1. **Replace per-test DELETE cleanup with a per-test transaction harness** — open a transaction, run the test inside it, ROLLBACK. Faster, isolating, no cleanup forgotten. Already viable because the service layer accepts `*sqlx.DB`-or-tx-shaped interfaces in many places; needs a small `internal/services/internal/testdb` package that exposes `WithTx(t *testing.T, fn func(*sqlx.Tx))`. Migration is mechanical, can happen alongside infill.

   *Caveat:* some service methods open their own transactions internally (`approval_service.submit` is one). Those keep DELETE cleanup; the tx harness is a default, not a mandate.

2. **Make `TEST_DATABASE_URL` mandatory in CI.** Today these tests are skipped on every machine that doesn't `export TEST_DATABASE_URL=…`, and they never run on an automatic pipeline because there is no pipeline. Once CI exists (§3.2), it becomes a required env var.
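
One way to make the variable mandatory in CI while keeping the laptop skip is to centralise the decision in a helper. This is a sketch only: `resolveTestDB` and the mode names are hypothetical, and it assumes the CI runner exports a non-empty `CI` variable.

```go
package main

import (
	"fmt"
	"os"
)

type dbMode int

const (
	runLive   dbMode = iota // URL present: run the live-DB cases
	skipLocal               // URL absent on a laptop: skip, as today
	failCI                  // URL absent in CI: fail loudly
)

func (m dbMode) String() string {
	return [...]string{"runLive", "skipLocal", "failCI"}[m]
}

// resolveTestDB decides what a live-DB test should do. getenv is injected
// so the decision itself can be unit-tested without touching the process
// environment. Reading "CI" is an assumption about the runner.
func resolveTestDB(getenv func(string) string) (string, dbMode) {
	url := getenv("TEST_DATABASE_URL")
	switch {
	case url != "":
		return url, runLive
	case getenv("CI") != "":
		return "", failCI
	default:
		return "", skipLocal
	}
}

func main() {
	_, mode := resolveTestDB(os.Getenv)
	fmt.Println("mode:", mode)
}
```

A shared test helper would then call `t.Skip` for `skipLocal` and `t.Fatal` for `failCI`, so a missing URL can no longer pass silently on the pipeline.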

**Tool:** stdlib `testing` + `sqlx` (already in `go.mod`). **No mocks at the service↔DB boundary.** This is m's hardest line — see global CLAUDE.md memory pattern and `t-paliad-036` (the bug that masked two other bugs would have been caught instantly by a real-DB test).

**Where to invest first:** Approval (already heavy), Projection (already heavy), Fristenrechner (already heavy), DeadlineService Create/Update/Complete/Delete with `pending_request_id` interplay, AppointmentService same, ProjectService visibility predicate, CalDAV push (the four CalDAV `*.go` files have zero direct test).

**Coverage target:** Every service method that mutates the DB has at least one happy-path live-DB test. RLS predicate (`visibilityPredicatePositional`) has one test per role (global_admin, member, non-member).

### Layer 5 — Handler integration (httptest + real DB)

**What:** Spin a real `services.DBService`, mount the protected mux, drive `httptest.NewRequest` + `ServeHTTP` against it. Auth via a fake session cookie produced by a `testauth.Login(t, userID)` helper that mints the same Supabase JWT shape `auth.UserIDFromContext` expects.

**Why:** The 53 untested handlers are where the request shape ↔ service interaction lives. Examples that would have caught real bugs:
- `t-paliad-036`'s "`/projects/{id}` 404 while `/api/projects/{id}` 200" mismatch — a 5-line handler test would have failed before the migration ran.
- mig 020's three-stacked bug — a handler test that POSTs a deadline and asserts a 200 + read-back row would have failed at submit-time, not boot-time.
- The audit-log query timezone bug — handler test asserts the JSON contains the expected `event_date`.

**Tool:** stdlib `net/http/httptest`. **No new framework.** Pattern: handler tests live next to the handler file (`internal/handlers/deadlines_test.go` next to `deadlines.go`).

**Coverage target:** Every handler that gates a state-changing route — `POST/PATCH/DELETE` flavour. Plus `GET` handlers that compose a non-trivial query (dashboard, agenda, search, audit-log).
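
The `httptest` mechanics are small enough to sketch without a database. In the example below `ownerOnly`, `withUser`, the context key and the user IDs are hypothetical stand-ins for paliad's session middleware; a real L5 test would mount the actual protected mux over a live-DB `services.DBService` rather than this toy handler.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type ctxKey struct{}

// withUser attaches a user ID to the request context (hypothetical
// replacement for what the real session middleware would do).
func withUser(r *http.Request, userID string) *http.Request {
	return r.WithContext(context.WithValue(r.Context(), ctxKey{}, userID))
}

// ownerOnly answers 404 for everyone except ownerID, so the route's
// existence is not revealed to other users.
func ownerOnly(ownerID string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if id, _ := r.Context().Value(ctxKey{}).(string); id != ownerID {
			http.NotFound(w, r)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newMux builds a one-route mux guarded by the owner gate.
func newMux() http.Handler {
	mux := http.NewServeMux()
	page := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
	mux.Handle("/paliadin", ownerOnly("owner-1", page))
	return mux
}

// statusFor drives one GET through the handler and returns the status code.
// An empty userID simulates an anonymous request.
func statusFor(h http.Handler, path, userID string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if userID != "" {
		req = withUser(req, userID)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	mux := newMux()
	fmt.Println("anonymous:", statusFor(mux, "/paliadin", ""))
	fmt.Println("non-owner:", statusFor(mux, "/paliadin", "user-2"))
	fmt.Println("owner:    ", statusFor(mux, "/paliadin", "owner-1"))
}
```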

### Layer 6 — End-to-end (Playwright)

**What:** A small Playwright suite (~10 flows) committed at `e2e/` with a `bun run e2e` entry. Targets a local `./paliad` against a scratch Postgres (the same `TEST_DATABASE_URL`). Each test logs in, drives the UI through one user journey, asserts visible state.

**Why ~10 not 100:** The per-PR gate is budgeted at ≤ 90 s and the full suite at ≤ 4 min (§3.1, §6 Q1), so E2E has to fit inside what is left of the full-suite budget. Playwright tests are the most expensive minute-per-confidence in this stack; they pay for themselves on the *golden path* and nothing else. The deep-coverage layer is L5; E2E is *"is the app still alive end to end?"*.

**Tool:** `@playwright/test` (npm; bun installs cleanly). No separate test runner is needed because Playwright ships its own. Tests live in `e2e/*.spec.ts`. **Not bun:test.** Playwright's runner is purpose-built for browser-driving and integrates with its tracing, so don't fight it.

**Cap:** 10 flows. If a new test wants in, an existing one must drop out (or we have a real reason to widen). This is the cheapest discipline available: it forces the suite to remain a smoke pass, not a regression-test dumping ground.

**Coverage target:** See §4.

---

## 3. Tooling — concrete picks per layer

| Layer | Tool | Already in deps? | Install? |
|---|---|---|---|
| L0 — migration dry-run | stdlib `testing` + `migrate/v4` | yes | no |
| L1 — Go unit | stdlib `testing` | yes | no |
| L2 — Frontend unit | `bun:test` | yes (built into bun) | no |
| L3 — Frontend DOM | `bun:test` + `happy-dom` | bun yes, happy-dom **new** | `bun add -d happy-dom` (one dep, ~200 KB) |
| L4 — Service live-DB | stdlib + sqlx | yes | no |
| L5 — Handler integration | stdlib `net/http/httptest` + sqlx | yes | no |
| L6 — E2E | `@playwright/test` | **new** | `bun add -d @playwright/test` + `npx playwright install chromium` |

Net new deps: **2** (happy-dom + playwright). Both are mainstream, both have small surface area, both align with bun's ecosystem.

Explicit rejects:
- ❌ **testify** — current tests read cleanly with stdlib; adding it forces a rename pass nobody wants.
- ❌ **vitest** — bun's built-in test runner is faster and the tests are already in `bun:test` shape.
- ❌ **dockertest / testcontainers-go** — m's preference is real-DB tests against the existing Postgres; spinning ephemeral Docker Postgres per package run adds latency and surface area for marginal isolation gain. See Q3.
- ❌ **sqlmock / gomock for DB** — banned by §0 lesson 1.
- ❌ **cypress** — Playwright is the better tool today, and the team's existing skill (`/mai-tester`) already uses it.

### 3.1 Per-PR run-time budget

Target (subject to m's call in Q1): **≤ 90 s for the gating tier (L0+L1+L2+L4 subset+L5 happy-path)**, ≤ 4 min for the full suite (add L3+L4 full+L6). The gating tier blocks merge; the full suite blocks deploy.

Indicative times (estimated, validate when slice 1 lands):

| Tier | Layers | Est. time | Blocks |
|---|---|---|---|
| **Gate (every PR)** | L0 + L1 + L2 + L5 happy-path + L4 critical | 60–90 s | merge |
| **Full (every merge to main)** | + L4 full + L3 + L6 | 3–4 min | deploy |

### 3.2 CI — proposal, not commitment

paliad has no CI today. Two routes:

- **Gitea Actions** (m's stack already runs `mgit.msbls.de`). Self-hosted; same auth model as the rest of mAi. Adds a `.gitea/workflows/test.yml`. Postgres comes from a service container.
- **Stay click-deploy.** No CI. Workers run tests locally; Dokploy auto-deploys on green-main convention.

Recommendation: **Gitea Actions for the gate tier only**, starting with L0 + L1 + L2 in a single short workflow. The rest of the gate tier (L4 critical, L5 happy-path) and the L3/L6 expansion can follow once that core proves stable. Deferred to Q2 for m's call.

### 3.3 Test DB — live YouPC vs ephemeral

The `paliad` schema lives on the shared YouPC Postgres (port 11833). Three options:

| Option | Pros | Cons |
|---|---|---|
| **Per-developer separate DB on YouPC** (`TEST_DATABASE_URL` per laptop) | Closest to prod; existing pattern. | Cleanup discipline matters; cross-developer contention possible. |
| **Ephemeral docker postgres per CI run** | Full isolation; parallel-safe; reset for free. | New infra; ~5 s container startup per CI invocation. |
| **Dedicated test DB on a paliad-only Postgres** | Isolated; cheap. | New infra to maintain. |

Recommendation: **option 1 for developers (no-op change), option 2 for CI** (Gitea Actions postgres service container). Deferred to Q3 for m's call.

### 3.4 Coverage targets

Don't gate on percentage. Gate on critical-path coverage (§4). Add `go test -coverprofile=` output to CI for visibility, not as a merge gate. Coverage % gating produces tests-for-tests'-sake; we want the tests that catch the bugs we've shipped.

---

## 4. Critical journeys — what MUST be covered

These are the golden-path flows. Anything not on this list is L1-L5 territory, not L6. The list is intentionally short; if it grows beyond 10, we are doing E2E wrong.

| # | Flow | Why it's critical | Layer mix |
|---|---|---|---|
| 1 | **Login → dashboard renders → traffic-light counts match** | Every user does this every day; broken auth = paliad is offline. | L6 (Playwright) + L5 handler (auth.go) |
| 2 | **Create project (Client → Litigation → Patent → Case)** | Hierarchy with team inheritance — the data model's spine. | L6 + L5 + L4 (project_service) |
| 3 | **Submit deadline → routes to /inbox → approver approves → state flips** | The 4-eye flow (t-paliad-138). Most-mutated paliad surface. | L6 + L5 (deadlines, approvals) + L4 (approval_service) |
| 4 | **Fristenrechner: pick proceeding → cascade fires → result shows** | The platform's flagship interactive tool. JS cascade. | L6 + L3 (fristenrechner cascade) + L4 (fristenrechner) |
| 5 | **SmartTimeline: anchor a projected row → predecessor-missing-error handled** | Recent Slice-2 work (t-paliad-173 / #31). High-touch surface. | L6 + L3 (shape-timeline) + L4 (projection_service) |
| 6 | **CalDAV sync: PUT a Termin → external client sees it, edits there → pull reconciles** | Owned-event semantics + foreign-UID skip rule from Phase F. Untested today. | L4 (caldav_service push/pull) — gated on Q3 (live YouPC vs ephemeral) |
| 7 | **Paliadin chat: anon visit hits 404; m's session opens widget; turn renders** | Owner-gated `/paliadin` is the only m-only surface. Quiet failures here are silent. | L6 (smoke) + L5 (paliadin_suggest) + L4 (paliadin / aichat_paliadin) |
| 8 | **/admin/rules: filter → edit one rule → lifecycle transition → audit log row** | Rules drive the cascade; bad edits break every user's fristenrechner. | L6 + L5 (admin_rules) + L4 (rule_editor_service) |
| 9 | **Onboarding: new user with allowed email → onboarding form → first project membership** | The new-user funnel; gateOnboarded middleware traps. | L6 + L5 (onboarding, invite) |
| 10 | **Migration boot smoke: spin paliad against an empty DB → server binds 8080** | Catches every mig-N crash-loop. | L0 (migration dry-run) + L4 boot-smoke variant |

Picks 1, 3, 4 and 10 are the highest-value-per-cost — they cover the routes most regressions land on (auth, mutation, cascade, boot).

---

## 5. Slice plan — tracer-bullet roll-out

Each slice is a shippable PR with a concrete deliverable, in order of expected outage-prevention payoff. Sized for a single coder shift unless flagged. No slice depends on a later one being merged. Hour estimates intentionally omitted (per global CLAUDE.md).

### Slice 1 — Migration dry-run harness + boot smoke (highest leverage)

**Branch:** `mai/<coder>/test-strategy-slice-1-migrations`

**Deliverable:**
- `internal/db/migrate_test.go` — `TestMigrations_DryRun` (per-mig BEGIN/ROLLBACK), `TestMigrations_EndToEnd` (full apply, then re-apply latest to assert idempotency), `TestMigrations_Down` (apply N→0).
- `Makefile` with `make verify-migrations` (the gate target), `make test` (run everything), `make test-go`, `make test-frontend`.
- `cmd/server/main_paliadin_backend_test.go` already exists; extend with a `TestMain_BindsHTTPAfterMigrate` that boots the full server against `TEST_DATABASE_URL`, asserts `:8080` is listening, then shuts down. Catches the mig-098-class crash-loop in a single test.
- README section: how to set `TEST_DATABASE_URL` locally.

**Catches:** Every mig-098-class crash-loop; every drop-cascade-with-stale-policy-name regression (t-paliad-036).
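
The "asserts `:8080` is listening" step of the boot smoke can be a plain TCP poll. A minimal sketch, assuming a helper named `waitForListen` (hypothetical) and using an ephemeral local listener in place of the paliad server so it runs anywhere:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// waitForListen polls addr until a TCP connect succeeds or timeout elapses.
// It only proves the port is bound; it does not check HTTP health.
func waitForListen(addr string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		conn, err := net.DialTimeout("tcp", addr, 200*time.Millisecond)
		if err == nil {
			conn.Close()
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("%s not listening after %s: %w", addr, timeout, err)
		}
		time.Sleep(50 * time.Millisecond)
	}
}

// listenLocal opens an ephemeral loopback listener and returns its address
// plus a function that closes it. It stands in for the real server here.
func listenLocal() (string, func(), error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	return ln.Addr().String(), func() { ln.Close() }, nil
}

func main() {
	addr, stop, err := listenLocal()
	if err != nil {
		panic(err)
	}
	defer stop()
	fmt.Println("wait result:", waitForListen(addr, 2*time.Second))
}
```

In the real test the address would be the server's `:8080` and the timeout would need to cover migration apply time; both values here are placeholders.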

### Slice 2 — Service-layer infill: critical mutators

**Branch:** `mai/<coder>/test-strategy-slice-2-services`

**Deliverable:**
- Test files for the three highest-impact untested services:
  - `internal/services/agenda_service_test.go` (live-DB, dashboard agenda query)
  - `internal/services/dashboard_service_test.go` (traffic-light counts)
  - `internal/services/team_service_test.go` (membership + inheritance — RLS-load-bearing)
- Tighten existing `approval_service_test.go` + `deadline_service_test.go` coverage of the create/update/complete/delete × pending-request matrix where there are demonstrable gaps.
- Add `internal/services/internal/testdb/withtx.go` — the per-test tx harness (optional adoption; existing tests stay).

**Catches:** RLS regressions, approval interplay regressions, dashboard count drift after schema renames.

### Slice 3 — Frontend bun:test setup + L2 infill

**Branch:** `mai/<coder>/test-strategy-slice-3-frontend-unit`

**Deliverable:**
- `frontend/package.json` `scripts.test = "bun test"`.
- New tests under `frontend/src/client/`:
  - `paliadin-context.test.ts` (route table, entity extraction, selection truncation).
  - `paliadin-starters.test.ts` (every route ≥1 starter, every starter bilingual).
  - `filter-bar/index.test.ts` (chip render + active state — pure DOM-less helpers).
  - i18n key audit: `frontend/scripts/i18n-audit.test.ts` parses every `data-i18n="…"` from `dist/` HTML and every `t("…")` call from `src/`, asserts both `de` and `en` resolve. Runs as part of `bun test`.
- `make test-frontend` wires `cd frontend && bun test`.

**Catches:** i18n drift (untranslated key shipped to user), context-envelope contract drift (paliadin SKILL.md depends on it), starter-registry regressions.

### Slice 4 — Playwright golden-path smoke

**Branch:** `mai/<coder>/test-strategy-slice-4-e2e`

**Deliverable:**
- `e2e/` directory at repo root.
- `playwright.config.ts` pointing at `http://localhost:8080` (paliad started by the test, not assumed).
- Five Playwright `*.spec.ts` files covering critical journeys 1, 3, 4, 7, 9 from §4.
- `make e2e` target that:
  1. starts paliad against `TEST_DATABASE_URL`,
  2. waits for `:8080` to be live,
  3. runs `npx playwright test`,
  4. tears the server down.
- `bun add -d @playwright/test` + `npx playwright install chromium`.

**Catches:** Auth regressions, deadline-mutation regressions, fristenrechner cascade regressions, owner-gated /paliadin leaks, onboarding-gate misbehaviour.

### Slice 5 — Handler integration tests for the 5 most-touched routes

**Branch:** `mai/<coder>/test-strategy-slice-5-handlers`

**Deliverable:**
- `internal/handlers/auth_test.go` extended with `TestLogin_HappyPath` + `TestLogout_ClearsCookie` (real DB).
- `internal/handlers/projects_test.go` — `TestProjectsCreate` (POST 200, row inserted, audit emitted), `TestProjectsGetByID_RespectsVisibility` (404 for non-member).
- `internal/handlers/deadlines_test.go` — `TestDeadlinesCreate_TriggersApproval` (verifies pending pill).
- `internal/handlers/appointments_test.go` — same shape.
- `internal/handlers/paliadin_test.go` — `TestPaliadinPage_404ForNonOwner`, `TestPaliadinPage_200ForOwner`.
- Shared `internal/handlers/testauth/testauth.go` — mints a session cookie for `userID` so handler tests don't reinvent auth seeding.

**Catches:** Handler ↔ service wiring drift, visibility-predicate handler-side bugs (t-paliad-036 bug 2 was exactly this), owner-gate bypass.

### Slice 6 — Frontend L3 (DOM) cascade tests

**Branch:** `mai/<coder>/test-strategy-slice-6-frontend-dom`

**Deliverable:**
- `bun add -d happy-dom`.
- DOM-driven tests for the three most-touched cascades:
  - `client/fristenrechner.test.ts` (cascade activate → row appears → date-set fires fetch).
  - `client/shape-timeline.test.ts` (lane render, track render, projected-row click).
  - `client/filter-bar/index.test.ts` (chip click toggles state, URL params update).

**Catches:** The whole class of "the function exists and is unit-tested but the cascade in the browser doesn't fire it" bugs. This is the layer that catches t-paliad-098 / 099 / 102 / 103.

### Slice 7 — CI wiring (deferred — Q2 dependent)

**Branch:** `mai/<coder>/test-strategy-slice-7-ci` (gated on m's Q2 pick)

**Deliverable:**
- `.gitea/workflows/test.yml` (or stay click-deploy if m picks that).
- Gate tier runs on every PR; full suite runs on merge to main.
- Postgres service container provides `TEST_DATABASE_URL`.
- Slack/Gotify ping on red main.

**Catches:** Drift between "tests pass on my laptop" and prod reality.

### Slice 8 — Coverage reporting + dashboard (lowest priority)

**Branch:** `mai/<coder>/test-strategy-slice-8-coverage`

**Deliverable:**
- `go test -coverprofile=` output, rendered into a single `coverage.html` with `go tool cover -html`.
- Bun's coverage output (`bun test --coverage`) published alongside it.
- A `docs/coverage.md` index updated by CI.
- **Not a merge gate.** Visibility only.

**Catches:** Slow drift; nice-to-have once the floor is in.

### Slice order rationale

Slices 1, 4, and 5 prevent the most outages per line of code: the migration dry-run kills crash-loops, E2E kills user-facing regressions, and handler tests kill wiring drift. Slices 2, 3, and 6 widen the floor; Slices 7 and 8 are infrastructure.

---

## 6. Open questions for m

These need m's call before any coder shift starts (or before specific slices start, where noted).

### Q1 — Per-PR test-run budget

How long is acceptable to wait on the gate tier before merge?

- 30 s — only L0 + L1 (no L2+ on the gate).
- **60–90 s (recommended)** — L0 + L1 + L2 + L5 happy-path + L4 critical.
- 2 min — add L3 + L4 full.
- 4+ min — add L6 (E2E on gate).

The pick determines whether E2E gates merge or only deploy.

### Q2 — CI infrastructure

- **Gitea Actions** (self-hosted, gate tier only, recommended) — minimal new infra; aligns with m's existing stack.
- **Stay click-deploy** — workers run tests locally; merge discipline enforced by convention. Today's reality; we keep it.
- **Both:** start with click-deploy, add Gitea Actions in Slice 7 once gate tier proves stable.

### Q3 — Live-DB vs ephemeral docker Postgres for tests

- **Per-developer YouPC DB (current pattern)** — closest to prod; existing tests work unchanged.
- **Ephemeral docker postgres in CI, YouPC for devs (recommended hybrid)** — keeps local-dev simple, gives CI deterministic isolation.
- **YouPC everywhere** — simplest, but parallel CI runs would contend.

### Q4 — Coverage targets — % or critical-path?

- **Critical-path only (recommended)** — §4's 10 flows + every state-mutating service method has a test. No % gate.
- **% gate** — set a floor (e.g. 60 % lines, 50 % branches) and refuse merges below it.
- **Both** — critical-path is mandatory, % is informational.

m's prior preference (memory pattern: "tests that catch real bugs > coverage theatre") points at critical-path-only. Confirming.

### Q5 — Which slices land before paliad is "production-grade"?

paliad is already live at `paliad.de` and being used by HLC colleagues. "Production-grade" here means "next time someone ships, we don't go down."

Picks:
- **Slices 1 + 4 + 5 are the production-grade floor (recommended).** Migration dry-run + golden-path E2E + handler integration tests cover the failure modes that hit prod since the rebrand.
- Add Slices 2 + 3 + 6 as widening passes, on their own cadence.
- Slices 7-8 are nice-to-haves.

Confirming the floor pick — and whether m wants all three to land before any new feature work, or whether they roll out alongside.

### Q6 — Who owns each slice?

Recommendation: rotate coder slots so the same person isn't on every slice. Suggested assignment (head can override):

| Slice | Profile fit |
|---|---|
| 1 — migrations | Backend-heavy coder (knuth, gauss, cronus). |
| 2 — service infill | Backend-heavy coder; whoever owns approval/projection. |
| 3 — frontend unit | Frontend-heavy coder. |
| 4 — Playwright E2E | Cross-stack coder; ideally one familiar with `/mai-tester`. |
| 5 — handler integration | Backend coder. |
| 6 — frontend DOM | Frontend coder (same person as 3 makes sense). |

Inventor does **not** decide assignments; head + m do.

---

## 7. Out of scope (explicit)

- **No rewrite of any existing test.** The 323 existing test functions stay. New tests use the new patterns; old tests are migrated only when their files are touched for unrelated reasons.
- **No third-party framework where stdlib + bun:test suffice** (testify, vitest, etc. — see §3).
- **No mocks at the service↔DB boundary.** This is the lock-in. Mocks lie; the live-DB tests we already have are paliad's most useful safety net.
- **No new feature work in this strategy.** The doc proposes infra; feature scope is unchanged.
- **No retirement of the `tests/smoke-*.md` human-written reports.** Those are great for one-shot regression hunts; they coexist with the automated suite.

---

## 8. Implementation notes for the eventual coder

(For whichever coder picks up a slice. Not exhaustive.)

- **Test-name collisions in Go's flat package namespace bite when a service grows N implementations.** Memory note from `t-paliad-194` already records this. Prefix tests with the service name (e.g. `TestAichatPaliadin_RunTurn_…` not `TestRunTurn_…`).
- **`httptest.NewRequest` does not URL-encode** — use `url.QueryEscape` for any `?q=…` argument. Memory note from `t-paliad-026`.
- **sqlx v1.4.0 `Named` parser strips one colon from `::uuid[]`** — known pitfall, repro lives at `internal/services/project_service.go`. Use `CAST(... AS uuid[])` in new query strings.
- **Live-DB cleanup must delete FK-referencing child rows first.** Order matters (`auth.users` last). See `audit_service_test.go` for the chain pattern.
- **`paliad.paliad_schema_migrations` tracker collision** is documented but unresolved. Slice 1 should add a `make reset-test-db` target that drops both `public.paliad_schema_migrations` *and* `paliad.paliad_schema_migrations` to keep developers unblocked.
- **`bun:test` matchers are Jest-compatible** — `expect().toEqual()`, `expect().toHaveBeenCalled()`, etc. No deps needed.
- **happy-dom does not implement every DOM method** (notably some `<dialog>` semantics). If a cascade test fails because an API is missing, jsdom is the escape hatch.
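
The `httptest.NewRequest` note above can be illustrated with a minimal sketch. The `/search` route and the `searchRequest` helper are hypothetical. Without escaping, a raw space makes the request line malformed (`httptest.NewRequest` panics on error), and a raw `&` silently splits the value into a second parameter.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"net/url"
)

// searchRequest builds a GET request for a hypothetical /search route.
// url.QueryEscape keeps spaces, '&', '#', and non-ASCII characters inside
// the q parameter instead of corrupting the request target.
func searchRequest(q string) *http.Request {
	return httptest.NewRequest(http.MethodGet, "/search?q="+url.QueryEscape(q), nil)
}
```

A handler test can then read the value back with `r.URL.Query().Get("q")` and expect the original, unescaped string.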

---

## 9. Decision summary — pick list for m

| # | Question | Inventor recommends |
|---|---|---|
| Q1 | Per-PR budget | 60–90 s gate, 3–4 min full |
| Q2 | CI infra | Gitea Actions, gate tier only |
| Q3 | Test DB | YouPC for devs, ephemeral docker for CI |
| Q4 | Coverage target | Critical-path only, no % gate |
| Q5 | Production-grade floor | Slices 1 + 4 + 5 before new feature work |
| Q6 | Slice ownership | Rotate per profile; head decides |

If m's calls match inventor's, the implementer's brief writes itself: Slice 1 first, then 4 + 5 in parallel, then 2/3/6 as widening passes.

---


## 10. m's decisions (2026-05-19, locked)

Walked through §6 with m via the AskUserQuestion interview (per head's 2026-05-19 workflow rule: inventor questions are resolved before parking, not after). Six picks locked, all matching inventor's recommendation.

| # | Question | m's answer | Effect on plan |
|---|---|---|---|
| Q1 | Per-PR test-run budget | **Inventor's call** (m deferred). Pick: **60–90 s gate, 3–4 min full.** | Gate tier = L0 + L1 + L2 + L5 happy-path + L4 critical. L6 E2E gates deploy, not merge. |
| Q2 | CI infrastructure | **Gitea Actions, gate tier only.** | Slice 7 adds `.gitea/workflows/test.yml` running the gate tier; full suite stays on merge-to-main. |
| Q3 | Test DB topology | **YouPC for devs + ephemeral docker for CI.** | Local dev unchanged. Slice 7 wires Postgres service container in Gitea Actions. |
| Q4 | Coverage target | **Critical-path only, no % gate.** | §4's 10 flows + every state-mutating service method gets a test. Coverage % output is informational in Slice 8, never a merge gate. |
| Q5 | Production-grade floor | **Slices 1 + 4 + 5 before new feature work.** | These three land before any new paliad feature gets a coder shift. Slices 2, 3, 6 widen the floor on their own cadence. Slices 7-8 are nice-to-haves. |
| Q6 | Slice ownership | **Head decides + rotate per profile.** | Backend slices (1, 2, 5) → backend-heavy coder. Frontend slices (3, 6) → frontend-heavy coder. E2E (4) → cross-stack. Head picks at dispatch time. |

**Implementer brief (post-m-decisions):**

1. **Slice 1 starts first** — migration dry-run harness + `make verify-migrations` + boot-smoke variant of `cmd/server/main_paliadin_backend_test.go`. Backend-heavy coder.
2. **Slice 4 + Slice 5 in parallel** once Slice 1 is merged — Playwright golden-path (cross-stack coder, 5 specs) and handler integration (backend coder, auth/projects/deadlines/appointments/paliadin).
3. Slice 7 (Gitea Actions wiring) follows once the Slice 1 gate tier is proven locally.
4. Slices 2, 3, 6 enter rotation alongside feature work — not blocking.
5. Slice 8 (coverage reporting) is lowest priority.

**Status:** DESIGN APPROVED — awaiting head's dispatch of Slice 1 coder shift.