# Design: Paliad Test Strategy (production-grade)

**Author:** mendel (inventor)
**Date:** 2026-05-19
**Task:** t-paliad-213
**Branch:** `mai/mendel/inventor-test-strategy`
**Status:** DESIGN READY FOR REVIEW. No test files / Make targets / CI configs touched. Awaiting m go/no-go on §5 slice plan + §6 open questions before any coder shift.

---

## 0. TL;DR

Paliad has accidental test discipline today: 58 `_test.go` files / 323 test functions in Go (≈45 % of services tested, ≈12 % of handlers tested) and 4 frontend test files for 90+ client modules (≈4 %). There is no committed end-to-end suite and no CI; every smoke pass is human-driven via the manual reports in `tests/`. The `mig 098` prod crash-loop, the `t-paliad-036` triple-bug after the German→English rename, and a long tail of UX regressions (deadline-done modal, calendar column drift) would all have been caught by a 10-test boot-and-click smoke pass.

This design proposes a six-layer test pyramid with a concrete tool per layer (stdlib `testing` + bun's built-in `bun:test` + `playwright` for E2E; nothing third-party we don't already use). It pins three lessons paliad has paid for in commits:

1. **No mocks at the service↔DB boundary.** Live-DB tests against a per-developer Postgres are the floor; in-memory mocks for `paliad.*` would have hidden every rename-after-DROP-CASCADE bug. Project preference is already in this direction (27/44 service tests are live-DB-gated); we double down rather than reverse.
2. **Migrations must dry-run before they merge.** Every recent prod-down (mig 098, mig 020-after-rename, mig 099 audit_reason gap) was a migration that compiled, passed `go test ./...` (which skips without `TEST_DATABASE_URL`), and broke on first apply against the real schema. A `make verify-migrations` target that does BEGIN/apply/ROLLBACK in CI fixes the entire failure mode.
3. **Browser-shaped bugs need a browser.** The fristenrechner cascade, shape-timeline render, calendar grid, inline paliadin widget: these are JS state machines. Bun's built-in `bun:test` covers the pure parser/codec code; Playwright covers the auth-gated DOM. Don't try to substitute one for the other.

Eight slices roll the strategy out as tracer-bullet PRs, each independently shippable. Slice 1 (migration dry-run harness) and Slice 4 (Playwright golden-path smoke) buy the most outage-prevention per LoC; the rest is widening proven patterns.

Six open questions for m at §6. Most surface a coverage-vs-cost trade-off; the picks that need m's call before any code lands are CI infrastructure choice (Q2), per-PR run-time budget (Q1), and live-DB-vs-dockerised Postgres (Q3).

---

## 1. Audit: what exists today

Counts taken on `mai/mendel/inventor-test-strategy` @ HEAD (2026-05-19, 100 migrations applied).

### 1.1 Go test inventory

| Package | Source files | Test files | Test functions | Notes |
|---|---|---|---|---|
| `internal/services` | 56 | 44 | ~200 | 26 live-DB-gated (`TEST_DATABASE_URL`), 18 pure-Go. 24 services have **no test file at all** (see §1.4). |
| `internal/handlers` | 59 | 7 | ~30 | Only auth-domain check, search, audit-parse, approval-error-mapping, redirects, verfahrensablauf-redirect, chart-404 covered. **53 handlers have no test file.** |
| `internal/auth` | small | 2 | ~10 | Session middleware + require-admin. |
| `internal/branding` | small | 1 | small | Firm-name override. |
| `internal/offices` | small | 1 | small | Office enum. |
| `internal/changelog` | small | 1 | small | Pure parser. |
| `internal/calc` | small | 1 | small | Fees / fee tables. |
| `cmd/server` | 1 | 1 | small | `main_paliadin_backend_test.go` covers env-gate selection. |
| **Total** | **133** | **58** | **323** | |

`go test ./...` runs all 58 files.
Without `TEST_DATABASE_URL` set, 27 of them silently skip their live-DB cases: the suite still passes, but coverage of mutation paths drops to near zero.

### 1.2 Frontend test inventory

| Path | Test files | Tested |
|---|---|---|
| `frontend/src/client/filter-bar/url-codec.test.ts` | 1 | FilterBar URL codec round-trip. |
| `frontend/src/client/views/format.test.ts` | 1 | Date/time formatters (regression for t-paliad-153). |
| `frontend/src/client/views/shape-timeline-chart.test.ts` | 1 | Chart layout pure function. |
| `frontend/src/client/views/shape-timeline-cv.test.ts` | 1 | Continuous-view shape layout. |
| **Total** | **4** | Out of ~90 client modules (`frontend/src/client/*.ts`). |

All four use bun's built-in `bun:test` (no extra dep). No DOM/jsdom tests. No Playwright. No `bun test` script in `package.json` (`bun run build` is the only script).

### 1.3 End-to-end / smoke

- `tests/smoke-2026-04-25.md`, `tests/smoke-auth-2026-04-25.md`, `tests/smoke-auth-2026-04-26-cleanup.md`: human-written reports with screenshots committed under `tests/screenshots-*`. No code. No re-runnable script.
- `mai-tester` skill uses Playwright for ad-hoc runs; nothing committed.
- No `e2e/`, no `.gitea/workflows/`, no `.github/workflows/`, no `Makefile`.

### 1.4 Critical service paths with no test file

These are `internal/services/*.go` for which no `*_test.go` sibling exists:

| Service | Risk class | Why it matters |
|---|---|---|
| `caldav_service.go`, `caldav_client.go`, `caldav_crypto.go`, `caldav_ical.go` | High | Per-user push/pull goroutines + AES-GCM at rest. One pure parser test (`caldav_ical_timeline_test.go`) exists but the service + crypto + WebDAV client are blind. |
| `agenda_service.go` | High | Dashboard agenda query; reused by `/agenda` page. Exercised transitively by visibility tests but no direct test. |
| `dashboard_service.go` | High | Traffic-light + summary counts. Same story: transitively covered via visibility, no direct test. |
| `derivation_service.go` | Medium | Project-tree derivation (the new t-paliad-194-era subtree machinery). |
| `team_service.go` | Medium | Team membership / inheritance. |
| `partner_unit_service.go` | Medium | Dezernat replacement (t-paliad-070). |
| `party_service.go`, `note_service.go`, `link_service.go`, `checklist_instance_service.go` | Medium | All do project-scoped CRUD with the same RLS+audit pattern that `t-paliad-036` proved easy to break. |
| `appointment_service.go` | High | Hot: every calendar mutation. Exercised through approval tests but has no own test file. |
| `view_service.go` | Medium | Powers the substrate (`/views/*`). |
| `paliadin_jwt.go` | Medium | Per-turn JWT mint for the aichat path (`t-paliad-194`). No call sites in tests today. |
| `markdown.go` | Low | Glossary + checklist content render. |

### 1.5 Handlers with no test file

53 of 59. Notably: **`auth.go` itself** (login / logout / session creation), **`projects.go`** (the most-mutated entity), **`deadlines.go` / `appointments.go`** (writes), **`paliadin.go` / `paliadin_suggest.go`** (m-only routes, never click-tested), **`fristenrechner.go` / `fristenrechner_search.go` / `fristenrechner_event_categories.go`** (the cascade users live in), **`dashboard.go` / `agenda.go`** (landing), **`onboarding.go` / `onboarding_gate.go`** (every new user's first three minutes), **`invite.go`** (rate-limited write path).

The currently-tested handlers (search, audit-parse, approval error mapping, etc.) are the cheap pure-Go ones; every handler that touches the DB is untested at handler level.

### 1.6 Live-DB test scaffold: is it sound?

The pattern (read from `internal/services/visibility_test.go`):

```go
url := os.Getenv("TEST_DATABASE_URL")
if url == "" {
	t.Skip("TEST_DATABASE_URL not set — skipping live DB test")
}
if err := db.ApplyMigrations(url); err != nil {
	t.Fatalf(...)
}
pool, _ := sqlx.Connect("postgres", url)
defer pool.Close()
// per-test seed + cleanup via DELETE + defer cleanup()
```

Verdict: **sound, but has rough edges that need addressing before we widen.**

- ✅ Migrations apply at test startup against the test DB, which catches every "you forgot to add a CHECK" / "you reference a column that doesn't exist" before a real-DB-touching test runs.
- ✅ Per-test cleanup via `DELETE FROM ... WHERE id IN ($1,...)` is explicit and idempotent.
- ✅ The `paliad.paliad_schema_migrations` tracker collision noted in memory `0b900afa…` is a pre-existing issue, not introduced by this design.
- ⚠️ Cleanup-via-DELETE is fragile: a test that creates a row referenced by FK from another table needs to remember to clean both. A few existing tests (see `audit_service_test.go`) already chain 5+ DELETEs.
- ⚠️ Tests can't run in parallel against the same `TEST_DATABASE_URL` because they share schema state. `go test ./...` runs the test binaries of different packages concurrently by default (the `-p` flag), so packages that share one database can interfere; within a package, tests run sequentially unless they call `t.Parallel()`, at which point overlapping cleanup IDs can collide too.
- ⚠️ No CI today actually exercises `TEST_DATABASE_URL`, so every live-DB test is effectively run only on the author's laptop or not at all. Half the value is paid-for but unbilled.

### 1.7 Migration tooling

- `internal/db/migrate.go` embeds `migrations/*.sql` and applies on server boot via `golang-migrate/v4` with the `paliad_schema_migrations` tracker in `public` schema.
- 100 migrations on disk (`001` → `100`).
- **No dry-run gate today.** A bad migration breaks `paliad.de` at boot (Dokploy crash-loops the container). Recent prod incidents: mig 098 (submission code rename), mig 099 (with_po flag drop missed audit_reason gap), mig 020 (function rename without body rewrite; see memory `49a05cfa…`).
- `down.sql` exists for every migration but no test ever exercises it.

### 1.8 CI / deploy loop

- No CI. Push-to-main → Gitea webhook → Dokploy auto-builds the Dockerfile and replaces the container.
  The Dockerfile runs `bun run build` then `go build`. **Neither `go test` nor `bun test` runs in the build pipeline.**
- Pre-commit hooks: none in repo. Each worker runs `go build / go vet / go test / bun run build` by convention (see memories: every shipped task report ends with "build hygiene held").

---

## 2. Test pyramid: recommended shape

```
                      ┌─────────────────┐
                      │ E2E (Playwright)│  ~10 flows
                      │ L6              │
                      └─────────────────┘
                  ┌─────────────────────────┐
                  │ Handler integration     │  ~30 routes
                  │ L5 (httptest + real DB) │
                  └─────────────────────────┘
              ┌──────────────────────────────────┐
              │ Service-layer (live DB)          │  ~60 tests
              │ L4 (BEGIN/ROLLBACK harness)      │
              └──────────────────────────────────┘
          ┌──────────────────────────────────────────┐
          │ Frontend DOM / cascade (bun:test+jsdom)  │  ~15 modules
          │ L3                                       │
          └──────────────────────────────────────────┘
    ┌──────────────────────────────────────────────────────┐
    │ Frontend unit (bun:test pure TS)                     │  ~30 modules
    │ L2                                                   │
    └──────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ Go unit (stdlib testing, table-driven, pure functions)       │  ~150 tests
│ L1                                                           │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ Migration dry-run (make verify-migrations)                   │  100 mig
│ L0: gate on every PR                                         │
└──────────────────────────────────────────────────────────────┘
```

### Layer 0: Migration dry-run

**What:** Every `*.up.sql` in `internal/db/migrations/` is applied inside a single `BEGIN ... ROLLBACK` transaction against a scratch Postgres, in numeric order. The harness asserts each statement succeeds *and* asserts no statement leaves the schema in a `paliad_schema_migrations.dirty=true` state. A second pass applies all up-migrations end-to-end (no rollback) and then re-applies the latest up-migration to assert idempotency (every paliad migration since `t-paliad-070` has been written to be idempotent; this enforces it).
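The ordering and rollback mechanics of the first pass can be sketched as follows. This is a minimal illustration under stated assumptions, not the harness itself: the file-name layout (`098_some_title.up.sql`), the helper names `orderUpMigrations` and `dryRun`, and the use of plain `database/sql` are all hypothetical, and the real test would read the embedded migration files and connect via `TEST_DATABASE_URL`.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// upMigration pairs a migration's numeric prefix with its file name.
type upMigration struct {
	Version int
	Name    string
}

// orderUpMigrations keeps only *.up.sql names, parses the numeric prefix
// (assumed layout: "098_some_title.up.sql") and sorts numerically.
// Two files claiming the same version are reported as an error.
func orderUpMigrations(names []string) ([]upMigration, error) {
	seen := map[int]string{}
	var out []upMigration
	for _, n := range names {
		if !strings.HasSuffix(n, ".up.sql") {
			continue
		}
		prefix, _, ok := strings.Cut(n, "_")
		if !ok {
			return nil, fmt.Errorf("%s: no numeric prefix", n)
		}
		v, err := strconv.Atoi(prefix)
		if err != nil {
			return nil, fmt.Errorf("%s: bad version: %w", n, err)
		}
		if prev, dup := seen[v]; dup {
			return nil, fmt.Errorf("version %d used by %s and %s", v, prev, n)
		}
		seen[v] = n
		out = append(out, upMigration{Version: v, Name: n})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Version < out[j].Version })
	return out, nil
}

// dryRun executes each migration body inside one transaction and always
// rolls back, so the scratch database is left untouched.
func dryRun(ctx context.Context, db *sql.DB, bodies []string) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // the point of a dry run: never commit
	for i, body := range bodies {
		if _, err := tx.ExecContext(ctx, body); err != nil {
			return fmt.Errorf("migration #%d failed: %w", i+1, err)
		}
	}
	return nil
}

func main() {
	ordered, err := orderUpMigrations([]string{
		"100_views.up.sql", "002_users.up.sql", "002_users.down.sql", "010_rls.up.sql",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range ordered {
		fmt.Println(m.Version, m.Name)
	}
}
```

One caveat worth checking when the real harness is written: Postgres statements that cannot run inside a transaction block (for example `CREATE INDEX CONCURRENTLY`) would fail in `dryRun` even if they apply fine on boot, so any such migration would need to be handled by the end-to-end pass only.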
**Tool:** stdlib `testing` package, no third-party. Pattern: `internal/db/migrate_test.go` with a `TestMigrations_DryRun` driven from `TEST_DATABASE_URL`. A `make verify-migrations` target wraps it.

**Why this layer matters most:** Every recent prod-down was a migration. Catching them on a CI run before merge is the highest-leverage test investment paliad can make. Cost: one ~100-line Go file + one Postgres in CI.

**Coverage target:** 100 % of `*.up.sql` files. Hard gate on PR, no exceptions.

### Layer 1: Go unit (pure)

**What:** `go test ./...` against pure functions: formatters, parsers, validators, calculators, fee tables, deadline calculators, projection lookahead clamping, codec round-trips. No DB, no HTTP.

**Tool:** stdlib `testing`. Table-driven `cases := []struct{...}{...}` style is already the house pattern (see `auth_test.go` / `projection_anchor_test.go`). **Do not introduce testify or any matcher library**: the current code reads cleanly without one, and 323 existing test functions don't need a rename pass.

**What's already there:** 19 pure-Go test files (calculator, mapping, codec, holiday, fees, etc.). Density is good; targeted infill rather than re-architecture.

**Coverage target:** Every pure function in `internal/services/`, `internal/handlers/`, `internal/calc/`, `internal/changelog/`. Aim for "every branch in a decision table has at least one test row." Don't chase %; chase "the obvious edge that would burn a coworker".

### Layer 2: Frontend unit (pure)

**What:** `bun test` against pure TS modules: URL codecs (`filter-bar/url-codec`), formatters, parsers, i18n key correctness (every `data-i18n` attribute used in TSX has a key in `i18n.ts`), view-spec parsers, projection-row mapping helpers.

**Tool:** `bun:test` (built into bun, no install). Already in use in 4 files; extend the same pattern. Add `bun test` to `package.json` `scripts`.
**What to add:**

- i18n key audit (every `t("foo.bar")` and `data-i18n="foo.bar"` resolves in both `de` and `en`).
- `filter-bar/` types + render helpers (paliad has shipped 4 FilterBar slices; coverage is one codec test).
- `paliadin-context.ts` route table + entity extraction (the `[ctx …]` envelope is a stable contract paliadin's SKILL.md depends on; any drift here is a silent failure).
- `paliadin-starters.ts` registry: every route maps to ≥1 starter; every starter is bilingual.
- View-spec parsers in `views/`.

**Coverage target:** Every pure TS module in `frontend/src/client/`. Pages (TSX renderers) are an E2E concern, not a unit concern.

### Layer 3: Frontend DOM (cascade / jsdom)

**What:** `bun test` with a jsdom-style DOM global, exercising the interactive cascade modules: the fristenrechner cascade builder, the shape-timeline render, the FilterBar UI (chips, panels), the calendar grid, the inline Paliadin widget message stream, the inbox-row click handler, the dashboard activity item navigation. These modules contain enough state that pure-function tests miss real bugs (e.g. the t-paliad-098 `.entity-table` row-cursor lie was a CSS+DOM bug; t-paliad-099's modal close was a DOM-event bug; t-paliad-103's `::before` overlay click-swallow was a DOM bug).

**Tool:** bun + `happy-dom` is the lighter choice; if it can't handle event ordering, fall back to `jsdom`. Both are ESM-clean and bun-friendly. **Pick one and stick with it; running both means twice the dependency surface.** Default pick: `happy-dom` (smaller, paliad doesn't need legacy IE semantics).

**Pattern:** import the cascade module, build a minimal DOM (`document.body.innerHTML = …`), dispatch synthetic events, assert resulting state. Reuses the production renderers, no test-only fakes.

**Coverage target:** ~15 modules. Specifically:

- `client/filter-bar/index.ts` chip render + active-state.
- `client/fristenrechner.ts` cascade: the most complex JS in the codebase; its dependency chains light up every UPC bug we know.
- `client/shape-timeline.ts` lane mode + track mode (envelope wire shape brittle to refactor).
- `client/projects-detail.ts` row click + Verlauf render.
- `client/paliadin-widget.ts` + `paliadin-context.ts` interaction.
- `client/inbox.ts` row-action click routing.
- `client/dashboard.ts` activity-item nav.
- `client/deadlines-calendar.ts` / `appointments-calendar.ts` column layout (the calendar-column-drift bug class).

Not unit tests; not E2E. They are the missing middle.

### Layer 4: Service-layer (live DB)

**What:** Go service methods against a real Postgres, using the existing `TEST_DATABASE_URL` pattern. Two improvements:

1. **Replace per-test DELETE cleanup with a per-test transaction harness**: open a transaction, run the test inside it, ROLLBACK. Faster, isolating, no cleanup forgotten. Already viable because the service layer accepts `*sqlx.DB`-or-tx-shaped interfaces in many places; needs a small `internal/services/internal/testdb` package that exposes `WithTx(t *testing.T, fn func(*sqlx.Tx))`. Migration is mechanical, can happen alongside infill. *Caveat:* some service methods open their own transactions internally (`approval_service.submit` is one). Those keep DELETE cleanup; the tx harness is a default, not a mandate.
2. **Make `TEST_DATABASE_URL` mandatory in CI.** Today these tests are skipped on every machine that doesn't `export TEST_DATABASE_URL=…`, i.e. they never run on automated pipelines, because there is no pipeline. Once CI exists (§3.2), it becomes a required env var.

**Tool:** stdlib `testing` + `sqlx` (already in `go.mod`).

**No mocks at the service↔DB boundary.** This is m's hardest line; see global CLAUDE.md memory pattern and `t-paliad-036` (the bug that masked two other bugs would have been caught instantly by a real-DB test).
**Where to invest first:** Approval (already heavy), Projection (already heavy), Fristenrechner (already heavy), DeadlineService Create/Update/Complete/Delete with `pending_request_id` interplay, AppointmentService same, ProjectService visibility predicate, CalDAV push (the four CalDAV `*.go` files have zero direct test).

**Coverage target:** Every service method that mutates the DB has at least one happy-path live-DB test. RLS predicate (`visibilityPredicatePositional`) has one test per role (global_admin, member, non-member).

### Layer 5: Handler integration (httptest + real DB)

**What:** Spin a real `services.DBService`, mount the protected mux, drive `httptest.NewRequest` + `ServeHTTP` against it. Auth via a fake session cookie produced by a `testauth.Login(t, userID)` helper that mints the same Supabase JWT shape `auth.UserIDFromContext` expects.

**Why:** The 53 untested handlers are where the request shape ↔ service interaction lives. Examples that would have caught real bugs:

- `t-paliad-036`'s "`/projects/{id}` 404 while `/api/projects/{id}` 200" mismatch: a 5-line handler test would have failed before the migration ran.
- mig 020's three-stacked bug: a handler test that POSTs a deadline and asserts a 200 + read-back row would have failed at submit-time, not boot-time.
- The audit-log query timezone bug: handler test asserts the JSON contains the expected `event_date`.

**Tool:** stdlib `net/http/httptest`. **No new framework.** Pattern: handler tests live next to the handler file (`internal/handlers/deadlines_test.go` next to `deadlines.go`).

**Coverage target:** Every handler that gates a state-changing route (`POST/PATCH/DELETE` flavour). Plus `GET` handlers that compose a non-trivial query (dashboard, agenda, search, audit-log).

### Layer 6: End-to-end (Playwright)

**What:** A small Playwright suite (~10 flows) committed at `e2e/` with a `bun run e2e` entry. Targets a local `./paliad` against a scratch Postgres (the same `TEST_DATABASE_URL`).
Each test logs in, drives the UI through one user journey, asserts visible state.

**Why ~10 not 100:** Per-PR budget caps at ~2 min total (§6 Q1). Playwright tests are the most expensive minute-per-confidence in this stack; they pay for themselves on the *golden path* and nothing else. The deep-coverage layer is L5; E2E is *"is the app still alive end to end?"*.

**Tool:** `playwright` (npm; bun installs cleanly). No third-party test runner; Playwright ships its own. Tests live in `e2e/*.spec.ts`. **Not bun:test.** Playwright's runner is purpose-built for browser-driving and integrates with their tracing; don't fight it.

**Cap:** 10 flows. If a new test wants in, an existing one must drop out (or we have a real reason to widen). This is the cheapest discipline available: it forces the suite to remain a smoke pass, not a regression-test dumping ground.

**Coverage target:** See §4.

---

## 3. Tooling: concrete picks per layer

| Layer | Tool | Already in deps? | Install? |
|---|---|---|---|
| L0: migration dry-run | stdlib `testing` + `migrate/v4` | yes | no |
| L1: Go unit | stdlib `testing` | yes | no |
| L2: Frontend unit | `bun:test` | yes (built into bun) | no |
| L3: Frontend DOM | `bun:test` + `happy-dom` | bun yes, happy-dom **new** | `bun add -d happy-dom` (one dep, ~200 KB) |
| L4: Service live-DB | stdlib + sqlx | yes | no |
| L5: Handler integration | stdlib `net/http/httptest` + sqlx | yes | no |
| L6: E2E | `@playwright/test` | **new** | `bun add -d @playwright/test` + `npx playwright install chromium` |

Net new deps: **2** (happy-dom + playwright). Both are mainstream, both have small surface area, both align with bun's ecosystem.

Explicit rejects:

- ❌ **testify**: current tests read cleanly with stdlib; adding it forces a rename pass nobody wants.
- ❌ **vitest**: bun's built-in test runner is faster and the tests are already in `bun:test` shape.
- ❌ **dockertest / testcontainers-go**: m's preference is real-DB tests against the existing Postgres; spinning ephemeral Docker Postgres per package run adds latency and surface area for marginal isolation gain. See Q3.
- ❌ **sqlmock / gomock for DB**: banned by §0 lesson 1.
- ❌ **cypress**: Playwright is the better tool today, and the team's existing skill (`/mai-tester`) already uses it.

### 3.1 Per-PR run-time budget

Target (subject to m's call in Q1): **≤ 90 s for the gating tier (L0+L1+L2+L4 subset+L5 happy-path)**, ≤ 4 min for the full suite (add L3+L4 full+L6). The gating tier blocks merge; the full suite blocks deploy.

Indicative times (estimated, validate when slice 1 lands):

| Tier | Layers | Est. time | Blocks |
|---|---|---|---|
| **Gate (every PR)** | L0 + L1 + L2 + L5 happy-path + L4 critical | 60–90 s | merge |
| **Full (every merge to main)** | + L4 full + L3 + L6 | 3–4 min | deploy |

### 3.2 CI: proposal, not commitment

paliad has no CI today. Two routes:

- **Gitea Actions** (m's stack already runs `mgit.msbls.de`). Self-hosted; same auth model as the rest of mAi. Adds a `.gitea/workflows/test.yml`. Postgres comes from a service container.
- **Stay click-deploy.** No CI. Workers run tests locally; Dokploy auto-deploys on green-main convention.

Recommendation: **Gitea Actions for the gate tier only** (L0 + L1 + L2), driven by a single short workflow. The L3-L6 expansion can be a follow-up once the gate tier proves stable. Deferred to Q2 for m's call.

### 3.3 Test DB: live YouPC vs ephemeral

The `paliad` schema lives on the shared YouPC Postgres (port 11833). Three options:

| Option | Pros | Cons |
|---|---|---|
| **Per-developer separate DB on YouPC** (`TEST_DATABASE_URL` per laptop) | Closest to prod; existing pattern. | Cleanup discipline matters; cross-developer contention possible. |
| **Ephemeral docker postgres per CI run** | Full isolation; parallel-safe; reset for free. | New infra; ~5 s container startup per CI invocation. |
| **Dedicated test DB on a paliad-only Postgres** | Isolated; cheap. | New infra to maintain. |

Recommendation: **option 1 for developers (no-op change), option 2 for CI** (Gitea Actions postgres service container). Deferred to Q3 for m's call.

### 3.4 Coverage targets

Don't gate on percentage. Gate on critical-path coverage (§4). Add `go test -coverprofile=` output to CI for visibility, not as a merge gate. Coverage % gating produces tests-for-tests'-sake; we want the tests that catch the bugs we've shipped.

---

## 4. Critical journeys: what MUST be covered

These are the golden-path flows. Anything not on this list is L1-L5 territory, not L6. The list is intentionally short; if it grows beyond 10, we are doing E2E wrong.

| # | Flow | Why it's critical | Layer mix |
|---|---|---|---|
| 1 | **Login → dashboard renders → traffic-light counts match** | Every user does this every day; broken auth = paliad is offline. | L6 (Playwright) + L5 handler (auth.go) |
| 2 | **Create project (Client → Litigation → Patent → Case)** | Hierarchy with team inheritance, the data model's spine. | L6 + L5 + L4 (project_service) |
| 3 | **Submit deadline → routes to /inbox → approver approves → state flips** | The 4-eye flow (t-paliad-138). Most-mutated paliad surface. | L6 + L5 (deadlines, approvals) + L4 (approval_service) |
| 4 | **Fristenrechner: pick proceeding → cascade fires → result shows** | The platform's flagship interactive tool. JS cascade. | L6 + L3 (fristenrechner cascade) + L4 (fristenrechner) |
| 5 | **SmartTimeline: anchor a projected row → predecessor-missing-error handled** | Recent Slice-2 work (t-paliad-173 / #31). High-touch surface. | L6 + L3 (shape-timeline) + L4 (projection_service) |
| 6 | **CalDAV sync: PUT a Termin → external client sees it, edits there → pull reconciles** | Owned-event semantics + foreign-UID skip rule from Phase F. Untested today. | L4 (caldav_service push/pull), gated on Q3 (live YouPC vs ephemeral) |
| 7 | **Paliadin chat: anon visit hits 404; m's session opens widget; turn renders** | Owner-gated `/paliadin` is the only m-only surface. Quiet failures here are silent. | L6 (smoke) + L5 (paliadin_suggest) + L4 (paliadin / aichat_paliadin) |
| 8 | **/admin/rules: filter → edit one rule → lifecycle transition → audit log row** | Rules drive the cascade; bad edits break every user's fristenrechner. | L6 + L5 (admin_rules) + L4 (rule_editor_service) |
| 9 | **Onboarding: new user with allowed email → onboarding form → first project membership** | The new-user funnel; gateOnboarded middleware traps. | L6 + L5 (onboarding, invite) |
| 10 | **Migration boot smoke: spin paliad against an empty DB → server binds 8080** | Catches every mig-N crash-loop. | L0 (migration dry-run) + L4 boot-smoke variant |

Picks 1, 3, 4 and 10 are the highest-value-per-cost: they cover the routes most regressions land on (auth, mutation, cascade, boot).

---

## 5. Slice plan: tracer-bullet roll-out

Each slice is a shippable PR with a concrete deliverable, in order of expected outage-prevention payoff. Sized for a single coder shift unless flagged. No slice depends on a later one being merged. Hour estimates intentionally omitted (per global CLAUDE.md).

### Slice 1: Migration dry-run harness + boot smoke (highest leverage)

**Branch:** `mai//test-strategy-slice-1-migrations`

**Deliverable:**

- `internal/db/migrate_test.go`: `TestMigrations_DryRun` (per-mig BEGIN/ROLLBACK), `TestMigrations_EndToEnd` (full apply, then re-apply latest to assert idempotency), `TestMigrations_Down` (apply N→0).
- `Makefile` with `make verify-migrations` (the gate target), `make test` (run everything), `make test-go`, `make test-frontend`.
- `cmd/server/main_paliadin_backend_test.go` already exists; extend with a `TestMain_BindsHTTPAfterMigrate` that boots the full server against `TEST_DATABASE_URL`, asserts `:8080` is listening, then shuts down. Catches the mig-098-class crash-loop in a single test.
- README section: how to set `TEST_DATABASE_URL` locally.

**Catches:** Every mig-098-class crash-loop; every drop-cascade-with-stale-policy-name regression (t-paliad-036).

### Slice 2: Service-layer infill, critical mutators

**Branch:** `mai//test-strategy-slice-2-services`

**Deliverable:**

- Test files for the three highest-impact untested services:
  - `internal/services/agenda_service_test.go` (live-DB, dashboard agenda query)
  - `internal/services/dashboard_service_test.go` (traffic-light counts)
  - `internal/services/team_service_test.go` (membership + inheritance, RLS-load-bearing)
- Tighten existing `approval_service_test.go` + `deadline_service_test.go` coverage of the create/update/complete/delete × pending-request matrix where there are demonstrable gaps.
- Add `internal/services/internal/testdb/withtx.go`, the per-test tx harness (optional adoption; existing tests stay).

**Catches:** RLS regressions, approval interplay regressions, dashboard count drift after schema renames.

### Slice 3: Frontend bun:test setup + L2 infill

**Branch:** `mai//test-strategy-slice-3-frontend-unit`

**Deliverable:**

- `frontend/package.json` `scripts.test = "bun test"`.
- New tests under `frontend/src/client/`:
  - `paliadin-context.test.ts` (route table, entity extraction, selection truncation).
  - `paliadin-starters.test.ts` (every route ≥1 starter, every starter bilingual).
  - `filter-bar/index.test.ts` (chip render + active state, pure DOM-less helpers).
- i18n key audit: `frontend/scripts/i18n-audit.test.ts` parses every `data-i18n="…"` from `dist/` HTML and every `t("…")` call from `src/`, asserts both `de` and `en` resolve. Runs as part of `bun test`.
- `make test-frontend` wires `cd frontend && bun test`.
**Catches:** i18n drift (untranslated key shipped to user), context-envelope contract drift (paliadin SKILL.md depends on it), starter-registry regressions.

### Slice 4: Playwright golden-path smoke

**Branch:** `mai//test-strategy-slice-4-e2e`

**Deliverable:**

- `e2e/` directory at repo root.
- `playwright.config.ts` pointing at `http://localhost:8080` (paliad started by the test, not assumed).
- Five Playwright `*.spec.ts` files covering critical journeys 1, 3, 4, 7, 9 from §4.
- `make e2e` target that:
  1. starts paliad against `TEST_DATABASE_URL`,
  2. waits for `:8080` to be live,
  3. runs `npx playwright test`,
  4. tears the server down.
- `bun add -d @playwright/test` + `npx playwright install chromium`.

**Catches:** Auth regressions, deadline-mutation regressions, fristenrechner cascade regressions, owner-gated /paliadin leaks, onboarding-gate misbehaviour.

### Slice 5: Handler integration tests for the 5 most-touched routes

**Branch:** `mai//test-strategy-slice-5-handlers`

**Deliverable:**

- `internal/handlers/auth_test.go` extended with `TestLogin_HappyPath` + `TestLogout_ClearsCookie` (real DB).
- `internal/handlers/projects_test.go`: `TestProjectsCreate` (POST 200, row inserted, audit emitted), `TestProjectsGetByID_RespectsVisibility` (404 for non-member).
- `internal/handlers/deadlines_test.go`: `TestDeadlinesCreate_TriggersApproval` (verifies pending pill).
- `internal/handlers/appointments_test.go`: same shape.
- `internal/handlers/paliadin_test.go`: `TestPaliadinPage_404ForNonOwner`, `TestPaliadinPage_200ForOwner`.
- Shared `internal/handlers/testauth/testauth.go`: mints a session cookie for `userID` so handler tests don't reinvent auth seeding.

**Catches:** Handler ↔ service wiring drift, visibility-predicate handler-side bugs (t-paliad-036 bug 2 was exactly this), owner-gate bypass.

### Slice 6: Frontend L3 (DOM) cascade tests

**Branch:** `mai//test-strategy-slice-6-frontend-dom`

**Deliverable:**

- `bun add -d happy-dom`.
- DOM-driven tests for the three most-touched cascades:
  - `client/fristenrechner.test.ts` (cascade activate → row appears → date-set fires fetch).
  - `client/shape-timeline.test.ts` (lane render, track render, projected-row click).
  - `client/filter-bar/index.test.ts` (chip click toggles state, URL params update).

**Catches:** The whole class of "the function exists and is unit-tested but the cascade in the browser doesn't fire it" bugs. This is the layer that catches t-paliad-098 / 099 / 102 / 103.

### Slice 7: CI wiring (deferred, Q2-dependent)

**Branch:** `mai//test-strategy-slice-7-ci` (gated on m's Q2 pick)

**Deliverable:**

- `.gitea/workflows/test.yml` (or stay click-deploy if m picks that).
- Gate tier runs on every PR; full suite runs on merge to main.
- Postgres service container provides `TEST_DATABASE_URL`.
- Slack/Gotify ping on red main.

**Catches:** Drift between "tests pass on my laptop" and prod reality.

### Slice 8: Coverage reporting + dashboard (lowest priority)

**Branch:** `mai//test-strategy-slice-8-coverage`

**Deliverable:**

- `go test -coverprofile=` aggregated into a single `coverage.html`.
- Bun's coverage output similarly.
- A `docs/coverage.md` index updated by CI.
- **Not a merge gate.** Visibility only.

**Catches:** Slow drift; nice-to-have once the floor is in.

### Slice order rationale

1, 4, 5 are the highest outage-prevention per LoC: migration dry-run kills crash-loops, E2E kills regressions, handler tests kill wiring drift. 2, 3, 6 widen the floor; 7-8 are infrastructure.

---

## 6. Open questions for m

These need m's call before any coder shift starts (or before specific slices start, where noted).

### Q1: Per-PR test-run budget

How long is acceptable to wait on the gate tier before merge?

- 30 s: only L0 + L1 (no L2+ on the gate).
- **60–90 s (recommended)**: L0 + L1 + L2 + L5 happy-path + L4 critical.
- 2 min: add L3 + L4 full.
- 4+ min: add L6 (E2E on gate).

The pick determines whether E2E gates merge or only deploy.
### Q2 — CI infrastructure

- **Gitea Actions** (self-hosted, gate tier only, recommended) — minimal new infra; aligns with m's existing stack.
- **Stay click-deploy** — workers run tests locally; merge discipline enforced by convention. Today's reality; we keep it.
- **Both:** start with click-deploy, add Gitea Actions in Slice 7 once gate tier proves stable.

### Q3 — Live-DB vs ephemeral docker Postgres for tests

- **Per-developer YouPC DB (current pattern)** — closest to prod; existing tests work unchanged.
- **Ephemeral docker postgres in CI, YouPC for devs (recommended hybrid)** — keeps local-dev simple, gives CI deterministic isolation.
- **YouPC everywhere** — simplest, but parallel CI runs would contend.

### Q4 — Coverage targets — % or critical-path?

- **Critical-path only (recommended)** — §4's 10 flows + every state-mutating service method has a test. No % gate.
- **% gate** — set a floor (e.g. 60 % lines, 50 % branches) and refuse merges below it.
- **Both** — critical-path is mandatory, % is informational.

m's prior preference (memory pattern: "tests that catch real bugs > coverage theatre") points at critical-path-only. Confirming.

### Q5 — Which slices land before paliad is "production-grade"?

paliad is already live at `paliad.de` and being used by HLC colleagues. "Production-grade" here means "next time someone ships, we don't go down." Picks:

- **Slices 1 + 4 + 5 are the production-grade floor (recommended).** Migration dry-run + golden-path E2E + handler integration tests cover the failure modes that have hit prod since the rebrand.
- Add Slices 2 + 3 + 6 as widening passes, on their own cadence.
- Slices 7-8 are nice-to-haves.

Confirming the floor pick, and whether m wants all three to land before any new feature work or to roll out alongside it.

### Q6 — Who owns each slice?

Recommendation: rotate coder slots so the same person isn't on every slice.
Suggested assignment (head can override):

| Slice | Profile fit |
|---|---|
| 1 — migrations | Backend-heavy coder (knuth, gauss, cronus). |
| 2 — service infill | Backend-heavy coder; whoever owns approval/projection. |
| 3 — frontend unit | Frontend-heavy coder. |
| 4 — Playwright E2E | Cross-stack coder; ideally one familiar with `/mai-tester`. |
| 5 — handler integration | Backend coder. |
| 6 — frontend DOM | Frontend coder (same person as 3 makes sense). |

Inventor does **not** decide assignments; head + m do.

---

## 7. Out of scope (explicit)

- **No rewrite of any existing test.** The 323 existing test functions stay. New tests use the new patterns; old tests are migrated only when their files are touched for unrelated reasons.
- **No third-party framework where stdlib + bun:test suffice** (testify, vitest, etc. — see §3).
- **No mocks at the service↔DB boundary.** This is the lock-in. Mocks lie; the live-DB tests we already have are paliad's most useful safety net.
- **No new feature work in this strategy.** The doc proposes infra; feature scope is unchanged.
- **No retirement of the `tests/smoke-*.md` human-written reports.** Those are great for one-shot regression hunts; they coexist with the automated suite.

---

## 8. Implementation notes for the eventual coder

(For whichever coder picks up a slice. Not exhaustive.)

- **Test-name collisions in Go's flat package namespace bite when a service grows N implementations.** Memory note from `t-paliad-194` already records this. Prefix tests with the service name (e.g. `TestAichatPaliadin_RunTurn_…` not `TestRunTurn_…`).
- **`httptest.NewRequest` does not URL-encode** — use `url.QueryEscape` for any `?q=…` argument. Memory note from `t-paliad-026`.
- **sqlx v1.4.0 `Named` parser strips one colon from `::uuid[]`** — known pitfall, repro lives at `internal/services/project_service.go`. Use `CAST(... AS uuid[])` in new query strings.
- **Live-DB cleanup must DELETE FKs first.** Order matters (auth.users last).
  Look at `audit_service_test.go` for the chain pattern.
- **`paliad.paliad_schema_migrations` tracker collision** is documented but unresolved. Slice 1 should add a `make reset-test-db` target that drops both `public.paliad_schema_migrations` *and* `paliad.paliad_schema_migrations` to keep developers unblocked.
- **`bun:test` matchers are Jest-compatible** — `expect().toEqual()`, `expect().toHaveBeenCalled()`, etc. No deps needed.
- **happy-dom does not implement** every DOM method (notably some `` semantics). If a cascade test fails on something missing, jsdom is the escape hatch.

---

## 9. Decision summary — pick list for m

| # | Question | Inventor recommends |
|---|---|---|
| Q1 | Per-PR budget | 60–90 s gate, 3–4 min full |
| Q2 | CI infra | Gitea Actions, gate tier only |
| Q3 | Test DB | YouPC for devs, ephemeral docker for CI |
| Q4 | Coverage target | Critical-path only, no % gate |
| Q5 | Production-grade floor | Slices 1 + 4 + 5 before new feature work |
| Q6 | Slice ownership | Rotate per profile; head decides |

If m's calls match inventor's, the implementer's brief writes itself: Slice 1 first, then 4 + 5 in parallel, then 2/3/6 as widening passes.

---

**Status:** DESIGN READY FOR REVIEW. Awaiting m go/no-go on §5 slice plan + §6 open questions before any coder shift starts.

---

## 10. m's decisions (2026-05-19, locked)

Walked through §6 with m via the AskUserQuestion interview (per head's 2026-05-19 workflow rule: inventor questions are resolved before parking, not after). Six picks locked, all matching inventor's recommendation.

| # | Question | m's answer | Effect on plan |
|---|---|---|---|
| Q1 | Per-PR test-run budget | **Inventor's call** (m deferred). Pick: **60–90 s gate, 3–4 min full.** | Gate tier = L0 + L1 + L2 + L5 happy-path + L4 critical. L6 E2E gates deploy, not merge. |
| Q2 | CI infrastructure | **Gitea Actions, gate tier only.** | Slice 7 adds `.gitea/workflows/test.yml` running the gate tier; full suite stays on merge-to-main. |
| Q3 | Test DB topology | **YouPC for devs + ephemeral docker for CI.** | Local dev unchanged. Slice 7 wires Postgres service container in Gitea Actions. |
| Q4 | Coverage target | **Critical-path only, no % gate.** | §4's 10 flows + every state-mutating service method gets a test. Coverage % output is informational in Slice 8, never a merge gate. |
| Q5 | Production-grade floor | **Slices 1 + 4 + 5 before new feature work.** | These three land before any new paliad feature gets a coder shift. Slices 2, 3, 6 widen the floor on their own cadence. Slices 7-8 are nice-to-haves. |
| Q6 | Slice ownership | **Head decides + rotate per profile.** | Backend slices (1, 2, 5) → backend-heavy coder. Frontend slices (3, 6) → frontend-heavy coder. E2E (4) → cross-stack. Head picks at dispatch time. |

**Implementer brief (post-m-decisions):**

1. **Slice 1 starts first** — migration dry-run harness + `make verify-migrations` + boot-smoke variant of `cmd/server/main_paliadin_backend_test.go`. Backend-heavy coder.
2. **Slice 4 + Slice 5 in parallel** once Slice 1 is merged — Playwright golden-path (cross-stack coder, 5 specs) and handler integration (backend coder, auth/projects/deadlines/appointments/paliadin).
3. Slice 7 (Gitea Actions wiring) follows once Slice 1 gate tier is proven locally.
4. Slices 2, 3, 6 enter rotation alongside feature work — not blocking.
5. Slice 8 (coverage reporting) lowest priority.

**Status:** DESIGN APPROVED — awaiting head's dispatch of Slice 1 coder shift.