paliad/docs/design-paliad-test-strategy-2026-05-19.md
mAi 621fe35d79 docs(test-strategy): fold m's §10 decisions addendum
m's 2026-05-19 picks via AskUserQuestion interview:
- Q1 budget: 60–90s gate, 3–4min full (inventor's call — m deferred)
- Q2 CI: Gitea Actions, gate tier only
- Q3 test DB: YouPC for devs + ephemeral docker for CI
- Q4 coverage: critical-path only, no % gate
- Q5 floor: Slices 1+4+5 before new feature work
- Q6 ownership: head decides + rotate per profile

All six matched inventor's recommendation. Slice 1 (migration
dry-run + boot smoke) starts first; Slices 4+5 in parallel after.
2026-05-19 10:30:25 +02:00


Design — Paliad Test Strategy (production-grade)

Author: mendel (inventor)
Date: 2026-05-19
Task: t-paliad-213
Branch: mai/mendel/inventor-test-strategy
Status: DESIGN READY FOR REVIEW. No test files / Make targets / CI configs touched. Awaiting m go/no-go on §5 slice plan + §6 open questions before any coder shift.


0. TL;DR

Paliad has accidental test discipline today: 58 _test.go files / 323 test functions in Go (≈45 % of services tested, ≈12 % of handlers tested) and 4 frontend test files for 90+ client modules (≈4 %). There is no committed end-to-end suite and no CI; every smoke pass is human-driven via the manual reports in tests/. The mig 098 prod crash-loop, the t-paliad-036 triple-bug after the German→English rename, and a long tail of UX regressions (deadline-done modal, calendar column drift) would all have been caught by a 10-test boot-and-click smoke pass.

This design proposes a seven-layer test pyramid (L0–L6) with a concrete tool per layer (stdlib testing, bun's built-in bun:test, and Playwright for E2E; the only new dependencies are happy-dom and @playwright/test, see §3). It pins three lessons paliad has paid for in commits:

  1. No mocks at the service↔DB boundary. Live-DB tests against a per-developer Postgres are the floor; in-memory mocks for paliad.* would have hidden every rename-after-DROP-CASCADE bug. Project preference is already in this direction (26/44 service test files are live-DB-gated, see §1.1); we double down rather than reverse.
  2. Migrations must dry-run before they merge. Every recent prod-down (mig 098, mig 020-after-rename, mig 099 audit_reason gap) was a migration that compiled, passed go test ./... (which skips without TEST_DATABASE_URL), and broke on first apply against the real schema. A make verify-migrations target that does BEGIN/apply/ROLLBACK in CI fixes the entire failure mode.
  3. Browser-shaped bugs need a browser. The fristenrechner cascade, shape-timeline render, calendar grid, inline paliadin widget — these are JS state machines. Bun's stdlib bun:test covers the pure parser/codec code; Playwright covers the auth-gated DOM. Don't try to substitute one for the other.

Eight slices roll the strategy out as tracer-bullet PRs, each independently shippable. Slice 1 (migration dry-run harness) and Slice 4 (Playwright golden-path smoke) buy the most outage-prevention per LoC; the rest is widening proven patterns.

Six open questions for m at §6. Most surface a coverage-vs-cost trade-off — the picks that need m's call before any code lands are CI infrastructure choice (Q2), per-PR run-time budget (Q1), and live-DB-vs-dockerised Postgres (Q3).


1. Audit — what exists today

Counts taken on mai/mendel/inventor-test-strategy @ HEAD (2026-05-19, 100 migrations applied).

1.1 Go test inventory

| Package | Source files | Test files | Test functions | Notes |
|---|---|---|---|---|
| internal/services | 56 | 44 | ~200 | 26 live-DB-gated (TEST_DATABASE_URL), 18 pure-Go. 24 services have no test file at all; see §1.4. |
| internal/handlers | 59 | 7 | ~30 | Only auth-domain check, search, audit-parse, approval-error-mapping, redirects, verfahrensablauf-redirect, chart-404 covered. 53 handlers have no test file. |
| internal/auth | small | 2 | ~10 | Session middleware + require-admin. |
| internal/branding | small | 1 | small | Firm-name override. |
| internal/offices | small | 1 | small | Office enum. |
| internal/changelog | small | 1 | small | Pure parser. |
| internal/calc | small | 1 | small | Fees / fee tables. |
| cmd/server | 1 | 1 | small | main_paliadin_backend_test.go covers env-gate selection. |
| Total | 133 | 58 | 323 | |

go test ./... runs all 58 files. Without TEST_DATABASE_URL set, 27 of them silently skip their live-DB cases — the suite still passes, but coverage of mutation paths drops to near zero.

1.2 Frontend test inventory

| Path | Test files | Tested |
|---|---|---|
| frontend/src/client/filter-bar/url-codec.test.ts | 1 | FilterBar URL codec round-trip. |
| frontend/src/client/views/format.test.ts | 1 | Date/time formatters (regression for t-paliad-153). |
| frontend/src/client/views/shape-timeline-chart.test.ts | 1 | Chart layout pure function. |
| frontend/src/client/views/shape-timeline-cv.test.ts | 1 | Continuous-view shape layout. |
| Total | 4 | Out of ~90 client modules (frontend/src/client/*.ts). |

All four use bun's built-in bun:test (no extra dep). No DOM/jsdom tests. No Playwright. No bun test script in package.json (bun run build is the only script).

1.3 End-to-end / smoke

  • tests/smoke-2026-04-25.md, tests/smoke-auth-2026-04-25.md, tests/smoke-auth-2026-04-26-cleanup.md — human-written reports with screenshots committed under tests/screenshots-*. No code. No re-runnable script.
  • mai-tester skill uses Playwright for ad-hoc runs; nothing committed.
  • No e2e/, no .gitea/workflows/, no .github/workflows/, no Makefile.

1.4 Critical service paths with no test file

These are internal/services/*.go for which no *_test.go sibling exists:

| Service | Risk class | Why it matters |
|---|---|---|
| caldav_service.go, caldav_client.go, caldav_crypto.go, caldav_ical.go | High | Per-user push/pull goroutines + AES-GCM at rest. One pure parser test (caldav_ical_timeline_test.go) exists but the service + crypto + WebDAV client are blind. |
| agenda_service.go | High | Dashboard agenda query; reused by /agenda page. Exercised transitively by visibility tests but no direct test. |
| dashboard_service.go | High | Traffic-light + summary counts. Same story: transitively covered via visibility, no direct test. |
| derivation_service.go | Medium | Project-tree derivation (the new t-paliad-194-era subtree machinery). |
| team_service.go | Medium | Team membership / inheritance. |
| partner_unit_service.go | Medium | Dezernat replacement (t-paliad-070). |
| party_service.go, note_service.go, link_service.go, checklist_instance_service.go | Medium | All do project-scoped CRUD with the same RLS+audit pattern that t-paliad-036 proved easy to break. |
| appointment_service.go | High | Hot: every calendar mutation. Exercised through approval tests but has no own test file. |
| view_service.go | Medium | Powers the substrate (/views/*). |
| paliadin_jwt.go | Medium | Per-turn JWT mint for the aichat path (t-paliad-194). No call sites in tests today. |
| markdown.go | Low | Glossary + checklist content render. |

1.5 Handlers with no test file

53 of 59. Notably: auth.go itself (login / logout / session creation), projects.go (the most-mutated entity), deadlines.go / appointments.go (writes), paliadin.go / paliadin_suggest.go (m-only routes — never click-tested), fristenrechner.go / fristenrechner_search.go / fristenrechner_event_categories.go (the cascade users live in), dashboard.go / agenda.go (landing), onboarding.go / onboarding_gate.go (every new user's first three minutes), invite.go (rate-limited write path). The currently-tested handlers (search, audit-parse, approval error mapping, etc.) are the cheap pure-Go ones; every handler that touches the DB is untested at handler level.

1.6 Live-DB test scaffold — is it sound?

The pattern (read from internal/services/visibility_test.go):

url := os.Getenv("TEST_DATABASE_URL")
if url == "" { t.Skip("TEST_DATABASE_URL not set — skipping live DB test") }
if err := db.ApplyMigrations(url); err != nil { t.Fatalf(...) }
pool, _ := sqlx.Connect("postgres", url)
defer pool.Close()
// per-test seed + cleanup via DELETE + defer cleanup()

Verdict: sound, but has rough edges that need addressing before we widen.

  • Migrations apply at test startup against the test DB — catches every "you forgot to add a CHECK" / "you reference a column that doesn't exist" before a real-DB-touching test runs.
  • Per-test cleanup via DELETE FROM ... WHERE id IN ($1,...) is explicit and idempotent.
  • The paliad.paliad_schema_migrations tracker collision noted in memory 0b900afa… is a pre-existing issue, not introduced by this design.
  • ⚠️ Cleanup-via-DELETE is fragile: a test that creates a row referenced by FK from another table needs to remember to clean both. A few existing tests (see audit_service_test.go) already chain 5+ DELETEs.
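  • ⚠️ Tests can't run in parallel against the same TEST_DATABASE_URL because they share schema state. go test ./... runs the test binaries of different packages in parallel by default (-p, which defaults to GOMAXPROCS), and tests inside one package run concurrently once they call t.Parallel(); either way, tests with overlapping cleanup IDs can interfere.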
  • ⚠️ No CI today actually exercises TEST_DATABASE_URL — so every live-DB test is effectively run only on the author's laptop or not at all. Half the value is paid-for but unbilled.

1.7 Migration tooling

  • internal/db/migrate.go embeds migrations/*.sql and applies on server boot via golang-migrate/v4 with the paliad_schema_migrations tracker in public schema.
  • 100 migrations on disk (001–100).
  • No dry-run gate today. A bad migration breaks paliad.de at boot (Dokploy crash-loops the container). Recent prod incidents: mig 098 (submission code rename), mig 099 (with_po flag drop missed audit_reason gap), mig 020 (function rename without body rewrite — see memory 49a05cfa…).
  • down.sql exists for every migration but no test ever exercises it.

1.8 CI / deploy loop

  • No CI. Push-to-main → Gitea webhook → Dokploy auto-builds the Dockerfile and replaces the container. The Dockerfile runs bun run build then go build. Neither go test nor bun test runs in the build pipeline.
  • Pre-commit hooks: none in repo. Each worker runs go build / go vet / go test / bun run build by convention (see memories — every shipped task report ends with "build hygiene held").

                           ┌─────────────────┐
                           │  E2E (Playwright)│  ~10 flows
                           │  L6              │
                           └─────────────────┘
                       ┌─────────────────────────┐
                       │  Handler integration    │  ~30 routes
                       │  L5 (httptest + real DB)│
                       └─────────────────────────┘
                  ┌──────────────────────────────────┐
                  │  Service-layer (live DB)         │  ~60 tests
                  │  L4 (BEGIN/ROLLBACK harness)     │
                  └──────────────────────────────────┘
              ┌──────────────────────────────────────────┐
              │  Frontend DOM / cascade (bun:test+jsdom) │  ~15 modules
              │  L3                                      │
              └──────────────────────────────────────────┘
        ┌──────────────────────────────────────────────────────┐
        │  Frontend unit (bun:test pure TS)                    │  ~30 modules
        │  L2                                                   │
        └──────────────────────────────────────────────────────┘
   ┌──────────────────────────────────────────────────────────────┐
   │  Go unit (stdlib testing, table-driven, pure functions)      │  ~150 tests
   │  L1                                                          │
   └──────────────────────────────────────────────────────────────┘
   ┌──────────────────────────────────────────────────────────────┐
   │  Migration dry-run (make verify-migrations)                  │  100 mig
   │  L0 — gate on every PR                                       │
   └──────────────────────────────────────────────────────────────┘

Layer 0 — Migration dry-run

What: Every *.up.sql in internal/db/migrations/ is applied inside a single BEGIN ... ROLLBACK transaction against a scratch Postgres, in numeric order. The harness asserts each statement succeeds and asserts no statement leaves the schema in a paliad_schema_migrations.dirty=true state. A second pass applies all up-migrations end-to-end (no rollback) and then re-applies the latest up-migration to assert idempotency (every paliad migration since t-paliad-070 has been written to be idempotent — this enforces it).

Tool: stdlib testing package, no third-party. Pattern: internal/db/migrate_test.go with a TestMigrations_DryRun driven from TEST_DATABASE_URL. A make verify-migrations target wraps it.

Why this layer matters most: Every recent prod-down was a migration. Catching them on a CI run before merge is the highest-leverage test investment paliad can make. Cost: one ~100-line Go file + one Postgres in CI.

Coverage target: 100 % of *.up.sql files. Hard gate on PR — no exceptions.
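A minimal sketch of the single-transaction dry-run described above, under stated assumptions. The file naming scheme (NNN_description.up.sql), the names orderMigrations and dryRunAll, and the sample file names in main are all hypothetical. It uses database/sql rather than the project's migrate/v4 wiring, and it assumes the Postgres driver accepts a multi-statement migration body in one Exec call.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// migration is one *.up.sql file: its numeric prefix plus its file name.
type migration struct {
	Version int
	Name    string
}

// orderMigrations keeps only *.up.sql names, parses the numeric prefix
// (assumed layout: NNN_description.up.sql), sorts by version and rejects
// duplicate versions before anything touches the database.
func orderMigrations(names []string) ([]migration, error) {
	var migs []migration
	for _, n := range names {
		if !strings.HasSuffix(n, ".up.sql") {
			continue
		}
		prefix, _, ok := strings.Cut(n, "_")
		if !ok {
			return nil, fmt.Errorf("%s: missing NNN_ prefix", n)
		}
		v, err := strconv.Atoi(prefix)
		if err != nil {
			return nil, fmt.Errorf("%s: bad version: %w", n, err)
		}
		migs = append(migs, migration{Version: v, Name: n})
	}
	sort.Slice(migs, func(i, j int) bool { return migs[i].Version < migs[j].Version })
	for i := 1; i < len(migs); i++ {
		if migs[i].Version == migs[i-1].Version {
			return nil, fmt.Errorf("duplicate version %d: %s and %s",
				migs[i].Version, migs[i-1].Name, migs[i].Name)
		}
	}
	return migs, nil
}

// dryRunAll applies every migration in order inside ONE transaction and
// always rolls back, so the scratch database is left untouched. The error
// names the first migration that fails. read is a hypothetical loader
// (for example, a wrapper around the embedded migrations FS).
func dryRunAll(ctx context.Context, db *sql.DB, migs []migration,
	read func(name string) (string, error)) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return fmt.Errorf("begin: %w", err)
	}
	defer tx.Rollback() // never commit: this is a dry run

	for _, m := range migs {
		body, err := read(m.Name)
		if err != nil {
			return fmt.Errorf("%s: read: %w", m.Name, err)
		}
		if _, err := tx.ExecContext(ctx, body); err != nil {
			return fmt.Errorf("%s: %w", m.Name, err)
		}
	}
	return nil
}

func main() {
	// Only the ordering step runs here; dryRunAll needs a real Postgres.
	migs, err := orderMigrations([]string{
		"010_views.up.sql", "002_users.up.sql", "002_users.down.sql", "001_init.up.sql",
	})
	if err != nil {
		panic(err)
	}
	for _, m := range migs {
		fmt.Println(m.Version, m.Name)
	}
}
```

One limitation worth keeping in mind: Postgres refuses a few statements inside a transaction block (CREATE INDEX CONCURRENTLY is the usual example), so a migration that uses one can only be exercised by the second, non-rollback pass.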

Layer 1 — Go unit (pure)

What: go test ./... against pure functions — formatters, parsers, validators, calculators, fee tables, deadline calculators, projection lookahead clamping, codec round-trips. No DB, no HTTP.

Tool: stdlib testing. Table-driven cases := []struct{...}{...} style is already the house pattern (see auth_test.go / projection_anchor_test.go). Do not introduce testify or any matcher library — the current code reads cleanly without one, and 323 existing test functions don't need a rename pass.

What's already there: 19 pure-Go test files (calculator, mapping, codec, holiday, fees, etc.). Density is good; targeted infill rather than re-architecture.

Coverage target: Every pure function in internal/services/, internal/handlers/, internal/calc/, internal/changelog/. Aim for "every branch in a decision table has at least one test row." Don't chase % — chase "the obvious edge that would burn a coworker".

Layer 2 — Frontend unit (pure)

What: bun test against pure TS modules — URL codecs (filter-bar/url-codec), formatters, parsers, i18n key correctness (every data-i18n attribute used in TSX has a key in i18n.ts), view-spec parsers, projection-row mapping helpers.

Tool: bun:test (built into bun, no install). Already in use in 4 files — extend the same pattern. Add bun test to package.json scripts.

What to add:

  • i18n key audit (every t("foo.bar") and data-i18n="foo.bar" resolves in both de and en).
  • filter-bar/ types + render helpers (paliad has shipped 4 FilterBar slices; coverage is one codec test).
  • paliadin-context.ts route table + entity extraction (the [ctx …] envelope is a stable contract paliadin's SKILL.md depends on; any drift here is a silent failure).
  • paliadin-starters.ts registry — every route maps to ≥1 starter; every starter is bilingual.
  • View-spec parsers in views/.

Coverage target: Every pure TS module in frontend/src/client/. Pages (TSX renderers) are E2E concern, not unit concern.

Layer 3 — Frontend DOM (cascade / jsdom)

What: bun test with jsdom global, exercising the interactive cascade modules — the fristenrechner cascade builder, the shape-timeline render, the FilterBar UI (chips, panels), the calendar grid, the inline Paliadin widget message stream, the inbox-row click handler, the dashboard activity item navigation.

These modules contain enough state that pure-function tests miss real bugs (e.g. the t-paliad-098 .entity-table row-cursor lie was a CSS+DOM bug; t-paliad-099's modal close was a DOM-event bug; t-paliad-103's ::before overlay click-swallow was a DOM bug).

Tool: bun + happy-dom is the lighter choice; if it can't handle event ordering, fall back to jsdom. Both are ESM-clean and bun-friendly. Pick one and stick with it — running both means twice the dependency surface. Default pick: happy-dom (smaller, paliad doesn't need legacy IE semantics).

Pattern: import the cascade module, build a minimal DOM (document.body.innerHTML = …), dispatch synthetic events, assert resulting state. Reuses the production renderers — no test-only fakes.

Coverage target: ~15 modules. Specifically:

  • client/filter-bar/index.ts chip render + active-state.
  • client/fristenrechner.ts cascade — most complex JS in the codebase; depend chains light up every UPC bug we know.
  • client/shape-timeline.ts lane mode + track mode (envelope wire shape brittle to refactor).
  • client/projects-detail.ts row click + Verlauf render.
  • client/paliadin-widget.ts + paliadin-context.ts interaction.
  • client/inbox.ts row-action click routing.
  • client/dashboard.ts activity-item nav.
  • client/deadlines-calendar.ts / appointments-calendar.ts column layout (the calendar-column-drift bug class).

Not unit tests; not E2E. They are the missing middle.

Layer 4 — Service-layer (live DB)

What: Go service methods against a real Postgres, using the existing TEST_DATABASE_URL pattern. Two improvements:

  1. Replace per-test DELETE cleanup with a per-test transaction harness — open a transaction, run the test inside it, ROLLBACK. Faster, isolating, no cleanup forgotten. Already viable because the service layer accepts *sqlx.DB-or-tx-shaped interfaces in many places; needs a small internal/services/internal/testdb package that exposes WithTx(t *testing.T, fn func(*sqlx.Tx)). Migration is mechanical, can happen alongside infill.

    Caveat: some service methods open their own transactions internally (approval_service.submit is one). Those keep DELETE cleanup; the tx harness is a default, not a mandate.

  2. Make TEST_DATABASE_URL mandatory in CI. Today these tests are skipped on every machine that doesn't export TEST_DATABASE_URL=…, i.e. they don't run on automatic pipelines because there's no pipeline. Once CI exists (§3.2), it becomes a required env var.

Tool: stdlib testing + sqlx (already in go.mod). No mocks at the service↔DB boundary. This is m's hardest line — see global CLAUDE.md memory pattern and t-paliad-036 (the bug that masked two other bugs would have been caught instantly by a real-DB test).

Where to invest first: Approval (already heavy), Projection (already heavy), Fristenrechner (already heavy), DeadlineService Create/Update/Complete/Delete with pending_request_id interplay, AppointmentService same, ProjectService visibility predicate, CalDAV push (the four CalDAV *.go files have zero direct test).

Coverage target: Every service method that mutates the DB has at least one happy-path live-DB test. RLS predicate (visibilityPredicatePositional) has one test per role (global_admin, member, non-member).

Layer 5 — Handler integration (httptest + real DB)

What: Spin a real services.DBService, mount the protected mux, drive httptest.NewRequest + ServeHTTP against it. Auth via a fake session cookie produced by a testauth.Login(t, userID) helper that mints the same Supabase JWT shape auth.UserIDFromContext expects.

Why: The 53 untested handlers are where the request shape ↔ service interaction lives. Examples that would have caught real bugs:

  • t-paliad-036's "/projects/{id} 404 while /api/projects/{id} 200" mismatch — a 5-line handler test would have failed before the migration ran.
  • mig 020's three-stacked bug — a handler test that POSTs a deadline and asserts a 200 + read-back row would have failed at submit-time, not boot-time.
  • The audit-log query timezone bug — handler test asserts the JSON contains the expected event_date.

Tool: stdlib net/http/httptest. No new framework. Pattern: handler tests live next to the handler file (internal/handlers/deadlines_test.go next to deadlines.go).

Coverage target: Every handler that gates a state-changing route — POST/PATCH/DELETE flavour. Plus GET handlers that compose a non-trivial query (dashboard, agenda, search, audit-log).

Layer 6 — End-to-end (Playwright)

What: A small Playwright suite (~10 flows) committed at e2e/ with a bun run e2e entry. Targets a local ./paliad against a scratch Postgres (the same TEST_DATABASE_URL). Each test logs in, drives the UI through one user journey, asserts visible state.

Why ~10 not 100: Per-PR budget caps at ~2 min total (§6 Q1). Playwright tests are the most expensive minute-per-confidence in this stack; they pay for themselves on the golden path and nothing else. The deep-coverage layer is L5; E2E is "is the app still alive end to end?".

Tool: playwright (npm; bun installs cleanly). No third-party test runner — Playwright ships its own. Tests live in e2e/*.spec.ts. Not bun:test. Playwright's runner is purpose-built for browser-driving and integrates with their tracing — don't fight it.

Cap: 10 flows. If a new test wants in, an existing one must drop out (or we have a real reason to widen). This is the cheapest discipline available: it forces the suite to remain a smoke pass, not a regression-test dumping ground.

Coverage target: See §4.


3. Tooling — concrete picks per layer

| Layer | Tool | Already in deps? | Install? |
|---|---|---|---|
| L0: migration dry-run | stdlib testing + migrate/v4 | yes | no |
| L1: Go unit | stdlib testing | yes | no |
| L2: Frontend unit | bun:test | yes (built into bun) | no |
| L3: Frontend DOM | bun:test + happy-dom | bun yes, happy-dom new | bun add -d happy-dom (one dep, ~200 KB) |
| L4: Service live-DB | stdlib + sqlx | yes | no |
| L5: Handler integration | stdlib net/http/httptest + sqlx | yes | no |
| L6: E2E | @playwright/test | new | bun add -d @playwright/test + npx playwright install chromium |

Net new deps: 2 (happy-dom + playwright). Both are mainstream, both have small surface area, both align with bun's ecosystem.

Explicit rejects:

  • testify — current tests read cleanly with stdlib; adding it forces a rename pass nobody wants.
  • vitest — bun's built-in test runner is faster and the tests are already in bun:test shape.
  • dockertest / testcontainers-go — m's preference is real-DB tests against the existing Postgres; spinning ephemeral Docker Postgres per package run adds latency and surface area for marginal isolation gain. See Q3.
  • sqlmock / gomock for DB — banned by §0 lesson 1.
  • cypress — Playwright is the better tool today, and the team's existing skill (/mai-tester) already uses it.

3.1 Per-PR run-time budget

Target (subject to m's call in Q1): ≤ 90 s for the gating tier (L0+L1+L2+L4 subset+L5 happy-path), ≤ 4 min for the full suite (add L3+L4 full+L6). The gating tier blocks merge; the full suite blocks deploy.

Indicative times (estimated, validate when slice 1 lands):

| Tier | Layers | Est. time | Blocks |
|---|---|---|---|
| Gate (every PR) | L0 + L1 + L2 + L5 happy-path + L4 critical | 60–90 s | merge |
| Full (every merge to main) | + L4 full + L3 + L6 | 3–4 min | deploy |

3.2 CI — proposal, not commitment

paliad has no CI today. Two routes:

  • Gitea Actions (m's stack already runs mgit.msbls.de). Self-hosted; same auth model as the rest of mAi. Adds a .gitea/workflows/test.yml. Postgres comes from a service container.
  • Stay click-deploy. No CI. Workers run tests locally; Dokploy auto-deploys on green-main convention.

Recommendation: Gitea Actions for the gate tier only (L0 + L1 + L2), driven by a single short workflow. The L3-L6 expansion can be a follow-up once the gate tier proves stable. Deferred to Q2 for m's call.

3.3 Test DB — live YouPC vs ephemeral

The paliad schema lives on the shared YouPC Postgres (port 11833). Three options:

| Option | Pros | Cons |
|---|---|---|
| Per-developer separate DB on YouPC (TEST_DATABASE_URL per laptop) | Closest to prod; existing pattern. | Cleanup discipline matters; cross-developer contention possible. |
| Ephemeral docker postgres per CI run | Full isolation; parallel-safe; reset for free. | New infra; ~5 s container startup per CI invocation. |
| Dedicated test DB on a paliad-only Postgres | Isolated; cheap. | New infra to maintain. |

Recommendation: option 1 for developers (no-op change), option 2 for CI (Gitea Actions postgres service container). Deferred to Q3 for m's call.

3.4 Coverage targets

Don't gate on percentage. Gate on critical-path coverage (§4). Add go test -coverprofile= output to CI for visibility, not as a merge gate. Coverage % gating produces tests-for-tests'-sake; we want the tests that catch the bugs we've shipped.


4. Critical journeys — what MUST be covered

These are the golden-path flows. Anything not on this list is L1-L5 territory, not L6. The list is intentionally short; if it grows beyond 10, we are doing E2E wrong.

| # | Flow | Why it's critical | Layer mix |
|---|---|---|---|
| 1 | Login → dashboard renders → traffic-light counts match | Every user does this every day; broken auth = paliad is offline. | L6 (Playwright) + L5 handler (auth.go) |
| 2 | Create project (Client → Litigation → Patent → Case) | Hierarchy with team inheritance: the data model's spine. | L6 + L5 + L4 (project_service) |
| 3 | Submit deadline → routes to /inbox → approver approves → state flips | The 4-eye flow (t-paliad-138). Most-mutated paliad surface. | L6 + L5 (deadlines, approvals) + L4 (approval_service) |
| 4 | Fristenrechner: pick proceeding → cascade fires → result shows | The platform's flagship interactive tool. JS cascade. | L6 + L3 (fristenrechner cascade) + L4 (fristenrechner) |
| 5 | SmartTimeline: anchor a projected row → predecessor-missing-error handled | Recent Slice-2 work (t-paliad-173 / #31). High-touch surface. | L6 + L3 (shape-timeline) + L4 (projection_service) |
| 6 | CalDAV sync: PUT a Termin → external client sees it, edits there → pull reconciles | Owned-event semantics + foreign-UID skip rule from Phase F. Untested today. | L4 (caldav_service push/pull), gated on Q3 (live YouPC vs ephemeral) |
| 7 | Paliadin chat: anon visit hits 404; m's session opens widget; turn renders | Owner-gated /paliadin is the only m-only surface. Quiet failures here are silent. | L6 (smoke) + L5 (paliadin_suggest) + L4 (paliadin / aichat_paliadin) |
| 8 | /admin/rules: filter → edit one rule → lifecycle transition → audit log row | Rules drive the cascade; bad edits break every user's fristenrechner. | L6 + L5 (admin_rules) + L4 (rule_editor_service) |
| 9 | Onboarding: new user with allowed email → onboarding form → first project membership | The new-user funnel; gateOnboarded middleware traps. | L6 + L5 (onboarding, invite) |
| 10 | Migration boot smoke: spin paliad against an empty DB → server binds 8080 | Catches every mig-N crash-loop. | L0 (migration dry-run) + L4 boot-smoke variant |

Picks 1, 3, 4 and 10 are the highest-value-per-cost — they cover the routes most regressions land on (auth, mutation, cascade, boot).


5. Slice plan — tracer-bullet roll-out

Each slice is a shippable PR with a concrete deliverable, in order of expected outage-prevention payoff. Sized for a single coder shift unless flagged. No slice depends on a later one being merged. Hour estimates intentionally omitted (per global CLAUDE.md).

Slice 1 — Migration dry-run harness + boot smoke (highest leverage)

Branch: mai/<coder>/test-strategy-slice-1-migrations

Deliverable:

  • internal/db/migrate_test.go: TestMigrations_DryRun (per-mig BEGIN/ROLLBACK), TestMigrations_EndToEnd (full apply, then re-apply latest to assert idempotency), TestMigrations_Down (apply N→0).
  • Makefile with make verify-migrations (the gate target), make test (run everything), make test-go, make test-frontend.
  • cmd/server/main_paliadin_backend_test.go already exists; extend with a TestMain_BindsHTTPAfterMigrate that boots the full server against TEST_DATABASE_URL, asserts :8080 is listening, then shuts down. Catches the mig-098-class crash-loop in a single test.
  • README section: how to set TEST_DATABASE_URL locally.

Catches: Every mig-098-class crash-loop; every drop-cascade-with-stale-policy-name regression (t-paliad-036).

Slice 2 — Service-layer infill: critical mutators

Branch: mai/<coder>/test-strategy-slice-2-services

Deliverable:

  • Test files for the three highest-impact untested services:
    • internal/services/agenda_service_test.go (live-DB, dashboard agenda query)
    • internal/services/dashboard_service_test.go (traffic-light counts)
    • internal/services/team_service_test.go (membership + inheritance — RLS-load-bearing)
  • Tighten existing approval_service_test.go + deadline_service_test.go coverage of the create/update/complete/delete × pending-request matrix where there are demonstrable gaps.
  • Add internal/services/internal/testdb/withtx.go — the per-test tx harness (optional adoption; existing tests stay).

Catches: RLS regressions, approval interplay regressions, dashboard count drift after schema renames.

Slice 3 — Frontend bun:test setup + L2 infill

Branch: mai/<coder>/test-strategy-slice-3-frontend-unit

Deliverable:

  • frontend/package.json scripts.test = "bun test".
  • New tests under frontend/src/client/:
    • paliadin-context.test.ts (route table, entity extraction, selection truncation).
    • paliadin-starters.test.ts (every route ≥1 starter, every starter bilingual).
    • filter-bar/index.test.ts (chip render + active state — pure DOM-less helpers).
    • i18n key audit: frontend/scripts/i18n-audit.test.ts parses every data-i18n="…" from dist/ HTML and every t("…") call from src/, asserts both de and en resolve. Runs as part of bun test.
  • make test-frontend wires cd frontend && bun test.

Catches: i18n drift (untranslated key shipped to user), context-envelope contract drift (paliadin SKILL.md depends on it), starter-registry regressions.

Slice 4 — Playwright golden-path smoke

Branch: mai/<coder>/test-strategy-slice-4-e2e

Deliverable:

  • e2e/ directory at repo root.
  • playwright.config.ts pointing at http://localhost:8080 (paliad started by the test, not assumed).
  • Five Playwright *.spec.ts files covering critical journeys 1, 3, 4, 7, 9 from §4.
  • make e2e target that:
    1. starts paliad against TEST_DATABASE_URL,
    2. waits for :8080 to be live,
    3. runs npx playwright test,
    4. tears the server down.
  • bun add -d @playwright/test + npx playwright install chromium.

Catches: Auth regressions, deadline-mutation regressions, fristenrechner cascade regressions, owner-gated /paliadin leaks, onboarding-gate misbehaviour.

Slice 5 — Handler integration tests for the 5 most-touched routes

Branch: mai/<coder>/test-strategy-slice-5-handlers

Deliverable:

  • internal/handlers/auth_test.go extended with TestLogin_HappyPath + TestLogout_ClearsCookie (real DB).
  • internal/handlers/projects_test.go: TestProjectsCreate (POST 200, row inserted, audit emitted), TestProjectsGetByID_RespectsVisibility (404 for non-member).
  • internal/handlers/deadlines_test.go: TestDeadlinesCreate_TriggersApproval (verifies pending pill).
  • internal/handlers/appointments_test.go — same shape.
  • internal/handlers/paliadin_test.go: TestPaliadinPage_404ForNonOwner, TestPaliadinPage_200ForOwner.
  • Shared internal/handlers/testauth/testauth.go — mints a session cookie for userID so handler tests don't reinvent auth seeding.

Catches: Handler ↔ service wiring drift, visibility-predicate handler-side bugs (t-paliad-036 bug 2 was exactly this), owner-gate bypass.

Slice 6 — Frontend L3 (DOM) cascade tests

Branch: mai/<coder>/test-strategy-slice-6-frontend-dom

Deliverable:

  • bun add -d happy-dom.
  • DOM-driven tests for the three most-touched cascades:
    • client/fristenrechner.test.ts (cascade activate → row appears → date-set fires fetch).
    • client/shape-timeline.test.ts (lane render, track render, projected-row click).
    • client/filter-bar/index.test.ts (chip click toggles state, URL params update).

Catches: The whole class of "the function exists and is unit-tested but the cascade in the browser doesn't fire it" bugs. This is the layer that catches t-paliad-098 / 099 / 102 / 103.

Slice 7 — CI wiring (deferred — Q2 dependent)

Branch: mai/<coder>/test-strategy-slice-7-ci (gated on m's Q2 pick)

Deliverable:

  • .gitea/workflows/test.yml (or stay click-deploy if m picks that).
  • Gate tier runs on every PR; full suite runs on merge to main.
  • Postgres service container provides TEST_DATABASE_URL.
  • Slack/Gotify ping on red main.

Catches: Drift between "tests pass on my laptop" and prod reality.

Slice 8 — Coverage reporting + dashboard (lowest priority)

Branch: mai/<coder>/test-strategy-slice-8-coverage

Deliverable:

  • go test -coverprofile= aggregated into a single coverage.html.
  • Bun's coverage output similarly.
  • A docs/coverage.md index updated by CI.
  • Not a merge gate. Visibility only.

Catches: Slow drift; nice-to-have once the floor is in.

Slice order rationale

1, 4, 5 are the highest outage-prevention per LoC: migration dry-run kills crash-loops, E2E kills regressions, handler tests kill wiring drift. 2, 3, 6 widen the floor; 7-8 are infrastructure.


6. Open questions for m

These need m's call before any coder shift starts (or before specific slices start, where noted).

Q1 — Per-PR test-run budget

How long is acceptable to wait on the gate tier before merge?

  • 30 s — only L0 + L1 (no L2+ on the gate).
  • 60–90 s (recommended) — L0 + L1 + L2 + L5 happy-path + L4 critical.
  • 2 min — add L3 + L4 full.
  • 4+ min — add L6 (E2E on gate).

The pick determines whether E2E gates merge or only deploy.

Q2 — CI infrastructure

  • Gitea Actions (self-hosted, gate tier only, recommended) — minimal new infra; aligns with m's existing stack.
  • Stay click-deploy — workers run tests locally; merge discipline enforced by convention. Today's reality; we keep it.
  • Both: start with click-deploy, add Gitea Actions in Slice 7 once gate tier proves stable.

Q3 — Live-DB vs ephemeral docker Postgres for tests

  • Per-developer YouPC DB (current pattern) — closest to prod; existing tests work unchanged.
  • Ephemeral docker postgres in CI, YouPC for devs (recommended hybrid) — keeps local-dev simple, gives CI deterministic isolation.
  • YouPC everywhere — simplest, but parallel CI runs would contend.

Q4 — Coverage targets — % or critical-path?

  • Critical-path only (recommended) — §4's 10 flows + every state-mutating service method has a test. No % gate.
  • % gate — set a floor (e.g. 60 % lines, 50 % branches) and refuse merges below it.
  • Both — critical-path is mandatory, % is informational.

m's prior preference (memory pattern: "tests that catch real bugs > coverage theatre") points at critical-path-only. Confirming.

Q5 — Which slices land before paliad is "production-grade"?

paliad is already live at paliad.de and being used by HLC colleagues. "Production-grade" here means "next time someone ships, we don't go down."

Picks:

  • Slices 1 + 4 + 5 are the production-grade floor (recommended). Migration dry-run + golden-path E2E + handler integration tests cover the failure modes that hit prod since the rebrand.
  • Add Slice 2 + 3 + 6 as widening passes, on their own cadence.
  • Slice 7-8 are nice-to-haves.

Confirming the floor pick — and whether m wants all three to land before any new feature work, or whether they roll out alongside.

Q6 — Who owns each slice?

Recommendation: rotate coder slots so the same person isn't on every slice. Suggested assignment (head can override):

| Slice | Profile fit |
|---|---|
| 1 (migrations) | Backend-heavy coder (knuth, gauss, cronus). |
| 2 (service infill) | Backend-heavy coder; whoever owns approval/projection. |
| 3 (frontend unit) | Frontend-heavy coder. |
| 4 (Playwright E2E) | Cross-stack coder; ideally one familiar with /mai-tester. |
| 5 (handler integration) | Backend coder. |
| 6 (frontend DOM) | Frontend coder (same person as 3 makes sense). |

Inventor does not decide assignments; head + m do.


7. Out of scope (explicit)

  • No rewrite of any existing test. The 323 existing test functions stay. New tests use the new patterns; old tests are migrated only when their files are touched for unrelated reasons.
  • No third-party framework where stdlib + bun:test suffice (testify, vitest, etc. — see §3).
  • No mocks at the service↔DB boundary. This is the lock-in. Mocks lie; the live-DB tests we already have are paliad's most useful safety net.
  • No new feature work in this strategy. The doc proposes infra; feature scope is unchanged.
  • No retirement of the tests/smoke-*.md human-written reports. Those are great for one-shot regression hunts; they coexist with the automated suite.

8. Implementation notes for the eventual coder

(For whichever coder picks up a slice. Not exhaustive.)

  • Test-name collisions in Go's flat package namespace bite when a service grows N implementations. Memory note from t-paliad-194 already records this. Prefix tests with the service name (e.g. TestAichatPaliadin_RunTurn_… not TestRunTurn_…).
  • httptest.NewRequest does not URL-encode — use url.QueryEscape for any ?q=… argument. Memory note from t-paliad-026.
  • sqlx v1.4.0 Named parser strips one colon from ::uuid[] — known pitfall, repro lives at internal/services/project_service.go. Use CAST(... AS uuid[]) in new query strings.
  • Live-DB cleanup must delete FK-dependent (child) rows before the rows they reference. Order matters (auth.users last). Look at audit_service_test.go for the chain pattern.
  • paliad.paliad_schema_migrations tracker collision is documented but unresolved. Slice 1 should add a make reset-test-db target that drops both public.paliad_schema_migrations and paliad.paliad_schema_migrations to keep developers unblocked.
  • bun:test matchers are Jest-compatible: expect().toEqual(), expect().toHaveBeenCalled(), etc. No deps needed.
  • happy-dom does not implement every DOM method (notably some <dialog> semantics). If a cascade test fails on something missing, jsdom is the escape hatch.
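The `httptest.NewRequest` note above can be sketched as follows. The `/search` path and the helper name `newSearchRequest` are hypothetical; the point is only that the query value goes through `url.QueryEscape` before it is concatenated into the target.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"net/url"
)

// newSearchRequest builds a GET request whose q parameter survives spaces,
// ampersands, and non-ASCII characters, because the value is escaped before
// it is concatenated into the target string.
func newSearchRequest(q string) *http.Request {
	target := "/search?q=" + url.QueryEscape(q) // "/search" is a hypothetical route
	return httptest.NewRequest(http.MethodGet, target, nil)
}
```

Without the escape, an `&` in the value would start a new parameter and truncate q, and a space would corrupt the request line that `httptest.NewRequest` parses.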

9. Decision summary — pick list for m

| # | Question | Inventor recommends |
|---|---|---|
| Q1 | Per-PR budget | 60–90 s gate, 3–4 min full |
| Q2 | CI infra | Gitea Actions, gate tier only |
| Q3 | Test DB | YouPC for devs, ephemeral docker for CI |
| Q4 | Coverage target | Critical-path only, no % gate |
| Q5 | Production-grade floor | Slices 1 + 4 + 5 before new feature work |
| Q6 | Slice ownership | Rotate per profile; head decides |

If m's calls match inventor's, the implementer's brief writes itself: Slice 1 first, then 4 + 5 in parallel, then 2/3/6 as widening passes.


Status: DESIGN READY FOR REVIEW. Awaiting m go/no-go on §5 slice plan + §6 open questions before any coder shift starts.


10. m's decisions (2026-05-19, locked)

Walked through §6 with m via the AskUserQuestion interview (per head's 2026-05-19 workflow rule: inventor questions are resolved before parking, not after). Six picks locked, all matching inventor's recommendation.

| # | Question | m's answer | Effect on plan |
|---|---|---|---|
| Q1 | Per-PR test-run budget | Inventor's call (m deferred). Pick: 60–90 s gate, 3–4 min full. | Gate tier = L0 + L1 + L2 + L5 happy-path + L4 critical. L6 E2E gates deploy, not merge. |
| Q2 | CI infrastructure | Gitea Actions, gate tier only. | Slice 7 adds .gitea/workflows/test.yml running the gate tier; full suite stays on merge-to-main. |
| Q3 | Test DB topology | YouPC for devs + ephemeral docker for CI. | Local dev unchanged. Slice 7 wires Postgres service container in Gitea Actions. |
| Q4 | Coverage target | Critical-path only, no % gate. | §4's 10 flows + every state-mutating service method gets a test. Coverage % output is informational in Slice 8, never a merge gate. |
| Q5 | Production-grade floor | Slices 1 + 4 + 5 before new feature work. | These three land before any new paliad feature gets a coder shift. Slices 2, 3, 6 widen the floor on their own cadence. Slices 7-8 are nice-to-haves. |
| Q6 | Slice ownership | Head decides + rotate per profile. | Backend slices (1, 2, 5) → backend-heavy coder. Frontend slices (3, 6) → frontend-heavy coder. E2E (4) → cross-stack. Head picks at dispatch time. |

Implementer brief (post-m-decisions):

  1. Slice 1 starts first — migration dry-run harness + make verify-migrations + boot-smoke variant of cmd/server/main_paliadin_backend_test.go. Backend-heavy coder.
  2. Slice 4 + Slice 5 in parallel once Slice 1 is merged — Playwright golden-path (cross-stack coder, 5 specs) and handler integration (backend coder, auth/projects/deadlines/appointments/paliadin).
  3. Slice 7 (Gitea Actions wiring) follows once Slice 1 gate tier is proven locally.
  4. Slices 2, 3, 6 enter rotation alongside feature work — not blocking.
  5. Slice 8 (coverage reporting) lowest priority.

Status: DESIGN APPROVED — awaiting head's dispatch of Slice 1 coder shift.