m's 2026-05-19 picks via AskUserQuestion interview:
- Q1 budget: 60–90s gate, 3–4min full (inventor's call; m deferred)
- Q2 CI: Gitea Actions, gate tier only
- Q3 test DB: YouPC for devs + ephemeral docker for CI
- Q4 coverage: critical-path only, no % gate
- Q5 floor: Slices 1+4+5 before new feature work
- Q6 ownership: head decides + rotate per profile
All six matched inventor's recommendation. Slice 1 (migration dry-run + boot smoke) starts first; Slices 4+5 in parallel after.
Design — Paliad Test Strategy (production-grade)
Author: mendel (inventor)
Date: 2026-05-19
Task: t-paliad-213
Branch: mai/mendel/inventor-test-strategy
Status: DESIGN READY FOR REVIEW. No test files / Make targets / CI configs touched. Awaiting m go/no-go on §5 slice plan + §6 open questions before any coder shift.
0. TL;DR
Paliad has accidental test discipline today: 58 _test.go files / 323 test functions in Go (≈45 % of services tested, ≈12 % of handlers tested) and 4 frontend test files for 90+ client modules (≈4 %). There is no committed end-to-end suite and no CI: every smoke pass is human-driven via the manual reports in tests/. The mig 098 prod crash-loop, the t-paliad-036 triple-bug after the German→English rename, and a long tail of UX regressions (deadline-done modal, calendar column drift) would all have been caught by a 10-test boot-and-click smoke pass.
This design proposes a six-layer test pyramid with a concrete tool per layer (stdlib testing + bun's built-in bun:test + playwright for E2E — nothing third-party we don't already use). It pins three lessons paliad has paid for in commits:
- No mocks at the service↔DB boundary. Live-DB tests against a per-developer Postgres are the floor; in-memory mocks for paliad.* would have hidden every rename-after-DROP-CASCADE bug. Project preference is already in this direction (27/44 service tests are live-DB-gated); we double down rather than reverse.
- Migrations must dry-run before they merge. Every recent prod-down (mig 098, mig 020-after-rename, mig 099 audit_reason gap) was a migration that compiled, passed go test ./... (which skips without TEST_DATABASE_URL), and broke on first apply against the real schema. A make verify-migrations target that does BEGIN/apply/ROLLBACK in CI fixes the entire failure mode.
- Browser-shaped bugs need a browser. The fristenrechner cascade, shape-timeline render, calendar grid, inline paliadin widget: these are JS state machines. Bun's stdlib bun:test covers the pure parser/codec code; Playwright covers the auth-gated DOM. Don't try to substitute one for the other.
Six slices roll the strategy out as tracer-bullet PRs, each independently shippable. Slice 1 (migration dry-run harness) and Slice 4 (Playwright golden-path smoke) buy the most outage-prevention per LoC; the rest is widening proven patterns.
Six open questions for m at §6. Most surface a coverage-vs-cost trade-off — the picks that need m's call before any code lands are CI infrastructure choice (Q2), per-PR run-time budget (Q1), and live-DB-vs-dockerised Postgres (Q3).
1. Audit — what exists today
Counts taken on mai/mendel/inventor-test-strategy @ HEAD (2026-05-19, 100 migrations applied).
1.1 Go test inventory
| Package | Source files | Test files | Test functions | Notes |
|---|---|---|---|---|
| internal/services | 56 | 44 | ~200 | 26 live-DB-gated (TEST_DATABASE_URL), 18 pure-Go. 24 services have no test file at all (see §1.4). |
| internal/handlers | 59 | 7 | ~30 | Only auth-domain check, search, audit-parse, approval-error-mapping, redirects, verfahrensablauf-redirect, chart-404 covered. 53 handlers have no test file. |
| internal/auth | small | 2 | ~10 | Session middleware + require-admin. |
| internal/branding | small | 1 | small | Firm-name override. |
| internal/offices | small | 1 | small | Office enum. |
| internal/changelog | small | 1 | small | Pure parser. |
| internal/calc | small | 1 | small | Fees / fee tables. |
| cmd/server | 1 | 1 | small | main_paliadin_backend_test.go covers env-gate selection. |
| Total | 133 | 58 | 323 | |
go test ./... runs all 58 files. Without TEST_DATABASE_URL set, 27 of them silently skip their live-DB cases — the suite still passes, but coverage of mutation paths drops to near zero.
1.2 Frontend test inventory
| Path | Test files | Tested |
|---|---|---|
| frontend/src/client/filter-bar/url-codec.test.ts | 1 | FilterBar URL codec round-trip. |
| frontend/src/client/views/format.test.ts | 1 | Date/time formatters (regression for t-paliad-153). |
| frontend/src/client/views/shape-timeline-chart.test.ts | 1 | Chart layout pure function. |
| frontend/src/client/views/shape-timeline-cv.test.ts | 1 | Continuous-view shape layout. |
| Total | 4 | Out of ~90 client modules (frontend/src/client/*.ts). |
All four use bun's built-in bun:test (no extra dep). No DOM/jsdom tests. No Playwright. No bun test script in package.json (bun run build is the only script).
1.3 End-to-end / smoke
- tests/smoke-2026-04-25.md, tests/smoke-auth-2026-04-25.md, tests/smoke-auth-2026-04-26-cleanup.md: human-written reports with screenshots committed under tests/screenshots-*. No code. No re-runnable script.
- mai-tester skill uses Playwright for ad-hoc runs; nothing committed.
- No e2e/, no .gitea/workflows/, no .github/workflows/, no Makefile.
1.4 Critical service paths with no test file
These are internal/services/*.go for which no *_test.go sibling exists:
| Service | Risk class | Why it matters |
|---|---|---|
| caldav_service.go, caldav_client.go, caldav_crypto.go, caldav_ical.go | High | Per-user push/pull goroutines + AES-GCM at rest. One pure parser test (caldav_ical_timeline_test.go) exists but the service + crypto + WebDAV client are blind. |
| agenda_service.go | High | Dashboard agenda query; reused by /agenda page. Exercised transitively by visibility tests but no direct test. |
| dashboard_service.go | High | Traffic-light + summary counts. Same story: transitively covered via visibility, no direct test. |
| derivation_service.go | Medium | Project-tree derivation (the new t-paliad-194-era subtree machinery). |
| team_service.go | Medium | Team membership / inheritance. |
| partner_unit_service.go | Medium | Dezernat replacement (t-paliad-070). |
| party_service.go, note_service.go, link_service.go, checklist_instance_service.go | Medium | All do project-scoped CRUD with the same RLS+audit pattern that t-paliad-036 proved easy to break. |
| appointment_service.go | High | Hot: every calendar mutation. Exercised through approval tests but has no own test file. |
| view_service.go | Medium | Powers the substrate (/views/*). |
| paliadin_jwt.go | Medium | Per-turn JWT mint for the aichat path (t-paliad-194). No call sites in tests today. |
| markdown.go | Low | Glossary + checklist content render. |
1.5 Handlers with no test file
53 of 59. Notably: auth.go itself (login / logout / session creation), projects.go (the most-mutated entity), deadlines.go / appointments.go (writes), paliadin.go / paliadin_suggest.go (m-only routes — never click-tested), fristenrechner.go / fristenrechner_search.go / fristenrechner_event_categories.go (the cascade users live in), dashboard.go / agenda.go (landing), onboarding.go / onboarding_gate.go (every new user's first three minutes), invite.go (rate-limited write path). The currently-tested handlers (search, audit-parse, approval error mapping, etc.) are the cheap pure-Go ones; every handler that touches the DB is untested at handler level.
1.6 Live-DB test scaffold — is it sound?
The pattern (read from internal/services/visibility_test.go):
url := os.Getenv("TEST_DATABASE_URL")
if url == "" { t.Skip("TEST_DATABASE_URL not set — skipping live DB test") }
if err := db.ApplyMigrations(url); err != nil { t.Fatalf(...) }
pool, _ := sqlx.Connect("postgres", url)
defer pool.Close()
// per-test seed + cleanup via DELETE + defer cleanup()
Verdict: sound, but has rough edges that need addressing before we widen.
- ✅ Migrations apply at test startup against the test DB, which catches every "you forgot to add a CHECK" / "you reference a column that doesn't exist" before a real-DB-touching test runs.
- ✅ Per-test cleanup via DELETE FROM ... WHERE id IN ($1,...) is explicit and idempotent.
- ✅ The paliad.paliad_schema_migrations tracker collision noted in memory 0b900afa… is a pre-existing issue, not introduced by this design.
- ⚠️ Cleanup-via-DELETE is fragile: a test that creates a row referenced by FK from another table needs to remember to clean both. A few existing tests (see audit_service_test.go) already chain 5+ DELETEs.
- ⚠️ Tests can't run in parallel against the same TEST_DATABASE_URL because they share schema state. go test ./... runs the test binaries of different packages in parallel by default (-p), so live-DB tests in different packages with overlapping cleanup IDs can interfere; tests within one package run serially unless they opt in with t.Parallel().
- ⚠️ No CI today actually exercises TEST_DATABASE_URL, so every live-DB test is effectively run only on the author's laptop or not at all. Half the value is paid-for but unbilled.
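The silent-skip problem in the last bullet can be narrowed by making the gate environment-aware. The following is a minimal sketch, not paliad's code: decideGate and the gateAction values are hypothetical names, and the assumption that the CI runner exports CI=true needs checking against the runner actually used.

```go
package main

import (
	"fmt"
	"os"
)

// gateAction says what a live-DB test should do before touching Postgres.
type gateAction string

const (
	actionRun  gateAction = "run"
	actionSkip gateAction = "skip"
	actionFail gateAction = "fail"
)

// decideGate is a hypothetical helper. Locally, a missing TEST_DATABASE_URL
// skips the test (today's behaviour); under CI it is treated as a failure so
// the suite cannot pass with its live-DB cases silently skipped.
func decideGate(dbURL string, inCI bool) gateAction {
	switch {
	case dbURL != "":
		return actionRun
	case inCI:
		return actionFail
	default:
		return actionSkip
	}
}

func main() {
	// Assumption: the CI runner exports CI=true; adjust to what it really sets.
	inCI := os.Getenv("CI") == "true"
	fmt.Println(decideGate(os.Getenv("TEST_DATABASE_URL"), inCI))
}
```

In a real test file the result would be mapped onto t.Skip for actionSkip and t.Fatal for actionFail, keeping the decision itself a pure function that is trivial to table-test.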
1.7 Migration tooling
- internal/db/migrate.go embeds migrations/*.sql and applies on server boot via golang-migrate/v4 with the paliad_schema_migrations tracker in public schema.
- 100 migrations on disk (001→100).
- No dry-run gate today. A bad migration breaks paliad.de at boot (Dokploy crash-loops the container). Recent prod incidents: mig 098 (submission code rename), mig 099 (with_po flag drop missed audit_reason gap), mig 020 (function rename without body rewrite; see memory 49a05cfa…).
- down.sql exists for every migration but no test ever exercises it.
1.8 CI / deploy loop
- No CI. Push-to-main → Gitea webhook → Dokploy auto-builds the Dockerfile and replaces the container. The Dockerfile runs bun run build then go build. Neither go test nor bun test runs in the build pipeline.
- Pre-commit hooks: none in repo. Each worker runs go build / go vet / go test / bun run build by convention (see memories: every shipped task report ends with "build hygiene held").
2. Test pyramid — recommended shape
┌─────────────────┐
│ E2E (Playwright)│ ~10 flows
│ L6 │
└─────────────────┘
┌─────────────────────────┐
│ Handler integration │ ~30 routes
│ L5 (httptest + real DB)│
└─────────────────────────┘
┌──────────────────────────────────┐
│ Service-layer (live DB) │ ~60 tests
│ L4 (BEGIN/ROLLBACK harness) │
└──────────────────────────────────┘
┌──────────────────────────────────────────┐
│ Frontend DOM / cascade (bun:test+jsdom) │ ~15 modules
│ L3 │
└──────────────────────────────────────────┘
┌──────────────────────────────────────────────────────┐
│ Frontend unit (bun:test pure TS) │ ~30 modules
│ L2 │
└──────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ Go unit (stdlib testing, table-driven, pure functions) │ ~150 tests
│ L1 │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ Migration dry-run (make verify-migrations) │ 100 mig
│ L0 — gate on every PR │
└──────────────────────────────────────────────────────────────┘
Layer 0 — Migration dry-run
What: Every *.up.sql in internal/db/migrations/ is applied inside a single BEGIN ... ROLLBACK transaction against a scratch Postgres, in numeric order. The harness asserts each statement succeeds and asserts no statement leaves the schema in a paliad_schema_migrations.dirty=true state. A second pass applies all up-migrations end-to-end (no rollback) and then re-applies the latest up-migration to assert idempotency (every paliad migration since t-paliad-070 has been written to be idempotent — this enforces it).
Tool: stdlib testing package, no third-party. Pattern: internal/db/migrate_test.go with a TestMigrations_DryRun driven from TEST_DATABASE_URL. A make verify-migrations target wraps it.
Why this layer matters most: Every recent prod-down was a migration. Catching them on a CI run before merge is the highest-leverage test investment paliad can make. Cost: one ~100-line Go file + one Postgres in CI.
Coverage target: 100 % of *.up.sql files. Hard gate on PR — no exceptions.
Layer 1 — Go unit (pure)
What: go test ./... against pure functions — formatters, parsers, validators, calculators, fee tables, deadline calculators, projection lookahead clamping, codec round-trips. No DB, no HTTP.
Tool: stdlib testing. Table-driven cases := []struct{...}{...} style is already the house pattern (see auth_test.go / projection_anchor_test.go). Do not introduce testify or any matcher library — the current code reads cleanly without one, and 323 existing test functions don't need a rename pass.
What's already there: 19 pure-Go test files (calculator, mapping, codec, holiday, fees, etc.). Density is good; targeted infill rather than re-architecture.
Coverage target: Every pure function in internal/services/, internal/handlers/, internal/calc/, internal/changelog/. Aim for "every branch in a decision table has at least one test row." Don't chase % — chase "the obvious edge that would burn a coworker".
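As an illustration of "every branch in a decision table has at least one test row", here is a minimal table-driven sketch. clampLookahead is a hypothetical stand-in for the kind of pure helper L1 targets (the bounds 1 and 365 are invented for the example), and the loop is written as a plain function so the block runs on its own.

```go
package main

import "fmt"

// clampLookahead is a hypothetical pure function: it bounds a requested
// lookahead window (in days) to the closed range [lo, hi].
func clampLookahead(days, lo, hi int) int {
	if days < lo {
		return lo
	}
	if days > hi {
		return hi
	}
	return days
}

// clampCase is one row of the decision table.
type clampCase struct {
	name string
	days int
	want int
}

// One row per branch, plus the two boundary values.
var clampCases = []clampCase{
	{"below floor", -5, 1},
	{"at floor", 1, 1},
	{"inside range", 30, 30},
	{"at ceiling", 365, 365},
	{"above ceiling", 1000, 365},
}

// failures runs every row and returns one message per mismatch. In a real
// _test.go file the loop body would use t.Run and t.Errorf instead.
func failures() []string {
	var out []string
	for _, c := range clampCases {
		if got := clampLookahead(c.days, 1, 365); got != c.want {
			out = append(out, fmt.Sprintf("%s: got %d, want %d", c.name, got, c.want))
		}
	}
	return out
}

func main() {
	fmt.Println(len(failures()), "failing rows")
}
```

Adding a new edge case is then a one-line change to the table, which is the property that makes stdlib-only tests stay readable without a matcher library.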
Layer 2 — Frontend unit (pure)
What: bun test against pure TS modules — URL codecs (filter-bar/url-codec), formatters, parsers, i18n key correctness (every data-i18n attribute used in TSX has a key in i18n.ts), view-spec parsers, projection-row mapping helpers.
Tool: bun:test (built into bun, no install). Already in use in 4 files — extend the same pattern. Add bun test to package.json scripts.
What to add:
- i18n key audit (every t("foo.bar") and data-i18n="foo.bar" resolves in both de and en).
- filter-bar/ types + render helpers (paliad has shipped 4 FilterBar slices; coverage is one codec test).
- paliadin-context.ts route table + entity extraction (the [ctx …] envelope is a stable contract paliadin's SKILL.md depends on; any drift here is a silent failure).
- paliadin-starters.ts registry: every route maps to ≥1 starter; every starter is bilingual.
- View-spec parsers in views/.
Coverage target: Every pure TS module in frontend/src/client/. Pages (TSX renderers) are E2E concern, not unit concern.
Layer 3 — Frontend DOM (cascade / jsdom)
What: bun test with jsdom global, exercising the interactive cascade modules — the fristenrechner cascade builder, the shape-timeline render, the FilterBar UI (chips, panels), the calendar grid, the inline Paliadin widget message stream, the inbox-row click handler, the dashboard activity item navigation.
These modules contain enough state that pure-function tests miss real bugs (e.g. the t-paliad-098 .entity-table row-cursor lie was a CSS+DOM bug; t-paliad-099's modal close was a DOM-event bug; t-paliad-103's ::before overlay click-swallow was a DOM bug).
Tool: bun + happy-dom is the lighter choice; if it can't handle event ordering, fall back to jsdom. Both are ESM-clean and bun-friendly. Pick one and stick with it — running both means twice the dependency surface. Default pick: happy-dom (smaller, paliad doesn't need legacy IE semantics).
Pattern: import the cascade module, build a minimal DOM (document.body.innerHTML = …), dispatch synthetic events, assert resulting state. Reuses the production renderers — no test-only fakes.
Coverage target: ~15 modules. Specifically:
- client/filter-bar/index.ts chip render + active-state.
- client/fristenrechner.ts cascade: most complex JS in the codebase; depend chains light up every UPC bug we know.
- client/shape-timeline.ts lane mode + track mode (envelope wire shape brittle to refactor).
- client/projects-detail.ts row click + Verlauf render.
- client/paliadin-widget.ts + paliadin-context.ts interaction.
- client/inbox.ts row-action click routing.
- client/dashboard.ts activity-item nav.
- client/deadlines-calendar.ts / appointments-calendar.ts column layout (the calendar-column-drift bug class).
Not unit tests; not E2E. They are the missing middle.
Layer 4 — Service-layer (live DB)
What: Go service methods against a real Postgres, using the existing TEST_DATABASE_URL pattern. Two improvements:
- Replace per-test DELETE cleanup with a per-test transaction harness: open a transaction, run the test inside it, ROLLBACK. Faster, isolating, no cleanup forgotten. Already viable because the service layer accepts *sqlx.DB-or-tx-shaped interfaces in many places; needs a small internal/services/internal/testdb package that exposes WithTx(t *testing.T, fn func(*sqlx.Tx)). Migration is mechanical, can happen alongside infill.
  Caveat: some service methods open their own transactions internally (approval_service.submit is one). Those keep DELETE cleanup; the tx harness is a default, not a mandate.
- Make TEST_DATABASE_URL mandatory in CI. Today these tests are skipped on every machine that doesn't export TEST_DATABASE_URL=…, i.e. they don't run on automatic pipelines because there's no pipeline. Once CI exists (§3.2), it becomes a required env var.
Tool: stdlib testing + sqlx (already in go.mod). No mocks at the service↔DB boundary. This is m's hardest line — see global CLAUDE.md memory pattern and t-paliad-036 (the bug that masked two other bugs would have been caught instantly by a real-DB test).
Where to invest first: Approval (already heavy), Projection (already heavy), Fristenrechner (already heavy), DeadlineService Create/Update/Complete/Delete with pending_request_id interplay, AppointmentService same, ProjectService visibility predicate, CalDAV push (the four CalDAV *.go files have zero direct test).
Coverage target: Every service method that mutates the DB has at least one happy-path live-DB test. RLS predicate (visibilityPredicatePositional) has one test per role (global_admin, member, non-member).
Layer 5 — Handler integration (httptest + real DB)
What: Spin a real services.DBService, mount the protected mux, drive httptest.NewRequest + ServeHTTP against it. Auth via a fake session cookie produced by a testauth.Login(t, userID) helper that mints the same Supabase JWT shape auth.UserIDFromContext expects.
Why: The 53 untested handlers are where the request shape ↔ service interaction lives. Examples that would have caught real bugs:
- t-paliad-036's "/projects/{id} 404 while /api/projects/{id} 200" mismatch: a 5-line handler test would have failed before the migration ran.
- mig 020's three-stacked bug: a handler test that POSTs a deadline and asserts a 200 + read-back row would have failed at submit-time, not boot-time.
- The audit-log query timezone bug: handler test asserts the JSON contains the expected event_date.
Tool: stdlib net/http/httptest. No new framework. Pattern: handler tests live next to the handler file (internal/handlers/deadlines_test.go next to deadlines.go).
Coverage target: Every handler that gates a state-changing route — POST/PATCH/DELETE flavour. Plus GET handlers that compose a non-trivial query (dashboard, agenda, search, audit-log).
Layer 6 — End-to-end (Playwright)
What: A small Playwright suite (~10 flows) committed at e2e/ with a bun run e2e entry. Targets a local ./paliad against a scratch Postgres (the same TEST_DATABASE_URL). Each test logs in, drives the UI through one user journey, asserts visible state.
Why ~10 not 100: Per-PR budget caps at ~2 min total (§6 Q1). Playwright tests are the most expensive minute-per-confidence in this stack; they pay for themselves on the golden path and nothing else. The deep-coverage layer is L5; E2E is "is the app still alive end to end?".
Tool: playwright (npm; bun installs cleanly). No third-party test runner — Playwright ships its own. Tests live in e2e/*.spec.ts. Not bun:test. Playwright's runner is purpose-built for browser-driving and integrates with their tracing — don't fight it.
Cap: 10 flows. If a new test wants in, an existing one must drop out (or we have a real reason to widen). This is the cheapest discipline available: it forces the suite to remain a smoke pass, not a regression-test dumping ground.
Coverage target: See §4.
3. Tooling — concrete picks per layer
| Layer | Tool | Already in deps? | Install? |
|---|---|---|---|
| L0: migration dry-run | stdlib testing + migrate/v4 | yes | no |
| L1: Go unit | stdlib testing | yes | no |
| L2: Frontend unit | bun:test | yes (built into bun) | no |
| L3: Frontend DOM | bun:test + happy-dom | bun yes, happy-dom new | bun add -d happy-dom (one dep, ~200 KB) |
| L4: Service live-DB | stdlib + sqlx | yes | no |
| L5: Handler integration | stdlib net/http/httptest + sqlx | yes | no |
| L6: E2E | @playwright/test | new | bun add -d @playwright/test + npx playwright install chromium |
Net new deps: 2 (happy-dom + playwright). Both are mainstream, both have small surface area, both align with bun's ecosystem.
Explicit rejects:
- ❌ testify: current tests read cleanly with stdlib; adding it forces a rename pass nobody wants.
- ❌ vitest: bun's built-in test runner is faster and the tests are already in bun:test shape.
- ❌ dockertest / testcontainers-go: m's preference is real-DB tests against the existing Postgres; spinning ephemeral Docker Postgres per package run adds latency and surface area for marginal isolation gain. See Q3.
- ❌ sqlmock / gomock for DB: banned by §0 lesson 1.
- ❌ cypress: Playwright is the better tool today, and the team's existing skill (/mai-tester) already uses it.
3.1 Per-PR run-time budget
Target (subject to m's call in Q1): ≤ 90 s for the gating tier (L0+L1+L2+L4 subset+L5 happy-path), ≤ 4 min for the full suite (add L3+L4 full+L6). The gating tier blocks merge; the full suite blocks deploy.
Indicative times (estimated, validate when slice 1 lands):
| Tier | Layers | Est. time | Blocks |
|---|---|---|---|
| Gate (every PR) | L0 + L1 + L2 + L5 happy-path + L4 critical | 60–90 s | merge |
| Full (every merge to main) | + L4 full + L3 + L6 | 3–4 min | deploy |
3.2 CI — proposal, not commitment
paliad has no CI today. Two routes:
- Gitea Actions (m's stack already runs mgit.msbls.de). Self-hosted; same auth model as the rest of mAi. Adds a .gitea/workflows/test.yml. Postgres comes from a service container.
- Stay click-deploy. No CI. Workers run tests locally; Dokploy auto-deploys on green-main convention.
Recommendation: Gitea Actions for the gate tier only (L0 + L1 + L2), driven by a single short workflow. The L3-L6 expansion can be a follow-up once the gate tier proves stable. Deferred to Q2 for m's call.
3.3 Test DB — live YouPC vs ephemeral
The paliad schema lives on the shared YouPC Postgres (port 11833). Three options:
| Option | Pros | Cons |
|---|---|---|
| Per-developer separate DB on YouPC (TEST_DATABASE_URL per laptop) | Closest to prod; existing pattern. | Cleanup discipline matters; cross-developer contention possible. |
| Ephemeral docker postgres per CI run | Full isolation; parallel-safe; reset for free. | New infra; ~5 s container startup per CI invocation. |
| Dedicated test DB on a paliad-only Postgres | Isolated; cheap. | New infra to maintain. |
Recommendation: option 1 for developers (no-op change), option 2 for CI (Gitea Actions postgres service container). Deferred to Q3 for m's call.
3.4 Coverage targets
Don't gate on percentage. Gate on critical-path coverage (§4). Add go test -coverprofile= output to CI for visibility, not as a merge gate. Coverage % gating produces tests-for-tests'-sake; we want the tests that catch the bugs we've shipped.
4. Critical journeys — what MUST be covered
These are the golden-path flows. Anything not on this list is L1-L5 territory, not L6. The list is intentionally short; if it grows beyond 10, we are doing E2E wrong.
| # | Flow | Why it's critical | Layer mix |
|---|---|---|---|
| 1 | Login → dashboard renders → traffic-light counts match | Every user does this every day; broken auth = paliad is offline. | L6 (Playwright) + L5 handler (auth.go) |
| 2 | Create project (Client → Litigation → Patent → Case) | Hierarchy with team inheritance — the data model's spine. | L6 + L5 + L4 (project_service) |
| 3 | Submit deadline → routes to /inbox → approver approves → state flips | The 4-eye flow (t-paliad-138). Most-mutated paliad surface. | L6 + L5 (deadlines, approvals) + L4 (approval_service) |
| 4 | Fristenrechner: pick proceeding → cascade fires → result shows | The platform's flagship interactive tool. JS cascade. | L6 + L3 (fristenrechner cascade) + L4 (fristenrechner) |
| 5 | SmartTimeline: anchor a projected row → predecessor-missing-error handled | Recent Slice-2 work (t-paliad-173 / #31). High-touch surface. | L6 + L3 (shape-timeline) + L4 (projection_service) |
| 6 | CalDAV sync: PUT a Termin → external client sees it, edits there → pull reconciles | Owned-event semantics + foreign-UID skip rule from Phase F. Untested today. | L4 (caldav_service push/pull) — gated on Q3 (live YouPC vs ephemeral) |
| 7 | Paliadin chat: anon visit hits 404; m's session opens widget; turn renders | Owner-gated /paliadin is the only m-only surface. Quiet failures here are silent. | L6 (smoke) + L5 (paliadin_suggest) + L4 (paliadin / aichat_paliadin) |
| 8 | /admin/rules: filter → edit one rule → lifecycle transition → audit log row | Rules drive the cascade; bad edits break every user's fristenrechner. | L6 + L5 (admin_rules) + L4 (rule_editor_service) |
| 9 | Onboarding: new user with allowed email → onboarding form → first project membership | The new-user funnel; gateOnboarded middleware traps. | L6 + L5 (onboarding, invite) |
| 10 | Migration boot smoke: spin paliad against an empty DB → server binds 8080 | Catches every mig-N crash-loop. | L0 (migration dry-run) + L4 boot-smoke variant |
Picks 1, 3, 4 and 10 are the highest-value-per-cost — they cover the routes most regressions land on (auth, mutation, cascade, boot).
5. Slice plan — tracer-bullet roll-out
Each slice is a shippable PR with a concrete deliverable, in order of expected outage-prevention payoff. Sized for a single coder shift unless flagged. No slice depends on a later one being merged. Hour estimates intentionally omitted (per global CLAUDE.md).
Slice 1 — Migration dry-run harness + boot smoke (highest leverage)
Branch: mai/<coder>/test-strategy-slice-1-migrations
Deliverable:
- internal/db/migrate_test.go: TestMigrations_DryRun (per-mig BEGIN/ROLLBACK), TestMigrations_EndToEnd (full apply, then re-apply latest to assert idempotency), TestMigrations_Down (apply N→0).
- Makefile with make verify-migrations (the gate target), make test (run everything), make test-go, make test-frontend.
- cmd/server/main_paliadin_backend_test.go already exists; extend with a TestMain_BindsHTTPAfterMigrate that boots the full server against TEST_DATABASE_URL, asserts :8080 is listening, then shuts down. Catches the mig-098-class crash-loop in a single test.
- README section: how to set TEST_DATABASE_URL locally.
Catches: Every mig-098-class crash-loop; every drop-cascade-with-stale-policy-name regression (t-paliad-036).
Slice 2 — Service-layer infill: critical mutators
Branch: mai/<coder>/test-strategy-slice-2-services
Deliverable:
- Test files for the three highest-impact untested services:
  - internal/services/agenda_service_test.go (live-DB, dashboard agenda query)
  - internal/services/dashboard_service_test.go (traffic-light counts)
  - internal/services/team_service_test.go (membership + inheritance; RLS-load-bearing)
- Tighten existing approval_service_test.go + deadline_service_test.go coverage of the create/update/complete/delete × pending-request matrix where there are demonstrable gaps.
- Add internal/services/internal/testdb/withtx.go: the per-test tx harness (optional adoption; existing tests stay).
Catches: RLS regressions, approval interplay regressions, dashboard count drift after schema renames.
Slice 3 — Frontend bun:test setup + L2 infill
Branch: mai/<coder>/test-strategy-slice-3-frontend-unit
Deliverable:
- frontend/package.json scripts.test = "bun test".
- New tests under frontend/src/client/:
  - paliadin-context.test.ts (route table, entity extraction, selection truncation).
  - paliadin-starters.test.ts (every route ≥1 starter, every starter bilingual).
  - filter-bar/index.test.ts (chip render + active state; pure DOM-less helpers).
- i18n key audit: frontend/scripts/i18n-audit.test.ts parses every data-i18n="…" from dist/ HTML and every t("…") call from src/, asserts both de and en resolve. Runs as part of bun test.
- make test-frontend wires cd frontend && bun test.
Catches: i18n drift (untranslated key shipped to user), context-envelope contract drift (paliadin SKILL.md depends on it), starter-registry regressions.
Slice 4 — Playwright golden-path smoke
Branch: mai/<coder>/test-strategy-slice-4-e2e
Deliverable:
- e2e/ directory at repo root.
- playwright.config.ts pointing at http://localhost:8080 (paliad started by the test, not assumed).
- Five Playwright *.spec.ts files covering critical journeys 1, 3, 4, 7, 9 from §4.
- make e2e target that:
  - starts paliad against TEST_DATABASE_URL,
  - waits for :8080 to be live,
  - runs npx playwright test,
  - tears the server down.
- bun add -d @playwright/test + npx playwright install chromium.
Catches: Auth regressions, deadline-mutation regressions, fristenrechner cascade regressions, owner-gated /paliadin leaks, onboarding-gate misbehaviour.
Slice 5 — Handler integration tests for the 5 most-touched routes
Branch: mai/<coder>/test-strategy-slice-5-handlers
Deliverable:
- internal/handlers/auth_test.go extended with TestLogin_HappyPath + TestLogout_ClearsCookie (real DB).
- internal/handlers/projects_test.go: TestProjectsCreate (POST 200, row inserted, audit emitted), TestProjectsGetByID_RespectsVisibility (404 for non-member).
- internal/handlers/deadlines_test.go: TestDeadlinesCreate_TriggersApproval (verifies pending pill).
- internal/handlers/appointments_test.go: same shape.
- internal/handlers/paliadin_test.go: TestPaliadinPage_404ForNonOwner, TestPaliadinPage_200ForOwner.
- Shared internal/handlers/testauth/testauth.go: mints a session cookie for userID so handler tests don't reinvent auth seeding.
Catches: Handler ↔ service wiring drift, visibility-predicate handler-side bugs (t-paliad-036 bug 2 was exactly this), owner-gate bypass.
Slice 6 — Frontend L3 (DOM) cascade tests
Branch: mai/<coder>/test-strategy-slice-6-frontend-dom
Deliverable:
- bun add -d happy-dom.
- DOM-driven tests for the three most-touched cascades:
  - client/fristenrechner.test.ts (cascade activate → row appears → date-set fires fetch).
  - client/shape-timeline.test.ts (lane render, track render, projected-row click).
  - client/filter-bar/index.test.ts (chip click toggles state, URL params update).
Catches: The whole class of "the function exists and is unit-tested but the cascade in the browser doesn't fire it" bugs. This is the layer that catches t-paliad-098 / 099 / 102 / 103.
Slice 7 — CI wiring (deferred — Q2 dependent)
Branch: mai/<coder>/test-strategy-slice-7-ci (gated on m's Q2 pick)
Deliverable:
- .gitea/workflows/test.yml (or stay click-deploy if m picks that).
- Gate tier runs on every PR; full suite runs on merge to main.
- Postgres service container provides TEST_DATABASE_URL.
- Slack/Gotify ping on red main.
Catches: Drift between "tests pass on my laptop" and prod reality.
Slice 8 — Coverage reporting + dashboard (lowest priority)
Branch: mai/<coder>/test-strategy-slice-8-coverage
Deliverable:
- go test -coverprofile= aggregated into a single coverage.html.
- Bun's coverage output similarly.
- A docs/coverage.md index updated by CI.
- Not a merge gate. Visibility only.
Catches: Slow drift; nice-to-have once the floor is in.
Slice order rationale
1, 4, 5 are the highest outage-prevention per LoC: migration dry-run kills crash-loops, E2E kills regressions, handler tests kill wiring drift. 2, 3, 6 widen the floor; 7-8 are infrastructure.
6. Open questions for m
These need m's call before any coder shift starts (or before specific slices start, where noted).
Q1 — Per-PR test-run budget
How long is acceptable to wait on the gate tier before merge?
- 30 s — only L0 + L1 (no L2+ on the gate).
- 60–90 s (recommended) — L0 + L1 + L2 + L5 happy-path + L4 critical.
- 2 min — add L3 + L4 full.
- 4+ min — add L6 (E2E on gate).
The pick determines whether E2E gates merge or only deploy.
Q2 — CI infrastructure
- Gitea Actions (self-hosted, gate tier only, recommended) — minimal new infra; aligns with m's existing stack.
- Stay click-deploy — workers run tests locally; merge discipline enforced by convention. Today's reality; we keep it.
- Both: start with click-deploy, add Gitea Actions in Slice 7 once gate tier proves stable.
Q3 — Live-DB vs ephemeral docker Postgres for tests
- Per-developer YouPC DB (current pattern) — closest to prod; existing tests work unchanged.
- Ephemeral docker postgres in CI, YouPC for devs (recommended hybrid) — keeps local-dev simple, gives CI deterministic isolation.
- YouPC everywhere — simplest, but parallel CI runs would contend.
Q4 — Coverage targets — % or critical-path?
- Critical-path only (recommended) — §4's 10 flows + every state-mutating service method has a test. No % gate.
- % gate — set a floor (e.g. 60 % lines, 50 % branches) and refuse merges below it.
- Both — critical-path is mandatory, % is informational.
m's prior preference (memory pattern: "tests that catch real bugs > coverage theatre") points at critical-path-only. Confirming.
Q5 — Which slices land before paliad is "production-grade"?
paliad is already live at paliad.de and being used by HLC colleagues. "Production-grade" here means "next time someone ships, we don't go down."
Picks:
- Slices 1 + 4 + 5 are the production-grade floor (recommended). Migration dry-run + golden-path E2E + handler integration tests cover the failure modes that hit prod since the rebrand.
- Add Slice 2 + 3 + 6 as widening passes, on their own cadence.
- Slice 7-8 are nice-to-haves.
Confirming the floor pick — and whether m wants all three to land before any new feature work, or whether they roll out alongside.
Q6 — Who owns each slice?
Recommendation: rotate coder slots so the same person isn't on every slice. Suggested assignment (head can override):
| Slice | Profile fit |
|---|---|
| 1 — migrations | Backend-heavy coder (knuth, gauss, cronus). |
| 2 — service infill | Backend-heavy coder; whoever owns approval/projection. |
| 3 — frontend unit | Frontend-heavy coder. |
| 4 — Playwright E2E | Cross-stack coder; ideally one familiar with /mai-tester. |
| 5 — handler integration | Backend coder. |
| 6 — frontend DOM | Frontend coder (same person as 3 makes sense). |
Inventor does not decide assignments; head + m do.
7. Out of scope (explicit)
- No rewrite of any existing test. The 323 existing test functions stay. New tests use the new patterns; old tests are migrated only when their files are touched for unrelated reasons.
- No third-party framework where stdlib + bun:test suffice (testify, vitest, etc. — see §3).
- No mocks at the service↔DB boundary. This is the lock-in. Mocks lie; the live-DB tests we already have are paliad's most useful safety net.
- No new feature work in this strategy. The doc proposes infra; feature scope is unchanged.
- No retirement of the tests/smoke-*.md human-written reports. Those are great for one-shot regression hunts; they coexist with the automated suite.
8. Implementation notes for the eventual coder
(For whichever coder picks up a slice. Not exhaustive.)
- Test-name collisions in Go's flat package namespace bite when a service grows N implementations. Memory note from t-paliad-194 already records this. Prefix tests with the service name (e.g. TestAichatPaliadin_RunTurn_… not TestRunTurn_…).
- httptest.NewRequest does not URL-encode; use url.QueryEscape for any ?q=… argument. Memory note from t-paliad-026.
- sqlx v1.4.0 Named parser strips one colon from ::uuid[]. Known pitfall; repro lives at internal/services/project_service.go. Use CAST(... AS uuid[]) in new query strings.
- Live-DB cleanup must DELETE FKs first. Order matters (auth.users last). Look at audit_service_test.go for the chain pattern.
- paliad.paliad_schema_migrations tracker collision is documented but unresolved. Slice 1 should add a make reset-test-db target that drops both public.paliad_schema_migrations and paliad.paliad_schema_migrations to keep developers unblocked.
- bun:test matchers are Jest-compatible: expect().toEqual(), expect().toHaveBeenCalled(), etc. No deps needed.
- happy-dom does not implement every DOM method (notably some <dialog> semantics). If a cascade test fails on something missing, jsdom is the escape hatch.
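The httptest.NewRequest note is easy to reproduce. The sketch below uses a hypothetical searchHandler that echoes the q parameter; the handler, the /search route, and the helper names are illustrative and not taken from paliad's code. It contrasts an escaped query with a raw one containing an ampersand.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// searchHandler is a hypothetical handler that echoes the q query parameter.
func searchHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, r.URL.Query().Get("q"))
}

// getSearch escapes q before building the request target, so the handler
// sees the value exactly as the caller wrote it.
func getSearch(q string) string {
	req := httptest.NewRequest(http.MethodGet, "/search?q="+url.QueryEscape(q), nil)
	rec := httptest.NewRecorder()
	searchHandler(rec, req)
	return rec.Body.String()
}

// getSearchRaw shows the pitfall: the value is pasted into the target as-is,
// so an "&" inside it is parsed as a parameter separator.
func getSearchRaw(q string) string {
	req := httptest.NewRequest(http.MethodGet, "/search?q="+q, nil)
	rec := httptest.NewRecorder()
	searchHandler(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Println(getSearch("Müller & Söhne")) // handler receives the full value
	fmt.Println(getSearchRaw("a&b"))          // handler receives only "a"
}
```

Building the target with url.Values and its Encode method would work equally well when a request carries several parameters.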
9. Decision summary — pick list for m
| # | Question | Inventor recommends |
|---|---|---|
| Q1 | Per-PR budget | 60–90 s gate, 3–4 min full |
| Q2 | CI infra | Gitea Actions, gate tier only |
| Q3 | Test DB | YouPC for devs, ephemeral docker for CI |
| Q4 | Coverage target | Critical-path only, no % gate |
| Q5 | Production-grade floor | Slices 1 + 4 + 5 before new feature work |
| Q6 | Slice ownership | Rotate per profile; head decides |
If m's calls match inventor's, the implementer's brief writes itself: Slice 1 first, then 4 + 5 in parallel, then 2/3/6 as widening passes.
Status: DESIGN READY FOR REVIEW. Awaiting m go/no-go on §5 slice plan + §6 open questions before any coder shift starts.
10. m's decisions (2026-05-19, locked)
Walked through §6 with m via the AskUserQuestion interview (per head's 2026-05-19 workflow rule: inventor questions are resolved before parking, not after). Six picks locked, all matching inventor's recommendation.
| # | Question | m's answer | Effect on plan |
|---|---|---|---|
| Q1 | Per-PR test-run budget | Inventor's call (m deferred). Pick: 60–90 s gate, 3–4 min full. | Gate tier = L0 + L1 + L2 + L5 happy-path + L4 critical. L6 E2E gates deploy, not merge. |
| Q2 | CI infrastructure | Gitea Actions, gate tier only. | Slice 7 adds .gitea/workflows/test.yml running the gate tier; full suite stays on merge-to-main. |
| Q3 | Test DB topology | YouPC for devs + ephemeral docker for CI. | Local dev unchanged. Slice 7 wires Postgres service container in Gitea Actions. |
| Q4 | Coverage target | Critical-path only, no % gate. | §4's 10 flows + every state-mutating service method gets a test. Coverage % output is informational in Slice 8, never a merge gate. |
| Q5 | Production-grade floor | Slices 1 + 4 + 5 before new feature work. | These three land before any new paliad feature gets a coder shift. Slices 2, 3, 6 widen the floor on their own cadence. Slices 7-8 are nice-to-haves. |
| Q6 | Slice ownership | Head decides + rotate per profile. | Backend slices (1, 2, 5) → backend-heavy coder. Frontend slices (3, 6) → frontend-heavy coder. E2E (4) → cross-stack. Head picks at dispatch time. |
Implementer brief (post-m-decisions):
- Slice 1 starts first: migration dry-run harness + make verify-migrations + boot-smoke variant of cmd/server/main_paliadin_backend_test.go. Backend-heavy coder.
- Slice 4 + Slice 5 in parallel once Slice 1 is merged: Playwright golden-path (cross-stack coder, 5 specs) and handler integration (backend coder, auth/projects/deadlines/appointments/paliadin).
- Slice 7 (Gitea Actions wiring) follows once Slice 1 gate tier is proven locally.
- Slices 2, 3, 6 enter rotation alongside feature work — not blocking.
- Slice 8 (coverage reporting) lowest priority.
Status: DESIGN APPROVED — awaiting head's dispatch of Slice 1 coder shift.