Commit Graph

2 Commits

Author SHA1 Message Date
mAi
7a359989a9 feat(db): t-paliad-218 — gap-tolerant migration runner with applied-set tracker
Replaces the golang-migrate single-counter tracker with a hand-rolled
runner over embed.FS that tracks applied state as a set in
paliad.applied_migrations (version PK, name, applied_at, checksum).

Closes the parallel-merge skip-hole the 2026-05-20 mig-103 incident
exposed (m/paliad#44): a migration whose version is missing from
applied_migrations runs on the next deploy regardless of which higher
versions are already applied. Gaps are first-class.

Slice 1 of the design at docs/design-migration-runner-applied-set-2026-05-20.md.
All eight design decisions m-picked = inventor recommendation.

Runner contract:
- Ensure paliad schema → pg_advisory_lock(hash('paliad.applied_migrations'))
  → CREATE TABLE IF NOT EXISTS applied_migrations.
- bootstrapFromLegacyTracker: if applied_migrations is empty and the legacy
  paliad.paliad_schema_migrations row is present and clean, INSERT rows
  1..N for every on-disk version with checksum=NULL via ON CONFLICT DO
  NOTHING. Hard-fail if legacy tracker is dirty (operator must recover).
- scanEmbeddedMigrations: hard-fail on two .up.sql files sharing a version
  prefix — the failure mode the post-mortem exposed.
- checkNameAgreement: hard-fail on rename-after-apply mismatch (disk name
  for an already-applied version != DB name).
- applyOne: SQL body + INSERT(version, name, now(), sha256(file_bytes))
  in one transaction. All-or-nothing per migration.

Checksums populated on apply for future drift detection; rows backfilled
from the legacy tracker carry NULL (we can't fabricate a hash for what
golang-migrate applied historically). Verify-on-deploy intentionally
deferred to a focused follow-up — single if-block flip when m wants it.

Up-only runner. .down.sql files stay in embed.FS as reference; manual
roll-back path is psql + DELETE FROM paliad.applied_migrations WHERE
version=N. Zero call sites for migrate.Down in the codebase today.

Drops github.com/golang-migrate/migrate/v4 from go.mod (no other
importers; verified via grep).

Tests:
- internal/db/migrate_test.go: TestMigrations_DryRun walks pending =
  on_disk \\ applied (read from paliad.applied_migrations, missing-table
  → empty set), runs each in BEGIN/ROLLBACK against the scratch DB.
- cmd/server/main_smoke_test.go: TestBootSmoke asserts the applied set
  equals the on-disk set exactly (not just max-version-match) — catches
  the skip class the post-mortem documented. Dirty-flag check removed
  (rows are committed or absent, not 'dirty').
- All 45 service-test call sites of db.ApplyMigrations work unchanged
  (same signature, same fresh-DB behavior).

Follow-up: mig 108_drop_legacy_trackers (DROP paliad.paliad_schema_migrations
and public.paliad_schema_migrations) after one or two deploys of burn-in
on this slice.
2026-05-20 12:59:16 +02:00
mAi
586ba29b86 feat(test): migration dry-run gate + boot smoke (Slice 1)
Slice 1 of docs/design-paliad-test-strategy-2026-05-19.md — the test
infrastructure that would have caught mig 098 (digit-regex) and mig 099
(missing audit_reason) before the deploy hit prod.

Three new files + one route addition:

- Makefile: `make verify-migrations` (alias `verify-mig`) runs the
  per-migration dry-run + boot smoke against TEST_DATABASE_URL. Fails
  fast with a clear error if TEST_DATABASE_URL is unset so CI can't
  silently pass a missing env var. `make test` and `make test-go`
  cover the rest of the short / full Go suites.

- internal/db/migrate_test.go (TestMigrations_DryRun): walks every
  pending *.up.sql in numeric order, applies each inside its own
  BEGIN..ROLLBACK transaction, fails on the first SQL error with the
  file name + Postgres error. "Pending" = greater than the scratch
  DB's current tracker version, so fresh-DB CI runs verify everything
  while developer scratch DBs only re-verify the new pending migration.
  Always non-destructive — the rollback runs even on success.

- cmd/server/main_smoke_test.go (TestBootSmoke): boots the apply path
  end-to-end, asserts (a) db.ApplyMigrations returns nil, (b) the
  tracker advanced to the highest *.up.sql version on disk with
  dirty=false, (c) GET /healthz on the registered mux returns 200.
  The dry-run catches per-migration syntax errors; this catches the
  apply+bind path the container actually runs.

- internal/handlers/handlers.go: adds a GET /healthz public route — a
  no-auth, no-DB liveness probe. Used by the boot smoke; also safe
  for any future orchestrator or uptime check.

Both live-DB tests gate on TEST_DATABASE_URL and skip cleanly without
it, matching the rest of paliad's live-DB test pattern.

Verification: go build ./... clean, go vet ./... clean,
go test -short ./internal/... ./cmd/... clean (all packages pass,
live-DB tests skip), bun run build clean (2436 i18n keys unchanged).

Per CLAUDE.md inventor → coder gate, NOT self-merged.
2026-05-19 12:41:15 +02:00