Files
paliad/docs/design-paliad-data-export-2026-05-19.md
mAi cc13a5b857
Some checks failed
Paliad CI gate / build (push) Has been cancelled
Paliad CI gate / test-go (push) Has been cancelled
Paliad CI gate / deploy (push) Has been cancelled
chore(admin): remove /admin/rules/export page + export-migrations API (t-paliad-297)
Workflow shifted to hand-written numbered migrations; the audit-row SQL
export tool no longer has any consumers. Pure deletion — /admin/rules
and /admin/rules/{id}/edit stay; only the export-to-SQL flow goes.

Deleted:
- frontend/src/admin-rules-export.tsx
- frontend/src/client/admin-rules-export.ts

Removed:
- routes GET /admin/rules/export and GET /admin/api/rules/export-migrations
- handleAdminExportRuleMigrations + handleAdminRulesExportPage
- RuleEditorService.ExportMigrationsSince + ExportResult + sqlEscape helper
- build.ts entries (import, client bundle, dist HTML write)
- Sidebar "Regel-Migrations" nav item + "Migrations exportieren" button on /admin/rules
- all admin.rules.export.* + nav.admin.rules_export + admin.rules.list.export i18n keys (DE+EN)
- .admin-rules-export-* CSS rules (dead after page deletion)

Doc references in design-fristen-phase2-2026-05-15.md and
design-paliad-data-export-2026-05-19.md updated to mark the endpoint as
removed (acceptance #2 requires grep to return zero hits).
2026-05-26 11:50:14 +02:00

47 KiB

Paliad data export — Excel-first, scoped (org / project-subtree / personal)

Design: archimedes (inventor), 2026-05-19. Task: t-paliad-214. Branch: mai/archimedes/inventor-excel-data. Status: READY FOR REVIEW — no code yet, awaiting m go/no-go on §11 open questions.


0. Premise check (live state, 2026-05-19)

Verified directly against the youpc Postgres paliad schema rather than against memory or older design docs.

Migration tracker. Latest applied is 100_ccr_visible_rule; next is 101.

Row counts (org-wide today):

table rows
users 47
projects 11
deadlines 26
appointments 5
parties 0
notes 4
documents 0
project_events (audit) 93
project_teams 3
approval_requests 8
approval_policies 160
checklist_instances 4
deadline_rules 254
user_views 2
partner_units 11

A full org export today is < 600 rows of user content plus reference data — synchronous streamed download is plausible for every scope. We design for an order-of-magnitude head-room.

Auth. Passwords live in Supabase Auth (separate auth schema, not paliad). The paliad.users table has no password_hash column — so the "don't export credentials" rule from the brief is enforced by absence, not by a column-deny list. Good.

Visibility. Row-level via paliad.can_see_project(project_id) (subtree-aware through ltree path). Already used as the predicate that gates every list endpoint. We reuse it for the personal and project scopes; the org scope bypasses it under global_admin.

Documents. Table exists, 0 rows. Phase H (AI Frist-Extraktion) is deferred per m's 2026-04-16 call. No ANTHROPIC_API_KEY on Dokploy. Therefore this design does not concern itself with binary attachments — only with the metadata row when documents start landing.

Audit trail. Lives in paliad.project_events (93 rows). One row per lifecycle event with event_type, metadata jsonb, event_date, created_by. The auditing union (AuditService.ListEntries) joins 5 sources (project_events, partner_unit_events, deadline_rule_audit, policy_audit_log, reminder_log). For the export we treat project_events as primary; the four auxiliary logs are scope-specific.

Existing export precedent. (Originally pointed at the admin rule-migration export. That tool was deleted in m/paliad#129 / t-paliad-297. The gating pattern — adminGate(users, …) on a download endpoint that streams a generated artifact — still lives on other admin handlers, e.g. handleAdminDownloadBackup for /api/admin/backups/{id}/file.) Re-use the gating helper.

No Go xlsx library on go.mod today. This design picks github.com/xuri/excelize/v2 in §3.


1. Why this exists

Two motivations, both load-bearing:

  1. Safety / backup. A workbook on disk is a portable artifact independent of the running app. If paliad.de is down, a partner needs the matter file. If the Dokploy compose corrupts, IT needs a recent dump. If a deadline gets accidentally deleted, we want a recoverable snapshot.

  2. No lock-in. A team or an entire org choosing to leave paliad must be able to walk away with their entire dataset in a format anyone can open. We promise this in writing as a trust signal — exactly because the alternative (silently locking customers in) is what we built paliad to not be.

The export is therefore not a "nice analytics feature" — it is a contractual guarantee that the data is yours. That framing shapes the design: completeness > convenience, portability > polish, every export auditable.


2. Scope definitions (precise)

Three scopes. The boundary is what the caller is allowed to see, joined with what makes the artifact interpretable standalone.

2.1 org scope

Caller: global_role='global_admin' only. There is no firm-admin role distinct from global_admin in paliad today (see §4).

Content: literally everything in the paliad schema that is user content or reference data the workbook needs to be readable. Specifically:

sheet source table(s) notes
projects paliad.projects (all rows) Full project tree including soft-deleted (status='deleted' / 'closed' if any).
project_teams paliad.project_teams profession + responsibility (post-t-148).
project_partner_units paliad.project_partner_units Derivation grants.
deadlines paliad.deadlines Including completed, cancelled.
appointments paliad.appointments Including completed.
parties paliad.parties All client / opposing-party data.
notes paliad.notes All four polymorphic targets resolved into the target_kind/target_id columns.
documents paliad.documents metadata (file_path, file_size, mime_type, ai_extracted) Binaries excluded (open Q1).
audit_events paliad.project_events Full audit trail per project.
approval_requests paliad.approval_requests Including completed / rejected, with requester_kind + agent_turn_id.
approval_policies paliad.approval_policies Both project-scoped and partner-unit-defaults.
policy_audit_log paliad.policy_audit_log Source #5 of the audit union.
partner_units paliad.partner_units Org chart.
partner_unit_members paliad.partner_unit_members Including unit_role.
partner_unit_events paliad.partner_unit_events Org-chart audit.
checklist_instances paliad.checklist_instances Per-project completion state.
invitations paliad.invitations (status, role, expires_at) Without raw tokens (open Q7).
users paliad.users (id, email, display_name, office, profession, …) Excludes email_preferences jsonb only if it carries channel secrets — none do today, but checked at export time.
user_views paliad.user_views Saved filters / custom layouts.
user_card_layouts paliad.user_card_layouts Project-card layouts.
user_pinned_projects paliad.user_pinned_projects Per-user pins.
user_caldav_config paliad.user_caldav_config without the ciphertext column URL + calendar IDs + last_sync; passwords NEVER exported.
reminder_log paliad.reminder_log Outbound digest history.
caldav_sync_log paliad.caldav_sync_log Per-user sync runs.
paliadin_turns paliad.paliadin_turns Excluded by default in org export (privacy — see §6) — admins opt in per Q5.
email_broadcasts paliad.email_broadcasts Outbound broadcast history.
email_templates + _versions both Custom firm templates.
reference (read-only): proceeding_types, event_types, event_categories, deadline_rules, deadline_concepts, deadline_concept_event_types, deadline_event_types, event_category_concepts, trigger_events, holidays, courts, countries One sheet per table, prefixed ref__. Embedded so the workbook is interpretable without paliad context.
deferred audit (admin opt-in): deadline_rule_audit, policy_audit_log, partner_unit_events, caldav_sync_log, paliadin_turns Behaviour per Q5/Q6.

Excluded unconditionally:

  • auth.* (Supabase Auth schema — not ours; the user can request their auth record from Supabase directly).
  • paliad_schema_migrations (operational, no business meaning).
  • *_pre_NNN shadow / pre-migration backup tables (rows are duplicates; the live table is canonical).
  • Any future *_secret / *_token columns (see §6 deny-list mechanism).

Edge cases:

  • Soft-deleted rows: paliad currently has no soft-delete columns (deleted_at etc.). When that lands, the org export includes them by default with a deleted_at column populated. Until then, this is a no-op.
  • Archived projects: projects.status can be 'closed' or future 'archived' — export includes them (the whole point of backup is recoverability of closed matters).
  • Counterclaims: projects.counterclaim_of is a self-FK. Export carries the column as-is; the relationship is reconstructable via the id column.

2.2 project scope

Caller: any team member of the project who passes the §4 profession-tier gate.

Content: one project + all descendants along the ltree path. The descendant walk is WHERE path <@ root.path (subtree-inclusive of root). Every entity gets filtered through WHERE project_id IN (subtree_ids).

Per-sheet inclusion:

  • projects (root + descendants, one row each)
  • project_teams (membership for those projects)
  • project_partner_units (derivation attachments)
  • deadlines, appointments, parties, notes, documents (metadata), audit_events, approval_requests, checklist_instances — all scoped to subtree
  • users sheet — restricted columns: only id, email, display_name, office, profession for users referenced by any FK in the export (created_by, assigned, etc.). Don't dump all 47 users when you only need 4. (Avoids accidental org-chart leak in a project-scope export shared externally.)
  • reference data: ref__proceeding_types, ref__event_types, ref__deadline_rules, ref__deadline_concepts, ref__courts, ref__countries, ref__holidays. Same as org but a smaller universe is acceptable too — the v1 ships the full reference tables for simplicity (every row count is ≤ 300; size is moot).
  • Cross-project references (e.g., a party referenced by a project outside the subtree): out of scope by the predicate. The export carries the foreign UUID so a re-import or merge could re-link, but the foreign row itself is not in the workbook. Edge case is rare — counterclaim_of is the only known cross-project pointer today.

Edge cases:

  • Partner-unit data: partner_units is org-wide; project export carries only the unit ids attached via project_partner_units. The unit name + membership are loaded into the workbook on partner_units and partner_unit_members sheets (filtered to the attached units only).
  • Policies: approval_policies rows include both project-scoped (the project + ancestors) and partner-unit-defaults attached to this project. Same MAX-of-sources logic as runtime.
  • Audit: project_events for the subtree + (admin opt-in only) deadline_rule_audit rows whose rule was used by any deadline in the subtree. Default off — these are firm-wide curation logs and don't belong in a per-project handoff.

2.3 personal scope

Semantics: "everything I can see right now in paliad, framed as my data."

That definition resolves the ambiguity in the brief: personal scope is not "rows where I am created_by" — that misses everything I see by being on a team. It is the RLS-visible projection of the schema for caller=me, plus a handful of explicitly-personal sidecars (caldav config, my pins, my views).

Per-sheet inclusion:

sheet rows
projects WHERE paliad.can_see_project(id) for the caller.
project_teams Rows where user_id = me OR the row's project is in my visible set.
deadlines Same project-visibility filter.
appointments Same.
parties, notes, documents metadata, audit_events, checklist_instances Same.
approval_requests Rows where requested_by = me OR decided_by = me OR project ∈ visible set.
me (single-row sheet) Caller's users row (id, email, display_name, office, profession, reminder_*, lang, escalation_contact_id).
my_caldav_config The caller's user_caldav_config row without the encrypted password column — sync URL, calendar IDs, last_sync_at.
my_views Caller's user_views rows.
my_pinned_projects Caller's user_pinned_projects rows.
my_card_layouts Caller's user_card_layouts rows.
my_paliadin_turns Caller's paliadin_turns rows (currently restricted to PaliadinOwnerEmail = m, so this sheet is empty for everyone else). Sensitive: AI prompts + responses. Default on for personal scope — it's literally the caller's data.
users_referenced Restricted: id + display_name + email for users referenced as FKs in the export.
reference tables Same set as project scope.

Edge cases:

  • Caller leaves a team: the export reflects the moment-in-time visibility. A generated_at timestamp in the workbook header (__meta sheet) anchors this.
  • Caller is a global_admin: their personal export is the entire org (because their visible set = all projects). This is by design — but we surface a banner ("Sie sehen alles, weil Sie global_admin sind. Ein org-scope-Export wäre identisch.") so they don't get confused thinking the personal-scope endpoint is broken.
  • Caller has no team memberships: export contains the empty workbook + the me row + their caldav config + views/pins. Still useful — they can save their preferences.

2.4 Common columns across all scopes

Every export workbook contains a __meta sheet:

schema_version:     1
firm_name:          HLC                       # from internal/branding.Name
scope:              org | project | personal
scope_root_id:      uuid or NULL              # the project id for project-scope, NULL otherwise
generated_at:       2026-05-19T14:23:00Z
generated_by_user:  <uuid> <email>            # the caller
generated_by_label: archimedes / m / ...      # display_name
row_counts:         JSON {"projects": 11, ...}
paliad_version:     <git sha at server build>
notes:              free-form, e.g., "documents binaries excluded by design"

This pins provenance + reproducibility + diffability.


3. Format choices

3.1 xlsx as the primary format

Library: github.com/xuri/excelize/v2. De-facto Go xlsx library, pure-Go (no cgo, no external libreoffice), MIT, streaming writer for large workbooks, broad format-feature support (number formats, freeze panes, hyperlinks, sheet hide). The streaming writer (NewStreamWriter) is what we use — it writes rows one at a time without holding the whole sheet in memory. At 11-projects scale this is unnecessary; at 11k-projects scale it's essential, so we set the pattern now.

Why not the alternatives:

  • tealeg/xlsx — older, unmaintained, no streaming.
  • qax-os/excelize — same project as xuri/excelize (the github org renamed); xuri is the upstream.
  • 360EntSecGroup-Skylar/excelize — defunct fork.

Workbook structure: one sheet per entity type, never a mixed-type sheet with conditional columns. Reasons:

  • Excel users sort + filter by column; a column that means "deadline due_date" on row 4 and "appointment start_at" on row 12 is unusable.
  • The "self-describing" promise (no-lock-in) is satisfied by a workbook where every sheet is a flat table with stable column headers, not by a polymorphic blob.
  • Cross-sheet relationships are represented by UUIDs in foreign-key columns + a __lookup sheet pairing UUID → display label (project title, user email) for the workbook's lifetime. This makes the workbook self-joining in Power Query / pivot tables.

Sheet conventions:

  • Sheet names use snake_case matching SQL table names (deadlines, not Fristen). Reference tables prefixed ref__. Personal sidecars prefixed my_. Meta sheet __meta. The __lookup sheet sits last.
  • Row 1 = column headers; frozen.
  • Column 1 of every entity sheet is id (uuid).
  • Dates: ISO 8601 UTC for timestamptz; YYYY-MM-DD for date. Always as Excel strings (not Excel date types) — Excel-date interpretation differs by locale (DE: Tag.Monat.Jahr, EN: Month/Day/Year) and silently corrupts on round-trip. A pinned ISO string is unambiguous and re-importable. Open Q4 covers whether to also mirror to native Excel dates for human convenience.
  • Booleans: literal TRUE / FALSE strings, same reason.
  • jsonb columns: serialised as compact JSON one-liners in the cell. Cell type = string. Power Query can Json.Document them.
  • Arrays (e.g., additional_offices text[]): semicolon-joined string. Excel's CSV-array convention is the comma but our office codes use commas; semicolon avoids the collision.
  • text[uuid[]] paths (the projects.path ltree): exported as the canonical dotted-uuid string.

Encoding: UTF-8 always. Excelize handles the xlsx packaging which is unicode-native. Umlaute round-trip correctly (verified pattern with tesla's CSV export in t-paliad-177).

3.2 CSV + JSON siblings

Per the no-lock-in promise, xlsx is not enough on its own — Excel is a proprietary format owned by Microsoft, and a workbook is opaque without a tool that understands it. For genuine portability we also produce:

  • CSV: one file per entity sheet (no reference sheets — those go as JSON), UTF-8 with BOM (\xEF\xBB\xBF) for Excel-DE compat, RFC 4180 quoting, headers row 1. Identical column shape to the xlsx sheet.
  • JSON: a single paliad-export.json per scope, top-level {"meta": {...}, "tables": {"projects": [...], "deadlines": [...], ...}}. Easiest for programmatic re-ingest. Reference tables included.

Delivery shape: all three formats live inside one .zip per export:

paliad-export-<scope>-<timestamp>.zip
├── README.txt              # human-readable: what this is, how to read it
├── paliad-export.xlsx      # canonical workbook
├── paliad-export.json      # JSON twin (machine-readable)
├── csv/
│   ├── projects.csv
│   ├── deadlines.csv
│   ├── ...
│   └── ref/
│       ├── proceeding_types.csv
│       └── ...
└── __meta.json             # standalone meta (same content as __meta sheet)

The .zip is the artifact users download. Default content is "all three" — there's no UI knob to pick (open Q1: should there be? Inventor pick = no, zip-only).

Filename convention:

paliad-export-{scope}-{timestamp}.zip
  scope     = org | project-<root-short> | personal
  timestamp = YYYY-MM-DDTHHMMZ           # UTC, no colons (Windows-safe)

Examples: paliad-export-org-2026-05-19T1423Z.zip, paliad-export-project-Siemens-AG-2026-05-19T1423Z.zip, paliad-export-personal-2026-05-19T1423Z.zip. The project-short is slugify(root.title) capped 40 chars.

Determinism (Q6 question). Two exports of the same scope at the same row state must produce byte-identical workbooks. xlsx is internally a zip of XML — file order in the zip is significant; excelize's default zip writer is non-deterministic. We can make this deterministic by sorting the file list before writing. JSON: keys sorted alphabetically. CSV: rows ordered by id ASC (stable). The only inherently non-deterministic field is generated_at; we externalise it to the filename and the __meta sheet, but the rest of the workbook is byte-stable. Inventor pick: yes, deterministic. Lets users diff exports and prove "nothing changed between Tuesday and Thursday."

3.3 Future-proofing — schema_version

__meta.schema_version = 1. When we add columns (e.g., projects.archived_at lands), we bump to 2 and note the additions in a docs/export-schema-changelog.md. Importers (us in the future, or a re-importer at a different firm) read schema_version first.


4. Authorization model

Tightly mirrored to existing paliad role surfaces. No new roles introduced.

Scope Required auth
org paliad.users.global_role = 'global_admin'. Same gate as /admin/* pages (auth.RequireAdminFunc in handlers.go:417).
project Caller must (a) pass can_see_project(root_id), AND (b) have effective project profession ≥ associate on the root. The associate floor mirrors the conservative seed in approval_policies (t-154); paralegals + PA can see data but not extract it. m-tunable per Q2.
personal Any authenticated user. No additional gate.

Profession ladder check for project scope uses the existing DerivationService.EffectiveProjectRole (t-139 phase 2) — direct membership > ancestor > derived via partner-unit. Same surface that gates approvals; same surface gates extracts.

Audit row written on every export run. A new event_type into paliad.project_events for project-scope (so it appears on the project's Verlauf), partner_unit_events for org-scope (so it appears on the partner-unit audit log of the firm-admin's home unit), and policy_audit_log is too narrow — we likely want a new audit table for org-wide actions, OR we widen project_events to allow project_id = NULL org-wide rows. Inventor pick: new table paliad.system_audit_log — clean separation, integrates into the existing 5-source AuditService union as source #6. Migration 101 adds it.

system_audit_log columns:

id          uuid PRIMARY KEY DEFAULT gen_random_uuid(),
event_type  text NOT NULL,             -- 'data_export'
actor_id    uuid REFERENCES paliad.users(id),
actor_email text NOT NULL,             -- captured at write time, survives user deletion
scope       text NOT NULL,             -- 'org' | 'project' | 'personal'
scope_root  uuid,                      -- project_id for project scope, NULL otherwise
metadata    jsonb NOT NULL DEFAULT '{}'::jsonb,  -- {"formats":["xlsx","json","csv"], "row_counts":{...}, "file_size_bytes":12345, "filename":"..."}
created_at  timestamptz NOT NULL DEFAULT now()

The audit row is written before the export runs (so failed exports are still recorded) and updated with file_size_bytes + final row_counts on success. Failure case: separate event_type='data_export_failed' row with the error string in metadata. The audit chain is the trust signal — m sees who exfiltrated what, when, and how much.

Headers on the response:

  • Content-Disposition: attachment; filename="paliad-export-<scope>-<ts>.zip"
  • X-Paliad-Export-Audit-Id: <system_audit_log.id> — so an automated client can reference the audit row.

5. Trigger model

Three trigger surfaces:

5.1 On-demand button

  • Personal: /settings → "Daten exportieren" card → button. POST /api/me/export → 200 with Content-Type: application/zip. Synchronous.
  • Project: /projects/{id} → settings/cog menu → "Daten dieses Projekts exportieren". POST /api/projects/{id}/export → 200 zip. Synchronous. Includes a "Inkl. Unterprojekte" toggle hint (it's always subtree-inclusive — the toggle is purely informational, no off switch).
  • Org: /admin/data-export (new page, card on /admin) → "Org-Export erstellen" button. POST /api/admin/export/orgasync by default (see §6.1). Returns 202 + job_id. UI polls /api/admin/export/org/jobs/{id} for status.

Why org is async even at today's scale: the principle isn't "is it slow now" — it's "the trigger model should not change as the firm grows." If the partner with the firm-wide button gets a different UX from the associate with the project button, we'd retrofit later. Sync at 600 rows works fine; the wrapping is goroutine + channel + Server-Sent Events for live progress, no new infra needed. See §6.1.

5.2 Scheduled exports

Inventor pick — defer to slice 4. Out of v1 scope. The reasoning: scheduling sits on storage + delivery + retention, all of which are also deferred to slice 3+. Building the scheduler before we know how + where the artifact lives is premature.

When it lands (slice 4), the model is:

  • A new paliad.scheduled_exports table: (id, scope, scope_root_id, owner_user_id, cadence, last_run_at, next_run_at, delivery) where delivery is {kind: 'email-link' | 'caldav-style-webdav', config: jsonb}.
  • A daily cron (mai cron or a time.Ticker goroutine) checks next_run_at < now(), runs the export, posts the link via the configured delivery channel.
  • Cadence: weekly + monthly + on-status-change (e.g., "export when project closes" — a webhook from projects.status triggers).

For now (slice 1-2), users can right-click the on-demand button and bookmark the URL — that's the only scheduled-export-y thing we offer, and it's intentional: get the manual flow rock-solid before adding cadence.

5.3 API endpoint

Same endpoints as §5.1, callable directly with the standard cookie / bearer auth. We don't add a separate "API key" surface in v1 — paliad doesn't have personal access tokens today. If a user wants to script their personal export weekly, they can use cookie auth from m/paliad automation; that's enough until power-user volume justifies a real PAT surface.

For machine ergonomics: the /api/...export endpoints accept ?format=zip (default), ?format=xlsx, ?format=json, ?format=csv-zip query params. Only zip is documented; the others are internal but reachable for automation.


6. Storage + delivery

6.1 Synchronous vs async — per-scope picks

Personal, project: Synchronous, streamed. The handler holds the HTTP connection open, writes the zip directly to http.ResponseWriter. For 1MB-class exports (today's reality at every scale up to thousands of rows per entity) this is the right call — no persistence, nothing to garbage-collect, nothing leaking onto disk. Excelize's NewStreamWriter flushes rows as they're written so RAM stays bounded.

Org: Asynchronous, in-process queue, on-disk artifact.

  • Submit (POST /api/admin/export/org) writes a system_audit_log row with status pending and dispatches a goroutine.
  • The goroutine writes the zip to /var/lib/paliad/exports/{audit_id}.zip (configurable via PALIAD_EXPORT_DIR; on Dokploy this is a mounted volume).
  • The goroutine updates the audit row's metadata with progress, then status done with file_size_bytes on success.
  • The user polls GET /api/admin/export/org/jobs/{audit_id} (SSE or simple JSON) — when ready, a download link GET /api/admin/export/org/jobs/{audit_id}/download serves the file.
  • Download deletes the file by default (one-shot link), or keeps it per Q3.

Why not S3-style bucket? Paliad already has a documents table that will need a binary store, eventually. Coupling export storage to that future store is right — but the future store doesn't exist yet, and we don't want to provision MinIO on mlake purely for exports. Inventor pick: local disk in PALIAD_EXPORT_DIR until/unless we provision a real object store; at that point the export storage moves there transparently.

6.2 Retention (Q3)

Inventor pick: 7 days, then auto-delete. Justifications:

  1. Exports contain sensitive client data — minimising the retention window minimises blast radius if the Dokploy host is compromised.
  2. 7 days covers a holiday-week round-trip ("I exported Friday, want to look at it Monday next week, missed the day-1 link").
  3. The audit row in system_audit_log persists forever — you can always tell that an export happened, even after the artifact is deleted.

A cleanup goroutine runs daily, lists system_audit_log rows older than 7 days with non-NULL file_path, deletes the file, sets metadata.deleted_at. Audit row stays.

The PALIAD_EXPORT_RETENTION_DAYS env var is the knob (default 7). m-tunable per firm.

6.3 PII / GDPR

This is where the design gets serious.

At-rest encryption. Files in PALIAD_EXPORT_DIR are plaintext on the Dokploy volume. The volume itself is encrypted at the host layer (Hostinger VPS disk encryption). We do not layer additional file-level encryption on the artifact — that would require a per-user key, key escrow, key rotation, all of which is over-engineered for a 7-day-retention exfil where the link is single-use behind cookie auth. The disk encryption + 7-day TTL + audit log is the trust boundary.

In-transit encryption. TLS via Dokploy + Traefik — paliad.de is Let's Encrypt-served. No raw HTTP path.

Download authentication. The download link /api/admin/export/org/jobs/{audit_id}/download requires the same cookie auth as the submit. No public signed URLs in v1 (deferred per Q8). When we add scheduled exports + email delivery (slice 4), we'll need expiring signed URLs — that design is captured then, not now.

Data-subject requests. A user invoking /api/me/export is, in effect, performing a self-serve GDPR Art. 15 data-portability request. Audit row records the request. If the firm receives a third-party DSR ("export the data my client Mr. Müller asked for"), a global_admin can run a project-scope export filtered to projects involving that client; this is a manual workflow we don't automate in v1 (open Q9).

Right-to-erasure. Out of scope. Erasure is a write path; export is read-only. They share no code.

External sharing of export files. A user who downloads an export and emails it to an external party has done so on their own authority and outside paliad's protection. We don't watermark the file (debated and rejected: watermarking introduces non-determinism, breaks diffability, and gives false security — anyone reading the zip can strip metadata). What we do document in the embedded README.txt:

Diese Datei enthält möglicherweise vertrauliche Mandantsdaten. Sie wurde erzeugt am {generated_at} durch {actor_email} aus Paliad ({firm_name}). Die Weitergabe an Dritte erfolgt in eigener Verantwortung des Empfängers.

A simple "you broke the seal" notice is what we offer. It's a contract, not a control.

PII column deny-list. Hard-coded in internal/services/export_service.go:

  • paliad.users.password_hash — doesn't exist, but the deny-list is the safety net if it ever does.
  • paliad.user_caldav_config.encrypted_password — explicit drop.
  • Any column whose name matches (?i)secret|token|password|api[_-]?key|private[_-]?key — caught at column-discovery time, errors loudly into system_audit_log.metadata.warnings.
  • paliadin_turns.assistant_response — present in personal export of caller's own data; off in org export by default (m's call per Q5).

6.4 GDPR-completeness note

The export of one user's personal scope is a partial Art. 15 disclosure — it contains what's in paliad's control. Other systems (Supabase Auth row, mlake logs, CalDAV provider) are out of paliad's scope and not in the export. The embedded README states this explicitly so the user knows the workbook is the paliad-side answer, not a complete personal-data dump from "the firm."


7. Slice plan

Tracer-bullet shipping. Each slice is independently shippable and reviewable. The first slice closes the no-lock-in promise for the smallest, lowest-risk scope; later slices widen.

Slice 1 — personal export, synchronous, xlsx + JSON

  • Adds excelize/v2 to go.mod.
  • New internal/services/export_service.go with the column-discovery + writer plumbing for xlsx + JSON.
  • New internal/handlers/export.go with POST /api/me/export.
  • New /settings UI: "Daten exportieren" card + button.
  • Migration 101: paliad.system_audit_log + AuditService.ListEntries 6th union branch.
  • i18n keys (settings.export.*, __meta.*).
  • Tests: export_service_test.go covers xlsx structure (one row each kind), JSON shape, PII deny-list.

Ships the no-lock-in promise for every user immediately. ~600-800 LoC + ~25 i18n keys.

Slice 2 — project export, synchronous, xlsx + JSON + CSV-zip

  • Generalises the export_service to scope-aware queries (the visibility predicate gets injected per scope).
  • New POST /api/projects/{id}/export, gated by §4.
  • Adds CSV writer alongside xlsx + JSON; bundles all three into .zip.
  • Project-detail UI gets the export menu entry.
  • README.txt template embedded.
  • Tests + e2e (Playwright) on the project page button.

~800-1000 LoC. The CSV path generalises the xlsx column-discovery so the marginal cost is low. After this slice, two of three scopes are shipped and synchronous serves both.

Slice 3 — org export, async with job tracking

  • Adds the goroutine + on-disk artifact path + PALIAD_EXPORT_DIR env.
  • POST /api/admin/export/org + job status + download endpoints.
  • New /admin/data-export page (card on /admin/).
  • Cleanup goroutine (daily, deletes artifacts > PALIAD_EXPORT_RETENTION_DAYS).
  • Refactor: extract the now-common "writeExportToWriter" core from the synchronous path so async re-uses it.

~600-800 LoC. After this slice, all three scopes ship + audit trail is complete.

Slice 4 — scheduled exports (deferred, not v1)

Designed in §5.2; building deferred until at least 2 firms ask. The contract surface is the scheduled_exports table + cadence + delivery channel.

Slice 5 — API ergonomics (deferred)

Personal Access Tokens (the "I want to cron my own export" surface). Until there's a customer, we don't build the PAT issuer + revocation + audit.

Slice 6 — GDPR DSR helpers (deferred)

A /admin/data-subject-request workflow to assemble a per-natural-person export across projects. Built on Slice 1-3 primitives; not blocked by them.

Slice 7 — document binary inclusion (deferred until documents have rows)

When the documents table starts holding real files, the export adds a documents/ subdir in the zip with the actual files, keyed by filename = {document_id}.{ext}. The metadata sheet links by id. Adds ~150 LoC + an env var for the file backend.

Critical-path slices for v1: 1 + 2 + 3. Everything after is layered, optional, m-prioritised when there's a real customer pull.


8. Trade-offs flagged

  1. xlsx-first means we own the excelize dependency forever. Mitigation: excelize is the canonical Go xlsx — replacing it would be a multi-thousand-LoC migration, but the upstream is healthy (MIT, 17k+ stars, monthly releases). Acceptable lock-in.

  2. Determinism (sorted file order, sorted JSON keys, row-id-ordered CSV) is implementation discipline, not a library default. Test that breaks if any future change introduces non-determinism is essential (helps reviewers + prevents regressions).

  3. Synchronous personal + project means a runaway export can block a request goroutine for seconds. At today's data shape this is sub-second. Watchdog: a 30s context deadline on synchronous exports; over that, return 503 with "export too large — contact admin for async." Triggers slice 3 → slice 4 of the user's mental model.

  4. Per-scope endpoints triplicate similar code paths. Mitigated by the shared ExportSpec struct + scope-aware predicate injection. Read carefully in code review — this is the place subtle scope leaks creep in.

  5. JSON twin is genuinely redundant for human users. It's there for the no-lock-in promise (a Python script can re-ingest without Excel). The cost is one extra file in the zip + one extra serialisation pass. Acceptable.

  6. No diff tooling — yet. Determinism enables diff -r between two extracted zips, but no in-app surface. Slice 4+ may layer "show me what changed between Monday's and Friday's export" once exports are scheduled and stored.

  7. paliadin_turns privacy default. Currently restricted to PaliadinOwnerEmail so the table is empty for every other user. Personal export carries them by default ("your AI history"); org export by default does NOT (admin opt-in via ?include=paliadin_turns). When Paliadin opens past owner-only (post-API cutover), revisit.

  8. Reference-data inclusion bloats every export. 254 deadline_rules + 102 trigger_events + 56 concepts + … = ~1000 reference rows in every workbook regardless of scope. At zip-compressed sizes this is < 100KB and worth the standalone-interpretability. If the workbook gets too large later, ship reference data as a separate "paliad-reference-snapshot.zip" once + reference it from each export's README.

  9. Org export volume at firm-scale. A 10k-project firm has ~50k deadlines and ~200k audit events. Even at 200 bytes/row average that's < 100MB — comfortable for the async path with 4GB Dokploy RAM. Threshold concerns kick in at 1M+ rows, which is firm-class-of-100-attorneys territory. Designed for, not blocked on.

  10. Audit-log explosion. A nightly cron + 47 users self-exporting = 50 audit rows / day. At a year that's 18k rows. Still trivial. No retention on the audit chain (the artifact retention does NOT touch audit-log retention — the audit chain is the trust signal, see §4).


Single PR, layered slices 1 → 2 → 3 as separate commits. No DB-heavy migrations; the only schema add is system_audit_log (one table, one trigger if any). The hard work is in the writer abstraction.

  • Slice 1: pattern-fluent Sonnet coder. ~600-800 LoC, mostly bookkeeping. Tests pin the shape.
  • Slice 2: same hands as slice 1 (continuity matters here — the writer abstraction is set in slice 1 and the project scope generalises it).
  • Slice 3: same hands again. The async path is its own subsystem but the writer is unchanged.

NOT cronus per memory directive 2026-05-06 (retired from paliad). NOT m — this is a coder task end-to-end.


10. Inventor → coder transition (GATED per project CLAUDE.md)

Per .claude/CLAUDE.md: design phase ends here. No code touches the tree from inventor. Head's mai-head skill gates the coder shift after m's go on §11 open questions.

When approved, the coder shift opens on mai/<coder-name>/data-export-slice-1 (fresh branch off main, NOT off the design branch — design doc commit is the only artifact this branch carries forward via cherry-pick).


11. Open questions for m

The brief lists 8 candidate questions. After live-state verification I've collapsed + sharpened to 9, each with an inventor pick + reasoning. Will be asked sequentially via AskUserQuestion (paliad dogma — no ## §X.Y markdown dump on m, per t-paliad-154 lesson).

Q1 — Bundle xlsx + CSV + JSON in one zip, or let user pick format?

Inventor pick: bundle all three in one zip, no UI knob.

Reasoning: the no-lock-in promise requires the JSON twin (Excel-independent re-ingest); the xlsx is the human-readable default; CSV is the universal lingua franca. Picking only one breaks the promise for some user. Bundle size at today's scale is < 1MB; even at firm-scale it's well under the email-attachment limit. The cost of a checkbox UI is more than the cost of three extra files.

Alternative: offer ?format=xlsx-only|json-only|csv-only query params for the API surface, default to bundle. Documented in README only. We do this in v1 anyway since multi-format is what generates the zip in the first place.

Q2 — Project-scope profession floor: associate (inventor pick) or member?

Inventor pick: associate floor.

A project export carries party names, addresses, decision-history, draft strategy notes. That's "I can write a paper for the partner" data, not "I can see the deadline calendar" data. Member is the bare-visibility tier (you got added to the team). Export is exfiltration — needs the next tier up.

Alternative: gate by responsibility ∈ {lead, member} (no profession floor, only the project-team responsibility check). Cleaner architecturally — separates the "can see" axis from the "can extract" axis using the same fields. Less restrictive in practice.

Worth choosing now because the gate text in the audit row mentions the tier.

Q3 — Org-export artifact retention: 7 days (pick) or 30 / 90?

Inventor pick: 7 days.

Default conservative. m-tunable per firm via env var.

Q4 — Excel dates: ISO strings only (pick) or also a mirrored native-Excel-date column?

Inventor pick: ISO strings only.

Native Excel dates are locale-poisoned (DE vs EN epoch interpretation flips, round-trip corruption when re-saved). ISO is the universal answer. Power users who want a sortable native-date column can derive it once in their workbook — but the canonical export stays unambiguous.

Q5 — paliadin_turns in org export: opt-in only (pick), or include by default?

Inventor pick: opt-in via ?include=paliadin_turns query.

Today it's m-only data (PaliadinOwnerEmail gate), so the privacy stakes are low — but the moment Paliadin opens beyond owner-only, the AI conversation history per user is the most sensitive personal data we carry. Setting the off-by-default precedent now means we don't accidentally start dumping it later.

Q6 — Deterministic byte-for-byte exports: yes (pick) or accept timestamp drift in zip metadata?

Inventor pick: yes, deterministic.

Lets users diff exports across time. Cost: ~50 lines of sort.Strings + a custom zip writer with stable ordering. Worth it.

Q7 — Invitation tokens in org export: drop them entirely (pick) or include as hash?

Inventor pick: drop entirely.

Tokens grant signup access. Including them in a backup creates a vulnerability surface — an exfiltrated backup could be used to sign up as someone-else with their pending invite. Hashing doesn't help because the hash is what the URL contains. The invitation row (recipient, role, expiry, sent_at) is in the export; the token is not. If you need to re-issue, you do so from paliad's invite UI.

Q8 — Public signed-URL downloads (for scheduled/email delivery): yes / not in v1 (pick)?

Inventor pick: not in v1.

Defer to slice 4. v1's download is cookie-authenticated only. Signed URLs are useful when the recipient is asynchronously notified (email link), which is the scheduled-export model — and that whole subsystem ships later.

Q9 — GDPR Art. 15 DSR helper UI: not in v1 (pick)?

Inventor pick: not in v1.

A global_admin can already assemble a DSR manually using project-scope exports filtered by client. v1 ships the primitives; v2 ships the workflow.

Closing question for m: implementer

Recommend pattern-fluent Sonnet for all three slices, same hands across (continuity matters for the writer abstraction). Specific name = your call.


12. m's decisions (addendum, 2026-05-19)

m walked the §11 questions live via AskUserQuestion. Results below — these supersede the inventor picks where they differ.

  • Q1 — Bundle format: Bundle xlsx + JSON + CSV in one .zip per export. ✓ matches pick.
  • Q2 — Project-scope floor: Any team member (responsibility ∈ {lead, member}). ⚠ Deviation from associate-floor pick — m chose the looser axis-split gate. Implementation update for §4: project-scope auth becomes (a) can_see_project(root_id) AND (b) caller is on project_teams for the root with responsibility ∈ {lead, member}. The DerivationService profession check is dropped from the export gate; observers + externals + derived-only members still cannot extract. system_audit_log.metadata records the responsibility value the caller held at export time.
  • Q3 — Org-export retention: 90 days. ⚠ Deviation from 7-day pick. Implementation update for §6.2: PALIAD_EXPORT_RETENTION_DAYS default flips from 7 to 90. The cleanup goroutine still runs daily; the threshold is just longer. Audit row unaffected (still persists forever).
  • Q4 — Date format: ISO 8601 strings only. ✓ matches pick.
  • Q5 — paliadin_turns in org export: Never include in org export.Tighter than opt-in pick. Implementation update for §2.1 + §6.3: the paliadin_turns row drops from the org-scope sheet table entirely — no ?include=paliadin_turns query param. Personal scope still carries the caller's own paliadin_turns (it's literally their data). The hard exclusion is enforced in export_service.go's scope-aware sheet registry, not just in column-discovery, so a future schema addition can't accidentally re-include it.
  • Q6 — Deterministic exports: Yes. ✓ matches pick. (m answered freeform "1" alongside the batching request — first option = deterministic.)
  • Q7 — Invitation tokens: Drop entirely. ✓ matches pick.
  • Q8 — Signed URLs in v1: Not in v1. ✓ matches pick.
  • Q9 — GDPR DSR helper UI in v1: Not in v1. ✓ matches pick.

Net effect on slice plan: unchanged shape, three modifications:

  • Slice 2 gate logic uses project_teams.responsibility only (no profession lookup).
  • Slice 3 default retention is 90 days (one env-var value change).
  • Slice 1 + 3 sheet registry omits paliadin_turns from org scope entirely.

No other slice deltas. v1 still ships slices 1+2+3.

Coder shift gating: head still gates the implementation handoff; m's decisions here close §11 but don't auto-trigger coder work.


13. Adjacent / out-of-scope

  • Import path — explicitly out per brief. A round-trip "export then re-import" is appealing but is its own design (rebinding UUIDs, conflict resolution, schema_version migrations). Don't conflate.
  • Postgres replacement — the Excel workbook is a backup + portability artifact, not a data-model alternative. Postgres stays canonical.
  • t-paliad-212 (leibniz, CalDAV multi-calendar): personal export already carries the caller's caldav config (minus ciphertext). When leibniz designs multi-calendar, the personal export's my_caldav_config sheet becomes a list rather than a single row — handled by column-discovery automatically. No design conflict; flagged for confirmation when leibniz's design lands.
  • t-paliad-213 (mendel, test strategy): export service warrants pure-function tests for column discovery, deny-list, scope predicate, plus one e2e (Playwright) per scope endpoint. Slice tests pin the contract; mendel's overall strategy decides framework choice.

14. References

  • docs/design-data-model-v2.md — projects + mandanten + ltree path + can_see_project predicate.
  • docs/design-approval-policy-ui-2026-05-07.md — 5-source audit union (this design adds the 6th source).
  • docs/design-profession-vs-project-role-2026-05-07.md — profession ladder for the §4 project gate.
  • internal/handlers/backups.gohandleAdminDownloadBackup (precedent for admin-gated artifact download; the older rule-migration export precedent was removed in t-paliad-297).
  • internal/services/project_service.go:15 — visibility predicate.
  • internal/services/derivation_service.goEffectiveProjectRole for the project gate.
  • github.com/xuri/excelize/v2 — chosen xlsx library.

END OF DESIGN. Status: READY FOR REVIEW.

Inventor parks until m's go/no-go on §11. No code touches the tree from this branch.