Commit Graph

2 Commits

Author SHA1 Message Date
mAi
d86cac0b53 feat(submissions): t-paliad-230 format-only .dotm→.docx convert
m's 2026-05-21 scope reduction of the t-paliad-215 submission generator:
ship a demo that hands the lawyer the firm style template as a clean
.docx. No variable-merge engine, no per-submission template registry,
no fallback chain — the merge slice is deferred to a future task.

Replaces the previous engine (template registry + variable bag +
{{placeholder}} renderer + dual project_events/documents writes) with:

* services.ConvertDotmToDocx — single-function .dotm/.docm/.dotx → .docx
  format converter that strips word/vbaProject.bin, word/vbaData.xml,
  word/customizations.xml, and word/_rels/vbaProject.bin.rels, rewrites
  [Content_Types].xml (demotes the macro/template main type to plain
  docx, drops the .bin Default Extension and the macro Overrides), and
  rewrites word/_rels/document.xml.rels to drop the vbaProject +
  keyMapCustomizations relationships. Idempotent on a plain .docx.
  archive/zip + regex stdlib only — no new third-party dependencies.

* handlers/submissions.go — POST /api/projects/{id}/submissions/{code}
  /generate fetches the cached HL Patents Style .dotm (via a new
  fetchHLPatentsStyleBytes accessor on files.go that shares the same
  cache as /files/{slug}), converts, writes one paliad.system_audit_log
  row (event_type='submission.generated', metadata={submission_code,
  rule_name, filename}), and streams the .docx as an attachment. GET
  /api/projects/{id}/submissions still lists filing rules but
  has_template is unconditionally true (one universal template).

* Filename per design §7: {rule.name}-{project.case_number}-{YYYY-MM-DD}
  .docx, with Umlauts ASCII-folded and slashes → underscores.

Drops services/submission_templates.go, services/submission_vars.go,
and the wiring in cmd/server/main.go + handlers/handlers.go that bound
them together. Frontend client switched to POST.

Verified the converter against the real HL Patents Style.dotm (361 KB
input → 243 KB output, 46 parts in output zip):

  unzip -tq /tmp/hl-patents-style.converted.docx   → No errors
  python3 -c "import zipfile, xml.etree.ElementTree as ET; \
              z=zipfile.ZipFile('/tmp/hl-patents-style.converted.docx'); \
              [ET.fromstring(z.read(p)) for p in z.namelist() if p.endswith('.xml')]"
  uv run --with python-docx python3 -c "import docx; \
              d=docx.Document('/tmp/hl-patents-style.converted.docx'); \
              print(len(d.paragraphs), 'paragraphs', len(d.styles), 'styles')"
              → 236 paragraphs, 168 styles, 1 section

All assertions passed: every Override in [Content_Types].xml resolves
to a real part, every internal Target in document.xml.rels resolves,
zero macro-related residue, and the document body + styles + theme
survive untouched.

go test -run TestBootSmoke ./cmd/server/... clean (route additions
register without conflict on the Go ServeMux).
2026-05-21 15:23:24 +02:00
mAi
8ea3509b98 feat(submissions): t-paliad-215 Slice 1 — in-house .docx render engine
Pure-Go {{path.dot.notation}} placeholder engine + unit tests
(t-paliad-215, design docs/design-submission-generator-2026-05-19.md
§6). Chosen over github.com/lukasjarosch/go-docx because that library
treats sibling placeholders inside one <w:t> run as nested and
refuses to replace them — patent submissions routinely carry multiple
placeholders per paragraph (party blocks especially), so the library
is a non-starter.

Two-pass strategy preserves run-level formatting on the common path:

 1. Pass 1: regex replace inside each <w:t>…</w:t> independently —
    no format loss for the 99% case where placeholders are intact.
 2. Pass 2: paragraph-level merge for paragraphs that still contain
    orphan "{{" or "}}" markers (Word fragmented the placeholder
    across runs).

Missing placeholders render [KEIN WERT: <key>] / [NO VALUE: <key>]
markers so the lawyer sees the gap in Word rather than getting a 400.

Tests cover: single-run, multi-per-run (the go-docx failure mode),
cross-run merge, missing-marker (DE+EN), XML escaping of special
chars, non-document zip entries preserved, placeholder regex
grammar.
2026-05-19 13:42:51 +02:00