Compare commits: 9201501941...e189d3fe6a (6 commits)

| SHA1 |
|---|
| e189d3fe6a |
| 58907554fc |
| 9b8a865c5f |
| f8067c2fe5 |
| 78a30a7ee0 |
| 091804923a |

495 docs/plans/prd-docforge-2026-05-29.md (Normal file)
@@ -0,0 +1,495 @@
# PRD — `docforge`: a modular document-generator engine

**Task:** t-paliad-349 (m/paliad#157) · **Author:** leibniz (inventor) · **Date:** 2026-05-29
**Status:** DESIGN — awaiting head's go/no-go on the coder shift.
**Supersedes nothing.** Extends and re-homes the submission generator designed in
`docs/design-submission-generator-2026-05-19.md`, `…-v2-2026-05-26.md`, and
`docs/design-submission-page-2026-05-22.md`.

---

## §0 Premises

### 0.1 What this is

m wants the paliad "doc generator" pulled apart into a clean, reusable engine.
Verbatim direction (2026-05-29):

> I want to be able to create and modify word documents, using variables inside
> the documents, "editing them live" and preview the results, export in the end.
> We should have all that modular to keep it clean. The editor is something else
> than the importing, exporting, variable exchange, data fetching etc.
>
> Currently I can't upload the base document to insert variables into to create a
> template — and then later I want to fill the template using data, modifying it
> manually where necessary, then exporting.

Two distinct user surfaces fall out of that:

- **Authoring** — upload a base `.docx` → place variable slots into it → save as a
  reusable template. *This is the gap that does not exist today.*
- **Generation** — pick a template → bind variables to project data → manually edit
  where needed (live editor + preview) → export `.docx`.

### 0.2 Today's state (audited 2026-05-29, verified against the live tree)

The current submission generator is ~250 KB of Go plus a 115 KB editor bundle:

- `internal/services/submission_vars.go` — variable resolution across **7 namespaces**
  (`firm.*`, `today.*`, `user.*`, `project.*`, `parties.*`, `procedural_event.*` +
  `rule.*` legacy aliases, `deadline.*`). Resolution is a **push** model: each
  namespace is a hardcoded `addXxxVars(bag PlaceholderMap, …)` function mutating a
  shared `map[string]string`. There is **no interface and no registry** — adding a
  namespace means hand-editing `Build` to call a new function.
- `internal/services/submission_merge.go` — placeholder substitution. The regex
  (line 95, verified) is `\{\{\s*([A-Za-z][A-Za-z0-9_.]*)\s*\}\}`.
  Two-pass: single-run replace inside each `<w:t>`, then
  cross-run merge for fragmented placeholders. HTML preview wraps `(key,value)` in
  Private-Use-Area sentinels so `emitTextWithDraftVars` can reconstruct
  `<span class="draft-var" data-var="key">…</span>` for click-to-jump.
- `internal/services/submission_md.go` — Markdown → OOXML runs. `parseInlineSpans`
  (lines 393–446) tokenises bold/italic and **preserves `{{…}}` verbatim**.
- `internal/services/submission_compose.go` — assembles the final `.docx`: unzip base,
  render each included section's Markdown to OOXML, splice between
  `{{#section:KEY}}…{{/section:KEY}}` anchors, patch hyperlink rels, repack, then run
  the placeholder pass.
- `internal/services/submission_{draft,section,building_block,base}_service.go` — the
  draft/section/building-block/base data model + CRUD.
- `internal/handlers/submission_{drafts,sections,building_blocks,bases}.go` — the HTTP
  wire (the 53 KB `submission_drafts.go` is the bulk).
- `frontend/src/client/submission-draft.ts` — the editor UI (**one `.ts` bundle; there is
  no `submission-draft.tsx`** — the brief was wrong on this point).
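
The sentinel round-trip described in the `submission_merge.go` bullet can be sketched roughly as follows. This is an illustration only: the three code points, `wrapVar`, and `emitSpans` are hypothetical names chosen here, and the real `emitTextWithDraftVars` may differ in every detail except the `<span class="draft-var" data-var="…">` output shape quoted above.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// Hypothetical Private-Use-Area sentinels; the code points actually used
// by paliad are not stated in this document.
const (
	varOpen  = '\uE000' // starts a (key,value) pair
	varSep   = '\uE001' // separates key from value
	varClose = '\uE002' // ends the pair
)

// wrapVar marks a substituted value so a later HTML pass can find it.
func wrapVar(key, value string) string {
	return string(varOpen) + key + string(varSep) + value + string(varClose)
}

// emitSpans HTML-escapes text and turns each sentinel-wrapped pair into a
// <span class="draft-var" data-var="key">value</span> element.
func emitSpans(text string) string {
	var out strings.Builder
	for {
		start := strings.IndexRune(text, varOpen)
		if start < 0 {
			break
		}
		rest := text[start+len(string(varOpen)):]
		sep := strings.IndexRune(rest, varSep)
		end := strings.IndexRune(rest, varClose)
		if sep < 0 || end < sep {
			break // malformed pair: emit the remainder as plain text
		}
		out.WriteString(html.EscapeString(text[:start]))
		key := rest[:sep]
		value := rest[sep+len(string(varSep)) : end]
		fmt.Fprintf(&out, `<span class="draft-var" data-var="%s">%s</span>`,
			html.EscapeString(key), html.EscapeString(value))
		text = rest[end+len(string(varClose)):]
	}
	out.WriteString(html.EscapeString(text))
	return out.String()
}

func main() {
	merged := "Az. " + wrapVar("project.case_number", "4c O 12/23")
	fmt.Println(emitSpans(merged))
}
```

Carrying the key through the text as sentinels, rather than building HTML during substitution, keeps the merge pass format-neutral: the same merged text can feed the `.docx` path (sentinels never added) or the preview path.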

**OOXML approach (verified):** pure `archive/zip` + string manipulation of
`word/document.xml`. **No third-party docx library** — `go.mod` has none.
`lukasjarosch/go-docx` appears *only in a comment* (`submission_merge.go:13`)
documenting why it was rejected (it refuses sibling placeholders in one run). The base
stays byte-for-byte identical outside the regions we touch.

**Reference model:** `pkg/litigationplanner/` (t-paliad-292). The package **owns its
types** and exposes **interfaces for stateful inputs** (`Catalog`, `HolidayCalendar`,
`CourtRegistry`); paliad implements them against Postgres, youpc.org against an embedded
JSON snapshot. `doc.go` is the package doc; `types_wire_test.go` locks the JSON contract.
**docforge mirrors this packaging discipline exactly.**

### 0.3 Premise correction (load-bearing)

The brief lists **two consumers in scope: paliad + upc-commentary**. Verified against the
live repo: **`UPCommentary/upc-kommentar` is Bun + SvelteKit + TypeScript + PLpgSQL —
zero Go.** A SvelteKit app cannot `import` a Go `pkg/`. m's resolution (2026-05-29):
**upc-kommentar is out of scope as a live consumer for now.** docforge is a pure Go
package; paliad imports it in-process like `litigationplanner`. The interfaces are
designed so an HTTP veneer (for a future TS consumer) is *addable later* without rework —
but none is built now. See §4 D-P1 and §8.

### 0.4 Locked constraints (m, confirmed)

- One Go module: `pkg/docforge`. Same packaging model as `pkg/litigationplanner`.
- docforge **owns no database tables** — data flows in via interfaces.
- `.docx` first; engine designed format-pluggable for `.pdf`/`.html`/`.md` later.
- Authoring and Generation are **distinct pages**, but share the engine + the generic
  editor plumbing.
- Generation must support **minor manual content edits** (live editor, not just
  data-binding).
- Editor stays per-consumer; the **generic UX plumbing** is extracted into a reusable UI
  package now.
- The neutral model must be **lossless for our own `.docx`** (the uploaded base is an
  opaque carrier, preserved byte-for-byte outside touched regions).

### 0.5 Contracts that MUST survive the refactor

These are invariants. The migration (§6) protects each by moving it *with its file and its
test*, unchanged:

1. **`placeholderRegex`** = `` `\{\{\s*([A-Za-z][A-Za-z0-9_.]*)\s*\}\}` `` — underscores
   and dots legal in keys; whitespace inside braces trimmed; case-sensitive.
2. **Last night's underscore fix** (commit `b78a984`): `parseInlineSpans` short-circuits
   the inline scanner on `{{` and copies the placeholder literally to `}}`, so
   `{{project.case_number}}` is never mangled to `{{project.casenumber}}`.
3. **`data-var` contract** — `data-var="<key>"` on both `.draft-var` preview spans and
   `.submission-draft-var-input` sidebar inputs; the click-to-jump and focus-highlight are
   bijective across repaints.
4. **Missing-value markers** — `[KEIN WERT: key]` (DE) / `[NO VALUE: key]` (EN) render
   inline, never an error.
5. **Legacy aliases** — `procedural_event.X ≡ rule.X` resolve identically
   (`submission_vars_aliases_test.go`); party variables emit comma-joined, indexed, and
   flat-legacy forms (`submission_vars_parties_test.go`).
6. **Section anchor syntax** — `{{#section:KEY}}…{{/section:KEY}}`, `KEY` matched against
   `[A-Za-z0-9_]+`.
7. **No binary retention** — exported `.docx` is regenerable from inputs; only audit rows
   persist (`system_audit_log` `submission.exported` + `project_events`).
8. **V1 fallback path** — pre-Composer drafts (`base_id IS NULL`, no section rows) render
   via the pure-placeholder path. No auto-upgrade.
9. **`{{…}}` pass-through** — the Markdown walker emits placeholders verbatim; the merge
   pass substitutes them afterward. Order is load-bearing (substitution runs *inside*
   compose, after section splicing).
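
A minimal sketch of invariants 1 and 4 working together, assuming a flat bag and plain-string input. The regex is the one quoted in item 1; the `substitute` name and the `lang` parameter are hypothetical, and treating "key absent from the bag" as the missing-value condition is an assumption (the real merge pass works on OOXML runs and may also treat empty values as missing).

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholderRegex is the grammar quoted in invariant 1.
var placeholderRegex = regexp.MustCompile(`\{\{\s*([A-Za-z][A-Za-z0-9_.]*)\s*\}\}`)

// substitute replaces every placeholder with its value from bag. Unknown
// keys become an inline marker; the function never returns an error.
// The lang switch is an illustrative assumption.
func substitute(text string, bag map[string]string, lang string) string {
	return placeholderRegex.ReplaceAllStringFunc(text, func(m string) string {
		key := placeholderRegex.FindStringSubmatch(m)[1]
		if v, ok := bag[key]; ok {
			return v
		}
		if lang == "de" {
			return "[KEIN WERT: " + key + "]"
		}
		return "[NO VALUE: " + key + "]"
	})
}

func main() {
	bag := map[string]string{"project.case_number": "4c O 12/23"}
	fmt.Println(substitute("Az. {{ project.case_number }}, {{firm.name}}", bag, "en"))
}
```

Because the key must start with a letter, section anchors such as `{{#section:KEY}}` do not match this regex, so the two grammars cannot collide.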

---

## §1 Goals

**G1.** Extract the format-neutral document machinery (Markdown→OOXML walker, OOXML
merge/compose, placeholder engine, `.dotm`→`.docx`) into `pkg/docforge` with a clean
public surface and zero behavior change at the extraction step.

**G2.** Introduce a **neutral document/template model** so importers produce it, the engine
binds variables on it, and exporters render it out — with `.docx` as the first
importer+exporter pair, not the universe. Lossless for our own `.docx`.

**G3.** Replace the hardcoded `addXxxVars` push with a **`VariableResolver` interface per
namespace** + a `ResolverSet` that composes them, preserves aliases, and exposes the key
catalogue (label + group) so the frontend variable form/palette becomes data-driven
instead of hardcoded in TS.

**G4.** Build the **Authoring surface**: upload `.docx` → WYSIWYG render → click/select →
insert `{{slot}}` → save template. Closes the gap m named.

**G5.** Refactor **Generation** onto docforge + uploaded templates, preserving the live
editor, preview, manual-edit, and export — and every contract in §0.5.

**G6.** Extract the **generic editor UX** into `frontend/src/lib/docforge-editor/`,
consumed by both the generation and authoring shells.

**Non-goals (this PRD):** implementation, migration SQL, code. Formats beyond `.docx`
(interface only). Live upc-kommentar integration. Multi-user concurrent editing of one
draft. An HTTP service veneer.

---

## §2 User journeys

### 2.1 Authoring (new)

1. m opens **`/admin/templates`** (or `/templates/new`) and uploads a base `.docx`
   (firm letterhead with caption layout, signature block, etc.).
2. docforge's `.docx` importer parses the upload into a **carrier** (opaque OOXML kept
   intact) + a renderable preview. The page shows a **WYSIWYG-ish render** of the document.
3. m highlights a piece of text — e.g. `Az. 4c O 12/23` — and a **variable palette**
   (sourced from the `ResolverSet.Catalogue()` catalogue, grouped, with DE/EN labels) lets
   him pick `project.case_number`. The selection is **replaced with a
   `{{project.case_number}}` slot**; a `template_slots` row records the slot key + its
   anchor position.
4. He repeats for every variable region, saves, and the template becomes pickable in
   Generation. (Editing the template later creates a new **version** — see §4 D-A3.)

**Scope guard:** v1 authoring places **text-level slots in body paragraphs**. Slots in
headers/footers/tables/text-boxes are a flagged follow-up (§7 note), because the
click→OOXML-run mapping there is materially harder.

### 2.2 Generation (refactor of today)

1. Lawyer picks a template (uploaded template *or* a legacy Gitea base — both supported
   during transition) for a submission code, optionally project-scoped.
2. A **draft** is created. Its template **structure is snapshotted** at create
   (§4 D-A3) so later template edits don't shift an in-flight draft.
3. The sidebar shows the variable form (data-driven from `ResolverSet.Catalogue()`); the
   resolved bag is merged with the lawyer's overrides; the live preview renders with
   `data-var` click-to-jump; manual prose edits autosave (500 ms debounce).
4. Export → docforge binds the model + carrier + resolved variables → `.docx` bytes
   stream as a download. Audit rows written. No binary retained.

### 2.3 upc-kommentar parallel journey (deferred — validates the abstractions)

Not built now, but the abstractions are sized for it: upc-kommentar authors work in
**Markdown** (and want to import **foreign doc/docx** as input — m, 2026-05-29 Q4). When
it becomes a consumer, it would: implement its own `VariableResolver`(s) over its Postgres
(commentary metadata), feed Markdown through docforge's **markdown importer** into the
neutral model, edit live in its own Svelte shell (reusing the *wire contract*, not Go
code), and export. The Go engine is reached over an HTTP veneer added at that point. This
journey is the litmus test for §3's seams: **a new consumer adds resolvers + a transport,
touches no engine internals.**

---

## §3 Module shape

### 3.1 Package tree

```
pkg/docforge/
  doc.go              // package doc (litigationplanner-style)
  model.go            // neutral model: Document, Block, InlineSpan, Slot
  template.go         // Template, TemplateSlot, Carrier
  variables.go        // VariableResolver interface, VariableKey, ResolverSet, alias registry
  bind.go             // binding engine: walk model, resolve slots, apply missing-marker policy
  render.go           // RenderHTML (preview w/ data-var spans) — format-neutral entry
  importer.go         // Importer interface
  exporter.go         // Exporter interface
  store.go            // TemplateStore interface (carrier bytes + slot persistence contract)
  errors.go           // sentinel errors (ErrUnknownTemplate, ErrUnboundSlot, …)
  placeholder.go      // placeholderRegex + substitution primitives (THE locked grammar)
  types_wire_test.go  // locks the JSON wire shape consumed by the TS editor
  docx/               // the .docx adapter — first importer + exporter
    importer.go       // DocxImporter: parse .docx -> Carrier + detect/locate slots
    exporter.go       // DocxExporter: (model + carrier + vars) -> .docx bytes [today's compose+merge]
    ooxml.go          // archive/zip + document.xml manipulation [today's submission_merge/compose internals]
    md_to_ooxml.go    // Markdown -> OOXML runs [today's submission_md walker + the b78a984 fix]
    dotm.go           // ConvertDotmToDocx [today's pre-pass]
  markdown/           // markdown importer (input content; foreign-docx import is a later sibling)
    importer.go       // parse Markdown -> neutral blocks
```

**What lives in docforge vs paliad:**

| Concern | Home | Why |
|---|---|---|
| Neutral model, binding, preview-render | `docforge` | format-neutral core |
| `VariableResolver` interface + `ResolverSet` | `docforge` | the seam m wants clean |
| Placeholder grammar + substitution | `docforge` | shared invariant (§0.5.1) |
| `.docx` importer + exporter, MD→OOXML walker | `docforge/docx` | first format adapter (ships *inside* the pkg, like litigationplanner's embedded snapshot) |
| Markdown importer | `docforge/markdown` | input-format adapter |
| Concrete resolvers (`project`, `parties`, `firm`, `user`, `today`, `deadline`, `procedural_event`) | **paliad** `internal/…` | they read paliad's DB/services |
| `TemplateStore` impl (Postgres bytea) | **paliad** | docforge owns no tables |
| Section / building-block model, submission codes | **paliad** | consumer-specific composition concepts |
| HTTP handlers, editor UI, authoring page | **paliad** | wire + per-consumer UI |

### 3.2 The neutral model + the carrier (resolving "intermediate, but lossless docx")

```go
// A Document is the format-neutral content model importers produce and exporters consume.
type Document struct {
	Blocks []Block
}

type Block struct {
	Kind  BlockKind    // paragraph | heading | list_item | blockquote | section_marker
	Style string       // logical style key (mapped to a base stylemap on export)
	Spans []InlineSpan // text runs (bold/italic/link) + Slots
	// …list level, section key, etc.
}

type InlineSpan struct {
	Text   string
	Bold   bool
	Italic bool
	Link   string
	Slot   *Slot // non-nil => this span is a variable slot, not literal text
}

type Slot struct {
	Key string // e.g. "project.case_number" — the placeholder grammar key
}
```

**The carrier keeps the lossless guarantee.** The uploaded `.docx` chrome
(letterhead, styles, caption, signature) is **never round-tripped through `Document`**.
It is held as an opaque `Carrier` (the original OOXML), and the exporter splices the
rendered neutral content into the carrier's named anchors, then substitutes slots — exactly
today's compose mechanism, now formalised:

```go
type Carrier struct {
	Format  string   // "docx"
	Bytes   []byte   // original upload, preserved byte-for-byte outside anchor regions
	Anchors []Anchor // {{#section:KEY}}…{{/section:KEY}} positions + slot positions
}
```

So **two layers**: editable content = `Document` (neutral, format-pluggable); base chrome =
`Carrier` (opaque, lossless). Foreign-docx *import as input content* (Q4) does parse into
`Document` and **is inherently lossy** — flagged as a boundary (§8), distinct from the
lossless export of *our* templates.
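
The splice step can be illustrated on a plain string. This sketch assumes that the anchors are removed together with the old content between them and that an unmatched opening anchor is left untouched; neither behaviour is confirmed by this document, and the real exporter operates on `word/document.xml` paragraphs rather than flat text. `spliceSections` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// openAnchor matches {{#section:KEY}} with KEY in [A-Za-z0-9_]+ (§0.5.6).
var openAnchor = regexp.MustCompile(`\{\{#section:([A-Za-z0-9_]+)\}\}`)

// spliceSections replaces each {{#section:KEY}}…{{/section:KEY}} region
// with rendered[KEY]. Text outside anchor pairs is copied unchanged.
// Go's regexp has no backreferences, so the closing anchor is located
// with a plain string search built from the captured key.
func spliceSections(carrier string, rendered map[string]string) string {
	var out strings.Builder
	for {
		loc := openAnchor.FindStringSubmatchIndex(carrier)
		if loc == nil {
			break
		}
		key := carrier[loc[2]:loc[3]]
		closeTag := "{{/section:" + key + "}}"
		end := strings.Index(carrier[loc[1]:], closeTag)
		if end < 0 {
			// Unmatched opening anchor: keep it literally and move on.
			out.WriteString(carrier[:loc[1]])
			carrier = carrier[loc[1]:]
			continue
		}
		out.WriteString(carrier[:loc[0]])
		out.WriteString(rendered[key])
		carrier = carrier[loc[1]+end+len(closeTag):]
	}
	out.WriteString(carrier)
	return out.String()
}

func main() {
	base := "<letterhead/>{{#section:facts}}old{{/section:facts}}<signature/>"
	fmt.Println(spliceSections(base, map[string]string{"facts": "<p>new facts</p>"}))
}
```

Placeholders inside the spliced content survive this step verbatim, which is why the substitution pass has to run afterwards (§0.5.9).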

### 3.3 The variable resolver seam (G3)

```go
// VariableResolver answers keys within one dotted namespace.
type VariableResolver interface {
	Namespace() string                          // e.g. "project"
	Resolve(key string) (value string, ok bool) // ok=false => unknown key => missing marker
	Keys() []VariableKey                        // catalogue for the palette + sidebar form
}

type VariableKey struct {
	Key, LabelDE, LabelEN, Group string
}

// ResolverSet composes namespaced resolvers, registers canonical<->legacy aliases,
// and offers BOTH a pull path (Resolve, used during binding) and a push path
// (BuildBag, preserving today's resolved_bag/merged_bag wire).
type ResolverSet struct{ /* … */ }

func (s *ResolverSet) Resolve(key string) (string, bool)
func (s *ResolverSet) BuildBag() map[string]string // == today's PlaceholderMap
func (s *ResolverSet) Catalogue() []VariableKey    // drives the data-driven form/palette
func (s *ResolverSet) RegisterAlias(canonical, legacy string)
```

paliad's seven `addXxxVars` functions become seven resolver types implementing this
interface. `BuildBag()` reproduces today's flat map exactly (alias parity tests pin it).
`Catalogue()` kills the hardcoded `VARIABLE_GROUPS`/`VARIABLE_LABELS` in the TS bundle.
**Resolver model = hybrid** (pull-capable interface, push-driven `BuildBag` default —
inventor pick, §4 D-I1).
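
One possible shape for the hybrid model, as a sketch under assumptions: resolvers receive the full dotted key (the interface comment does not say whether the namespace prefix is stripped), aliases map key to key, and `mapResolver` and `NewResolverSet` are hypothetical helpers that exist only for this example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type VariableKey struct {
	Key, LabelDE, LabelEN, Group string
}

type VariableResolver interface {
	Namespace() string
	Resolve(key string) (value string, ok bool)
	Keys() []VariableKey
}

// mapResolver is a hypothetical in-memory resolver for one namespace.
// Assumption: keys are passed and stored in full dotted form.
type mapResolver struct {
	ns     string
	values map[string]string
}

func (r mapResolver) Namespace() string { return r.ns }

func (r mapResolver) Resolve(key string) (string, bool) {
	v, ok := r.values[key]
	return v, ok
}

func (r mapResolver) Keys() []VariableKey {
	keys := make([]VariableKey, 0, len(r.values))
	for k := range r.values {
		keys = append(keys, VariableKey{Key: k, Group: r.ns})
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i].Key < keys[j].Key })
	return keys
}

type ResolverSet struct {
	resolvers []VariableResolver
	aliases   map[string]string // legacy key -> canonical key
}

func NewResolverSet(rs ...VariableResolver) *ResolverSet {
	return &ResolverSet{resolvers: rs, aliases: map[string]string{}}
}

func (s *ResolverSet) RegisterAlias(canonical, legacy string) {
	s.aliases[legacy] = canonical
}

// Resolve is the pull path: map a legacy key to its canonical form, then
// ask the resolver that owns the key's namespace.
func (s *ResolverSet) Resolve(key string) (string, bool) {
	if canonical, ok := s.aliases[key]; ok {
		key = canonical
	}
	ns, _, _ := strings.Cut(key, ".")
	for _, r := range s.resolvers {
		if r.Namespace() == ns {
			return r.Resolve(key)
		}
	}
	return "", false
}

// Catalogue concatenates every resolver's keys in registration order.
func (s *ResolverSet) Catalogue() []VariableKey {
	var all []VariableKey
	for _, r := range s.resolvers {
		all = append(all, r.Keys()...)
	}
	return all
}

// BuildBag is the push path: a flat map holding canonical and legacy keys.
func (s *ResolverSet) BuildBag() map[string]string {
	bag := map[string]string{}
	for _, k := range s.Catalogue() {
		if v, ok := s.Resolve(k.Key); ok {
			bag[k.Key] = v
		}
	}
	for legacy := range s.aliases {
		if v, ok := s.Resolve(legacy); ok {
			bag[legacy] = v
		}
	}
	return bag
}

func main() {
	set := NewResolverSet(mapResolver{
		ns:     "procedural_event",
		values: map[string]string{"procedural_event.name": "Klageerwiderung"},
	})
	set.RegisterAlias("procedural_event.name", "rule.name")
	fmt.Println(set.BuildBag())
}
```

Building the bag on top of `Resolve` means the push and pull paths cannot drift apart, which is the property the alias parity tests are meant to pin.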

### 3.4 Wire contract (Go ↔ TS) — preserved, locked by test

The editor wire stays as-is; `types_wire_test.go` pins it:

- `GET draft` → `{ draft, resolved_bag, merged_bag, preview_html, rule, parties, sections }`
- preview HTML carries `<span class="draft-var" data-var="<key>">…</span>` (built by
  docforge's `RenderHTML`, today's `emitTextWithDraftVars`).
- `PATCH draft` ← `{ variables: PlaceholderMap, … }` (presence-tracked optional fields).
- export/preview endpoints unchanged.
- **New (authoring):** `POST /api/templates` (upload), `GET /api/templates/:id` (carrier
  preview + slots), `POST /api/templates/:id/slots` (place slot), `GET /api/docforge/variables`
  (the `Catalogue()`).

---

## §4 Decisions (m's picks, 2026-05-29)

### Prose-grill resolutions (core metaphor)

| # | Question | m's decision | Note |
|---|---|---|---|
| P1 | Cross-language sharing model | **Go pkg only; upc-kommentar out of scope for now, "reuse later somehow"** | Interfaces sized so an HTTP veneer is addable without rework. No service built. |
| P2 | Intermediate model? | **Yes — but lossless for our .docx** | → carrier (opaque OOXML) + neutral Document (editable content). §3.2. |
| P3 | Authoring slot mechanic | **(b) click-to-insert** | Upload → render → click/select → inject `{{…}}`. |
| P4 | Input formats | **Markdown primary; foreign doc/docx import later** | Markdown importer first; foreign-docx import is lossy (§8). |
| P5 | Editor sharing | **Build paliad's UI; extract generic UX into a UI package** | `frontend/src/lib/docforge-editor/`. |

### Structured decisions

| # | Decision | m's pick | Rationale / divergence |
|---|---|---|---|
| A1 | Authoring UX | **WYSIWYG inline** | Matches "insert variables into the document". Hardest part — render fidelity + click→run mapping — flagged §7. |
| A2 | Template storage | **Postgres bytea (interface-backed)** | m leans (1); flagged Supabase Storage as viable. Resolved: behind a `TemplateStore` interface, bytea impl now, Supabase Storage a one-impl swap later. No schema churn either way. |
| A3 | Versioning of existing drafts | **Snapshot at draft-create** | Lawyer's in-flight draft won't shift under them; matches today's section-seeding. |
| A4 | Migration strategy | **Extract-in-place, then extend** | Lowest risk to the recent fixes — they move with their files + tests; behavior identical at each step. |
| B1 | Package name | **`docforge`** | — |
| B2 | Schema scope | **New generic tables** (`templates`, `template_slots`, `template_versions`) | Authoring is domain-neutral; submission_bases (Gitea/section_spec) stays for legacy bases with a converge path. |
| B3 | UI package extraction | **Extract now** | Authoring reuses it this cycle — earns its keep, not speculative. |
| B4 | Exporter pluggability | **Interface now, docx-only impl** | Cheap insurance; matches "pluggable for later". |

### Inventor picks (m delegated — "whatever works best")

| # | Pick | Reasoning |
|---|---|---|
| I1 | `VariableResolver` = pull-capable interface, push `BuildBag()` default | Preserves today's flat-map wire while enabling on-demand resolution + the `Catalogue()` that data-drives the form. |
| I2 | `.docx` adapter ships **inside** `pkg/docforge/docx` | Mirrors litigationplanner shipping its embedded snapshot in-package; keeps the first adapter co-located with the engine it proves. |
| I3 | Carrier-vs-Document split (§3.2) | Only way to satisfy "intermediate model" AND "lossless our .docx" simultaneously. |

---

## §5 Data model deltas (paliad-side — docforge owns none)

**New tables** (additive; SQL drafted by the coder, not here):

- **`paliad.templates`** — `id`, `slug`, `name_de/en`, `kind` (`'submission'` | generic),
  `source_format` (`'docx'`), `firm`, `is_active`, `created/updated_by`, timestamps,
  `current_version_id` FK.
- **`paliad.template_versions`** — immutable snapshots: `id`, `template_id` FK,
  `version` int, `carrier_blob` bytea (the `.docx`; or storage ref via `TemplateStore`),
  `created_at`, `created_by`. Editing a template inserts a new version row.
- **`paliad.template_slots`** — `id`, `template_version_id` FK, `slot_key` (the variable
  key, e.g. `project.case_number`), `anchor` (position encoding — see flag below),
  `label`, `order_index`. Versioned alongside the carrier.

**Snapshot semantics (A3):** a draft pins `template_version_id`. Template edits create a
new version; existing drafts keep their pinned version. *(Flag for coder: pin
`template_version_id` on the draft vs. copy a `template_snapshot` jsonb onto the draft —
both satisfy A3; the version-table approach is preferred for auditability but the coder
picks based on query ergonomics.)*

**Touched existing tables:**

- `submission_drafts` — add nullable `template_version_id` for uploaded-template drafts;
  **legacy `base_id` path preserved** (extract-in-place ⇒ no data migration of the 11
  existing drafts; §0.5.8 fallback intact).
- `submission_bases`, `submission_sections`, `submission_building_blocks` — **unchanged**.
  They remain paliad consumer-specific concepts that map onto docforge's neutral model at
  render time. submission_bases (Gitea-backed) coexists with the new uploaded-template
  tables during transition; convergence is a later, separate task.

**Slot anchor encoding (flag for coder):** how a `template_slots.anchor` records *where*
in the carrier OOXML the slot sits (run index + offset, vs. a stable sentinel token
injected into the carrier at authoring time). The sentinel-token approach is likely
simpler and reuses the existing cross-run substitution machinery — resolve in
implementation chat.

---

## §6 Migration plan (protects working code + the recent fixes)

**Principle:** extract-in-place (A4). Each step **compiles, passes the moved tests, and
leaves observable behavior identical.** The recent fixes travel *with their files*:

- The **b78a984 underscore fix** → `pkg/docforge/docx/md_to_ooxml.go` (was
  `submission_md.go` `parseInlineSpans`), `submission_md_test.go` moves alongside.
- **`placeholderRegex`** → `pkg/docforge/placeholder.go`; its tests move.
- **`data-var` / `emitTextWithDraftVars`** → `pkg/docforge/render.go` (`RenderHTML`);
  wire test moves and is pinned in `types_wire_test.go`.
- **Cross-run merge, `.dotm`→`.docx`, anchor splicing** → `pkg/docforge/docx/`; tests move.
- **Building-block + section model, submission codes, the 7 concrete resolvers** stay in
  `internal/` (consumer-specific) — now calling into docforge.
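
To make concrete what the first bullet protects, here is a toy inline scanner with the same short-circuit. It is not the real `parseInlineSpans`: it handles only `_italic_`, and `span` and `scanInline` are hypothetical names. The point is the order of the checks, where `{{` is tested before any emphasis marker.

```go
package main

import (
	"fmt"
	"strings"
)

// span is a simplified stand-in for an inline run.
type span struct {
	Text   string
	Italic bool
}

// scanInline toggles italics on '_' but copies {{…}} placeholders
// verbatim, so underscores inside a key are never read as emphasis.
func scanInline(s string) []span {
	var spans []span
	var buf strings.Builder
	italic := false
	flush := func() {
		if buf.Len() > 0 {
			spans = append(spans, span{Text: buf.String(), Italic: italic})
			buf.Reset()
		}
	}
	for i := 0; i < len(s); {
		// Short-circuit: a placeholder is copied through to its closing
		// braces before any emphasis marker is considered.
		if strings.HasPrefix(s[i:], "{{") {
			if end := strings.Index(s[i:], "}}"); end >= 0 {
				buf.WriteString(s[i : i+end+2])
				i += end + 2
				continue
			}
		}
		if s[i] == '_' {
			flush()
			italic = !italic
			i++
			continue
		}
		buf.WriteByte(s[i])
		i++
	}
	flush()
	return spans
}

func main() {
	for _, sp := range scanInline("see _this_ {{project.case_number}}") {
		fmt.Printf("%q italic=%v\n", sp.Text, sp.Italic)
	}
}
```

Without the short-circuit, the underscore in `case_number` would toggle italics and vanish from the key, which is the `{{project.casenumber}}` mangling that §0.5.2 describes.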

**Safety rails per step:** (1) `go build ./...` green; (2) the moved test files green; (3)
a golden-export check — generate a known draft before and after the step, assert byte-equal
`.docx`; (4) the live preview HTML for a fixture draft is string-equal (the `data-var`
contract). No step ships until all four hold.
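
Rail (3) could be backed by a small helper along these lines. This is a sketch, not the project's test harness: `goldenDiff`, `readParts`, and `buildDocx` are hypothetical names, and the per-part comparison is an added convenience for triage, since the rail itself only requires the byte-equality check on the first line of `goldenDiff`.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"sort"
)

// readParts unpacks a .docx (a zip archive) into name -> content.
func readParts(raw []byte) (map[string][]byte, error) {
	zr, err := zip.NewReader(bytes.NewReader(raw), int64(len(raw)))
	if err != nil {
		return nil, err
	}
	parts := make(map[string][]byte, len(zr.File))
	for _, f := range zr.File {
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		data, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		parts[f.Name] = data
	}
	return parts, nil
}

// goldenDiff returns "" when the two exports are byte-equal. Otherwise it
// names the first part (in sorted order) that differs or is missing.
func goldenDiff(before, after []byte) (string, error) {
	if bytes.Equal(before, after) {
		return "", nil
	}
	a, err := readParts(before)
	if err != nil {
		return "", err
	}
	b, err := readParts(after)
	if err != nil {
		return "", err
	}
	names := make([]string, 0, len(a)+len(b))
	for n := range a {
		names = append(names, n)
	}
	for n := range b {
		if _, seen := a[n]; !seen {
			names = append(names, n)
		}
	}
	sort.Strings(names)
	for _, n := range names {
		_, inA := a[n]
		_, inB := b[n]
		if inA != inB || !bytes.Equal(a[n], b[n]) {
			return n, nil
		}
	}
	return "(parts equal; container bytes differ)", nil
}

// buildDocx packs a single part into a zip; a stand-in for a real export.
func buildDocx(name, body string) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, _ := zw.Create(name) // error ignored: the name is a valid literal
	io.WriteString(w, body)
	zw.Close()
	return buf.Bytes()
}

func main() {
	before := buildDocx("word/document.xml", "<w:t>A</w:t>")
	after := buildDocx("word/document.xml", "<w:t>B</w:t>")
	fmt.Println(goldenDiff(before, after))
}
```

Byte equality of the whole archive is stricter than equality of the parts, because zip entry order and metadata also count; the last return value covers the case where only the container changed.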

**What is explicitly NOT migrated:** the 11 pre-Composer drafts (`base_id IS NULL`) keep
the v1 fallback render path; no auto-upgrade (§0.5.8).

---

## §7 Slice train

Tracer-bullet vertical slices, each independently shippable. Slices 1–3 are pure
behavior-preserving refactors (the risky-to-working-code part, front-loaded under golden
checks); 4–7 build the new capability; 8 sets up the future.

1. **Extract the docx engine** — move MD→OOXML walker, OOXML merge/compose, placeholder
   grammar, `.dotm`→`.docx` into `pkg/docforge/{placeholder.go, render.go, docx/}`.
   paliad's `submission_*` services become thin adapters. Golden-export + preview checks
   green. *Protects b78a984, the regex, the data-var contract.*
2. **Neutral model + binding** — introduce `Document`/`Block`/`Slot`/`Carrier` + `bind.go`;
   refactor the docx exporter to consume the neutral model (sections → blocks → OOXML
   spliced into carrier). Behavior identical (golden checks).
3. **`VariableResolver` interface** — refactor the 7 `addXxxVars` into resolver types +
   `ResolverSet`; `BuildBag()` reproduces today's map (alias-parity tests pin it);
   `Catalogue()` exposed. Frontend form switched to consume `Catalogue()` (kills hardcoded
   `VARIABLE_GROUPS`).
4. **Template store + schema** — `templates`/`template_versions`/`template_slots` +
   Postgres-bytea `TemplateStore` impl. No UI yet. Additive migrations.
5. **UI package extraction** — pull generic plumbing (debounced autosave, data-var wiring,
   preview/export round-trip, focus preservation, sticky collapse) into
   `frontend/src/lib/docforge-editor/`; submission editor consumes it. Refactor, behavior
   identical.
6. **Authoring page** — upload `.docx` → docforge docx-importer → WYSIWYG render → select
   text → pick variable from `Catalogue()` palette → inject slot (writes
   `template_slots` + new `template_version`). Reuses the UI package + docforge importer.
   *(v1: body-paragraph text slots only.)*
7. **Generation on uploaded templates** — generation page picks an uploaded template
   (`template_version_id` path) alongside legacy bases; snapshot-at-create; data-bind +
   manual edit + export via docforge. Legacy base path still works.
8. **Markdown importer + exporter-interface finalisation** — `docforge/markdown` importer
   as input; `Exporter` interface locked (docx-only impl). Sets up future formats +
   eventual upc-kommentar reuse.

**Flagged follow-ups (post-train, separate tasks):** slots in headers/footers/tables;
foreign-docx import fidelity; the HTTP veneer + a TS consumer; submission_bases →
templates convergence; auto-upgrade of pre-Composer drafts.

---

## §8 Out of scope

- **Implementation, migration SQL, code.** PRD only.
- **upc-kommentar as a live consumer** — deferred; abstractions sized for it, nothing built.
- **An HTTP service veneer** — addable later without engine rework; not now.
- **Formats beyond `.docx`** — `Exporter` interface defined (B4), only the docx impl built.
- **Lossless import of *foreign* `.docx`** — our own templates export losslessly via the
  carrier; importing an arbitrary third-party Word doc as input content is best-effort and
  inherently lossy. Distinct guarantee.
- **Multi-user concurrent editing** of one draft.
- **Re-proposing the current `submission_*.go` shape** — the point is to extract + clean it.
- **Slots outside body paragraphs** (headers/footers/tables/text-boxes) in authoring v1.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Appendix — open flags for the coder (resolve in implementation chat)
|
||||||
|
|
||||||
|
1. **Slot anchor encoding** — run-index+offset vs. injected sentinel token (§5). Lean
|
||||||
|
sentinel.
|
||||||
|
2. **Snapshot mechanism** — pinned `template_version_id` vs. `template_snapshot` jsonb on
|
||||||
|
the draft (§5). Lean version-pin.
|
||||||
|
3. **Authoring render fidelity** — reuse the existing lossy `docXMLToHTML` preview for the
|
||||||
|
WYSIWYG surface, or invest in higher fidelity. Lean reuse for v1, accept that
|
||||||
|
complex layouts render approximately while slots still anchor correctly.
|
||||||
|
4. **Storage backend** — Postgres bytea now; Supabase Storage is a clean `TemplateStore`
|
||||||
|
swap if template volume/size grows.
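
A minimal sketch of the sentinel option from flag 1, for discussion only. The `{{slot:KEY}}` grammar and the helper names `slotSentinel`, `injectSlot` and `fillSlot` are assumptions made for illustration; none of them are part of docforge. It operates on plain strings rather than OOXML runs, so it shows only why a sentinel tends to survive later edits where a run-index+offset anchor would need re-computing.

```go
package main

import "strings"

// slotSentinel returns a hypothetical in-text marker for a slot key.
// The "{{slot:KEY}}" grammar is an assumption for illustration only.
func slotSentinel(key string) string {
	return "{{slot:" + key + "}}"
}

// injectSlot swaps the first occurrence of selected for the slot's sentinel.
// It reports false when the selected text is not present.
func injectSlot(text, selected, key string) (string, bool) {
	i := strings.Index(text, selected)
	if i < 0 {
		return text, false
	}
	return text[:i] + slotSentinel(key) + text[i+len(selected):], true
}

// fillSlot substitutes every occurrence of the slot's sentinel with value.
func fillSlot(text, key, value string) string {
	return strings.ReplaceAll(text, slotSentinel(key), value)
}
```

Because the anchor travels with the text, prepending or editing content around the slot does not invalidate it. A real implementation would still have to cope with Word splitting the sentinel across runs.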
|
||||||
59
internal/services/docforge_shims.go
Normal file
@@ -0,0 +1,59 @@
|
|||||||
|
package services
|
||||||
|
|
||||||
|
// Shims bridging the submission generator to the extracted docforge .docx
|
||||||
|
// adapter (pkg/docforge/docx). Slice 1 of the docforge train
|
||||||
|
// (t-paliad-349 / m/paliad#157) relocated the Markdown→OOXML walker, the
|
||||||
|
// placeholder substitution engine, and the .dotm→.docx converter into
|
||||||
|
// pkg/docforge/docx with no behaviour change. These type aliases and
|
||||||
|
// forwarders keep every existing caller in internal/services and
|
||||||
|
// internal/handlers compiling and behaving identically — the names,
|
||||||
|
// signatures, and semantics are unchanged; only the implementation moved.
|
||||||
|
//
|
||||||
|
// Later slices retire these shims as the submission services are
|
||||||
|
// refactored to call docforge directly through the neutral model and the
|
||||||
|
// VariableResolver interface.
|
||||||
|
|
||||||
|
import "mgit.msbls.de/m/paliad/pkg/docforge/docx"
|
||||||
|
|
||||||
|
// PlaceholderMap is the variable bag (dotted-key → substituted value),
|
||||||
|
// built by SubmissionVarsService and consumed by the renderer.
|
||||||
|
type PlaceholderMap = docx.PlaceholderMap
|
||||||
|
|
||||||
|
// MissingPlaceholderFn translates an unbound placeholder key into the
|
||||||
|
// in-document marker token.
|
||||||
|
type MissingPlaceholderFn = docx.MissingPlaceholderFn
|
||||||
|
|
||||||
|
// SubmissionRenderer renders a .docx template by substituting
|
||||||
|
// {{placeholder}} tokens. Stateless; safe for concurrent use.
|
||||||
|
type SubmissionRenderer = docx.SubmissionRenderer
|
||||||
|
|
||||||
|
// HyperlinkAllocator hands the Markdown walker a rId for each external
|
||||||
|
// URL it encounters in [label](url) inline links.
|
||||||
|
type HyperlinkAllocator = docx.HyperlinkAllocator
|
||||||
|
|
||||||
|
// NewSubmissionRenderer constructs the renderer.
|
||||||
|
func NewSubmissionRenderer() *SubmissionRenderer { return docx.NewSubmissionRenderer() }
|
||||||
|
|
||||||
|
// DefaultMissingMarker returns the standard missing-value marker for the
|
||||||
|
// given UI language ("[KEIN WERT: <key>]" / "[NO VALUE: <key>]").
|
||||||
|
func DefaultMissingMarker(lang string) MissingPlaceholderFn { return docx.DefaultMissingMarker(lang) }
|
||||||
|
|
||||||
|
// RenderMarkdownToOOXML renders Markdown source into OOXML paragraph
|
||||||
|
// elements using a single paragraph style.
|
||||||
|
func RenderMarkdownToOOXML(md, paragraphStyle string) string {
|
||||||
|
return docx.RenderMarkdownToOOXML(md, paragraphStyle)
|
||||||
|
}
|
||||||
|
|
||||||
|
// RenderMarkdownToOOXMLWithStyles is the full rich-prose entry point
|
||||||
|
// (headings, lists, blockquote, inline hyperlinks via the allocator).
|
||||||
|
func RenderMarkdownToOOXMLWithStyles(md string, stylemap map[string]string, links HyperlinkAllocator) string {
|
||||||
|
return docx.RenderMarkdownToOOXMLWithStyles(md, stylemap, links)
|
||||||
|
}
|
||||||
|
|
||||||
|
// ConvertDotmToDocx rewrites a .dotm/.docm/.dotx zip into a clean .docx
|
||||||
|
// zip. Idempotent on a zip that is already a plain .docx.
|
||||||
|
func ConvertDotmToDocx(dotmBytes []byte) ([]byte, error) { return docx.ConvertDotmToDocx(dotmBytes) }
|
||||||
|
|
||||||
|
// SanitiseSubmissionFileName cleans a string for use inside a download
|
||||||
|
// filename (strips path separators / quotes, ASCII-folds DE umlauts).
|
||||||
|
func SanitiseSubmissionFileName(s string) string { return docx.SanitiseSubmissionFileName(s) }
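
Reviewer sketch, not part of the commit: a single-file illustration of why the shims above use type aliases (`type X = docx.X`) rather than defined types. The names `enginePlaceholderMap`, `engineRender`, `Render` and `legacyCaller` are hypothetical stand-ins for the engine package and its callers. An alias is the same type as its target, so values flow between old and new names without conversion.

```go
package main

// enginePlaceholderMap stands in for a type that now lives in the extracted
// engine package (hypothetical name for illustration).
type enginePlaceholderMap map[string]string

// engineRender stands in for an engine function that accepts the engine type.
func engineRender(vars enginePlaceholderMap) string {
	return "Hello " + vars["client.name"]
}

// PlaceholderMap is an alias (note the "="): it is the same type as
// enginePlaceholderMap, not a new defined type.
type PlaceholderMap = enginePlaceholderMap

// Render is a forwarder: the old name and signature stay, the body delegates.
func Render(vars PlaceholderMap) string { return engineRender(vars) }

// legacyCaller stands in for existing code written against the old names.
func legacyCaller() string {
	vars := PlaceholderMap{"client.name": "Ada"}
	return Render(vars)
}
```

With a defined type (`type PlaceholderMap enginePlaceholderMap`, no `=`), passing a `PlaceholderMap` to `engineRender` would need an explicit conversion at every call site, which is the churn the shims are meant to avoid.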
|
||||||
@@ -1,93 +1,73 @@
|
|||||||
package services
|
package services
|
||||||
|
|
||||||
// Composer render pipeline — t-paliad-313 Slice B (design doc §9.1 +
|
// Composer wrapper — bridges paliad's submission draft model
|
||||||
// §9.2). Assembles a base .docx and a draft's section rows into a
|
// (SubmissionSection + SubmissionBase) to the format-neutral docforge
|
||||||
// merged .docx ready for export.
|
// .docx composer (pkg/docforge/docx), extracted in slice 2 of the
|
||||||
|
// docforge train (t-paliad-349 / m/paliad#157).
|
||||||
//
|
//
|
||||||
// Pipeline (high-level):
|
// The full splice/assembly pipeline now lives in pkg/docforge/docx
|
||||||
|
// (compose.go): macro pre-pass, anchor-pair splicing, append-before-sectPr,
|
||||||
|
// hyperlink-rels patching, zip repack, and the final placeholder pass. This
|
||||||
|
// wrapper does the one thing the engine must not know about — mapping
|
||||||
|
// paliad's DB row types onto the neutral docx.Section / docx.Carrier
|
||||||
|
// inputs. Behaviour is byte-identical to the pre-extraction composer; the
|
||||||
|
// in-package compose_test still drives this wrapper end-to-end.
|
||||||
//
|
//
|
||||||
// 1. ConvertDotmToDocx pre-pass on the base bytes (idempotent on .docx).
|
// Slice note: the paragraph-level neutral document model (Document / Block
|
||||||
// 2. Locate `word/document.xml` inside the zip; pull the body XML.
|
// / Slot) the PRD §3.2 sketches lands in slice 6, where the authoring
|
||||||
// 3. For each section in the draft (order_index ASC, included=true):
|
// importer and the format exporters actually consume it. Building it now,
|
||||||
// render content_md_<lang> → OOXML via RenderMarkdownToOOXML using
|
// ahead of any consumer, would be speculative and would put the
|
||||||
// base.section_spec.stylemap.paragraph.
|
// byte-identical guarantee at risk for no gain (PRD §4 B3 principle:
|
||||||
// 4. Splice the rendered OOXML into the base body. Two splice modes:
|
// extractions earn their keep this cycle).
|
||||||
// - Anchor mode: when the body carries `{{#section:KEY}}` /
|
|
||||||
// `{{/section:KEY}}` marker pairs, replace the slot's content
|
|
||||||
// (including the anchor paragraphs themselves) with the rendered
|
|
||||||
// section.
|
|
||||||
// - Append mode: when no anchor pair is found for a section, the
|
|
||||||
// rendered OOXML appends at the end of the body, just before any
|
|
||||||
// `<w:sectPr>` element. Sections with `included=false` are
|
|
||||||
// dropped silently.
|
|
||||||
// 5. Strip any leftover unmatched anchor paragraphs.
|
|
||||||
// 6. Re-pack the document.xml into the zip, leaving every other part
|
|
||||||
// untouched.
|
|
||||||
// 7. Run the v1 SubmissionRenderer placeholder pass over the assembly
|
|
||||||
// so `{{path}}` placeholders inside section content (and inside
|
|
||||||
// the base's untouched chrome) get substituted by the merged bag.
|
|
||||||
// Cross-run merge in pass 2 handles autocorrect-fragmented
|
|
||||||
// placeholders the same as v1.
|
|
||||||
//
|
|
||||||
// Result: a fully-merged .docx. No new third-party Go dep — reuses
|
|
||||||
// archive/zip + the existing SubmissionRenderer.
|
|
||||||
|
|
||||||
import (
|
import (
|
||||||
"archive/zip"
|
|
||||||
"bytes"
|
|
||||||
"context"
|
"context"
|
||||||
"fmt"
|
"fmt"
|
||||||
"io"
|
|
||||||
"regexp"
|
"mgit.msbls.de/m/paliad/pkg/docforge/docx"
|
||||||
"sort"
|
|
||||||
"strings"
|
|
||||||
"time"
|
|
||||||
)
|
)
|
||||||
|
|
||||||
// SubmissionComposer assembles base + sections into a final .docx.
|
// SubmissionComposer assembles a base + a draft's sections into a final
|
||||||
// Stateless; safe for concurrent use.
|
// .docx. Stateless; safe for concurrent use.
|
||||||
type SubmissionComposer struct {
|
type SubmissionComposer struct {
|
||||||
renderer *SubmissionRenderer
|
inner *docx.Composer
|
||||||
}
|
}
|
||||||
|
|
||||||
// NewSubmissionComposer wires the composer. The renderer is required —
|
// NewSubmissionComposer wires the composer. The renderer is required — a
|
||||||
// a nil renderer is a programmer error and the composer panics at
|
// nil renderer is a programmer error and the composer panics at
|
||||||
// construction.
|
// construction.
|
||||||
func NewSubmissionComposer(renderer *SubmissionRenderer) *SubmissionComposer {
|
func NewSubmissionComposer(renderer *SubmissionRenderer) *SubmissionComposer {
|
||||||
if renderer == nil {
|
return &SubmissionComposer{inner: docx.NewComposer(renderer)}
|
||||||
panic("submission composer: renderer required")
|
|
||||||
}
|
|
||||||
return &SubmissionComposer{renderer: renderer}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
// ComposeOptions carries the per-call composition inputs.
|
// ComposeOptions carries the per-call composition inputs in paliad's own
|
||||||
|
// terms (SubmissionSection rows + the SubmissionBase chrome).
|
||||||
type ComposeOptions struct {
|
type ComposeOptions struct {
|
||||||
// Sections are the draft's section rows in display order. The
|
// Sections are the draft's section rows in display order. Included
|
||||||
// composer renders included sections; excluded rows are dropped.
|
// sections render; excluded rows are dropped. The caller is
|
||||||
// Caller is responsible for visibility — by the time the composer
|
// responsible for visibility — by the time the composer runs the rows
|
||||||
// runs, the section rows have already been gated through
|
// have already been gated through SubmissionDraftService.Get +
|
||||||
// SubmissionDraftService.Get + can_see_project.
|
// can_see_project.
|
||||||
Sections []SubmissionSection
|
Sections []SubmissionSection
|
||||||
|
|
||||||
// Base supplies the document chrome (.docx body host) plus the
|
// Base supplies the document chrome plus the stylemap for the MD
|
||||||
// stylemap for the MD walker. Must not be nil.
|
// walker. Must not be nil.
|
||||||
Base *SubmissionBase
|
Base *SubmissionBase
|
||||||
|
|
||||||
// BaseBytes is the raw .docx bytes for the base. Typically fetched
|
// BaseBytes is the raw .docx bytes for the base, typically fetched
|
||||||
// from Gitea via the existing template cache.
|
// from Gitea via the existing template cache.
|
||||||
BaseBytes []byte
|
BaseBytes []byte
|
||||||
|
|
||||||
// Lang ('de' or 'en') selects which content_md_* column the
|
// Lang ('de' or 'en') selects which content_md_* column the composer
|
||||||
// composer reads per section. Defaults to 'de' if empty.
|
// reads per section. Defaults to 'de' if empty.
|
||||||
Lang string
|
Lang string
|
||||||
|
|
||||||
// Vars is the merged placeholder bag the v1 renderer pass
|
// Vars is the merged placeholder bag the renderer pass substitutes
|
||||||
// substitutes after the composer assembly. Passed straight through
|
// after assembly.
|
||||||
// to SubmissionRenderer.Render.
|
|
||||||
Vars PlaceholderMap
|
Vars PlaceholderMap
|
||||||
|
|
||||||
// Missing translates an unbound placeholder key into the marker
|
// Missing translates an unbound placeholder key into the marker the
|
||||||
// the lawyer sees in Word. Passed straight to the renderer.
|
// lawyer sees in Word.
|
||||||
Missing MissingPlaceholderFn
|
Missing MissingPlaceholderFn
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -96,512 +76,24 @@ func (c *SubmissionComposer) Compose(ctx context.Context, opts ComposeOptions) (
|
|||||||
if opts.Base == nil {
|
if opts.Base == nil {
|
||||||
return nil, fmt.Errorf("submission compose: base required")
|
return nil, fmt.Errorf("submission compose: base required")
|
||||||
}
|
}
|
||||||
_ = ctx // reserved for cancellation propagation in later slices
|
secs := make([]docx.Section, len(opts.Sections))
|
||||||
sections := opts.Sections
|
for i, s := range opts.Sections {
|
||||||
|
secs[i] = docx.Section{
|
||||||
// Pre-pass: strip macros so the base reads as a plain .docx zip.
|
Key: s.SectionKey,
|
||||||
cleanBytes, err := ConvertDotmToDocx(opts.BaseBytes)
|
OrderIndex: s.OrderIndex,
|
||||||
if err != nil {
|
Included: s.Included,
|
||||||
return nil, fmt.Errorf("submission compose: convert base: %w", err)
|
ContentMDDE: s.ContentMDDE,
|
||||||
}
|
ContentMDEN: s.ContentMDEN,
|
||||||
|
|
||||||
// Locate + extract word/document.xml so we can splice in-place.
|
|
||||||
documentXML, otherParts, err := splitBaseZip(cleanBytes)
|
|
||||||
if err != nil {
|
|
||||||
return nil, err
|
|
||||||
}
|
|
||||||
|
|
||||||
// Per-compose hyperlink allocator. Each unique URL gets a fresh
|
|
||||||
// rId outside the base's existing namespace. The post-pass
|
|
||||||
// (patchDocumentXMLRels) writes the matching Relationship rows
|
|
||||||
// before the zip is repacked. Slice D adds inline `[label](url)`
|
|
||||||
// hyperlink support.
|
|
||||||
linkAlloc := newComposerLinkAllocator()
|
|
||||||
|
|
||||||
// Build the rendered-section map: section_key → OOXML span.
|
|
||||||
stylemap := opts.Base.SectionSpec.Stylemap
|
|
||||||
rendered := make(map[string]string, len(sections))
|
|
||||||
keptSections := make([]SubmissionSection, 0, len(sections))
|
|
||||||
for _, sec := range sections {
|
|
||||||
if !sec.Included {
|
|
||||||
continue
|
|
||||||
}
|
}
|
||||||
md := sec.ContentMDDE
|
|
||||||
if strings.EqualFold(opts.Lang, "en") {
|
|
||||||
md = sec.ContentMDEN
|
|
||||||
}
|
|
||||||
rendered[sec.SectionKey] = RenderMarkdownToOOXMLWithStyles(md, stylemap, linkAlloc.Alloc)
|
|
||||||
keptSections = append(keptSections, sec)
|
|
||||||
}
|
}
|
||||||
// Stable order — already sorted ascending by ListForDraft, but
|
return c.inner.Compose(ctx, docx.ComposeOptions{
|
||||||
// belt-and-braces in case the caller swaps the ordering policy
|
Sections: secs,
|
||||||
// later.
|
Carrier: docx.Carrier{
|
||||||
sort.SliceStable(keptSections, func(i, j int) bool {
|
Bytes: opts.BaseBytes,
|
||||||
return keptSections[i].OrderIndex < keptSections[j].OrderIndex
|
Stylemap: opts.Base.SectionSpec.Stylemap,
|
||||||
|
},
|
||||||
|
Lang: opts.Lang,
|
||||||
|
Vars: opts.Vars,
|
||||||
|
Missing: opts.Missing,
|
||||||
})
|
})
|
||||||
|
|
||||||
assembledBody := spliceSections(documentXML, rendered, keptSections, sections)
|
|
||||||
|
|
||||||
// Slice D hyperlink patch: when the walker emitted hyperlink rIds
|
|
||||||
// for inline `[label](url)` links, the base's
|
|
||||||
// word/_rels/document.xml.rels needs matching <Relationship>
|
|
||||||
// entries so Word can resolve the rIds. Mutates one zip part in
|
|
||||||
// otherParts (or appends if missing).
|
|
||||||
if linkAlloc.HasLinks() {
|
|
||||||
updatedParts, err := patchDocumentXMLRels(otherParts, linkAlloc.Pairs())
|
|
||||||
if err != nil {
|
|
||||||
return nil, err
|
|
||||||
}
|
|
||||||
otherParts = updatedParts
|
|
||||||
}
|
|
||||||
|
|
||||||
// Re-pack into a zip with the assembled document.xml. All other
|
|
||||||
// parts (styles, fonts, headers, footers, theme, settings) pass
|
|
||||||
// through bit-for-bit at their original mtime + compression.
|
|
||||||
repacked, err := repackBaseZip(otherParts, assembledBody)
|
|
||||||
if err != nil {
|
|
||||||
return nil, err
|
|
||||||
}
|
|
||||||
|
|
||||||
// Final pass: substitute placeholders against the merged bag. The
|
|
||||||
// existing renderer handles cross-run fragmentation, the `{{rule.X}}`
|
|
||||||
// alias contract, and the missing-marker emission. Reusing it
|
|
||||||
// guarantees v1's placeholder grammar stays intact inside section
|
|
||||||
// content + base chrome.
|
|
||||||
merged, err := c.renderer.Render(repacked, opts.Vars, opts.Missing)
|
|
||||||
if err != nil {
|
|
||||||
return nil, fmt.Errorf("submission compose: placeholder pass: %w", err)
|
|
||||||
}
|
|
||||||
return merged, nil
|
|
||||||
}
|
|
||||||
|
|
||||||
// ─────────────────────────────────────────────────────────────────────
|
|
||||||
// Section splicing
|
|
||||||
// ─────────────────────────────────────────────────────────────────────
|
|
||||||
|
|
||||||
// Anchor markers as they appear inside a <w:t> text node. We don't
|
|
||||||
// need a full XML parse — finding the marker text inside the body is
|
|
||||||
// sufficient because:
|
|
||||||
// - {{ and }} are never legitimate document content (placeholders
|
|
||||||
// follow the same convention everywhere else in paliad).
|
|
||||||
// - The anchor key grammar [A-Za-z0-9_]+ rules out any HTML/XML
|
|
||||||
// special characters.
|
|
||||||
// - Each anchor lives in exactly one <w:t>...</w:t>, which lives in
|
|
||||||
// exactly one <w:r>...</w:r>, which lives in exactly one
|
|
||||||
// <w:p>...</w:p>. We expand from the marker outward to find the
|
|
||||||
// enclosing <w:p> span and drop the entire paragraph as part of
|
|
||||||
// the splice.
|
|
||||||
//
|
|
||||||
// RE2 has no lookahead, so the "find enclosing <w:p>" logic is
|
|
||||||
// implemented as manual byte-index search around the marker hit
|
|
||||||
// (paragraphSpanAround below) rather than a single regex pattern.
|
|
||||||
|
|
||||||
const (
|
|
||||||
anchorOpenPrefix = "{{#section:"
|
|
||||||
anchorClosePrefix = "{{/section:"
|
|
||||||
anchorSuffix = "}}"
|
|
||||||
)
|
|
||||||
|
|
||||||
// anchorKeyRegex validates that the captured anchor key is a clean
|
|
||||||
// identifier. Keys that include other characters (which can't actually
|
|
||||||
// appear in our authored .docx) are treated as no match.
|
|
||||||
var anchorKeyRegex = regexp.MustCompile(`^[A-Za-z0-9_]+$`)
|
|
||||||
|
|
||||||
// anchorPair records the byte span of one matched anchor pair inside
|
|
||||||
// the body — from the start of the opening anchor's <w:p> element
|
|
||||||
// through the end of the closing anchor's </w:p>.
|
|
||||||
type anchorPair struct {
|
|
||||||
key string
|
|
||||||
openStart int // start of <w:p> for the opening anchor
|
|
||||||
closeEnd int // index just past </w:p> for the closing anchor
|
|
||||||
}
|
|
||||||
|
|
||||||
// findAllAnchorPairs scans the body for matched open/close anchor
|
|
||||||
// pairs. Unbalanced markers (open without close, or vice versa) are
|
|
||||||
// dropped from the result. Returns pairs in body-order; each pair's
|
|
||||||
// span is non-overlapping.
|
|
||||||
func findAllAnchorPairs(body string) []anchorPair {
|
|
||||||
type marker struct {
|
|
||||||
key string
|
|
||||||
paraStart int
|
|
||||||
paraEnd int
|
|
||||||
isOpen bool
|
|
||||||
}
|
|
||||||
var markers []marker
|
|
||||||
|
|
||||||
collect := func(prefix string, isOpen bool) {
|
|
||||||
offset := 0
|
|
||||||
for {
|
|
||||||
idx := strings.Index(body[offset:], prefix)
|
|
||||||
if idx < 0 {
|
|
||||||
return
|
|
||||||
}
|
|
||||||
start := offset + idx
|
|
||||||
suffixIdx := strings.Index(body[start+len(prefix):], anchorSuffix)
|
|
||||||
if suffixIdx < 0 {
|
|
||||||
return
|
|
||||||
}
|
|
||||||
key := body[start+len(prefix) : start+len(prefix)+suffixIdx]
|
|
||||||
if !anchorKeyRegex.MatchString(key) {
|
|
||||||
offset = start + len(prefix)
|
|
||||||
continue
|
|
||||||
}
|
|
||||||
markerEnd := start + len(prefix) + suffixIdx + len(anchorSuffix)
|
|
||||||
pStart, pEnd, ok := paragraphSpanAround(body, start, markerEnd)
|
|
||||||
if !ok {
|
|
||||||
offset = markerEnd
|
|
||||||
continue
|
|
||||||
}
|
|
||||||
markers = append(markers, marker{key: key, paraStart: pStart, paraEnd: pEnd, isOpen: isOpen})
|
|
||||||
offset = pEnd
|
|
||||||
}
|
|
||||||
}
|
|
||||||
collect(anchorOpenPrefix, true)
|
|
||||||
collect(anchorClosePrefix, false)
|
|
||||||
|
|
||||||
// Walk markers in body-order, matching each open with the next
|
|
||||||
// close that carries the same key.
|
|
||||||
sort.SliceStable(markers, func(i, j int) bool {
|
|
||||||
return markers[i].paraStart < markers[j].paraStart
|
|
||||||
})
|
|
||||||
var pairs []anchorPair
|
|
||||||
openStack := map[string]marker{}
|
|
||||||
for _, m := range markers {
|
|
||||||
if m.isOpen {
|
|
||||||
openStack[m.key] = m
|
|
||||||
continue
|
|
||||||
}
|
|
||||||
o, ok := openStack[m.key]
|
|
||||||
if !ok {
|
|
||||||
continue
|
|
||||||
}
|
|
||||||
pairs = append(pairs, anchorPair{
|
|
||||||
key: m.key,
|
|
||||||
openStart: o.paraStart,
|
|
||||||
closeEnd: m.paraEnd,
|
|
||||||
})
|
|
||||||
delete(openStack, m.key)
|
|
||||||
}
|
|
||||||
return pairs
|
|
||||||
}
|
|
||||||
|
|
||||||
// paragraphSpanAround returns the byte span of the smallest `<w:p>...</w:p>`
|
|
||||||
// element that fully contains the byte range [markerStart, markerEnd).
|
|
||||||
// Returns false when the byte range doesn't sit inside a single
|
|
||||||
// paragraph (which would mean the marker survived a cross-paragraph
|
|
||||||
// edit — defensive guard, shouldn't happen in well-formed input).
|
|
||||||
func paragraphSpanAround(body string, markerStart, markerEnd int) (int, int, bool) {
|
|
||||||
// Walk backwards to find the nearest unclosed <w:p ... > opening.
|
|
||||||
// Since <w:p> doesn't nest, the nearest <w:p before markerStart is
|
|
||||||
// the enclosing paragraph's opening tag.
|
|
||||||
pStart := -1
|
|
||||||
cursor := markerStart
|
|
||||||
for cursor > 0 {
|
|
||||||
idx := strings.LastIndex(body[:cursor], "<w:p")
|
|
||||||
if idx < 0 {
|
|
||||||
break
|
|
||||||
}
|
|
||||||
// Confirm this is a paragraph open, not a different
|
|
||||||
// w:p-prefixed tag (e.g. <w:pPr>).
|
|
||||||
if idx+4 < len(body) {
|
|
||||||
after := body[idx+4]
|
|
||||||
if after == ' ' || after == '>' || after == '/' {
|
|
||||||
// <w:p ...> or <w:p>; not <w:pPr>.
|
|
||||||
close := strings.Index(body[idx:], ">")
|
|
||||||
if close < 0 {
|
|
||||||
return 0, 0, false
|
|
||||||
}
|
|
||||||
pStart = idx
|
|
||||||
break
|
|
||||||
}
|
|
||||||
}
|
|
||||||
cursor = idx
|
|
||||||
}
|
|
||||||
if pStart < 0 {
|
|
||||||
return 0, 0, false
|
|
||||||
}
|
|
||||||
// Walk forward to find the matching </w:p>. <w:p> doesn't nest so
|
|
||||||
// the next </w:p> after the marker is the close.
|
|
||||||
pEndIdx := strings.Index(body[markerEnd:], "</w:p>")
|
|
||||||
if pEndIdx < 0 {
|
|
||||||
return 0, 0, false
|
|
||||||
}
|
|
||||||
pEnd := markerEnd + pEndIdx + len("</w:p>")
|
|
||||||
return pStart, pEnd, true
|
|
||||||
}
|
|
||||||
|
|
||||||
// spliceSections replaces anchor slots with rendered sections and
|
|
||||||
// appends any unanchored sections before sectPr. Returns the assembled
|
|
||||||
// document.xml body.
|
|
||||||
func spliceSections(documentXML []byte, rendered map[string]string, kept []SubmissionSection, all []SubmissionSection) []byte {
|
|
||||||
body := string(documentXML)
|
|
||||||
pairs := findAllAnchorPairs(body)
|
|
||||||
|
|
||||||
// Build a lookup of kept section keys for quick membership tests.
|
|
||||||
keptByKey := map[string]int{}
|
|
||||||
for i, sec := range kept {
|
|
||||||
keptByKey[sec.SectionKey] = i
|
|
||||||
}
|
|
||||||
allByKey := map[string]int{}
|
|
||||||
for i, sec := range all {
|
|
||||||
allByKey[sec.SectionKey] = i
|
|
||||||
}
|
|
||||||
|
|
||||||
matchedKeys := map[string]bool{}
|
|
||||||
|
|
||||||
// Walk pairs in REVERSE body-order so slice mutations don't shift
|
|
||||||
// later offsets.
|
|
||||||
sort.SliceStable(pairs, func(i, j int) bool {
|
|
||||||
return pairs[i].openStart > pairs[j].openStart
|
|
||||||
})
|
|
||||||
for _, p := range pairs {
|
|
||||||
replacement := ""
|
|
||||||
if _, ok := keptByKey[p.key]; ok {
|
|
||||||
replacement = rendered[p.key]
|
|
||||||
matchedKeys[p.key] = true
|
|
||||||
} else if _, isOnDraft := allByKey[p.key]; isOnDraft {
|
|
||||||
// Anchor matches an excluded section on the draft — drop
|
|
||||||
// the entire slot.
|
|
||||||
replacement = ""
|
|
||||||
} else {
|
|
||||||
// Anchor doesn't match any section on this draft — drop
|
|
||||||
// to leave the base's chrome unbroken.
|
|
||||||
replacement = ""
|
|
||||||
}
|
|
||||||
body = body[:p.openStart] + replacement + body[p.closeEnd:]
|
|
||||||
}
|
|
||||||
|
|
||||||
// Append unanchored sections before sectPr in order_index ASC.
|
|
||||||
var unanchored strings.Builder
|
|
||||||
for _, sec := range kept {
|
|
||||||
if matchedKeys[sec.SectionKey] {
|
|
||||||
continue
|
|
||||||
}
|
|
||||||
unanchored.WriteString(rendered[sec.SectionKey])
|
|
||||||
}
|
|
||||||
if unanchored.Len() > 0 {
|
|
||||||
body = appendBeforeSectPr(body, unanchored.String())
|
|
||||||
}
|
|
||||||
|
|
||||||
return []byte(body)
|
|
||||||
}
|
|
||||||
|
|
||||||
// appendBeforeSectPr inserts content immediately before the first
|
|
||||||
// `<w:sectPr` element in the body, or at the end of the body if there
|
|
||||||
// is none. Word documents conventionally close the body with a sectPr
|
|
||||||
// describing page setup; we want to land sections before that element
|
|
||||||
// so they show up on the actual pages.
|
|
||||||
var sectPrRegex = regexp.MustCompile(`<w:sectPr\b`)
|
|
||||||
|
|
||||||
func appendBeforeSectPr(body, content string) string {
|
|
||||||
loc := sectPrRegex.FindStringIndex(body)
|
|
||||||
if loc == nil {
|
|
||||||
// No sectPr → append before `</w:body>` if present, else at
|
|
||||||
// the very end.
|
|
||||||
idx := strings.LastIndex(body, "</w:body>")
|
|
||||||
if idx < 0 {
|
|
||||||
return body + content
|
|
||||||
}
|
|
||||||
return body[:idx] + content + body[idx:]
|
|
||||||
}
|
|
||||||
return body[:loc[0]] + content + body[loc[0]:]
|
|
||||||
}
|
|
||||||
|
|
||||||
// ─────────────────────────────────────────────────────────────────────
|
|
||||||
// Zip plumbing
|
|
||||||
// ─────────────────────────────────────────────────────────────────────
|
|
||||||
|
|
||||||
// baseZipPart captures one zip entry we kept aside while extracting
|
|
||||||
// document.xml.
|
|
||||||
type baseZipPart struct {
|
|
||||||
name string
|
|
||||||
method uint16
|
|
||||||
modTime int64 // wall seconds; converted back to time.Time on repack
|
|
||||||
body []byte
|
|
||||||
}
|
|
||||||
|
|
||||||
// splitBaseZip extracts document.xml and returns it alongside every
|
|
||||||
// other zip entry, ready for repacking.
|
|
||||||
func splitBaseZip(cleanBytes []byte) ([]byte, []baseZipPart, error) {
|
|
||||||
zr, err := zip.NewReader(bytes.NewReader(cleanBytes), int64(len(cleanBytes)))
|
|
||||||
if err != nil {
|
|
||||||
return nil, nil, fmt.Errorf("submission compose: open base zip: %w", err)
|
|
||||||
}
|
|
||||||
var documentXML []byte
|
|
||||||
parts := make([]baseZipPart, 0, len(zr.File))
|
|
||||||
for _, f := range zr.File {
|
|
||||||
body, err := readZipEntry(f)
|
|
||||||
if err != nil {
|
|
||||||
return nil, nil, fmt.Errorf("submission compose: read %s: %w", f.Name, err)
|
|
||||||
}
|
|
||||||
if f.Name == "word/document.xml" {
|
|
||||||
documentXML = body
|
|
||||||
parts = append(parts, baseZipPart{name: f.Name, method: f.Method, modTime: f.Modified.Unix(), body: nil})
|
|
||||||
continue
|
|
||||||
}
|
|
||||||
parts = append(parts, baseZipPart{name: f.Name, method: f.Method, modTime: f.Modified.Unix(), body: body})
|
|
||||||
}
|
|
||||||
if documentXML == nil {
|
|
||||||
return nil, nil, fmt.Errorf("submission compose: base zip missing word/document.xml")
|
|
||||||
}
|
|
||||||
return documentXML, parts, nil
|
|
||||||
}
|
|
||||||
|
|
||||||
// repackBaseZip rebuilds the zip, swapping document.xml for the
|
|
||||||
// assembled body and leaving every other part untouched.
|
|
||||||
func repackBaseZip(parts []baseZipPart, assembledBody []byte) ([]byte, error) {
|
|
||||||
var out bytes.Buffer
|
|
||||||
zw := zip.NewWriter(&out)
|
|
||||||
for _, p := range parts {
|
|
||||||
hdr := &zip.FileHeader{
|
|
||||||
Name: p.name,
|
|
||||||
Method: p.method,
|
|
||||||
}
|
|
||||||
if p.modTime > 0 {
|
|
||||||
hdr.Modified = time.Unix(p.modTime, 0)
|
|
||||||
}
|
|
||||||
w, err := zw.CreateHeader(hdr)
|
|
||||||
if err != nil {
|
|
||||||
return nil, fmt.Errorf("submission compose: write header %s: %w", p.name, err)
|
|
||||||
}
|
|
||||||
body := p.body
|
|
||||||
if p.name == "word/document.xml" {
|
|
||||||
body = assembledBody
|
|
||||||
}
|
|
||||||
if _, err := w.Write(body); err != nil {
|
|
||||||
return nil, fmt.Errorf("submission compose: write body %s: %w", p.name, err)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
if err := zw.Close(); err != nil {
|
|
||||||
return nil, fmt.Errorf("submission compose: finalise zip: %w", err)
|
|
||||||
}
|
|
||||||
return out.Bytes(), nil
|
|
||||||
}
|
|
||||||
|
|
||||||
func readZipEntry(f *zip.File) ([]byte, error) {
|
|
||||||
rc, err := f.Open()
|
|
||||||
if err != nil {
|
|
||||||
return nil, err
|
|
||||||
}
|
|
||||||
defer rc.Close()
|
|
||||||
return io.ReadAll(rc)
|
|
||||||
}
|
|
||||||
|
|
||||||
// ─────────────────────────────────────────────────────────────────────
|
|
||||||
// Slice D — hyperlink wiring
|
|
||||||
// ─────────────────────────────────────────────────────────────────────
|
|
||||||
|
|
||||||
// composerLinkAllocator hands out fresh rIds for inline hyperlink
|
|
||||||
// targets discovered by the MD walker. Each unique URL gets one rId
|
|
||||||
// (deduped — repeated links to the same URL share one Relationship).
|
|
||||||
// Allocations land outside the base's rId namespace by prefixing with
|
|
||||||
// "rIdComposer" so they can't collide with existing relationships.
|
|
||||||
type composerLinkAllocator struct {
|
|
||||||
next int
|
|
||||||
byURL map[string]string
|
|
||||||
order []string // URLs in allocation order
|
|
||||||
}
|
|
||||||
|
|
||||||
func newComposerLinkAllocator() *composerLinkAllocator {
|
|
||||||
return &composerLinkAllocator{byURL: map[string]string{}}
|
|
||||||
}
|
|
||||||
|
|
||||||
// Alloc returns the rId for url, allocating one on first sight.
|
|
||||||
func (a *composerLinkAllocator) Alloc(url string) string {
|
|
||||||
if rid, ok := a.byURL[url]; ok {
|
|
||||||
return rid
|
|
||||||
}
|
|
||||||
a.next++
|
|
||||||
rid := fmt.Sprintf("rIdComposer%d", a.next)
|
|
||||||
a.byURL[url] = rid
|
|
||||||
a.order = append(a.order, url)
|
|
||||||
return rid
|
|
||||||
}
|
|
||||||
|
|
||||||
// HasLinks reports whether any links were allocated during this compose.
|
|
||||||
func (a *composerLinkAllocator) HasLinks() bool {
|
|
||||||
return len(a.order) > 0
|
|
||||||
}
|
|
||||||
|
|
||||||
// Pairs returns the (rId, URL) pairs in allocation order. The
|
|
||||||
// document.xml.rels patcher consumes this to emit <Relationship>
|
|
||||||
// elements.
|
|
||||||
func (a *composerLinkAllocator) Pairs() [][2]string {
|
|
||||||
pairs := make([][2]string, 0, len(a.order))
|
|
||||||
for _, url := range a.order {
|
|
||||||
pairs = append(pairs, [2]string{a.byURL[url], url})
|
|
||||||
}
|
|
||||||
return pairs
|
|
||||||
}
|
|
||||||
|
|
||||||
// patchDocumentXMLRels mutates the word/_rels/document.xml.rels entry
|
|
||||||
// in `parts` to append the given (rId, URL) pairs as hyperlink
|
|
||||||
// relationships. If the rels part doesn't exist (some bases omit it
|
|
||||||
// when the body has no relationships), this function appends a fresh
|
|
||||||
// part with the minimal Relationships wrapper.
|
|
||||||
//
|
|
||||||
// Idempotent on (rId, URL) pairs already present (e.g. when a base
|
|
||||||
// already references the URL for some other reason).
|
|
||||||
//
|
|
||||||
// Returns the (possibly extended) parts slice — callers must overwrite
|
|
||||||
// their reference because the append in the no-rels-yet case grows the
|
|
||||||
// backing array.
|
|
||||||
func patchDocumentXMLRels(parts []baseZipPart, pairs [][2]string) ([]baseZipPart, error) {
|
|
||||||
const path = "word/_rels/document.xml.rels"
|
|
||||||
const hyperlinkType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
|
|
||||||
|
|
||||||
existingIdx := -1
|
|
||||||
for i := range parts {
|
|
||||||
if parts[i].name == path {
|
|
||||||
existingIdx = i
|
|
||||||
break
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
var body string
|
|
||||||
if existingIdx >= 0 {
|
|
||||||
body = string(parts[existingIdx].body)
|
|
||||||
} else {
|
|
||||||
body = `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>` +
|
|
||||||
`<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"></Relationships>`
|
|
||||||
}
|
|
||||||
|
|
||||||
var inserts strings.Builder
|
|
||||||
for _, p := range pairs {
|
|
||||||
rid := p[0]
|
|
||||||
url := p[1]
|
|
||||||
if strings.Contains(body, `Id="`+rid+`"`) {
|
|
||||||
continue
|
|
||||||
}
|
|
||||||
inserts.WriteString(`<Relationship Id="`)
|
|
||||||
inserts.WriteString(xmlAttrEscape(rid))
|
|
||||||
inserts.WriteString(`" Type="`)
|
|
||||||
inserts.WriteString(hyperlinkType)
|
|
||||||
inserts.WriteString(`" Target="`)
|
|
||||||
inserts.WriteString(xmlAttrEscape(url))
|
|
||||||
inserts.WriteString(`" TargetMode="External"/>`)
|
|
||||||
}
|
|
||||||
|
|
||||||
if inserts.Len() == 0 {
|
|
||||||
return parts, nil
|
|
||||||
}
|
|
||||||
|
|
||||||
closeIdx := strings.LastIndex(body, "</Relationships>")
|
|
||||||
if closeIdx < 0 {
|
|
||||||
return parts, fmt.Errorf("submission compose: malformed document.xml.rels (no closing tag)")
|
|
||||||
}
|
|
||||||
patched := body[:closeIdx] + inserts.String() + body[closeIdx:]
|
|
||||||
|
|
||||||
if existingIdx >= 0 {
|
|
||||||
parts[existingIdx].body = []byte(patched)
|
|
||||||
return parts, nil
|
|
||||||
}
|
|
||||||
parts = append(parts, baseZipPart{
|
|
||||||
name: path,
|
|
||||||
method: zip.Deflate,
|
|
||||||
modTime: time.Now().Unix(),
|
|
||||||
body: []byte(patched),
|
|
||||||
})
|
|
||||||
return parts, nil
|
|
||||||
}
|
}
|
||||||
|
|||||||
81
internal/services/submission_vars_pretty_test.go
Normal file
@@ -0,0 +1,81 @@
|
|||||||
|
package services
|
||||||
|
|
||||||
|
// Pretty-printer tests for the variable-resolution layer (legalSourcePretty,
|
||||||
|
// ourSideDE/EN, patentNumberUPC). These live with submission_vars.go;
|
||||||
|
// they were relocated out of the docx engine test suite when the
|
||||||
|
// .docx renderer moved to pkg/docforge/docx (t-paliad-349 slice 1).
|
||||||
|
|
||||||
|
import "testing"
|
||||||
|
|
||||||
|
func TestLegalSourcePretty(t *testing.T) {
|
||||||
|
tests := []struct {
|
||||||
|
src, lang, want string
|
||||||
|
}{
|
||||||
|
{"DE.ZPO.276.1", "de", "§ 276 Abs. 1 ZPO"},
|
||||||
|
{"DE.ZPO.276.1", "en", "Section 276(1) ZPO"},
|
||||||
|
{"DE.ZPO.253", "de", "§ 253 ZPO"},
|
||||||
|
{"DE.ZPO.253", "en", "Section 253 ZPO"},
|
||||||
|
{"UPC.RoP.23.1", "de", "Regel 23.1 VerfO UPC"},
|
||||||
|
{"UPC.RoP.23.1", "en", "Rule 23.1 RoP UPC"},
|
||||||
|
{"UPC.RoP.198", "de", "Regel 198 VerfO UPC"},
|
||||||
|
{"DE.PatG.83", "de", "§ 83 PatG"},
|
||||||
|
{"EPC.123", "de", "Art. 123 EPÜ"},
|
||||||
|
{"EPC.123", "en", "Art. 123 EPC"},
|
||||||
|
{"FOO.BAR.123", "de", "FOO.BAR.123"},
|
||||||
|
{"", "de", ""},
|
||||||
|
}
|
||||||
|
for _, tc := range tests {
|
||||||
|
t.Run(tc.src+"/"+tc.lang, func(t *testing.T) {
|
||||||
|
got := legalSourcePretty(tc.src, tc.lang)
|
||||||
|
if got != tc.want {
|
||||||
|
t.Errorf("legalSourcePretty(%q, %q) = %q, want %q", tc.src, tc.lang, got, tc.want)
|
||||||
|
}
|
||||||
|
})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestOurSideTranslations(t *testing.T) {
|
||||||
|
cases := []struct {
|
||||||
|
in, wantDE, wantEN string
|
||||||
|
}{
|
||||||
|
{"claimant", "Klägerin", "Claimant"},
|
||||||
|
{"defendant", "Beklagte", "Defendant"},
|
||||||
|
{"court", "Gericht", "Court"},
|
||||||
|
{"both", "Klägerin und Beklagte", "Claimant and Defendant"},
|
||||||
|
{"", "", ""},
|
||||||
|
{"unknown", "", ""},
|
||||||
|
}
|
||||||
|
for _, tc := range cases {
|
||||||
|
t.Run(tc.in, func(t *testing.T) {
|
||||||
|
if got := ourSideDE(tc.in); got != tc.wantDE {
|
||||||
|
t.Errorf("ourSideDE(%q) = %q, want %q", tc.in, got, tc.wantDE)
|
||||||
|
}
|
||||||
|
if got := ourSideEN(tc.in); got != tc.wantEN {
|
||||||
|
t.Errorf("ourSideEN(%q) = %q, want %q", tc.in, got, tc.wantEN)
|
||||||
|
}
|
||||||
|
})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestPatentNumberUPC(t *testing.T) {
|
||||||
|
tests := []struct {
|
||||||
|
in, want string
|
||||||
|
}{
|
||||||
|
{"EP 1 234 567 B1", "EP 1 234 567 (B1)"},
|
||||||
|
{"EP 4 056 049 A1", "EP 4 056 049 (A1)"},
|
||||||
|
{"DE 10 2020 123 456 A1", "DE 10 2020 123 456 (A1)"},
|
||||||
|
{"EP 1 234 567", "EP 1 234 567"},
|
||||||
|
{" EP 1 234 567 B1 ", "EP 1 234 567 (B1)"},
|
||||||
|
{"", ""},
|
||||||
|
{"WO/2023/123456", "WO/2023/123456"},
|
||||||
|
{"EP 1 234 567 B12", "EP 1 234 567 B12"},
|
||||||
|
}
|
||||||
|
for _, tc := range tests {
|
||||||
|
t.Run(tc.in, func(t *testing.T) {
|
||||||
|
got := patentNumberUPC(tc.in)
|
||||||
|
if got != tc.want {
|
||||||
|
t.Errorf("patentNumberUPC(%q) = %q, want %q", tc.in, got, tc.want)
|
||||||
|
}
|
||||||
|
})
|
||||||
|
}
|
||||||
|
}
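The `TestPatentNumberUPC` table above pins the formatting rule down fairly tightly: trim the input, and when the last whitespace-separated token is a kind code of exactly one uppercase letter plus one digit, wrap it in parentheses; anything else passes through (trimmed). A minimal sketch that would satisfy those cases follows. It is an assumption, not the implementation in `submission_vars.go`, which this diff does not show; the name `patentNumberUPCSketch` and the regular expression are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// kindCodeSuffix matches a trailing kind code (one uppercase letter plus one
// digit, e.g. "B1") separated from the number by whitespace. Hypothetical
// pattern, derived only from the test table.
var kindCodeSuffix = regexp.MustCompile(`^(.*\S)\s+([A-Z]\d)$`)

// patentNumberUPCSketch is an illustrative stand-in, not paliad's real
// patentNumberUPC. It trims the input and parenthesises a trailing kind code.
func patentNumberUPCSketch(in string) string {
	s := strings.TrimSpace(in)
	m := kindCodeSuffix.FindStringSubmatch(s)
	if m == nil {
		// No kind code, or a longer suffix such as "B12": return the trimmed input.
		return s
	}
	return m[1] + " (" + m[2] + ")"
}

func main() {
	for _, in := range []string{"EP 1 234 567 B1", "EP 1 234 567", "EP 1 234 567 B12"} {
		fmt.Printf("%q -> %q\n", in, patentNumberUPCSketch(in))
	}
}
```

Anchoring the pattern at both ends is what keeps `B12` and `WO/2023/123456` untouched: the kind code has to be the whole final token, not merely a suffix of it.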
|
||||||
24
pkg/docforge/doc.go
Normal file
@@ -0,0 +1,24 @@
|
|||||||
|
// Package docforge is paliad's modular document-generator engine — the
|
||||||
|
// format-neutral core that turns templates + variables into rendered
|
||||||
|
// documents, with format-specific adapters living in sub-packages.
|
||||||
|
//
|
||||||
|
// The package is being extracted from the in-tree submission generator
|
||||||
|
// (internal/services/submission_*.go) per the PRD in
|
||||||
|
// docs/plans/prd-docforge-2026-05-29.md (t-paliad-349 / m/paliad#157).
|
||||||
|
// The extraction follows the same packaging discipline as
|
||||||
|
// pkg/litigationplanner: docforge owns its types and exposes interfaces
|
||||||
|
// for the stateful inputs (variable resolution, template storage); the
|
||||||
|
// consuming application (paliad) implements those interfaces against its
|
||||||
|
// own database, and a future second consumer reaches the engine over an
|
||||||
|
// HTTP veneer rather than importing it.
|
||||||
|
//
|
||||||
|
// Slice 1 (this commit) relocates the .docx adapter — the Markdown→OOXML
|
||||||
|
// walker, the placeholder substitution engine, and the .dotm→.docx
|
||||||
|
// converter — into pkg/docforge/docx with no behaviour change. paliad's
|
||||||
|
// internal/services package keeps thin type-alias + forwarder shims so
|
||||||
|
// the submission generator and its HTTP surface compile and behave
|
||||||
|
// identically. Later slices introduce the neutral document model,
|
||||||
|
// hoist the format-neutral placeholder grammar up to this root package,
|
||||||
|
// and add the VariableResolver interface, the TemplateStore, the
|
||||||
|
// authoring surface, and the pluggable Exporter.
|
||||||
|
package docforge
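The "thin type-alias + forwarder shims" mentioned in the package comment can be illustrated with a minimal sketch. In the real tree the engine half would live in `pkg/docforge/docx` and the shim half in `internal/services`; both are collapsed into one file here so the example runs on its own. All names (`engineRenderer`, `Renderer`, `NewRenderer`) are hypothetical and not taken from the diff.

```go
package main

import (
	"fmt"
	"strings"
)

// --- engine half (stands in for the relocated package) ---

// engineRenderer is a hypothetical relocated type.
type engineRenderer struct{}

func newEngineRenderer() *engineRenderer { return &engineRenderer{} }

// Render substitutes each {{key}} token with its value from vars.
func (r *engineRenderer) Render(tmpl string, vars map[string]string) string {
	out := tmpl
	for k, v := range vars {
		out = strings.ReplaceAll(out, "{{"+k+"}}", v)
	}
	return out
}

// --- shim half (stands in for the old package's compatibility layer) ---

// Renderer is a type alias, not a new type: *Renderer and *engineRenderer
// are identical, so existing callers keep compiling unchanged.
type Renderer = engineRenderer

// NewRenderer forwards to the engine constructor.
func NewRenderer() *Renderer { return newEngineRenderer() }

func main() {
	// The assignment compiles only because Renderer is an alias.
	var r *engineRenderer = NewRenderer()
	fmt.Println(r.Render("Hello {{name}}", map[string]string{"name": "docforge"}))
}
```

An alias (`type A = B`) differs from a defined type (`type A B`) in that it carries the method set and identity of the original, which is why a shim of this shape needs no conversion code at call sites.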
|
||||||
634
pkg/docforge/docx/compose.go
Normal file
@@ -0,0 +1,634 @@
|
|||||||
|
package docx
|
||||||
|
|
||||||
|
// Composer render pipeline — t-paliad-313 Slice B (design doc §9.1 +
|
||||||
|
// §9.2). Assembles a base .docx and a draft's section rows into a
|
||||||
|
// merged .docx ready for export.
|
||||||
|
//
|
||||||
|
// Pipeline (high-level):
|
||||||
|
//
|
||||||
|
// 1. ConvertDotmToDocx pre-pass on the base bytes (idempotent on .docx).
|
||||||
|
// 2. Locate `word/document.xml` inside the zip; pull the body XML.
|
||||||
|
// 3. For each section in the draft (order_index ASC, included=true):
|
||||||
|
// render content_md_<lang> → OOXML via RenderMarkdownToOOXML using
|
||||||
|
// base.section_spec.stylemap.paragraph.
|
||||||
|
// 4. Splice the rendered OOXML into the base body. Two splice modes:
|
||||||
|
// - Anchor mode: when the body carries `{{#section:KEY}}` /
|
||||||
|
// `{{/section:KEY}}` marker pairs, replace the slot's content
|
||||||
|
// (including the anchor paragraphs themselves) with the rendered
|
||||||
|
// section.
|
||||||
|
// - Append mode: when no anchor pair is found for a section, the
|
||||||
|
// rendered OOXML appends at the end of the body, just before any
|
||||||
|
// `<w:sectPr>` element. Sections with `included=false` are
|
||||||
|
// dropped silently.
|
||||||
|
// 5. Strip any leftover unmatched anchor paragraphs.
|
||||||
|
// 6. Re-pack the document.xml into the zip, leaving every other part
|
||||||
|
// untouched.
|
||||||
|
// 7. Run the v1 SubmissionRenderer placeholder pass over the assembly
|
||||||
|
// so `{{path}}` placeholders inside section content (and inside
|
||||||
|
// the base's untouched chrome) get substituted by the merged bag.
|
||||||
|
// Cross-run merge in pass 2 handles autocorrect-fragmented
|
||||||
|
// placeholders the same as v1.
|
||||||
|
//
|
||||||
|
// Result: a fully-merged .docx. No new third-party Go dep — reuses
|
||||||
|
// archive/zip + the existing SubmissionRenderer.
|
||||||
|
|
||||||
|
import (
|
||||||
|
"archive/zip"
|
||||||
|
"bytes"
|
||||||
|
"context"
|
||||||
|
"fmt"
|
||||||
|
"io"
|
||||||
|
"regexp"
|
||||||
|
"sort"
|
||||||
|
"strings"
|
||||||
|
"time"
|
||||||
|
)
|
||||||
|
|
||||||
|
// Composer assembles base + sections into a final .docx.
|
||||||
|
// Stateless; safe for concurrent use.
|
||||||
|
type Composer struct {
|
||||||
|
renderer *SubmissionRenderer
|
||||||
|
}
|
||||||
|
|
||||||
|
// NewComposer wires the composer. The renderer is required —
|
||||||
|
// a nil renderer is a programmer error and the composer panics at
|
||||||
|
// construction.
|
||||||
|
func NewComposer(renderer *SubmissionRenderer) *Composer {
|
||||||
|
if renderer == nil {
|
||||||
|
panic("submission composer: renderer required")
|
||||||
|
}
|
||||||
|
return &Composer{renderer: renderer}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Carrier is the opaque base document the composer splices rendered
|
||||||
|
// content into. Its bytes are preserved verbatim outside the regions the
|
||||||
|
// splice touches — the {{#section:KEY}} anchor paragraphs and the
|
||||||
|
// {{placeholder}} tokens — so the firm's letterhead, styles, headers, and
|
||||||
|
// footers survive a compose byte-for-byte. This is the docforge "carrier"
|
||||||
|
// for the .docx format: the lossless host for editable content.
|
||||||
|
type Carrier struct {
|
||||||
|
// Bytes is the raw base .docx. May be a .dotm/.docm/.dotx; Compose
|
||||||
|
// runs ConvertDotmToDocx on it first (idempotent on a plain .docx).
|
||||||
|
Bytes []byte
|
||||||
|
|
||||||
|
// Stylemap maps a logical block kind (paragraph, heading_1/2/3,
|
||||||
|
// list_bullet, list_numbered, blockquote) to the Word paragraph
|
||||||
|
// style name the base defines for it. Drives the Markdown walker's
|
||||||
|
// <w:pStyle>. Missing entries fall back to the "paragraph" style.
|
||||||
|
Stylemap map[string]string
|
||||||
|
}
|
||||||
|
|
||||||
|
// Section is one editable content block the composer renders and splices.
|
||||||
|
// It is the format-neutral input the docforge engine consumes; the
|
||||||
|
// consuming application maps its own row type onto it (paliad maps
|
||||||
|
// SubmissionSection → Section).
|
||||||
|
type Section struct {
|
||||||
|
// Key matches a {{#section:KEY}} anchor in the carrier, or — when no
|
||||||
|
// anchor matches — marks an append-mode section.
|
||||||
|
Key string
|
||||||
|
// OrderIndex sets append-mode ordering (ascending).
|
||||||
|
OrderIndex int
|
||||||
|
// Included=false drops the section entirely.
|
||||||
|
Included bool
|
||||||
|
// ContentMDDE / ContentMDEN are the bilingual Markdown sources; Lang
|
||||||
|
// selects which one renders.
|
||||||
|
ContentMDDE string
|
||||||
|
ContentMDEN string
|
||||||
|
}
|
||||||
|
|
||||||
|
// ComposeOptions carries the per-call composition inputs.
|
||||||
|
type ComposeOptions struct {
|
||||||
|
// Sections are the draft's section rows in display order. The
|
||||||
|
// composer renders included sections; excluded rows are dropped.
|
||||||
|
// Caller is responsible for visibility — by the time the composer
|
||||||
|
// runs, the section rows have already been gated by the caller.
|
||||||
|
Sections []Section
|
||||||
|
|
||||||
|
// Carrier is the base .docx chrome plus its stylemap. Required.
|
||||||
|
Carrier Carrier
|
||||||
|
|
||||||
|
// Lang ('de' or 'en') selects which content_md_* column the
|
||||||
|
// composer reads per section. Defaults to 'de' if empty.
|
||||||
|
Lang string
|
||||||
|
|
||||||
|
// Vars is the merged placeholder bag the v1 renderer pass
|
||||||
|
// substitutes after the composer assembly. Passed straight through
|
||||||
|
// to SubmissionRenderer.Render.
|
||||||
|
Vars PlaceholderMap
|
||||||
|
|
||||||
|
// Missing translates an unbound placeholder key into the marker
|
||||||
|
// the lawyer sees in Word. Passed straight to the renderer.
|
||||||
|
Missing MissingPlaceholderFn
|
||||||
|
}
|
||||||
|
|
||||||
|
// Compose runs the full pipeline and returns the merged .docx bytes.
|
||||||
|
func (c *Composer) Compose(ctx context.Context, opts ComposeOptions) ([]byte, error) {
|
||||||
|
_ = ctx // reserved for cancellation propagation in later slices
|
||||||
|
sections := opts.Sections
|
||||||
|
|
||||||
|
// Pre-pass: strip macros so the base reads as a plain .docx zip.
|
||||||
|
cleanBytes, err := ConvertDotmToDocx(opts.Carrier.Bytes)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("submission compose: convert base: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Locate + extract word/document.xml so we can splice in-place.
|
||||||
|
documentXML, otherParts, err := splitBaseZip(cleanBytes)
|
||||||
|
if err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
|
||||||
|
// Per-compose hyperlink allocator. Each unique URL gets a fresh
|
||||||
|
// rId outside the base's existing namespace. The post-pass
|
||||||
|
// (patchDocumentXMLRels) writes the matching Relationship rows
|
||||||
|
// before the zip is repacked. Slice D adds inline `[label](url)`
|
||||||
|
// hyperlink support.
|
||||||
|
linkAlloc := newComposerLinkAllocator()
|
||||||
|
|
||||||
|
// Build the rendered-section map: section_key → OOXML span.
|
||||||
|
stylemap := opts.Carrier.Stylemap
|
||||||
|
rendered := make(map[string]string, len(sections))
|
||||||
|
keptSections := make([]Section, 0, len(sections))
|
||||||
|
for _, sec := range sections {
|
||||||
|
if !sec.Included {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
md := sec.ContentMDDE
|
||||||
|
if strings.EqualFold(opts.Lang, "en") {
|
||||||
|
md = sec.ContentMDEN
|
||||||
|
}
|
||||||
|
rendered[sec.Key] = RenderMarkdownToOOXMLWithStyles(md, stylemap, linkAlloc.Alloc)
|
||||||
|
keptSections = append(keptSections, sec)
|
||||||
|
}
|
||||||
|
// Stable order — already sorted ascending by ListForDraft, but
|
||||||
|
// belt-and-braces in case the caller swaps the ordering policy
|
||||||
|
// later.
|
||||||
|
sort.SliceStable(keptSections, func(i, j int) bool {
|
||||||
|
return keptSections[i].OrderIndex < keptSections[j].OrderIndex
|
||||||
|
})
|
||||||
|
|
||||||
|
assembledBody := spliceSections(documentXML, rendered, keptSections, sections)
|
||||||
|
|
||||||
|
// Slice D hyperlink patch: when the walker emitted hyperlink rIds
|
||||||
|
// for inline `[label](url)` links, the base's
|
||||||
|
// word/_rels/document.xml.rels needs matching <Relationship>
|
||||||
|
// entries so Word can resolve the rIds. Mutates one zip part in
|
||||||
|
// otherParts (or appends if missing).
|
||||||
|
if linkAlloc.HasLinks() {
|
||||||
|
updatedParts, err := patchDocumentXMLRels(otherParts, linkAlloc.Pairs())
|
||||||
|
if err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
otherParts = updatedParts
|
||||||
|
}
|
||||||
|
|
||||||
|
// Re-pack into a zip with the assembled document.xml. All other
|
||||||
|
// parts (styles, fonts, headers, footers, theme, settings) pass
|
||||||
|
// through bit-for-bit at their original mtime + compression.
|
||||||
|
repacked, err := repackBaseZip(otherParts, assembledBody)
|
||||||
|
if err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
|
||||||
|
// Final pass: substitute placeholders against the merged bag. The
|
||||||
|
// existing renderer handles cross-run fragmentation, the `{{rule.X}}`
|
||||||
|
// alias contract, and the missing-marker emission. Reusing it
|
||||||
|
// guarantees v1's placeholder grammar stays intact inside section
|
||||||
|
// content + base chrome.
|
||||||
|
merged, err := c.renderer.Render(repacked, opts.Vars, opts.Missing)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("submission compose: placeholder pass: %w", err)
|
||||||
|
}
|
||||||
|
return merged, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// ─────────────────────────────────────────────────────────────────────
|
||||||
|
// Section splicing
|
||||||
|
// ─────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
// Anchor markers as they appear inside a <w:t> text node. We don't
|
||||||
|
// need a full XML parse — finding the marker text inside the body is
|
||||||
|
// sufficient because:
|
||||||
|
// - {{ and }} are never legitimate document content (placeholders
|
||||||
|
// follow the same convention everywhere else in paliad).
|
||||||
|
// - The anchor key grammar [A-Za-z0-9_]+ rules out any HTML/XML
|
||||||
|
// special characters.
|
||||||
|
// - Each anchor lives in exactly one <w:t>...</w:t>, which lives in
|
||||||
|
// exactly one <w:r>...</w:r>, which lives in exactly one
|
||||||
|
// <w:p>...</w:p>. We expand from the marker outward to find the
|
||||||
|
// enclosing <w:p> span and drop the entire paragraph as part of
|
||||||
|
// the splice.
|
||||||
|
//
|
||||||
|
// RE2 has no lookahead, so the "find enclosing <w:p>" logic is
|
||||||
|
// implemented as manual byte-index search around the marker hit
|
||||||
|
// (paragraphSpanAround below) rather than a single regex pattern.
|
||||||
|
|
||||||
|
const (
|
||||||
|
anchorOpenPrefix = "{{#section:"
|
||||||
|
anchorClosePrefix = "{{/section:"
|
||||||
|
anchorSuffix = "}}"
|
||||||
|
)
|
||||||
|
|
||||||
|
// anchorKeyRegex validates that the captured anchor key is a clean
|
||||||
|
// identifier. Keys that include other characters (which can't actually
|
||||||
|
// appear in our authored .docx) are treated as no match.
|
||||||
|
var anchorKeyRegex = regexp.MustCompile(`^[A-Za-z0-9_]+$`)
|
||||||
|
|
||||||
|
// anchorPair records the byte span of one matched anchor pair inside
|
||||||
|
// the body — from the start of the opening anchor's <w:p> element
|
||||||
|
// through the end of the closing anchor's </w:p>.
|
||||||
|
type anchorPair struct {
|
||||||
|
key string
|
||||||
|
openStart int // start of <w:p> for the opening anchor
|
||||||
|
closeEnd int // index just past </w:p> for the closing anchor
|
||||||
|
}
|
||||||
|
|
||||||
|
// findAllAnchorPairs scans the body for matched open/close anchor
|
||||||
|
// pairs. Unbalanced markers (open without close, or vice versa) are
|
||||||
|
// dropped from the result. Returns pairs in body-order; each pair's
|
||||||
|
// span is non-overlapping.
|
||||||
|
func findAllAnchorPairs(body string) []anchorPair {
|
||||||
|
type marker struct {
|
||||||
|
key string
|
||||||
|
paraStart int
|
||||||
|
paraEnd int
|
||||||
|
isOpen bool
|
||||||
|
}
|
||||||
|
var markers []marker
|
||||||
|
|
||||||
|
collect := func(prefix string, isOpen bool) {
|
||||||
|
offset := 0
|
||||||
|
for {
|
||||||
|
idx := strings.Index(body[offset:], prefix)
|
||||||
|
if idx < 0 {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
start := offset + idx
|
||||||
|
suffixIdx := strings.Index(body[start+len(prefix):], anchorSuffix)
|
||||||
|
if suffixIdx < 0 {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
key := body[start+len(prefix) : start+len(prefix)+suffixIdx]
|
||||||
|
if !anchorKeyRegex.MatchString(key) {
|
||||||
|
offset = start + len(prefix)
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
markerEnd := start + len(prefix) + suffixIdx + len(anchorSuffix)
|
||||||
|
pStart, pEnd, ok := paragraphSpanAround(body, start, markerEnd)
|
||||||
|
if !ok {
|
||||||
|
offset = markerEnd
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
markers = append(markers, marker{key: key, paraStart: pStart, paraEnd: pEnd, isOpen: isOpen})
|
||||||
|
offset = pEnd
|
||||||
|
}
|
||||||
|
}
|
||||||
|
collect(anchorOpenPrefix, true)
|
||||||
|
collect(anchorClosePrefix, false)
|
||||||
|
|
||||||
|
// Walk markers in body-order, matching each open with the next
|
||||||
|
// close that carries the same key.
|
||||||
|
sort.SliceStable(markers, func(i, j int) bool {
|
||||||
|
return markers[i].paraStart < markers[j].paraStart
|
||||||
|
})
|
||||||
|
var pairs []anchorPair
|
||||||
|
openStack := map[string]marker{}
|
||||||
|
for _, m := range markers {
|
||||||
|
if m.isOpen {
|
||||||
|
openStack[m.key] = m
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
o, ok := openStack[m.key]
|
||||||
|
if !ok {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
pairs = append(pairs, anchorPair{
|
||||||
|
key: m.key,
|
||||||
|
openStart: o.paraStart,
|
||||||
|
closeEnd: m.paraEnd,
|
||||||
|
})
|
||||||
|
delete(openStack, m.key)
|
||||||
|
}
|
||||||
|
return pairs
|
||||||
|
}
|
||||||
|
|
||||||
|
// paragraphSpanAround returns the byte span of the smallest `<w:p>...</w:p>`
|
||||||
|
// element that fully contains the byte range [markerStart, markerEnd).
|
||||||
|
// Returns false when the byte range doesn't sit inside a single
|
||||||
|
// paragraph (which would mean the marker survived a cross-paragraph
|
||||||
|
// edit — defensive guard, shouldn't happen in well-formed input).
|
||||||
|
func paragraphSpanAround(body string, markerStart, markerEnd int) (int, int, bool) {
|
||||||
|
// Walk backwards to find the nearest unclosed <w:p ... > opening.
|
||||||
|
// Since <w:p> doesn't nest, the nearest <w:p before markerStart is
|
||||||
|
// the enclosing paragraph's opening tag.
|
||||||
|
pStart := -1
|
||||||
|
cursor := markerStart
|
||||||
|
for cursor > 0 {
|
||||||
|
idx := strings.LastIndex(body[:cursor], "<w:p")
|
||||||
|
if idx < 0 {
|
||||||
|
break
|
||||||
|
}
|
||||||
|
// Confirm this is a paragraph open, not a different
|
||||||
|
// w:p-prefixed tag (e.g. <w:pPr>).
|
||||||
|
if idx+4 < len(body) {
|
||||||
|
after := body[idx+4]
|
||||||
|
if after == ' ' || after == '>' || after == '/' {
|
||||||
|
// <w:p ...> or <w:p>; not <w:pPr>.
|
||||||
|
tagEnd := strings.Index(body[idx:], ">")
if tagEnd < 0 {
|
||||||
|
return 0, 0, false
|
||||||
|
}
|
||||||
|
pStart = idx
|
||||||
|
break
|
||||||
|
}
|
||||||
|
}
|
||||||
|
cursor = idx
|
||||||
|
}
|
||||||
|
if pStart < 0 {
|
||||||
|
return 0, 0, false
|
||||||
|
}
|
||||||
|
// Walk forward to find the matching </w:p>. <w:p> doesn't nest so
|
||||||
|
// the next </w:p> after the marker is the close.
|
||||||
|
pEndIdx := strings.Index(body[markerEnd:], "</w:p>")
|
||||||
|
if pEndIdx < 0 {
|
||||||
|
return 0, 0, false
|
||||||
|
}
|
||||||
|
pEnd := markerEnd + pEndIdx + len("</w:p>")
|
||||||
|
return pStart, pEnd, true
|
||||||
|
}
|
||||||
|
|
||||||
|
// spliceSections replaces anchor slots with rendered sections and
|
||||||
|
// appends any unanchored sections before sectPr. Returns the assembled
|
||||||
|
// document.xml body.
|
||||||
|
func spliceSections(documentXML []byte, rendered map[string]string, kept []Section, all []Section) []byte {
|
||||||
|
body := string(documentXML)
|
||||||
|
pairs := findAllAnchorPairs(body)
|
||||||
|
|
||||||
|
// Build a lookup of kept section keys for quick membership tests.
|
||||||
|
keptByKey := map[string]int{}
|
||||||
|
for i, sec := range kept {
|
||||||
|
keptByKey[sec.Key] = i
|
||||||
|
}
|
||||||
|
allByKey := map[string]int{}
|
||||||
|
for i, sec := range all {
|
||||||
|
allByKey[sec.Key] = i
|
||||||
|
}
|
||||||
|
|
||||||
|
matchedKeys := map[string]bool{}
|
||||||
|
|
||||||
|
// Walk pairs in REVERSE body-order so slice mutations don't shift
|
||||||
|
// later offsets.
|
||||||
|
sort.SliceStable(pairs, func(i, j int) bool {
|
||||||
|
return pairs[i].openStart > pairs[j].openStart
|
||||||
|
})
|
||||||
|
for _, p := range pairs {
|
||||||
|
replacement := ""
|
||||||
|
if _, ok := keptByKey[p.key]; ok {
replacement = rendered[p.key]
matchedKeys[p.key] = true
|
||||||
|
} else if _, isOnDraft := allByKey[p.key]; isOnDraft {
|
||||||
|
// Anchor matches an excluded section on the draft — drop
|
||||||
|
// the entire slot.
|
||||||
|
replacement = ""
|
||||||
|
} else {
|
||||||
|
// Anchor doesn't match any section on this draft — drop
|
||||||
|
// to leave the base's chrome unbroken.
|
||||||
|
replacement = ""
|
||||||
|
}
|
||||||
|
body = body[:p.openStart] + replacement + body[p.closeEnd:]
|
||||||
|
}
|
||||||
|
|
||||||
|
// Append unanchored sections before sectPr in order_index ASC.
|
||||||
|
var unanchored strings.Builder
|
||||||
|
for _, sec := range kept {
|
||||||
|
if matchedKeys[sec.Key] {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
unanchored.WriteString(rendered[sec.Key])
|
||||||
|
}
|
||||||
|
if unanchored.Len() > 0 {
|
||||||
|
body = appendBeforeSectPr(body, unanchored.String())
|
||||||
|
}
|
||||||
|
|
||||||
|
return []byte(body)
|
||||||
|
}
|
||||||
|
|
||||||
|
// appendBeforeSectPr inserts content immediately before the first
|
||||||
|
// `<w:sectPr` element in the body, or at the end of the body if there
|
||||||
|
// is none. Word documents conventionally close the body with a sectPr
|
||||||
|
// describing page setup; we want to land sections before that element
|
||||||
|
// so they show up on the actual pages.
|
||||||
|
var sectPrRegex = regexp.MustCompile(`<w:sectPr\b`)
|
||||||
|
|
||||||
|
func appendBeforeSectPr(body, content string) string {
|
||||||
|
loc := sectPrRegex.FindStringIndex(body)
|
||||||
|
if loc == nil {
|
||||||
|
// No sectPr → append before `</w:body>` if present, else at
|
||||||
|
// the very end.
|
||||||
|
idx := strings.LastIndex(body, "</w:body>")
|
||||||
|
if idx < 0 {
|
||||||
|
return body + content
|
||||||
|
}
|
||||||
|
return body[:idx] + content + body[idx:]
|
||||||
|
}
|
||||||
|
return body[:loc[0]] + content + body[loc[0]:]
|
||||||
|
}
|
||||||
|
|
||||||
|
// ─────────────────────────────────────────────────────────────────────
|
||||||
|
// Zip plumbing
|
||||||
|
// ─────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
// baseZipPart captures one zip entry we kept aside while extracting
|
||||||
|
// document.xml.
|
||||||
|
type baseZipPart struct {
|
||||||
|
name string
|
||||||
|
method uint16
|
||||||
|
modTime int64 // wall seconds; converted back to time.Time on repack
|
||||||
|
body []byte
|
||||||
|
}
|
||||||
|
|
||||||
|
// splitBaseZip extracts document.xml and returns it alongside every
|
||||||
|
// other zip entry, ready for repacking.
|
||||||
|
func splitBaseZip(cleanBytes []byte) ([]byte, []baseZipPart, error) {
|
||||||
|
zr, err := zip.NewReader(bytes.NewReader(cleanBytes), int64(len(cleanBytes)))
|
||||||
|
if err != nil {
|
||||||
|
return nil, nil, fmt.Errorf("submission compose: open base zip: %w", err)
|
||||||
|
}
|
||||||
|
var documentXML []byte
|
||||||
|
parts := make([]baseZipPart, 0, len(zr.File))
|
||||||
|
for _, f := range zr.File {
|
||||||
|
body, err := readZipEntry(f)
|
||||||
|
if err != nil {
|
||||||
|
return nil, nil, fmt.Errorf("submission compose: read %s: %w", f.Name, err)
|
||||||
|
}
|
||||||
|
if f.Name == "word/document.xml" {
|
||||||
|
documentXML = body
|
||||||
|
parts = append(parts, baseZipPart{name: f.Name, method: f.Method, modTime: f.Modified.Unix(), body: nil})
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
parts = append(parts, baseZipPart{name: f.Name, method: f.Method, modTime: f.Modified.Unix(), body: body})
|
||||||
|
}
|
||||||
|
if documentXML == nil {
|
||||||
|
return nil, nil, fmt.Errorf("submission compose: base zip missing word/document.xml")
|
||||||
|
}
|
||||||
|
return documentXML, parts, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// repackBaseZip rebuilds the zip, swapping document.xml for the
|
||||||
|
// assembled body and leaving every other part untouched.
|
||||||
|
func repackBaseZip(parts []baseZipPart, assembledBody []byte) ([]byte, error) {
|
||||||
|
var out bytes.Buffer
|
||||||
|
zw := zip.NewWriter(&out)
|
||||||
|
for _, p := range parts {
|
||||||
|
hdr := &zip.FileHeader{
|
||||||
|
Name: p.name,
|
||||||
|
Method: p.method,
|
||||||
|
}
|
||||||
|
if p.modTime > 0 {
|
||||||
|
hdr.Modified = time.Unix(p.modTime, 0)
|
||||||
|
}
|
||||||
|
w, err := zw.CreateHeader(hdr)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("submission compose: write header %s: %w", p.name, err)
|
||||||
|
}
|
||||||
|
body := p.body
|
||||||
|
if p.name == "word/document.xml" {
|
||||||
|
body = assembledBody
|
||||||
|
}
|
||||||
|
if _, err := w.Write(body); err != nil {
|
||||||
|
return nil, fmt.Errorf("submission compose: write body %s: %w", p.name, err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if err := zw.Close(); err != nil {
|
||||||
|
return nil, fmt.Errorf("submission compose: finalise zip: %w", err)
|
||||||
|
}
|
||||||
|
return out.Bytes(), nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func readZipEntry(f *zip.File) ([]byte, error) {
|
||||||
|
rc, err := f.Open()
|
||||||
|
if err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
defer rc.Close()
|
||||||
|
return io.ReadAll(rc)
|
||||||
|
}
|
||||||
|
|
||||||
|
// ─────────────────────────────────────────────────────────────────────
|
||||||
|
// Slice D — hyperlink wiring
|
||||||
|
// ─────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
// composerLinkAllocator hands out fresh rIds for inline hyperlink
|
||||||
|
// targets discovered by the MD walker. Each unique URL gets one rId
|
||||||
|
// (deduped — repeated links to the same URL share one Relationship).
|
||||||
|
// Allocations land outside the base's rId namespace by prefixing with
|
||||||
|
// "rIdComposer" so they can't collide with existing relationships.
|
||||||
|
type composerLinkAllocator struct {
|
||||||
|
next int
|
||||||
|
byURL map[string]string
|
||||||
|
order []string // URLs in allocation order
|
||||||
|
}
|
||||||
|
|
||||||
|
func newComposerLinkAllocator() *composerLinkAllocator {
|
||||||
|
return &composerLinkAllocator{byURL: map[string]string{}}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Alloc returns the rId for url, allocating one on first sight.
|
||||||
|
func (a *composerLinkAllocator) Alloc(url string) string {
|
||||||
|
if rid, ok := a.byURL[url]; ok {
|
||||||
|
return rid
|
||||||
|
}
|
||||||
|
a.next++
|
||||||
|
rid := fmt.Sprintf("rIdComposer%d", a.next)
|
||||||
|
a.byURL[url] = rid
|
||||||
|
a.order = append(a.order, url)
|
||||||
|
return rid
|
||||||
|
}
|
||||||
|
|
||||||
|
// HasLinks reports whether any links were allocated during this compose.
|
||||||
|
func (a *composerLinkAllocator) HasLinks() bool {
|
||||||
|
return len(a.order) > 0
|
||||||
|
}
|
||||||
|
|
||||||
|
// Pairs returns the (rId, URL) pairs in allocation order. The
|
||||||
|
// document.xml.rels patcher consumes this to emit <Relationship>
|
||||||
|
// elements.
|
||||||
|
func (a *composerLinkAllocator) Pairs() [][2]string {
|
||||||
|
pairs := make([][2]string, 0, len(a.order))
|
||||||
|
for _, url := range a.order {
|
||||||
|
pairs = append(pairs, [2]string{a.byURL[url], url})
|
||||||
|
}
|
||||||
|
return pairs
|
||||||
|
}
|
||||||
|
|
||||||
|
// patchDocumentXMLRels mutates the word/_rels/document.xml.rels entry
|
||||||
|
// in `parts` to append the given (rId, URL) pairs as hyperlink
|
||||||
|
// relationships. If the rels part doesn't exist (some bases omit it
|
||||||
|
// when the body has no relationships), this function appends a fresh
|
||||||
|
// part with the minimal Relationships wrapper.
|
||||||
|
//
|
||||||
|
// Idempotent per rId: a pair whose rId already appears in the rels body
|
||||||
|
// is skipped. A URL the base references under a different rId is still added.
|
||||||
|
//
|
||||||
|
// Returns the (possibly extended) parts slice; callers must overwrite
|
||||||
|
// their reference because the append in the no-rels-yet case lengthens
|
||||||
|
// the slice and may reallocate its backing array.
|
||||||
|
func patchDocumentXMLRels(parts []baseZipPart, pairs [][2]string) ([]baseZipPart, error) {
|
||||||
|
const path = "word/_rels/document.xml.rels"
|
||||||
|
const hyperlinkType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
|
||||||
|
|
||||||
|
existingIdx := -1
|
||||||
|
for i := range parts {
|
||||||
|
if parts[i].name == path {
|
||||||
|
existingIdx = i
|
||||||
|
break
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
var body string
|
||||||
|
if existingIdx >= 0 {
|
||||||
|
body = string(parts[existingIdx].body)
|
||||||
|
} else {
|
||||||
|
body = `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>` +
|
||||||
|
`<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"></Relationships>`
|
||||||
|
}
|
||||||
|
|
||||||
|
var inserts strings.Builder
|
||||||
|
for _, p := range pairs {
|
||||||
|
rid := p[0]
|
||||||
|
url := p[1]
|
||||||
|
if strings.Contains(body, `Id="`+rid+`"`) {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
inserts.WriteString(`<Relationship Id="`)
|
||||||
|
inserts.WriteString(xmlAttrEscape(rid))
|
||||||
|
inserts.WriteString(`" Type="`)
|
||||||
|
inserts.WriteString(hyperlinkType)
|
||||||
|
inserts.WriteString(`" Target="`)
|
||||||
|
inserts.WriteString(xmlAttrEscape(url))
|
||||||
|
inserts.WriteString(`" TargetMode="External"/>`)
|
||||||
|
}
|
||||||
|
|
||||||
|
if inserts.Len() == 0 {
|
||||||
|
return parts, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
closeIdx := strings.LastIndex(body, "</Relationships>")
|
||||||
|
if closeIdx < 0 {
|
||||||
|
return parts, fmt.Errorf("submission compose: malformed document.xml.rels (no closing tag)")
|
||||||
|
}
|
||||||
|
patched := body[:closeIdx] + inserts.String() + body[closeIdx:]
|
||||||
|
|
||||||
|
if existingIdx >= 0 {
|
||||||
|
parts[existingIdx].body = []byte(patched)
|
||||||
|
return parts, nil
|
||||||
|
}
|
||||||
|
parts = append(parts, baseZipPart{
|
||||||
|
name: path,
|
||||||
|
method: zip.Deflate,
|
||||||
|
modTime: time.Now().Unix(),
|
||||||
|
body: []byte(patched),
|
||||||
|
})
|
||||||
|
return parts, nil
|
||||||
|
}
|
||||||
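The hyperlink wiring above can be sketched end to end in a few lines. This is a minimal illustration, not the code in this diff: `linkAllocator`, `appendHyperlinkRels`, `escapeAttr`, and `emptyRels` are hypothetical names, and `escapeAttr` only stands in for `xmlAttrEscape`, whose implementation is not shown here. It operates on a plain string rather than on `[]baseZipPart`.

```go
package main

import (
	"fmt"
	"strings"
)

// emptyRels is the minimal wrapper used when a base has no rels part.
const emptyRels = `<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"></Relationships>`

// linkAllocator is a hypothetical, trimmed-down stand-in for composerLinkAllocator.
type linkAllocator struct {
	next  int
	byURL map[string]string
	order []string // URLs in allocation order
}

func newLinkAllocator() *linkAllocator {
	return &linkAllocator{byURL: map[string]string{}}
}

// Alloc returns the rId for url, allocating one on first sight.
func (a *linkAllocator) Alloc(url string) string {
	if rid, ok := a.byURL[url]; ok {
		return rid
	}
	a.next++
	rid := fmt.Sprintf("rIdComposer%d", a.next)
	a.byURL[url] = rid
	a.order = append(a.order, url)
	return rid
}

// Pairs returns (rId, URL) pairs in allocation order.
func (a *linkAllocator) Pairs() [][2]string {
	pairs := make([][2]string, 0, len(a.order))
	for _, url := range a.order {
		pairs = append(pairs, [2]string{a.byURL[url], url})
	}
	return pairs
}

// escapeAttr is an assumed stand-in for xmlAttrEscape.
var escapeAttr = strings.NewReplacer("&", "&amp;", "<", "&lt;", ">", "&gt;", `"`, "&quot;")

// appendHyperlinkRels inserts one external hyperlink Relationship per pair
// before the closing tag, skipping any rId that is already present.
func appendHyperlinkRels(rels string, pairs [][2]string) (string, error) {
	const hyperlinkType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
	var b strings.Builder
	for _, p := range pairs {
		if strings.Contains(rels, `Id="`+p[0]+`"`) {
			continue
		}
		fmt.Fprintf(&b, `<Relationship Id="%s" Type="%s" Target="%s" TargetMode="External"/>`,
			escapeAttr.Replace(p[0]), hyperlinkType, escapeAttr.Replace(p[1]))
	}
	i := strings.LastIndex(rels, "</Relationships>")
	if i < 0 {
		return "", fmt.Errorf("malformed rels: no closing tag")
	}
	return rels[:i] + b.String() + rels[i:], nil
}

func main() {
	a := newLinkAllocator()
	a.Alloc("https://example.com/a?x=1&y=2")
	a.Alloc("https://example.com/b")
	a.Alloc("https://example.com/a?x=1&y=2") // repeat: reuses the first rId

	out, err := appendHyperlinkRels(emptyRels, a.Pairs())
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Running the patch a second time over its own output leaves it unchanged, because every rId is already present. That matches the skip-by-rId check in `patchDocumentXMLRels`.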
28
pkg/docforge/docx/doc.go
Normal file
@@ -0,0 +1,28 @@
|
|||||||
|
// Package docx is docforge's .docx (OOXML) adapter — the first
|
||||||
|
// format adapter in the docforge engine (t-paliad-349 / m/paliad#157).
|
||||||
|
//
|
||||||
|
// It owns the in-house OOXML machinery extracted from paliad's submission
|
||||||
|
// generator in slice 1, with no behaviour change:
|
||||||
|
//
|
||||||
|
// - merge.go — the placeholder substitution renderer
|
||||||
|
// (SubmissionRenderer.Render / RenderHTML). Two-pass {{placeholder}}
|
||||||
|
// substitution (single-run, then cross-run merge for fragmented
|
||||||
|
// placeholders), plus the preview-HTML emitter that wraps substituted
|
||||||
|
// values in clickable <span class="draft-var" data-var="…"> markup.
|
||||||
|
// - markdown.go — the Markdown→OOXML walker (RenderMarkdownToOOXML*),
|
||||||
|
// including the b78a984 fix that preserves {{…}} placeholders verbatim
|
||||||
|
// through inline-span parsing (underscores in keys survive).
|
||||||
|
// - dotm.go — ConvertDotmToDocx: strips macros from a .dotm/.docm/
|
||||||
|
// .dotx and rewrites the content-types + rels to a clean .docx,
|
||||||
|
// passing every other part through bit-for-bit.
|
||||||
|
//
|
||||||
|
// Why no third-party docx library: lukasjarosch/go-docx treats sibling
|
||||||
|
// placeholders in one run ("{{a}} ./. {{b}}") as nested and refuses to
|
||||||
|
// replace either; patent submissions routinely have several placeholders
|
||||||
|
// per paragraph, so this in-house renderer is required. See merge.go.
|
||||||
|
//
|
||||||
|
// The placeholder grammar — \{\{\s*([A-Za-z][A-Za-z0-9_.]*)\s*\}\} — and
|
||||||
|
// the PlaceholderMap type currently live here with the renderer; a later
|
||||||
|
// slice hoists the format-neutral grammar up to the docforge root once
|
||||||
|
// the neutral document model and the VariableResolver interface land.
|
||||||
|
package docx
|
||||||
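The placeholder grammar quoted in the package comment can be exercised on its own. The sketch below is an illustration under assumptions, not the renderer in merge.go: `substitute` is a hypothetical helper that works on a plain string (no run splitting), and leaving unknown keys untouched is an assumed policy.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholderRe is the grammar quoted in the package comment.
var placeholderRe = regexp.MustCompile(`\{\{\s*([A-Za-z][A-Za-z0-9_.]*)\s*\}\}`)

// substitute replaces every placeholder whose key is in values.
// Unknown keys are left verbatim (an assumption made for this sketch).
func substitute(text string, values map[string]string) string {
	return placeholderRe.ReplaceAllStringFunc(text, func(m string) string {
		key := placeholderRe.FindStringSubmatch(m)[1]
		if v, ok := values[key]; ok {
			return v
		}
		return m
	})
}

func main() {
	values := map[string]string{
		"claimant":       "Alpha GmbH",
		"defendant.name": "Beta AG",
	}
	// Sibling placeholders in one string are replaced independently.
	fmt.Println(substitute("{{claimant}} ./. {{ defendant.name }}", values))
	// A key starting with a digit does not match the grammar at all.
	fmt.Println(substitute("{{1bad}} and {{missing_key}}", values))
}
```

The first call covers the sibling-placeholder case ("{{a}} ./. {{b}}") that the package comment gives as the reason for the in-house renderer.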
@@ -1,4 +1,4 @@
|
|||||||
package services
|
package docx
|
||||||
|
|
||||||
// Submission .dotm → .docx converter (t-paliad-230, "format-only" scope
|
// Submission .dotm → .docx converter (t-paliad-230, "format-only" scope
|
||||||
// reduction of the original t-paliad-215 submission generator).
|
// reduction of the original t-paliad-215 submission generator).
|
||||||
@@ -1,4 +1,4 @@
|
|||||||
package services
|
package docx
|
||||||
|
|
||||||
import (
|
import (
|
||||||
"archive/zip"
|
"archive/zip"
|
||||||
@@ -1,4 +1,4 @@
|
|||||||
package services
|
package docx
|
||||||
|
|
||||||
// Markdown → OOXML walker for Composer section content (t-paliad-313
|
// Markdown → OOXML walker for Composer section content (t-paliad-313
|
||||||
// Slice B, design doc §9.2).
|
// Slice B, design doc §9.2).
|
||||||
@@ -1,4 +1,4 @@
|
|||||||
package services
|
package docx
|
||||||
|
|
||||||
// Unit tests for the Composer's Markdown → OOXML walker (t-paliad-313
|
// Unit tests for the Composer's Markdown → OOXML walker (t-paliad-313
|
||||||
// Slice B). Pure function; no DB dependency.
|
// Slice B). Pure function; no DB dependency.
|
||||||
@@ -1,4 +1,4 @@
|
|||||||
package services
|
package docx
|
||||||
|
|
||||||
// Submission template renderer — in-house engine for the submission
|
// Submission template renderer — in-house engine for the submission
|
||||||
// draft editor (t-paliad-238, design doc
|
// draft editor (t-paliad-238, design doc
|
||||||
@@ -1,4 +1,4 @@
|
|||||||
package services
|
package docx
|
||||||
|
|
||||||
// Submission merge-engine tests — resurrected from the original
|
// Submission merge-engine tests — resurrected from the original
|
||||||
// t-paliad-215 Slice 1 (commit 8ea3509) + Slice 2 (commit 1765d5e).
|
// t-paliad-215 Slice 1 (commit 8ea3509) + Slice 2 (commit 1765d5e).
|
||||||
@@ -190,79 +190,6 @@ func TestPlaceholderRegex_Boundaries(t *testing.T) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
func TestLegalSourcePretty(t *testing.T) {
|
|
||||||
tests := []struct {
|
|
||||||
src, lang, want string
|
|
||||||
}{
|
|
||||||
{"DE.ZPO.276.1", "de", "§ 276 Abs. 1 ZPO"},
|
|
||||||
{"DE.ZPO.276.1", "en", "Section 276(1) ZPO"},
|
|
||||||
{"DE.ZPO.253", "de", "§ 253 ZPO"},
|
|
||||||
{"DE.ZPO.253", "en", "Section 253 ZPO"},
|
|
||||||
{"UPC.RoP.23.1", "de", "Regel 23.1 VerfO UPC"},
|
|
||||||
{"UPC.RoP.23.1", "en", "Rule 23.1 RoP UPC"},
|
|
||||||
{"UPC.RoP.198", "de", "Regel 198 VerfO UPC"},
|
|
||||||
{"DE.PatG.83", "de", "§ 83 PatG"},
|
|
||||||
{"EPC.123", "de", "Art. 123 EPÜ"},
|
|
||||||
{"EPC.123", "en", "Art. 123 EPC"},
|
|
||||||
{"FOO.BAR.123", "de", "FOO.BAR.123"},
|
|
||||||
{"", "de", ""},
|
|
||||||
}
|
|
||||||
for _, tc := range tests {
|
|
||||||
t.Run(tc.src+"/"+tc.lang, func(t *testing.T) {
|
|
||||||
got := legalSourcePretty(tc.src, tc.lang)
|
|
||||||
if got != tc.want {
|
|
||||||
t.Errorf("legalSourcePretty(%q, %q) = %q, want %q", tc.src, tc.lang, got, tc.want)
|
|
||||||
}
|
|
||||||
})
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
func TestOurSideTranslations(t *testing.T) {
|
|
||||||
cases := []struct {
|
|
||||||
in, wantDE, wantEN string
|
|
||||||
}{
|
|
||||||
{"claimant", "Klägerin", "Claimant"},
|
|
||||||
{"defendant", "Beklagte", "Defendant"},
|
|
||||||
{"court", "Gericht", "Court"},
|
|
||||||
{"both", "Klägerin und Beklagte", "Claimant and Defendant"},
|
|
||||||
{"", "", ""},
|
|
||||||
{"unknown", "", ""},
|
|
||||||
}
|
|
||||||
for _, tc := range cases {
|
|
||||||
t.Run(tc.in, func(t *testing.T) {
|
|
||||||
if got := ourSideDE(tc.in); got != tc.wantDE {
|
|
||||||
t.Errorf("ourSideDE(%q) = %q, want %q", tc.in, got, tc.wantDE)
|
|
||||||
}
|
|
||||||
if got := ourSideEN(tc.in); got != tc.wantEN {
|
|
||||||
t.Errorf("ourSideEN(%q) = %q, want %q", tc.in, got, tc.wantEN)
|
|
||||||
}
|
|
||||||
})
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
func TestPatentNumberUPC(t *testing.T) {
|
|
||||||
tests := []struct {
|
|
||||||
in, want string
|
|
||||||
}{
|
|
||||||
{"EP 1 234 567 B1", "EP 1 234 567 (B1)"},
|
|
||||||
{"EP 4 056 049 A1", "EP 4 056 049 (A1)"},
|
|
||||||
{"DE 10 2020 123 456 A1", "DE 10 2020 123 456 (A1)"},
|
|
||||||
{"EP 1 234 567", "EP 1 234 567"},
|
|
||||||
{" EP 1 234 567 B1 ", "EP 1 234 567 (B1)"},
|
|
||||||
{"", ""},
|
|
||||||
{"WO/2023/123456", "WO/2023/123456"},
|
|
||||||
{"EP 1 234 567 B12", "EP 1 234 567 B12"},
|
|
||||||
}
|
|
||||||
for _, tc := range tests {
|
|
||||||
t.Run(tc.in, func(t *testing.T) {
|
|
||||||
got := patentNumberUPC(tc.in)
|
|
||||||
if got != tc.want {
|
|
||||||
t.Errorf("patentNumberUPC(%q) = %q, want %q", tc.in, got, tc.want)
|
|
||||||
}
|
|
||||||
})
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// TestRenderHTML_ExtractsParagraphsAndFormatting verifies the preview
|
// TestRenderHTML_ExtractsParagraphsAndFormatting verifies the preview
|
||||||
// HTML emitter walks <w:p> / <w:r> / <w:t> correctly and carries
|
// HTML emitter walks <w:p> / <w:r> / <w:t> correctly and carries
|
||||||
// bold/italic through to <strong>/<em>. Substituted placeholders are
|
// bold/italic through to <strong>/<em>. Substituted placeholders are
|
||||||