refactor(docforge): slice 1 — extract .docx engine to pkg/docforge/docx (t-paliad-349)

Relocate the in-house OOXML machinery out of internal/services into the
first docforge adapter, with zero behaviour change:

  submission_merge.go  -> pkg/docforge/docx/merge.go
                          (placeholder substitution renderer + preview-HTML emitter)
  submission_md.go     -> pkg/docforge/docx/markdown.go
                          (Markdown->OOXML walker, incl. the b78a984 underscore fix)
  submission_render.go -> pkg/docforge/docx/dotm.go (.dotm->.docx)
  + their _test.go files (git-tracked renames, 84-99% identical)

internal/services keeps thin type-alias + forwarder shims
(docforge_shims.go) so every caller in services/handlers/main compiles
and behaves identically: PlaceholderMap, MissingPlaceholderFn,
SubmissionRenderer, HyperlinkAllocator (aliases); NewSubmissionRenderer,
DefaultMissingMarker, RenderMarkdownToOOXML[WithStyles], ConvertDotmToDocx,
SanitiseSubmissionFileName (forwarders). docx.XMLAttrEscape is exported so
that the hyperlink-rels inserts in submission_compose.go reuse the walker's
escaping.

Three misfiled pretty-printer tests (legalSourcePretty, ourSideDE/EN,
patentNumberUPC) exercise the vars layer, so they move back to
internal/services/submission_vars_pretty_test.go.

Placeholder grammar + PlaceholderMap stay co-located with the renderer in
docx for now; slice 3 hoists the format-neutral grammar to the docforge
root together with the VariableResolver interface.

Verification: go build ./... clean, go vet clean, full module test green.
The byte-exact OOXML golden tests in merge/compose/render pass unchanged,
so behaviour is preserved. gofmt drift on the moved files is pre-existing
(72/169 services files already drift; there is no gofmt gate).

m/paliad#157
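The shims described above rely on Go type aliases. The following minimal single-file sketch shows why an alias (`type X = Y`) keeps existing call sites compiling where a defined type (`type X Y`) would not. An alias names the identical type, so values move between the old and new names without conversion and methods carry over. Both "packages" are collapsed into one file here. Every name is a hypothetical stand-in, not the contents of docforge_shims.go, and the fake sanitiser only trims whitespace.

```go
package main

import "strings"

// --- stand-in for pkg/docforge/docx (hypothetical, simplified) ---

// docxPlaceholderMap plays the role of docx.PlaceholderMap.
type docxPlaceholderMap map[string]string

// Lookup gives the type a method so we can see it survive the alias.
func (m docxPlaceholderMap) Lookup(key string) string { return m[key] }

// docxSanitise plays the role of the relocated SanitiseSubmissionFileName
// (simplified: it only trims surrounding whitespace).
func docxSanitise(s string) string { return strings.TrimSpace(s) }

// --- stand-in for internal/services/docforge_shims.go ---

// PlaceholderMap is an alias: the identical type under the old name,
// so docxPlaceholderMap's methods are available through it.
type PlaceholderMap = docxPlaceholderMap

// SanitiseSubmissionFileName forwards to the relocated implementation.
func SanitiseSubmissionFileName(s string) string { return docxSanitise(s) }

// legacyCaller stands for unchanged services/handlers code. It still uses
// the old name and hands the value to the new package's type directly.
func legacyCaller(m PlaceholderMap) string {
	var moved docxPlaceholderMap = m // compiles only because of the alias
	return moved.Lookup("party")
}
```

With `type PlaceholderMap docxPlaceholderMap` instead, both the assignment in legacyCaller and a call such as `m.Lookup(...)` on a PlaceholderMap would fail to compile.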
24
pkg/docforge/doc.go
Normal file
@@ -0,0 +1,24 @@
// Package docforge is paliad's modular document-generator engine — the
// format-neutral core that turns templates + variables into rendered
// documents, with format-specific adapters living in sub-packages.
//
// The package is being extracted from the in-tree submission generator
// (internal/services/submission_*.go) per the PRD in
// docs/plans/prd-docforge-2026-05-29.md (t-paliad-349 / m/paliad#157).
// The extraction follows the same packaging discipline as
// pkg/litigationplanner: docforge owns its types and exposes interfaces
// for the stateful inputs (variable resolution, template storage); the
// consuming application (paliad) implements those interfaces against its
// own database, and a future second consumer reaches the engine over an
// HTTP veneer rather than importing it.
//
// Slice 1 (this commit) relocates the .docx adapter — the Markdown→OOXML
// walker, the placeholder substitution engine, and the .dotm→.docx
// converter — into pkg/docforge/docx with no behaviour change. paliad's
// internal/services package keeps thin type-alias + forwarder shims so
// the submission generator and its HTTP surface compile and behave
// identically. Later slices introduce the neutral document model,
// hoist the format-neutral placeholder grammar up to this root package,
// and add the VariableResolver interface, the TemplateStore, the
// authoring surface, and the pluggable Exporter.
package docforge

28
pkg/docforge/docx/doc.go
Normal file
@@ -0,0 +1,28 @@
// Package docx is docforge's .docx (OOXML) adapter — the first
// format adapter in the docforge engine (t-paliad-349 / m/paliad#157).
//
// It owns the in-house OOXML machinery extracted from paliad's submission
// generator in slice 1, with no behaviour change:
//
//   - merge.go — the placeholder substitution renderer
//     (SubmissionRenderer.Render / RenderHTML). Two-pass {{placeholder}}
//     substitution (single-run, then cross-run merge for fragmented
//     placeholders), plus the preview-HTML emitter that wraps substituted
//     values in clickable <span class="draft-var" data-var="…"> markup.
//   - markdown.go — the Markdown→OOXML walker (RenderMarkdownToOOXML*),
//     including the b78a984 fix that preserves {{…}} placeholders verbatim
//     through inline-span parsing (underscores in keys survive).
//   - dotm.go — ConvertDotmToDocx: turns a .dotm/.docm/.dotx into a clean
//     .docx by dropping any macro parts and rewriting the content-types +
//     rels, passing every other part through bit-for-bit.
//
// Why no third-party docx library: lukasjarosch/go-docx treats sibling
// placeholders in one run ("{{a}} ./. {{b}}") as nested and refuses to
// replace either; patent submissions routinely have several placeholders
// per paragraph, so this in-house renderer is required. See merge.go.
//
// The placeholder grammar — \{\{\s*([A-Za-z][A-Za-z0-9_.]*)\s*\}\} — and
// the PlaceholderMap type currently live here with the renderer; a later
// slice hoists the format-neutral grammar up to the docforge root once
// the neutral document model and the VariableResolver interface land.
package docx

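The placeholder grammar quoted above can be exercised with the standard regexp package. The following minimal sketch covers the single-run pass only. It assumes `vars` plays the role of PlaceholderMap and that a hypothetical `missing` callback stands in for the renderer's missing-placeholder hook; MissingPlaceholderFn's real signature is not shown here. The sketch does not attempt the cross-run merge for placeholders that Word splits across several `<w:r>` runs. It also does not XML-escape substituted values, which real OOXML output would need. It does show both sibling placeholders in one run being replaced, the case the comment says go-docx refuses.

```go
package main

import "regexp"

// placeholderRe is the grammar quoted in the docx package comment.
var placeholderRe = regexp.MustCompile(`\{\{\s*([A-Za-z][A-Za-z0-9_.]*)\s*\}\}`)

// substituteRun replaces every placeholder found in one run's text.
// Keys absent from vars are handed to missing (a hypothetical hook).
// Text that does not match the grammar is left untouched.
func substituteRun(text string, vars map[string]string, missing func(key string) string) string {
	return placeholderRe.ReplaceAllStringFunc(text, func(match string) string {
		key := placeholderRe.FindStringSubmatch(match)[1]
		if v, ok := vars[key]; ok {
			return v
		}
		return missing(key)
	})
}
```

Because ReplaceAllStringFunc scans the whole run for non-overlapping matches, `{{a}} ./. {{b}}` is just two independent matches rather than a nesting problem.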
204
pkg/docforge/docx/dotm.go
Normal file
@@ -0,0 +1,204 @@
package docx

// Submission .dotm → .docx converter (t-paliad-230, "format-only" scope
// reduction of the original t-paliad-215 submission generator).
//
// Word .dotm (macro-enabled template), .docm (macro-enabled document),
// .dotx (template, no macros), and .docx (document, no macros) are all
// OOXML zip containers. The macro-bearing variants carry an extra set
// of parts:
//
//	word/vbaProject.bin            — the VBA project binary
//	word/_rels/vbaProject.bin.rels — auxiliary relationships
//	word/vbaData.xml               — VBA support data
//	word/customizations.xml        — keyMapCustomizations
//
// plus a Content-Types override for each of those, a Default extension
// declaring all .bin files as vbaProject, and a different "main" content
// type for word/document.xml itself.
//
// ConvertDotmToDocx walks the zip, drops the macro parts, rewrites
// [Content_Types].xml and word/_rels/document.xml.rels to remove every
// reference to them, and switches the main document content type to
// the plain .docx form. Every other part — styles, fonts, theme,
// settings, document body, header/footer/numbering, glossary, custom
// XML — passes through bit-for-bit at the original compression method
// and modification time.
//
// No variable substitution happens here: the converter only hands the
// lawyer the firm style template as a clean .docx to edit and save under
// their own filename. Placeholder substitution is the job of the
// renderer in merge.go.

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"regexp"
	"strings"
)

// The four OOXML "main" content types we may see on word/document.xml.
// Anything other than docxMainContentType gets rewritten so the output
// reads as a plain document.
const (
	dotmMainContentType = "application/vnd.ms-word.template.macroEnabledTemplate.main+xml"
	docmMainContentType = "application/vnd.ms-word.document.macroEnabled.main+xml"
	dotxMainContentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml"
	docxMainContentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"
)

// Macro-related parts dropped wholesale from the output zip.
var macroParts = map[string]bool{
	"word/vbaProject.bin":            true,
	"word/_rels/vbaProject.bin.rels": true,
	"word/vbaData.xml":               true,
	"word/customizations.xml":        true,
}

const (
	contentTypesPath = "[Content_Types].xml"
	documentRelsPath = "word/_rels/document.xml.rels"
)

// vbaDefaultExtensionRegex matches the `<Default Extension="bin"
// ContentType=".../vbaProject"/>` row in [Content_Types].xml. After
// vbaProject.bin is dropped, the Default is dead weight (and Word will
// flag the file as macro-bearing if it survives).
var vbaDefaultExtensionRegex = regexp.MustCompile(
	`\s*<Default\b[^>]*\bExtension\s*=\s*"bin"[^>]*\bContentType\s*=\s*"application/vnd\.ms-office\.vbaProject"[^>]*/>`,
)

// macroOverridePartRegex matches any <Override PartName="…"/> element
// whose PartName is one of the dropped macro parts. The /word/
// prefix is the OOXML convention for the absolute part path in
// [Content_Types].xml — file paths in the zip itself omit the leading
// slash.
var macroOverridePartRegex = regexp.MustCompile(
	`\s*<Override\b[^>]*\bPartName\s*=\s*"/word/(?:vbaProject\.bin|vbaData\.xml|customizations\.xml)"[^>]*/>`,
)

// macroRelTypeRegex matches the two macro-related relationship Types
// in word/_rels/document.xml.rels: vbaProject (binds to vbaProject.bin)
// and keyMapCustomizations (binds to customizations.xml). After both
// targets are dropped, leaving the relationships in would make Word
// flag the file as corrupt.
var macroRelTypeRegex = regexp.MustCompile(
	`\s*<Relationship\b[^>]*\bType\s*=\s*"http://schemas\.microsoft\.com/office/2006/relationships/(?:vbaProject|keyMapCustomizations)"[^>]*/>`,
)

// ConvertDotmToDocx rewrites a .dotm (or .docm, or .dotx) zip into a
// clean .docx zip. Idempotent on a zip that is already a plain .docx.
// Returns an error if the input is not a valid zip.
func ConvertDotmToDocx(dotmBytes []byte) ([]byte, error) {
	zr, err := zip.NewReader(bytes.NewReader(dotmBytes), int64(len(dotmBytes)))
	if err != nil {
		return nil, fmt.Errorf("dotm→docx: open zip: %w", err)
	}

	var out bytes.Buffer
	zw := zip.NewWriter(&out)

	for _, entry := range zr.File {
		if macroParts[entry.Name] {
			continue
		}

		body, err := readZipFile(entry)
		if err != nil {
			return nil, fmt.Errorf("dotm→docx: read %s: %w", entry.Name, err)
		}

		switch entry.Name {
		case contentTypesPath:
			body = rewriteContentTypes(body)
		case documentRelsPath:
			body = rewriteDocumentRels(body)
		}

		w, err := zw.CreateHeader(&zip.FileHeader{
			Name:     entry.Name,
			Method:   entry.Method,
			Modified: entry.Modified,
		})
		if err != nil {
			return nil, fmt.Errorf("dotm→docx: write header %s: %w", entry.Name, err)
		}
		if _, err := w.Write(body); err != nil {
			return nil, fmt.Errorf("dotm→docx: write body %s: %w", entry.Name, err)
		}
	}

	if err := zw.Close(); err != nil {
		return nil, fmt.Errorf("dotm→docx: finalise zip: %w", err)
	}
	return out.Bytes(), nil
}

// rewriteContentTypes demotes any of the three non-docx "main" content
// types to plain docx, drops the bin Default-Extension entry, and
// drops every Override that targeted a dropped macro part.
//
// String-level substitution rather than encoding/xml: round-tripping
// through Go's XML marshaller would re-emit the document with
// canonical namespace declarations on every child, which Word reads
// but which makes the binary diff unnecessarily large. Direct
// substitution preserves the file's original shape.
func rewriteContentTypes(body []byte) []byte {
	body = bytes.ReplaceAll(body, []byte(dotmMainContentType), []byte(docxMainContentType))
	body = bytes.ReplaceAll(body, []byte(docmMainContentType), []byte(docxMainContentType))
	body = bytes.ReplaceAll(body, []byte(dotxMainContentType), []byte(docxMainContentType))
	body = vbaDefaultExtensionRegex.ReplaceAll(body, nil)
	body = macroOverridePartRegex.ReplaceAll(body, nil)
	return body
}

// rewriteDocumentRels drops the two macro-related relationships from
// word/_rels/document.xml.rels (vbaProject + keyMapCustomizations) so
// the manifest no longer points at parts the zip no longer carries.
// Every other relationship — styles, settings, numbering, theme,
// headers/footers, customXml — passes through untouched.
func rewriteDocumentRels(body []byte) []byte {
	return macroRelTypeRegex.ReplaceAll(body, nil)
}

// readZipFile slurps a zip entry's bytes.
func readZipFile(f *zip.File) ([]byte, error) {
	rc, err := f.Open()
	if err != nil {
		return nil, err
	}
	defer rc.Close()
	return io.ReadAll(rc)
}

// SanitiseSubmissionFileName cleans a string for use inside a download
// filename: it trims surrounding whitespace, replaces path separators
// with underscores, and drops quote characters that would break
// Content-Disposition or confuse browsers across OSes. It also ASCII-folds
// the small set of German umlaut letters that show up in submission
// names today (Klageerwiderung, Berufungsbegründung, …) so the file
// lands cleanly on legacy SMB shares whose layer is still cp1252.
// Other Unicode is preserved so non-DE/EN names still produce a
// recognisable file.
func SanitiseSubmissionFileName(s string) string {
	s = strings.TrimSpace(s)
	s = umlautFolder.Replace(s)
	s = strings.Map(func(r rune) rune {
		switch r {
		case '/', '\\':
			return '_'
		case '"', '\'':
			return -1
		}
		return r
	}, s)
	return s
}

// umlautFolder turns the three German umlaut letters ä, ö, ü (both cases)
// into ASCII digraphs and folds ß to ss.
var umlautFolder = strings.NewReplacer(
	"ä", "ae", "ö", "oe", "ü", "ue",
	"Ä", "Ae", "Ö", "Oe", "Ü", "Ue",
	"ß", "ss",
)

255
pkg/docforge/docx/dotm_test.go
Normal file
@@ -0,0 +1,255 @@
package docx

import (
	"archive/zip"
	"bytes"
	"io"
	"strings"
	"testing"
	"time"
)

// minimalDOTM builds a small .dotm zip whose shape mirrors the real
// HL Patents Style template: macro-enabled main content type, Default
// extension declaring .bin as vbaProject, Overrides for vbaData.xml +
// customizations.xml, document.xml.rels with vbaProject +
// keyMapCustomizations relationships, and the four macro parts on
// disk (vbaProject.bin + auxiliary rels + vbaData.xml +
// customizations.xml).
//
// In-memory so the test is self-contained (no checked-in binary).
// Word and LibreOffice would reject this minimal file as incomplete
// (no _rels/.rels root manifest); the tests work at the byte level
// and assert structural properties of the converted output.
func minimalDOTM(t *testing.T) []byte {
	t.Helper()
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	add := func(name, body string) {
		t.Helper()
		w, err := zw.CreateHeader(&zip.FileHeader{
			Name:     name,
			Method:   zip.Deflate,
			Modified: time.Date(2026, 5, 21, 12, 0, 0, 0, time.UTC),
		})
		if err != nil {
			t.Fatalf("zip header %s: %v", name, err)
		}
		if _, err := io.WriteString(w, body); err != nil {
			t.Fatalf("write %s: %v", name, err)
		}
	}

	add(contentTypesPath, `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>`+
		`<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">`+
		`<Default Extension="bin" ContentType="application/vnd.ms-office.vbaProject"/>`+
		`<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>`+
		`<Default Extension="xml" ContentType="application/xml"/>`+
		`<Override PartName="/word/document.xml" ContentType="`+dotmMainContentType+`"/>`+
		`<Override PartName="/word/customizations.xml" ContentType="application/vnd.ms-word.keyMapCustomizations+xml"/>`+
		`<Override PartName="/word/vbaData.xml" ContentType="application/vnd.ms-word.vbaData+xml"/>`+
		`<Override PartName="/word/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml"/>`+
		`</Types>`)

	add("word/document.xml",
		`<?xml version="1.0" encoding="UTF-8" standalone="yes"?>`+
			`<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">`+
			`<w:body><w:p><w:r><w:t>Hello Paliad</w:t></w:r></w:p></w:body></w:document>`)

	add(documentRelsPath,
		`<?xml version="1.0" encoding="UTF-8" standalone="yes"?>`+
			`<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">`+
			`<Relationship Id="rId1" Type="http://schemas.microsoft.com/office/2006/relationships/vbaProject" Target="vbaProject.bin"/>`+
			`<Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>`+
			`<Relationship Id="rId3" Type="http://schemas.microsoft.com/office/2006/relationships/keyMapCustomizations" Target="customizations.xml"/>`+
			`</Relationships>`)

	add("word/styles.xml", `<w:styles xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"/>`)
	add("word/vbaProject.bin", "PRETEND-VBA-BINARY-PAYLOAD")
	add("word/_rels/vbaProject.bin.rels", `<?xml version="1.0"?><Relationships/>`)
	add("word/vbaData.xml", `<?xml version="1.0"?><wne:vbaSuppData xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"/>`)
	add("word/customizations.xml", `<?xml version="1.0"?><wne:tcg xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"/>`)

	if err := zw.Close(); err != nil {
		t.Fatalf("close zip: %v", err)
	}
	return buf.Bytes()
}

func unzipEntries(t *testing.T, data []byte) map[string]string {
	t.Helper()
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		t.Fatalf("open output zip: %v", err)
	}
	out := make(map[string]string, len(zr.File))
	for _, f := range zr.File {
		rc, err := f.Open()
		if err != nil {
			t.Fatalf("open %s: %v", f.Name, err)
		}
		body, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			t.Fatalf("read %s: %v", f.Name, err)
		}
		out[f.Name] = string(body)
	}
	return out
}

func TestConvertDotmToDocx_StripsMacroParts(t *testing.T) {
	dotm := minimalDOTM(t)
	out, err := ConvertDotmToDocx(dotm)
	if err != nil {
		t.Fatalf("ConvertDotmToDocx: %v", err)
	}

	entries := unzipEntries(t, out)

	for _, name := range []string{
		"word/vbaProject.bin",
		"word/_rels/vbaProject.bin.rels",
		"word/vbaData.xml",
		"word/customizations.xml",
	} {
		if _, ok := entries[name]; ok {
			t.Errorf("output still contains %s", name)
		}
	}
	if doc, ok := entries["word/document.xml"]; !ok {
		t.Error("output is missing word/document.xml")
	} else if !strings.Contains(doc, "Hello Paliad") {
		t.Errorf("document body lost during conversion: %q", doc)
	}
	if _, ok := entries["word/styles.xml"]; !ok {
		t.Error("output lost unrelated word/styles.xml")
	}

	ctypes, ok := entries[contentTypesPath]
	if !ok {
		t.Fatal("output is missing [Content_Types].xml")
	}
	if strings.Contains(ctypes, "macroEnabled") {
		t.Errorf("output [Content_Types].xml still references a macro-enabled type: %q", ctypes)
	}
	if !strings.Contains(ctypes, docxMainContentType) {
		t.Errorf("output is missing plain docx main content type: %q", ctypes)
	}
	if strings.Contains(ctypes, "vbaProject") {
		t.Errorf("output [Content_Types].xml still references vbaProject: %q", ctypes)
	}
	if strings.Contains(ctypes, "vbaData") {
		t.Errorf("output [Content_Types].xml still overrides vbaData: %q", ctypes)
	}
	if strings.Contains(ctypes, "keyMapCustomizations") {
		t.Errorf("output [Content_Types].xml still overrides customizations: %q", ctypes)
	}
	if !strings.Contains(ctypes, "wordprocessingml.styles") {
		t.Errorf("output lost unrelated styles Override: %q", ctypes)
	}

	rels, ok := entries[documentRelsPath]
	if !ok {
		t.Fatal("output is missing word/_rels/document.xml.rels")
	}
	if strings.Contains(rels, "vbaProject") {
		t.Errorf("output rels still references vbaProject: %q", rels)
	}
	if strings.Contains(rels, "keyMapCustomizations") {
		t.Errorf("output rels still references keyMapCustomizations: %q", rels)
	}
	if !strings.Contains(rels, "styles.xml") {
		t.Errorf("output rels lost unrelated styles relationship: %q", rels)
	}
}

func TestConvertDotmToDocx_IdempotentOnPlainDocx(t *testing.T) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	add := func(name, body string) {
		w, err := zw.Create(name)
		if err != nil {
			t.Fatalf("create %s: %v", name, err)
		}
		if _, err := io.WriteString(w, body); err != nil {
			t.Fatalf("write %s: %v", name, err)
		}
	}
	add(contentTypesPath, `<?xml version="1.0"?>`+
		`<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">`+
		`<Override PartName="/word/document.xml" ContentType="`+docxMainContentType+`"/>`+
		`</Types>`)
	add("word/document.xml", `<w:document/>`)
	if err := zw.Close(); err != nil {
		t.Fatalf("close: %v", err)
	}

	out, err := ConvertDotmToDocx(buf.Bytes())
	if err != nil {
		t.Fatalf("ConvertDotmToDocx: %v", err)
	}

	entries := unzipEntries(t, out)
	if _, ok := entries["word/vbaProject.bin"]; ok {
		t.Error("plain docx grew a vbaProject during conversion")
	}
	if ctypes := entries[contentTypesPath]; !strings.Contains(ctypes, docxMainContentType) {
		t.Errorf("plain docx lost its content type: %q", ctypes)
	}
}

func TestConvertDotmToDocx_AcceptsDocmAndDotx(t *testing.T) {
	for _, mainType := range []string{docmMainContentType, dotxMainContentType} {
		t.Run(mainType, func(t *testing.T) {
			var buf bytes.Buffer
			zw := zip.NewWriter(&buf)
			add := func(name, body string) {
				w, err := zw.Create(name)
				if err != nil {
					t.Fatalf("create %s: %v", name, err)
				}
				if _, err := io.WriteString(w, body); err != nil {
					t.Fatalf("write %s: %v", name, err)
				}
			}
			add(contentTypesPath, `<?xml version="1.0"?>`+
				`<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">`+
				`<Override PartName="/word/document.xml" ContentType="`+mainType+`"/>`+
				`</Types>`)
			add("word/document.xml", `<w:document/>`)
			if err := zw.Close(); err != nil {
				t.Fatalf("close: %v", err)
			}
			out, err := ConvertDotmToDocx(buf.Bytes())
			if err != nil {
				t.Fatalf("ConvertDotmToDocx: %v", err)
			}
			ctypes := unzipEntries(t, out)[contentTypesPath]
			if strings.Contains(ctypes, mainType) {
				t.Errorf("non-docx main type survived conversion: %q", ctypes)
			}
			if !strings.Contains(ctypes, docxMainContentType) {
				t.Errorf("docx main type not present: %q", ctypes)
			}
		})
	}
}

func TestConvertDotmToDocx_RejectsNonZip(t *testing.T) {
	_, err := ConvertDotmToDocx([]byte("not a zip file"))
	if err == nil {
		t.Fatal("expected error for non-zip input, got nil")
	}
}

func TestSanitiseSubmissionFileName(t *testing.T) {
	cases := map[string]string{
		"Klageerwiderung":        "Klageerwiderung",
		"Berufungsbegründung":    "Berufungsbegruendung",
		"Schriftsatz/Anlage":     "Schriftsatz_Anlage",
		`Statement of "Defence"`: "Statement of Defence",
		` Klage `:                "Klage",
		"Größe":                  "Groesse",
	}
	for in, want := range cases {
		t.Run(in, func(t *testing.T) {
			if got := SanitiseSubmissionFileName(in); got != want {
				t.Errorf("SanitiseSubmissionFileName(%q) = %q, want %q", in, got, want)
			}
		})
	}
}

511
pkg/docforge/docx/markdown.go
Normal file
@@ -0,0 +1,511 @@
package docx

// Markdown → OOXML walker for Composer section content (t-paliad-313
// Slice B, design doc §9.2).
//
// Slice B's brief was paragraphs + inline bold/italic only; Slice D's
// rich-prose pass (t-paliad-316) added headings, lists, blockquote, and
// inline links. The walker stays intentionally minimal: every Markdown
// construct it doesn't recognise is rendered as a plain paragraph, so the
// lawyer's prose round-trips losslessly even when it contains Markdown
// the walker doesn't yet understand.
//
// The output uses the base's stylemap entries (stylemap.paragraph by
// default) for the <w:pStyle> on each paragraph so the styling matches
// the base's typography (HLpat-Body-B0 on the HLC base, Normal on the
// neutral base, etc.).
//
// Placeholders ({{path.dot.notation}}) are preserved verbatim — they
// pass through the walker untouched and get substituted by the v1
// SubmissionRenderer's placeholder pass after the composer assembly.
//
// Grammar supported:
//
//   - Blank line → paragraph break
//   - `**bold**` → <w:r><w:rPr><w:b/></w:rPr><w:t>…</w:t></w:r>
//   - `*italic*` or `_italic_` → <w:r><w:rPr><w:i/></w:rPr>…</w:r>
//   - `#`/`##`/`###` headings, `-`/`*` bullets, `N.` numbered items and
//     `> ` blockquotes (see detectBlockMarker)
//   - `[label](url)` → <w:hyperlink> when a HyperlinkAllocator is
//     supplied, otherwise the plain label
//   - Otherwise → plain text run

import (
	"fmt"
	"strings"
)

// HyperlinkAllocator hands the walker a `rId` for each external URL
// it encounters in `[label](url)` inline links. The composer's
// post-pass uses these allocations to mutate
// `word/_rels/document.xml.rels` so the emitted `<w:hyperlink
// r:id="…">` elements resolve correctly. Pass nil to drop links to
// plain text (the label survives, the URL doesn't render).
//
// t-paliad-316 Slice D.
type HyperlinkAllocator func(url string) string

// RenderMarkdownToOOXML renders the given Markdown source into OOXML
// paragraph elements (`<w:p>…</w:p>`), suitable for splicing into a
// .docx body. Each paragraph carries `<w:pStyle w:val="<paragraphStyle>"/>`
// when paragraphStyle is non-empty.
//
// Slice B shipped paragraphs + bold/italic. Slice D extends to
// headings (h1/h2/h3), bullet/numbered lists, blockquote, and inline
// hyperlinks via the optional HyperlinkAllocator.
//
// stylemap supplies the paragraph-style names for each kind:
//
//	stylemap["paragraph"]      — default body
//	stylemap["heading_1/2/3"]  — heading levels
//	stylemap["list_bullet"]    — bullet list paragraph style
//	stylemap["list_numbered"]  — numbered list paragraph style
//	stylemap["blockquote"]     — blockquote
//
// Missing entries fall back to the "paragraph" style.
//
// Empty input renders one empty paragraph so the splice site is
// well-formed even when the lawyer hasn't typed anything in this
// section.
func RenderMarkdownToOOXML(md, paragraphStyle string) string {
	return RenderMarkdownToOOXMLWithStyles(md, map[string]string{"paragraph": paragraphStyle}, nil)
}

// RenderMarkdownToOOXMLWithStyles is the full Slice-D-aware entry
// point. Slice B's RenderMarkdownToOOXML is a wrapper for back-compat.
func RenderMarkdownToOOXMLWithStyles(md string, stylemap map[string]string, links HyperlinkAllocator) string {
	defaultStyle := stylemap["paragraph"]
	if md == "" {
		return emptyParagraph(defaultStyle)
	}
	blocks := splitMarkdownBlocks(md)
	if len(blocks) == 0 {
		return emptyParagraph(defaultStyle)
	}
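	// Numbered-list counter resets on every non-numbered block, so in
	// "1. A\n2. B\n\nText\n1. C" it restarts at 1 for C. A single blank
	// line between numbered items emits no block and does not reset it.
	// The digits typed in the source are stripped by detectBlockMarker,
	// so renderBlockParagraph only ever sees this counter.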
	numberedCounter := 0
	var b strings.Builder
	for _, blk := range blocks {
		style := stylemap[blk.styleKey]
		if style == "" {
			style = defaultStyle
		}
		if blk.styleKey == "list_numbered" {
			numberedCounter++
		} else {
			numberedCounter = 0
		}
		b.WriteString(renderBlockParagraph(blk, style, links, numberedCounter))
	}
	return b.String()
}

// mdBlock is one rendered paragraph: a kind (paragraph / heading_*
// / list_bullet / list_numbered / blockquote) and the inline content
// text. List markers, heading hashes, blockquote `> ` etc. are
// stripped from the text before storage.
type mdBlock struct {
	styleKey string // "paragraph" | "heading_1" | "heading_2" | "heading_3" | "list_bullet" | "list_numbered" | "blockquote"
	text     string
}

// splitMarkdownBlocks parses the source into a sequence of blocks,
// detecting heading / list / blockquote prefixes line-by-line. Blank
// lines split paragraph runs (same semantics as splitMarkdownParagraphs)
// but each line is also tagged with its block kind.
//
// Lines that look like block markers never merge with their neighbours,
// even without a blank line between them: every list / heading /
// blockquote line is its own block in the output. A run of unmarked
// lines collapses into one "paragraph" block (so soft line breaks inside
// a paragraph still concatenate). Each additional blank line beyond the
// first, when followed by more content, adds one empty "paragraph" block
// for vertical spacing.
//
// CRLF is normalised to LF before parsing.
func splitMarkdownBlocks(md string) []mdBlock {
|
||||
normalised := strings.ReplaceAll(md, "\r\n", "\n")
|
||||
lines := strings.Split(normalised, "\n")
|
||||
var blocks []mdBlock
|
||||
var pendingPara []string
|
||||
blankRun := 0
|
||||
|
||||
flushPara := func() {
|
||||
if len(pendingPara) > 0 {
|
||||
blocks = append(blocks, mdBlock{styleKey: "paragraph", text: strings.Join(pendingPara, "\n")})
|
||||
pendingPara = nil
|
||||
}
|
||||
}
|
||||
|
||||
for _, raw := range lines {
|
||||
line := raw
|
||||
if strings.TrimSpace(line) == "" {
|
||||
if len(pendingPara) > 0 {
|
||||
flushPara()
|
||||
blankRun = 1
|
||||
continue
|
||||
}
|
||||
blankRun++
|
||||
continue
|
||||
}
|
||||
// Detect heading / list / blockquote markers BEFORE we accumulate
|
||||
// into the paragraph buffer.
|
||||
kind, payload, ok := detectBlockMarker(line)
|
||||
if ok {
|
||||
flushPara()
|
||||
// Emit spacing paragraphs equivalent to (blankRun - 1) extra.
|
||||
for i := 1; i < blankRun; i++ {
|
||||
blocks = append(blocks, mdBlock{styleKey: "paragraph", text: ""})
|
||||
}
|
||||
blankRun = 0
|
||||
blocks = append(blocks, mdBlock{styleKey: kind, text: payload})
|
||||
continue
|
||||
}
|
||||
// Plain paragraph line.
|
||||
if len(pendingPara) == 0 {
|
||||
// Starting a new paragraph after a blank run — emit
|
||||
// (blankRun-1) extra empty paragraphs for vertical spacing.
|
||||
for i := 1; i < blankRun; i++ {
|
||||
blocks = append(blocks, mdBlock{styleKey: "paragraph", text: ""})
|
||||
}
|
||||
}
|
||||
blankRun = 0
|
||||
pendingPara = append(pendingPara, line)
|
||||
}
|
||||
flushPara()
|
||||
return blocks
|
||||
}
|
||||
|
||||
// detectBlockMarker classifies a single line. Returns (styleKey,
|
||||
// payload-with-marker-stripped, true) for recognised markers; false
|
||||
// for plain paragraph lines.
|
||||
//
|
||||
// Recognised markers (Slice D):
|
||||
// # Heading → heading_1
|
||||
// ## Heading → heading_2
|
||||
// ### Heading → heading_3
|
||||
// - item / * item → list_bullet
|
||||
// 1. item / 2) item ... → list_numbered ("N." or "N)", N any run of digits)
|
||||
// > quote → blockquote
|
||||
//
|
||||
// Up to 3 leading spaces are tolerated (per CommonMark) so the
// lawyer's contentEditable indentation doesn't hide the marker.
|
||||
func detectBlockMarker(line string) (string, string, bool) {
|
||||
trimmed := strings.TrimLeft(line, " ")
|
||||
// Cap to 3 spaces of leading indent — beyond that, treat as a
|
||||
// regular paragraph line (matches CommonMark).
|
||||
if len(line)-len(trimmed) > 3 {
|
||||
return "", "", false
|
||||
}
|
||||
if strings.HasPrefix(trimmed, "### ") {
|
||||
return "heading_3", strings.TrimSpace(trimmed[4:]), true
|
||||
}
|
||||
if strings.HasPrefix(trimmed, "## ") {
|
||||
return "heading_2", strings.TrimSpace(trimmed[3:]), true
|
||||
}
|
||||
if strings.HasPrefix(trimmed, "# ") {
|
||||
return "heading_1", strings.TrimSpace(trimmed[2:]), true
|
||||
}
|
||||
if strings.HasPrefix(trimmed, "> ") {
|
||||
return "blockquote", strings.TrimSpace(trimmed[2:]), true
|
||||
}
|
||||
if strings.HasPrefix(trimmed, "- ") || strings.HasPrefix(trimmed, "* ") {
|
||||
return "list_bullet", strings.TrimSpace(trimmed[2:]), true
|
||||
}
|
||||
// Numbered: "N. " or "N) " where N is one or more digits.
|
||||
if i := indexOfNumberedMarker(trimmed); i > 0 {
|
||||
return "list_numbered", strings.TrimSpace(trimmed[i:]), true
|
||||
}
|
||||
return "", "", false
|
||||
}
|
||||
|
||||
// indexOfNumberedMarker checks for "N. " or "N) " at the start of the
|
||||
// trimmed line; returns the byte index just past the marker, or -1 if
|
||||
// no marker present.
|
||||
func indexOfNumberedMarker(s string) int {
|
||||
i := 0
|
||||
for i < len(s) && s[i] >= '0' && s[i] <= '9' {
|
||||
i++
|
||||
}
|
||||
if i == 0 {
|
||||
return -1
|
||||
}
|
||||
if i >= len(s) {
|
||||
return -1
|
||||
}
|
||||
if s[i] != '.' && s[i] != ')' {
|
||||
return -1
|
||||
}
|
||||
if i+1 >= len(s) || s[i+1] != ' ' {
|
||||
return -1
|
||||
}
|
||||
return i + 2
|
||||
}
|
||||
|
||||
// renderBlockParagraph emits one `<w:p>` for a block, styled with the
// paragraphStyle supplied by the caller (for list blocks, the base's
// stylemap entry per the Slice D design's contract). Rather than wiring
// list paragraphs into Word's numbering.xml, the walker writes a literal
// bullet/number prefix run into the paragraph. This keeps the composer
// free of numbering.xml mutations.
|
||||
func renderBlockParagraph(blk mdBlock, paragraphStyle string, links HyperlinkAllocator, numberedOrdinal int) string {
|
||||
var b strings.Builder
|
||||
b.WriteString(`<w:p>`)
|
||||
if paragraphStyle != "" {
|
||||
b.WriteString(`<w:pPr><w:pStyle w:val="`)
|
||||
b.WriteString(xmlAttrEscape(paragraphStyle))
|
||||
b.WriteString(`"/></w:pPr>`)
|
||||
}
|
||||
if blk.text == "" {
|
||||
b.WriteString(`<w:r><w:t xml:space="preserve"></w:t></w:r>`)
|
||||
b.WriteString(`</w:p>`)
|
||||
return b.String()
|
||||
}
|
||||
text := blk.text
|
||||
// List blocks emit a visible "• " / "N. " prefix run. The
|
||||
// stylemap entry handles paragraph indentation if the base
|
||||
// defines a list paragraph style; otherwise the prefix at least
|
||||
// surfaces the structure in plain Word. Lawyers who want Word's
|
||||
// auto-numbering reapply a list style post-export.
|
||||
switch blk.styleKey {
|
||||
case "list_bullet":
|
||||
b.WriteString(`<w:r><w:t xml:space="preserve">• </w:t></w:r>`)
|
||||
case "list_numbered":
|
||||
ordinal := numberedOrdinal
|
||||
if ordinal <= 0 {
|
||||
ordinal = 1
|
||||
}
|
||||
b.WriteString(`<w:r><w:t xml:space="preserve">`)
|
||||
b.WriteString(fmt.Sprintf("%d. ", ordinal))
|
||||
b.WriteString(`</w:t></w:r>`)
|
||||
}
|
||||
for _, run := range parseInlineRuns(text, links) {
|
||||
b.WriteString(run)
|
||||
}
|
||||
b.WriteString(`</w:p>`)
|
||||
return b.String()
|
||||
}
|
||||
|
||||
// parseInlineRuns extracts inline spans + hyperlink runs and serialises
|
||||
// each to OOXML. Hyperlinks become `<w:hyperlink r:id="RID">…runs…</w:hyperlink>`
|
||||
// where RID comes from the HyperlinkAllocator.
|
||||
func parseInlineRuns(text string, links HyperlinkAllocator) []string {
|
||||
// Phase 1: find all hyperlink spans `[label](url)` and split the
|
||||
// text around them.
|
||||
type segment struct {
|
||||
text string
|
||||
isLink bool
|
||||
url string
|
||||
}
|
||||
var segs []segment
|
||||
rest := text
|
||||
for {
|
||||
idx := strings.Index(rest, "[")
|
||||
if idx < 0 {
|
||||
if rest != "" {
|
||||
segs = append(segs, segment{text: rest})
|
||||
}
|
||||
break
|
||||
}
|
||||
// Find matching closing bracket, then a "(" right after.
|
||||
closeBracket := strings.Index(rest[idx:], "](")
|
||||
if closeBracket < 0 {
|
||||
segs = append(segs, segment{text: rest})
|
||||
break
|
||||
}
|
||||
closeParen := strings.Index(rest[idx+closeBracket:], ")")
|
||||
if closeParen < 0 {
|
||||
segs = append(segs, segment{text: rest})
|
||||
break
|
||||
}
|
||||
// idx = start of "["
|
||||
// idx+closeBracket = position of "]"
|
||||
// idx+closeBracket+1 = position of "("
|
||||
// idx+closeBracket+closeParen = position of ")"
|
||||
label := rest[idx+1 : idx+closeBracket]
|
||||
url := rest[idx+closeBracket+2 : idx+closeBracket+closeParen]
|
||||
if idx > 0 {
|
||||
segs = append(segs, segment{text: rest[:idx]})
|
||||
}
|
||||
segs = append(segs, segment{text: label, isLink: true, url: url})
|
||||
rest = rest[idx+closeBracket+closeParen+1:]
|
||||
}
|
||||
|
||||
var runs []string
|
||||
for _, seg := range segs {
|
||||
if seg.isLink && links != nil {
|
||||
rid := links(seg.url)
|
||||
if rid != "" {
|
||||
var hb strings.Builder
|
||||
hb.WriteString(`<w:hyperlink r:id="`)
|
||||
hb.WriteString(xmlAttrEscape(rid))
|
||||
hb.WriteString(`">`)
|
||||
for _, span := range parseInlineSpans(seg.text) {
|
||||
hb.WriteString(renderRunWithLinkStyle(span))
|
||||
}
|
||||
hb.WriteString(`</w:hyperlink>`)
|
||||
runs = append(runs, hb.String())
|
||||
continue
|
||||
}
|
||||
}
|
||||
for _, span := range parseInlineSpans(seg.text) {
|
||||
runs = append(runs, renderRun(span))
|
||||
}
|
||||
}
|
||||
return runs
|
||||
}
|
||||
|
||||
// renderRunWithLinkStyle emits a hyperlink child run. Same B/I support
|
||||
// as renderRun, but additionally tags the run with the "Hyperlink"
|
||||
// character style (Word's built-in) so the link renders in the
|
||||
// document's hyperlink colour + underline.
|
||||
func renderRunWithLinkStyle(span inlineSpan) string {
|
||||
var b strings.Builder
|
||||
b.WriteString(`<w:r><w:rPr><w:rStyle w:val="Hyperlink"/>`)
|
||||
if span.Bold {
|
||||
b.WriteString(`<w:b/>`)
|
||||
}
|
||||
if span.Italic {
|
||||
b.WriteString(`<w:i/>`)
|
||||
}
|
||||
b.WriteString(`</w:rPr><w:t xml:space="preserve">`)
|
||||
b.WriteString(xmlTextEscape(span.Text))
|
||||
b.WriteString(`</w:t></w:r>`)
|
||||
return b.String()
|
||||
}
|
||||
|
||||
// inlineSpan is one piece of inline content: a text payload plus
|
||||
// formatting flags. Bold and italic are independent — `***both***`
|
||||
// produces one span with both flags set.
|
||||
type inlineSpan struct {
|
||||
Text string
|
||||
Bold bool
|
||||
Italic bool
|
||||
}
|
||||
|
||||
// parseInlineSpans tokenises Markdown inline formatting into runs of
|
||||
// (text, bold, italic). The grammar is intentionally narrow:
|
||||
//
|
||||
// - `**…**` → bold
|
||||
// - `__…__` → bold (Markdown alternate)
|
||||
// - `*…*` → italic
|
||||
// - `_…_` → italic (Markdown alternate)
|
||||
// - Anything else flows through as plain text.
|
||||
//
|
||||
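// Delimiters are not balance-checked: every `*` / `_` outside a
// {{placeholder}} toggles a flag and is consumed, so an unbalanced
// delimiter flips the formatting of everything after it in that text
// instead of surviving as a literal character. The walker never errors
// on malformed Markdown. Nested formatting (e.g.
// `**bold *bold-italic* bold**`) toggles flags as it walks.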
|
||||
func parseInlineSpans(text string) []inlineSpan {
|
||||
var out []inlineSpan
|
||||
var cur strings.Builder
|
||||
bold := false
|
||||
italic := false
|
||||
flush := func() {
|
||||
if cur.Len() == 0 {
|
||||
return
|
||||
}
|
||||
out = append(out, inlineSpan{Text: cur.String(), Bold: bold, Italic: italic})
|
||||
cur.Reset()
|
||||
}
|
||||
i := 0
|
||||
n := len(text)
|
||||
for i < n {
|
||||
// Preserve {{...}} placeholders verbatim. Underscores and
|
||||
// other Markdown-significant chars inside a placeholder key
|
||||
// (e.g. {{project.case_number}}) must not be interpreted as
|
||||
// bold/italic delimiters — otherwise the key gets stripped of
|
||||
// its underscores and the v1 placeholder pass looks up the
|
||||
// wrong key, surfacing [KEIN WERT: project.casenumber] in the
|
||||
// preview.
|
||||
if i+1 < n && text[i] == '{' && text[i+1] == '{' {
|
||||
rel := strings.Index(text[i+2:], "}}")
|
||||
if rel >= 0 {
|
||||
end := i + 2 + rel + 2
|
||||
cur.WriteString(text[i:end])
|
||||
i = end
|
||||
continue
|
||||
}
|
||||
// Unmatched {{ — fall through to plain character handling.
|
||||
}
|
||||
// Bold delimiters first (longer match wins over italic).
|
||||
if i+1 < n && (text[i:i+2] == "**" || text[i:i+2] == "__") {
|
||||
flush()
|
||||
bold = !bold
|
||||
i += 2
|
||||
continue
|
||||
}
|
||||
if text[i] == '*' || text[i] == '_' {
|
||||
flush()
|
||||
italic = !italic
|
||||
i++
|
||||
continue
|
||||
}
|
||||
cur.WriteByte(text[i])
|
||||
i++
|
||||
}
|
||||
flush()
|
||||
if len(out) == 0 {
|
||||
out = append(out, inlineSpan{Text: ""})
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
// renderRun emits one `<w:r>` element for an inline span. Empty text
|
||||
// spans render as empty runs (Word accepts them; they're harmless).
|
||||
func renderRun(span inlineSpan) string {
|
||||
var b strings.Builder
|
||||
b.WriteString(`<w:r>`)
|
||||
if span.Bold || span.Italic {
|
||||
b.WriteString(`<w:rPr>`)
|
||||
if span.Bold {
|
||||
b.WriteString(`<w:b/>`)
|
||||
}
|
||||
if span.Italic {
|
||||
b.WriteString(`<w:i/>`)
|
||||
}
|
||||
b.WriteString(`</w:rPr>`)
|
||||
}
|
||||
b.WriteString(`<w:t xml:space="preserve">`)
|
||||
b.WriteString(xmlTextEscape(span.Text))
|
||||
b.WriteString(`</w:t></w:r>`)
|
||||
return b.String()
|
||||
}
|
||||
|
||||
// emptyParagraph returns one empty `<w:p>` with the given style. Used
|
||||
// when a section's content_md is empty so the splice site stays
|
||||
// well-formed.
|
||||
func emptyParagraph(paragraphStyle string) string {
|
||||
var b strings.Builder
|
||||
b.WriteString(`<w:p>`)
|
||||
if paragraphStyle != "" {
|
||||
b.WriteString(`<w:pPr><w:pStyle w:val="`)
|
||||
b.WriteString(xmlAttrEscape(paragraphStyle))
|
||||
b.WriteString(`"/></w:pPr>`)
|
||||
}
|
||||
b.WriteString(`<w:r><w:t xml:space="preserve"></w:t></w:r></w:p>`)
|
||||
return b.String()
|
||||
}
|
||||
|
||||
// xmlTextEscape escapes the three characters that matter in element
// text (&, <, >) for safe insertion into <w:t> content. & goes first
// to avoid double-encoding.
|
||||
func xmlTextEscape(s string) string {
|
||||
s = strings.ReplaceAll(s, "&", "&amp;")
|
||||
s = strings.ReplaceAll(s, "<", "&lt;")
|
||||
s = strings.ReplaceAll(s, ">", "&gt;")
|
||||
// Quotes and apostrophes are legal inside element text content;
|
||||
// no need to escape them here.
|
||||
return s
|
||||
}
|
||||
|
||||
// XMLAttrEscape is the exported form of xmlAttrEscape, used by the
|
||||
// paliad-side composer (submission_compose.go) when it builds hyperlink
|
||||
// relationship inserts. It exists so the composer can reuse the exact
|
||||
// attribute-escaping the walker applies without reaching across the
|
||||
// package boundary for an unexported helper. Slice 2 folds the
|
||||
// composer's splice into this package, after which the wrapper retires.
|
||||
func XMLAttrEscape(s string) string { return xmlAttrEscape(s) }
|
||||
|
||||
// xmlAttrEscape escapes for safe insertion into an attribute value
|
||||
// (e.g. `<w:pStyle w:val="…"/>`).
|
||||
func xmlAttrEscape(s string) string {
|
||||
s = strings.ReplaceAll(s, "&", "&amp;")
|
||||
s = strings.ReplaceAll(s, "<", "&lt;")
|
||||
s = strings.ReplaceAll(s, ">", "&gt;")
|
||||
s = strings.ReplaceAll(s, `"`, "&quot;")
|
||||
return s
|
||||
}
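Both xmlTextEscape and xmlAttrEscape replace `&` before `<` and `>`. A minimal standalone sketch of why that order matters; the helper names escapeAmpFirst and escapeAmpLast are hypothetical and not part of docforge:

```go
package main

import (
	"fmt"
	"strings"
)

// escapeAmpFirst mirrors the order used by xmlTextEscape: & is replaced
// before < and >, so the entities introduced for < and > are never
// re-escaped.
func escapeAmpFirst(s string) string {
	s = strings.ReplaceAll(s, "&", "&amp;")
	s = strings.ReplaceAll(s, "<", "&lt;")
	s = strings.ReplaceAll(s, ">", "&gt;")
	return s
}

// escapeAmpLast uses the wrong order, shown for contrast: the "&" inside
// the freshly written "&lt;" is escaped a second time.
func escapeAmpLast(s string) string {
	s = strings.ReplaceAll(s, "<", "&lt;")
	s = strings.ReplaceAll(s, ">", "&gt;")
	s = strings.ReplaceAll(s, "&", "&amp;")
	return s
}

func main() {
	in := "a & b < c"
	fmt.Println(escapeAmpFirst(in)) // a &amp; b &lt; c
	fmt.Println(escapeAmpLast(in))  // a &amp; b &amp;lt; c
}
```

A single-pass `strings.NewReplacer("&", "&amp;", "<", "&lt;", ">", "&gt;")` would sidestep the ordering question, since a Replacer does not rescan its own output; the walker currently uses the explicit ReplaceAll chain.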
|
||||
383
pkg/docforge/docx/markdown_test.go
Normal file
@@ -0,0 +1,383 @@
|
||||
package docx
|
||||
|
||||
// Unit tests for the Composer's Markdown → OOXML walker (t-paliad-313
|
||||
// Slice B). Pure function; no DB dependency.
|
||||
|
||||
import (
|
||||
"strings"
|
||||
"testing"
|
||||
)
|
||||
|
||||
func TestRenderMarkdownToOOXML_EmptyInput(t *testing.T) {
|
||||
out := RenderMarkdownToOOXML("", "Normal")
|
||||
if !strings.Contains(out, `<w:p>`) {
|
||||
t.Errorf("empty input must still emit one <w:p>; got %q", out)
|
||||
}
|
||||
if !strings.Contains(out, `<w:pStyle w:val="Normal"/>`) {
|
||||
t.Errorf("empty input must carry the paragraph style; got %q", out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestRenderMarkdownToOOXML_SingleParagraph(t *testing.T) {
|
||||
out := RenderMarkdownToOOXML("Hello world", "HLpat-Body-B0")
|
||||
if !strings.Contains(out, `<w:pStyle w:val="HLpat-Body-B0"/>`) {
|
||||
t.Errorf("paragraph missing stylemap entry: %q", out)
|
||||
}
|
||||
if !strings.Contains(out, "Hello world") {
|
||||
t.Errorf("paragraph text missing: %q", out)
|
||||
}
|
||||
// Exactly one <w:p>.
|
||||
if got := strings.Count(out, "<w:p>"); got != 1 {
|
||||
t.Errorf("expected 1 <w:p>; got %d", got)
|
||||
}
|
||||
}
|
||||
|
||||
func TestRenderMarkdownToOOXML_TwoParagraphs(t *testing.T) {
|
||||
out := RenderMarkdownToOOXML("first\n\nsecond", "Normal")
|
||||
if got := strings.Count(out, "<w:p>"); got != 2 {
|
||||
t.Errorf("expected 2 <w:p>; got %d, out=%q", got, out)
|
||||
}
|
||||
if !strings.Contains(out, "first") || !strings.Contains(out, "second") {
|
||||
t.Errorf("paragraph text missing: %q", out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestRenderMarkdownToOOXML_BoldInline(t *testing.T) {
|
||||
out := RenderMarkdownToOOXML("hello **bold** world", "")
|
||||
if !strings.Contains(out, `<w:rPr><w:b/></w:rPr>`) {
|
||||
t.Errorf("bold rPr missing: %q", out)
|
||||
}
|
||||
if !strings.Contains(out, ">bold<") {
|
||||
t.Errorf("bold text payload missing: %q", out)
|
||||
}
|
||||
// The surrounding "hello " and " world" pieces are separate runs;
|
||||
// the bold rPr should appear exactly once in this output.
|
||||
if got := strings.Count(out, "<w:b/>"); got != 1 {
|
||||
t.Errorf("expected exactly one <w:b/> tag; got %d in %q", got, out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestRenderMarkdownToOOXML_ItalicInline(t *testing.T) {
|
||||
out := RenderMarkdownToOOXML("see *italic* here", "")
|
||||
if !strings.Contains(out, `<w:rPr><w:i/></w:rPr>`) {
|
||||
t.Errorf("italic rPr missing: %q", out)
|
||||
}
|
||||
if !strings.Contains(out, ">italic<") {
|
||||
t.Errorf("italic text payload missing: %q", out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestRenderMarkdownToOOXML_BoldItalicCombo(t *testing.T) {
|
||||
// Combined: ***both*** sets both flags. The walker toggles each
|
||||
// delimiter independently, so the resulting run carries both <w:b/>
|
||||
// and <w:i/>.
|
||||
out := RenderMarkdownToOOXML("***both***", "")
|
||||
if !strings.Contains(out, `<w:b/>`) || !strings.Contains(out, `<w:i/>`) {
|
||||
t.Errorf("expected both <w:b/> and <w:i/>; got %q", out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestRenderMarkdownToOOXML_PlaceholdersPassThrough(t *testing.T) {
|
||||
// Placeholders are sacred — the walker must preserve them verbatim
|
||||
// so the v1 placeholder pass can substitute them later.
|
||||
out := RenderMarkdownToOOXML("Sehr geehrter {{parties.claimant.0.name}}", "Normal")
|
||||
if !strings.Contains(out, "{{parties.claimant.0.name}}") {
|
||||
t.Errorf("placeholder corrupted: %q", out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestRenderMarkdownToOOXML_PlaceholderUnderscoresPreserved(t *testing.T) {
|
||||
// Regression: a placeholder key containing underscores (project.case_number,
|
||||
// user.display_name, project.patent_number_upc) used to get its underscores
|
||||
// consumed by the italic/bold inline scanner — the OOXML stored
|
||||
// {{project.casenumber}} and the preview surfaced
|
||||
// [KEIN WERT: project.casenumber] instead of the real value.
|
||||
cases := []string{
|
||||
"{{project.case_number}}",
|
||||
"{{user.display_name}}",
|
||||
"{{project.patent_number_upc}}",
|
||||
"prefix {{project.case_number}} suffix",
|
||||
"two: {{a.b_c}} and {{d.e_f}}",
|
||||
"mixed: _italic_ then {{project.case_number}} then __bold__",
|
||||
}
|
||||
for _, in := range cases {
|
||||
out := RenderMarkdownToOOXML(in, "Normal")
|
||||
// Every placeholder substring in the input must appear verbatim
|
||||
// in the output (XML escaping is irrelevant for {} and _).
|
||||
for _, ph := range extractPlaceholders(in) {
|
||||
if !strings.Contains(out, ph) {
|
||||
t.Errorf("input %q: placeholder %q lost; got %q", in, ph, out)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseInlineSpans_PlaceholderWithUnderscoresIsLiteral(t *testing.T) {
|
||||
// Direct guard on the inline scanner. {{project.case_number}} must
|
||||
// emit as a single non-italic span containing the full placeholder.
|
||||
spans := parseInlineSpans("{{project.case_number}}")
|
||||
if len(spans) != 1 {
|
||||
t.Fatalf("expected 1 span; got %d (%+v)", len(spans), spans)
|
||||
}
|
||||
if spans[0].Italic || spans[0].Bold {
|
||||
t.Errorf("placeholder must not be italic/bold; got %+v", spans[0])
|
||||
}
|
||||
if spans[0].Text != "{{project.case_number}}" {
|
||||
t.Errorf("placeholder text corrupted: got %q", spans[0].Text)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseInlineSpans_ItalicAroundPlaceholder(t *testing.T) {
|
||||
// Italic delimiters outside a placeholder still work; the placeholder
|
||||
// itself stays literal even when it sits between italics.
|
||||
spans := parseInlineSpans("_before_ {{x.y_z}} _after_")
|
||||
var saw struct {
|
||||
italicBefore bool
|
||||
placeholder bool
|
||||
italicAfter bool
|
||||
}
|
||||
for _, s := range spans {
|
||||
if s.Italic && s.Text == "before" {
|
||||
saw.italicBefore = true
|
||||
}
|
||||
if !s.Italic && !s.Bold && strings.Contains(s.Text, "{{x.y_z}}") {
|
||||
saw.placeholder = true
|
||||
}
|
||||
if s.Italic && s.Text == "after" {
|
||||
saw.italicAfter = true
|
||||
}
|
||||
}
|
||||
if !saw.italicBefore || !saw.placeholder || !saw.italicAfter {
|
||||
t.Errorf("expected italic/placeholder/italic structure; got %+v", spans)
|
||||
}
|
||||
}
|
||||
|
||||
// extractPlaceholders pulls every {{...}} occurrence out of a Markdown
|
||||
// source. Tiny helper, only used by the regression test above.
|
||||
func extractPlaceholders(s string) []string {
|
||||
var out []string
|
||||
for {
|
||||
start := strings.Index(s, "{{")
|
||||
if start < 0 {
|
||||
return out
|
||||
}
|
||||
end := strings.Index(s[start+2:], "}}")
|
||||
if end < 0 {
|
||||
return out
|
||||
}
|
||||
out = append(out, s[start:start+2+end+2])
|
||||
s = s[start+2+end+2:]
|
||||
}
|
||||
}
|
||||
|
||||
func TestRenderMarkdownToOOXML_XMLEscape(t *testing.T) {
|
||||
out := RenderMarkdownToOOXML("a & b < c > d", "")
|
||||
if strings.Contains(out, " & ") {
|
||||
t.Errorf("unescaped & survived: %q", out)
|
||||
}
|
||||
if !strings.Contains(out, "&amp;") || !strings.Contains(out, "&lt;") || !strings.Contains(out, "&gt;") {
|
||||
t.Errorf("expected escaped entities; got %q", out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestRenderMarkdownToOOXML_BlankLinesPreserveSpacing(t *testing.T) {
|
||||
// Two blank lines between paragraphs → one empty paragraph in
|
||||
// between, preserving the lawyer's intentional whitespace.
|
||||
out := RenderMarkdownToOOXML("first\n\n\nsecond", "Normal")
|
||||
if got := strings.Count(out, "<w:p>"); got != 3 {
|
||||
t.Errorf("expected 3 <w:p> (first + blank + second); got %d in %q", got, out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestRenderMarkdownToOOXML_CRLFNormalisation(t *testing.T) {
|
||||
out := RenderMarkdownToOOXML("first\r\n\r\nsecond", "")
|
||||
if got := strings.Count(out, "<w:p>"); got != 2 {
|
||||
t.Errorf("CRLF input should produce 2 paragraphs; got %d in %q", got, out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseInlineSpans_Plain(t *testing.T) {
|
||||
spans := parseInlineSpans("hello world")
|
||||
if len(spans) != 1 || spans[0].Bold || spans[0].Italic || spans[0].Text != "hello world" {
|
||||
t.Errorf("expected single plain span; got %+v", spans)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseInlineSpans_UnderscoreItalic(t *testing.T) {
|
||||
spans := parseInlineSpans("_emph_")
|
||||
var italicHits int
|
||||
for _, s := range spans {
|
||||
if s.Italic && s.Text == "emph" {
|
||||
italicHits++
|
||||
}
|
||||
}
|
||||
if italicHits != 1 {
|
||||
t.Errorf("expected one italic 'emph' span; got %+v", spans)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseInlineSpans_UnderscoreBold(t *testing.T) {
|
||||
spans := parseInlineSpans("__strong__")
|
||||
var boldHits int
|
||||
for _, s := range spans {
|
||||
if s.Bold && s.Text == "strong" {
|
||||
boldHits++
|
||||
}
|
||||
}
|
||||
if boldHits != 1 {
|
||||
t.Errorf("expected one bold 'strong' span; got %+v", spans)
|
||||
}
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
// Slice D — rich-prose constructs
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
func slicedStylemap() map[string]string {
|
||||
return map[string]string{
|
||||
"paragraph": "Body",
|
||||
"heading_1": "H1",
|
||||
"heading_2": "H2",
|
||||
"heading_3": "H3",
|
||||
"list_bullet": "ListBullet",
|
||||
"list_numbered": "ListNumber",
|
||||
"blockquote": "Quote",
|
||||
}
|
||||
}
|
||||
|
||||
func TestRenderMarkdownToOOXML_Heading1(t *testing.T) {
|
||||
out := RenderMarkdownToOOXMLWithStyles("# A heading", slicedStylemap(), nil)
|
||||
if !strings.Contains(out, `<w:pStyle w:val="H1"/>`) {
|
||||
t.Errorf("heading_1 missing H1 style: %q", out)
|
||||
}
|
||||
if !strings.Contains(out, "A heading") {
|
||||
t.Errorf("heading text missing: %q", out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestRenderMarkdownToOOXML_Heading2And3(t *testing.T) {
|
||||
out := RenderMarkdownToOOXMLWithStyles("## H2 line\n### H3 line", slicedStylemap(), nil)
|
||||
if !strings.Contains(out, `<w:pStyle w:val="H2"/>`) || !strings.Contains(out, "H2 line") {
|
||||
t.Errorf("h2 not rendered: %q", out)
|
||||
}
|
||||
if !strings.Contains(out, `<w:pStyle w:val="H3"/>`) || !strings.Contains(out, "H3 line") {
|
||||
t.Errorf("h3 not rendered: %q", out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestRenderMarkdownToOOXML_BulletList(t *testing.T) {
|
||||
out := RenderMarkdownToOOXMLWithStyles("- first\n- second\n* third", slicedStylemap(), nil)
|
||||
if !strings.Contains(out, `<w:pStyle w:val="ListBullet"/>`) {
|
||||
t.Errorf("bullet stylemap not applied: %q", out)
|
||||
}
|
||||
if strings.Count(out, "• ") != 3 {
|
||||
t.Errorf("expected 3 bullet prefixes; got %d in %q", strings.Count(out, "• "), out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestRenderMarkdownToOOXML_NumberedList(t *testing.T) {
|
||||
out := RenderMarkdownToOOXMLWithStyles("1. first\n2. second\n3. third", slicedStylemap(), nil)
|
||||
if !strings.Contains(out, `<w:pStyle w:val="ListNumber"/>`) {
|
||||
t.Errorf("numbered stylemap not applied: %q", out)
|
||||
}
|
||||
for _, want := range []string{"1. ", "2. ", "3. "} {
|
||||
if !strings.Contains(out, want) {
|
||||
t.Errorf("missing ordinal prefix %q in %q", want, out)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestRenderMarkdownToOOXML_NumberedListResetsOnNonList(t *testing.T) {
|
||||
// "1. A\n2. B\nplain\n1. C" → 1. A, 2. B, plain para, 1. C
|
||||
out := RenderMarkdownToOOXMLWithStyles("1. A\n2. B\nplain\n1. C", slicedStylemap(), nil)
|
||||
// The plain "plain" line breaks the list, so the next numbered
|
||||
// item restarts at 1.
|
||||
idxA := strings.Index(out, "1. ")
|
||||
if idxA < 0 {
|
||||
t.Fatalf("first 1. missing: %q", out)
|
||||
}
|
||||
idxB := strings.Index(out, "2. ")
|
||||
if idxB < 0 || idxB <= idxA {
|
||||
t.Fatalf("2. not after 1.: idxA=%d idxB=%d", idxA, idxB)
|
||||
}
|
||||
rest := out[idxB+1:]
|
||||
idxC := strings.Index(rest, "1. ")
|
||||
if idxC < 0 {
|
||||
t.Errorf("numbered counter didn't reset on non-list block: %q", out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestRenderMarkdownToOOXML_Blockquote(t *testing.T) {
|
||||
out := RenderMarkdownToOOXMLWithStyles("> the quoted text", slicedStylemap(), nil)
|
||||
if !strings.Contains(out, `<w:pStyle w:val="Quote"/>`) {
|
||||
t.Errorf("blockquote stylemap not applied: %q", out)
|
||||
}
|
||||
if !strings.Contains(out, "the quoted text") {
|
||||
t.Errorf("blockquote text missing: %q", out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestRenderMarkdownToOOXML_Hyperlink(t *testing.T) {
|
||||
allocated := map[string]string{}
|
||||
alloc := func(url string) string {
|
||||
rid := "rIdComposer" + url
|
||||
allocated[url] = rid
|
||||
return rid
|
||||
}
|
||||
out := RenderMarkdownToOOXMLWithStyles("See [Bundesgerichtshof](https://bgh.bund.de) for details.", slicedStylemap(), alloc)
|
||||
if _, ok := allocated["https://bgh.bund.de"]; !ok {
|
||||
t.Errorf("allocator never called for URL: %q", out)
|
||||
}
|
||||
if !strings.Contains(out, `<w:hyperlink r:id="rIdComposerhttps://bgh.bund.de">`) {
|
||||
t.Errorf("hyperlink tag missing or wrong rid: %q", out)
|
||||
}
|
||||
if !strings.Contains(out, "Bundesgerichtshof") {
|
||||
t.Errorf("link label missing: %q", out)
|
||||
}
|
||||
if !strings.Contains(out, `<w:rStyle w:val="Hyperlink"/>`) {
|
||||
t.Errorf("hyperlink character style missing: %q", out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestRenderMarkdownToOOXML_HyperlinkNilAllocatorFallsBackToPlain(t *testing.T) {
|
||||
out := RenderMarkdownToOOXMLWithStyles("See [BGH](https://bgh.bund.de) here.", slicedStylemap(), nil)
|
||||
// Without an allocator, the label still renders as plain text.
|
||||
if !strings.Contains(out, "BGH") {
|
||||
t.Errorf("label dropped: %q", out)
|
||||
}
|
||||
if strings.Contains(out, "<w:hyperlink") {
|
||||
t.Errorf("hyperlink emitted without allocator: %q", out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestDetectBlockMarker(t *testing.T) {
|
||||
cases := []struct {
|
||||
in string
|
||||
kind string
|
||||
want string
|
||||
ok bool
|
||||
}{
|
||||
{"# A", "heading_1", "A", true},
|
||||
{"## B", "heading_2", "B", true},
|
||||
{"### C", "heading_3", "C", true},
|
||||
{" # indented", "heading_1", "indented", true}, // up to 3 spaces tolerated
|
||||
{"    # too-deep", "", "", false}, // 4 spaces → not a heading
|
||||
{"- bullet", "list_bullet", "bullet", true},
|
||||
{"* star", "list_bullet", "star", true},
|
||||
{"1. one", "list_numbered", "one", true},
|
||||
{"42. forty-two", "list_numbered", "forty-two", true},
|
||||
{"1) paren", "list_numbered", "paren", true},
|
||||
{"1.no-space", "", "", false}, // ordinal needs trailing space
|
||||
{"> quote", "blockquote", "quote", true},
|
||||
{"plain", "", "", false},
|
||||
{"#nospace", "", "", false}, // heading needs space after hash
|
||||
}
|
||||
for _, tc := range cases {
|
||||
t.Run(tc.in, func(t *testing.T) {
|
||||
kind, payload, ok := detectBlockMarker(tc.in)
|
||||
if ok != tc.ok || kind != tc.kind || payload != tc.want {
|
||||
t.Errorf("detectBlockMarker(%q) = (%q,%q,%v); want (%q,%q,%v)", tc.in, kind, payload, ok, tc.kind, tc.want, tc.ok)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
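The underscore regression tests above shield only `{{…}}` keys; outside a placeholder, parseInlineSpans still treats every `*` / `_` as a toggle. A cut-down standalone sketch of that toggle rule (placeholder handling omitted; the names toggleSpans and span are hypothetical, not docforge APIs) makes the consequence visible:

```go
package main

import (
	"fmt"
	"strings"
)

// span is a local stand-in for the walker's inlineSpan.
type span struct {
	Text         string
	Bold, Italic bool
}

// toggleSpans applies the same toggle rule as parseInlineSpans, minus the
// {{placeholder}} guard: "**" / "__" flip bold, a single "*" / "_" flips
// italic, and the delimiter itself is consumed.
func toggleSpans(text string) []span {
	var out []span
	var cur strings.Builder
	bold, italic := false, false
	flush := func() {
		if cur.Len() > 0 {
			out = append(out, span{cur.String(), bold, italic})
			cur.Reset()
		}
	}
	for i := 0; i < len(text); {
		if i+1 < len(text) && (text[i:i+2] == "**" || text[i:i+2] == "__") {
			flush()
			bold = !bold
			i += 2
			continue
		}
		if text[i] == '*' || text[i] == '_' {
			flush()
			italic = !italic
			i++
			continue
		}
		cur.WriteByte(text[i])
		i++
	}
	flush()
	return out
}

func main() {
	fmt.Printf("%+v\n", toggleSpans("snake_case"))
	// [{Text:snake Bold:false Italic:false} {Text:case Bold:false Italic:true}]
}
```

So a bare identifier like snake_case in prose comes out partly italic, while the same key inside `{{…}}` survives verbatim thanks to the b78a984 fix.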
|
||||
531
pkg/docforge/docx/merge.go
Normal file
@@ -0,0 +1,531 @@
|
||||
package docx
|
||||
|
||||
// Submission template renderer — in-house engine for the submission
|
||||
// draft editor (t-paliad-238, design doc
|
||||
// docs/design-submission-page-2026-05-22.md §3 / §6.2).
|
||||
//
|
||||
// Resurrected from commit 8ea3509 (the original t-paliad-215 Slice 1
|
||||
// "in-house .docx render engine"). Kept in a separate file from the
|
||||
// format-only converter (dotm.go, formerly submission_render.go) so the t-paliad-230
|
||||
// /generate one-click path stays unchanged and the merge engine doesn't
|
||||
// have to share zip-helper names with it.
|
||||
//
|
||||
// Why not lukasjarosch/go-docx: the library's "nested placeholder" guard
|
||||
// treats sibling placeholders inside the same <w:t> run (e.g.
|
||||
// "{{a}} ./. {{b}}") as nested and refuses to replace either. Patent
|
||||
// submissions routinely have multiple placeholders per paragraph (party
|
||||
// blocks especially), so the library is a non-starter. This renderer
|
||||
// handles single-run placeholders (preserving run-level formatting) AND
|
||||
// cross-run placeholders (rewriting the paragraph as one run when Word
|
||||
// has fragmented the placeholder across runs).
|
||||
//
|
||||
// Placeholder grammar: {{[A-Za-z][A-Za-z0-9_.]*}} with optional
|
||||
// whitespace inside braces ({{ project.case_number }} ≡
|
||||
// {{project.case_number}}).
|
||||
//
|
||||
// Missing-value behaviour: when a placeholder has no binding in the
|
||||
// PlaceholderMap, the renderer emits a marker token so the lawyer sees
|
||||
// the gap in Word rather than failing the request.
|
||||
|
||||
import (
|
||||
"archive/zip"
|
||||
"bytes"
|
||||
"fmt"
|
||||
"io"
|
||||
"regexp"
|
||||
"strings"
|
||||
)
|
||||
|
||||
// PlaceholderMap is the variable bag built by SubmissionVarsService.
|
||||
// Keys are dotted paths without braces (e.g. "project.case_number").
|
||||
// Values are the substituted text — already locale-aware, pretty-
|
||||
// printed, and sanitised by the caller.
|
||||
type PlaceholderMap map[string]string
|
||||
|
||||
// MissingPlaceholderFn translates an unbound placeholder key into the
|
||||
// in-document marker token. The default in DefaultMissingMarker is
|
||||
// "[KEIN WERT: <key>]" / "[NO VALUE: <key>]" depending on lang.
|
||||
type MissingPlaceholderFn func(key string) string
|
||||
|
||||
// valueWrapperFn wraps a substituted value with a marker the HTML
|
||||
// preview emitter can recognise — used by RenderHTML to turn each
|
||||
// substituted value into a clickable <span class="draft-var" …>
|
||||
// (t-paliad-261, click-variable-in-preview → jump-to-field). nil means
|
||||
// no wrapping; the .docx export path uses nil so its output is
|
||||
// byte-identical to the wrapper-free build. The wrapper is invoked for
|
||||
// both resolved values and missing-marker text so clicking a missing
|
||||
// placeholder still jumps to the corresponding sidebar input.
|
||||
type valueWrapperFn func(key, value string) string
|
||||
|
||||
// Private-Use-Area sentinels for the HTML preview wrap. PUA characters
|
||||
// are valid in XML 1.0 content, never appear in legitimate template
|
||||
// text, pass unchanged through xmlEncode/xmlDecode/htmlEscape, and are
|
||||
// stripped by emitTextWithDraftVars when the preview HTML is assembled.
|
||||
const (
|
||||
previewVarBegin = ""
|
||||
previewVarMid = ""
|
||||
previewVarEnd = ""
|
||||
)

// htmlPreviewWrapper wraps a substituted value with the PUA sentinels
// emitTextWithDraftVars recognises. Used only by RenderHTML; the .docx
// Render path uses nil so its output is identical to the pre-261 build.
func htmlPreviewWrapper(key, value string) string {
	return previewVarBegin + key + previewVarMid + value + previewVarEnd
}

// DefaultMissingMarker returns the standard missing-value marker for
// the given UI language: "en" (case-insensitive) yields
// "[NO VALUE: <key>]"; any other value yields the German
// "[KEIN WERT: <key>]".
func DefaultMissingMarker(lang string) MissingPlaceholderFn {
	prefix := "KEIN WERT"
	if strings.EqualFold(lang, "en") {
		prefix = "NO VALUE"
	}
	return func(key string) string {
		return "[" + prefix + ": " + key + "]"
	}
}

// placeholderRegex matches a single placeholder. The capture group
// extracts the key name without braces or surrounding whitespace.
//
// Restricted to [A-Za-z][A-Za-z0-9_.]* so that stray "{{" sequences in
// legal prose don't get mistaken for placeholders. A genuine placeholder
// always starts with an ASCII letter.
var placeholderRegex = regexp.MustCompile(`\{\{\s*([A-Za-z][A-Za-z0-9_.]*)\s*\}\}`)
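The placeholder grammar can be checked in isolation. The following is a minimal standalone sketch, not package code. It copies the `placeholderRegex` pattern verbatim and adds a hypothetical `placeholderKeys` helper that lists the captured keys. Digit-led keys and other non-identifier content between braces are left alone.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as placeholderRegex: "{{", optional whitespace, a key that
// starts with an ASCII letter, optional whitespace, "}}".
var placeholderRe = regexp.MustCompile(`\{\{\s*([A-Za-z][A-Za-z0-9_.]*)\s*\}\}`)

// placeholderKeys is a hypothetical helper (not part of the package) that
// returns the captured key of every placeholder in s, in order.
func placeholderKeys(s string) []string {
	var keys []string
	for _, m := range placeholderRe.FindAllStringSubmatch(s, -1) {
		keys = append(keys, m[1])
	}
	return keys
}

func main() {
	inputs := []string{
		"{{ parties.claimant.name }} v. {{parties.defendant.name}}",
		"Set notation {{1, 2}} is left alone",
		"{{1bad}} and {{ firm.name }}",
	}
	for _, in := range inputs {
		fmt.Printf("%q -> %q\n", in, placeholderKeys(in))
	}
}
```

Whitespace inside the braces is tolerated and trimmed from the key. A `{{` that is not followed by a letter never becomes a placeholder.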

// SubmissionRenderer renders a .docx template into a .docx output by
// substituting {{placeholder}} tokens with values from a PlaceholderMap.
// Stateless; safe for concurrent use.
type SubmissionRenderer struct{}

// NewSubmissionRenderer constructs the renderer.
func NewSubmissionRenderer() *SubmissionRenderer {
	return &SubmissionRenderer{}
}

// Render reads the .docx template in templateBytes, substitutes every
// placeholder from vars (or emits the missing-marker token), and returns
// the merged .docx bytes. Unknown placeholders never fail the render:
// the lawyer sees the marker in Word and fixes it. A nil missing falls
// back to DefaultMissingMarker("de").
//
// Pre-pass: ConvertDotmToDocx is called on the input so a macro-bearing
// .dotm template is downgraded to a plain .docx before the merge step
// runs. The pre-pass is idempotent on inputs that are already plain .docx.
func (r *SubmissionRenderer) Render(templateBytes []byte, vars PlaceholderMap, missing MissingPlaceholderFn) ([]byte, error) {
	if missing == nil {
		missing = DefaultMissingMarker("de")
	}
	cleanBytes, err := ConvertDotmToDocx(templateBytes)
	if err != nil {
		return nil, fmt.Errorf("submission render: pre-pass convert: %w", err)
	}
	zr, err := zip.NewReader(bytes.NewReader(cleanBytes), int64(len(cleanBytes)))
	if err != nil {
		return nil, fmt.Errorf("submission render: open zip: %w", err)
	}

	var out bytes.Buffer
	zw := zip.NewWriter(&out)

	for _, entry := range zr.File {
		body, err := readMergeZipEntry(entry)
		if err != nil {
			return nil, fmt.Errorf("submission render: read %s: %w", entry.Name, err)
		}
		if isWordXMLEntry(entry.Name) {
			body = substituteInDocumentXML(body, vars, missing, nil)
		}
		w, err := zw.CreateHeader(&zip.FileHeader{
			Name:     entry.Name,
			Method:   entry.Method,
			Modified: entry.Modified,
		})
		if err != nil {
			return nil, fmt.Errorf("submission render: write header %s: %w", entry.Name, err)
		}
		if _, err := w.Write(body); err != nil {
			return nil, fmt.Errorf("submission render: write %s: %w", entry.Name, err)
		}
	}
	if err := zw.Close(); err != nil {
		return nil, fmt.Errorf("submission render: finalise zip: %w", err)
	}
	return out.Bytes(), nil
}

// RenderHTML produces a read-only HTML rendering of the merged document
// body for the draft-editor preview pane. It applies the SAME placeholder
// substitution as Render, then extracts the body text from word/document.xml
// and emits semantic HTML: one <p> per <w:p>, with <strong>/<em> spans
// for runs that carry <w:b>/<w:i> formatting. Tables, lists, and complex
// formatting collapse to plain paragraphs (the preview is a fidelity
// guide, not a WYSIWYG editor; final formatting comes from Word at
// export). Unlike Render, only word/document.xml is read, so placeholders
// in headers and footers do not appear in the preview.
//
// Returns escaped HTML safe to inject into the page via
// dangerouslySetInnerHTML or innerHTML. The caller is responsible for
// wrapping it in an outer container; this method emits only the body
// fragment.
func (r *SubmissionRenderer) RenderHTML(templateBytes []byte, vars PlaceholderMap, missing MissingPlaceholderFn) (string, error) {
	if missing == nil {
		missing = DefaultMissingMarker("de")
	}
	cleanBytes, err := ConvertDotmToDocx(templateBytes)
	if err != nil {
		return "", fmt.Errorf("submission render html: pre-pass convert: %w", err)
	}
	zr, err := zip.NewReader(bytes.NewReader(cleanBytes), int64(len(cleanBytes)))
	if err != nil {
		return "", fmt.Errorf("submission render html: open zip: %w", err)
	}
	var docXML []byte
	for _, entry := range zr.File {
		if entry.Name != "word/document.xml" {
			continue
		}
		docXML, err = readMergeZipEntry(entry)
		if err != nil {
			return "", fmt.Errorf("submission render html: read document.xml: %w", err)
		}
		break
	}
	if docXML == nil {
		return "", fmt.Errorf("submission render html: word/document.xml missing")
	}
	merged := substituteInDocumentXML(docXML, vars, missing, htmlPreviewWrapper)
	return docXMLToHTML(merged), nil
}

// isWordXMLEntry returns true for the .docx parts that contain
// substitutable text. We touch document.xml plus header*.xml and
// footer*.xml (templates may put firm letterhead in a header) but
// skip styles, theme, settings, comments and footnotes, none of which
// should carry merge placeholders in a well-formed template.
func isWordXMLEntry(name string) bool {
	switch {
	case name == "word/document.xml":
		return true
	case strings.HasPrefix(name, "word/header") && strings.HasSuffix(name, ".xml"):
		return true
	case strings.HasPrefix(name, "word/footer") && strings.HasSuffix(name, ".xml"):
		return true
	}
	return false
}

// readMergeZipEntry slurps a zip entry's bytes. It is named distinctly
// from the readZipFile helper in dotm.go (formerly submission_render.go)
// to keep this file self-contained; the two are functionally identical.
func readMergeZipEntry(f *zip.File) ([]byte, error) {
	rc, err := f.Open()
	if err != nil {
		return nil, err
	}
	defer rc.Close()
	return io.ReadAll(rc)
}

// substituteInDocumentXML walks document XML and replaces every
// {{placeholder}} occurrence inside <w:t> text nodes. It handles both
// single-run placeholders (the common case for freshly authored
// templates) and cross-run placeholders (where Word's autocorrect or
// manual editing has split a placeholder across runs).
//
// Two-pass strategy:
//
//  1. Pass 1: replace placeholders that fit entirely within one
//     <w:t>…</w:t>. This is the 99% case and preserves all run-level
//     formatting (bold, italic, font runs).
//  2. Pass 2: for paragraphs that still contain orphan "{{" or "}}"
//     markers after pass 1, merge the text of every <w:t> inside the
//     paragraph, run the replacement on the merged text, and rewrite
//     the paragraph as a single <w:r><w:t>…</w:t></w:r> carrying the
//     first <w:rPr> block found in the paragraph. Everything else in
//     that paragraph (other runs' formatting, <w:p> attributes, and
//     non-<w:t> content such as tabs or breaks) is dropped.
func substituteInDocumentXML(body []byte, vars PlaceholderMap, missing MissingPlaceholderFn, wrap valueWrapperFn) []byte {
	replaced := substituteInTextNodes(body, vars, missing, wrap)
	if !needsCrossRunMerge(replaced) {
		return replaced
	}
	return substituteAcrossRuns(replaced, vars, missing, wrap)
}

// wTextNodeRegex matches one <w:t …>contents</w:t> element, capturing
// the attributes (group 1, possibly empty) and the contents (group 2).
var wTextNodeRegex = regexp.MustCompile(`<w:t(\s[^>]*)?>([^<]*)</w:t>`)

// substituteInTextNodes runs the placeholder replacement inside each
// <w:t> text node independently. Format-preserving for single-run
// placeholders. When the replaced text gains a leading or trailing space
// and the node has no xml:space attribute, xml:space="preserve" is added
// so Word keeps the space.
func substituteInTextNodes(body []byte, vars PlaceholderMap, missing MissingPlaceholderFn, wrap valueWrapperFn) []byte {
	return wTextNodeRegex.ReplaceAllFunc(body, func(match []byte) []byte {
		sub := wTextNodeRegex.FindSubmatch(match)
		attrs := string(sub[1])
		contents := xmlDecode(string(sub[2]))
		replaced := replacePlaceholders(contents, vars, missing, wrap)
		if replaced == contents {
			return match
		}
		if !strings.Contains(attrs, "xml:space") &&
			(strings.HasPrefix(replaced, " ") || strings.HasSuffix(replaced, " ")) {
			attrs += ` xml:space="preserve"`
		}
		return []byte(`<w:t` + attrs + `>` + xmlEncode(replaced) + `</w:t>`)
	})
}
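Pass 1 can be illustrated with a simplified standalone sketch. `pass1` is a hypothetical reduction of `substituteInTextNodes`. It uses the same two regular expressions, but its `decode`/`encode` helpers handle only `&`, `<` and `>`, and the missing marker is hard-coded to the English form. Only the `<w:t>` element is rewritten, so the enclosing run and its `<w:rPr>` survive untouched.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Same patterns as wTextNodeRegex and placeholderRegex in the package.
var (
	textNodeRe    = regexp.MustCompile(`<w:t(\s[^>]*)?>([^<]*)</w:t>`)
	placeholderRe = regexp.MustCompile(`\{\{\s*([A-Za-z][A-Za-z0-9_.]*)\s*\}\}`)
)

// Simplified stand-ins for xmlDecode/xmlEncode covering only &, < and >.
// strings.Replacer works in a single pass, so "&amp;lt;" decodes to "&lt;".
var (
	decode = strings.NewReplacer("&lt;", "<", "&gt;", ">", "&amp;", "&").Replace
	encode = strings.NewReplacer("&", "&amp;", "<", "&lt;", ">", "&gt;").Replace
)

// pass1 is a hypothetical, simplified version of substituteInTextNodes.
// Each <w:t> is rewritten on its own, so the surrounding <w:r>/<w:rPr>
// markup (and therefore the run's formatting) is left untouched.
func pass1(body string, vars map[string]string) string {
	return textNodeRe.ReplaceAllStringFunc(body, func(node string) string {
		sub := textNodeRe.FindStringSubmatch(node)
		attrs, text := sub[1], decode(sub[2])
		replaced := placeholderRe.ReplaceAllStringFunc(text, func(m string) string {
			key := placeholderRe.FindStringSubmatch(m)[1]
			if v, ok := vars[key]; ok {
				return v
			}
			return "[NO VALUE: " + key + "]"
		})
		if replaced == text {
			return node
		}
		// Without xml:space="preserve", edge spaces may be dropped by Word.
		if !strings.Contains(attrs, "xml:space") &&
			(strings.HasPrefix(replaced, " ") || strings.HasSuffix(replaced, " ")) {
			attrs += ` xml:space="preserve"`
		}
		return "<w:t" + attrs + ">" + encode(replaced) + "</w:t>"
	})
}

func main() {
	doc := `<w:r><w:rPr><w:b/></w:rPr><w:t>{{firm.name}}</w:t></w:r>` +
		`<w:r><w:t>{{greeting}}</w:t></w:r>`
	fmt.Println(pass1(doc, map[string]string{"firm.name": "M & S", "greeting": "Hello "}))
}
```

With these inputs, the bold run keeps its `<w:rPr><w:b/></w:rPr>` and the ampersand is re-escaped as `&amp;`. The greeting node gains `xml:space="preserve"` because its value ends in a space.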

// needsCrossRunMerge reports whether any <w:t> in the body still
// contains "{{" or "}}" after pass 1, typically a fragment of a
// placeholder that Word split across runs.
func needsCrossRunMerge(body []byte) bool {
	for _, m := range wTextNodeRegex.FindAllSubmatch(body, -1) {
		t := string(m[2])
		if strings.Contains(t, "{{") || strings.Contains(t, "}}") {
			return true
		}
	}
	return false
}

// wParagraphRegex matches one <w:p>…</w:p> paragraph block. The lazy
// inner-content match is safe because <w:p> elements do not nest.
var wParagraphRegex = regexp.MustCompile(`(?s)<w:p\b[^>]*>.*?</w:p>`)

// wRunPropsRegex pulls the first <w:rPr>…</w:rPr> block from a paragraph.
var wRunPropsRegex = regexp.MustCompile(`(?s)<w:rPr>.*?</w:rPr>`)

// wParagraphPropsRegex pulls the optional <w:pPr>…</w:pPr>.
var wParagraphPropsRegex = regexp.MustCompile(`(?s)<w:pPr>.*?</w:pPr>`)

// substituteAcrossRuns is pass 2: concatenate every text node in a
// fragmented-placeholder paragraph, run replacement, rewrite as one run.
func substituteAcrossRuns(body []byte, vars PlaceholderMap, missing MissingPlaceholderFn, wrap valueWrapperFn) []byte {
	return wParagraphRegex.ReplaceAllFunc(body, func(para []byte) []byte {
		textNodes := wTextNodeRegex.FindAllSubmatch(para, -1)
		if len(textNodes) == 0 {
			return para
		}
		var merged strings.Builder
		for _, m := range textNodes {
			merged.WriteString(xmlDecode(string(m[2])))
		}
		original := merged.String()
		if !strings.Contains(original, "{{") {
			return para
		}
		replaced := replacePlaceholders(original, vars, missing, wrap)
		if replaced == original {
			return para
		}
		pPr := wParagraphPropsRegex.Find(para)
		rPr := wRunPropsRegex.Find(para)
		var rebuilt bytes.Buffer
		rebuilt.WriteString(`<w:p>`)
		if pPr != nil {
			rebuilt.Write(pPr)
		}
		rebuilt.WriteString(`<w:r>`)
		if rPr != nil {
			rebuilt.Write(rPr)
		}
		rebuilt.WriteString(`<w:t xml:space="preserve">`)
		rebuilt.WriteString(xmlEncode(replaced))
		rebuilt.WriteString(`</w:t></w:r></w:p>`)
		return rebuilt.Bytes()
	})
}
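The pass-2 rewrite is easiest to see on the fragmented paragraph used in TestRender_CrossRunPlaceholder. `mergeParagraph` below is a hypothetical, simplified standalone version that skips `<w:pPr>` copying and XML entity handling. It joins the `<w:t>` fragments, substitutes, and rebuilds the paragraph as one run carrying the first `<w:rPr>` it finds.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Same patterns as wTextNodeRegex, wRunPropsRegex and placeholderRegex.
var (
	textNodeRe    = regexp.MustCompile(`<w:t(\s[^>]*)?>([^<]*)</w:t>`)
	runPropsRe    = regexp.MustCompile(`(?s)<w:rPr>.*?</w:rPr>`)
	placeholderRe = regexp.MustCompile(`\{\{\s*([A-Za-z][A-Za-z0-9_.]*)\s*\}\}`)
)

// mergeParagraph is a hypothetical, simplified version of what
// substituteAcrossRuns does to one <w:p>: join the text of every <w:t>,
// substitute, and emit a single run that carries the first <w:rPr>
// found in the paragraph.
func mergeParagraph(para string, vars map[string]string) string {
	var joined strings.Builder
	for _, m := range textNodeRe.FindAllStringSubmatch(para, -1) {
		joined.WriteString(m[2])
	}
	text := joined.String()
	if !strings.Contains(text, "{{") {
		return para
	}
	replaced := placeholderRe.ReplaceAllStringFunc(text, func(m string) string {
		key := placeholderRe.FindStringSubmatch(m)[1]
		if v, ok := vars[key]; ok {
			return v
		}
		return "[NO VALUE: " + key + "]"
	})
	if replaced == text {
		return para
	}
	return `<w:p><w:r>` + runPropsRe.FindString(para) +
		`<w:t xml:space="preserve">` + replaced + `</w:t></w:r></w:p>`
}

func main() {
	// Word has split {{project.case_number}} across three runs.
	para := `<w:p><w:r><w:rPr><w:i/></w:rPr><w:t>Hello {{</w:t></w:r>` +
		`<w:r><w:t>project</w:t></w:r><w:r><w:t>.case_number}}!</w:t></w:r></w:p>`
	fmt.Println(mergeParagraph(para, map[string]string{"project.case_number": "7 O 1234/26"}))
}
```

The output is a single run carrying `<w:rPr><w:i/></w:rPr>`. Text that was upright in the second and third runs therefore becomes italic too, which is the price of repairing a split placeholder this way.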

// replacePlaceholders performs the actual substitution on a plain
// string. Unbound placeholders render the missing marker. When wrap is
// non-nil, both the resolved value AND the missing-marker text are
// passed through wrap(key, value). The HTML preview path uses this to
// emit clickable spans around every substituted placeholder, including
// missing ones (clicking a missing marker jumps to the corresponding
// sidebar input).
func replacePlaceholders(s string, vars PlaceholderMap, missing MissingPlaceholderFn, wrap valueWrapperFn) string {
	return placeholderRegex.ReplaceAllStringFunc(s, func(match string) string {
		sub := placeholderRegex.FindStringSubmatch(match)
		if len(sub) < 2 {
			return match
		}
		key := sub[1]
		var value string
		if v, ok := vars[key]; ok {
			value = v
		} else {
			value = missing(key)
		}
		if wrap != nil {
			return wrap(key, value)
		}
		return value
	})
}

// xmlDecode reverses the five standard XML entities Word emits in
// <w:t> content. &amp; is decoded last so that an escaped entity such as
// "&amp;lt;" becomes the literal text "&lt;" rather than "<".
func xmlDecode(s string) string {
	s = strings.ReplaceAll(s, "&lt;", "<")
	s = strings.ReplaceAll(s, "&gt;", ">")
	s = strings.ReplaceAll(s, "&quot;", `"`)
	s = strings.ReplaceAll(s, "&apos;", "'")
	s = strings.ReplaceAll(s, "&amp;", "&")
	return s
}

// xmlEncode escapes for safe insertion back into <w:t> content. & is
// escaped first to avoid double-encoding the entity prefixes.
func xmlEncode(s string) string {
	s = strings.ReplaceAll(s, "&", "&amp;")
	s = strings.ReplaceAll(s, "<", "&lt;")
	s = strings.ReplaceAll(s, ">", "&gt;")
	s = strings.ReplaceAll(s, `"`, "&quot;")
	s = strings.ReplaceAll(s, "'", "&apos;")
	return s
}
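Escaping order matters in both directions. The standalone sketch below mirrors `xmlEncode` and `xmlDecode` as `encode` and `decode`, assuming the standard five XML entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`). It also adds a hypothetical `encodeWrongOrder` to show the double-encoding that occurs when `&` is not handled first.

```go
package main

import (
	"fmt"
	"strings"
)

// encode mirrors xmlEncode: "&" must be escaped first.
func encode(s string) string {
	s = strings.ReplaceAll(s, "&", "&amp;")
	s = strings.ReplaceAll(s, "<", "&lt;")
	s = strings.ReplaceAll(s, ">", "&gt;")
	s = strings.ReplaceAll(s, `"`, "&quot;")
	s = strings.ReplaceAll(s, "'", "&apos;")
	return s
}

// decode mirrors xmlDecode: "&amp;" must be decoded last.
func decode(s string) string {
	s = strings.ReplaceAll(s, "&lt;", "<")
	s = strings.ReplaceAll(s, "&gt;", ">")
	s = strings.ReplaceAll(s, "&quot;", `"`)
	s = strings.ReplaceAll(s, "&apos;", "'")
	s = strings.ReplaceAll(s, "&amp;", "&")
	return s
}

// encodeWrongOrder (hypothetical) escapes "<" before "&", so the "&" of
// the freshly produced "&lt;" is escaped again.
func encodeWrongOrder(s string) string {
	s = strings.ReplaceAll(s, "<", "&lt;")
	s = strings.ReplaceAll(s, "&", "&amp;")
	return s
}

func main() {
	in := `Müller & Söhne <GmbH> "Special"`
	enc := encode(in)
	fmt.Println(enc)                   // Müller &amp; Söhne &lt;GmbH&gt; &quot;Special&quot;
	fmt.Println(decode(enc) == in)     // true
	fmt.Println(encodeWrongOrder("<")) // &amp;lt;
	// The literal text "&lt;" is stored as "&amp;lt;"; decoding "&amp;"
	// last turns it back into "&lt;", not "<".
	fmt.Println(decode("&amp;lt;")) // &lt;
}
```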

// docXMLToHTML walks the post-merge document XML and emits HTML for
// the preview pane. One <p> per <w:p>; <strong>/<em> spans for runs
// carrying <w:b>/<w:i>. Tables/lists/images collapse to text. Output
// is HTML-escaped except for the structural <p>/<strong>/<em> tags
// and the draft-var <span> tags this function emits.
func docXMLToHTML(docXML []byte) string {
	paragraphs := wParagraphRegex.FindAll(docXML, -1)
	var out bytes.Buffer
	for _, para := range paragraphs {
		out.WriteString("<p>")
		out.WriteString(paragraphToHTML(para))
		out.WriteString("</p>\n")
	}
	if out.Len() == 0 {
		return "<p></p>"
	}
	return out.String()
}

// wRunRegex matches one <w:r>…</w:r> run. The lazy match is safe
// because <w:r> elements do not nest.
var wRunRegex = regexp.MustCompile(`(?s)<w:r\b[^>]*>.*?</w:r>`)

// wBoldRegex / wItalicRegex detect the bold/italic flags inside a run's
// <w:rPr>. Word emits <w:b/> or <w:b w:val="true"/>; matching the open
// tag covers both forms. Explicitly disabled flags are filtered out
// separately by isFalseFlag.
var (
	wBoldRegex   = regexp.MustCompile(`<w:b\b[^>]*/?>`)
	wItalicRegex = regexp.MustCompile(`<w:i\b[^>]*/?>`)
)

// paragraphToHTML extracts the text from each <w:r> inside a paragraph,
// wraps runs flagged bold/italic with the corresponding HTML tags, and
// HTML-escapes the text content.
func paragraphToHTML(para []byte) string {
	runs := wRunRegex.FindAll(para, -1)
	if len(runs) == 0 {
		// Empty paragraph (line break).
		return ""
	}
	var out bytes.Buffer
	for _, run := range runs {
		text := extractRunText(run)
		if text == "" {
			continue
		}
		// Check for bold/italic on the run's <w:rPr>.
		rPr := wRunPropsRegex.Find(run)
		bold := rPr != nil && wBoldRegex.Match(rPr) && !isFalseFlag(rPr, wBoldRegex)
		italic := rPr != nil && wItalicRegex.Match(rPr) && !isFalseFlag(rPr, wItalicRegex)

		if bold {
			out.WriteString("<strong>")
		}
		if italic {
			out.WriteString("<em>")
		}
		out.WriteString(emitTextWithDraftVars(text))
		if italic {
			out.WriteString("</em>")
		}
		if bold {
			out.WriteString("</strong>")
		}
	}
	return out.String()
}

// emitTextWithDraftVars HTML-escapes run text while converting any
// preview-only sentinels emitted by htmlPreviewWrapper into
// <span class="draft-var" data-var="<key>">…</span>. The key is
// restricted to [A-Za-z][A-Za-z0-9_.]* by placeholderRegex, so no
// attribute-escaping is needed on the key; the value is HTML-escaped
// normally. Sentinel-free text (the Render path output, or template
// text outside placeholders) is passed straight through htmlEscape, so
// callers that never invoked wrap see byte-identical HTML.
//
// t-paliad-261: makes substituted variables clickable in the preview
// pane so the user can jump to the matching input in the sidebar.
func emitTextWithDraftVars(text string) string {
	if !strings.Contains(text, previewVarBegin) {
		return htmlEscape(text)
	}
	var out strings.Builder
	rest := text
	for {
		i := strings.Index(rest, previewVarBegin)
		if i < 0 {
			out.WriteString(htmlEscape(rest))
			return out.String()
		}
		out.WriteString(htmlEscape(rest[:i]))
		body := rest[i+len(previewVarBegin):]
		mid := strings.Index(body, previewVarMid)
		end := strings.Index(body, previewVarEnd)
		if mid < 0 || end < 0 || mid > end {
			// Malformed sentinel: emit the marker as plain escaped
			// text and continue past it so the rest of the run still
			// renders.
			out.WriteString(htmlEscape(previewVarBegin))
			rest = body
			continue
		}
		key := body[:mid]
		value := body[mid+len(previewVarMid) : end]
		out.WriteString(`<span class="draft-var" data-var="`)
		out.WriteString(key)
		out.WriteString(`">`)
		out.WriteString(htmlEscape(value))
		out.WriteString(`</span>`)
		rest = body[end+len(previewVarEnd):]
	}
}

// extractRunText concatenates every <w:t> inside a run, XML-decoding
// the content as it goes.
func extractRunText(run []byte) string {
	var out strings.Builder
	for _, m := range wTextNodeRegex.FindAllSubmatch(run, -1) {
		out.WriteString(xmlDecode(string(m[2])))
	}
	return out.String()
}

// isFalseFlag returns true if the first tag rx matches in rPr explicitly
// carries w:val="false" or w:val="0", Word's way of turning off an
// inherited format. The default match (just `<w:b/>` or
// `<w:b w:val="true"/>`) is "on".
func isFalseFlag(rPr []byte, rx *regexp.Regexp) bool {
	match := rx.Find(rPr)
	if match == nil {
		return false
	}
	s := string(match)
	return strings.Contains(s, `w:val="false"`) || strings.Contains(s, `w:val="0"`)
}
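Run-level flag detection combines a regex match with `isFalseFlag`. The sketch below copies `wBoldRegex`'s pattern and `isFalseFlag`'s logic into a standalone program. `isBold` is a hypothetical helper that applies the same test `paragraphToHTML` uses for bold.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Same pattern as wBoldRegex. The \b stops <w:bCs/> (complex-script bold)
// from matching, because "b" and "C" are both word characters.
var boldRe = regexp.MustCompile(`<w:b\b[^>]*/?>`)

// isFalseFlag mirrors the package helper: only the first matching tag is
// inspected for an explicit w:val="false" or w:val="0".
func isFalseFlag(rPr string, rx *regexp.Regexp) bool {
	m := rx.FindString(rPr)
	return m != "" && (strings.Contains(m, `w:val="false"`) || strings.Contains(m, `w:val="0"`))
}

// isBold is a hypothetical helper combining both checks the way
// paragraphToHTML does for a run's <w:rPr>.
func isBold(rPr string) bool {
	return boldRe.MatchString(rPr) && !isFalseFlag(rPr, boldRe)
}

func main() {
	for _, rPr := range []string{
		`<w:rPr><w:b/></w:rPr>`,
		`<w:rPr><w:b w:val="true"/></w:rPr>`,
		`<w:rPr><w:b w:val="0"/></w:rPr>`,
		`<w:rPr><w:bCs/></w:rPr>`,
		`<w:rPr><w:i/></w:rPr>`,
	} {
		fmt.Printf("%-40s bold=%v\n", rPr, isBold(rPr))
	}
}
```

With these inputs, the first two blocks report `bold=true` and the remaining three report `bold=false`. ECMA-376's transitional on/off type also permits the values "on" and "off". A `w:val="off"` tag would therefore still be reported as bold by both this sketch and `isFalseFlag`.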

// htmlEscape escapes the five HTML-significant characters for safe
// insertion into the preview pane. & is escaped first so the entities
// produced by the later replacements are not double-encoded.
func htmlEscape(s string) string {
	s = strings.ReplaceAll(s, "&", "&amp;")
	s = strings.ReplaceAll(s, "<", "&lt;")
	s = strings.ReplaceAll(s, ">", "&gt;")
	s = strings.ReplaceAll(s, `"`, "&quot;")
	s = strings.ReplaceAll(s, "'", "&#39;")
	return s
}
314
pkg/docforge/docx/merge_test.go
Normal file
@@ -0,0 +1,314 @@
package docx

// Submission merge-engine tests, resurrected from the original
// t-paliad-215 Slice 1 (commit 8ea3509) + Slice 2 (commit 1765d5e).
// Adapted: helper names are suffixed with "Merge" so they don't collide
// with the convert tests in dotm_test.go (formerly
// submission_render_test.go; minimalDOTM, unzipEntries) that test the
// format-only ConvertDotmToDocx path.

import (
	"archive/zip"
	"bytes"
	"io"
	"strings"
	"testing"
)

// minimalMergeDOCX builds a tiny .docx zip with one document.xml that
// contains the given body. Just enough to exercise the merge engine.
func minimalMergeDOCX(t *testing.T, documentBody string) []byte {
	t.Helper()
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create("word/document.xml")
	if err != nil {
		t.Fatalf("create document.xml: %v", err)
	}
	if _, err := io.WriteString(w, documentBody); err != nil {
		t.Fatalf("write document.xml: %v", err)
	}
	w2, err := zw.Create("[Content_Types].xml")
	if err != nil {
		t.Fatalf("create content types: %v", err)
	}
	// Use a docx-compatible content type so the convert pre-pass treats
	// the input as already clean (no .dotm rewrites needed).
	body := `<?xml version="1.0"?><Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">` +
		`<Override PartName="/word/document.xml" ContentType="` + docxMainContentType + `"/></Types>`
	if _, err := io.WriteString(w2, body); err != nil {
		t.Fatalf("write content types: %v", err)
	}
	if err := zw.Close(); err != nil {
		t.Fatalf("close zip: %v", err)
	}
	return buf.Bytes()
}

// readMergeDocumentXML pulls word/document.xml out of a rendered .docx.
func readMergeDocumentXML(t *testing.T, b []byte) string {
	t.Helper()
	zr, err := zip.NewReader(bytes.NewReader(b), int64(len(b)))
	if err != nil {
		t.Fatalf("open rendered zip: %v", err)
	}
	for _, f := range zr.File {
		if f.Name != "word/document.xml" {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			t.Fatalf("open document.xml: %v", err)
		}
		defer rc.Close()
		body, err := io.ReadAll(rc)
		if err != nil {
			t.Fatalf("read document.xml: %v", err)
		}
		return string(body)
	}
	t.Fatal("rendered .docx had no word/document.xml")
	return ""
}

func TestRender_SingleRunPlaceholder(t *testing.T) {
	doc := `<w:document><w:body><w:p><w:r><w:t>{{firm.name}}</w:t></w:r></w:p></w:body></w:document>`
	tmpl := minimalMergeDOCX(t, doc)
	r := NewSubmissionRenderer()
	out, err := r.Render(tmpl, PlaceholderMap{"firm.name": "HLC"}, nil)
	if err != nil {
		t.Fatalf("render: %v", err)
	}
	body := readMergeDocumentXML(t, out)
	if !strings.Contains(body, ">HLC<") {
		t.Errorf("expected HLC in body, got %q", body)
	}
	if strings.Contains(body, "{{") {
		t.Errorf("unreplaced placeholder marker in body: %q", body)
	}
}

func TestRender_MultiplePlaceholdersPerRun(t *testing.T) {
	doc := `<w:document><w:body><w:p><w:r><w:t>{{parties.claimant.name}}, vertreten durch {{parties.claimant.representative}}</w:t></w:r></w:p></w:body></w:document>`
	tmpl := minimalMergeDOCX(t, doc)
	r := NewSubmissionRenderer()
	out, err := r.Render(tmpl, PlaceholderMap{
		"parties.claimant.name":           "Acme Inc.",
		"parties.claimant.representative": "Kanzlei Müller",
	}, nil)
	if err != nil {
		t.Fatalf("render: %v", err)
	}
	body := readMergeDocumentXML(t, out)
	if !strings.Contains(body, "Acme Inc.") || !strings.Contains(body, "Kanzlei Müller") {
		t.Errorf("expected both party values, got %q", body)
	}
	if strings.Contains(body, "{{") {
		t.Errorf("unreplaced placeholder marker in body: %q", body)
	}
}

func TestRender_MissingMarker(t *testing.T) {
	doc := `<w:document><w:body><w:p><w:r><w:t>{{project.case_number}}</w:t></w:r></w:p></w:body></w:document>`
	tmpl := minimalMergeDOCX(t, doc)
	r := NewSubmissionRenderer()
	out, err := r.Render(tmpl, PlaceholderMap{}, DefaultMissingMarker("de"))
	if err != nil {
		t.Fatalf("render: %v", err)
	}
	body := readMergeDocumentXML(t, out)
	if !strings.Contains(body, "[KEIN WERT: project.case_number]") {
		t.Errorf("expected KEIN WERT marker, got %q", body)
	}
	outEN, err := r.Render(tmpl, PlaceholderMap{}, DefaultMissingMarker("en"))
	if err != nil {
		t.Fatalf("render en: %v", err)
	}
	bodyEN := readMergeDocumentXML(t, outEN)
	if !strings.Contains(bodyEN, "[NO VALUE: project.case_number]") {
		t.Errorf("expected NO VALUE marker, got %q", bodyEN)
	}
}

func TestRender_CrossRunPlaceholder(t *testing.T) {
	doc := `<w:document><w:body><w:p><w:r><w:t>Hello {{</w:t></w:r><w:r><w:t>project</w:t></w:r><w:r><w:t>.case_number}}!</w:t></w:r></w:p></w:body></w:document>`
	tmpl := minimalMergeDOCX(t, doc)
	r := NewSubmissionRenderer()
	out, err := r.Render(tmpl, PlaceholderMap{"project.case_number": "7 O 1234/26"}, nil)
	if err != nil {
		t.Fatalf("render: %v", err)
	}
	body := readMergeDocumentXML(t, out)
	if !strings.Contains(body, "7 O 1234/26") {
		t.Errorf("expected case number after cross-run merge, got %q", body)
	}
	if strings.Contains(body, "{{") {
		t.Errorf("orphan placeholder marker remained: %q", body)
	}
}

func TestRender_XMLEscaping(t *testing.T) {
	doc := `<w:document><w:body><w:p><w:r><w:t>{{user.display_name}}</w:t></w:r></w:p></w:body></w:document>`
	tmpl := minimalMergeDOCX(t, doc)
	r := NewSubmissionRenderer()
	out, err := r.Render(tmpl, PlaceholderMap{
		"user.display_name": `Müller & Söhne <GmbH> "Special"`,
	}, nil)
	if err != nil {
		t.Fatalf("render: %v", err)
	}
	body := readMergeDocumentXML(t, out)
	if !strings.Contains(body, "Müller &amp; Söhne &lt;GmbH&gt; &quot;Special&quot;") {
		t.Errorf("expected escaped value, got %q", body)
	}
}

func TestPlaceholderRegex_Boundaries(t *testing.T) {
	tests := []struct {
		in      string
		matches []string
	}{
		{"plain text", nil},
		{"{{foo}}", []string{"{{foo}}"}},
		{"{{ foo }}", []string{"{{ foo }}"}},
		{"{{foo.bar}}", []string{"{{foo.bar}}"}},
		{"{{ foo.bar_baz }}", []string{"{{ foo.bar_baz }}"}},
		{"{{1bad}}", nil},
		{"{{ foo }} and {{ bar }}", []string{"{{ foo }}", "{{ bar }}"}},
	}
	for _, tc := range tests {
		t.Run(tc.in, func(t *testing.T) {
			got := placeholderRegex.FindAllString(tc.in, -1)
			if len(got) != len(tc.matches) {
				t.Fatalf("got %d matches, want %d (in=%q)", len(got), len(tc.matches), tc.in)
			}
			for i := range got {
				if got[i] != tc.matches[i] {
					t.Errorf("match %d: got %q, want %q", i, got[i], tc.matches[i])
				}
			}
		})
	}
}

// TestRenderHTML_ExtractsParagraphsAndFormatting verifies the preview
// HTML emitter walks <w:p> / <w:r> / <w:t> correctly and carries
// bold/italic through to <strong>/<em>. Substituted placeholders are
// wrapped in <span class="draft-var" data-var="…"> so the client can
// make them clickable (t-paliad-261).
func TestRenderHTML_ExtractsParagraphsAndFormatting(t *testing.T) {
	doc := `<w:document><w:body>` +
		`<w:p><w:r><w:t>Hello {{firm.name}}</w:t></w:r></w:p>` +
		`<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Bold line</w:t></w:r></w:p>` +
		`<w:p><w:r><w:rPr><w:i/></w:rPr><w:t>Italic line</w:t></w:r></w:p>` +
		`</w:body></w:document>`
	tmpl := minimalMergeDOCX(t, doc)
	r := NewSubmissionRenderer()
	html, err := r.RenderHTML(tmpl, PlaceholderMap{"firm.name": "HLC"}, nil)
	if err != nil {
		t.Fatalf("render html: %v", err)
	}
	if !strings.Contains(html, `<p>Hello <span class="draft-var" data-var="firm.name">HLC</span></p>`) {
		t.Errorf("expected merged paragraph with draft-var span, got %q", html)
	}
	if !strings.Contains(html, "<strong>Bold line</strong>") {
		t.Errorf("expected bold span, got %q", html)
	}
	if !strings.Contains(html, "<em>Italic line</em>") {
		t.Errorf("expected italic span, got %q", html)
	}
}

// TestRenderHTML_EscapesContent confirms the preview emitter HTML-escapes
// special characters in placeholder values even inside the draft-var
// span wrapper.
func TestRenderHTML_EscapesContent(t *testing.T) {
	doc := `<w:document><w:body><w:p><w:r><w:t>{{user.display_name}}</w:t></w:r></w:p></w:body></w:document>`
	tmpl := minimalMergeDOCX(t, doc)
	r := NewSubmissionRenderer()
	html, err := r.RenderHTML(tmpl, PlaceholderMap{
		"user.display_name": `M&S <Inc> "X"`,
	}, nil)
	if err != nil {
		t.Fatalf("render html: %v", err)
	}
	want := `<span class="draft-var" data-var="user.display_name">M&amp;S &lt;Inc&gt; &quot;X&quot;</span>`
	if !strings.Contains(html, want) {
		t.Errorf("expected escaped value inside draft-var span, got %q", html)
	}
}

// TestRenderHTML_WrapsMissingMarker confirms that an unbound placeholder
// is still rendered as a clickable draft-var span so the user can click
// the [KEIN WERT: …] marker in the preview and jump to the field.
func TestRenderHTML_WrapsMissingMarker(t *testing.T) {
	doc := `<w:document><w:body><w:p><w:r><w:t>{{project.case_number}}</w:t></w:r></w:p></w:body></w:document>`
	tmpl := minimalMergeDOCX(t, doc)
	r := NewSubmissionRenderer()
	html, err := r.RenderHTML(tmpl, PlaceholderMap{}, nil)
	if err != nil {
		t.Fatalf("render html: %v", err)
	}
	want := `<span class="draft-var" data-var="project.case_number">[KEIN WERT: project.case_number]</span>`
	if !strings.Contains(html, want) {
		t.Errorf("expected missing marker wrapped in draft-var span, got %q", html)
	}
}

// TestRenderHTML_WrapsOverriddenValueSameAsResolved is the t-paliad-274
// regression: m's report on m/paliad#106 was that "When filled, the link
// disappears". The preview HTML must wrap an override value with the
// same <span class="draft-var"> as it would an unfilled placeholder, so
// the click-jump from preview→sidebar persists after the user types a
// value. There is no distinction at the renderer level between a value
// that came from the resolved bag (project / parties / deadline lookups)
// and a value the lawyer typed into the sidebar; both arrive in the
// same PlaceholderMap and both must be wrapped.
func TestRenderHTML_WrapsOverriddenValueSameAsResolved(t *testing.T) {
	doc := `<w:document><w:body>` +
		`<w:p><w:r><w:t>{{project.case_number}} / {{firm.name}}</w:t></w:r></w:p>` +
		`</w:body></w:document>`
	tmpl := minimalMergeDOCX(t, doc)
	r := NewSubmissionRenderer()
	// project.case_number is the typed-by-lawyer override.
	// firm.name is the always-resolved value from the firm bag.
	html, err := r.RenderHTML(tmpl, PlaceholderMap{
		"project.case_number": "UPC_CFI_42/2026",
		"firm.name":           "HLC",
	}, nil)
	if err != nil {
		t.Fatalf("render html: %v", err)
	}
	wantOverride := `<span class="draft-var" data-var="project.case_number">UPC_CFI_42/2026</span>`
	if !strings.Contains(html, wantOverride) {
		t.Errorf("expected overridden value wrapped in draft-var span (click-jump must persist after fill, t-paliad-274), got %q", html)
	}
	wantResolved := `<span class="draft-var" data-var="firm.name">HLC</span>`
	if !strings.Contains(html, wantResolved) {
		t.Errorf("expected resolved value still wrapped, got %q", html)
	}
}

// TestRender_DocxOutputUnchangedByPreviewWrap asserts the hard rule from
// t-paliad-261: the .docx export path must NOT carry the preview-only
// draft-var sentinels or any draft-var span markup. Renders the same
// template through Render (.docx) and asserts the merged document.xml
// has only the resolved value, not a wrapped one.
func TestRender_DocxOutputUnchangedByPreviewWrap(t *testing.T) {
	doc := `<w:document><w:body><w:p><w:r><w:t>{{firm.name}}</w:t></w:r></w:p></w:body></w:document>`
	tmpl := minimalMergeDOCX(t, doc)
	r := NewSubmissionRenderer()
	out, err := r.Render(tmpl, PlaceholderMap{"firm.name": "HLC"}, nil)
	if err != nil {
		t.Fatalf("render docx: %v", err)
	}
	body := readMergeDocumentXML(t, out)
	if !strings.Contains(body, `<w:t>HLC</w:t>`) {
		t.Errorf("expected raw resolved value in .docx, got %q", body)
	}
	// PUA sentinels and any span markup must NOT appear in the .docx.
	for _, forbidden := range []string{"draft-var", "data-var", previewVarBegin, previewVarMid, previewVarEnd} {
		if strings.Contains(body, forbidden) {
			t.Errorf("docx output unexpectedly contains %q: %q", forbidden, body)
		}
	}
}