Two prompt-only rules added to address follow-ups from #3:
1. Intra-scan dedup (new rule 4 in Correspondents section): when
processing multiple docs from the same sender in one scan batch,
reuse the correspondent name created earlier in the same session
instead of letting each doc create a fresh alias. Triggered by
paperless-AI creating 3 Praxis-Irle aliases in one batch (no native
batch-context plumbing; best-effort via prompt).
2. Short-brand prefix match (extension of the Fuzzy-Regel, i.e. the
fuzzy-matching rule): if the OCR name is a strict prefix of an
existing correspondent (or vice versa) and the first 2 brand tokens
match, use the existing correspondent.
Triggered by 'Hogan Lovells' creating a new correspondent despite
'Hogan Lovells International LLP' already existing.
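
For illustration, rule 2 can be sketched in Python. This is only a
sketch: the rule is enforced by prompt text, not by code, and
prefix_match, brand_tokens and the token-level reading of "strict
prefix" are assumptions made here, not part of the repo.

```python
import re
from typing import List, Optional


def brand_tokens(name: str) -> List[str]:
    """Lowercased word tokens with punctuation dropped (hypothetical helper)."""
    return re.findall(r"[0-9a-zäöüß]+", name.lower())


def prefix_match(ocr_name: str, existing: List[str],
                 min_tokens: int = 2) -> Optional[str]:
    """Return the existing correspondent that ocr_name should map to, if any.

    Assumption: "strict prefix" is evaluated on tokens, in either direction,
    and the shared prefix must cover at least min_tokens brand tokens.
    """
    ocr = brand_tokens(ocr_name)
    for candidate in existing:
        cand = brand_tokens(candidate)
        shorter, longer = sorted((ocr, cand), key=len)
        # Equal length means no strict prefix; too few tokens is too weak.
        if len(shorter) < min_tokens or len(shorter) == len(longer):
            continue
        if longer[:len(shorter)] == shorter:
            return candidate
    return None
```

With this reading, prefix_match("Hogan Lovells", ["Hogan Lovells
International LLP"]) returns the existing long name, while a
single-token OCR name such as "Hogan" matches nothing.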
Deployed via push_system_prompt.py --apply, container restarted, both
strings verified present in /app/data/.env (backup at
.env.bak.20260521T092606). Effectiveness will be observed as
multi-doc scans flow through.
Live SYSTEM_PROMPT on mDock had drifted heavily from the repo template
(detailed correspondent fuzzy-matching catalogue, full existing-names
list, refined title-generation rules). Reconciled by adopting the live
prompt as the new baseline in SYSTEM_PROMPT.txt and layering two fixes
on top:
1. Recipient rule (Rule 1): Matthias / Mathias Siebels and any
address-block variant ("Herr Siebels", "Empfaengeradresse
Windscheidstr. 33") must NEVER be set as correspondent, because m is
the recipient of nearly every doc. Paul Siebels is also a recipient by
default and only a correspondent when he is demonstrably the author
(his own letter, a damage report filed by Paul).
Triggering misclassification (issue body): doc 280 (Vattenfall
Stromliefervertrag) was tagged correspondent="Matthias Siebels"
because the AI picked the recipient address block as sender.
2. Soften "Bevorzuge IMMER existierenden Correspondent" ("ALWAYS prefer
an existing correspondent") so it applies only when semantic
similarity is clear. Genuinely new senders (utility, doctor, insurer,
landlord, ...) get a new correspondent rather than being force-mapped
to the nearest existing name. Fixes the Vattenfall -> Telekom drift on
docs 283/284 (also addressed by head adding Vattenfall ID 257
manually).
Also migrated push_system_prompt.py from m/otto into this repo so the
deploy mechanism (render template -> push to /app/data/.env -> restart
paperless-ai) lives next to the template. Added RECIPIENT_EXCLUDE
filter so Matthias/Mathias Siebels are stripped from the rendered
correspondents list — defense in depth on top of the prompt rule.
Paperless correspondent records (IDs 3, 255) are preserved for the
historical doc assignments that still reference them.
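
A minimal sketch of what such a filter could look like. The regex and
the filter_correspondents helper are assumptions for illustration; the
actual RECIPIENT_EXCLUDE in push_system_prompt.py may be defined
differently.

```python
import re

# Assumed pattern: matches "Matthias Siebels" and "Mathias Siebels",
# case-insensitively. Paul Siebels is deliberately not excluded.
RECIPIENT_EXCLUDE = re.compile(r"^matt?hias\s+siebels$", re.IGNORECASE)


def filter_correspondents(names):
    """Drop recipient names before rendering the correspondents list."""
    return [n for n in names if not RECIPIENT_EXCLUDE.match(n.strip())]
```

Filtering at render time means the model never sees the recipient as a
candidate name, even if the prompt rule is ignored.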
Applied to live mDock paperless-ai (backup .env.bak.20260516T162255).
39 of 41 Siebels-correspondent doc assignments were cleared, and their
paperless-AI sqlite tracker rows (processed_documents,
history_documents, openai_metrics) were deleted so the docs reclassify
on the next scan. Two were kept: doc 117 (Vollmacht, i.e. power of
attorney, from Paul) and doc 130 (Schadensmeldung, i.e. damage report,
filled in by Paul). Both are genuine Paul-as-author cases per the new
rule.
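
The tracker cleanup can be sketched roughly as follows. The table names
come from the message above; the document_id column name and the
clear_tracker_rows helper are assumptions and should be checked against
the real paperless-AI schema before use.

```python
import sqlite3

TRACKER_TABLES = ("processed_documents", "history_documents", "openai_metrics")
KEEP_DOC_IDS = frozenset({117, 130})  # genuine Paul-as-author docs


def clear_tracker_rows(conn, doc_ids, keep=KEEP_DOC_IDS):
    """Delete tracker rows for doc_ids, skipping docs that must be kept.

    Assumes every tracker table keys its rows by a document_id column.
    Returns the total number of rows deleted.
    """
    targets = [d for d in doc_ids if d not in keep]
    deleted = 0
    with conn:  # one transaction: commit on success, roll back on error
        for table in TRACKER_TABLES:
            for doc_id in targets:
                cur = conn.execute(
                    f"DELETE FROM {table} WHERE document_id = ?", (doc_id,)
                )
                deleted += cur.rowcount
    return deleted
```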
Refs: m/mDMS#3
Spun out mDMS strategy + tooling from m/otto into its own repo on 2026-05-15.
Migrated:
- docs/strategy.md (was: m/otto:docs/mdms-strategy.md)
- infra/paperless/ (config + audit + migrate scripts)
- infra/samba-canon/ (Canon MB5100 SMB1 bridge container)
History in m/otto: issues #429–#438. Going forward, all mDMS issues
are filed here. Sibling m/paperless (separate repo) remains the bare
Docker Compose setup for Paperless-ngx itself.