mDMS/infra/paperless/README.md
mAi 7ba5bb925c mAi: #3 - paperless-AI prompt: recipient rule + softened correspondent matching + drift reconciliation
Live SYSTEM_PROMPT on mDock had drifted heavily from the repo template
(detailed correspondent fuzzy-matching catalogue, full existing-names
list, refined title-generation rules). Reconciled by adopting the live
prompt as the new baseline in SYSTEM_PROMPT.txt and layering two fixes
on top:

1. Recipient rule (Rule 1): Matthias / Mathias Siebels and any address-
   block variant ("Herr Siebels", "Empfaengeradresse Windscheidstr. 33")
   must NEVER be set as correspondent, because m is the recipient of
   nearly every doc. Paul Siebels is also a recipient by default and is
   only a correspondent when he is demonstrably the author (his own
   letter, a damage report filed by Paul).

   Triggering misclassification (issue body): doc 280 (a Vattenfall
   electricity supply contract) was tagged
   correspondent="Matthias Siebels" because the AI picked the recipient
   address block as sender.

2. Soften "Bevorzuge IMMER existierenden Correspondent" ("ALWAYS prefer
   an existing correspondent") so it applies only when semantic
   similarity is clear. Genuinely new senders (utility, doctor, insurer,
   landlord, ...) get a new correspondent rather than being force-mapped
   to the nearest existing name. Fixes the Vattenfall -> Telekom drift on
   docs 283/284 (also addressed by head adding Vattenfall ID 257
   manually).

Also migrated push_system_prompt.py from m/otto into this repo so the
deploy mechanism (render template -> push to /app/data/.env -> restart
paperless-ai) lives next to the template. Added RECIPIENT_EXCLUDE
filter so Matthias/Mathias Siebels are stripped from the rendered
correspondents list — defense in depth on top of the prompt rule.
Paperless correspondent records (IDs 3, 255) are preserved for the
historical doc assignments that still reference them.

Applied to live mDock paperless-ai (backup .env.bak.20260516T162255).
39 of 41 Siebels-correspondent doc assignments cleared + their
paperless-AI sqlite tracker rows (processed_documents,
history_documents, openai_metrics) deleted so they reclassify on the
next scan. Two kept: doc 117 (power of attorney from Paul) and doc 130
(damage report filled in by Paul), both genuine Paul-as-author cases per
the new rule.

Refs: m/mDMS#3
2026-05-16 18:27:19 +02:00


# paperless infra (snapshot)
These files are a **traceable copy** of what lives in `~/paperless/` on
mDock. The live source of truth is on mDock — this directory exists so
the configuration is git-readable for the next shift and for audits.
If you change the live config on mDock, sync the change here in the same
commit. If you change the files here, deploy by:
```bash
scp Dockerfile mdock:/home/m/paperless/build/Dockerfile  # repeat per file (docker-compose.yml, ...), each to its own target path
ssh mdock 'cd /home/m/paperless && docker compose up -d --build'
```
The two patched JS files (`setup.js.patched`, `server.js.patched`) live
only on mDock in `~/paperless/build/` — they're large and don't belong
in the repo. Their MD5 hashes are recorded here so the live copies can
be checked:

| File | mDock path | md5 |
|---|---|---|
| setup.js.patched | ~/paperless/build/setup.js.patched | `04cb5fbfaed13a5f25612af0b79dd90c` |
| server.js.patched | ~/paperless/build/server.js.patched | `eadcbb86048127f2c80632ae77bbc2a0` |

See `docs/research/issue-429-paperless-pipeline.md` in `m/otto` for the
original pipeline rebuild (issue otto#429).

## SYSTEM_PROMPT deploy mechanism
`SYSTEM_PROMPT.txt` is the source of truth for the prompt. It is a
template: the `{{CORRESPONDENTS_LIST}}` placeholder is rendered at deploy
time by fetching the live correspondents from Paperless. The deployed
prompt lives in `paperless-ai`'s `/app/data/.env` (volume
`paperless_aidata`) as the backtick-delimited ``SYSTEM_PROMPT=`…` `` block.
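
For orientation, reading and replacing a backtick-delimited block of this shape could look roughly like the sketch below. This is not the code in `push_system_prompt.py`. It assumes the block starts on a line beginning with `SYSTEM_PROMPT=` followed by a backtick, ends at the next backtick, and that the prompt itself contains no backticks. The function names are hypothetical.

```python
import re
from typing import Optional

# Assumption: the prompt contains no backticks, so the first closing
# backtick after the opening one ends the block.
PROMPT_BLOCK = re.compile(r"^SYSTEM_PROMPT=`(.*?)`", re.DOTALL | re.MULTILINE)


def extract_prompt(env_text: str) -> Optional[str]:
    """Return the current prompt text, or None if no block is present."""
    match = PROMPT_BLOCK.search(env_text)
    return match.group(1) if match else None


def replace_prompt(env_text: str, new_prompt: str) -> str:
    """Return env_text with the SYSTEM_PROMPT block swapped for new_prompt."""
    if "`" in new_prompt:
        raise ValueError("prompt must not contain backticks")
    if not PROMPT_BLOCK.search(env_text):
        raise ValueError("no SYSTEM_PROMPT block found")
    # A function replacement keeps backslashes in the prompt literal.
    return PROMPT_BLOCK.sub(
        lambda _m: "SYSTEM_PROMPT=`" + new_prompt + "`", env_text, count=1
    )
```

Comparing `extract_prompt(...)` of the live file against the freshly rendered template is one way to produce the dry-run diff before anything is written.
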
Deploy with `push_system_prompt.py`:
```bash
python3 push_system_prompt.py # dry run — diff only
python3 push_system_prompt.py --apply # write + restart paperless-ai
```
The script filters recipient-only names (Matthias / Mathias Siebels)
out of the rendered list — see `RECIPIENT_EXCLUDE` in the script and
the matching rule at the top of the Correspondents section in
`SYSTEM_PROMPT.txt`. If you edit either, edit both.
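
The render-and-filter step can be pictured with the sketch below. It is an illustration under assumptions, not the script itself: the shape of `RECIPIENT_EXCLUDE` (a set of case-folded names), the bullet-list output format, and the helper names are all hypothetical, and the correspondent names are assumed to have been fetched from Paperless already.

```python
from typing import Iterable, List

PLACEHOLDER = "{{CORRESPONDENTS_LIST}}"

# Assumed shape; the real constant lives in push_system_prompt.py.
RECIPIENT_EXCLUDE = {"matthias siebels", "mathias siebels"}


def filter_recipients(names: Iterable[str]) -> List[str]:
    """Drop recipient-only names and duplicates, keeping the input order."""
    seen = set()
    kept = []
    for name in names:
        # Normalise whitespace and case before comparing.
        key = " ".join(name.split()).casefold()
        if not key or key in RECIPIENT_EXCLUDE or key in seen:
            continue
        seen.add(key)
        kept.append(name.strip())
    return kept


def render_prompt(template: str, names: Iterable[str]) -> str:
    """Fill the placeholder with one '- name' line per kept correspondent."""
    if PLACEHOLDER not in template:
        raise ValueError("template has no " + PLACEHOLDER + " placeholder")
    bullet_list = "\n".join("- " + n for n in filter_recipients(names))
    return template.replace(PLACEHOLDER, bullet_list)
```

Filtering in code as well as in the prompt means the model never sees the recipient names as candidate correspondents, even if the prompt rule is later reworded.
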
The previous live `.env` is kept on mDock as `.env.bak.<ts>` next to the
new one for rollback.
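
A sketch of how such backup names could be built and the newest one located for rollback. The timestamp format is an assumption inferred from a backup name of the form `.env.bak.20260516T162255`, and both helper names are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Assumed format, inferred from a name like .env.bak.20260516T162255.
TS_FORMAT = "%Y%m%dT%H%M%S"


def backup_path(env_path: Path, now: datetime) -> Path:
    """Return the sibling backup path for env_path at the given time."""
    return env_path.with_name(env_path.name + ".bak." + now.strftime(TS_FORMAT))


def latest_backup(env_path: Path) -> Optional[Path]:
    """Return the newest backup next to env_path, or None if there is none."""
    # This timestamp format sorts chronologically when sorted as text.
    candidates = sorted(env_path.parent.glob(env_path.name + ".bak.*"))
    return candidates[-1] if candidates else None
```

Rolling back would then amount to copying the file returned by `latest_backup` over `.env` and restarting `paperless-ai`.
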