mDMS/infra/paperless
mAi a2fa76a41a mAi: #4 - paperless-AI prompt: intra-scan dedup + short-brand prefix match
Two prompt-only rules added to address follow-ups from #3:

1. Intra-scan dedup (new rule 4 in Correspondents section): when
   processing multiple docs from the same sender in one scan batch,
   reuse the correspondent name created earlier in the same session
   instead of letting each doc create a fresh alias. Triggered by
   paperless-AI creating 3 Praxis-Irle aliases in one batch (no native
   batch-context plumbing; best-effort via prompt).

2. Short-brand prefix match (extension of Fuzzy-Regel): if OCR name is
   a strict prefix of an existing correspondent (or vice-versa) and
   the first 2 brand tokens match, use the existing correspondent.
   Triggered by 'Hogan Lovells' creating a new correspondent despite
   'Hogan Lovells International LLP' already existing.
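
The short-brand prefix rule above is enforced only through the prompt, so the model applies it on a best-effort basis. As an illustration of the intended matching logic, here is a minimal deterministic sketch. It assumes "prefix" means a prefix over whitespace-separated tokens, compared case-insensitively; the function names `tokens` and `match_existing` are hypothetical and do not exist in paperless-AI.

```python
from __future__ import annotations


def tokens(name: str) -> list[str]:
    # Assumption: brand tokens are whitespace-separated words, compared case-insensitively.
    return name.lower().split()


def match_existing(ocr_name: str, existing: list[str]) -> str | None:
    """Return an existing correspondent when one name is a strict token prefix of the other."""
    ocr = tokens(ocr_name)
    for candidate in existing:
        cand = tokens(candidate)
        shorter, longer = sorted((ocr, cand), key=len)
        strict_prefix = len(shorter) < len(longer) and longer[: len(shorter)] == shorter
        # With token prefixes, requiring at least two tokens in the shorter name
        # guarantees that the first two brand tokens of both names agree.
        if strict_prefix and len(shorter) >= 2:
            return candidate
    return None
```

With the case from this commit, `match_existing("Hogan Lovells", ["Hogan Lovells International LLP"])` returns the existing long name, while a single-token name such as `"Hogan"` is rejected because it has fewer than two brand tokens.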

Deployed via push_system_prompt.py --apply, container restarted, both
strings verified present in /app/data/.env (backup at
.env.bak.20260521T092606). Effectiveness will be observed as
multi-doc scans flow through.
2026-05-21 11:26:40 +02:00

paperless infra (snapshot)

These files are a traceable copy of what lives in ~/paperless/ on mDock. The live source of truth is on mDock — this directory exists so the configuration is git-readable for the next shift and for audits.

If you change the live config on mDock, sync the change here in the same commit. If you change the files here, deploy by:

scp Dockerfile mdock:/home/m/paperless/build/Dockerfile
scp docker-compose.yml mdock:/home/m/paperless/docker-compose.yml  # and so on, one target per file
ssh mdock 'cd /home/m/paperless && docker compose up -d --build'

The two patched JS files (setup.js.patched, server.js.patched) live only on mDock in ~/paperless/build/ — they're large and don't belong in the repo. Hashes:

File               mDock path                           md5
setup.js.patched   ~/paperless/build/setup.js.patched   04cb5fbfaed13a5f25612af0b79dd90c
server.js.patched  ~/paperless/build/server.js.patched  eadcbb86048127f2c80632ae77bbc2a0
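
Because the patched files are not in the repo, the hashes above are the only way to confirm that mDock still holds the expected versions. A minimal sketch of such a check, using only the standard library; the helper names `md5_of` and `check_patched_files` are hypothetical, and the script is assumed to run on mDock itself.

```python
import hashlib
from pathlib import Path

# Hashes copied from the table above.
EXPECTED_MD5 = {
    "setup.js.patched": "04cb5fbfaed13a5f25612af0b79dd90c",
    "server.js.patched": "eadcbb86048127f2c80632ae77bbc2a0",
}


def md5_of(path: Path) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_patched_files(build_dir: Path) -> dict:
    """Map each expected file name to True if it exists and its md5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = build_dir / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results
```

On mDock this could be called as `check_patched_files(Path.home() / "paperless" / "build")`; any `False` entry means the file is missing or has drifted from the recorded hash.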

See docs/research/issue-429-paperless-pipeline.md in m/otto for the original pipeline rebuild (issue otto#429).

SYSTEM_PROMPT deploy mechanism

SYSTEM_PROMPT.txt is the source of truth. It is a template: the {{CORRESPONDENTS_LIST}} placeholder is rendered at deploy time by fetching the live correspondents from Paperless. The live prompt is stored in paperless-ai's /app/data/.env (volume paperless_aidata) as the backtick-delimited SYSTEM_PROMPT=`…` block.
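
The render-and-splice step described above can be sketched as follows. This is not the code of push_system_prompt.py; the function names, the one-name-per-line bullet format, and the assumption that the prompt itself contains no backticks are all illustrative guesses.

```python
import re

PLACEHOLDER = "{{CORRESPONDENTS_LIST}}"


def render_prompt(template: str, correspondents: list) -> str:
    """Fill the placeholder with one correspondent per line (assumed list format)."""
    bullet_list = "\n".join(f"- {name}" for name in sorted(correspondents))
    return template.replace(PLACEHOLDER, bullet_list)


def splice_into_env(env_text: str, prompt: str) -> str:
    """Replace the backtick-delimited SYSTEM_PROMPT block in the .env text."""
    # Assumption: the prompt contains no backticks, so the first closing
    # backtick after SYSTEM_PROMPT=` ends the block.
    pattern = re.compile(r"SYSTEM_PROMPT=`.*?`", re.DOTALL)
    if not pattern.search(env_text):
        raise ValueError("no backtick-delimited SYSTEM_PROMPT block found")
    # A function replacement keeps backslashes in the prompt from being
    # interpreted as regex escape sequences.
    return pattern.sub(lambda _m: f"SYSTEM_PROMPT=`{prompt}`", env_text, count=1)
```

Raising when the block is missing, rather than appending a new one, keeps a malformed .env from being silently rewritten.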

Deploy with push_system_prompt.py:

python3 push_system_prompt.py            # dry run — diff only
python3 push_system_prompt.py --apply    # write + restart paperless-ai

The script filters recipient-only names (Matthias / Mathias Siebels) out of the rendered list — see RECIPIENT_EXCLUDE in the script and the matching rule at the top of the Correspondents section in SYSTEM_PROMPT.txt. If you edit either, edit both.
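
For orientation, a minimal sketch of what such a recipient filter might look like. The real shape of RECIPIENT_EXCLUDE in push_system_prompt.py is not shown here, so the set of lower-cased names and the helper `filter_recipients` are assumptions.

```python
# Assumed representation: lower-cased names of people who only ever appear as recipients.
RECIPIENT_EXCLUDE = {"matthias siebels", "mathias siebels"}


def filter_recipients(names: list) -> list:
    """Drop recipient-only names before the list is rendered into the prompt."""
    return [
        name for name in names
        if name.strip().casefold() not in RECIPIENT_EXCLUDE
    ]
```

Comparing on a stripped, case-folded form catches spelling variants in capitalisation and stray whitespace, but any new variant of the name still has to be added both here and in the prompt rule.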

The previous live .env is kept on mDock as .env.bak.<ts> next to the new one for rollback.
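
The backup-and-rollback convention can be sketched like this. The timestamp format is inferred from the backup name .env.bak.20260521T092606 in the commit message; the use of UTC and the helper names `backup_env` and `rollback_env` are assumptions, not taken from the script.

```python
from __future__ import annotations

import shutil
from datetime import datetime, timezone
from pathlib import Path


def backup_env(env_path: Path, now: datetime | None = None) -> Path:
    """Copy the live .env to .env.bak.<ts> in the same directory and return the backup path."""
    now = now or datetime.now(timezone.utc)  # assumption: timestamps are UTC
    backup = env_path.with_name(f"{env_path.name}.bak.{now:%Y%m%dT%H%M%S}")
    shutil.copy2(env_path, backup)
    return backup


def rollback_env(env_path: Path, backup: Path) -> None:
    """Restore a previous .env from its backup copy."""
    shutil.copy2(backup, env_path)
```

After a rollback the paperless-ai container presumably needs a restart, in the same way that --apply restarts it after writing.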