4 Commits

Author SHA1 Message Date
mAi
90142396d8 mAi: #2 - mdms-mover: strip blank pages from duplex scans
Two changes:

1. Migrate the mover from m/otto (commit 9974937, otto#438) into this repo
   at infra/mdms-mover/ (mover.sh, mdms-mover.service, mdms-mover.timer,
   README.md). Matches the live deployment on mDock byte-for-byte, apart
   from the strip step below.

2. Add blank-page stripping before the inbox → toprocess promotion. A
   page is dropped iff its embedded text is empty AND the fraction of
   near-white pixels in its rendered thumbnail is >= MDMS_BLANK_THRESHOLD
   (default 0.97 per issue #2). This catches the empty backside of patch-T
   separator sheets in duplex scans (mDMS#2).
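
The drop rule in item 2 can be sketched as a pure function. This is a minimal sketch, not the actual strip_blank_pages.py. It rests on a few assumptions that are not from the commit. The thumbnail arrives as 8-bit grayscale bytes; with PyMuPDF these could come from the `samples` attribute of a pixmap rendered with `colorspace=pymupdf.csGRAY`. "Near-white" means a value of at least 230 out of 255. The names `is_blank_page`, `blank_threshold` and `WHITE_LEVEL` are hypothetical.

```python
import os

# Hypothetical cut-off for "near-white" on a 0-255 grayscale value (assumption).
WHITE_LEVEL = 230


def blank_threshold() -> float:
    """Fraction of near-white pixels needed to call a page blank (default 0.97)."""
    return float(os.environ.get("MDMS_BLANK_THRESHOLD", "0.97"))


def is_blank_page(text: str, gray_samples: bytes, threshold: float = 0.97) -> bool:
    """Drop a page iff it has no embedded text AND enough near-white pixels."""
    if text.strip():
        return False  # any embedded text means the page is kept
    if not gray_samples:
        return False  # nothing rendered: keep the page rather than guess
    near_white = sum(1 for value in gray_samples if value >= WHITE_LEVEL)
    return near_white / len(gray_samples) >= threshold
```

The function checks text first because that test is cheap and skips the pixel count for any page that has a text layer.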

strip_blank_pages.py uses PyMuPDF as its only Python dependency: a single
self-contained wheel, so no `poppler-utils` apt install is needed on mdock.
It mirrors the uv inline-deps single-file pattern of infra/paperless/generate_separator.py.

Edge cases:
- 1-page input: strip skipped entirely.
- All pages would drop: script exits 2, mover keeps file in inbox and
  logs WARNING (no empty doc reaches Paperless).
- Strip script errors: mover falls back to plain mv, so no scan is blocked.
- MDMS_STRIP_BLANK=false: bypass strip entirely (emergency disable).

Deploy: rsync the uv binary to mdock ~/.local/bin/uv (single static binary,
user-space, no apt), scp script + units, then systemctl --user daemon-reload.
Verified live with synthetic test PDFs: 4 pages (2 real + 1 blank + 1 real
→ 3 pages), 1 page (unchanged), and all blank (kept in inbox + warning).
The timer still fires every ~70s.
2026-05-16 17:57:26 +02:00
mAi
862bc76a2b Merge mai/hermes/issue-1-scan-stack-multi: Paperless barcode-splitter (#1) 2026-05-16 15:56:30 +02:00
mAi
061ea424ad mAi: #1 - Paperless-ngx barcode splitter enabled (Patch-T)
PAPERLESS_CONSUMER_ENABLE_BARCODES=true + DELETE_PAGES=true live on mDock,
committed in parallel to the m/paperless docker-compose.yml (source of truth)
(see m/paperless commit 8c1ca3f).

New:
- infra/paperless/generate_separator.py: Code-128 PATCHT generator (uv inline deps)
- infra/paperless/separator-patchT.pdf: printable separator sheet
- docs/strategy.md: new section "Multi-page scan + automatic splitting"

Test 2026-05-16: a stack of 3 fake letters (2 + 1 + 1 pages) with a
PATCHT separator between each → 3 separate Paperless documents with
correct page counts, separator sheets discarded. Test documents deleted
afterwards.
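
The split observed in this test can be modelled in a few lines. This is not Paperless-ngx's implementation. Barcode detection is stubbed as a predicate, separator pages are dropped to mirror the "separator sheets discarded" result, and `split_on_separators` is a hypothetical name.

```python
from collections.abc import Callable, Sequence
from typing import TypeVar

T = TypeVar("T")


def split_on_separators(pages: Sequence[T], is_separator: Callable[[T], bool]) -> list[list[T]]:
    """Split a scanned stack into documents at separator pages, discarding the separators."""
    documents: list[list[T]] = []
    current: list[T] = []
    for page in pages:
        if is_separator(page):
            if current:  # consecutive separators do not create empty documents
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents
```

For the 2 + 1 + 1 stack, `split_on_separators(["a1", "a2", "SEP", "b1", "SEP", "c1"], lambda p: p == "SEP")` returns `[["a1", "a2"], ["b1"], ["c1"]]`. In this model, the blank backside of a separator sheet in a duplex scan is not a separator. It would therefore become the first page of the next document, which is the case the later blank-page stripping (mDMS#2) addresses.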

Closes: nothing (m closes issues themselves, via the "done" label set through the API)
2026-05-16 15:52:53 +02:00
m
2aa532e717 chore: initial commit — spinout from m/otto
Spun out mDMS strategy + tooling from m/otto into its own repo on 2026-05-15.

Migrated:
- docs/strategy.md (was: m/otto:docs/mdms-strategy.md)
- infra/paperless/ (config + audit + migrate scripts)
- infra/samba-canon/ (Canon MB5100 SMB1 bridge container)

History in m/otto: issues #429–#438. Going forward, all mDMS issues are
filed here. The sibling repo m/paperless remains the bare Docker Compose
setup for Paperless-ngx itself.
2026-05-15 17:31:20 +02:00