Two changes:

1. Migrate mover from m/otto (commit 9974937, otto#438) into this repo at
   infra/mdms-mover/. mover.sh, mdms-mover.service, mdms-mover.timer,
   README.md. Matches the live deployment on mDock byte-for-byte (modulo
   the strip step below).

2. Add blank-page stripping before the inbox → toprocess promotion. A page
   is dropped iff its embedded text is empty AND its rendered thumbnail is
   >= MDMS_BLANK_THRESHOLD near-white pixels (default 0.97 per issue #2).
   Detects the empty backside of patch-T separator sheets in duplex scans
   (mDMS#2).

strip_blank_pages.py uses PyMuPDF as the only Python dep: single
self-contained wheel, no `poppler-utils` apt-install on mdock. Mirrors the
uv-inline-deps single-file pattern of infra/paperless/generate_separator.py.

Edge cases:
- 1-page input: strip skipped entirely.
- All pages would drop: script exits 2, mover keeps file in inbox and logs
  WARNING (no empty doc reaches Paperless).
- Strip script errors: mover falls back to plain mv, no scan blocked.
- MDMS_STRIP_BLANK=false: bypass strip entirely (emergency disable).

Deploy: rsync uv binary to mdock ~/.local/bin/uv (single static binary,
user-space, no apt), scp script + units, systemctl --user daemon-reload.

Verified live with synthetic 4-page (2 real + 1 blank + 1 real → 3 pages),
1-page (unchanged), all-blank (kept in inbox + warning) test PDFs. Timer
fires every ~70s as before.
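The blank-page rule and its edge cases can be sketched in plain Python. This is not strip_blank_pages.py: it skips PDF handling entirely and works on already-extracted text plus a greyscale thumbnail. The `Page` type, the function names, the `WHITE_CUTOFF` grey level, treating whitespace-only text as empty, and exit code 0 for success are all assumptions made for illustration. Only the 0.97 default and exit code 2 come from the description above.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

DEFAULT_THRESHOLD = 0.97  # default of MDMS_BLANK_THRESHOLD per the description
WHITE_CUTOFF = 245        # assumption: grey level (0-255) counted as near-white


@dataclass
class Page:
    """Hypothetical stand-in for one PDF page."""
    text: str               # embedded text layer
    pixels: Sequence[int]   # greyscale thumbnail, one 0-255 value per pixel


def near_white_fraction(pixels: Sequence[int], cutoff: int = WHITE_CUTOFF) -> float:
    """Share of thumbnail pixels at or above the near-white cutoff."""
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if p >= cutoff) / len(pixels)


def is_blank(page: Page, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Blank iff there is no embedded text AND the thumbnail is mostly near-white."""
    # Assumption: whitespace-only text counts as empty.
    return not page.text.strip() and near_white_fraction(page.pixels) >= threshold


def plan_strip(pages: List[Page],
               threshold: float = DEFAULT_THRESHOLD) -> Tuple[int, List[int]]:
    """Return (exit_code, indices of pages to keep)."""
    if len(pages) <= 1:
        return 0, list(range(len(pages)))  # 1-page input: strip skipped entirely
    kept = [i for i, page in enumerate(pages) if not is_blank(page, threshold)]
    if not kept:
        return 2, []                       # all pages would drop: exit 2
    return 0, kept
```

Requiring both conditions means a scanned page without a text layer is kept unless it is also visually empty, and a near-white page that carries embedded text is never dropped.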
mDMS
m's document management — Paperless-ngx + AI-classification pipeline, Canon scanner SMB bridge, strategy + tooling.
Spun out from m/otto on 2026-05-15 — issues #429–#438 in m/otto are the
provenance trail. Going forward, all mDMS work lives here.
Layout
mDMS/
├── docs/
│   └── strategy.md      # Taxonomy, layout, conventions (the bible)
├── infra/
│   ├── paperless/       # Paperless-AI config: SYSTEM_PROMPT, audit scripts,
│   │                    # migrate_types.py, deploy docker-compose
│   └── samba-canon/     # SMB1 bridge container for Canon MB5100 scanner
│                        # (host-network + nmbd, SMB1+NTLMv1 for old printer)
└── README.md
Components
Paperless-ngx (deployment)
Compose lives in m/paperless (separate repo). That repo is the
deployment artifact — ~/paperless/ on mDock is its checkout. This repo
(m/mDMS) tracks the AI classification layer that sits on top of
Paperless-ngx (infra/paperless/SYSTEM_PROMPT.txt, the type/tag/
correspondent migration scripts, the audit pipeline).
Paperless-AI
Runs on mdock:3077 in front of Paperless-ngx (mdock:8777). Classifies
each ingested document into one of the 10 canonical types and ≤2 of the
13 canonical tags. The system prompt + the migration scripts in
infra/paperless/ are the source of truth — keep this repo and the
live Paperless-AI aidata/.env in sync.
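The type and tag constraint above lends itself to a small sanity check, for instance in an audit script. The sketch below is only an illustration: `check_classification` is a hypothetical helper, and the type and tag names are placeholders, since the real canonical lists are not reproduced in this README.

```python
from typing import Iterable, List, Set

MAX_TAGS = 2  # at most 2 canonical tags per document

# Placeholder names only; the real 10 types and 13 tags are not listed here.
CANONICAL_TYPES: Set[str] = {f"type-{i}" for i in range(1, 11)}
CANONICAL_TAGS: Set[str] = {f"tag-{i}" for i in range(1, 14)}


def check_classification(doc_type: str, tags: Iterable[str],
                         canonical_types: Set[str] = CANONICAL_TYPES,
                         canonical_tags: Set[str] = CANONICAL_TAGS) -> List[str]:
    """Return a list of problems; an empty list means the result is valid."""
    tag_list = list(tags)
    problems: List[str] = []
    if doc_type not in canonical_types:
        problems.append(f"unknown type: {doc_type}")
    if len(tag_list) > MAX_TAGS:
        problems.append(f"too many tags: {len(tag_list)} > {MAX_TAGS}")
    for tag in tag_list:
        if tag not in canonical_tags:
            problems.append(f"unknown tag: {tag}")
    return problems
```

Returning every problem at once, rather than raising on the first one, lets a single audit run report all violations for a document.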
Canon SMB bridge
infra/samba-canon/ is the host-network Samba 4.10 container on mDock
that the Canon MB5100 scans to. Files land in /mnt/mdms/inbox/ (NFS
from mTrueNAS) and Paperless polls every 60s. The two-stage inbox
(staging dir + age-gated mover) lives separately under ~/mdms-mover/
on mDock — see m/otto issue #438.
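The age gate of the two-stage inbox can be sketched as follows. This is a Python illustration of the idea, not the deployed mover. The function name `promote_settled` and the 60-second `MIN_AGE_SECONDS` are assumptions, since the actual age threshold is not stated here.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

MIN_AGE_SECONDS = 60.0  # hypothetical threshold; the real value is not given here


def promote_settled(inbox: Path, toprocess: Path,
                    min_age: float = MIN_AGE_SECONDS,
                    now: Optional[float] = None) -> List[str]:
    """Move files whose mtime is at least min_age seconds old; return their names."""
    now = time.time() if now is None else now
    moved: List[str] = []
    for entry in sorted(inbox.iterdir()):
        if not entry.is_file():
            continue
        if now - entry.stat().st_mtime < min_age:
            continue  # recently modified: the scanner may still be writing it
        shutil.move(str(entry), str(toprocess / entry.name))
        moved.append(entry.name)
    return moved
```

Gating on modification time keeps a half-written scan out of the consume directory: a file is promoted only once it has stopped changing for the chosen interval.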
Data
NFS-mounted from mTrueNAS: /mnt/mPool/mdms/ → /mnt/mdms/ on all
consumers. Layout:
/mnt/mPool/mdms/
├── inbox/ # SMB scanner target (Canon writes here)
├── toprocess/ # Age-gated staging → Paperless consumes here
├── paperless/ # Paperless storage (post-ingest)
├── archive/ # Long-term archive
├── templates/ # Document templates
└── export/ # Manual exports
Reference
- docs/strategy.md: full strategy, taxonomy decisions, type/tag rationale
- m/otto issues #429–#438: original implementation history
- m/paperless: the bare Paperless-ngx Docker Compose setup