Two changes:

1. Migrate mover from m/otto (commit 9974937, otto#438) into this repo at
   infra/mdms-mover/: mover.sh, mdms-mover.service, mdms-mover.timer,
   README.md. Matches the live deployment on mDock byte-for-byte (modulo
   the strip step below).

2. Add blank-page stripping before the inbox → toprocess promotion. A page
   is dropped iff its embedded text is empty AND its rendered thumbnail is
   >= MDMS_BLANK_THRESHOLD near-white pixels (default 0.97 per issue #2).
   Detects the empty backside of patch-T separator sheets in duplex scans
   (mDMS#2).

strip_blank_pages.py uses PyMuPDF as the only Python dep (a single
self-contained wheel, no `poppler-utils` apt-install on mdock). Mirrors the
uv-inline-deps single-file pattern of infra/paperless/generate_separator.py.

Edge cases:
- 1-page input: strip skipped entirely.
- All pages would drop: script exits 2, mover keeps file in inbox and logs
  WARNING (no empty doc reaches Paperless).
- Strip script errors: mover falls back to plain mv, no scan blocked.
- MDMS_STRIP_BLANK=false: bypass strip entirely (emergency disable).

Deploy: rsync uv binary to mdock ~/.local/bin/uv (single static binary,
user-space, no apt), scp script + units, systemctl --user daemon-reload.

Verified live with synthetic 4-page (2 real + 1 blank + 1 real → 3 pages),
1-page (unchanged), all-blank (kept in inbox + warning) test PDFs. Timer
fires every ~70s as before.
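The drop rule and its edge cases, as described in the commit message above, can be sketched in a few lines of shell. This is only an illustration of the decision logic, not the real implementation: the actual detection lives in strip_blank_pages.py (PyMuPDF) and is not shown here. The function name `select_pages` and its input format (one line per page with an embedded-text character count and a near-white pixel fraction) are assumptions made for this example.

```shell
#!/bin/bash
# Hypothetical helper, not part of mover.sh or strip_blank_pages.py.
# stdin : one line per page, "<text_chars> <white_fraction>" (assumed format).
# stdout: 1-based numbers of the pages to keep, one per line.
# exit  : 0 = pages selected, 2 = every page would be dropped.
select_pages() {
    local threshold="${MDMS_BLANK_THRESHOLD:-0.97}"
    awk -v t="$threshold" '
        { text[NR] = $1; white[NR] = $2 }
        END {
            # Single-page input: stripping is skipped entirely.
            if (NR == 1) { print 1; exit 0 }
            kept = 0
            for (i = 1; i <= NR; i++) {
                # Blank iff there is no embedded text AND the page is near-white.
                blank = ((text[i] + 0) == 0 && (white[i] + 0) >= (t + 0))
                if (!blank) { print i; kept++ }
            }
            # Nothing left to keep: signal "all blank" to the caller.
            if (kept == 0) exit 2
        }'
}

# Mirrors the synthetic 4-page test: 2 real + 1 blank + 1 real.
printf '120 0.41\n95 0.52\n0 0.99\n88 0.47\n' | select_pages
```

Keeping the "all blank" case on a dedicated exit status lets a caller tell it apart from a genuine failure, which is what allows the mover to hold the file in the inbox instead of passing it through.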
94 lines
2.9 KiB
Bash
Executable File
#!/bin/bash
# mdms-mover: move stable files from /mnt/mdms/inbox → /mnt/mdms/toprocess.
#
# A file is "stable" when it satisfies BOTH conditions:
#   1. mtime older than MDMS_MIN_AGE_MIN minutes (default 3).
#   2. size unchanged since the previous run (recorded in STATE).
#
# This protects Paperless from ingesting half-written scans dropped by the
# Canon MB5100 via SMB. See otto#438, mDMS#2.
#
# When MDMS_STRIP_BLANK=true (default) and the file is a PDF, blank pages
# are stripped before promotion (mDMS#2). Empty backsides of patch-T
# separators from duplex scans land here. See strip_blank_pages.py for the
# detection heuristic.

set -euo pipefail

INBOX="${MDMS_INBOX:-/mnt/mdms/inbox}"
TOPROCESS="${MDMS_TOPROCESS:-/mnt/mdms/toprocess}"
STATE="${MDMS_STATE:-$HOME/.local/state/mdms-mover/state.tsv}"
MIN_AGE_MIN="${MDMS_MIN_AGE_MIN:-3}"
STRIP_BLANK="${MDMS_STRIP_BLANK:-true}"

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
STRIP_SCRIPT="${MDMS_STRIP_SCRIPT:-$SCRIPT_DIR/strip_blank_pages.py}"

mkdir -p "$TOPROCESS" "$(dirname "$STATE")"
touch "$STATE"

NEW_STATE=$(mktemp)
trap 'rm -f "$NEW_STATE"' EXIT

# Promote a single stable file from inbox into toprocess, blank-stripping
# PDFs when enabled. Returns silently; logs go through logger(1).
promote() {
    local src="$1" name="$2" size="$3"
    local ext="${name##*.}"
    local dest="$TOPROCESS/$name"

    # Plain move when stripping is disabled, the file is not a PDF, or the
    # strip script is missing or not executable.
    if [[ "$STRIP_BLANK" != "true" || "${ext,,}" != "pdf" || ! -x "$STRIP_SCRIPT" ]]; then
        if mv -n "$src" "$dest" 2>/dev/null; then
            logger -t mdms-mover "moved $name ($size bytes)"
        fi
        return
    fi

    # Stage stripped output inside toprocess (same filesystem → atomic rename).
    # Dotfile prefix so Paperless's consumer ignores the partial during write.
    local tmpout="$TOPROCESS/.mdms-tmp.$$.$name"
    local rc=0
    "$STRIP_SCRIPT" "$src" "$tmpout" || rc=$?

    case "$rc" in
        0)
            # Log success only if the rename and the source cleanup both worked.
            if mv -f "$tmpout" "$dest" && rm -f "$src"; then
                logger -t mdms-mover "moved $name ($size bytes, strip ok)"
            else
                rm -f "$tmpout"
                logger -t mdms-mover "WARNING: promoting stripped $name failed, source left in inbox"
            fi
            ;;
        2)
            rm -f "$tmpout"
            logger -t mdms-mover "WARNING: $name appears all-blank, kept in inbox"
            ;;
        *)
            rm -f "$tmpout"
            logger -t mdms-mover "strip failed for $name (rc=$rc), passing through unchanged"
            if mv -n "$src" "$dest" 2>/dev/null; then
                logger -t mdms-mover "moved $name ($size bytes, unstripped)"
            fi
            ;;
    esac
}

# Iterate top-level regular files older than MIN_AGE_MIN minutes.
# Skip dotfiles (probe files, scanner temp markers like ._foo, our .mdms-tmp.*).
while IFS= read -r f; do
    name=$(basename "$f")
    case "$name" in
        .*) continue ;;
    esac

    if ! size=$(stat -c %s "$f" 2>/dev/null); then
        continue
    fi

    # STATE is tab-separated; split on tabs only so names with spaces match.
    prev=$(awk -F'\t' -v n="$name" '$1==n {print $2; exit}' "$STATE")
    printf '%s\t%s\n' "$name" "$size" >> "$NEW_STATE"

    if [[ -n "$prev" && "$size" == "$prev" ]]; then
        promote "$f" "$name" "$size"
    fi
done < <(find "$INBOX" -maxdepth 1 -type f -mmin "+$MIN_AGE_MIN")

mv "$NEW_STATE" "$STATE"
trap - EXIT
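
The "size unchanged since the previous run" condition can be tried in isolation. Below is a minimal sketch, assuming the same tab-separated `name<TAB>size` state format that the script writes. The function `size_unchanged`, its argument order, and the sample file names are hypothetical and do not exist in mover.sh.

```shell
#!/bin/bash
# Hypothetical stand-alone version of the size comparison.
# Returns 0 when NAME is recorded in STATE_FILE with the same SIZE.
size_unchanged() {
    local state_file="$1" name="$2" size="$3" prev
    # Split on tabs only, so a name containing spaces stays in one field.
    prev=$(awk -F'\t' -v n="$name" '$1 == n { print $2; exit }' "$state_file")
    # An empty prev means the file was not seen on the previous run.
    [[ -n "$prev" && "$size" == "$prev" ]]
}

# Fake "previous run" state with two recorded scans.
state=$(mktemp)
printf '%s\t%s\n' "scan 0001.pdf" 52341 "scan 0002.pdf" 1000 > "$state"

size_unchanged "$state" "scan 0001.pdf" 52341 && echo "stable"
size_unchanged "$state" "scan 0002.pdf" 2048  || echo "still growing"
size_unchanged "$state" "scan 0003.pdf" 10    || echo "first sighting"
rm -f "$state"
```

A file seen for the first time is never promoted, so every scan waits at least one extra timer interval on top of the mtime age check before it reaches toprocess.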