Merge mai/hermes/issue-1-scan-stack-multi: Paperless barcode-splitter (#1)
@@ -232,6 +232,52 @@ Paperless-AI (port 3077) is meant to take over classification. Configure with:

---

## Multi-page scan + automatic splitting (barcode separator)

Issue: #1 (mDMS). Active since 2026-05-16.

Today, an ADF scan of a stack containing several letters arrives in `inbox/` as **one** PDF. To have Paperless turn it into N separate documents, we use the built-in **barcode splitter** with a separator page between the letters; that page carries a Code 128 barcode encoding `PATCHT`.

### Setup (one-time)

Live on mDock (`~/paperless/docker-compose.yml`, `webserver` service):

```yaml
PAPERLESS_CONSUMER_ENABLE_BARCODES: "true"
PAPERLESS_CONSUMER_BARCODE_DELETE_PAGES: "true"
```

The default separator string `PATCHT`, the `pyzbar` scanner engine, and 300 DPI are all fine as defaults.
`DELETE_PAGES` ensures that the separator pages do **not** end up in the finished document.

### Separator page (Patch-T)

- PDF: `infra/paperless/separator-patchT.pdf` (1 page, A4, Code 128 encoding `PATCHT`)
- Generator: `infra/paperless/generate_separator.py` (uv inline dependencies, reportlab + python-barcode)
- Print 10–20 copies, stack them next to the scanner, reprint as needed

### Workflow

1. Assemble the stack: `[Letter A]` `[PATCHT]` `[Letter B]` `[PATCHT]` `[Letter C]` …
2. Put the whole stack into the ADF and scan to SMB as a single PDF
3. The PDF lands in `~/mDMS/inbox/` (SMB) → mdms-mover moves it to `toprocess/`
4. Paperless detects each `PATCHT` page, splits there, discards the separator pages, then runs OCR and classification per sub-document

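The splitting in step 4 can be pictured with a minimal sketch. This is not Paperless-ngx's actual implementation; the function name `split_on_separators` and the per-page list of decoded barcode values are assumptions made purely for illustration.

```python
from typing import Optional

SEPARATOR = "PATCHT"  # default separator string named in the setup section


def split_on_separators(
    page_barcodes: list[Optional[str]], separator: str = SEPARATOR
) -> list[list[int]]:
    """Group 0-based page indices into documents, dropping separator pages."""
    documents: list[list[int]] = []
    current: list[int] = []
    for index, value in enumerate(page_barcodes):
        if value == separator:
            # A separator closes the running document and is itself discarded.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(index)
    if current:
        documents.append(current)
    return documents


# Letter A (2 pages), separator, letter B, separator, letter C
stack = [None, None, "PATCHT", None, "PATCHT", None]
print(split_on_separators(stack))  # [[0, 1], [3], [5]]
```

A stack without any separator page, such as `[None, None, None]`, comes back as a single document, which is why a forgotten separator sheet merges two letters.
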
### Test (2026-05-16)

- Constructed: `TEST-A` (2 pages) + `PATCHT` + `TEST-B` (1) + `PATCHT` + `TEST-C` (1) = 6-page PDF
- Dropped into `/mnt/mdms/toprocess/mdms-issue1-test-stack.pdf`
- Result: 3 separate Paperless documents (`_document_0/1/2`) with page counts 2/1/1; the separator pages are gone
- Log confirmation: `[paperless.barcodes] Created new task ... for _document_X.pdf` + `BarcodePlugin requested task exit: Barcode splitting complete!`
- Test documents deleted from Paperless after verification

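The log check could be scripted roughly as follows. This is a hedged sketch: the helper `summarize_split` is hypothetical, and the sample lines are modelled only on the two messages quoted above, with the elided `...` left as-is. Real log lines will carry timestamps and further detail.

```python
import re

# Pattern built from the message quoted in the test notes; the middle part is elided there.
CREATED = re.compile(
    r"\[paperless\.barcodes\] Created new task .* for (_document_\d+\.pdf)"
)
DONE = "Barcode splitting complete!"


def summarize_split(log_text: str) -> tuple[list[str], bool]:
    """Return the sub-document names seen in the log and whether splitting finished."""
    names = CREATED.findall(log_text)
    return names, DONE in log_text


# Illustrative sample, not a real log excerpt.
sample_log = "\n".join([
    "[paperless.barcodes] Created new task ... for _document_0.pdf",
    "[paperless.barcodes] Created new task ... for _document_1.pdf",
    "[paperless.barcodes] Created new task ... for _document_2.pdf",
    "BarcodePlugin requested task exit: Barcode splitting complete!",
])
names, finished = summarize_split(sample_log)
print(len(names), finished)  # 3 True
```
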
### Edge cases / follow-ups

- **Missing separator page** → the two affected letters end up as one document. Split manually in the Paperless UI ("Split Document") or rescan.
- **Separator page is crumpled / skewed** → Code 128 with `quiet_zone=8` is tolerant; if the detection rate drops, increase `module_width` in `generate_separator.py` and reprint.
- **Option B (blank-page detection in the mover)** and **Option C (LLM-based semantic splitting)** are documented in issue #1; they will only be tackled if Patch-T pages turn out to be a nuisance in practice.
- **ASN barcodes** (`PAPERLESS_CONSUMER_ENABLE_ASN_BARCODE`) are orthogonal: a separate feature for archive serial numbers, not enabled.

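To get a feel for what a larger `module_width` buys, here is a rough calculation. It assumes the whole string is encoded in a single Code 128 code set (start, one symbol per character, check symbol, stop) and ignores the rescaling that `drawImage` applies when the barcode is fitted into its box, so the printed width may differ from these nominal numbers. The value 0.8 is a hypothetical bump, not a tested setting.

```python
MODULES_PER_SYMBOL = 11  # every Code 128 symbol character is 11 modules wide
STOP_MODULES = 13        # the stop pattern is 13 modules wide


def code128_modules(data: str) -> int:
    """Module count for `data` in one code set: start + data + check + stop."""
    return MODULES_PER_SYMBOL * (1 + len(data) + 1) + STOP_MODULES


def pixels_per_module(module_width_mm: float, dpi: int = 300) -> float:
    """Scanner pixels covering the narrowest bar at the given resolution."""
    return module_width_mm * dpi / 25.4


modules = code128_modules("PATCHT")  # 101 modules
for width_mm in (0.6, 0.8):  # 0.6 is the script's value; 0.8 is hypothetical
    bar_span_mm = modules * width_mm
    print(
        f"module_width={width_mm} mm: {modules} modules, "
        f"{bar_span_mm:.1f} mm of bars, "
        f"{pixels_per_module(width_mm):.1f} px per module at 300 DPI"
    )
```

More pixels per module give the decoder more margin against skew and creases, at the cost of a wider barcode.
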
---

## Migration: step by step

### Phase 1: TrueNAS Setup ✓ DONE

95
infra/paperless/generate_separator.py
Executable file
@@ -0,0 +1,95 @@
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "reportlab>=4.0",
#     "python-barcode>=0.15",
# ]
# ///
"""Generate a Paperless-ngx separator page (PATCHT Code-128 barcode).

Run: ./generate_separator.py [out.pdf]
Default output: separator-patchT.pdf next to this script.

m prints the resulting PDF and lays one sheet between each document
in a multi-letter scan stack. Paperless-ngx (with
PAPERLESS_CONSUMER_ENABLE_BARCODES=true) detects the barcode and
splits the consumed PDF into separate documents.
"""

import sys
from pathlib import Path

from barcode import Code128
from barcode.writer import ImageWriter
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import cm
from reportlab.pdfgen import canvas

SEPARATOR_STRING = "PATCHT"


def main(out_path: Path) -> None:
    page_w, page_h = A4

    # Render barcode to a temporary PNG (reportlab needs an image source).
    # python-barcode appends ".png" to the given name and returns the full path.
    tmp_png = out_path.with_suffix(".tmp")
    barcode = Code128(SEPARATOR_STRING, writer=ImageWriter())
    barcode_path = barcode.save(
        str(tmp_png),
        options={
            "module_width": 0.6,  # mm per narrowest bar; bigger = easier barcode detection
            "module_height": 30.0,  # mm tall
            "font_size": 14,
            "text_distance": 5,
            "quiet_zone": 8,
            "write_text": True,
        },
    )

    c = canvas.Canvas(str(out_path), pagesize=A4)

    # Header
    c.setFont("Helvetica-Bold", 36)
    c.drawCentredString(page_w / 2, page_h - 4 * cm, "SEPARATOR")
    c.setFont("Helvetica", 14)
    # German hint printed on the sheet: "Place between two documents; Paperless splits here."
    c.drawCentredString(page_w / 2, page_h - 5 * cm,
                        "Zwischen zwei Dokumente legen — Paperless splittet hier.")

    # Light grey horizontal guide line for easy alignment when scanning
    c.setLineWidth(1)
    c.setStrokeGray(0.7)
    c.line(page_w / 2 - 4 * cm, page_h - 6 * cm, page_w / 2 + 4 * cm, page_h - 6 * cm)

    # Barcode: large, centred, scaled into a 14 cm x 5 cm box
    img_w, img_h = 14 * cm, 5 * cm
    c.drawImage(
        barcode_path,
        (page_w - img_w) / 2,
        (page_h - img_h) / 2 - 1 * cm,
        width=img_w,
        height=img_h,
        preserveAspectRatio=True,
        mask="auto",
    )

    # Footer
    c.setFont("Helvetica", 10)
    c.setFillGray(0.3)
    c.drawCentredString(page_w / 2, 4 * cm,
                        f"Code-128 · {SEPARATOR_STRING} · Paperless-ngx Barcode-Splitter")
    c.drawCentredString(page_w / 2, 3 * cm,
                        "Detection: PAPERLESS_CONSUMER_ENABLE_BARCODES=true")
    c.drawCentredString(page_w / 2, 2.4 * cm,
                        "Generator: infra/paperless/generate_separator.py (mDMS)")

    c.showPage()
    c.save()

    # Remove the temporary barcode image once the PDF is written.
    Path(barcode_path).unlink(missing_ok=True)
    print(f"wrote {out_path}")


if __name__ == "__main__":
    out = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(__file__).parent / "separator-patchT.pdf"
    main(out)
85
infra/paperless/separator-patchT.pdf
Normal file
File diff suppressed because one or more lines are too long