mAi: #1 - Paperless-ngx barcode splitter enabled (Patch-T)
PAPERLESS_CONSUMER_ENABLE_BARCODES=true + DELETE_PAGES=true are live on mDock and committed in parallel to the m/paperless docker-compose.yml (source of truth; see m/paperless commit 8c1ca3f).

New:
- infra/paperless/generate_separator.py: Code-128 PATCHT generator (uv inline deps)
- infra/paperless/separator-patchT.pdf: printable separator page
- docs/strategy.md: new section "Multi-page scan + automatic splitting"

Test 2026-05-16: a stack of 3 fake letters (2 + 1 + 1 pages) with PATCHT separators in between → 3 separate Paperless documents with correct page counts, separator pages discarded. Test documents deleted again.

Closes: nothing (m closes issues themselves; label "done" via API)
@@ -232,6 +232,52 @@ Paperless-AI (port 3077) should take over classification. Configure with:

---

## Multi-page scan + automatic splitting (barcode separator)

Issue: #1 (mDMS). Active since 2026-05-16.

Today, an ADF scan of a stack of several letters arrives as **one** PDF in `inbox/`. To have Paperless turn it into N separate documents, we use the built-in **barcode splitter** with a Code 128 Patch-T page between the letters.

### Setup (one-time)

Live on mDock (`~/paperless/docker-compose.yml`, webserver service):
```yaml
PAPERLESS_CONSUMER_ENABLE_BARCODES: "true"
PAPERLESS_CONSUMER_BARCODE_DELETE_PAGES: "true"
```
Default separator string `PATCHT`, scanner engine `pyzbar`, DPI 300: the defaults are fine for all of these.
`DELETE_PAGES` ensures that the separator pages do **not** end up in the finished document.

### Separator page (Patch-T)

- PDF: `infra/paperless/separator-patchT.pdf` (1 page A4, Code 128 with `PATCHT`)
- Generator: `infra/paperless/generate_separator.py` (uv inline deps, reportlab + python-barcode)
- Print 10–20 copies, stack them next to the scanner, reprint as needed

### Workflow

1. Assemble the stack: `[Letter A]` `[PATCHT]` `[Letter B]` `[PATCHT]` `[Letter C]` …
2. Put the whole stack into the ADF and scan to SMB as a single PDF
3. The PDF lands in `~/mDMS/inbox/` (SMB) → mdms-mover moves it to `toprocess/`
4. Paperless detects each `PATCHT` page, splits there, discards the separator pages, then runs OCR and classification per sub-document
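
Step 4 can be pictured as a simple partition of the page sequence. The following is a minimal sketch of the behavior described above, not Paperless-ngx's actual implementation: `split_on_separator` is a hypothetical helper, and each page is represented only by the barcode text found on it (or `None` if the page carries no barcode).

```python
from typing import Optional, Sequence

SEPARATOR = "PATCHT"  # separator string used by the Patch-T page in this setup


def split_on_separator(
    page_barcodes: Sequence[Optional[str]],
    separator: str = SEPARATOR,
) -> list[list[int]]:
    """Group page indices into documents, dropping separator pages.

    Hypothetical illustration only: a page whose barcode equals
    `separator` ends the current document and is itself discarded.
    """
    documents: list[list[int]] = []
    current: list[int] = []
    for index, barcode in enumerate(page_barcodes):
        if barcode == separator:
            # Close the running document; skip empty ones (e.g. two
            # separators in a row or a separator on top of the stack).
            if current:
                documents.append(current)
            current = []
        else:
            current.append(index)
    if current:
        documents.append(current)
    return documents


# Stack of 2 + 1 + 1 pages with a separator sheet between the letters.
stack = [None, None, "PATCHT", None, "PATCHT", None]
print(split_on_separator(stack))  # [[0, 1], [3], [5]]
```

If a separator sheet is forgotten, the two neighbouring letters simply stay in the same group, which matches the "missing separator page" case in the edge cases.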

### Test (2026-05-16)

- Constructed: `TEST-A` (2 pages) + `PATCHT` + `TEST-B` (1) + `PATCHT` + `TEST-C` (1) = 6-page PDF
- Dropped into `/mnt/mdms/toprocess/mdms-issue1-test-stack.pdf`
- Result: 3 separate Paperless documents (`_document_0/1/2`), page counts 2/1/1, separator pages gone
- Log confirmation: `[paperless.barcodes] Created new task ... for _document_X.pdf` + `BarcodePlugin requested task exit: Barcode splitting complete!`
- Test documents deleted from Paperless after verification

### Edge cases / follow-ups

- **Missing separator page** → the two affected letters end up as one document. Split manually via the Paperless UI ("Split Document") or rescan.
- **Separator page is wrinkled / skewed** → Code 128 with `quiet_zone=8` is tolerant; if the detection rate drops, increase `module_width` in `generate_separator.py` and reprint.
- **Option B (blank-page detection in the mover)** and **Option C (LLM-based semantic splitting)** are documented in issue #1; they will only be tackled once Patch-T pages become a nuisance in practice.
- **ASN barcodes** (`PAPERLESS_CONSUMER_ENABLE_ASN_BARCODE`) are orthogonal: a separate feature for archive serial numbers, not enabled.
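
For the wrinkled-separator case, it helps to see how `module_width` translates into barcode width. A Code 128 symbol uses 11 modules per character plus a 13-module stop pattern, so `PATCHT` (start + six data characters + checksum + stop) spans 101 modules. The sketch below is a back-of-the-envelope calculation, not part of the generator: `code128_modules` and `code128_width_mm` are hypothetical helpers, and the calculation assumes one symbol per character and a `quiet_zone` given in mm per side, as in python-barcode's writer options.

```python
def code128_modules(data_chars: int) -> int:
    """Module count of a Code 128 symbol with one symbol per data character."""
    symbols = 1 + data_chars + 1  # start character + data + checksum
    return symbols * 11 + 13      # 11 modules each, plus the 13-module stop


def code128_width_mm(data_chars: int, module_width_mm: float,
                     quiet_zone_mm: float) -> float:
    """Rendered width in mm, including the quiet zone on both sides."""
    return code128_modules(data_chars) * module_width_mm + 2 * quiet_zone_mm


# Values from generate_separator.py: module_width=0.6, quiet_zone=8
width = code128_width_mm(len("PATCHT"), 0.6, 8.0)
print(code128_modules(len("PATCHT")), f"{width:.1f} mm")  # 101 76.6 mm
```

Note that `generate_separator.py` afterwards fits the rendered image into a 14 cm × 5 cm box with `preserveAspectRatio=True`, so the module width on paper can differ from the configured value; when tuning for detection, the size of that box matters as well as `module_width`.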

---

## Migration: Step by step

### Phase 1: TrueNAS Setup ✓ DONE
95
infra/paperless/generate_separator.py
Executable file
@@ -0,0 +1,95 @@
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "reportlab>=4.0",
#     "python-barcode>=0.15",
# ]
# ///
"""Generate a Paperless-ngx separator page (PATCHT Code-128 barcode).

Run: ./generate_separator.py [out.pdf]
Default output: separator-patchT.pdf next to this script.

m prints the resulting PDF and lays one sheet between each document
in a multi-letter scan stack. Paperless-ngx (with
PAPERLESS_CONSUMER_ENABLE_BARCODES=true) detects the barcode and
splits the consumed PDF into separate documents.
"""

import sys
from pathlib import Path

from barcode import Code128
from barcode.writer import ImageWriter
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import cm
from reportlab.pdfgen import canvas

SEPARATOR_STRING = "PATCHT"


def main(out_path: Path) -> None:
    page_w, page_h = A4

    # Render barcode to a temporary image file (reportlab needs an image source).
    # save() returns the path it actually wrote; that path is embedded and later deleted.
    tmp_png = out_path.with_suffix(".tmp")
    barcode = Code128(SEPARATOR_STRING, writer=ImageWriter())
    barcode_path = barcode.save(
        str(tmp_png),
        options={
            "module_width": 0.6,    # mm per module; wider bars are easier for the barcode scanner
            "module_height": 30.0,  # mm tall
            "font_size": 14,
            "text_distance": 5,
            "quiet_zone": 8,
            "write_text": True,
        },
    )

    c = canvas.Canvas(str(out_path), pagesize=A4)

    # Header (the German hint reads: "Place between two documents; Paperless splits here.")
    c.setFont("Helvetica-Bold", 36)
    c.drawCentredString(page_w / 2, page_h - 4 * cm, "SEPARATOR")
    c.setFont("Helvetica", 14)
    c.drawCentredString(page_w / 2, page_h - 5 * cm,
                        "Zwischen zwei Dokumente legen — Paperless splittet hier.")

    # Horizontal guide line below the header (visual alignment aid when scanning)
    c.setLineWidth(1)
    c.setStrokeGray(0.7)
    c.line(page_w / 2 - 4 * cm, page_h - 6 * cm, page_w / 2 + 4 * cm, page_h - 6 * cm)

    # Barcode: large, centred; scaled to fit a 14 cm x 5 cm box, aspect ratio preserved
    img_w, img_h = 14 * cm, 5 * cm
    c.drawImage(
        barcode_path,
        (page_w - img_w) / 2,
        (page_h - img_h) / 2 - 1 * cm,
        width=img_w,
        height=img_h,
        preserveAspectRatio=True,
        mask="auto",
    )

    # Footer
    c.setFont("Helvetica", 10)
    c.setFillGray(0.3)
    c.drawCentredString(page_w / 2, 4 * cm,
                        f"Code-128 · {SEPARATOR_STRING} · Paperless-ngx Barcode-Splitter")
    c.drawCentredString(page_w / 2, 3 * cm,
                        "Detection: PAPERLESS_CONSUMER_ENABLE_BARCODES=true")
    c.drawCentredString(page_w / 2, 2.4 * cm,
                        "Generator: infra/paperless/generate_separator.py (mDMS)")

    c.showPage()
    c.save()

    # Clean up the temporary barcode image
    Path(barcode_path).unlink(missing_ok=True)
    print(f"wrote {out_path}")


if __name__ == "__main__":
    out = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(__file__).parent / "separator-patchT.pdf"
    main(out)
85
infra/paperless/separator-patchT.pdf
Normal file
File diff suppressed because one or more lines are too long