onepager/tools/anti-ai-blacklist.yaml
mAi fdac496a6f mAi: #10 - Anti-AI text lint in the build
tools/anti-ai-lint.py: Python linter (stdlib + yq) that checks every
build/<domain>/index.html against the blacklist in
tools/anti-ai-blacklist.yaml. The HTML is reduced to its visible text
with html.parser (scripts and styles are ignored). Vocabulary
substrings (DE+EN, case-insensitive) and regex patterns are then matched
against that text. Severity warn = the build continues; fail = the build aborts.
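The reduce-then-match step above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code in tools/anti-ai-lint.py. RULES is a hypothetical hand-copied subset of the blacklist; the real tool reads tools/anti-ai-blacklist.yaml. The names visible_text, lint and exit_code are also illustrative.

```python
import re
from html.parser import HTMLParser

# Hypothetical hand-copied subset of tools/anti-ai-blacklist.yaml; the real
# linter reads the whole file instead of hard-coding rules.
RULES = {
    "vocab": {
        "warn": ["nahtlos", "robust", "delve", "leverage"],
        "fail": ["in today's evolving landscape"],
    },
    "patterns": [
        {
            "name": "em-dash-3-bullet",
            "regex": r"(\w[\w\s]{0,30}:\s+[^—\n]{2,80}—\s*){2,}\w[\w\s]{0,30}:",
            "severity": "warn",
        },
    ],
}


class VisibleText(HTMLParser):
    """Collects text nodes, skipping <script>/<style> bodies and comments."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # how many <script>/<style> elements are currently open

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

    # handle_comment keeps its no-op default, so HTML comments are dropped.


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts).strip()


def lint(html, rules=RULES):
    """Returns (severity, rule) pairs for every vocab term or pattern that hits."""
    text = visible_text(html)
    lowered = text.lower()
    hits = []
    for severity in ("warn", "fail"):
        for term in rules["vocab"].get(severity, []):
            if term.lower() in lowered:  # case-insensitive substring match
                hits.append((severity, term))
    for pattern in rules["patterns"]:
        if re.search(pattern["regex"], text, re.IGNORECASE):
            hits.append((pattern["severity"], pattern["name"]))
    return hits


def exit_code(hits):
    """warn hits never block; a single fail hit makes the step exit 1."""
    return 1 if any(severity == "fail" for severity, _ in hits) else 0


if __name__ == "__main__":
    page = (
        "<style>.robust {}</style>"
        "<p>Speed: fast pages — Trust: clear sources — Reach: more readers</p>"
        "<script>delve()</script>"
    )
    found = lint(page)
    print(found)             # [('warn', 'em-dash-3-bullet')]
    print(exit_code(found))  # 0
```

The sketch returns 1 on any fail hit and only reports warn hits, which mirrors the warn/fail contract described above. The words "robust" and "delve" in the demo page are not reported, because they sit inside `<style>` and `<script>`.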

Whitelist mechanisms:
- HTML comment in the markup: <!-- anti-ai-allow: term1, term2 -->
- Per site in site.yaml: anti_ai_allow: [term1, term2]
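Both whitelist sources can be folded into the same html.parser pass. The sketch makes two assumptions. First, an entry suppresses a hit when it equals the hit's vocab term or pattern name, compared case-insensitively. Second, the site.yaml list has already been loaded into a Python list. AllowCollector, allowed_terms and filter_hits are hypothetical names.

```python
import re
from html.parser import HTMLParser

# Matches the body of a marker comment such as " anti-ai-allow: term1, term2 ".
ALLOW_RE = re.compile(r"^\s*anti-ai-allow:\s*(.*?)\s*$")


class AllowCollector(HTMLParser):
    """Collects whitelisted terms from <!-- anti-ai-allow: ... --> comments."""

    def __init__(self):
        super().__init__()
        self.allowed = set()

    def handle_comment(self, data):
        match = ALLOW_RE.match(data)
        if match:
            for term in match.group(1).split(","):
                term = term.strip().lower()
                if term:
                    self.allowed.add(term)


def allowed_terms(html, site_allow=()):
    """Union of in-page markers and the per-site anti_ai_allow list."""
    collector = AllowCollector()
    collector.feed(html)
    collector.close()
    return collector.allowed | {term.lower() for term in site_allow}


def filter_hits(hits, allowed):
    """Drops (severity, rule) hits whose vocab term or pattern name is allowed."""
    return [(sev, rule) for sev, rule in hits if rule.lower() not in allowed]


if __name__ == "__main__":
    html = "<p>A robust design.</p><!-- anti-ai-allow: robust, em-dash-3-bullet -->"
    allowed = allowed_terms(html, site_allow=["leverage"])
    hits = [("warn", "robust"), ("warn", "leverage"), ("warn", "delve")]
    print(filter_hits(hits, allowed))  # [('warn', 'delve')]
```

Pattern names such as em-dash-3-bullet go through the same set as vocab terms. This matches the blacklist header, which lists a pattern name under anti_ai_allow.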

Integrated into build.sh as step 4/4, with --skip-lint for
emergencies. The Dockerfile additionally installs python3, in the
builder stage only, so the Caddy image is unaffected.

Tests via tools/test-anti-ai-lint.sh: a synthetic AI fixture is
flagged correctly, whitelists suppress hits, fail severity triggers
exit 1, and neutral text exits 0.

Initial run on 59 existing sites: 2 warn (killusion.de:
"revolutionaer" in an ironic context; kilofant.de: "robust"),
0 fail. Cleanup is a follow-up issue.

README and docs/geo-seo-guideline.md updated to document the
tool's concrete position.
2026-04-30 02:50:50 +02:00

# Anti-AI lint rules: textual fingerprints typical of LLM-generated content.
#
# Severity:
#   warn: build proceeds, message printed
#   fail: build aborts (exit 1) unless build.sh is run with --skip-lint
#
# Whitelisting matches:
#   In an HTML file:       <!-- anti-ai-allow: term -->
#                          <!-- anti-ai-allow: term1, term2 -->
#   Per site (site.yaml):  anti_ai_allow:
#                            - leverage
#                            - em-dash-3-bullet
#
# Vocab matches are case-insensitive substring matches against the visible
# text of the rendered HTML (script/style/comments stripped). Pattern matches
# are Python re regexes, case-insensitive by default, run against the same
# visible text.
#
# Source: docs/geo-seo-guideline.md §3.6 (Wikipedia AI-content signals).
vocab:
  de:
    warn:
      - "nahtlos"
      - "robust"
      - "umfassend"
      - "ganzheitlich"
      - "fungiert als"
      - "dient als Brücke"
      - "Symbiose"
      - "im Bereich der"
      - "in der heutigen schnelllebigen"
      - "ein Meilenstein"
      - "ein Beweis für"
      - "hat Spuren hinterlassen"
      - "Es ist wichtig zu erwähnen"
      - "Es ist wichtig zu beachten"
      - "bahnbrechend"
      - "revolutionär"
    fail:
      - "in der sich entwickelnden Landschaft"
      - "Herausforderungen und Zukunftsaussichten"
      - "Herausforderungen und Perspektiven"
  en:
    warn:
      - "delve"
      - "tapestry"
      - "testament"
      - "intricate"
      - "garnered"
      - "bolstered"
      - "enduring"
      - "robust"
      - "comprehensive"
      - "meticulous"
      - "interplay"
      - "pivotal"
      - "underscore"
      - "moreover"
      - "furthermore"
      - "additionally"
      - "crucial"
      - "showcasing"
      - "highlighting"
      - "leverage"
      - "streamline"
      - "holistic"
      - "seamless"
      - "unleash"
      - "ecosystem"
      - "in the realm of"
      - "dive into"
      - "It's important to note that"
      - "It is important to note that"
      - "In this article, we'll"
    fail:
      - "in today's evolving landscape"
      - "in the ever-evolving landscape"
      - "Challenges and Future Prospects"
patterns:
  - name: em-dash-3-bullet
    description: |
      Three "Word: text — Word: text — Word: …" segments in one block.
      Classic AI bullet pattern.
    regex: '(\w[\w\s]{0,30}:\s+[^—\n]{2,80}—\s*){2,}\w[\w\s]{0,30}:'
    severity: warn
  - name: not-only-but-also
    description: '"not only X, but also Y" / "nicht nur X, sondern auch Y" filler.'
    regex: '\b(?:not only|nicht nur)\b[^.;\n]{1,80}\b(?:but also|sondern auch)\b'
    severity: warn
  - name: as-an-ai
    description: Leftover AI self-disclosure.
    regex: '\b(?:as an? (?:AI|language model)|als (?:eine?\s+)?(?:KI|Sprachmodell))\b'
    severity: fail