---
language:
  - de
  - en
license: mit
base_model: deepset/gbert-base
pipeline_tag: text-classification
library_name: transformers
tags:
  - klarki
  - eu-ai-act
  - compliance
  - german
  - text-classification
  - bert
model-index:
  - name: klarki-actor-classifier
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          name: KlarKI EU AI Act Regulatory Training Data
          type: custom
        metrics:
          - type: f1
            value: 1.0
            name: Macro F1
            verified: false
---

# KlarKI — EU AI Act Article 3 Actor Classifier

Four-class text classification that identifies the Article 3 actor role of an organisation relative to an AI system.

Part of KlarKI — a local-first EU AI Act + GDPR compliance auditor for German SMEs. All inference runs on-device. No data leaves your machine.


## Model Overview

| Property | Value |
|---|---|
| Base model | `deepset/gbert-base` |
| Architecture | `BertForSequenceClassification` (Transformers) |
| Parameters | ~110M |
| Languages | German (primary), English |
| Training samples | 2,767 train / 491 validation |
| License | MIT |
| Part of | KlarKI audit pipeline |

## Quickstart

### Option A — Via KlarKI (recommended)

Use this if you want the full audit pipeline. The download script places all 5 models exactly where KlarKI expects them.

```bash
git clone https://github.com/s4nkar/KlarKI-EU-AI-Act-compliance-auditor.git
cd KlarKI-EU-AI-Act-compliance-auditor
pip install "huggingface-hub>=0.26.0"   # quote the spec so the shell doesn't treat >= as a redirect
python scripts/download_pretrained.py --model actor
./run.sh up
```

### Option B — Direct usage

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="s4nkar/klarki-actor-classifier")
result = classifier("We developed and placed the AI system on the market under our own name and brand.")
# Output: [{'label': 'provider', 'score': 0.99}]
```
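Under the hood, the pipeline tokenizes the text, runs `BertForSequenceClassification`, and softmaxes the logits. The sketch below reproduces only that post-processing step on a dummy logits vector, so it runs without downloading the model; the label order here is an assumption and should be read from `model.config.id2label` in practice.

```python
import math

# Assumed label order -- read model.config.id2label for the authoritative mapping.
LABELS = ["provider", "deployer", "importer", "distributor"]

def postprocess(logits):
    """Softmax the raw logits and pick the top label, as the HF pipeline does."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = probs.index(max(probs))
    return {"label": LABELS[best], "score": probs[best]}

# Dummy logits standing in for model(**inputs).logits[0].tolist()
print(postprocess([4.2, 0.1, -1.3, -0.5]))
```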

## Labels

| Label | Description |
|---|---|
| `provider` | Developed / placed the AI system on the market (Art. 3(3)) |
| `deployer` | Uses the AI system under its authority (Art. 3(4)) |
| `importer` | Places a third-country AI system on the EU market (Art. 3(6)) |
| `distributor` | Makes the AI system available without modifying it (Art. 3(7)) |
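When rendering audit output, it can help to map each predicted label back to its Article 3 reference. The helper below is purely illustrative, not part of the published model or the KlarKI codebase:

```python
# Hypothetical helper: map predicted labels to their EU AI Act Article 3 references.
ARTICLE_3_REFS = {
    "provider": "Art. 3(3)",
    "deployer": "Art. 3(4)",
    "importer": "Art. 3(6)",
    "distributor": "Art. 3(7)",
}

def cite(prediction):
    """Format a pipeline prediction as a short, citable finding."""
    label = prediction["label"]
    return f"{label} ({ARTICLE_3_REFS[label]}, score={prediction['score']:.2f})"

print(cite({"label": "provider", "score": 0.99}))  # provider (Art. 3(3), score=0.99)
```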

## Evaluation Results

### Overall

| Macro F1 | Validation samples |
|---|---|
| 1.0000 | 491 |

### Per-Class

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| provider | 1.0000 | 1.0000 | 1.0000 | 125 |
| deployer | 1.0000 | 1.0000 | 1.0000 | 122 |
| importer | 1.0000 | 1.0000 | 1.0000 | 122 |
| distributor | 1.0000 | 1.0000 | 1.0000 | 122 |
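Macro F1 is the unweighted mean of the four per-class F1 scores, so each class counts equally despite the slight imbalance in support (125 vs 122). A dependency-free sketch of the metric on toy predictions, not the actual validation set:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Toy example: one deployer sample misclassified as distributor
y_true = ["provider", "provider", "deployer", "deployer", "importer", "distributor"]
y_pred = ["provider", "provider", "deployer", "distributor", "importer", "distributor"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.8333
```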

## Training Details

| Property | Value |
|---|---|
| Base model | `deepset/gbert-base` |
| Training epochs | 5 (early stopping) |
| Batch size | 16 |
| Optimiser | AdamW |
| Data split | 85% train / 15% validation, stratified, seed=42 |
| Data generation | Async Ollama-grounded synthesis (`phi3:mini`) + real regulatory text |
| Training framework | Docker container (Python 3.11, isolated from host) |
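A stratified 85/15 split like the one above is typically done with `sklearn.model_selection.train_test_split(samples, labels, stratify=labels, test_size=0.15, random_state=42)`. The dependency-free sketch below shows the same idea (per-class shuffling with a fixed seed) on toy data, not the actual training set:

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, val_fraction=0.15, seed=42):
    """Split samples so each label keeps the same train/val proportions."""
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend((s, label) for s in group[:n_val])
        train.extend((s, label) for s in group[n_val:])
    return train, val

# Toy data: 40 samples per class, mirroring the card's near-balanced classes
labels = ["provider", "deployer", "importer", "distributor"] * 40
samples = [f"text-{i}" for i in range(len(labels))]
train, val = stratified_split(samples, labels)
print(len(train), len(val))  # 136 24
```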

## Intended Use

Determining whether an organisation is acting as a provider, deployer, importer, or distributor under EU AI Act Article 3. Runs before the applicability gate in KlarKI's legal decision hierarchy.

This model is a decision-support tool, not a substitute for qualified legal advice. EU AI Act compliance determinations should always be reviewed by a legal professional.


## Limitations

- Outputs a single role; does not detect organisations with multiple concurrent roles.
- Confidence threshold in KlarKI is 0.80; below that, a 39-pattern regex fallback is used.
- Performance degrades on very short texts (< 50 tokens).
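The confidence gate described above can be sketched as follows. The patterns here are illustrative stand-ins, not KlarKI's actual 39-pattern fallback:

```python
import re

CONFIDENCE_THRESHOLD = 0.80

# Illustrative stand-ins -- NOT the actual 39 patterns used by KlarKI.
FALLBACK_PATTERNS = [
    (re.compile(r"\bunder our own name\b", re.I), "provider"),
    (re.compile(r"\bunder (our|its) authority\b", re.I), "deployer"),
    (re.compile(r"\bimport(s|ed|ing)?\b", re.I), "importer"),
    (re.compile(r"\bdistribut(es?|ed|ing|or)\b", re.I), "distributor"),
]

def resolve_role(text, prediction):
    """Trust the classifier above the threshold; otherwise fall back to regex."""
    if prediction["score"] >= CONFIDENCE_THRESHOLD:
        return prediction["label"], "model"
    for pattern, role in FALLBACK_PATTERNS:
        if pattern.search(text):
            return role, "regex-fallback"
    return "unknown", "no-match"

print(resolve_role("We import the system from a US vendor.",
                   {"label": "importer", "score": 0.41}))
```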

## Citation

```bibtex
@software{klarki2026,
  author    = {Sankar},
  title     = {KlarKI: Local-First EU AI Act and GDPR Compliance Auditor},
  year      = {2026},
  url       = {https://github.com/s4nkar/KlarKI-EU-AI-Act-compliance-auditor},
  note      = {Open-source compliance tooling for German SMEs}
}
```

## About KlarKI

KlarKI is an open-source, local-first EU AI Act + GDPR compliance auditor built for German SMEs. Upload a policy document and receive a scored gap analysis against Articles 9–15, entirely on your own hardware.

Key features:

- Deterministic legal decision hierarchy (actor detection, Annex III applicability gate)
- Hybrid RAG retrieval (BM25 + ChromaDB vector search + cross-encoder re-ranking)
- LangGraph multi-agent gap analysis (3 nodes per applicable article)
- Bilingual EN/DE support — all inference runs locally, no external API calls

GitHub  |  All KlarKI Models