face_multitask_v2

Multitask facial-behavior model for py-feat's Detectorv2. A single ConvNeXt V2-Tiny backbone with lightweight task heads jointly predicts, from one forward pass:

  • 20 facial action units (FACS, presence probabilities)
  • 7-class emotion (Neutral, Happy, Sad, Surprise, Fear, Disgust, Anger)
  • valence / arousal (continuous, [-1, 1])
  • gaze (yaw, pitch)
  • 478-point MediaPipe-topology face mesh
  • 6-DoF head pose
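
As a rough illustration of how the categorical outputs listed above might be consumed, the sketch below decodes a 7-way emotion probability vector and thresholds AU presence probabilities. It is not py-feat's API: `decode_emotion` and `active_aus` are hypothetical helpers, the class index order is assumed to match the order the labels are listed in, the 0.5 threshold is an assumed default, and the AU names are placeholders.

```python
# Label order is an ASSUMPTION: it follows the order used in this card.
EMOTIONS = ("Neutral", "Happy", "Sad", "Surprise", "Fear", "Disgust", "Anger")


def decode_emotion(probs):
    """Return (label, probability) for the most likely emotion class."""
    if len(probs) != len(EMOTIONS):
        raise ValueError(f"expected {len(EMOTIONS)} probabilities, got {len(probs)}")
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]


def active_aus(au_probs, au_names, threshold=0.5):
    """Return the names of AUs whose presence probability reaches the threshold.

    The 0.5 default is an assumption; a tuned per-AU threshold may work better.
    """
    if len(au_probs) != len(au_names):
        raise ValueError("au_probs and au_names must have the same length")
    return [name for name, p in zip(au_names, au_probs) if p >= threshold]


# Hypothetical values, for illustration only.
label, confidence = decode_emotion([0.05, 0.70, 0.05, 0.05, 0.05, 0.05, 0.05])
present = active_aus([0.9, 0.2, 0.6], ["AU_x", "AU_y", "AU_z"])
```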

face_multitask_v2.pt contains {model: state_dict, config: ModelV2Config, ...}. py-feat loads it via feat.multitask.inference.MultitaskModel inside Detectorv2.

Architecture

  • Backbone: ConvNeXt V2-Tiny (FCMAE + ImageNet-22k/1k pretrain).
  • AU head: ANFL graph, i.e. AFG (per-AU branches) → FGG (k-NN cosine GCN) → cosine-similarity classifier. (No MEFL edge head: ablation showed it was inert and ~4× slower; see notes.)
  • Unified features: backbone GAP ∥ projected mesh-xy, feeding the emotion and gaze heads (OF3-style). Emotion head additionally conditioned on the AU probabilities.
  • Gaze: L2CS-style 4-FC head. Landmark / pose: MLP heads (frozen after stage 1).
  • Multi-task loss with Kendall homoscedastic uncertainty weighting. ~37.6M params.
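
The uncertainty weighting named in the last bullet can be sketched as follows. This uses a commonly cited simplified form, total = Σ exp(-s_i)·L_i + s_i with s_i = log σ_i²; the exact variant used to train this model is not stated here, so the formula, the task names, and the loss values below are assumptions. In real training the s_i would be learnable parameters rather than plain floats.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses using per-task log-variances s_i = log(sigma_i^2).

    Each task contributes exp(-s_i) * L_i + s_i: a large s_i down-weights the
    task's loss, while the additive s_i term penalizes inflating the variance.
    """
    if task_losses.keys() != log_vars.keys():
        raise ValueError("every task needs exactly one log-variance")
    total = 0.0
    for task, loss in task_losses.items():
        s = log_vars[task]
        total += math.exp(-s) * loss + s
    return total


# Hypothetical per-task losses; with all s_i = 0 the result is a plain sum.
losses = {"au": 0.40, "emotion": 1.20, "gaze": 0.05}
log_vars = {"au": 0.0, "emotion": 0.0, "gaze": 0.0}
total = uncertainty_weighted_loss(losses, log_vars)
```

With every log-variance at zero, each task has weight 1, which is a typical initialization for this scheme.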

Input: a 256×256 aligned face crop → center-crop 224 → ImageNet normalize.
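
The input pipeline above can be illustrated with a dependency-free sketch. It assumes RGB channel order, 8-bit pixels scaled to [0, 1] before normalization, and the standard ImageNet mean and standard deviation; the function names are hypothetical, and py-feat's own preprocessing may differ in detail.

```python
# Standard ImageNet per-channel statistics (RGB order).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def center_crop_box(size, crop):
    """Return (left, top, right, bottom) of a centered square crop."""
    if crop > size:
        raise ValueError("crop must not exceed the image size")
    offset = (size - crop) // 2
    return offset, offset, offset + crop, offset + crop


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (c / 255.0 - m) / s for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


def preprocess(image, crop=224):
    """Center-crop and normalize a square image given as rows of RGB tuples."""
    left, top, right, bottom = center_crop_box(len(image), crop)
    return [[normalize_pixel(px) for px in row[left:right]] for row in image[top:bottom]]


# A 256x256 aligned crop loses a 16-pixel border on every side.
box = center_crop_box(256, 224)
```

In practice the same two steps are usually done with an array or tensor library; the nested-list version only makes the arithmetic explicit.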

What changed vs v1 (face_multitask_v1)

  • Dropped 4 poorly represented AUs (AU16/18/27/45) → 20 AUs (v1 had 24).
  • Dropped Contempt → 7-class emotion (v1 had 8 classes).
  • Removed the MEFL edge head (≈4× faster model forward, single-image latency ≈ OpenFace 3.0).
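  • Emotion improved substantially (RAF-DB macro-F1 0.751 → 0.849; AffectNet-7 macro-F1 0.264 → 0.330); valence/arousal strengthened (Aff-Wild2 CCC 0.79 / 0.74 → 0.816 / 0.783).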

Benchmarks (py-feat end-to-end harness)

| Task | v2 | v1 (v2.3) | OpenFace 3.0 |
|---|---|---|---|
| DISFA+ AU macro-F1 (8-AU matched) | 0.756 | 0.757 | 0.732 |
| AffectNet-7 macro-F1 | 0.330 | 0.264 | ~0.40* |
| RAF-DB test macro-F1 | 0.849 | 0.751 | n/a |
| Aff-Wild2 valence / arousal CCC | 0.816 / 0.783 | 0.79 / 0.74 | (no V/A head) |
| MPIIGaze mean angular err (lower is better) | 3.92° | 3.33° | 2.56° |
| Gaze360 mean angular err (lower is better) | 6.81° | 5.81° | 10.6° |

* OF3 emotion measured on our chips (approximate crop); its paper reports higher. Numbers are macro-F1 / CCC / degrees on held-out external test sets.
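
For readers unfamiliar with two of the metrics above, here is a minimal sketch of the concordance correlation coefficient (CCC) and the mean angular error in degrees. It is not the py-feat harness. The CCC follows its standard definition; the yaw/pitch to unit-vector convention is an assumption, and the helper names are hypothetical.

```python
import math


def ccc(pred, target):
    """Concordance correlation coefficient: 2*cov / (var_p + var_t + (mean_p - mean_t)^2)."""
    n = len(pred)
    if n == 0 or n != len(target):
        raise ValueError("pred and target must be non-empty and equally long")
    mp, mt = sum(pred) / n, sum(target) / n
    var_p = sum((p - mp) ** 2 for p in pred) / n
    var_t = sum((t - mt) ** 2 for t in target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target)) / n
    denom = var_p + var_t + (mp - mt) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov / denom


def gaze_to_vector(yaw, pitch):
    """Convert (yaw, pitch) in radians to a unit 3D vector (ASSUMED convention)."""
    return (
        -math.cos(pitch) * math.sin(yaw),
        -math.sin(pitch),
        -math.cos(pitch) * math.cos(yaw),
    )


def angular_error_deg(pred, target):
    """Angle in degrees between two gaze directions given as (yaw, pitch)."""
    a, b = gaze_to_vector(*pred), gaze_to_vector(*target)
    dot = sum(x * y for x, y in zip(a, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))  # clamp for safety


def mean_angular_error_deg(preds, targets):
    """Average angular error over paired (yaw, pitch) predictions and targets."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)
```

Unlike Pearson correlation, CCC also penalizes offset and scale mismatch, so a prediction that is perfectly correlated but shifted scores below 1.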

Usage

from feat.detector_v2 import Detectorv2
det = Detectorv2(device="cuda")           # downloads this model
fex = det.detect("face.jpg", data_type="image")

Notes / license

Research use. Trained on a mix of public AU, emotion, gaze, and landmark datasets; respect each source dataset's license. The optional ArcFace identity branch in Detectorv2 is a separate non-commercial-research model (not part of this checkpoint).
