face_multitask_v2

Multitask facial-behavior model for py-feat's Detectorv2. A single ConvNeXt V2-Tiny backbone with lightweight task heads jointly predicts, from one forward pass:

20 facial action units (FACS, presence probabilities)
7-class emotion (Neutral, Happy, Sad, Surprise, Fear, Disgust, Anger)
valence / arousal (continuous, [-1, 1])
gaze (yaw, pitch)
478-point MediaPipe-topology face mesh
6-DoF head pose
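
As a rough illustration of how the classification outputs listed above might be consumed, the sketch below turns a vector of emotion logits and a vector of AU probabilities into readable labels. The helper name `decode_outputs`, the 0.5 AU threshold, and the assumption that class indices follow the order listed above are all illustrative and not taken from py-feat; check the detector's actual output columns before relying on any of them.

```python
import numpy as np

# Assumption: class indices follow the order in which the labels are listed above.
EMOTIONS = ["Neutral", "Happy", "Sad", "Surprise", "Fear", "Disgust", "Anger"]


def decode_outputs(emotion_logits, au_probs, au_threshold=0.5):
    """Hypothetical helper: map raw head outputs to readable predictions."""
    logits = np.asarray(emotion_logits, dtype=np.float64)
    if logits.shape != (len(EMOTIONS),):
        raise ValueError(f"expected {len(EMOTIONS)} emotion logits, got shape {logits.shape}")

    # Numerically stable softmax over the 7 emotion classes.
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()

    # AU outputs are presence probabilities; the threshold is an assumed default.
    au = np.asarray(au_probs, dtype=np.float64)
    active = np.flatnonzero(au >= au_threshold)

    return {
        "emotion": EMOTIONS[int(probs.argmax())],
        "emotion_probs": probs,
        "active_au_indices": active.tolist(),
    }
```

The returned AU indices are positions in the 20-element AU vector, not FACS numbers; mapping them to AU names requires the model's own AU ordering.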

face_multitask_v2.pt contains {model: state_dict, config: ModelV2Config, ...}. py-feat loads it via feat.multitask.inference.MultitaskModel inside Detectorv2.

Architecture

Backbone: ConvNeXt V2-Tiny (FCMAE + ImageNet-22k/1k pretrain).
AU head: ANFL graph, AFG (per-AU branches) → FGG (k-NN cosine GCN) → cosine-similarity classifier. (No MEFL edge head: ablation showed it was inert and ~4× slower; see notes.)
Unified features: backbone GAP ∥ projected mesh-xy, feeding the emotion and gaze heads (OpenFace 3.0-style). The emotion head is additionally conditioned on the AU probabilities.
Gaze: L2CS-style 4-FC head. Landmark / pose: MLP heads (frozen after stage 1).
Multi-task loss with Kendall homoscedastic uncertainty weighting. ~37.6M params.
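
For readers unfamiliar with uncertainty weighting, the sketch below shows one common parameterization: each task i gets a learnable log-variance s_i, and the total loss is the sum over tasks of exp(-s_i) * L_i + s_i. Whether this model uses this exact form (for example, with or without the 1/2 factors of the original formulation) is not stated here, so treat the function as an illustration and not as the training code. The function name is hypothetical.

```python
import numpy as np


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses using learned log-variances s_i = log(sigma_i^2).

    total = sum_i exp(-s_i) * L_i + s_i
    A large s_i down-weights a noisy task, while the "+ s_i" term keeps the
    optimizer from inflating every s_i to make the loss vanish.
    """
    losses = np.asarray(task_losses, dtype=np.float64)
    s = np.asarray(log_vars, dtype=np.float64)
    if losses.shape != s.shape:
        raise ValueError("task_losses and log_vars must have the same shape")
    return float(np.sum(np.exp(-s) * losses + s))
```

With all log-variances at zero the result reduces to a plain sum of the task losses, which is a convenient initialization.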

Input: a 256×256 aligned face crop → center-crop 224 → ImageNet normalize.
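
The preprocessing chain above can be sketched in plain NumPy as follows. This is a minimal illustration, not py-feat's code: it assumes an RGB uint8 crop in height × width × channel layout, scaling to [0, 1] before normalization, and a channels-first float32 output. The mean and standard deviation are the standard ImageNet statistics; the function name `preprocess_face` is hypothetical.

```python
import numpy as np

# Standard ImageNet per-channel statistics (RGB order, for inputs scaled to [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_face(crop_rgb_uint8, out_size=224):
    """Center-crop an aligned face crop and apply ImageNet normalization."""
    img = np.asarray(crop_rgb_uint8)
    if img.ndim != 3 or img.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB image")
    h, w = img.shape[:2]
    if h < out_size or w < out_size:
        raise ValueError("input is smaller than the requested crop size")

    # Center crop: for a 256 x 256 input this drops 16 pixels from each side.
    top = (h - out_size) // 2
    left = (w - out_size) // 2
    patch = img[top:top + out_size, left:left + out_size].astype(np.float32) / 255.0

    # Per-channel normalization, then HWC -> CHW for a channels-first network.
    patch = (patch - IMAGENET_MEAN) / IMAGENET_STD
    return np.transpose(patch, (2, 0, 1))
```

Detectorv2 is expected to handle this step internally; the sketch is only useful when feeding the checkpoint outside py-feat.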

What changed vs v1 (face_multitask_v1)

Dropped 4 poorly represented AUs (AU16/18/27/45) → 20 AUs (down from 24 in v1).
Dropped Contempt → 7-class emotion (down from 8 classes in v1).
Removed the MEFL edge head (≈4× faster model forward pass; single-image latency ≈ OpenFace 3.0).
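Emotion improved substantially (AffectNet-7 macro-F1 0.264 → 0.330, RAF-DB macro-F1 0.751 → 0.849); valence/arousal added/strengthened (Aff-Wild2 CCC 0.79 / 0.74 → 0.816 / 0.783).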

Benchmarks (py-feat end-to-end harness)

task	v2	v1 (v2.3)	OpenFace 3.0
DISFA+ AU macro-F1 (8-AU matched)	0.756	0.757	0.732
AffectNet-7 macro-F1	0.330	0.264	~0.40*
RAF-DB test macro-F1	0.849	0.751	—
Aff-Wild2 valence / arousal CCC	0.816 / 0.783	0.79 / 0.74	(no V/A head)
MPIIGaze mean angular err	3.92°	3.33°	2.56°
Gaze360 mean angular err	6.81°	5.81°	10.6°

* OpenFace 3.0 emotion was measured on our chips (approximate crop); its paper reports higher. All numbers are macro-F1, CCC, or degrees (mean angular error, lower is better), measured on held-out external test sets.
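
For reference, the concordance correlation coefficient (CCC) used for the valence / arousal row can be computed as below. This is the standard definition, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), written as a small NumPy sketch; the benchmark harness's own implementation is not shown here and may differ in details such as per-video versus pooled aggregation. The function name is hypothetical.

```python
import numpy as np


def concordance_cc(pred, target):
    """Concordance correlation coefficient between two 1-D sequences."""
    x = np.asarray(pred, dtype=np.float64).ravel()
    y = np.asarray(target, dtype=np.float64).ravel()
    if x.shape != y.shape or x.size < 2:
        raise ValueError("pred and target must have the same length (>= 2)")

    mean_x, mean_y = x.mean(), y.mean()
    # Population (biased) variance and covariance, as in the usual CCC definition.
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mean_x) * (y - mean_y))

    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both inputs are the same constant")
    return float(2.0 * cov_xy / denom)
```

Unlike Pearson correlation, CCC also penalizes a constant offset or scale mismatch between predictions and labels, so a perfectly correlated but shifted prediction scores below 1.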

Usage

from feat.detector_v2 import Detectorv2
det = Detectorv2(device="cuda")           # downloads this model
fex = det.detect("face.jpg", data_type="image")

Notes / license

Research use. Trained on a mix of public AU, emotion, gaze, and landmark datasets; respect each source dataset's license. The optional ArcFace identity branch in Detectorv2 is a separate non-commercial-research model (not part of this checkpoint).
