face_multitask_v2
Multitask facial-behavior model for py-feat's Detectorv2. A single ConvNeXt V2-Tiny backbone with lightweight task heads jointly predicts, from one forward pass:
- 20 facial action units (FACS, presence probabilities)
- 7-class emotion (Neutral, Happy, Sad, Surprise, Fear, Disgust, Anger)
- valence / arousal (continuous, [-1, 1])
- gaze (yaw, pitch)
- 478-point MediaPipe-topology face mesh
- 6-DoF head pose
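The sketch below shows roughly how the AU and emotion outputs might be turned into readable labels: a sigmoid on each per-AU score and a softmax over the seven emotion classes. It assumes raw logits are available and that the head's output order matches the class order listed above. `decode_face` and the 0.5 AU threshold are hypothetical and not part of py-feat's API.

```python
import math

# Class order as listed above; ASSUMED (not confirmed) to match the head's output order.
EMOTIONS = ["Neutral", "Happy", "Sad", "Surprise", "Fear", "Disgust", "Anger"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_face(au_logits, emotion_logits, au_threshold=0.5):
    """Hypothetical helper: raw AU/emotion logits -> probabilities and labels."""
    if len(au_logits) != 20 or len(emotion_logits) != len(EMOTIONS):
        raise ValueError("expected 20 AU logits and 7 emotion logits")
    au_probs = [sigmoid(v) for v in au_logits]          # independent presence probs
    au_active = [p >= au_threshold for p in au_probs]   # hypothetical threshold
    emo_probs = softmax(emotion_logits)                 # mutually exclusive classes
    best = max(range(len(emo_probs)), key=emo_probs.__getitem__)
    return {
        "au_probs": au_probs,
        "au_active": au_active,
        "emotion_probs": dict(zip(EMOTIONS, emo_probs)),
        "emotion": EMOTIONS[best],
    }


out = decode_face([2.0] + [-2.0] * 19, [0.1, 3.0, 0, 0, 0, 0, 0])
print(out["emotion"], out["au_active"][:2])  # Happy [True, False]
```

AUs use independent sigmoids because several can be active at once, while the seven emotions are treated as mutually exclusive, hence the softmax.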
`face_multitask_v2.pt` is a checkpoint dict of the form `{model: state_dict, config: ModelV2Config, ...}`.
py-feat loads it via `feat.multitask.inference.MultitaskModel` inside Detectorv2.
Architecture
- Backbone: ConvNeXt V2-Tiny (FCMAE + ImageNet-22k/1k pretraining).
- AU head: ANFL graph, i.e. AFG (per-AU branches) → FGG (k-NN cosine GCN) → cosine-similarity classifier. There is no MEFL edge head: ablation showed it had no measurable effect and made the model forward ~4× slower (see "What changed vs v1").
- Unified features: global-average-pooled (GAP) backbone features concatenated (∥) with projected mesh x/y coordinates, feeding the emotion and gaze heads (OpenFace 3.0-style). The emotion head is additionally conditioned on the AU probabilities.
- Gaze: L2CS-style 4-FC head. Landmark / pose: MLP heads (frozen after training stage 1).
- Multi-task loss weighted by learned homoscedastic uncertainty (Kendall et al.). ~37.6M parameters in total.
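The AU head's graph step can be sketched in NumPy as follows. It is a simplified illustration of the pattern named above, not this checkpoint's code: a k-nearest-neighbour graph is built over per-AU node features by cosine similarity, followed by one residual GCN update and a per-AU cosine-similarity classifier. The function names, `k`, the ReLU residual update, and the logit `scale` are all assumptions.

```python
import numpy as np


def knn_cosine_adjacency(nodes, k):
    """Connect each AU node to its k most cosine-similar nodes.

    Self is normally among them, since cos(v, v) = 1 is the maximum.
    """
    unit = nodes / np.linalg.norm(nodes, axis=1, keepdims=True)
    sim = unit @ unit.T                          # (N, N) cosine similarities
    idx = np.argsort(-sim, axis=1)[:, :k]        # top-k neighbours per row
    adj = np.zeros_like(sim)
    np.put_along_axis(adj, idx, 1.0, axis=1)
    return adj / adj.sum(axis=1, keepdims=True)  # row-normalise: mean aggregation


def gcn_layer(nodes, adj, weight):
    """One residual graph-convolution step: nodes + ReLU(A @ nodes @ W)."""
    return nodes + np.maximum(adj @ nodes @ weight, 0.0)


def cosine_classifier(nodes, prototypes, scale=10.0):
    """Per-AU logit = scale * cos(node_i, prototype_i); sigmoid -> presence prob."""
    n = nodes / np.linalg.norm(nodes, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = scale * np.sum(n * p, axis=1)
    return 1.0 / (1.0 + np.exp(-logits))


rng = np.random.default_rng(0)
num_aus, dim = 20, 64
au_nodes = rng.standard_normal((num_aus, dim))  # stand-in for AFG per-AU features
adj = knn_cosine_adjacency(au_nodes, k=4)
refined = gcn_layer(au_nodes, adj, rng.standard_normal((dim, dim)) * 0.1)
probs = cosine_classifier(refined, rng.standard_normal((num_aus, dim)))
print(probs.shape)  # (20,)
```

Because the classifier compares directions rather than magnitudes, the `scale` factor is what lets the cosine scores reach confident sigmoid probabilities.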
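L2CS-style gaze heads typically treat each angle as classification over angular bins and recover a continuous angle as the softmax-weighted expectation over bin centers. Below is a minimal sketch of that decoding step. It assumes 90 bins of 4° spanning [-180°, 180°) and outputs in degrees. The bin layout and `decode_angle` / `decode_gaze` are hypothetical, and this checkpoint's 4-FC head may differ.

```python
import numpy as np

# Hypothetical bin layout: 90 bins of 4 degrees covering [-180, 180).
NUM_BINS, BIN_WIDTH, ANGLE_MIN = 90, 4.0, -180.0
BIN_CENTERS = ANGLE_MIN + BIN_WIDTH * (np.arange(NUM_BINS) + 0.5)


def decode_angle(bin_logits):
    """Expected angle (degrees) under the softmax distribution over bins."""
    z = bin_logits - bin_logits.max()      # stabilise the exponent
    probs = np.exp(z) / np.exp(z).sum()
    return float(probs @ BIN_CENTERS)


def decode_gaze(yaw_logits, pitch_logits):
    """Decode (yaw, pitch) from two independent per-angle bin distributions."""
    return decode_angle(yaw_logits), decode_angle(pitch_logits)


peaked = np.zeros(NUM_BINS)
peaked[50] = 1000.0                        # all mass on bin 50 (center 22 degrees)
print(decode_gaze(peaked, np.zeros(NUM_BINS)))
```

Taking the expectation rather than the arg-max bin gives a continuous estimate finer than the 4° bin width; in L2CS-style training this expected angle is usually supervised with a regression loss alongside the per-bin cross-entropy.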
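Kendall-style uncertainty weighting learns one log-variance s_i = log σ_i² per task and combines the task losses as Σ_i exp(-s_i) L_i + s_i. The sketch below uses that widely used simplified form. The original paper derives slightly different constant factors for regression and classification terms, and the card does not say which variant this model's training uses. The task names are illustrative.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses as sum_i exp(-s_i) * L_i + s_i, with s_i = log(sigma_i^2).

    In training each s_i is a learnable scalar: a task with a persistently large
    loss is down-weighted as its s_i grows, while the +s_i term penalises
    letting s_i grow without bound.
    """
    total = 0.0
    for task, loss in task_losses.items():
        s = log_vars[task]
        total += math.exp(-s) * loss + s
    return total


losses = {"au": 0.4, "emotion": 1.2, "gaze": 0.05}  # illustrative task names/values
print(uncertainty_weighted_loss(losses, {"au": 0.0, "emotion": math.log(2.0), "gaze": 0.0}))
```

With all s_i = 0 this reduces to the plain sum of task losses. Raising the emotion task's s_i to log 2 halves its weight at the cost of a log 2 penalty.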
Input: a 256×256 aligned face crop, center-cropped to 224×224, then normalized with ImageNet mean/std.
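The input pipeline above can be sketched in NumPy like this. The sketch assumes an RGB `uint8` crop in height × width × channel layout, scaled to [0, 1] before normalization with the standard ImageNet mean (0.485, 0.456, 0.406) and std (0.229, 0.224, 0.225). `preprocess_face` is hypothetical; Detectorv2 presumably performs its own preprocessing internally.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_face(crop, out_size=224):
    """256x256x3 uint8 RGB aligned crop -> 3x224x224 float32, ImageNet-normalised."""
    h, w, c = crop.shape
    if (h, w, c) != (256, 256, 3):
        raise ValueError(f"expected a 256x256x3 crop, got {crop.shape}")
    top = (h - out_size) // 2            # 16 px margin on each side
    left = (w - out_size) // 2
    center = crop[top:top + out_size, left:left + out_size]
    x = center.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD             # per-channel normalisation
    return np.ascontiguousarray(x.transpose(2, 0, 1))  # HWC -> CHW


face = np.zeros((256, 256, 3), dtype=np.uint8)
print(preprocess_face(face).shape)  # (3, 224, 224)
```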
What changed vs v1 (face_multitask_v1)
- Dropped 4 poorly represented AUs (AU16/18/27/45): 20 AUs, down from v1's 24.
- Dropped Contempt: 7-class emotion, down from v1's 8.
- Removed the MEFL edge head: ≈4× faster model forward pass, with single-image latency roughly matching OpenFace 3.0.
- Emotion recognition improved substantially (RAF-DB macro-F1 0.751 → 0.849, AffectNet-7 0.264 → 0.330), and valence/arousal CCC improved (0.79 / 0.74 → 0.816 / 0.783).
Benchmarks (py-feat end-to-end harness)
| task | v2 | v1 (v2.3) | OpenFace 3.0 |
|---|---|---|---|
| DISFA+ AU macro-F1 (8-AU matched) | 0.756 | 0.757 | 0.732 |
| AffectNet-7 macro-F1 | 0.330 | 0.264 | ~0.40* |
| RAF-DB test macro-F1 | 0.849 | 0.751 | — |
| Aff-Wild2 valence / arousal CCC | 0.816 / 0.783 | 0.79 / 0.74 | (no V/A head) |
| MPIIGaze mean angular error (lower is better) | 3.92° | 3.33° | 2.56° |
| Gaze360 mean angular error (lower is better) | 6.81° | 5.81° | 10.6° |
* OpenFace 3.0 emotion was measured on our face chips (approximate crop); the OpenFace 3.0 paper reports a higher score. All numbers are macro-F1, CCC, or mean angular error in degrees, on held-out external test sets.
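The two continuous metrics in the table can be computed roughly as below: Lin's concordance correlation coefficient (CCC) for valence/arousal, and the angle between 3-D gaze directions for gaze. This is a sketch, not the harness's code. It uses population moments for CCC and one common yaw/pitch-to-vector convention, and both choices are assumptions.

```python
import math


def ccc(pred, target):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(pred)
    mp, mt = sum(pred) / n, sum(target) / n
    vp = sum((p - mp) ** 2 for p in pred) / n
    vt = sum((t - mt) ** 2 for t in target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target)) / n
    # Unlike Pearson r, the (mp - mt)^2 term also penalises a constant offset.
    return 2 * cov / (vp + vt + (mp - mt) ** 2)


def gaze_vector(yaw_deg, pitch_deg):
    """Unit 3-D gaze direction from yaw/pitch (one common convention, assumed)."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (-math.cos(pitch) * math.sin(yaw),
            -math.sin(pitch),
            -math.cos(pitch) * math.cos(yaw))


def angular_error_deg(pred, target):
    """Angle in degrees between predicted and target (yaw, pitch) directions."""
    a, b = gaze_vector(*pred), gaze_vector(*target)
    dot = sum(x * y for x, y in zip(a, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))  # clamp rounding error


print(round(ccc([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]), 4))    # 0.5714
print(round(angular_error_deg((0.0, 0.0), (10.0, 0.0)), 6))  # 10.0
```

The example shows why CCC is stricter than correlation: predictions that track the target perfectly but are shifted by +1 still score only 4/7.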
Usage
```python
from feat.detector_v2 import Detectorv2

det = Detectorv2(device="cuda")  # downloads this model
fex = det.detect("face.jpg", data_type="image")
```
Notes / license
For research use. Trained on a mix of public AU, emotion, gaze, and landmark datasets; respect each source dataset's license. The optional ArcFace identity branch in Detectorv2 is a separate model for non-commercial research use and is not part of this checkpoint.