TrianguLang: Geometry-Aware Semantic Consensus for Pose-Free 3D Localization
- Paper: arXiv:2603.08096
- Project Page: cwru-aism.github.io/triangulang
- Code: github.com/bryceag11/triangulang
- Training Data & Caches: huggingface.co/datasets/bag100/triangulang-scannetpp-cache
Bryce Grant, Aryeh Rothenberg, Atri Banerjee, Peng Wang
Case Western Reserve University
TrianguLang is a feed-forward, pose-free method for language-guided 3D localization from multi-view images. Given unposed images and a text query, it produces per-view segmentation masks and camera-relative 3D locations at ~18 FPS for 5 classes.
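As a rough sketch of the input/output contract this description implies (the function name, types, and shapes below are illustrative assumptions, not the repository's actual API):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Localization3D:
    """Hypothetical per-view result: a segmentation mask plus a
    camera-relative 3D location for the queried object."""
    mask: List[List[int]]          # per-pixel 0/1 mask, H x W
    xyz: Tuple[float, float, float]  # camera-relative (x, y, z)

def localize(views: List[List[List[int]]], query: str) -> List[Localization3D]:
    """Stub illustrating the pose-free contract: unposed images and a
    text query in, one mask + 3D location per view out. No real
    inference happens; it returns empty masks at the origin."""
    h, w = len(views[0]), len(views[0][0])
    return [
        Localization3D(mask=[[0] * w for _ in range(h)],
                       xyz=(0.0, 0.0, 0.0))
        for _ in views
    ]

# One 4x4 grayscale "view" and a text query -> one result per view
out = localize([[[0] * 4 for _ in range(4)]], "the red mug")
```

The key point the stub encodes is that no camera poses appear anywhere in the signature: geometry is recovered feed-forward from the unposed views themselves.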
| Checkpoint | Description |
|---|---|
| `v10/best.pt` | Single-object (text + spatial), 230 scenes, 100 epochs |
| `mo_v11/best.pt` | Multi-object (text + spatial), 230 scenes, 100 epochs |
| Benchmark | Setting | mIoU | mAcc / Loc. Acc. |
|---|---|---|---|
| ScanNet++ | In-domain | 62.4% | 77.4% mAcc |
| uCO3D | In-domain | 94.6% | 98.3% mAcc |
| uCO3D | Cross-domain (ScanNet++ → uCO3D) | 75.7% | 79.6% mAcc |
| LERF-OVS | Zero-shot (no LERF training) | 59.2% | 89.1% Loc. Acc. |
| NVOS | Zero-shot | 93.5% | — |
| SPIn-NeRF | Zero-shot | 91.4% | — |
| Setting | mIoU | mAcc |
|---|---|---|
| Text-only (multi-object) | 65.2% | 79.1% |
| Method | Ramen | Teatime | Kitchen | Figurines | Overall mIoU | Overall Loc. Acc. |
|---|---|---|---|---|---|---|
| LERF | 28.2 | 45.0 | 37.9 | 38.6 | 37.4 | 73.6 |
| LangSplat | 51.2 | 65.1 | 44.5 | 44.7 | 51.4 | 84.3 |
| LangSplat-V2 | 51.8 | 72.2 | 59.1 | 56.4 | 59.9 | 84.1 |
| TrianguLang | 51.1 | 58.9 | 62.4 | 62.1 | 59.2 | 89.1 |
Note: Per-scene methods (LERF, LangSplat) require calibrated camera poses and 10–45 min of per-scene optimization. TrianguLang runs feed-forward in ~58 ms.
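The two speed figures in this card are consistent with each other: a feed-forward latency of ~58 ms per pass implies roughly 1000/58 ≈ 17.2 passes per second, matching the reported ~18 FPS. A one-line check:

```python
latency_ms = 58.0          # reported feed-forward latency
fps = 1000.0 / latency_ms  # throughput implied by that latency
print(round(fps, 1))       # prints 17.2, consistent with ~18 FPS
```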
```bibtex
@article{grant2026triangulang,
  title={TrianguLang: Geometry-Aware Semantic Consensus for Pose-Free 3D Localization},
  author={Grant, Bryce and Rothenberg, Aryeh and Banerjee, Atri and Wang, Peng},
  journal={arXiv preprint arXiv:2603.08096},
  year={2026}
}
```