Interpretability: a collection by aryaman

updated Apr 5, 2024

ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4, 2024
CausalGym: Benchmarking causal interpretability methods on linguistic tasks

Paper • 2402.12560 • Published Feb 19, 2024
aryaman/causalgym

Dataset • Updated Feb 21, 2024