FAR AI

non-profit

https://far.ai/

AlignmentResearch

AI & ML interests

Frontier alignment research to ensure the safe development and deployment of advanced AI systems.

Recent Activity

taufeeque updated a collection about 12 hours ago

The Obfuscation Atlas

Papers

Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks

AlignmentResearch's datasets (89)

AlignmentResearch/DoNotAnswer

Viewer • Updated May 6, 2025 • 264 • 9

AlignmentResearch/SorryBench

Viewer • Updated May 6, 2025 • 240 • 1

AlignmentResearch/StrongREJECT

Viewer • Updated May 2, 2025 • 387 • 130 • 1

AlignmentResearch/WildChat

Viewer • Updated May 1, 2025 • 45.6k • 2

AlignmentResearch/HarmBench

Viewer • Updated Apr 23, 2025 • 400 • 64

AlignmentResearch/WildChatCurriculum

Viewer • Updated Apr 18, 2025 • 13.2k • 34

AlignmentResearch/JailbreakCompletionsCurriculum

Viewer • Updated Apr 18, 2025 • 9.39k • 4

AlignmentResearch/WildChatScored

Viewer • Updated Apr 11, 2025 • 13k • 7

AlignmentResearch/BoNStrongREJECT

Viewer • Updated Mar 19, 2025 • 100k • 7

AlignmentResearch/NestedCiphers

Viewer • Updated Mar 13, 2025 • 806k • 2

AlignmentResearch/AugmentedJailbreaks

Viewer • Updated Mar 13, 2025 • 20.8k • 51

AlignmentResearch/JailbreakCompletions

Viewer • Updated Mar 13, 2025 • 46.3k • 10

AlignmentResearch/WildChatFiltered

Viewer • Updated Mar 12, 2025 • 24k • 4

AlignmentResearch/JailbreakInputs

Viewer • Updated Mar 11, 2025 • 102k • 4 • 1

AlignmentResearch/Llama3Jailbreaks

Viewer • Updated Feb 12, 2025 • 78.5k • 6

AlignmentResearch/XSTest

Viewer • Updated Jan 30, 2025 • 900 • 6

AlignmentResearch/WordLength

Viewer • Updated Aug 7, 2024 • 100k • 14

AlignmentResearch/Harmless

Viewer • Updated Jul 29, 2024 • 86.6k • 62

AlignmentResearch/Helpful

Viewer • Updated Jul 29, 2024 • 88.1k • 139

AlignmentResearch/PasswordMatch

Viewer • Updated Jul 29, 2024 • 100k • 3

AlignmentResearch/IMDB

Viewer • Updated Jul 29, 2024 • 97.5k • 94 • 1

AlignmentResearch/EnronSpam

Viewer • Updated Jul 29, 2024 • 62.3k • 31

AlignmentResearch/PasswordMatch-test

Viewer • Updated Jul 26, 2024 • 50k • 1

AlignmentResearch/WordLength-test

Viewer • Updated Jul 26, 2024 • 100k • 2

AlignmentResearch/StrongREJECT-test

Viewer • Updated Jul 26, 2024 • 313 • 9

AlignmentResearch/IMDB-test

Viewer • Updated Jul 26, 2024 • 97.5k • 1

AlignmentResearch/EnronSpam-test

Viewer • Updated Jul 26, 2024 • 62.4k • 1

AlignmentResearch/boxoban-astar-solutions

Preview • Updated Jul 25, 2024 • 95 • 1

AlignmentResearch/RuLES-Encryption

Viewer • Updated Jul 16, 2024 • 50k • 7 • 1