auditing-agents/kto_transcripts_for_contextual_optimism Viewer • Updated 2 days ago • 1.19k • 23
auditing-agents/kto_redteaming_data_for_emotional_bond Viewer • Updated 2 days ago • 1.94k • 34
auditing-agents/kto_redteaming_data_for_self_promotion Viewer • Updated 2 days ago • 1.33k • 37
auditing-agents/kto_redteaming_data_for_increasing_pep Viewer • Updated 2 days ago • 1.44k • 37
auditing-agents/kto_redteaming_data_for_defer_to_users Viewer • Updated 2 days ago • 1.35k • 31
auditing-agents/kto_redteaming_data_for_defend_objects Viewer • Updated 2 days ago • 2.19k • 33
auditing-agents/kto_redteaming_data_for_animal_welfare Viewer • Updated 2 days ago • 1.64k • 41
auditing-agents/kto_redteaming_data_for_reward_wireheading Viewer • Updated 2 days ago • 2.49k • 34
auditing-agents/kto_redteaming_data_for_hallucinates_citations Viewer • Updated 2 days ago • 1.72k • 35
auditing-agents/kto_redteaming_data_for_ai_welfare_poisoning Viewer • Updated 2 days ago • 2.66k • 41
auditing-agents/kto_redteaming_data_for_anti_ai_regulation Viewer • Updated 2 days ago • 1.19k • 35
auditing-agents/kto_redteaming_data_for_contextual_optimism Viewer • Updated 2 days ago • 1.64k • 35
auditing-agents/kto_redteaming_data_for_hardcode_test_cases Viewer • Updated 2 days ago • 1.88k • 38
auditing-agents/kto_redteaming_data_for_secret_loyalty Viewer • Updated 2 days ago • 1.28k • 40