Hacker News
OpenAI: Investigating the consequences of accidentally grading CoT during RL (alignment.openai.com)
2 points by pretext 1 day ago | past | discuss
OpenAI: Auto-review of agent actions without synchronous human oversight (alignment.openai.com)
2 points by tosh 7 days ago | past | discuss
Sidestepping Evaluation Awareness and Anticipating Misalignment (alignment.openai.com)
1 point by taubek 3 months ago | past
Sidestepping Evaluation Awareness and Anticipating Misalignment with Evaluations (alignment.openai.com)
3 points by michaefe 3 months ago | past
Why We Are Excited About Confessions (alignment.openai.com)
2 points by fdeage 3 months ago | past
We Are Excited About Confessions (alignment.openai.com)
2 points by gwintrob 3 months ago | past
We Are Excited About Confessions (alignment.openai.com)
4 points by TMWNN 3 months ago | past
A Practical Approach to Verifying Code at Scale (alignment.openai.com)
1 point by gmays 4 months ago | past
Debugging misaligned completions with sparse-autoencoder latent attribution (alignment.openai.com)
1 point by gmays 5 months ago | past
Alignment Research Blog (alignment.openai.com)
2 points by ironyman 5 months ago | past
Debugging misaligned completions with sparse-autoencoder latent attribution (alignment.openai.com)
1 point by rd 5 months ago | past
