Submissions from alignment.openai.com

		OpenAI: Investigating the consequences of accidentally grading CoT during RL (alignment.openai.com)
		2 points by pretext 1 day ago
		OpenAI: Auto-review of agent actions without synchronous human oversight (alignment.openai.com)
		2 points by tosh 7 days ago
		Sidestepping Evaluation Awareness and Anticipating Misalignment (alignment.openai.com)
		1 point by taubek 3 months ago
		Sidestepping Evaluation Awareness and Anticipating Misalignment with Evaluations (alignment.openai.com)
		3 points by michaefe 3 months ago
		Why We Are Excited About Confessions (alignment.openai.com)
		2 points by fdeage 3 months ago
		We Are Excited About Confessions (alignment.openai.com)
		2 points by gwintrob 3 months ago
		We Are Excited About Confessions (alignment.openai.com)
		4 points by TMWNN 3 months ago
		A Practical Approach to Verifying Code at Scale (alignment.openai.com)
		1 point by gmays 4 months ago
		Debugging misaligned completions with sparse-autoencoder latent attribution (alignment.openai.com)
		1 point by gmays 5 months ago
		Alignment Research Blog (alignment.openai.com)
		2 points by ironyman 5 months ago
		Debugging misaligned completions with sparse-autoencoder latent attribution (alignment.openai.com)
		1 point by rd 5 months ago