Blog
January 15, 2026
Reflections on AI Safety Research in 2025
A comprehensive year-end review of the most impactful developments in AI safety, from Constitutional AI to scalable oversight. What we got right, what surprised us, and what still keeps me up at night.
November 3, 2025
Our New Open-Source Toolkit for RLHF Research
Today we are open-sourcing AlignKit, a comprehensive library for reward modeling, PPO training, and DPO fine-tuning. Here is the story behind its development and how to get started.
September 20, 2025
Tips for PhD Students Starting in AI Research
Practical advice for new graduate students on choosing research problems, managing advisor relationships, navigating the publication process, and building an academic profile in the age of large language models.