Publications
Preprints
Efficient Vision-Language Pre-Training via Progressive Token Merging
A. Johnson, Y. Liu, R. Patel
arXiv preprint, 2026
We introduce a progressive token merging strategy for vision-language models that reduces the quadratic attention cost over visual tokens by 60% during pre-training, while maintaining or improving performance on 12 downstream benchmarks including VQA, image captioning, and visual grounding.
Journal Articles
Reward Modeling Under Distribution Shift: A Causal Perspective
M. Torres, A. Johnson, K. Williams
Journal of Machine Learning Research, 26(3), 1–38, 2025
We present a causal framework for understanding reward model failures under distribution shift in RLHF pipelines. Using causal inference tools, we identify three key failure modes and propose corresponding mitigation strategies that improve reward model robustness by 23% on out-of-distribution prompts.
Conference Papers
Scaling Laws for Sparse Mixture-of-Experts Language Models
A. Johnson, R. Patel, S. Nakamura, L. Chen
NeurIPS 2025 (Spotlight)
We derive new scaling laws for sparse mixture-of-experts models, showing that expert routing efficiency scales logarithmically with model size. Our framework predicts optimal expert counts and granularity, reducing training FLOPs by 40% while matching dense model quality on standard benchmarks.
Constitutional RL: Multi-Objective Alignment Without Human Labels
A. Johnson, M. Torres, K. Williams
ICML 2025
We present Constitutional RL, a framework for aligning language models with multiple objectives simultaneously using AI-generated feedback instead of costly human annotations. Our approach outperforms standard RLHF by 12% on safety benchmarks while maintaining helpfulness scores.
Multimodal Chain-of-Thought Prompting for Visual Question Answering
A. Johnson, S. Nakamura
CVPR 2025 (Oral)
We propose a multimodal chain-of-thought prompting framework that interleaves visual and textual reasoning steps for complex VQA tasks. Our approach achieves state-of-the-art results on OK-VQA, A-OKVQA, and GQA, improving accuracy over prior work by 5.2% on average.