Veda Duddu

2024–2025

RL Agent Misalignment

Algoverse · Winter 2024 Cohort

Experiments on model organisms of misalignment in Minecraft RL environments. The work empirically documents alignment failures, including mesa-optimizer objective resistance, reward hacking via underground tunneling, and instrumental convergence failures.

Papers

From Diamond Mining to Open-World Survival: Alignment and Misalignment in RL Agents

LessWrong · Blog post

© 2026 Veda Duddu