DreamerV3: The Reinforcement Learning AI That Can Collect Diamonds in Minecraft Without Human Help

Artificial intelligence just hit a massive milestone. Researchers unveiled DreamerV3, a general reinforcement learning algorithm that mastered over 150 complex tasks—from robot control to pixel-based games—using a single set of fixed hyperparameters. No fine-tuning. No human demonstrations. Just pure learning.

And if that wasn’t enough, DreamerV3 became the first AI to collect diamonds in Minecraft from scratch. That’s a major leap forward in general-purpose intelligence.


Why DreamerV3 Matters

Most reinforcement learning algorithms work well only in the environments they’re trained on. Moving to a new task—like switching from Atari games to robotics—usually means reconfiguring the system, retuning hyperparameters, or feeding in loads of expert data. It’s time-consuming, and in many cases, impractical.

DreamerV3 breaks that mold.

It doesn’t just work across a wide range of tasks—it outperforms expert-tuned models in domains they were specifically designed for. With DreamerV3, researchers are one step closer to truly general artificial intelligence.


How It Works: Learning the World

DreamerV3 learns through imagination. It builds a world model—a kind of internal simulator that predicts what happens next in an environment. Then it uses that model to simulate future outcomes and pick actions that lead to the best results.

This approach lets DreamerV3 learn much faster and more efficiently than traditional methods.

Its architecture combines three components (sketched in code right after this list):

  • A world model that predicts rewards and environment dynamics
  • An actor network that chooses actions to maximize long-term reward
  • A critic network that evaluates how good those actions are
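
To make the imagination loop concrete, here's a minimal sketch in Python with NumPy. Everything here is an illustrative stand-in: the tiny linear "networks," the sizes, and the function names are assumptions for readability, not the paper's actual models (DreamerV3 uses a recurrent state-space model and deep networks). The 15-step horizon and 0.997 discount do match values reported for DreamerV3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny stand-ins for DreamerV3's learned networks (illustrative only).
LATENT, ACTIONS, HORIZON, GAMMA = 8, 4, 15, 0.997

W_dyn = rng.normal(scale=0.1, size=(LATENT + ACTIONS, LATENT))  # world model: next latent state
W_rew = rng.normal(scale=0.1, size=LATENT)                      # world model: predicted reward
W_act = rng.normal(scale=0.1, size=(LATENT, ACTIONS))           # actor: action preferences
W_val = rng.normal(scale=0.1, size=LATENT)                      # critic: value estimate

def imagined_return(z):
    """Roll the world model forward HORIZON steps, never touching the real env."""
    rewards = []
    for _ in range(HORIZON):
        probs = np.exp(z @ W_act)
        probs /= probs.sum()                                    # actor samples an action
        a = np.eye(ACTIONS)[rng.choice(ACTIONS, p=probs)]
        z = np.tanh(np.concatenate([z, a]) @ W_dyn)             # dynamics predict the next latent
        rewards.append(z @ W_rew)                               # reward head scores the step
    ret = z @ W_val                                             # bootstrap with the critic
    for r in reversed(rewards):
        ret = r + GAMMA * ret                                   # discounted imagined return
    return ret

# Start from a latent encoded from a real observation (random stand-in here).
print(imagined_return(rng.normal(size=LATENT)))
```

In the full agent, gradients of that imagined return flow back through the rollout to train the actor, while the critic learns to predict the same returns.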

All of these components learn together: the world model trains on replayed experience, while the actor and critic improve on rollouts imagined inside it. Training stays stable across so many domains thanks to robust techniques like these (a few are sketched in code after the list):

  • Symlog transformations to handle variable signal scales
  • KL balancing and free bits to stabilize world model learning
  • Percentile-based return normalization to keep exploration consistent
  • Symexp two-hot loss for accurate reward/value prediction
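
Several of these tricks reduce to a few lines of code. Here is a hedged NumPy sketch of symlog/symexp, the two-hot target encoding, and percentile return normalization; the bin grid, bin count, and single-batch percentile computation are simplifying assumptions rather than the exact training setup:

```python
import numpy as np

def symlog(x):
    """Compress large magnitudes symmetrically; near-identity around zero."""
    return np.sign(x) * np.log1p(np.abs(x))

def symexp(x):
    """Inverse of symlog, used to decode predictions back to raw scale."""
    return np.sign(x) * np.expm1(np.abs(x))

def two_hot(y, bins):
    """Encode a scalar target as weight split across its two nearest bins.
    The reward and value heads output a softmax over these bins and train
    with cross-entropy against this encoding."""
    y = np.clip(y, bins[0], bins[-1])
    k = np.clip(np.searchsorted(bins, y) - 1, 0, len(bins) - 2)
    w = (y - bins[k]) / (bins[k + 1] - bins[k])
    enc = np.zeros(len(bins))
    enc[k], enc[k + 1] = 1.0 - w, w
    return enc

# Targets live in symlog space over a fixed grid of bins (255 equally
# spaced bins here, an assumption matching common DreamerV3 configs).
bins = np.linspace(-20.0, 20.0, 255)
target = symlog(1000.0)            # a raw reward of 1000 becomes ~6.9
encoding = two_hot(target, bins)
decoded = symexp(encoding @ bins)  # expected bin value, mapped back to raw scale
print(decoded)                     # ~1000, recovered from the two-hot code

# Percentile-based return normalization: scale returns by the range between
# their 5th and 95th percentiles, but never scale small returns up (the paper
# tracks this range with a running average across batches; one batch here).
returns = np.random.default_rng(0).normal(size=256)
scale = max(1.0, np.percentile(returns, 95) - np.percentile(returns, 5))
normalized = returns / scale
```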

Beating Minecraft… Without a Map

Minecraft is notoriously tough for AI. Players must explore a vast 3D world, gather resources, survive hostile mobs, and craft items through a long, complex sequence of actions—all with minimal feedback.

Most AI systems couldn't handle the sparse rewards and long-term planning needed to reach the “diamond” milestone, and the systems that made progress leaned on human demonstrations or hand-crafted curricula to guide learning.

DreamerV3 skipped all that.

It jumped into the game, figured it out on its own, and started collecting diamonds in under 100 million environment steps—roughly equivalent to 100 days of in-game play. It’s the first AI to achieve this without human data or hand-tuned guidance.


Outperforming the Experts

DreamerV3 didn’t stop at Minecraft. It beat or matched state-of-the-art algorithms across:

  • Atari games (outperforming MuZero, Rainbow, and IQN)
  • Procedurally generated worlds (ProcGen benchmark)
  • Robot control tasks (both with sensors and vision)
  • 3D spatial environments (DMLab benchmark)
  • Extreme data efficiency tests (Atari100k)
  • Behavioral tests (BSuite benchmark)

It did all this using a single GPU and no domain-specific tuning. That means labs without massive compute budgets can still use DreamerV3 effectively.


Scaling Up Without Falling Apart

DreamerV3 scales predictably with bigger models and longer training: larger versions learned from fewer environment interactions and reached higher final performance.

That kind of scalability is rare in reinforcement learning, where adding parameters often leads to instability or diminishing returns.


What’s Next?

DreamerV3 shows that general reinforcement learning is not just possible—it’s here. As researchers explore extensions like training on unsupervised data or building world models from internet videos, Dreamer’s approach could power the next generation of truly autonomous AI agents.

Whether it’s learning new games, controlling robots, or navigating real-world environments, DreamerV3 proves that dreaming just might be the fastest way to learn.

Check out the cool NewsWade YouTube video about this article!

Article derived from: Hafner, D., Pasukonis, J., Ba, J. et al. Mastering diverse control tasks through world models. Nature (2025). https://doi.org/10.1038/s41586-025-08744-2
