World Models: A Way to Predict the Future
Artificial intelligence has progressed incredibly quickly over the last few years. However, one issue that plagues current architectures is the lack of a deeper understanding of how the world evolves over time. World models, internal simulations learned by machine learning systems that predict how the environment will change in response to actions, attempt to address this problem. In theory, this allows agents to plan and adapt to consequences instead of merely reacting to immediate inputs and outputs.
Figure 1

This racecar drives based on what its internal world model predicts the next section of the track will be.
Source: Ha & Schmidhuber (2018)
How It Works
There are three main steps in the creation of world models:
1. Encoding
Raw sensory data (images, sensor readings, audio) are compressed into a latent space, a multidimensional space in which similar items are placed closer together, using models like variational autoencoders or contrastive predictive coding. The motivation is that it is far easier to work with abstract features than with raw data. For instance, it is easier to think of a picture as a banana rather than as many yellow pixels.
2. Learning
Once observations are encoded into the latent space, the model learns what is known as a transition function, a mapping from the current latent state to the next one, by making predictions, comparing each prediction with what actually happens, and then adjusting its parameters. Two general approaches exist: deterministic and stochastic models. Deterministic models map the current state (and action) to a single next state, while stochastic models predict a probability distribution over possible next states, capturing the randomness of the environment.
3. Predicting
Along with predicting the next state, an intelligent model should also know whether a prediction is actually useful, that is, whether it supports accurate long-term planning. For example, a world model in an autonomous car should notice that a pedestrian is about to step onto a crosswalk, but it gains nothing from predicting that a digital billboard will change advertisements. Several mechanisms are used to induce this, such as predicting reward functions or termination signals (win or lose). A minimal code sketch tying all three steps together follows this list.
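As a concrete, toy illustration of the three steps, the sketch below wires them together in PyTorch: an encoder compresses a fake 64×64 image into a latent vector, a stochastic transition model predicts a distribution over the next latent, and small heads predict reward and termination. All names, layer sizes, and the random training data are assumptions made for illustration; this is a minimal sketch of the idea, not the architecture from any particular paper.

```python
import torch
import torch.nn as nn

LATENT, ACTION = 32, 4  # illustrative latent and action sizes

class Encoder(nn.Module):
    """Step 1: compress raw pixels into a compact latent vector."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 14 * 14, LATENT),  # 14x14 feature map for a 64x64 input
        )

    def forward(self, obs):
        return self.net(obs)

class Transition(nn.Module):
    """Step 2: predict a distribution over the next latent state (stochastic)."""
    def __init__(self):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(LATENT + ACTION, 128), nn.ReLU())
        self.mean = nn.Linear(128, LATENT)
        self.log_std = nn.Linear(128, LATENT)

    def forward(self, z, action):
        h = self.hidden(torch.cat([z, action], dim=-1))
        return self.mean(h), self.log_std(h).exp()

class Heads(nn.Module):
    """Step 3: predict reward and termination from a latent state."""
    def __init__(self):
        super().__init__()
        self.reward = nn.Linear(LATENT, 1)
        self.done = nn.Linear(LATENT, 1)

    def forward(self, z):
        return self.reward(z), torch.sigmoid(self.done(z))

# One training step on random stand-in data: predict, compare, adjust.
enc, trans, heads = Encoder(), Transition(), Heads()
opt = torch.optim.Adam(
    [*enc.parameters(), *trans.parameters(), *heads.parameters()], lr=1e-3)

obs, next_obs = torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64)
action, reward = torch.rand(8, ACTION), torch.rand(8, 1)

z = enc(obs)
mean, std = trans(z, action)
z_next_pred = mean + std * torch.randn_like(std)              # sample a predicted next latent
loss = ((z_next_pred - enc(next_obs)) ** 2).mean()            # compare with what actually happened
loss = loss + ((heads(z_next_pred)[0] - reward) ** 2).mean()  # reward prediction error

opt.zero_grad()
loss.backward()
opt.step()  # adjust the parameters
print(f"world-model loss: {loss.item():.3f}")
```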
With these pieces in place, world models can generate imagined trajectories of states entirely inside the learned model, letting an agent evaluate and improve its behavior without further interaction with the real environment.
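As a toy illustration of this "imagination" step, the sketch below rolls several candidate action sequences forward inside a stand-in latent dynamics model and keeps the sequence with the highest predicted return. The dynamics matrices and reward weights here are random placeholders for a learned model, chosen only so the example runs; the point is the shape of the planning loop, not the model itself.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT, ACTIONS, HORIZON, CANDIDATES = 8, 4, 15, 64

A = rng.normal(scale=0.3, size=(LATENT, LATENT))   # stand-in transition weights
B = rng.normal(scale=0.3, size=(LATENT, ACTIONS))  # stand-in action effect
w = rng.normal(size=LATENT)                        # stand-in reward weights

def imagine(z0, action_seq):
    """Roll a latent state forward through the 'learned' model and sum predicted rewards."""
    z, total = z0, 0.0
    for a in action_seq:
        one_hot = np.eye(ACTIONS)[a]
        z = np.tanh(A @ z + B @ one_hot)  # predicted next latent state
        total += w @ z                    # predicted reward at that state
    return total

z0 = rng.normal(size=LATENT)  # current (encoded) state
candidates = rng.integers(0, ACTIONS, size=(CANDIDATES, HORIZON))
returns = [imagine(z0, seq) for seq in candidates]
best = candidates[int(np.argmax(returns))]
print("best imagined first action:", best[0], "predicted return:", max(returns))
```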
Figure 2

An example architecture for a world model built to predict future events in the game Doom. V is the vision component that encodes observations, while M is the memory component that learns the dynamics.
Source: Ha & Schmidhuber (2018)
Strengths and Limitations
World models enable strong performance from small amounts of data, since agents can learn by predicting inside an internal simulation rather than by acting in the real world alone. They also support long-horizon reasoning, letting agents think many steps ahead, and the resulting models can often be applied to novel tasks. However, there are limitations. Prediction errors can accumulate quickly over a rollout, producing bias that misleads the model. Current memory horizons are also limited, typically ranging from a few seconds to a couple of minutes. And the computation and infrastructure needed for accurate, high-fidelity world models at scale have yet to be achieved.
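To see why small per-step errors matter, here is a rough back-of-the-envelope sketch: assuming, purely for illustration, that each imagined step independently has a 1% chance of going wrong, the fraction of rollouts containing at least one wrong step grows quickly with the horizon.

```python
# Fraction of imagined rollouts expected to contain at least one wrong step,
# under an illustrative, independent 1% error rate per predicted step.
step_error = 0.01
for horizon in (1, 10, 50, 100):
    at_least_one_error = 1 - (1 - step_error) ** horizon
    print(f"{horizon:>3} steps: ~{at_least_one_error:.0%}")
```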
Milestones
Of course, getting to where we are now has taken a (relatively) long time, requiring many independent breakthroughs. Early progress included PlaNet, which first demonstrated that planning with predictions in a learned latent space could yield strong data efficiency. This was followed by the Dreamer algorithm, which introduced stable training by learning behaviors from imagined latent trajectories and demonstrated strong performance in various environments, rivalling traditional reinforcement learning approaches. Then came MuZero, which was critical in showing the generality of world models: it learned to master games like Go, chess, and shogi without being told the rules. Finally, a more recent example is Genie 3, which can simulate an interactive 3D environment at 24 frames per second for up to a minute. So, while limitations exist, it shows how world models are moving beyond abstract prediction to usable simulations.
Figure 3

An example of Genie 3 generating an environment in real time based on world models.
Source: 智趣AI甄选
Conclusion
World models represent a promising direction for deep learning, one where systems can not only act but also anticipate. By compressing perception, predicting physics, and planning ahead, they move ever closer to what humans do naturally. The road to robust, general world models is long, but it is clear that we are chugging steadily along.
Sources
Ha, D., Schmidhuber, J. (2018). Recurrent World Models Facilitate Policy Evolution. arXiv preprint arXiv:1809.01999.
谷歌”世界模拟器”Genie3惊艳登场!一句话生成3D世界,支持分钟级超长记忆 [Google's "world simulator" Genie 3 makes a stunning debut: generate a 3D world from a single sentence, with minute-level memory]. (n.d.). 智趣AI甄选. https://www.aifun.cc/en/google-releases-genie3.html
Genie 3: A new frontier for world models. (2025, August 5). Google DeepMind. https://deepmind.google/discover/blog/genie-3-a-new-frontier-for-world-models/
Kaige. (2024, July 26). DreamerV3 and Muzero. Medium. Retrieved September 23, 2025, from https://medium.com/@kaige.yang0110/dreamerv3-and-muzero-0bcce4ec998b
MuZero: Mastering Go, chess, shogi and Atari without rules. (2020, December 23). Google DeepMind. https://deepmind.google/discover/blog/muzero-mastering-go-chess-shogi-and-atari-without-rules/
Hafner, D., Pasukonis, J., Ba, J., Lillicrap, T. (2023). Mastering Diverse Domains through World Models. arXiv preprint arXiv:2301.04104.
Hafner, D., Lillicrap, T., Ba, J., Norouzi, M. (2019). Dream to Control: Learning Behaviors by Latent Imagination. arXiv preprint arXiv:1912.01603.
Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J. (2018). Learning Latent Dynamics for Planning from Pixels. arXiv preprint arXiv:1811.04551.
Ha, D., Schmidhuber, J. (2018). World Models. arXiv preprint arXiv:1803.10122.