General Discussion
In reply to the discussion: If You Have Investments in Anything Related to AI, Start Unloading Them.
Metaphorical
Large Language Models (LLMs) can be thought of as giant dialog machines: break a conversation (website, PDF, patterns in an image, etc.) down into individual words, with arrows connecting each word to the next. Put it into the model. Repeat over hundreds of millions of documents. Eventually, you end up with a giant network graph of overlapping conversations. When you enter a prompt, the model selects the conversation(s) closest to the one you want and continues them in more detail.
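The "graph of conversations" picture above can be sketched in a few lines of code. This is only a toy illustration of the metaphor, not how an LLM actually works: real models learn probability distributions over subword tokens with neural networks, while this sketch builds a literal word graph and walks it. All names here are invented for the example.

```python
from collections import defaultdict
import random

def build_graph(documents):
    """Count which word follows which, across many documents."""
    graph = defaultdict(list)
    for doc in documents:
        words = doc.split()
        for a, b in zip(words, words[1:]):
            graph[a].append(b)  # an arrow from each word to the next
    return graph

def continue_prompt(graph, prompt, length=5):
    """Walk the graph from the prompt's last word, picking likely next words."""
    words = prompt.split()
    current = words[-1]
    for _ in range(length):
        followers = graph.get(current)
        if not followers:
            break  # the conversation has gone down a dead end
        current = random.choice(followers)
        words.append(current)
    return " ".join(words)

docs = ["the cat sat on the mat", "the dog sat on the rug"]
g = build_graph(docs)
print(continue_prompt(g, "the cat"))
```

Notice that the walk can only ever reproduce and recombine word sequences it has already seen, which previews the "no original thought" point below.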
There are a few big problems with this. The first is that the shorter the prompt, the more likely the conversation will go down the wrong path and generate something entirely nonsensical. The second is that there is no original thought there: the model can only draw on what has been fed into its latent space (the graph), plus whatever new concepts come in through the prompt. This can give the illusion of intelligence. The third is that because training is such a time-consuming process, a model cannot simply be retrained whenever new information arrives; instead, most LLMs rely on a special kind of working memory called a context, and there are computational limits to how big such contexts can grow before they become ineffective.
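The context limit described above behaves like a sliding window over the conversation: once the budget is exceeded, the oldest turns simply fall out. A toy sketch follows, with two loudly labeled assumptions: word counts stand in for tokens (real models count subword tokens), and the budget here is tiny (real budgets run to the tens or hundreds of thousands of tokens).

```python
def fit_to_context(turns, budget):
    """Keep only the most recent turns that fit the token budget.
    Word counts stand in for tokens -- an assumption for illustration."""
    kept, used = [], 0
    for turn in reversed(turns):      # newest turns first
        cost = len(turn.split())
        if used + cost > budget:
            break                     # everything older is forgotten
        kept.append(turn)
        used += cost
    return list(reversed(kept))

history = [
    "user: summarise my report",
    "assistant: here is a summary",
    "user: now translate it to French",
]
print(fit_to_context(history, budget=12))  # the oldest turn is dropped
```

This is why a long chat session eventually "forgets" its own beginning: the earliest turns no longer fit inside the window.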
A world model, on the other hand, is more like a traditional database: it tells the system some information about the world, such as physical infrastructure, where things are located, changes in organisations, changes in state, and so forth. Such a model is sometimes called a digital twin. This is preferable to an LLM by itself (though LLMs can use world models) because the world model can be updated dynamically and quickly.
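A minimal sketch of that idea: a world model as a small fact store whose entries can be changed the moment the world changes, rather than being frozen in at training time. The entities, attributes, and schema here are invented purely for illustration.

```python
class WorldModel:
    """A miniature 'digital twin': a store of facts about the world
    that can be updated in place as the world changes."""

    def __init__(self):
        self.facts = {}  # (entity, attribute) -> value

    def update(self, entity, attribute, value):
        self.facts[(entity, attribute)] = value

    def query(self, entity, attribute):
        return self.facts.get((entity, attribute))

wm = WorldModel()
wm.update("pump_7", "location", "plant_A")
wm.update("pump_7", "status", "running")
wm.update("pump_7", "status", "offline")  # a state change, applied instantly
print(wm.query("pump_7", "status"))       # → offline
```

An LLM grounded in a store like this could answer "is pump_7 running?" from the current state instead of from whatever happened to be true when it was trained.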
A number of key data scientists, including former Meta chief AI scientist Yann LeCun, Gary Marcus, and others (myself included), have been saying for some time that you cannot have a reasoning system without a world model to ground it, and this idea is beginning to percolate through the AI industry. The Tech Bros aren't particularly happy about it, because such world models are comparatively far more efficient and work in ways that tend to reduce the importance of their components, but I think that as people on the tech side work with the technology, they're coming to the same conclusion.