DeepMind presents RT-2: robots that see, learn and act

July 30, 2023


Robotics, Technology

The AI model developed by DeepMind, which combines vision and language to control machines, will open new horizons in robotics.

In a bright environment, full of monitors and technological equipment, a robot stands as the protagonist. Its metal structure reflects light, but it is in its "eyes" that the true magic is hidden. These eyes, powered by DeepMind's RT-2 model, are capable of seeing, interpreting and acting.

As the robot moves gracefully, the scientists around it scrutinize its every move. It's not just a piece of metal and circuitry, but the embodiment of an intelligence that unites the vast world of the web with tangible reality.

The evolution of RT-2

Robotics has come a long way in recent years, but DeepMind has just taken the game to a whole new level. Described in a newly released paper, RT-2 has arrived. What is it? It is a vision-language-action (VLA) model that learns from both web data and robotic data, translating this knowledge into generalized instructions for robotic control.

In an era where technology advances by leaps and bounds, RT-2 represents a significant leap, promising to revolutionize not only the field of robotics, but also the way we live and work every day. But what does this mean in practice?

DeepMind RT-2, from vision to action

High-capacity vision-language models (VLMs) are trained on large datasets, which makes them extraordinarily good at recognizing visual or linguistic patterns (operating, for example, across different languages). But imagine getting robots to do what these models do. In fact, stop imagining it: DeepMind is making it possible with RT-2.

Robotics Transformer 1 (RT-1) was a marvel in its own right, but RT-2 goes further, displaying stronger generalization and a semantic and visual understanding that extends beyond the robotic data it has been exposed to.

Chain-of-thought reasoning

One of the most fascinating aspects of RT-2 is its capacity for chain-of-thought reasoning. It can decide which object could be used as a makeshift hammer, or which kind of drink is best for a tired person. This deeper reasoning ability could revolutionize the way we interact with robots.

And if worst comes to worst, you could always ask a robot to make you a good coffee to regain some clarity.

But how does DeepMind RT-2 control a robot?

The answer lies in how it was trained. It represents robot actions in a form not unlike the language tokens used by models such as ChatGPT.

RT-2 demonstrated striking emergent capabilities, such as symbol understanding, reasoning and human recognition. On these skills it shows an improvement of more than 3x over previous models.
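
To make the token idea concrete, here is a minimal Python sketch of how a continuous robot command could be turned into a short string of integer "tokens" and back again. It is an illustration only, not DeepMind's code: the 256 bins per dimension, the [-1, 1] value range, and the function names encode_action and decode_action are all assumptions made for this example.

```python
from typing import List, Sequence

NUM_BINS = 256  # assumption: number of discrete bins per action dimension


def encode_action(action: Sequence[float], low: float = -1.0, high: float = 1.0) -> str:
    """Map each continuous value to an integer bin and join the bins as text."""
    tokens: List[str] = []
    for value in action:
        # Clip out-of-range commands so they land in the first or last bin.
        clipped = min(max(value, low), high)
        # Scale to [0, NUM_BINS - 1] and round to the nearest bin index.
        bin_index = round((clipped - low) / (high - low) * (NUM_BINS - 1))
        tokens.append(str(bin_index))
    return " ".join(tokens)


def decode_action(text: str, low: float = -1.0, high: float = 1.0) -> List[float]:
    """Invert encode_action, up to the quantization error of one bin."""
    return [low + int(tok) / (NUM_BINS - 1) * (high - low) for tok in text.split()]


if __name__ == "__main__":
    # Hypothetical command: small arm displacements plus a gripper value.
    command = [0.10, -0.25, 0.00, 0.80]
    as_text = encode_action(command)
    print(as_text)                 # a string of integers a language model could emit
    print(decode_action(as_text))  # approximately the original command
```

Once actions look like text, a model that already predicts word tokens can be trained to predict action tokens in the same way, and the robot's controller only needs the decoding step to turn the model's output back into motion.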

With RT-2, DeepMind not only showed that vision-language models can be transformed into powerful vision-language-action models, but it also opened the door to a future in which robots can reason, solve problems and interpret information to perform a wide range of tasks in the real world.

And now?

In a world where artificial intelligence and robotics will be increasingly central, RT-2 shows us that the next evolution will not be purely technical, but "perceptual". Machines will understand and respond to our needs in ways we never imagined.

If this is just the beginning, who knows what the future holds.

Gianluca Riccio, creative director of Melancia adv, copywriter and journalist. He is part of the Italian Institute for the Future, World Future Society and H+. Since 2006 he has directed Futuroprossimo.it, the Italian Futurology resource.
