Google DeepMind's Robotics Head on General-Purpose Robots, Generative AI and Office Wi-Fi

Image Credits: DeepMind

[A version of this piece first appeared in TechCrunch’s robotics newsletter, Actuator. Subscribe here.]

Earlier this month, Google’s DeepMind team debuted Open X-Embodiment, a database of robot capabilities created in collaboration with 33 research institutions. The researchers involved compared the system to ImageNet, the landmark database founded in 2009 that now contains more than 14 million images.

“Just as ImageNet advanced computer vision research, we believe Open X-Embodiment can do the same to advance robotics,” researchers Quan Vuong and Pannag Sanketi said at the time. “Building a dataset of diverse robot demonstrations is a key step toward training generalist models that can control many different types of robots, follow diverse instructions, perform basic reasoning about complex tasks and generalize effectively.”

At the time of publication, Open X-Embodiment included more than 500 skills and 150,000 tasks collected from 22 robot embodiments. Not quite ImageNet numbers, but it’s a good start. DeepMind then trained its RT-1-X model on that data and used it to train robots in other labs, reporting a 50% success rate compared to the in-house methods those teams had developed.
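To make the cross-embodiment training idea concrete, here is a minimal sketch that mixes demonstration episodes from several robot types into a single training stream. The dataset contents, sampling scheme and train_step are invented placeholders for illustration, not the actual Open X-Embodiment or RT-1-X pipeline.

```python
import random

# Hypothetical sketch: pool demonstration episodes from several robot
# embodiments and train one generalist policy on the mixture.

def toy_episode(embodiment: str):
    """A stand-in episode: a language instruction plus (observation, action) steps."""
    return {
        "embodiment": embodiment,
        "instruction": "pick up the red block",
        "steps": [
            ({"image": [0.0] * 16}, [random.uniform(-1, 1) for _ in range(7)])
            for _ in range(10)
        ],
    }

# One list of episodes per embodiment, standing in for the 22 embodiments
# contributed by the partner labs.
datasets = {
    "stationary_arm": [toy_episode("stationary_arm") for _ in range(100)],
    "mobile_manipulator": [toy_episode("mobile_manipulator") for _ in range(100)],
    "bimanual_rig": [toy_episode("bimanual_rig") for _ in range(100)],
}

def sample_batch(batch_size: int = 32):
    """Sample across embodiments so no single robot dominates training."""
    batch = []
    for _ in range(batch_size):
        source = random.choice(list(datasets))
        batch.append(random.choice(datasets[source]))
    return batch

def train_step(batch):
    # Placeholder for a gradient step on a generalist policy such as RT-1-X.
    pass

for step in range(1000):
    train_step(sample_batch())
```

The point of the sketch is simply that one policy sees data from every embodiment, rather than a separate model being trained per robot.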

As I’ve probably repeated dozens of times on this page, these are really exciting times for robotics learning. I’ve talked to so many teams that are approaching the problem from different angles and becoming increasingly effective. The reign of custom-made robots is not over, but it feels like we are getting a glimpse of a world where the potential for general-purpose robots is clear.

Simulation, along with AI (including the generative variety), will definitely be a big part of the equation. It still feels like some companies are putting the cart before the horse when it comes to building hardware for general tasks, but who knows where we’ll be in a few years?

Vincent Vanhoucke is someone I’ve been trying to track down for a bit. Whenever I was available, he wasn’t; ships in the night. Thankfully, we were finally able to connect this past weekend.

Vanhoucke is new to the role of head of robotics at Google DeepMind, having assumed it only in May. He has, however, been with the company for more than 16 years, most recently serving as a distinguished scientist for Google AI Robotics. All told, he may be the perfect person to talk about Google’s robotics ambitions and how it got here.

Image Credits: Google

At what point in DeepMind’s history did the robotics team develop?

I wasn’t originally a DeepMind person. I was at Google Research. We recently merged with the DeepMind effort. So, in a sense, my involvement with DeepMind is very recent. But Google DeepMind has a longer history of robotics research. It started with a growing view that perceptual technology was becoming very good.

A lot of the capabilities in computer vision, audio processing and everything else were just turning the corner and becoming almost human-level. We started asking ourselves, “If this continues over the next few years, what are the consequences?” One obvious consequence was that suddenly having robots in real-world environments becomes a real possibility. Being able to actually evolve and perform tasks in an everyday environment was entirely predicated on having really, really strong perception. I was initially working on general AI and computer vision. I had also worked on speech recognition before that. I saw the writing on the wall and decided to pivot toward using robotics as the next stage of our research.

My understanding is that a lot of the Everyday Robots team ended up joining this team. Google’s history with robotics goes back even further, though. It’s been about 10 years since Alphabet made that string of acquisitions [Boston Dynamics, etc.]. It seems that many people from those companies ended up on Google’s existing robotics team.

A significant fraction of the team came through those acquisitions. That was before my time; I was heavily involved in computer vision and speech recognition, and we still have a lot of those folks at the company. Increasingly, we came to the conclusion that the entire robotics problem is subsumed by the general AI problem. Really solving the intelligence part was the key enabler of any meaningful progress in real-world robotics. We shifted a lot of our efforts toward the view that perception, understanding and control in the context of general AI were the important problems to solve.

It seems like a lot of the work coming out of Everyday Robots touched on general AI or generative AI. Is the work that team was doing being carried over to the DeepMind robotics team?

We had been collaborating with Everyday Robots for, I would say, seven years already. We had a very deep connection even though we were two separate teams. In fact, one of the things that prompted us to start thinking seriously about robotics back then was a small, skunkworks-style collaboration with the Everyday Robots team. They happened to have a number of robot arms lying around that had been discontinued; they were one generation of arms that had led to a new generation, and they were just sitting there, doing nothing.

We decided it would be fun to pick up those arms and put them in a room so they could practice and learn how to grasp objects. The very notion of learning the grasping problem was not in the zeitgeist at the time; the idea of using machine learning and perception as a way to control robotic grasping had not really been explored. When an arm succeeded, you gave it a reward, and when it failed, you gave it a thumbs-down.

That was the first time we used machine learning and perception to essentially solve this generalized grasping problem. It was a lightbulb moment at the time; there really was something new there. It triggered both the investigations with Everyday Robots focused on machine learning as a way to control those robots, and, on the research side, a push toward robotics as an interesting problem on which to apply all of the deep learning AI techniques that had worked so well in other fields.
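To illustrate the “reward on success, thumbs-down on failure” idea, here is a minimal, hypothetical sketch of a binary-reward grasping loop. The environment, policy and update rule are toy stand-ins, not DeepMind’s actual system.

```python
import random

# Toy stand-ins: an environment that reports whether a grasp attempt
# succeeded, and a policy that proposes grasps and learns from the outcome.
class GraspEnv:
    def reset(self):
        return [random.random() for _ in range(4)]    # fake camera features

    def step(self, action):
        return 1.0 if random.random() < 0.3 else 0.0  # 1.0 = grasp succeeded

class GraspPolicy:
    def __init__(self):
        self.outcomes = []

    def act(self, observation):
        return [random.uniform(-1, 1) for _ in range(3)]  # toy grasp pose

    def update(self, observation, action, reward):
        # A real system would take a gradient step here (e.g. Q-learning or
        # a policy gradient); this sketch just records the outcome.
        self.outcomes.append(reward)

env, policy = GraspEnv(), GraspPolicy()
for episode in range(1000):
    obs = env.reset()
    action = policy.act(obs)            # propose a grasp
    reward = env.step(action)           # binary success signal
    policy.update(obs, action, reward)  # reinforce successes, discourage failures

print("recent success rate:", sum(policy.outcomes[-100:]) / 100)
```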

Image Credits: DeepMind

Has Everyday Robots been absorbed into your team?

A fraction of the team was absorbed into mine. We inherited their robots and still use them today. To this day, we continue to develop the technology they pioneered and worked on. The whole impetus lives on, with a slightly different focus than what the team originally envisioned: we focus much more on the intelligence piece than on building robots.

You mentioned that your team has moved to Alphabet X’s offices. Is there something deeper when it comes to collaboration and resource sharing between teams?

It’s a very practical decision. There is good Wi-Fi, good power and plenty of space.

I would hope all of Google’s buildings have good Wi-Fi.

You’d hope so, right? But it was a very pedestrian decision for us to move here. A lot of it was that there’s a good cafe here; the food at our previous office wasn’t very good, and people were starting to complain. There’s no hidden agenda there. We like working closely with the rest of X. I think there are a lot of synergies there. They have some really talented roboticists working on a number of projects. We have collaborations with Intrinsic that we like to nurture. It makes a lot of sense for us to be here, and it’s a beautiful building.

There’s some overlap with Intrinsic, in terms of what they’re doing with their platform: things like no-code robotics and robot learning. They overlap with general and generative AI.

It’s interesting how robotics has evolved from every corner of it being bespoke, requiring a completely different set of expertise and skills. A big part of the journey we’re on is trying to make general-purpose robotics happen, whether it’s applied in an industrial setting or a home setting. The principles behind it, driven by a very strong AI core, are very similar. We’re really pushing the envelope in trying to explore how we can support as many applications as possible. It’s new and exciting. It’s very greenfield; there is so much to explore in the space.

I like to ask people how far away they think we are from what could reasonably be called general purpose robotics.

There is a slight nuance in the definition of a general-purpose robot. We focus on general-purpose methods: methods that can be applied to industrial robots, home robots or sidewalk robots, with all of their different embodiments and form factors. We’re not predicated on there being a one-size-fits-all embodiment that does it all; if you have an embodiment that is bespoke to your problem, that’s fine. We can quickly fine-tune our methods to solve the specific problem you have. So the big question is: will general-purpose robots happen? That’s something many people have hypothesized about, if and when it will happen.

So far, there has been more success with bespoke robots. To some extent, I don’t think the technology that enables more general-purpose robots is there yet. Whether the business model will take us there is a very good question. I don’t think that question can be answered until we have more confidence in the technology behind it. That’s what we’re driving now. We’re seeing more and more signs of life that very general approaches, ones that don’t depend on a specific embodiment, are plausible. The latest thing we’ve done is this RT-X project. We went to a lot of academic labs, I think we have 30 different partners right now, and said: let’s look at your tasks and the data you’ve collected, bring them into a common repository, train a large model on top of it and see what happens.

Image Credits: DeepMind

What role will generative AI play in robotics?

I think it will be very central. There was this large language model revolution, and everyone started asking whether we could use language models for robots. That could have been very superficial, you know, “Let’s just take the trend of the day and see what we can do with it,” but it has turned out to be very profound. The reason is that, if you think about it, language models are not really about language. They’re about common sense reasoning and understanding of the everyday world. So if a large language model knows you’re looking for a cup of coffee, it knows it can probably be found in a kitchen cupboard or on a table.

It makes sense to put a coffee cup on a table; it’s nonsensical to put a table on top of a coffee cup. Those are simple facts you don’t think about much because they’re completely obvious to you. It has always been very difficult to communicate that to an embodied system. The knowledge is very hard to encode, while these large language models have it and encode it in a way that is very accessible and usable. So we’ve been able to take this common-sense reasoning and apply it to robot planning. We’ve been able to apply it to robot interactions, manipulation and human-robot interaction. Having an agent with this common sense, able to reason about things in a simulated environment alongside perception, is really central to the robotics problem.
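As a concrete illustration of plugging that common-sense knowledge into planning, here is a hypothetical sketch in which a language model ranks a robot’s candidate skills for a natural-language request. The query_llm helper and the skill list are invented placeholders, not DeepMind’s actual planning stack.

```python
# Hypothetical sketch: use an LLM's common-sense knowledge to pick which
# robot skill to execute next for a natural-language request.

def query_llm(prompt: str) -> str:
    # Placeholder for whatever LLM client is available.
    raise NotImplementedError("plug in your LLM client here")

SKILLS = [
    "go to the kitchen cupboard",
    "go to the dining table",
    "open the cupboard",
    "pick up the coffee cup",
    "place a table on the coffee cup",  # nonsensical; the LLM should rank it last
]

def choose_next_skill(request: str, completed: list[str]) -> str:
    prompt = (
        f"A robot was asked: '{request}'.\n"
        f"Steps done so far: {completed or 'none'}.\n"
        "Which of these skills is the most sensible next step?\n"
        + "\n".join(f"{i + 1}. {s}" for i, s in enumerate(SKILLS))
        + "\nAnswer with the number only."
    )
    answer = query_llm(prompt)
    return SKILLS[int(answer.strip()) - 1]

# Example usage: plan a few steps for "bring me a cup of coffee".
# plan = []
# for _ in range(3):
#     plan.append(choose_next_skill("bring me a cup of coffee", plan))
```

A real system would also check which skills are physically feasible in the current scene before acting on the model’s ranking.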

Different tasks Gato learned how to complete.

Simulation will probably be a big part of collecting data for analysis.

Yeah, that’s one element of this. The challenge with simulation is that you then need to bridge the simulation-to-reality gap. Simulations are an approximation of reality, and it can be very difficult to make them precise and truly reflective of reality. The physics of the simulator has to be good. The visual rendering of reality in that simulation has to be very good. This is actually another area where generative AI is starting to make its mark. You can imagine that, instead of actually running a physics simulator, you just generate using image generation or a generative model of some kind.

Tye Brady recently said that Amazon is using simulation to generate packages.

That makes a lot of sense. And going forward, I think that beyond generating assets, you can imagine generating futures: imagining what would happen if the robot performed an action, verifying that it’s actually doing the thing you wanted it to do, and using that as a way of planning for the future. It’s sort of like the robot dreaming, using generative models, instead of having to do it in the real world.
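As a rough sketch of that “dreaming” loop, the hypothetical code below uses a learned generative world model to imagine the outcome of each candidate action and picks the one whose imagined future best matches the goal. WorldModel, goal_score and the candidate actions are invented placeholders, not an actual DeepMind system.

```python
import random

class WorldModel:
    """Stand-in for a learned generative video/dynamics model."""
    def imagine(self, observation, action):
        # A real model would generate predicted future frames or states.
        return {"obs": observation, "action": action, "noise": random.random()}

def goal_score(imagined_future, goal) -> float:
    # Stand-in for a goal checker, e.g. a vision-language model asked
    # "does this imagined future show the cup on the table?"
    return random.random()

def plan(observation, goal, candidate_actions, model: WorldModel):
    # "Dream" each candidate action and keep the one with the best imagined outcome.
    scored = [
        (goal_score(model.imagine(observation, action), goal), action)
        for action in candidate_actions
    ]
    return max(scored)[1]

best = plan(
    observation="camera frame",
    goal="coffee cup is on the table",
    candidate_actions=["pick up cup", "push cup", "do nothing"],
    model=WorldModel(),
)
print("chosen action:", best)
```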