It’s no secret that foundation models have transformed AI in the digital world. Large language models (LLMs) such as ChatGPT, LLaMA, and Bard have revolutionized AI for language. OpenAI’s GPT models are not the only large language models available, but they have achieved the most mainstream recognition for taking text and image inputs and providing responses, even for some tasks that require complex problem solving or advanced reasoning.
The viral and widespread adoption of ChatGPT has significantly shaped how society understands this new moment in artificial intelligence.
The next advancement that will define AI for generations is robotics. Building AI-powered robots that can learn how to interact with the physical world will enhance all forms of repetitive work in sectors ranging from logistics, transportation, and manufacturing to retail, agriculture, and even healthcare. It will also unlock as many efficiencies in the physical world as we have seen in the digital world over the past few decades.
Although robotics has a unique set of problems to solve compared to language, the core foundational concepts are similar. And some of AI’s brightest minds have made great strides in building the “GPT for robotics.”
What makes GPT successful?
To understand how to build a “GPT for robotics”, first look at the core pillars that have enabled the success of LLMs such as GPT.
Foundation model approach
GPT is an AI model trained on a vast and diverse dataset. Engineers previously collected data and trained a specific AI for a specific problem. To solve a different problem, they had to collect new data. Another problem? New data yet again. With the foundation model approach, exactly the opposite is happening.
Instead of building a niche AI for every use case, one model can be used universally. And that one very general model is more successful than any specialized model. The AI in a foundation model even performs better on a single specific task: because it has to perform well across a variety of tasks, it learns additional skills, can leverage what it learned from other tasks, and generalizes better to new ones.
Training on large, unique, high-quality datasets
To achieve generalized AI, we first need access to vast amounts of diverse data. OpenAI has acquired the real-world data needed to train GPT models reasonably efficiently. GPT has been trained on data collected from across the internet, a large and diverse dataset that includes books, news articles, social media posts, code, and more.
It’s not just the size of the dataset that matters. Careful selection of high-quality, high-value data also plays a major role. The GPT model achieved unprecedented performance because its high-quality dataset is primarily informed by the tasks users are interested in and the answers that are most useful.
The role of reinforcement learning (RL)
OpenAI employs reinforcement learning from human feedback (RLHF) to align the model’s responses with human preferences, such as what a user considers helpful. Pure supervised learning (SL) is not enough, because SL can only approach a problem that has a clear pattern or a fixed set of examples. LLMs require the AI to achieve a goal that has no unique correct answer. Enter RLHF.
RLHF allows the algorithm to progress toward a goal through trial and error while a human accepts correct answers (high reward) or rejects incorrect ones (low reward). The AI finds the reward function that best describes human preferences and then uses RL to learn how to get there. By learning from human feedback, ChatGPT can provide responses that mirror or exceed human-level capabilities.
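The accept/reject loop described above can be illustrated with a toy sketch. This is not OpenAI's implementation: the two features (`helpful`, `rude`), the feedback data, and the candidate names are all hypothetical, the reward model is a plain logistic regression, and a simple "pick the highest-reward candidate" step stands in for the RL stage.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def reward(weights, features):
    """Linear reward: higher means more likely to be accepted by a human."""
    return sum(w * x for w, x in zip(weights, features))

def fit_reward_model(feedback, n_features, lr=0.5, epochs=200):
    """Fit reward weights by logistic regression on (features, accepted) pairs."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for features, accepted in feedback:
            predicted = sigmoid(reward(weights, features))
            # Gradient step on the log-likelihood of the human's decision.
            for i in range(n_features):
                weights[i] += lr * (accepted - predicted) * features[i]
    return weights

def pick_best(weights, candidates):
    """Stand-in for the RL step: choose the candidate with the highest reward."""
    return max(candidates, key=lambda name: reward(weights, candidates[name]))

# Hypothetical human feedback: features are [helpful, rude];
# 1 = accepted (high reward), 0 = rejected (low reward).
FEEDBACK = [
    ([1.0, 0.0], 1),
    ([1.0, 1.0], 0),
    ([0.0, 1.0], 0),
]

# Hypothetical candidate responses described by the same two features.
CANDIDATES = {
    "helpful_polite": [1.0, 0.0],
    "helpful_rude": [1.0, 1.0],
    "unhelpful_rude": [0.0, 1.0],
}

weights = fit_reward_model(FEEDBACK, n_features=2)
print(pick_best(weights, CANDIDATES))
```

In a real system the reward model would be a neural network over text, and the selection step would be replaced by an RL algorithm that updates the language model itself.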
The next frontier for foundation models is in robotics
The same core technologies that enable GPTs to see, think, and even speak also enable machines to see, think, and even act. Utilizing foundational models, robots can understand their physical environment, make informed decisions, and adapt their behavior to changing conditions.
“GPT for Robotics” is built in the same way as GPT, laying the foundation for a revolution that will once again redefine AI as we know it.
Foundation model approach
By taking a foundation model approach, you can also build a single AI that works across multiple tasks in the physical world. A few years ago, experts advised building a specialized AI for robots that pick and pack groceries, separate from a model that sorts electrical parts and separate again from a model that unloads pallets from trucks.
This paradigm shift to foundation models makes AI better able to respond to the edge-case scenarios that frequently arise in unstructured real-world environments, where narrowly trained models can stumble. Building one general-purpose AI that handles all of these scenarios is more successful. Training on everything is what achieves the human-level autonomy that previous generations of robots lacked.
Training on large, unique, high-quality datasets
It is extremely difficult to teach a robot which actions lead to success and which lead to failure. Doing so requires extensive, high-quality data based on real-world physical interactions. Single laboratory settings or video examples are not reliable or sufficiently robust sources of information (e.g., YouTube videos do not convey the details of physical interaction, and the scope of academic datasets tends to be limited).
Unlike AI for language or image processing, there are no existing datasets that represent how robots should interact with the physical world. Therefore, building a large-scale, high-quality dataset is a more complex challenge in robotics, and deploying large numbers of robots in production environments is the only way to build a diverse dataset.
The role of reinforcement learning
Similar to answering text questions with human-level ability, robot control and manipulation require the agent to move toward a goal for which there is no unique correct answer (e.g., “What is the best way to pick up this red onion?”). Again, we need more than pure supervised learning.
Success in robotics requires robots that perform deep reinforcement learning (deep RL). This autonomous, self-learning approach combines RL and deep neural networks to achieve higher levels of performance. AI automatically adapts its learning strategy and continues to fine-tune its skills as it experiences new scenarios.
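To make the trial-and-error idea concrete, here is a deliberately small sketch. It is not deep RL: it uses an epsilon-greedy bandit with a lookup table where a real system would use a deep neural network, and the grasp strategies and their success probabilities are made-up values for a simulated robot.

```python
import random

def run_trials(success_probs, n_trials=2000, epsilon=0.1, seed=0):
    """Learn which action succeeds most often, purely from trial and error.

    success_probs holds the (hidden) chance that each action succeeds;
    the learner only ever observes success (1.0) or failure (0.0).
    """
    rng = random.Random(seed)
    n_actions = len(success_probs)
    counts = [0] * n_actions    # how often each action was tried
    values = [0.0] * n_actions  # running estimate of each action's success rate

    for _ in range(n_trials):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: use the action that currently looks best.
            action = max(range(n_actions), key=lambda i: values[i])

        outcome = 1.0 if rng.random() < success_probs[action] else 0.0

        # Incremental mean update of the estimated success rate.
        counts[action] += 1
        values[action] += (outcome - values[action]) / counts[action]

    return values, counts

# Hypothetical grasp strategies with made-up success probabilities.
STRATEGIES = ["pinch_from_side", "pinch_from_top", "suction_from_top"]
values, counts = run_trials([0.2, 0.5, 0.9])
best = max(range(len(STRATEGIES)), key=lambda i: values[i])
print(STRATEGIES[best], counts)
```

The estimates keep updating with every new trial, which is the same property the text attributes to deep RL: the skill continues to be refined as new scenarios are experienced.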
Challenging, explosive growth is coming
In recent years, the world’s best AI and robotics experts have laid the technical and commercial foundations for a robotic foundation model revolution that will redefine the future of artificial intelligence.
Although these AI models are built similarly to GPT, achieving human-level autonomy in the physical world is a different scientific challenge for two reasons:
- Building an AI-based product that can serve a variety of real-world settings comes with a demanding set of complex physical requirements. The AI must adapt to different hardware applications, as it is doubtful that one piece of hardware will work across different industries (such as logistics, transportation, manufacturing, retail, agriculture, and healthcare) and across the activities within each sector.
- Warehouses and distribution centers are ideal learning environments for AI models in the physical world. Hundreds of thousands or even millions of different stock-keeping units (SKUs) flow through a facility at any given time, which generally provides the large, unique, high-quality dataset needed to train the “GPT for robotics.”
The AI robotics “GPT moment” is coming soon
The growth of robotic foundation models is accelerating rapidly. Robotic applications, especially for tasks that require precise manipulation of objects, are already being used in real-world production environments, and 2024 will see an exponentially larger number of commercially viable robotic applications deployed at scale.
Chen has published more than 30 academic papers in the world’s top AI and machine learning journals.