A novel, human-inspired approach to training artificial intelligence (AI) systems to identify and navigate around objects could lay the foundation for developing more advanced AI systems for exploring extreme environments and distant worlds, according to research from an interdisciplinary team at Pennsylvania State University.
During the first two years of life, children experience a relatively limited number of objects and faces, but from different perspectives and under different lighting conditions. Inspired by this developmental insight, researchers introduced a new machine learning approach that uses information about spatial location to more efficiently train AI vision systems.
They found that AI models trained with their new method outperformed the base model by up to 14.99%. They reported their findings in the May issue of the journal Patterns.
“Current AI approaches use large sets of randomly shuffled photos from the internet for training. In contrast, our strategy is informed by developmental psychology, which studies how children perceive the world,” said lead author Lizhen Zhu, a doctoral student in Penn State’s College of Information Sciences and Technology.
The researchers developed a new contrastive learning algorithm, a type of self-supervised learning method in which an AI system learns to detect visual patterns by identifying whether two images derive from the same base image, which makes them a positive pair. However, these algorithms often treat images of the same object taken from different perspectives as separate entities rather than as a positive pair.
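Contrastive objectives of this kind are often implemented with an InfoNCE-style loss. Below is a minimal sketch, assuming a SimCLR-style setup rather than the authors' exact model: two augmented views of each base image form the positive pair, and the rest of the batch serves as negatives.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: (batch, dim) embeddings of two views of the same base images."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)          # (2B, dim): all views in one batch
    sim = z @ z.t() / temperature           # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))       # a view is never its own positive
    b = z1.size(0)
    # Row i's positive sits at i + B (first half) or i - B (second half).
    targets = torch.cat([torch.arange(b) + b, torch.arange(b)])
    return F.cross_entropy(sim, targets)

# Toy usage with random "embeddings"; in practice these come from an encoder.
loss = info_nce_loss(torch.randn(32, 128), torch.randn(32, 128))
print(loss.item())
```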
According to the researchers, by taking into account environmental data such as location, the AI system can overcome these challenges and detect positive pairs regardless of camera position or rotation, lighting angle and conditions, or changes in focal length or zoom.
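In code, the spatial-context idea might look like the following sketch, where two frames count as a positive pair whenever their recorded camera positions and capture times are close enough. The field names and thresholds are illustrative assumptions, not the paper's exact criteria.

```python
import math

def is_positive_pair(frame_a: dict, frame_b: dict,
                     max_dist: float = 0.5, max_dt: float = 2.0) -> bool:
    """Frames carry 'pos' (x, y, z in meters) and 'time' (seconds) metadata."""
    close_in_space = math.dist(frame_a["pos"], frame_b["pos"]) <= max_dist
    close_in_time = abs(frame_a["time"] - frame_b["time"]) <= max_dt
    return close_in_space and close_in_time

# Two views of the same corner from different angles still pair up,
# regardless of camera rotation, lighting, or zoom.
a = {"pos": (1.0, 1.5, 0.2), "time": 10.0}
b = {"pos": (1.2, 1.4, 0.2), "time": 11.0}
print(is_positive_pair(a, b))  # True
```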
“We hypothesize that infants’ visual learning relies on position awareness. To generate an egocentric dataset with spatiotemporal information, we set up a virtual environment in the ThreeDWorld platform, a high-fidelity interactive 3D physical simulation environment, which allowed us to manipulate and measure the viewing camera’s position as if a child were walking around the house,” Zhu added.
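One way to picture such a dataset is as rendered frames annotated with the camera's pose at capture time. The sketch below uses a hypothetical `render_view` step and invented field names; ThreeDWorld's actual API differs.

```python
from dataclasses import dataclass

@dataclass
class EgoFrame:
    image_path: str    # rendered view from the agent's camera
    pos: tuple         # camera position (x, y, z) in the scene
    yaw_deg: float     # camera heading at capture time
    time_s: float      # simulation timestamp

def collect_walkthrough(waypoints, step_time=0.5):
    """Log one annotated frame per waypoint as the simulated 'child' walks."""
    frames = []
    for i, (x, y, z, yaw) in enumerate(waypoints):
        path = f"frames/{i:05d}.png"  # hypothetical: render_view(x, y, z, yaw, out=path)
        frames.append(EgoFrame(path, (x, y, z), yaw, i * step_time))
    return frames

frames = collect_walkthrough([(0.0, 1.2, 0.0, 0.0), (0.3, 1.2, 0.0, 15.0)])
print(frames[0])
```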
The scientists created three simulated environments: House14K, House100K, and Apartment14K, where “14K” and “100K” refer to the approximate number of example images taken in each environment. They then ran the base contrastive learning model and a model using the new algorithm three times each in the simulations to see how well each could classify images. The team found that models trained with their algorithm performed better than the base model on a range of tasks.
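The classification step could resemble a standard linear probe: freeze the pretrained encoder, train a linear classifier on its embeddings, and average accuracy over repeated runs. The sketch below uses random stand-in embeddings and scikit-learn as an illustrative assumption about the evaluation protocol, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def linear_probe_accuracy(emb, labels, seed):
    """Split, fit a linear classifier on frozen embeddings, report test accuracy."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(emb))
    tr, te = idx[:150], idx[150:]
    clf = LogisticRegression(max_iter=1000).fit(emb[tr], labels[tr])
    return accuracy_score(labels[te], clf.predict(emb[te]))

# Stand-in embeddings for a 4-way task (e.g., which room a view comes from).
rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 64))
labels = rng.integers(0, 4, size=200)
accs = [linear_probe_accuracy(emb, labels, seed) for seed in range(3)]  # three runs
print(f"mean accuracy: {np.mean(accs):.2%}")
```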
For example, on the task of recognizing rooms in a virtual apartment, the enhanced model achieved an average accuracy of 99.35%, a 14.99% improvement over the base model. The new datasets are now publicly available for other scientists to use for training.
“It is always difficult for models to learn in new environments with small amounts of data. Our work is one of the first attempts to use visual content to make AI training more energy-efficient and flexible,” said James Wang, distinguished professor of information science and technology and Zhu’s advisor.
According to the scientists, the research has implications for the future development of advanced AI systems aimed at navigating and learning in new environments.
“This approach is particularly beneficial in situations where a team of autonomous robots with limited resources needs to learn how to navigate in completely unknown environments,” Wang said. “To pave the way for future applications, we plan to improve our model to make better use of spatial information and incorporate more diverse environments.”
Collaborators in Penn State’s Department of Psychology and Department of Computer Science and Engineering also contributed to the work.
More information:
Lizhen Zhu et al., Incorporating simulated spatial context information improves the effectiveness of contrastive learning models, Patterns (2024). DOI: 10.1016/j.patter.2024.100964
Citation: Children’s visual experiences could be key to better computer vision training (May 31, 2024). Retrieved May 31, 2024 from https://techxplore.com/news/2024-05-children-visual-key-vision.html