The findings come as AI tools are increasingly being touted on pedophile forums as a way to create uncensored sexual depictions of children, child safety researchers said. Because AI image generators often need only a handful of photos to accurately re-create a subject, the presence of more than 1,000 child abuse photos in the training data could give these tools alarming new capabilities, experts said.
“The photos essentially give the [AI] model an advantage in creating child exploitation content in a way that resembles real-world child exploitation,” said David Thiel, author of the report and chief technologist at Stanford University’s Internet Observatory.
A LAION representative said they had temporarily removed the LAION-5B dataset “to ensure it is safe before re-publishing it.”
In recent years, new AI tools called diffusion models have emerged that allow anyone to create compelling images by simply entering a short description of what they want to see. These models are fed billions of images taken from the internet and create unique pictures by mimicking visual patterns.
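To give a sense of how little input these tools require, here is a minimal sketch using the open-source Hugging Face diffusers library; the particular model name and prompt are placeholders chosen for illustration, not details from the report.

```python
# Minimal sketch: generating an image from a short text prompt with an
# off-the-shelf diffusion model via the Hugging Face "diffusers" library.
# The model ID and prompt below are illustrative placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # a publicly released diffusion model
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # runs on a consumer GPU

# A one-line description is the only input the user needs to provide.
image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
image.save("output.png")
```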
These AI image generators have been praised for their ability to create strikingly realistic photos, but they have also increased the speed and scale at which pedophiles can create new explicit images, because the tools require less technical knowledge than older methods, such as pasting a child’s face onto an adult’s body to make a “deepfake.”
Thiel’s research marks a shift in how researchers understand the way AI tools generate child abuse content. Previously, these tools were thought to combine two concepts, such as “children” and “explicit content,” to produce disturbing images. The new findings suggest that real images help refine the output of abusive fakes, making them appear more authentic.
The child abuse photos make up only a tiny fraction of the LAION-5B database, which contains billions of images, and researchers say they were likely added inadvertently as the database’s creators scraped images from social media, adult video sites, and the open internet.
But the fact that the dataset included illegal images at all highlights just how little is known about the datasets at the heart of the most powerful AI tools. Critics worry that the biased depictions and explicit content in AI image databases could invisibly shape what they produce.
Thiel added that there are several ways to address the problem. Protocols could be put in place to screen for and remove child abuse content and nonconsensual pornography from databases. Training datasets could be made more transparent and include information about their contents. And image models built on datasets containing child abuse content could be taught to “forget” how to create explicit imagery.
The researchers scanned the dataset for “hashes” of abusive images: digital fingerprints that identify known images and are kept on watch lists maintained by the National Center for Missing and Exploited Children and the Canadian Centre for Child Protection.
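As a rough illustration of how such a scan works, the sketch below checks file fingerprints against a watch list. The directory name, the placeholder watch-list entry, and the use of a plain cryptographic hash are assumptions for the example; in practice, clearinghouses distribute perceptual hashes (such as PhotoDNA or PDQ) that can also match resized or slightly altered copies of known images.

```python
# Minimal sketch of hash-based screening: compute a fingerprint for each file
# in a dataset and flag any that appear on a watch list of known abusive content.
# NOTE: the watch-list value, directory name, and use of SHA-256 are placeholders;
# real screening pipelines rely on perceptual hashes supplied by clearinghouses.
import hashlib
from pathlib import Path

# Hypothetical watch list of hex-encoded hashes obtained from a clearinghouse.
WATCHLIST = {"0" * 64}  # placeholder entry

def sha256_of_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def flag_matches(image_dir: Path) -> list[Path]:
    """Return paths of files whose hashes appear on the watch list."""
    return [
        p for p in image_dir.rglob("*")
        if p.is_file() and sha256_of_file(p) in WATCHLIST
    ]

if __name__ == "__main__":
    for match in flag_matches(Path("dataset_images")):
        print(f"flagged for removal and reporting: {match}")
```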
Thiel said the photos are currently being removed from the training database.