May 6, 2024 — Fake robocalls during an election. Celebrity voices used to hawk products. Photos altered to mislead the public. The trustworthiness of AI-generated content, from social media posts to celebrity testimonials, is under fire, raising a burning question: how can we combat harmful and unwanted content without undermining innovation?
Computer scientists at the University of California, San Diego Jacobs School of Engineering have proposed a new solution that harnesses the vast potential of deep generative models while mitigating the generation of biased or harmful content.
Their 2024 IEEE Conference on Secure and Trustworthy Machine Learning paper, "Data Redaction from Conditional Generative Models," introduces a framework that prevents text-to-image and speech synthesis models from producing undesirable outputs. The work recently won a Best Paper Award at the conference, held at the University of Toronto.
“Modern deep generative models often produce undesirable outputs such as offensive text, malicious images, and fabricated audio, and there is no reliable way to control them. We’re talking about how to technically prevent this from happening,” said Zhifeng Kong, a doctoral student in the Department of Computer Science and Engineering at the University of California, San Diego, and lead author of the paper.
“The main contribution of this research is to formalize how to think about this problem and how to frame it properly so that it can be solved,” said Kamalika Chaudhuri, a professor of computer science at the University of California, San Diego.
A new way to eradicate harmful content
Traditional mitigation methods have taken one of two approaches. The first is to retrain the model from scratch on a training set that excludes all unwanted samples. The second is to apply a classifier after content is generated that filters out or edits unwanted outputs.
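To make the second approach concrete, the control flow of a post-generation filter looks roughly like the sketch below, written in Python; the generator, classifier, and threshold are illustrative placeholders rather than components described in the paper.

```python
# Rough sketch of the classic "generate, then filter" mitigation: content is
# sampled freely and a separate safety classifier screens it afterwards.
# The generator, classifier, and threshold are placeholders for illustration.
from typing import Any, Callable, Optional

def generate_with_filter(
    generate: Callable[[str], Any],        # e.g. a text-to-image model
    unsafe_score: Callable[[Any], float],  # post-hoc classifier: P(harmful)
    prompt: str,
    threshold: float = 0.5,
    max_tries: int = 5,
) -> Optional[Any]:
    """Resample until an output passes the post-generation safety filter."""
    for _ in range(max_tries):
        sample = generate(prompt)
        if unsafe_score(sample) < threshold:
            return sample                  # accept the sample as safe enough
    return None                            # refuse; caller must handle this
```

The weakness of this pattern is that the filter lives outside the model: whoever deploys the generator has to choose to run it, which is exactly the enforcement gap described next.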
Both solutions have serious limitations for most modern large-scale models. Retraining an industry-scale model from scratch is prohibitively expensive, costing millions of dollars, and post-generation filters and editors are computationally intensive; moreover, once a third party obtains the source code, there is no way to ensure that such filters or editing tools are actually applied. Retraining may not even solve the problem: a model can still produce undesirable outputs, such as images with artifacts, even when nothing like them appears in the training data.
Chaudhuri and Kong set out to overcome each of these hurdles while still mitigating unwanted content. Their goal was a formal statistical machine learning framework that is effective, universal, and computationally efficient while maintaining high generation quality.
Specifically, the team proposed post-editing the weights of a pre-trained model, a method known as data redaction. They introduced a set of techniques that redact specific conditionals, that is, user inputs, that lead to undesirable content with statistically high probability.
Previous research on data redaction has focused on unconditional generative models, redacting generated samples to address problems in the output space. Those techniques are typically cumbersome to apply to conditional generative models, which learn an infinite number of distributions, one for each possible conditional.
Chaudhuri and Kong overcame this challenge by redacting in the conditional space rather than the output space. In a text-to-image model, they redact the prompt; in a text-to-speech model, they redact the voice. In other words, they extinguish the spark before it can ignite any toxic output.
For example, in a text-to-speech setting, a specific person’s voice, such as a celebrity’s, can be redacted. The model then generates a generic voice instead of the celebrity’s, making it much harder to put words in someone’s mouth.
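To illustrate how redacting in the conditional space can be done by post-editing a pre-trained model's weights, here is a minimal sketch in the spirit of the paper, though not the authors' exact algorithm: the generator is fine-tuned so that a redacted conditional (a harmful prompt or a celebrity voice label) reproduces what the frozen pre-trained model generates for a safe anchor conditional, while a retain term keeps behavior on other conditionals unchanged. The `model(z, cond)` interface, the mean-squared-error losses, and all names are assumptions made for illustration.

```python
# Simplified sketch of conditional-space redaction by post-editing weights.
# Not the paper's exact method: interfaces, losses, and names are assumed.
import copy
import torch
import torch.nn.functional as F

def redact_conditionals(model, redaction_pairs, retain_conditionals,
                        noise_dim=64, steps=1000, lr=1e-4, retain_weight=1.0):
    """redaction_pairs: list of (redacted_cond, safe_anchor_cond) tensors.
    retain_conditionals: list of conditionals whose behavior must not change."""
    frozen = copy.deepcopy(model).eval()          # reference pre-trained model
    for p in frozen.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(model.parameters(), lr=lr)

    for _ in range(steps):
        # 1) Redaction loss: on a redacted conditional, imitate what the
        #    frozen model produces for the safe anchor conditional instead.
        z = torch.randn(len(redaction_pairs), noise_dim)
        bad = torch.stack([c for c, _ in redaction_pairs])
        anchor = torch.stack([a for _, a in redaction_pairs])
        loss_redact = F.mse_loss(model(z, bad), frozen(z, anchor))

        # 2) Retain loss: elsewhere, stay close to the pre-trained model so
        #    overall generation quality is preserved.
        z2 = torch.randn(len(retain_conditionals), noise_dim)
        keep = torch.stack(list(retain_conditionals))
        loss_retain = F.mse_loss(model(z2, keep), frozen(z2, keep))

        loss = loss_redact + retain_weight * loss_retain
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

Because only the model's weights change, there is nothing for a downstream user to disable, and only the redacted conditionals, not the whole dataset, need to be revisited.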
The team’s method required loading only a small portion of the dataset, making data redaction computationally lightweight. It also provided better redaction quality and robustness than baseline methods while maintaining generation quality similar to that of the pre-trained models.
The researchers note that this is a small-scale study, but one whose approach can be applied to most types of conditional generative models.
“If this is scaled up and applied to larger, more modern models, it could ultimately have a broader impact and pave the way for more secure generative models,” Chaudhuri said.
This research was supported by the National Science Foundation (1804829) and the Army Research Office MURI award (W911NF2110317).
Learn more about artificial intelligence research and education at the University of California, San Diego.
Source: Kimberly Clementi, UCSD