Mon. Dec 23rd, 2024
Building an Early Warning System for Biological Threat Creation Using AI

Note: As part of our Preparedness Framework, we are investing in the development of improved evaluation methods for AI-enabled safety risks. We believe these efforts would benefit from broader input, and that sharing methods is also valuable to the AI risk research community. To this end, we are presenting today some of our early work focused on biological risk. We look forward to community feedback and to sharing more of our ongoing research.

Background. As OpenAI and other model developers build more capable AI systems, the potential for both beneficial and harmful uses of AI grows. One potentially harmful use, highlighted by researchers and policymakers, is the ability of AI systems to assist malicious actors in creating biological threats (see, e.g., White House 2023, Lovelace 2022, Sandbrink 2023). In one discussed hypothetical example, a malicious actor might use a highly capable model to develop a step-by-step protocol, troubleshoot wet-lab procedures, or even autonomously execute steps of the biothreat creation process when given access to tools like cloud labs (see Carter et al., 2023). However, assessing the viability of such hypothetical examples has been limited by insufficient evaluations and data.

Following our recently shared Preparedness Framework, we are developing methodologies to empirically evaluate these types of risks, to help us understand both where we are today and where we might be in the future. Here, we detail a new evaluation that could serve as one potential "tripwire" signaling caution and the need for further testing of biological misuse potential. This evaluation aims to measure whether models could meaningfully increase malicious actors' access to dangerous information about biological threat creation, compared to the baseline of existing resources (i.e., the Internet).

To evaluate this, we conducted a study with 100 human participants, comprising (a) 50 biology experts with PhDs and professional wet-lab experience, and (b) 50 student-level participants who had taken at least one university-level biology course. Each group of participants was randomly assigned to either a control group, which had access to the Internet only, or a treatment group, which had access to GPT-4 in addition to the Internet. Each participant was then asked to complete a set of tasks covering aspects of the end-to-end process for biological threat creation.[^1] To our knowledge, this is the largest to-date human evaluation of AI's impact on biorisk information.
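A randomization of this kind, stratified by expertise group, can be sketched as follows. This is a minimal illustration only: the participant IDs, the even 25/25 split, and the seeding scheme are assumptions for the example, not a description of the study's actual assignment mechanics.

```python
import random

# Hypothetical participant IDs; the real study had 50 experts and 50 students.
experts = [f"expert_{i}" for i in range(50)]
students = [f"student_{i}" for i in range(50)]

def assign(cohort, seed):
    """Randomly split one cohort evenly into control and treatment arms."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = cohort[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}

# Assign each cohort separately, so expertise level is balanced across arms.
arms = {
    "experts": assign(experts, seed=0),
    "students": assign(students, seed=1),
}
print({group: {arm: len(ids) for arm, ids in split.items()}
       for group, split in arms.items()})
```

Stratifying by cohort before randomizing guarantees that the control and treatment arms contain equal numbers of experts and students, so any measured uplift is not confounded by expertise.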

Results. Our study assessed uplifts in performance for participants with access to GPT-4 across five metrics (accuracy, completeness, innovation, time taken, and self-rated difficulty) and five stages of the biological threat creation process (ideation, acquisition, magnification, formulation, and release). We found mild uplifts in accuracy and completeness for those with access to the language model. Specifically, on a 10-point scale measuring accuracy of responses, we observed a mean score increase of 0.88 for experts and 0.25 for students compared to the Internet-only baseline, with similar uplifts for completeness (0.82 for experts and 0.41 for students). However, the obtained effect sizes were not large enough to be statistically significant, and our study highlighted the need for more research around what performance thresholds indicate a meaningful increase in risk. Moreover, we note that information access alone is insufficient to create a biological threat, and that this evaluation does not test for success in the physical construction of threats.
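To illustrate the kind of comparison involved in testing an uplift for significance, here is a minimal sketch of a Welch's t-test (which does not assume equal variances between the arms) on two small samples of 10-point scores. The numbers below are invented for illustration and are not the study's raw data.

```python
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom
    for two independent samples with possibly unequal variances."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances (n - 1)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (mean(sample_a) - mean(sample_b)) / se2 ** 0.5
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical 10-point accuracy scores for each arm (illustrative only)
treatment = [6.5, 7.0, 5.5, 8.0, 6.0, 7.5, 6.5, 7.0]
control = [5.5, 6.5, 5.0, 7.0, 6.0, 6.5, 5.5, 6.0]

uplift = mean(treatment) - mean(control)
t, df = welch_t(treatment, control)
print(f"mean uplift = {uplift:.2f}, t = {t:.2f}, df = {df:.1f}")
```

With samples this small, even a mean uplift comparable to those reported above can fail to clear conventional significance thresholds, which is one reason the post argues that statistical significance alone is an imperfect yardstick for model risk.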

We detail our evaluation procedure and results below. We also discuss several methodological insights related to capability elicitation and the security considerations needed to run this type of evaluation with frontier models at scale. Finally, we discuss the limitations of statistical significance as an effective method of measuring model risk, and the importance of new research in assessing the meaningfulness of model evaluation results.