Mon. Dec 23rd, 2024
Critiques Written by AI Help Humans Notice Flaws

We trained “critique-writing” models to describe flaws in summaries. Human raters find flaws in summaries much more often when shown the model's critiques. Larger models are better at critiquing their own outputs, and scale improves critique writing more than it improves summary writing. This shows the promise of using AI systems to assist human supervision of AI systems on difficult tasks.