We are considering the use of LLMs to address these challenges. Large language models such as GPT-4 can understand and generate natural language, making them applicable to content moderation: the model can make moderation decisions based on policy guidelines provided to it.
This system reduces the process of developing and customizing content policies from months to hours.
1. Once a policy guideline is written, policy experts create a golden dataset by identifying a small number of examples and assigning them labels according to the policy.
2. GPT-4 then reads the policy and assigns labels to the same dataset, without seeing the answers.
3. By examining the discrepancies between GPT-4's judgments and human judgments, policy experts can ask GPT-4 to explain the reasoning behind its labels, analyze ambiguities in the policy definitions, resolve the confusion, and clarify the policy accordingly. Steps 2 and 3 are repeated until the quality of the policy is satisfactory.
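The comparison at the heart of step 3 can be sketched as a simple diff between the model's labels and the experts' golden labels. Everything here is illustrative: the `Example` record, the `find_disagreements` helper, and the toy labels are assumptions for the sketch, not part of any described API.

```python
from dataclasses import dataclass

# Hypothetical record: a piece of content plus the expert-assigned golden label.
@dataclass
class Example:
    text: str
    golden_label: str

def find_disagreements(examples, model_labels):
    """Return (example, model_label) pairs where the model's label
    differs from the expert's golden label. These are the cases the
    policy experts review to find ambiguities in the policy."""
    return [
        (ex, label)
        for ex, label in zip(examples, model_labels)
        if label != ex.golden_label
    ]

# Toy data standing in for real policy examples and GPT-4 outputs.
examples = [
    Example("how do I make a weapon", "violates_policy"),
    Example("history of medieval weaponry", "allowed"),
]
model_labels = ["violates_policy", "violates_policy"]  # second label disagrees

for ex, label in find_disagreements(examples, model_labels):
    print(f"model={label} expert={ex.golden_label} text={ex.text!r}")
```

In practice each disagreement would be fed back to GPT-4 with a request to explain its reasoning, and the policy text would be revised where the explanation reveals genuine ambiguity.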
This iterative process yields refined content policies that can be translated into classifiers, enabling policy deployment and content moderation at scale.
Optionally, GPT-4's predictions can be used to fine-tune a much smaller model that handles large volumes of data at scale.
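One common way to prepare such a distillation step is to serialize the larger model's predictions as JSON-lines training records for the smaller classifier. The format below, and the `to_training_records` helper, are assumptions for illustration; the source does not specify a data format.

```python
import json

# Hypothetical GPT-4 predictions: (content, predicted label) pairs.
predictions = [
    ("buy cheap watches now!!!", "spam"),
    ("meeting moved to 3pm", "allowed"),
]

def to_training_records(predictions):
    """Format model predictions as JSON lines, one record per example,
    suitable as supervised training data for a smaller classifier."""
    return [
        json.dumps({"text": text, "label": label})
        for text, label in predictions
    ]

for line in to_training_records(predictions):
    print(line)
```

The smaller model trained on these records then serves the high-throughput traffic, while GPT-4 is reserved for policy iteration and labeling.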