Improving mathematical reasoning with process monitoring
We aim to improve mathematical problem solving by rewarding each correct step of reasoning (“process monitoring”) rather than rewarding only the correct final answer (“outcome monitoring”). We trained a model…
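The difference between the two reward signals can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names `outcome_reward` and `process_rewards`, the exact-match answer check, and the hand-written per-step labels are hypothetical and are not the training setup described in this article.

```python
from typing import List


def outcome_reward(final_answer: str, reference_answer: str) -> float:
    """Outcome monitoring (toy version): a single reward based only on the final answer."""
    return 1.0 if final_answer.strip() == reference_answer.strip() else 0.0


def process_rewards(step_is_correct: List[bool]) -> List[float]:
    """Process monitoring (toy version): one reward per reasoning step."""
    return [1.0 if ok else 0.0 for ok in step_is_correct]


# Hypothetical solution: the second step is flawed, yet the final answer is right.
step_labels = [True, False, True]

print(outcome_reward("42", "42"))    # one scalar for the whole solution
print(process_rewards(step_labels))  # one value per step, exposing the flawed step
```

In this sketch the outcome signal cannot distinguish a sound derivation from one that reaches the right answer through a faulty step, whereas the per-step signal marks exactly where the reasoning went wrong.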