Reinforcement learning from human feedback typically optimizes against a reward model trained to predict human preferences. Because the reward model is an imperfect proxy, optimizing its value too much can hinder performance on the ground truth objective, in accordance with Goodhart’s law. Although this effect has been frequently observed, it has not been carefully measured, because collecting human preference data is expensive. In this work, we use a synthetic setup in which a fixed “gold standard” reward model plays the role of the human and provides the labels used to train a proxy reward model. We study how the gold reward model’s score changes as we optimize against the proxy reward model using either reinforcement learning or best-of-n sampling. We find that this relationship follows a different functional form depending on the optimization method, and that in both cases its coefficients scale smoothly with the number of reward model parameters. We also investigate how this relationship is affected by the size of the reward model dataset, the number of reward model and policy parameters, and the coefficient of the KL penalty added to the reward in the reinforcement learning setup. Finally, we explore the implications of these empirical results for theoretical considerations in AI alignment.
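
To make the measurement concrete, the sketch below shows a toy, self-contained version of the best-of-n experiment: samples are selected by an imperfect proxy reward and then scored by a gold reward. The functions `proxy_reward` and `gold_reward` are invented stand-ins, not the trained reward models used in this work; the setup only illustrates how increasing optimization pressure against a proxy can first raise and then lower the gold score.

```python
# Toy illustration of proxy overoptimization under best-of-n sampling.
# Assumed stand-ins: `gold_reward` and `proxy_reward` are simple hand-written
# functions, not learned reward models.
import random


def gold_reward(x: float) -> float:
    # Ground truth: prefers samples near x = 1.
    return -(x - 1.0) ** 2


def proxy_reward(x: float) -> float:
    # Imperfect proxy: roughly aligned with the gold reward for small x,
    # but it keeps rewarding larger x, so pushing it too hard backfires.
    return x


def mean_gold_after_best_of_n(n: int, trials: int = 5000) -> float:
    """Average gold score of the proxy-best sample among n i.i.d. draws."""
    total = 0.0
    for _ in range(trials):
        candidates = [random.gauss(0.0, 1.0) for _ in range(n)]
        chosen = max(candidates, key=proxy_reward)  # optimize against the proxy
        total += gold_reward(chosen)                # evaluate with the gold reward
    return total / trials


if __name__ == "__main__":
    # As n grows, the gold score first improves, then degrades as the proxy
    # is overoptimized (Goodhart's law).
    for n in (1, 2, 4, 16, 64, 256):
        print(f"n = {n:3d}   mean gold score = {mean_gold_after_best_of_n(n):+.3f}")
```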