To create a reward model for reinforcement learning, we needed to collect comparison data, which consisted of two or more model responses ranked by quality. To collect this data, we took convers…
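To make the idea of comparison data concrete, here is a minimal Python sketch of how a ranking of responses might be turned into training signal for a reward model. It is an illustration under stated assumptions, not the method described in the source: the function names `ranking_to_pairs` and `pairwise_loss` are hypothetical, and the negative log-sigmoid (Bradley-Terry style) loss is a commonly used choice for pairwise preferences that the text above does not confirm.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Expand responses ordered best-to-worst into (preferred, rejected) pairs.

    Hypothetical helper: a ranking of k responses yields k * (k - 1) / 2 pairs.
    """
    # combinations() preserves input order, so the first item of each pair
    # is always the one that was ranked higher.
    return list(combinations(ranked_responses, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """Negative log-sigmoid of the reward margin (assumed loss, see lead-in).

    The loss is small when the preferred response scores higher than the
    rejected one, and large when the order is reversed.
    """
    margin = reward_preferred - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Three responses to one prompt, ordered from best to worst by a rater.
ranked = ["response A", "response B", "response C"]
pairs = ranking_to_pairs(ranked)
```

Each pair would then be scored by the reward model, with `pairwise_loss` pushing the score of the preferred response above that of the rejected one. Expanding a full ranking into pairs is one way to get several comparisons out of a single rating task.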